Project

General

Profile

SALVIAS issues

Error rate: 64 invalid rows/13661 rows = 0.47%

slopeaspect sometimes uses compass directions instead of degree headings

Sample row:

PlotID project_id PrimOwnerID SiteCode MajorGeo Country slope_aspect slope_gradient
26028 23 peter jorgensen L1 South America Bolivia NE 20

Error:

! DataError: invalid input syntax for type double precision: "NE" 
LINE 1: ...slopegradient, reallongitude, authorcode) VALUES (E'NE', E'1...
                                                             ^
query: INSERT INTO plot (slopeaspect, elevation, confidentialitystatus, reallatitude, slopegradient, reallongitude, authorcode) VALUES (E'NE', E'1800.0', E'0', E'-14.6159', E'20', E'-68.3708', E'L1')

...

! NullValueException: columns: plot_id
cause: null value in column "plot_id" violates not-null constraint
query: INSERT INTO plotevent (plot_id, methodnarrative, project_id, authorcode) VALUES (NULL, 'Line-intercept transects', 13534, '26091')

Solution: We need to translate the input, but more importantly, we need to recover from bad data by setting the value to NULL and retrying the insert

obsstartdate sometimes uses a date range instead of splitting start and end date into obsstartdate and obsenddate

Sample row:

PlotID project_id PrimOwnerID SiteCode MajorGeo Country date_start date_finish
26130 14 peter jorgensen Qumar1 South America Bolivia 22-23/01/2002 NULL

Error:

! SyntaxException: Invalid syntax in XML function: unknown string format
function:
<_date>
    <date>22-23/01/2002</date>
</_date>
output row:
<VegBIEN>
    <plotevent id="13558">
        <!-- ... -->
        <obsstartdate>
            <!--
<_date>
    <date>22-23/01/2002</date>
</_date>
-->
        </obsstartdate>

Data interpretation issues

Some organisms in Letcher plots have multiple stem records but a total stem count = 1

Organisms affected: 96

Sample organism:

PlotObsID collector_code individualCount stemRecordsCount stemCountsSum stem_id stemCount
569531 LETCHER.S 3 3 1 182479 1
569531 LETCHER.S 3 3 1 182480 0
569531 LETCHER.S 3 3 1 182481 0

Query:

SELECT
    counts.PlotObsID AS PlotObsID
    , collector_code
    , individualCount
    , stemRecordsCount
    , stemCountsSum
    , stem_id
    , stems.NoInd AS stemCount
FROM
(
    SELECT
        plotObservations.PlotObsID AS PlotObsID
        , collector_code
        , plotObservations.NoInd AS individualCount
        , count(*) AS stemRecordsCount
        , sum(stems.NoInd) AS stemCountsSum
    FROM plotObservations
    JOIN stems ON stems.PlotObsID = plotObservations.PlotObsID
    AND plotObservations.NoInd > 1
    GROUP BY PlotObsID
    HAVING
        stemCountsSum != individualCount
) counts
JOIN stems ON stems.PlotObsID = counts.PlotObsID
ORDER BY PlotObsID, stem_id;

Some organisms in Gentry (and other) plots have more stem records than the stem count

The Gentry convention for multiple stems per organism seems to be that the first stem record contains the stem count and the rest have stem count = 0

Organisms affected: 2,474

Sample organism:

PlotObsID collector_code individualCount stemRecordsCount stemCountsSum stem_id stemCount
29276 GENTRY.AH 3 5 3 370629 3
29276 GENTRY.AH 3 5 3 370630 0
29276 GENTRY.AH 3 5 3 370631 0
29276 GENTRY.AH 3 5 3 370632 0
29276 GENTRY.AH 3 5 3 370633 0

Query:

SELECT
    counts.PlotObsID AS PlotObsID
    , collector_code
    , individualCount
    , stemRecordsCount
    , stemCountsSum
    , stem_id
    , stems.NoInd AS stemCount
FROM
(
    SELECT
        plotObservations.PlotObsID AS PlotObsID
        , collector_code
        , plotObservations.NoInd AS individualCount
        , count(*) AS stemRecordsCount
        , sum(stems.NoInd) AS stemCountsSum
    FROM plotObservations
    JOIN stems ON stems.PlotObsID = plotObservations.PlotObsID
    AND plotObservations.NoInd > 1
    GROUP BY PlotObsID
    HAVING
        stemRecordsCount > 1
        AND stemRecordsCount > stemCountsSum
) counts
JOIN stems ON stems.PlotObsID = counts.PlotObsID
ORDER BY PlotObsID, stem_id;

Some organisms in Gentry (and other) plots have less stem records than the stem count

Organisms affected: 47

Sample organism:

PlotObsID collector_code individualCount stemRecordsCount stemCountsSum stem_id stemCount
31648 GENTRY.AH 3 2 3 375318 3
31648 GENTRY.AH 3 2 3 375319 0

Query:

SELECT
    counts.PlotObsID AS PlotObsID
    , collector_code
    , individualCount
    , stemRecordsCount
    , stemCountsSum
    , stem_id
    , stems.NoInd AS stemCount
FROM
(
    SELECT
        plotObservations.PlotObsID AS PlotObsID
        , collector_code
        , plotObservations.NoInd AS individualCount
        , count(*) AS stemRecordsCount
        , sum(stems.NoInd) AS stemCountsSum
    FROM plotObservations
    JOIN stems ON stems.PlotObsID = plotObservations.PlotObsID
    AND plotObservations.NoInd > 1
    GROUP BY PlotObsID
    HAVING
        stemRecordsCount > 1
        AND stemRecordsCount < stemCountsSum
) counts
JOIN stems ON stems.PlotObsID = counts.PlotObsID
ORDER BY PlotObsID, stem_id;

Some organisms in Gentry (and other) plots have a single stem record with stem count = individual count

Organisms affected: 12,320

Sample organism:

PlotObsID collector_code individualCount stemCount
29270 GENTRY.AH 2 2

Query:

SELECT
    plotObservations.PlotObsID AS PlotObsID
    , collector_code
    , plotObservations.NoInd AS individualCount
    , stems.NoInd AS stemCount
FROM plotObservations
JOIN stems ON stems.PlotObsID = plotObservations.PlotObsID
WHERE
    stems.NoInd > 1
AND plotObservations.NoInd > 1
ORDER BY PlotObsID;

Some organisms have one stem, but different heights in the organisms and stems tables

For these organisms, the organism height is always 9.9, which appears to denote an invalid value.
Unfortunately, 9.9 is also a valid height, so this value cannot be directly filtered out.

Organisms affected: 578

Sample organisms:

PlotObsID height_m NoInd stem_id stem_height_m
570252 9.9 1 183152 10
570254 9.9 1 183154 12
570255 9.9 1 183155 12
570256 9.9 1 183156 27

Query:

SELECT "PlotObsID", height_m, organisms."NoInd", stem_id, stem_height_m
FROM "SALVIAS".organisms
JOIN "SALVIAS".stems USING ("PlotObsID")
WHERE height_m::float != stem_height_m::float
ORDER BY "PlotObsID", stem_id