SALVIAS issues¶
- Table of contents
- SALVIAS issues
- slopeaspect sometimes uses compass directions instead of degree headings
- obsstartdate sometimes uses a date range instead of splitting start and end date into obsstartdate and obsenddate
- Data interpretation issues
- Some organisms in Letcher plots have multiple stem records but a total stem count = 1
- Some organisms in Gentry (and other) plots have more stem records than the stem count
- Some organisms in Gentry (and other) plots have less stem records than the stem count
- Some organisms in Gentry (and other) plots have a single stem record with stem count = individual count
- Some organisms have one stem, but different heights in the organisms and stems tables
Error rate: 64 invalid rows/13661 rows = 0.47%
slopeaspect sometimes uses compass directions instead of degree headings¶
Sample row:
PlotID | project_id | PrimOwnerID | SiteCode | MajorGeo | Country | slope_aspect | slope_gradient |
26028 | 23 | peter jorgensen | L1 | South America | Bolivia | NE | 20 |
Error:
! DataError: invalid input syntax for type double precision: "NE" LINE 1: ...slopegradient, reallongitude, authorcode) VALUES (E'NE', E'1... ^ query: INSERT INTO plot (slopeaspect, elevation, confidentialitystatus, reallatitude, slopegradient, reallongitude, authorcode) VALUES (E'NE', E'1800.0', E'0', E'-14.6159', E'20', E'-68.3708', E'L1') ... ! NullValueException: columns: plot_id cause: null value in column "plot_id" violates not-null constraint query: INSERT INTO plotevent (plot_id, methodnarrative, project_id, authorcode) VALUES (NULL, 'Line-intercept transects', 13534, '26091')
Solution: We need to translate the input, but more importantly, we need to recover from bad data by setting the value to NULL and retrying the insert
obsstartdate sometimes uses a date range instead of splitting start and end date into obsstartdate and obsenddate¶
Sample row:
PlotID | project_id | PrimOwnerID | SiteCode | MajorGeo | Country | date_start | date_finish |
26130 | 14 | peter jorgensen | Qumar1 | South America | Bolivia | 22-23/01/2002 | NULL |
Error:
! SyntaxException: Invalid syntax in XML function: unknown string format function: <_date> <date>22-23/01/2002</date> </_date> output row: <VegBIEN> <plotevent id="13558"> <!-- ... --> <obsstartdate> <!-- <_date> <date>22-23/01/2002</date> </_date> --> </obsstartdate>
Data interpretation issues¶
Some organisms in Letcher plots have multiple stem records but a total stem count = 1¶
Organisms affected: 96
Sample organism:
PlotObsID | collector_code | individualCount | stemRecordsCount | stemCountsSum | stem_id | stemCount |
569531 | LETCHER.S | 3 | 3 | 1 | 182479 | 1 |
569531 | LETCHER.S | 3 | 3 | 1 | 182480 | 0 |
569531 | LETCHER.S | 3 | 3 | 1 | 182481 | 0 |
Query:
SELECT counts.PlotObsID AS PlotObsID , collector_code , individualCount , stemRecordsCount , stemCountsSum , stem_id , stems.NoInd AS stemCount FROM ( SELECT plotObservations.PlotObsID AS PlotObsID , collector_code , plotObservations.NoInd AS individualCount , count(*) AS stemRecordsCount , sum(stems.NoInd) AS stemCountsSum FROM plotObservations JOIN stems ON stems.PlotObsID = plotObservations.PlotObsID AND plotObservations.NoInd > 1 GROUP BY PlotObsID HAVING stemCountsSum != individualCount ) counts JOIN stems ON stems.PlotObsID = counts.PlotObsID ORDER BY PlotObsID, stem_id;
Some organisms in Gentry (and other) plots have more stem records than the stem count¶
The Gentry convention for multiple stems per organism seems to be that the first stem record contains the stem count and the rest have stem count = 0
Organisms affected: 2,474
Sample organism:
PlotObsID | collector_code | individualCount | stemRecordsCount | stemCountsSum | stem_id | stemCount |
29276 | GENTRY.AH | 3 | 5 | 3 | 370629 | 3 |
29276 | GENTRY.AH | 3 | 5 | 3 | 370630 | 0 |
29276 | GENTRY.AH | 3 | 5 | 3 | 370631 | 0 |
29276 | GENTRY.AH | 3 | 5 | 3 | 370632 | 0 |
29276 | GENTRY.AH | 3 | 5 | 3 | 370633 | 0 |
Query:
SELECT counts.PlotObsID AS PlotObsID , collector_code , individualCount , stemRecordsCount , stemCountsSum , stem_id , stems.NoInd AS stemCount FROM ( SELECT plotObservations.PlotObsID AS PlotObsID , collector_code , plotObservations.NoInd AS individualCount , count(*) AS stemRecordsCount , sum(stems.NoInd) AS stemCountsSum FROM plotObservations JOIN stems ON stems.PlotObsID = plotObservations.PlotObsID AND plotObservations.NoInd > 1 GROUP BY PlotObsID HAVING stemRecordsCount > 1 AND stemRecordsCount > stemCountsSum ) counts JOIN stems ON stems.PlotObsID = counts.PlotObsID ORDER BY PlotObsID, stem_id;
Some organisms in Gentry (and other) plots have less stem records than the stem count¶
Organisms affected: 47
Sample organism:
PlotObsID | collector_code | individualCount | stemRecordsCount | stemCountsSum | stem_id | stemCount |
31648 | GENTRY.AH | 3 | 2 | 3 | 375318 | 3 |
31648 | GENTRY.AH | 3 | 2 | 3 | 375319 | 0 |
Query:
SELECT counts.PlotObsID AS PlotObsID , collector_code , individualCount , stemRecordsCount , stemCountsSum , stem_id , stems.NoInd AS stemCount FROM ( SELECT plotObservations.PlotObsID AS PlotObsID , collector_code , plotObservations.NoInd AS individualCount , count(*) AS stemRecordsCount , sum(stems.NoInd) AS stemCountsSum FROM plotObservations JOIN stems ON stems.PlotObsID = plotObservations.PlotObsID AND plotObservations.NoInd > 1 GROUP BY PlotObsID HAVING stemRecordsCount > 1 AND stemRecordsCount < stemCountsSum ) counts JOIN stems ON stems.PlotObsID = counts.PlotObsID ORDER BY PlotObsID, stem_id;
Some organisms in Gentry (and other) plots have a single stem record with stem count = individual count¶
Organisms affected: 12,320
Sample organism:
PlotObsID | collector_code | individualCount | stemCount |
29270 | GENTRY.AH | 2 | 2 |
Query:
SELECT plotObservations.PlotObsID AS PlotObsID , collector_code , plotObservations.NoInd AS individualCount , stems.NoInd AS stemCount FROM plotObservations JOIN stems ON stems.PlotObsID = plotObservations.PlotObsID WHERE stems.NoInd > 1 AND plotObservations.NoInd > 1 ORDER BY PlotObsID;
Some organisms have one stem, but different heights in the organisms and stems tables¶
For these organisms, the organism height is always 9.9
, which appears to denote an invalid value.
Unfortunately, 9.9
is also a valid height, so this value cannot be directly filtered out.
Organisms affected: 578
Sample organisms:
PlotObsID | height_m | NoInd | stem_id | stem_height_m |
570252 | 9.9 | 1 | 183152 | 10 |
570254 | 9.9 | 1 | 183154 | 12 |
570255 | 9.9 | 1 | 183155 | 12 |
570256 | 9.9 | 1 | 183156 | 27 |
Query:
SELECT "PlotObsID", height_m, organisms."NoInd", stem_id, stem_height_m FROM "SALVIAS".organisms JOIN "SALVIAS".stems USING ("PlotObsID") WHERE height_m::float != stem_height_m::float ORDER BY "PlotObsID", stem_id