Project

General

Profile

SALVIAS validation

completed

  1. include only datasets with projects.ipr_specific="There are no specific use conditions attached to this dataset. [...]"
  2. filter out metadata-only plots (AccessCode=2)
  3. map plot contributors (project contributors are part of these)

2014-3-6+

plots-only import command

time yes|(export log= version=SALVIAS_VegBIEN; make schemas/$version/reinstall; make inputs/SALVIAS/{Source,projects,plotMetadata_}/import_scrub by_col=1 n=; bin/make_analytical_db) # runtime: _ min ("_") @vegbiendev

2014-1-22

subset import command

time yes|(export log= version=SALVIAS_VegBIEN; make schemas/$version/reinstall; make inputs/SALVIAS/import_scrub by_col=1 n=295; bin/make_analytical_db) # enough rows to get first plot; runtime: 3.5 min ("3m16.365s") @vegbiendev

query

SET search_path TO "SALVIAS_VegBIEN"; -- needed for locationevent__contributors(), locationevent__communities() for now
SELECT *
FROM "SALVIAS"."plotMetadata_" 
RIGHT JOIN "SALVIAS"."plotObservations" USING ("PlotID")
LEFT JOIN analytical_plot ON
    analytical_plot."datasource" = 'SALVIAS'
AND analytical_plot."individualObservationID" = "plotObservations"."PlotObsID"::text
ORDER BY "plotObservations"."PlotObsID" 
LIMIT 100;

2013-2-20

See SALVIAS.2013-2-20.100_rows.xls and SALVIAS.2013-2-20.100_rows.csv
Input and output data are in separate tabs. Refer to the VegCore data dictionary for column definitions.

Source data used is on nimoy in salvias_plots

Brad Boyle's comments: (e-mail on 2013-2-26)

[Re Plot data validations for SALVIAS and TEAM:]
Both look good. The only problem I could find was the incorrectly formed scientificName_verbatim, but I believe you have fixed that now, correct? [yes] If that issue is fixed then these are both good and I do not need to see them again. Consider them done.

query:

SELECT *
FROM "SALVIAS"."plotMetadata_" 
RIGHT JOIN "SALVIAS"."plotObservations" USING ("PlotID")
LEFT JOIN /*r7624.*/analytical_plot ON
    analytical_plot."datasource" = 'SALVIAS'
AND analytical_plot."individualObservationID" = "plotObservations"."PlotObsID"::text
ORDER BY "plotObservations"."PlotObsID" 
LIMIT 100;

import command:

make inputs/SALVIAS/import_scrub by_col=1 n=295; bin/make_analytical_db # enough rows to get first plot

2013-1-25

See SALVIAS.2013-1-25.100_rows.xls and SALVIAS.2013-1-25.100_rows.csv
Input and output data are in separate tabs. Refer to the VegCore data dictionary for column definitions.

Brad Boyle's comments: (e-mail on 2013-1-31, SALVIAS.2013-1-25.100_rows_bb.xls)

See attached spreadsheet [SALVIAS.2013-1-25.100_rows_bb.xls]. The worksheet "bb_comments" contains my validation comments.

Most of the issues I identified previously have been fixed, which is great. Thanks!

The most serious remaining problem is that scientificNameWithMorphospecies is still being incorrectly formed. The morphospecies string is now missing. Is this a problem with the TNRS output (the missing value should be in unmatchedTerms)? Or do you need to adjust the algorithm used to form scientificNameWithMorphospecies?

Note also my comment about the additional columns pertaining to the original identification and the matched name. As you can see from the email I just sent out, we need to have a discussion about what to display. Currently I'm suggesting we display the original name provided by the data owner, the matched name and the accepted name found by the TNRs. Please note that this change would apply to the analytical database schema and to ALL validation extracts.

When you make this change, please send me another extract. I main need to see how morphospecies names are being formed. To ensure that I see the range of variation, could you send me a large extract, say, 5000 records?

column name Comments
datasource
country
isNewWorld_bien
projectID
locationID
locationName
subplot
individualCode
elevationInMeters
plotArea_ha Should not be null for this plot. This value is given in plotMetadata.plot_area_ha in original schema. Should be 0.1 for this plot.
samplingProtocol
family
genus
speciesBinomial
taxonName
scientificNameAuthorship
scientificNameWithMorphospecies This is still incorrectly formed. The morphospecies strings are missing. For example, for individualCode=7, should be "Ardisia sp.1". The "sp.1" should have been in unmatchedTerms. If not, let me know.
identifiedBy omit; not relevant for TNRS accepted name
dateIdentified omit; not relevant for TNRS accepted name
identificationRemarks omit; not relevant for TNRS accepted name
growthForm
recordedBy
recordNumber
coverPercent
diameterBreastHeight_cm
height_m
tag
organismX_m
organismY_m
taxonOccurrenceID
authorTaxonCode
individualObservationID
authorStemCode
Add the following columns before `family`:
family_verbatim As provided by author
taxonName_verbatim As provided by author
identifiedBy_verbatim
dateIdentified_verbatim
identificationRemarks_verbatim
family_matched_tnrs As matched by TNRS
taxonName_matched_tnrs As matched by TNRS

query: (using one-time import)

SELECT *
FROM "SALVIAS"."plotObservations" 
LEFT JOIN /*r7371.*/analytical_stem_view analytical_stem ON
    analytical_stem."datasource" = 'SALVIAS'
AND analytical_stem."individualObservationID" = "plotObservations"."PlotObsID"::text
ORDER BY "plotObservations"."PlotObsID" 
LIMIT 100;

2013-1-18

See SALVIAS.2013-1-18.50_rows.xls and SALVIAS.2013-1-18.50_rows.csv
Input and output data are in separate tabs

Brad Boyle's comments: (e-mail on 2013-1-18)

I found two issues, listed below. The rest looks good to me.

1. FIXED: scientificName is incorrectly formed. Do not pre-pend family. Family should only be displayed in this field if there is no name at a lower rank. This problem applies to all records in BIEN, including specimens. Please correct before sending out any other data sets for validation.

2. FIXED: scientificNameWithMorphospecies is incorrectly formed. Do not include the author, and only include family if genus is null. Actually, speciesBinomialWithMorphospecies looks perfect to me, but I would need to see more records to understand how you are composing the morphospecies name when genus is null. If such cases, the morphospecies name should be Family plus unmatched_terms. If there is no family, then the morphospecies is simply unmatched terms. If you still have doubts about this, let's have a call early next week to disucss.
scientificNameWithMorphospecies is now formed like speciesBinomialWithMorphospecies, which does satisfy the above conditions

3. FIXED: We do not need two morphospecies columns. Compose scientificNameWithMorphospecies as you are currently composing speciesBinomialWithMorphospecies, and remove speciesBinomialWithMorphospecies.

4. FIXED: The records with missing values of diameterBreastHeight_cm are not in the original data. For example, both the import and export have two records for the following individual, the second record with no dbh: These are the result of creating a stem record to store stem-like attributes which are asserted on the plant itself rather than one of its stems. The stem table constraints have now been changed to remove these records when there are no plant attributes.

locationName    subplot individualCode  family  genus   speciesBinomial diameterBreastHeight_cm
c2000-1 1       1       Arecaceae       Geonoma Geonoma undata  6.59
c2000-1 1       1       Arecaceae       Geonoma Geonoma undata

I'm not sure how those extra records crept into the import, but when I check salvias_plots directly, I get only one:

mysql> SELECT PlotID, PlotCode, Line, Ind, Family, Genus, Species, Habit, stem_dbh
    -> FROM PlotObservations o JOIN stems s
    -> ON o.PlotObsID=s.PlotObsID
    -> WHERE PlotCode="c2000-1" 
    -> ORDER BY Line, Ind, stem_id;
+--------+----------+------+------+------------------+---------------+----------------+-------+----------+
| PlotID | PlotCode | Line | Ind  | Family           | Genus         | Species        | Habit | stem_dbh |
+--------+----------+------+------+------------------+---------------+----------------+-------+----------+
|    295 | c2000-1  | 1    |    1 | Arecaceae        | Geonoma       | seleri         | T     |     6.59 |
|    295 | c2000-1  | 1    |    2 | Clusiaceae       | Tovomitopsis  | allenii        | T     |     3.09 |
|    295 | c2000-1  | 1    |    2 | Clusiaceae       | Tovomitopsis  | allenii        | T     |      5.5 |
|    295 | c2000-1  | 1    |    3 | Clusiaceae       | Tovomitopsis  | allenii        | T     |     13.1 |
|    295 | c2000-1  | 1    |    4 | Celastraceae     | Celastrus     | vulcanicola    | L     |      2.7 |
|    295 | c2000-1  | 1    |    5 | Myrsinaceae      | Ardisia       | sp.4           | T     |      3.7 |
|    295 | c2000-1  | 1    |    6 | Rubiaceae        | Psychotria    | sylvivaga      | T     |      2.5 |
|    295 | c2000-1  | 1    |    6 | Rubiaceae        | Psychotria    | sylvivaga      | T     |      2.7 |
|    295 | c2000-1  | 1    |    6 | Rubiaceae        | Psychotria    | sylvivaga      | T     |      3.9 |
|    295 | c2000-1  | 1    |    6 | Rubiaceae        | Psychotria    | sylvivaga      | T     |        9 |
|    295 | c2000-1  | 1    |    7 | Myrsinaceae      | Ardisia       | sp.1           | T     |      2.5 |
|    295 | c2000-1  | 1    |    7 | Myrsinaceae      | Ardisia       | sp.1           | T     |      4.4 |
|    295 | c2000-1  | 1    |    7 | Myrsinaceae      | Ardisia       | sp.1           | T     |      6.3 |
|    295 | c2000-1  | 1    |    7 | Myrsinaceae      | Ardisia       | sp.1           | T     |       30 |
|    295 | c2000-1  | 1    |    8 | Myrsinaceae      | Ardisia       | sp.1           | T     |      8.8 |
|    295 | c2000-1  | 1    |    9 | Myrsinaceae      | Ardisia       | sp.1           | T     |      5.4 |

query:

SELECT *
FROM "SALVIAS"."plotObservations" 
LEFT JOIN r7268.analytical_stem_view analytical_stem ON
    analytical_stem."datasource" = 'SALVIAS'
AND analytical_stem."individualObservationID" = "plotObservations"."PlotObsID"::text
ORDER BY "plotObservations"."PlotObsID" 
LIMIT 50;

2012-12-7

See SALVIAS.50_rows.xls and SALVIAS.50_rows.csv

Brad Boyle's comments:

conference call on 2013-1-4:

  • diameterBreastHeight_cm, tag, authorStemCode missing
  • FIXED: need to add authorPlotCode, subplot, authorPlantCode to analytical DB for use in validation
  • when Ind is provided, an individual is also identified by PlotCode + Line + Ind (in addition to PlotObsID)
  • when tags provided, an individual is also identified by PlotCode + Line + tag1/tag2
  • FIXED: Ind should be remapped to authorPlantCode, not recordNumber
  • FIXED: coll_number should be remapped to recordNumber, not catalogNumber
  • FIXED: morphoname is the taxon name assigned in the field (the original taxondetermination), which is unique within the plot
  • Genus, etc. are scrubbed versions of OrigGenus, etc.
    • usually automatically scrubbed by Taxon Scrubber, but some were manually corrected

e-mail on 2013-1-4:

I'm not sure I can validate the SALVIAS data using this extract. Several critical elements are missing, among them:

  1. FIXED: Plot codes
  2. FIXED: Subplot codes or identifiers
  3. Individual codes or other identifiers of individual plants (as opposed to stems)
  4. FIXED: Stem diameters

Perhaps these elements aren't actually missing, but are merely absent from the view you used to extract the validation record. The columns present in your file are appropriate for specimens, but not for plots.

query:

SELECT * FROM "public.2012-12-05-12-51-00.r6646".analytical_stem WHERE "institutionCode" = 'SALVIAS' LIMIT 50;