VegX changes¶
Schema¶
- Schema download page: VegX Schema 1.5.3
- Simplified schema used by *NVS*: VegXSimple.xsd
New CSV representation¶
In trying to load a VegX export of the CTFS database, we found that there are several issues with large VegX files:
- Each logical row is scattered across the XML document, because the data is grouped by top-level table rather than by row
- As a result, the entire XML file must be loaded into memory at once
- Building the DOM tree in Python inflates the file's in-memory size significantly (30x+), which exhausts the available memory and crashes the import
We propose to solve these problems with a CSV format similar to Darwin Core for plots data, which would be both easier to import and easier to generate:
- This new format, named VegCSV, will use just the leaf names from VegX, as well as terms from DwC and VegBank, as CSV column names.
- It will provide a "grab bag" of terms to map to, in the same way that DwC does.
- Nesting of data (organisms within plots, stems within organisms) would be represented as multiple DB tables or CSVs, with each child table record having a foreign key to its parent record
We hope that this format will preserve the content of VegX, while solving the issues involved with using XML.
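As a rough illustration of the parent/child layout described above, the following sketch reads two small CSVs one row at a time and resolves each child record's foreign key to its parent. The column names (`plotID`, `plotName`, `organismID`, `taxonName`) and the helper names are assumptions made for this example only; they are not a settled VegCSV vocabulary.

```python
import csv
import io

# Hypothetical VegCSV content. Column names are illustrative assumptions,
# not terms fixed by VegX, DwC, or VegBank.
PLOTS_CSV = """plotID,plotName
1,Plot A
2,Plot B
"""

ORGANISMS_CSV = """organismID,plotID,taxonName
10,1,Quercus alba
11,1,Acer rubrum
12,2,Pinus strobus
"""


def iter_rows(text):
    """Yield one CSV row at a time as a dict, so the whole file is never held at once."""
    yield from csv.DictReader(io.StringIO(text))


def load_organisms(plots_text, organisms_text):
    """Pair each organism with its parent plot by following the plotID foreign key."""
    # The parent table is indexed in memory here for brevity; with a real
    # import, a database table would normally perform this lookup.
    plots = {row["plotID"]: row for row in iter_rows(plots_text)}
    for organism in iter_rows(organisms_text):
        parent = plots[organism["plotID"]]  # foreign key -> parent record
        yield organism["taxonName"], parent["plotName"]


result = list(load_organisms(PLOTS_CSV, ORGANISMS_CSV))
```

Because each CSV row is a complete logical record, the child table can be processed as a stream, which is the property the XML layout lacks.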
Changes¶
Structural¶
- Allow nesting individualOrganismObservation inside aggregateOrganismObservation as an alternative
  - Currently, individualOrganismObservation and aggregateOrganismObservation must be linked together using a common taxonNameUsageConcept
- Allow all top-level tables to alternatively be nested inside their parent elements
  - e.g. individualOrganismObservation inside plotObservation
  - This will cause duplication with the existing pointer-based method of connecting parent and child tables
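To make the difference between the two linking styles concrete, here is a minimal sketch that reads the same parent/child relationship from a pointer-based document and from a nested one. Only the element names plotObservation and individualOrganismObservation come from this page; the wrapper elements (`vegX`, `plotObservations`, `individualOrganismObservations`) and the attributes `id` and `plotObservationID` are assumptions for illustration and may not match the actual schema.

```python
import xml.etree.ElementTree as ET

# Pointer-based layout: children live in their own top-level table and
# refer to the parent by ID (attribute names are assumed, not from the schema).
POINTER_BASED = """
<vegX>
  <plotObservations>
    <plotObservation id="po1"/>
  </plotObservations>
  <individualOrganismObservations>
    <individualOrganismObservation id="io1" plotObservationID="po1"/>
    <individualOrganismObservation id="io2" plotObservationID="po1"/>
  </individualOrganismObservations>
</vegX>
"""

# Nested layout: children sit directly inside their parent element.
NESTED = """
<vegX>
  <plotObservations>
    <plotObservation id="po1">
      <individualOrganismObservation id="io1"/>
      <individualOrganismObservation id="io2"/>
    </plotObservation>
  </plotObservations>
</vegX>
"""


def children_by_pointer(root):
    """Group child IDs under the parent ID that each child points to."""
    grouped = {}
    for obs in root.iter("individualOrganismObservation"):
        grouped.setdefault(obs.get("plotObservationID"), []).append(obs.get("id"))
    return grouped


def children_by_nesting(root):
    """Group child IDs under the parent element that directly contains them."""
    return {
        plot.get("id"): [
            obs.get("id") for obs in plot.findall("individualOrganismObservation")
        ]
        for plot in root.iter("plotObservation")
    }


pointer_result = children_by_pointer(ET.fromstring(POINTER_BASED))
nested_result = children_by_nesting(ET.fromstring(NESTED))
```

Both functions recover the same mapping, which is why supporting both styles duplicates information: an importer would need to accept either form and reconcile them if a document used both.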
Convert user-defined fields to first-class fields¶
21 of 26 fields (in 4 tables) will be converted:
- plotObservation
  - methodNarrative
  - parentPlotObservationID
  - precipitation
  - sourceAccessionCode
- individualOrganismObservation
  - collectionDate
  - growthForm
  - sourceAccessionCode
- individualOrganism
  - authorPlantCode
  - identificationLabel: multiple copies will be allowed to accommodate tag2
- abioticObservation: may be standardized to soil exchange schema
  - acidity
  - base
  - calcium
  - carbon
  - cationExchangeCapacity
  - clay
  - conductivity
  - organic
  - sand
  - silt
  - sodium
  - texture
Other fields will remain user-defined:
- individualOrganismObservation
  - canopyForm
  - canopyPosition
  - censusNo
  - heightFirstBranch
  - lianaInfestation
Closed lists¶
/*s/taxonConcept/Rank/@code
/TaxonomicRankEnum/TaxonomicRankSpeciesGroupEnum: add auth