VegCSV¶
- Map refactoring
- VegCore
- VegCSV-DwC merging
- VegCSV FAQ
- VegCSV subfolders
- VegCSV vision
- VegCSV vs VegX
- VegX to VegCSV
Overview¶
- As an extension of *Darwin Core*, VegCSV is intended to expand the benefits of Darwin Core to plots data
- The mappings for each hierarchical level (database or spreadsheet table) are stored in a separate subdirectory
- VegCSV's VegCore provides a "grab bag" of terms to map to, in the same way that Darwin Core does
VegCSV = CSV + VegCore¶
- VegCSV: The overall CSV-based format
- VegCore: The vocabulary of terms, which is a superset of Darwin Core (hence the name)
Why VegCSV?¶
Overall structure¶
See also VegCSV vision
- The source CSVs or DB tables will be grouped into subfolders for each hierarchical level (table)
- When providing a DB export instead of (or in addition to) CSVs, place the PostgreSQL-compatible plain-text .sql file(s) in the top-level directory [1]
- To group files related to one table together (such as part files for large files that have been split up), separate subfolders will be used instead of filename prefixes to indicate the table.
- If any CSV column names are duplicated or empty, the subfolder must contain a header override file +header.<ext>, which specifies unique names for each column, with a ! at the beginning of the line
- Each subfolder is named with a descriptive name for the table (see Suggested table names below)
- The import order of the tables is specified in an import_order.txt file in the top-level directory
- Each subfolder contains a map.csv file containing the mappings (described below)
- Global metadata (such as methodology) can be placed in a single-row CSV table, whose columns are the appropriate DwC terms
[1] VegBIEN provides utilities for translating MySQL to PostgreSQL
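The condition that triggers a header override file (duplicated or empty column names) can be checked before packaging a table. The following is a minimal sketch, not a VegBIEN utility: the function name needs_header_override is hypothetical, and it assumes that names differing only in surrounding whitespace count as duplicates. It only detects the problem; it does not write the override file.

```python
import csv
import io
from collections import Counter


def needs_header_override(csv_text):
    """Return True if the first row has an empty or duplicated column name."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader, [])
    # Assumption: surrounding whitespace is not significant in column names.
    names = [name.strip() for name in header]
    counts = Counter(names)
    return any(name == "" or counts[name] > 1 for name in names)


print(needs_header_override("id,name,name\n1,a,b\n"))  # duplicated "name"
print(needs_header_override("id,,name\n1,a,b\n"))      # empty column name
print(needs_header_override("id,name\n1,a\n"))         # unique, non-empty
```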
Important: When exporting relational databases to CSVs, you MUST ensure that embedded quotes are escaped by doubling them, not by preceding them with a "\" as is the default in phpMyAdmin.
You also MUST include column names. (If you don't, you will need to add them back separately.)
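As an illustration of the escaping rule, Python's standard csv module doubles embedded quotes by default when writing in its "excel" dialect. This is a sketch only: the helper name export_rows and the sample rows are made up for the example, and your export tool may differ.

```python
import csv
import io


def export_rows(rows):
    """Serialize rows to CSV text, doubling any embedded double quotes."""
    buffer = io.StringIO(newline="")
    # The "excel" dialect uses doublequote=True, so " becomes "" (never \").
    writer = csv.writer(buffer, dialect="excel")
    writer.writerows(rows)
    return buffer.getvalue()


# The first row carries the column names, as required above.
exported = export_rows([
    ["id", "comment"],
    [1, 'He said "hi"'],
])
print(exported)
```

Reading the text back with csv.reader recovers the original field, which is a quick way to confirm that an export uses doubled quotes rather than backslashes.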
Map spreadsheet¶
Each subfolder contains a map.csv file with the following format, in the Excel dialect [2]:
Datasource name | VegCore | Filter | Comments |
column | VegCore term | e.g. /_alt/1 | e.g. Globally-unique identifier for the specimen |
Example: *ARIZ map file*
[2] The Excel dialect:
- comma-separated
- fields enclosed by double quotes (")
- quotes escaped by doubling them (a"b -> a""b)
- newlines escaped by enclosing the field in quotes (a<NL>b -> "a<NL>b")
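A map.csv in this dialect can be read with Python's csv module, whose "excel" dialect follows the four rules listed. The sketch below is illustrative: the function read_map, the datasource name MySource, and the source column names are hypothetical, and the VegCore column uses Darwin Core term names purely as sample values.

```python
import csv
import io

# Hypothetical map.csv content: the first header cell is the datasource name.
SAMPLE_MAP = (
    'MySource,VegCore,Filter,Comments\r\n'
    'specimen_id,occurrenceID,,"Globally-unique identifier, e.g. ""ABC-1"""\r\n'
    'notes,occurrenceRemarks,,"Free text that may span\nmore than one line"\r\n'
)


def read_map(text):
    """Parse map.csv text into (datasource name, list of mapping dicts)."""
    reader = csv.reader(io.StringIO(text, newline=""), dialect="excel")
    header = next(reader)
    datasource = header[0]
    mappings = []
    for row in reader:
        row = row + [""] * (4 - len(row))  # pad short rows to four fields
        source_column, term, filter_, comments = row[:4]
        mappings.append({
            "source_column": source_column,
            "vegcore_term": term,
            "filter": filter_,
            "comments": comments,
        })
    return datasource, mappings


datasource, mappings = read_map(SAMPLE_MAP)
for mapping in mappings:
    print(datasource, mapping["source_column"], "->", mapping["vegcore_term"])
```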
Suggested table names¶
See VegCore tables
- Note: if your datasource is a SQL export, use your datasource's table names instead to match up with the directly-imported tables
- Use as many or as few of these tables as are present in your datasource
- For example, Darwin Core data can remain denormalized in a single Specimen table
- If one of the VegCore tables is not appropriate, use the name of a Darwin Core term (capitalized)
- Remember to include each table in your import_order.txt
Note that sometimes, source tables will need to be denormalized to fit within a VegBIEN-compatible VegCSV export:
- Normalized taxonomic hierarchies such as in *VegBank* or CTFS must be denormalized into a Darwin Core-style Taxon table, with each taxonomic level in a column
This is necessary because each taxon is uniquely identified by its "path", which includes all its ancestors, rather than by its lowest-rank epithet
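The denormalization described above can be sketched as a walk up each taxon's parent chain, copying every ancestor's name into a column named after its rank. The input schema here (id, parent_id, rank, name) and the function name denormalize_taxa are assumptions for illustration, not the actual VegBank or CTFS schema.

```python
def denormalize_taxa(taxa):
    """Turn parent-pointer taxon records into one row per taxon,
    with one column per taxonomic rank found among its ancestors."""
    by_id = {taxon["id"]: taxon for taxon in taxa}
    rows = []
    for taxon in taxa:
        row = {"taxon_id": taxon["id"]}
        node = taxon
        seen = set()
        while node is not None:
            if node["id"] in seen:
                raise ValueError(f"cycle in taxon hierarchy at id {node['id']}")
            seen.add(node["id"])
            row[node["rank"]] = node["name"]
            # Root taxa have parent_id None, which ends the walk.
            node = by_id.get(node["parent_id"])
        rows.append(row)
    return rows


# Hypothetical normalized input: each record points at its parent.
taxa = [
    {"id": 1, "parent_id": None, "rank": "family", "name": "Fabaceae"},
    {"id": 2, "parent_id": 1, "rank": "genus", "name": "Inga"},
    {"id": 3, "parent_id": 2, "rank": "specificEpithet", "name": "edulis"},
]
rows = denormalize_taxa(taxa)
for row in rows:
    print(row)
```

Each output row carries the taxon's full path, so two taxa that share a lowest-rank epithet but differ in an ancestor remain distinct.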
Sample specimens resource: GBIF¶
Directory layout:
Specimen/
    map.csv
    GBIF.txt
Sample plots resource: CTFS¶
Directory layout:
import_order.txt: Plot Subplot PlotObservation SubplotObservation TaxonOccurrence StemObservation
_src
bci.sql
Plot/
    map.csv
    create.sql
Subplot/
    map.csv
    Quadrat.csv
LocationObservation/
    map.csv
    Census.csv
SubplotObservation/
    map.csv
    CensusQuadrat.csv
TaxonObservation/
    map.csv
    create.sql
StemObservation/
    map.csv
    create.sql
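A layout like this can be sanity-checked against the rules in Overall structure. The sketch below is not part of VegBIEN; the function name check_layout is hypothetical. It assumes that import_order.txt holds whitespace-separated table names and that top-level folders whose names start with an underscore (such as _src) are not tables.

```python
from pathlib import Path


def check_layout(root):
    """Return a list of problems found in a VegCSV-style directory."""
    root = Path(root)
    order_file = root / "import_order.txt"
    if not order_file.is_file():
        return ["missing import_order.txt"]

    problems = []
    # Assumption: table names are separated by whitespace.
    tables = order_file.read_text().split()
    for table in tables:
        folder = root / table
        if not folder.is_dir():
            problems.append(f"{table}: listed in import_order.txt but no subfolder")
        elif not (folder / "map.csv").is_file():
            problems.append(f"{table}: missing map.csv")

    listed = set(tables)
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # Assumption: folders starting with "_" are support folders, not tables.
        if folder.name not in listed and not folder.name.startswith("_"):
            problems.append(f"{folder.name}: subfolder not listed in import_order.txt")
    return problems
```

Calling check_layout on a datasource directory returns an empty list when every listed table has a subfolder with a map.csv. Applied to the listing above exactly as written, a check of this kind would report PlotObservation and TaxonOccurrence, which appear in import_order.txt but not among the subfolders, along with the unlisted LocationObservation and TaxonObservation.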
VegCore¶
See VegCore