VegCSV

Overview

  • As an extension of *Darwin Core*, VegCSV is intended to expand the benefits of Darwin Core to plots data
  • The mappings for each hierarchical level (database or spreadsheet table) are stored in that level's own subdirectory
  • VegCSV's VegCore provides a "grab bag" of terms to map to, in the same way that Darwin Core does

VegCSV = CSV + VegCore

  • VegCSV: The overall CSV-based format
  • VegCore: The vocabulary of terms, which is a superset of Darwin Core (hence the name)

Why VegCSV?

VegCSV vs. VegX

Overall structure

See also VegCSV vision

  • The source CSVs or DB tables will be grouped into subfolders for each hierarchical level (table)
    • When providing a DB export instead of (or in addition to) CSVs, place the PostgreSQL-compatible plain-text .sql file(s) in the top-level directory [1]
    • To group files related to one table together (such as part files for large files that have been split up), separate subfolders will be used instead of filename prefixes to indicate the table.
    • If any CSV column names are duplicated or empty, the subfolder must contain a header override file +header.<ext>, which specifies unique names for each column, with a ! at the beginning of the line
  • Each subfolder is named with a descriptive name for the table (see Suggested table names below)
    • The import order of the tables is specified in an import_order.txt file in the top-level directory
  • Each subfolder contains a map.csv file containing the mappings (described below)
  • Global metadata (such as methodology) can be placed in a single-row CSV table, whose columns are the appropriate DwC terms
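
The header rule above (no duplicated or empty column names) can be checked before a table is placed in its subfolder. The following is a minimal sketch, not part of VegCSV itself: the function name find_header_problems and the sample data are hypothetical, and it only reports whether a header override file would be needed; it does not write one.

```python
import csv
import io


def find_header_problems(csv_text):
    """Return (empty_positions, duplicated_names) for the first row of a CSV."""
    header = next(csv.reader(io.StringIO(csv_text, newline="")))
    # 0-based positions of columns whose name is empty or whitespace
    empty_positions = [i for i, name in enumerate(header) if not name.strip()]
    seen = set()
    duplicated = []
    for name in header:
        if name.strip() and name in seen and name not in duplicated:
            duplicated.append(name)
        seen.add(name)
    return empty_positions, duplicated


# Hypothetical sample: the third column is unnamed and "name" appears twice.
sample = "id,name,,name\n1,Quercus,x,alba\n"
empty_positions, duplicated = find_header_problems(sample)
print(empty_positions, duplicated)
```

If either returned list is non-empty, the table needs a header override file as described in the list above.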

[1] VegBIEN provides utilities for translating MySQL to PostgreSQL

Important: When exporting relational databases to CSVs, you MUST escape embedded quotes by doubling them, not by preceding them with a backslash ("\"), which is the default in phpMyAdmin.
You MUST also include column names. (If you omit them, you will need to add them back separately.)
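
As an illustration of both requirements, the sketch below exports a table with a header row and doubled quotes using Python's standard csv module. It assumes an in-memory SQLite database as a stand-in for the real datasource; the table Plot, its columns, and the helper export_table are hypothetical.

```python
import csv
import io
import sqlite3


def export_table(conn, table, out):
    """Write all rows of `table` to `out` as CSV, header row first."""
    # `table` is interpolated directly, so it must come from a trusted source.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out, dialect="excel")  # escapes quotes by doubling
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur)


# Hypothetical stand-in data with an embedded double quote.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Plot (plot_id INTEGER, notes TEXT)")
conn.execute("INSERT INTO Plot VALUES (1, ?)", ('the "old" plot',))

buf = io.StringIO(newline="")
export_table(conn, "Plot", buf)
exported = buf.getvalue()
print(exported)
```

The notes value is written as "the ""old"" plot", with no backslashes, and the first line carries the column names.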

Map spreadsheet

Each subfolder contains a map.csv file with the following format, in the Excel dialect [2]:

  Datasource name | VegCore      | Filter       | Comments
  column          | VegCore term | e.g. /_alt/1 | e.g. Globally-unique identifier for the specimen

Example: *ARIZ map file*
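
To make the column layout concrete, here is a minimal sketch of reading a map.csv into a lookup from source column to VegCore term. Everything in the sample is an assumption for illustration: "ARIZ" stands in for the datasource name, the source column names are invented, and the terms are Darwin Core names used only as plausible examples. The Filter column is read but not interpreted.

```python
import csv
import io

# Hypothetical map.csv content; the first header cell is the datasource name.
MAP_CSV = '''ARIZ,VegCore,Filter,Comments
SpecimenGUID,occurrenceID,,Globally-unique identifier for the specimen
Species,scientificName,,
'''


def load_mapping(map_text):
    """Return (datasource_name, {source_column: term}) from map.csv text."""
    reader = csv.reader(io.StringIO(map_text, newline=""), dialect="excel")
    header = next(reader)
    datasource = header[0]
    mapping = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        source_column, term = row[0], row[1]
        if term:  # unmapped columns are left out
            mapping[source_column] = term
    return datasource, mapping


datasource, mapping = load_mapping(MAP_CSV)
print(datasource, mapping)
```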

[2] The Excel dialect:

  • comma-separated
  • fields enclosed by double quotes (")
  • quotes escaped by doubling them (a"b -> a""b)
  • newlines escaped by enclosing the field in quotes (a<NL>b -> "a<NL>b")
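
Python's standard csv module ships a dialect named "excel" that follows these rules, so the two escaping examples above can be reproduced directly. This is only a sketch of the round trip. Note that by default the writer quotes only the fields that need it, not every field.

```python
import csv
import io

# Write one row containing an embedded quote and an embedded newline.
buf = io.StringIO(newline="")
writer = csv.writer(buf, dialect="excel")
writer.writerow(['a"b', "a\nb", "plain"])
encoded = buf.getvalue()
print(repr(encoded))

# Read it back; the original field values are recovered unchanged.
decoded = next(csv.reader(io.StringIO(encoded, newline=""), dialect="excel"))
print(decoded)
```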

Suggested table names

See VegCore tables

  • Note: if your datasource is a SQL export, use your datasource's table names instead to match up with the directly-imported tables
  • Use as many or as few of these tables as are present in your datasource
    • e.g. Darwin Core data can remain denormalized in a single Specimen table
  • If one of the VegCore tables is not appropriate, use the name of a Darwin Core term (capitalized)
  • Remember to include each table in your import_order.txt
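
The last point can be checked mechanically. The sketch below compares import_order.txt with the subfolders that contain a map.csv; it assumes one table name per line in import_order.txt, and the helper check_import_order and the temporary sample layout are hypothetical.

```python
import tempfile
from pathlib import Path


def check_import_order(root):
    """Return (listed tables with no mapped subfolder, mapped subfolders not listed)."""
    root = Path(root)
    lines = (root / "import_order.txt").read_text().splitlines()
    listed = [line.strip() for line in lines if line.strip()]
    mapped = sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and (p / "map.csv").is_file()
    )
    missing_dirs = [name for name in listed if name not in mapped]
    unlisted = [name for name in mapped if name not in listed]
    return missing_dirs, unlisted


# Build a small hypothetical resource to run the check against.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for table in ("Plot", "Subplot"):
        (root / table).mkdir()
        (root / table / "map.csv").write_text("")
    (root / "import_order.txt").write_text("Plot\nStemObservation\n")
    result = check_import_order(root)

print(result)
```

Here StemObservation is listed without a subfolder and Subplot has a subfolder but is not listed, so both are reported.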

Note that sometimes, source tables will need to be denormalized to fit within a VegBIEN-compatible VegCSV export:

  • Normalized taxonomic hierarchies such as in *VegBank* or CTFS must be denormalized into a Darwin Core-style Taxon table, with each taxonomic level in a column
    This is necessary because each taxon is uniquely identified by its "path", which includes all its ancestors, rather than by its lowest-rank epithet
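
The denormalization can be sketched as a walk up the parent links of each taxon, filling one column per rank. The table layout, the rank names, and the sample taxa below are assumptions for illustration only; they are not the VegBank or CTFS schemas, and the column names are not claimed to be the exact VegCore terms.

```python
# Hypothetical normalized taxon table: id -> (parent_id, rank, name).
TAXA = {
    1: (None, "family", "Fagaceae"),
    2: (1, "genus", "Quercus"),
    3: (2, "species", "alba"),
    4: (1, "genus", "Fagus"),
}
RANKS = ["family", "genus", "species"]  # assumed column order


def denormalize(taxa, taxon_id):
    """Return one row with a column per rank, covering the taxon and its ancestors."""
    row = dict.fromkeys(RANKS, "")
    current = taxon_id
    while current is not None:
        parent_id, rank, name = taxa[current]
        row[rank] = name
        current = parent_id  # climb to the parent until the root is reached
    return row


rows = {taxon_id: denormalize(TAXA, taxon_id) for taxon_id in TAXA}
print(rows[3])
```

Each resulting row carries the full path of the taxon, so two taxa that share an epithet but have different ancestors remain distinct.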

Sample specimens resource: GBIF

Directory layout:

  • Specimen/
    • map.csv
    • GBIF.txt

Sample plots resource: CTFS

Directory layout:

  • import_order.txt:
    Plot
    Subplot
    PlotObservation
    SubplotObservation
    TaxonOccurrence
    StemObservation
    
  • _src/
    • bci.sql
  • Plot/
    • map.csv
    • create.sql
  • Subplot/
    • map.csv
    • Quadrat.csv
  • LocationObservation/
    • map.csv
    • Census.csv
  • SubplotObservation/
    • map.csv
    • CensusQuadrat.csv
  • TaxonObservation/
    • map.csv
    • create.sql
  • StemObservation/
    • map.csv
    • create.sql

VegCore

See VegCore