2012-07-26 conference call
To do
Add datasource_id wherever there's a sourceaccessioncode, and map to it (implicitly?)
- This eliminates the need for materialized "breadcrumbs" in the staging tables, because the immediate parent can be looked up with just the input table's fkey and datasource name
Allow an input to appear multiple times in the same map spreadsheet (issue #463)
- Best to do this before removing distinct tables, so that collisions requiring _alt are visually obvious
Remove distinct tables in map spreadsheets
- Should come before Flatten VegX because once VegX is flat, there won't be a concept of mappings "tables" other than prefixes for column names
Flatten VegX to DwC-like standard column names
- Use DwC, then VegX, then VegBank as the "standard" names
- source of the term will be tracked in separate "Source" column in the main map spreadsheet
- also incorporate datasource's custom names when these are unambiguous
- "grab bag" of terms, like we currently do with DwC
Allow data provider to supply custom ordering of tables (hierarchical levels)
- Flatten XML files to CSV for use with staging tables
- Translate remaining XML functions to SQL functions
- Filter correctable errors in errors tables
- Parse date ranges
- Date with unknown day -> middle of month: 00, XX, xx, ??
- or record uncertainty? would require indeterminate dates in Postgres
- Data provider feedback: Query to translate value-by-value table to per-record
- just provide (e-mail) CSV to data provider; don't need web interface
- Detect frameshift errors and discard those rows
- DwC lat/long with " for seconds throws off CSV parsing
- Need better ways to deal with non-standard CSV dialects
- Heuristic field start/end detection to ignore embedded quotes?
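The "detect frameshift errors and discard those rows" item could be sketched roughly as below. This is only an illustration, assuming a frameshift shows up as a row whose field count differs from the header's; the function name `split_frameshifted_rows` and the sample columns are hypothetical, and the real import scripts may detect this differently.

```python
import csv
import io


def split_frameshifted_rows(text, dialect="excel"):
    """Return (header, good_rows, bad_rows).

    A row is treated as frameshifted (assumption) when its number of
    fields differs from the number of header fields.
    """
    reader = csv.reader(io.StringIO(text), dialect=dialect)
    header = next(reader)
    good, bad = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) == len(header):
            good.append(row)
        else:
            bad.append(row)  # candidate for the errors table / provider feedback
    return header, good, bad


# Hypothetical input: the unquoted comma in "Smith, J." shifts row 2 by one field.
sample = (
    "id,collector,lat,long\n"
    "1,Smith,45.5,-122.6\n"
    "2,Smith, J.,45.5,-122.6\n"
    "3,Jones,44.0,-121.0\n"
)
header, good, bad = split_frameshifted_rows(sample)
```

A field-count check only catches shifts that change the number of fields; a stray quote that swallows a delimiter and is later balanced by another would need the heuristic field start/end detection mentioned above.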
Questions
- Should we reject specimens without a date?
- But date-filtered scientific queries will automatically ignore those, due to NULL propagation, so missing dates aren't a problem
- Also, this would lead to us discarding (many?) records from the input data, which BIEN3 was supposed to avoid
- Should we reject specimens without a year/month?
- This would require storing dates as separate YMD columns
- If we make this change, NULL propagation would work as above
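To make the separate-YMD-columns idea concrete, here is a minimal sketch, not the actual BIEN schema. It assumes dates arrive as `YYYY-MM-DD` strings with the placeholders from the to-do list (00, XX, xx, ??) marking unknown parts, and it uses an in-memory SQLite table as a stand-in for Postgres (both treat a comparison against NULL as not true in a WHERE clause). The names `split_date`, `demo`, and the `specimen` table are hypothetical.

```python
import sqlite3

# Placeholder values for an unknown date part, as listed in the to-do notes.
UNKNOWN = {"00", "XX", "xx", "??"}


def split_date(text):
    """Split 'YYYY-MM-DD' into (year, month, day); unknown or missing parts become None."""
    if not text:
        return (None, None, None)
    parts = (text.split("-") + [None, None])[:3]
    return tuple(None if p is None or p in UNKNOWN else int(p) for p in parts)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE specimen (id INTEGER, year INTEGER, month INTEGER, day INTEGER)"
    )
    raw = [(1, "2012-07-26"), (2, "2011-03-XX"), (3, "2010-??-??"), (4, "")]
    conn.executemany(
        "INSERT INTO specimen VALUES (?, ?, ?, ?)",
        [(i, *split_date(d)) for i, d in raw],
    )
    # Rows whose filtered column is NULL drop out of the result on their own.
    by_year = [r[0] for r in conn.execute(
        "SELECT id FROM specimen WHERE year >= 2010 ORDER BY id")]
    by_month = [r[0] for r in conn.execute(
        "SELECT id FROM specimen WHERE month >= 3 ORDER BY id")]
    conn.close()
    return by_year, by_month


by_year, by_month = demo()
```

In this sketch the undated specimen (id 4) is kept in the table but skipped by the year filter, and the specimen with only a known year (id 3) is skipped by the month filter, so no records need to be rejected at load time.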