2012-07-26 conference call

To do

Add datasource_id wherever there's a sourceaccessioncode, and map to it (implicitly?)
- This eliminates the need for materialized "breadcrumbs" in the staging tables, because the immediate parent can be looked up with just the input table's fkey and datasource name
~~Allow an input to appear multiple times in the same map spreadsheet (issue #463)~~
- Best to do this before removing distinct tables, so that collisions requiring _alt are visually obvious
~~Remove distinct tables in map spreadsheets~~
- Should come before "Flatten VegX" because once VegX is flat, there won't be a concept of mapping "tables" other than prefixes for column names
Flatten VegX to DwC-like standard column names
- Use DwC, then VegX, then VegBank as the "standard" names
  - source of the term will be tracked in separate "Source" column in the main map spreadsheet
- also incorporate datasource's custom names when these are unambiguous
- "grab bag" of terms, like we currently do with DwC
~~Allow data provider to supply custom ordering of tables (hierarchical levels)~~
Flatten XML files to CSV for use with staging tables
Translate remaining XML functions to SQL functions
Filter correctable errors in errors tables
- Parse date ranges
- Date with unknown day (written as 00, XX, xx, or ??) -> middle of month
  - or record uncertainty? would require indeterminate dates in Postgres
Data provider feedback: Query to translate value-by-value table to per-record
- just provide (e-mail) CSV to data provider; don't need web interface
Detect frameshift errors and discard those rows
- DwC lat/long with " for seconds throws off CSV parsing
- Need better ways to deal with non-standard CSV dialects
  - Heuristic field start/end detection to ignore embedded quotes?
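One possible way to detect frameshifted rows is to compare each row's field count against the header's. The following is a minimal sketch in Python using the standard csv module; the function name split_frameshifted and the sample data are hypothetical and not taken from the existing import scripts. It only catches shifts that change the number of fields, and how an embedded " is treated still depends on the CSV parser and dialect in use.

```python
import csv
import io


def split_frameshifted(text):
    """Split CSV text into (header, good_rows, bad_rows).

    A row is treated as frameshifted when its field count differs
    from the header's field count. (Assumption: the header row
    itself is well-formed.)
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    good, bad = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) == len(header):
            good.append(row)
        else:
            bad.append(row)
    return header, good, bad


# Hypothetical sample: row 2 is missing a field, row 3 has an extra one.
SAMPLE = (
    "id,taxon,lat,long\n"
    "1,Quercus alba,40.5,-79.9\n"
    "2,Acer rubrum,40.6\n"
    "3,Pinus strobus,40.7,-80.1,stray\n"
)

header, good, bad = split_frameshifted(SAMPLE)
print("kept:", [r[0] for r in good])
print("discarded:", [r[0] for r in bad])
```

The discarded rows could be written to an errors table rather than dropped silently, so they remain available for data provider feedback.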

Questions

Should we reject specimens without a date?
- But date-filtered scientific queries will automatically ignore those, due to NULL propagation, so such records aren't a problem
- Also, this would lead to us discarding (many?) records from the input data, which BIEN3 was supposed to avoid
Should we reject specimens without a year/month?
- This would require storing dates as separate YMD columns
- If we make this change, NULL propagation would work as above
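The NULL propagation argument can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres (comparisons against NULL behave the same way in a WHERE clause in both); the specimen table and its year/month/day columns are hypothetical names chosen for the example, not the actual schema.

```python
import sqlite3

# In-memory database standing in for Postgres (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimen ("
    " id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER)"
)
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?, ?)",
    [
        (1, 1998, 7, 26),          # full date
        (2, 1998, 7, None),        # unknown day
        (3, 1998, None, None),     # unknown month and day
        (4, None, None, None),     # no date at all
    ],
)


def ids(where):
    """Return the ids of specimens matching a WHERE clause."""
    sql = "SELECT id FROM specimen WHERE " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# A comparison with NULL yields NULL, which WHERE treats as not true,
# so rows lacking the filtered component simply drop out of the result.
by_year = ids("year >= 1990")
by_month = ids("year = 1998 AND month = 7")
by_day = ids("day <= 28")
undated = ids("year IS NULL")

print(by_year, by_month, by_day, undated)
```

With separate columns, a specimen with only a year still participates in year-level queries, and nothing has to be rejected at load time.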
