Name        Size     Revision  Age              Author                 Comment
_archive             1598      almost 13 years  Aaron Marcuse-Kubitza  Moved _archive/tapir2flatClient/trunk/client/ t...
analysis             3076      over 12 years    Aaron Marcuse-Kubitza  Added top-level analysis dir for range modeling
backups              3701      over 12 years    Aaron Marcuse-Kubitza  backups/Makefile: Added synchronization of back...
bin                  4049      over 12 years    Aaron Marcuse-Kubitza  bin/map: collision_suffix: Changed to use _merg...
config               272       about 13 years   Aaron Marcuse-Kubitza  Moved bien_password to new config dir
inputs               4049      over 12 years    Aaron Marcuse-Kubitza  bin/map: collision_suffix: Changed to use _merg...
lib                  4041      over 12 years    Aaron Marcuse-Kubitza  xml_func.py: Added simplify()
mappings             4046      over 12 years    Aaron Marcuse-Kubitza  mappings/DwC2-VegBIEN.specimens.csv, VegCSV-Veg...
schemas              4050      over 12 years    Aaron Marcuse-Kubitza  schemas/functions.sql: join_strs() aggregate: U...
to_do                2547      over 12 years    Aaron Marcuse-Kubitza  to_do/timeline.doc: Updated to reflect the mont...
Makefile    10.1 KB  3764      over 12 years    Aaron Marcuse-Kubitza  root Makefile, input.Makefile: Maps validation:...
README.TXT  9.03 KB  3845      over 12 years    Aaron Marcuse-Kubitza  README.TXT: After a new import: Added steps to ...
map         1.22 KB  3475      over 12 years    Aaron Marcuse-Kubitza  root map: Run bin/map with a nice increment of ...

Latest revisions

# Date Author Comment
4050 08/15/2012 07:06 AM Aaron Marcuse-Kubitza

schemas/functions.sql: join_strs() aggregate: Use join_strs_transform_preserve_empty() as an optimization because all our data has already had '' replaced with NULL by sql_io.cleanup_table() in csv2db. This will help speed up _merges now that they are performed on a large scale in the slowest datasource, SpeciesLink.

4049 08/15/2012 07:02 AM Aaron Marcuse-Kubitza

bin/map: collision_suffix: Changed to use _merge instead of _alt to avoid losing source data on import when multiple fields collide
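
To illustrate why r4049 prefers _merge over _alt, here is a minimal sketch of the two behaviors. It assumes that _alt keeps only the first non-NULL value and that _merge keeps every distinct non-NULL value; the function names alt() and merge() and the "; " separator are hypothetical and are not taken from the repository.

```python
def alt(values):
    """Assumed _alt behavior: return the first non-NULL value, dropping the rest."""
    for value in values:
        if value is not None:
            return value
    return None


def merge(values, sep="; "):
    """Assumed _merge behavior: keep every distinct non-NULL value, in order.

    The separator is a placeholder chosen for this sketch.
    """
    kept = []
    for value in values:
        if value is not None and value not in kept:
            kept.append(value)
    return sep.join(kept) if kept else None


# Two input fields that collide on the same output field
colliding = ["A123", "B456"]
print(alt(colliding))    # only the first value survives
print(merge(colliding))  # both values survive
```

Under these assumptions, an _alt-style collision silently discards the second input, while a _merge-style collision retains it.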

4048 08/15/2012 06:58 AM Aaron Marcuse-Kubitza

bin/map: Preventing collisions if multiple inputs mapping to same output: Made collision suffix configurable so it can easily be changed

4047 08/15/2012 06:56 AM Aaron Marcuse-Kubitza

bin/map: Preventing collisions if multiple inputs mapping to same output: Made collision suffix configurable so it can easily be changed

4046 08/15/2012 06:52 AM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence.sourceaccessioncode mappings: Added catalogNumber mapping, which takes precedence over recordNumber and is applicable to specimens data and direct vouchers. recordNumber should only be used as a last resort (before the taxon name) because this is collector-assigned and often not unique within anything.
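
The precedence described in r4046 can be sketched as a simple fallback chain. This is only an illustration: the function name source_accession_code() and the dict-based record are hypothetical, and the final fallback to the taxon name mentioned in the comment is left out.

```python
# Candidate fields in order of precedence, per the r4046 comment
PRECEDENCE = ("catalogNumber", "recordNumber")


def source_accession_code(record):
    """Return the first non-empty candidate field, or None if all are missing."""
    for field in PRECEDENCE:
        value = record.get(field)
        if value:  # treats None and '' as missing
            return value
    return None


print(source_accession_code({"catalogNumber": "NY-0001", "recordNumber": "42"}))
print(source_accession_code({"recordNumber": "42"}))
```

The first call picks catalogNumber even though recordNumber is present; the second falls back to recordNumber only because nothing better is available.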

4045 08/15/2012 06:34 AM Aaron Marcuse-Kubitza

mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Moved direct/indirect voucher _ifs inwards to wrap just the value of catalognumber_dwc, not the catalognumber_dwc field node, so that a future SQL function implementation of _if only needs to concern itself with returning one value or another, not with handling XML subtrees. The previous moving of the _ifs in r3942 was intended to effect this, but the _ifs weren't moved in far enough to wrap just the value.

4044 08/15/2012 06:21 AM Aaron Marcuse-Kubitza

mappings/VegCSV-VegBIEN.specimens.csv: eventDate mappings: Removed collectiondate mapping because the eventDate refers only to the plot event. Added /_alt suffixes for mergeability with DwC.

4043 08/15/2012 06:15 AM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv, DwC1-DwC2.specimens.csv: Split eventDate into eventDate and dateCollected, where eventDate refers only to the date of the sampling event, but dateCollected also refers to the date the particular specimen was collected. (This distinction is important in merging with VegCSV, because in plots data, these two fields are distinct.) Remapped datasources with dateCollected-related fields to new dateCollected.

4042 08/15/2012 05:55 AM Aaron Marcuse-Kubitza

bin/map: Run new xml_func.simplify() on the root before printing the put template, so that _alts and _merges with only one element for the current datasource will be printed in their simplified form (with the _alt/_merge removed). This facilitates automated testing after an _alt/_merge suffix has been added, because the put template provided as part of the automated test will only change for those datasources that actually have an entry for both mappings, which greatly reduces the number of tests that need to be accepted.
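
The simplification described in r4042 can be sketched roughly as follows. This is not the actual xml_func.simplify(): it assumes the put template can be modeled as nested dicts (the real code presumably operates on XML nodes), and the numbered child keys "1" and "2" are placeholders.

```python
FUNC_NAMES = ("_alt", "_merge")  # function nodes that can be collapsed


def simplify(tree):
    """Replace any _alt/_merge node that has exactly one element with that element."""
    if not isinstance(tree, dict):
        return tree  # leaf value: nothing to simplify
    simplified = {name: simplify(child) for name, child in tree.items()}
    if len(simplified) == 1:
        ((name, child),) = simplified.items()
        if name in FUNC_NAMES and isinstance(child, dict) and len(child) == 1:
            (only_element,) = child.values()
            return only_element
    return simplified


# A datasource with a single mapping: the _alt wrapper disappears
print(simplify({"sourceaccessioncode": {"_alt": {"1": "catalogNumber"}}}))
# A datasource with both mappings: the _alt wrapper is kept
print(simplify({"sourceaccessioncode": {"_alt": {"1": "catalogNumber", "2": "recordNumber"}}}))
```

Because single-element wrappers collapse to the bare value, adding an _alt or _merge suffix changes the printed template only for datasources that actually supply more than one colliding input.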

4041 08/15/2012 05:51 AM Aaron Marcuse-Kubitza

xml_func.py: Added simplify()
