Project

General

Profile

Statistics
| Revision:
Name Size Revision Age Author Comment
  _archive 1598 over 12 years Aaron Marcuse-Kubitza Moved _archive/tapir2flatClient/trunk/client/ t...
  analysis 3076 over 12 years Aaron Marcuse-Kubitza Added top-level analysis dir for range modeling
  backups 3701 over 12 years Aaron Marcuse-Kubitza backups/Makefile: Added synchronization of back...
  bin 4068 over 12 years Aaron Marcuse-Kubitza bin/map: collision_suffix: Setting back to _alt...
  config 272 almost 13 years Aaron Marcuse-Kubitza Moved bien_password to new config dir
  inputs 4076 over 12 years Aaron Marcuse-Kubitza schemas/vegbien.sql: Moved collectionnumber fro...
  lib 4041 over 12 years Aaron Marcuse-Kubitza xml_func.py: Added simplify()
  mappings 4076 over 12 years Aaron Marcuse-Kubitza schemas/vegbien.sql: Moved collectionnumber fro...
  schemas 4076 over 12 years Aaron Marcuse-Kubitza schemas/vegbien.sql: Moved collectionnumber fro...
  to_do 2547 over 12 years Aaron Marcuse-Kubitza to_do/timeline.doc: Updated to reflect the mont...
Makefile 10.1 KB 3764 over 12 years Aaron Marcuse-Kubitza root Makefile, input.Makefile: Maps validation:...
README.TXT 9.03 KB 3845 over 12 years Aaron Marcuse-Kubitza README.TXT: After a new import: Added steps to ...
map 1.22 KB 3475 over 12 years Aaron Marcuse-Kubitza root map: Run bin/map with a nice increment of ...

Latest revisions

# Date Author Comment
4076 08/16/2012 02:22 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Moved collectionnumber from specimenreplicate to plantobservation to replace authorplantcode, since these terms are used analogously in plots and specimens data. This code is really the DwC recordNumber (VegBIEN collectionnumber), which "serves as a link between field notes and an Occurrence record, such as a specimen [or plots data] collector's number" (http://rs.tdwg.org/dwc/terms/#recordNumber). Also, this prevents a specimenreplicate from incorrectly being created when plots data provides an authorplantcode.

4075 08/16/2012 01:55 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Mapped individualID for mergability with VegCSV

4074 08/16/2012 01:49 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: Split occurrenceID into occurrenceID and individualID, where individualID refers to the plant in plots data and occurrenceID refers to the specimen in specimens data. This prevents plant sourceaccessioncodes from being mapped to the specimenreplicate, which was messing up stems mappings for the parent plantobservation. It also avoids mapping the specimenreplicate sourceaccessioncode to additional tables where it isn't needed. (Note that occurrenceID is needed for location to ensure that each specimen gets its own location to make locationdeterminations on. Everything else is directly or indirectly scoped by location when its own sourceaccessioncode isn't specified.)

4073 08/16/2012 01:33 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence: Removed catalogNumber mapping because the catalogNumber applies only to the specimen, not to the occurrence, especially in plots data

4072 08/16/2012 01:14 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence: Map everything except occurrenceID (which is globally unique) to new authortaxoncode, which only needs to be unique within the locationevent

4071 08/16/2012 12:59 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonoccurrence: Renamed taxonoccurrence_locationevent_1_to_1 to taxonoccurrence_unique_within_locationevent and added new authortaxoncode to it

4070 08/16/2012 12:57 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonoccurrence: Added authortaxoncode to store unique keys that are unique within the locationevent rather than within the datasource

4069 08/16/2012 12:43 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV/maps/VegCSV.organisms.csv: Added _alt to height_m, stem_height_m to choose between them when both are specified (rather than having bin/map choose their priority order based on their order in the map). Note that when both of the heights are specified, they are always either the same, or height_m is invalid (see <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Some-organisms-have-one-stem-but-different-heights-in-the-organisms-and-stems-tables&gt;).

4068 08/16/2012 12:39 PM Aaron Marcuse-Kubitza

bin/map: collision_suffix: Setting back to _alt to test if _merge caused the SpeciesLink slowdown. SpeciesLink contains a huge number of equivalent columns due to each DwC term being present with namespaces for all versions of the DwC schema, and these columns can be combined either using _alt or _merge. _merge is only useful if the values in different versions of the same DwC field are different, which is not likely the case.

4067 08/16/2012 12:29 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import. The import time for SpeciesLink (the slowest datasource) doubled, to 16 hours, most likely due to replacing _alt with the slower _merge, which preserves more input data.

View all revisions | View revisions

Also available in: Atom