inputs/input.Makefile: add!: verify/: also svn:ignore *.tsv, *.txt
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
inputs/VegBank/vegbank.~.clean_up.sql, inputs/CVS/cvs.~.clean_up.sql: Prevent "column name specified more than once" errors when tables are joined: put tables in alphabetical order for consistency
inputs/CVS/^taxon_observation.**.sample/create.sql, map.csv: added new project columns
inputs/CVS/taxon_observation.**/postprocess.sql: added the project table
inputs/CVS/project/map.csv: mapped stopDate->projectEndDate
mapped inputs/CVS/project/, which includes the projectName for attribution
inputs/CVS/^taxon_observation.**.sample/create.sql: added Mike Lee's additional plots used to validate confidentiality-related fields (wiki.vegpath.org/CVS_validation#plots-to-include)
bugfix: inputs/CVS/^taxon_observation.**.sample/create.sql: include taxonName in the subset of columns that's imported for the validation, because it is _alt-ed with scientificName for forming the TNRS input name. this is unique to CVS, which is why it was not part of the validation subset copied from the VegBank subset.
fix: inputs/CVS/^taxon_observation.**.sample/: added _no_import because this table duplicates part of what's imported from taxon_observation.**
inputs/GBIF/raw_occurrence_record_plants/test.xml.ref: reran test, which added yearCollected/monthCollected/dayCollected
inputs/CVS/plantConcept_/create.sql: documented runtime (3 min)
inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible
inputs/CVS/^taxon_observation.**.sample/create.sql: uncommented identifiedBy since this is now part of taxonObservation_
fix: inputs/CVS/observation_community/create.sql: communityName: populate from commConcept.commName instead, because commInterpretation.commname is not always populated. this requires left-joining to commConcept.
inputs/CVS/observation_community/map.csv: updated output column names to new input column names, to avoid later output column collisions
inputs/CVS/observation_community/header.csv, map.csv: updated input column names for cvs.~.clean_up.sql renamings
inputs/CVS/cvs.~.clean_up.sql: commClass, commConcept fields: prepend table name to avoid inter-table collisions upon join
added inputs/CVS/observation_community/, as for VegBank
inputs/CVS/cvs.~.clean_up.sql: commClass.dba_src_ID: prepend table name to avoid inter-table collisions upon join
added inputs/CVS/observationContributor_/, which adds the people collecting the plot
inputs/CVS/cvs.~.clean_up.sql: observationContributor.dba_src_ID: prepended table name to avoid collision when left-joining to party
fix: inputs/CVS/taxonObservation_/create.sql: mapped identifiedBy, which involves joining to party
inputs/CVS/cvs.~.clean_up.sql: don't rename taxonInterpretation.PARTY_ID, so that this can be USING-joined to party in inputs/CVS/taxonObservation_/create.sql
inputs/CVS/^taxon_observation.**.sample/map.csv: synced output columns to input columns (which removes the extra *s)
fix: inputs/CVS/plot_/postprocess.sql: locality: include the site name (authorLocation), because this is part of the unique specification of the place that was sampled, and Bob wants this to be included in VegBIEN
inputs/CVS/^taxon_observation.**.sample/create.sql: removed parentLocationID, since this is unused in CVS
fix: inputs/CVS/taxon_observation.**/map.csv: omit authorPlantName because it is not specific to the taxonInterpretation row (this is in a separate taxonInterpretation for the original determination instead)
fix: inputs/CVS/plot_/map.csv: PARENT_ID: remapped to UNUSED, to clarify that subplots are not implemented through this field
**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)
added inputs/CVS/^taxon_observation.**.sample/, used for the extract. note that the column list is slightly different from VegBank's.
inputs/CVS/taxonObservation_/map.csv: removed taxonObservation_-- prefix from terms that do not need to be table-specific (like for VegBank)
fix: inputs/CVS/taxonObservation_/map.csv: plantConcept_ columns: synced input and output column names to their names in plantConcept_
inputs/CVS/plantConcept_/map.csv: removed plantConcept_-- prefix from terms that do not need to be table-specific (like for VegBank)
bugfix: inputs/CVS/import_order.txt: added taxon_observation.**
inputs/CVS/: don't import joined tables, because they are now imported in the taxon_observation.** left-join instead
inputs/CVS/: added taxon_observation.** left-join of the tables, using the steps at http://wiki.vegpath.org/Left-joining_a_datasource. this involves renaming taxonOccurrenceID->taxonOccurrenceID__overall_plot so that it can then be joined together with aggregateOrganismObservationID to create the full taxonOccurrenceID (as in VegBank).
inputs/CVS/stemCount_/map.csv: remapped stratum_ID->*STRATUM_ID so it would match up with stratum.*STRATUM_ID
inputs/CVS/taxonObservation_/map.csv: mapped TAXONINTERPRETATION_ID to identificationID
added inputs/CVS/stratum/
added inputs/CVS/stratumType/
inputs/CVS/: prepended the table name to each column name to prevent column collisions, using the steps at http://wiki.vegpath.org/Left-joining_a_datasource
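The idea behind prefixing can be sketched as follows. This is a hypothetical model in which each joined table contributes one row as a dict. The `"<table>.<column>"` naming and the `.` separator are assumptions for illustration; the wiki steps define the real convention:

```python
def prefix_columns(table: str, row: dict) -> dict:
    # Rename each column to "<table>.<column>"; the "." separator is an assumption.
    return {f"{table}.{column}": value for column, value in row.items()}


def join_rows(*tables) -> dict:
    """Merge one row from each (table_name, row) pair, raising on a duplicate column name."""
    merged = {}
    for table, row in tables:
        for column, value in prefix_columns(table, row).items():
            if column in merged:
                raise KeyError(f"duplicate column name: {column}")
            merged[column] = value
    return merged


# Both tables have a dba_src_ID column; once prefixed, they no longer collide.
print(join_rows(("observationContributor", {"dba_src_ID": 1}),
                ("party", {"dba_src_ID": 7})))
```

Without the prefix, both rows would write to the same `dba_src_ID` key. That is the same kind of collision the renaming prevents in the SQL join.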
bugfix: inputs/CVS/plantConcept_/map.csv: PLANTCONCEPT_ID: remapped without * prefix so that the USING join in inputs/CVS/taxonObservation_/create.sql would continue to work
inputs/CVS/taxonObservation_/header.csv, map.csv: updated to use plantConcept_ renamed columns
inputs/CVS/: switched to new-style import, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource
inputs/CVS/taxonObservation_/map.csv: updated for CVS refresh
inputs/CVS/taxonObservation_/map.csv: updated input column names to plantConcept_ renamings
inputs/CVS/plantConcept_/header.csv, map.csv: updated for CVS refresh
fix: inputs/CVS/plot_/map.csv: removed filter-less collisions. note that the name county_ is assigned in plot_/create.sql, not cvs.~.clean_up.sql as one might expect, because this is a generated column.
fix: inputs/CVS/plot_/map.csv: removed filter-less collisions
fix: inputs/CVS/taxonObservation_/map.csv: moved inherited derived columns to right after the other columns, because for this table, these are actually real input columns rather than appended derived columns. the column order must match header.csv to avoid mis-renamings.
inputs/CVS/taxonObservation_/map.csv: removed filter functions, which are now performed in plantConcept_
inputs/CVS/taxonObservation_/postprocess.sql: added _parent index to facilitate joins
fix: inputs/CVS/taxonObservation_/header.csv, map.csv: updated for CVS refresh and addition of plantConcept_ derived columns
inputs/CVS/stemCount_/: translated filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#1-Translate-filters-to-postprocessing-derived-columns. note that the inserted row count changes, because there is now a primary key (which the table is auto-sorted by) where previously there was none.
inputs/CVS/plot_/: translated column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#1-Translate-filters-to-postprocessing-derived-columns
inputs/CVS/plot_/postprocess.sql: added pkey from the primary joined table
inputs/CVS/plot_/map.csv: documented assumptions about the units of fields
inputs/CVS/plot_/map.csv: documented assumptions about the units and meaning of numeric codes for fields
inputs/CVS/plantConcept_/: translated multi-column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#1-Translate-filters-to-postprocessing-derived-columns
inputs/CVS/plantConcept_/postprocess.sql: added pkey from the primary joined table
inputs/CVS/observation_/postprocess.sql: added pkey from the primary joined table. added _parent index to facilitate joins.
bugfix: inputs/CVS/observation_/create.sql: only include one soilObs for each observation (using DISTINCT ON), rather than just left-joining them
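A rough Python model of the problem this fixes, assuming an observation can have several soilObs rows. A plain left join repeats the observation once per soilObs, whereas collapsing soilObs with a DISTINCT ON-style step first keeps one row per observation. Which soilObs survives depends on the ORDER BY. Ordering by the lowest `soilObs_ID` is an assumption here, and so are the column names:

```python
from itertools import groupby
from operator import itemgetter


def left_join(left, right, key):
    """Naive LEFT JOIN: one output row per matching right row, or one unmatched row."""
    out = []
    for left_row in left:
        matches = [right_row for right_row in right if right_row[key] == left_row[key]]
        if not matches:
            out.append(dict(left_row))
        for right_row in matches:
            out.append({**left_row, **right_row})
    return out


def distinct_on(rows, key, order_by):
    """Emulate SELECT DISTINCT ON (key) ... ORDER BY key, order_by:
    sort, then keep only the first row of each key group."""
    ordered = sorted(rows, key=itemgetter(key, order_by))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


observations = [{"observation_ID": 1}, {"observation_ID": 2}]
soil_obs = [
    {"observation_ID": 1, "soilObs_ID": 10},
    {"observation_ID": 1, "soilObs_ID": 11},
    {"observation_ID": 2, "soilObs_ID": 20},
]

naive = left_join(observations, soil_obs, "observation_ID")
fixed = left_join(observations,
                  distinct_on(soil_obs, "observation_ID", "soilObs_ID"),
                  "observation_ID")
print(len(naive), len(fixed))  # 3 2
```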
bugfix: inputs/CVS/stemCount_/map.csv: ensure the aggregateoccurrence.sourceaccessioncode is always populated, because this is a required field when using sourceaccessioncodes. without it, the import will exclude rows which lack a value in this field because it cannot deduplicate on it for these rows, leading to the dropping of large numbers of occurrences. this shows up when comparing provider_count to the input table's row count, and produces the following error in the .errors table:...
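A hypothetical model of the row loss described above. Deduplication keys on the source accession code, so a row with no code cannot be matched and is dropped. The `dedup_on` and `ensure_code` helpers are illustrative only. Filling the code from `stemCount_ID` is an assumed fallback, not necessarily what map.csv does:

```python
def dedup_on(rows, key):
    """Hypothetical model of deduplicating on a source accession code:
    a row whose code is missing cannot be matched, so it is dropped."""
    kept = {}
    for row in rows:
        code = row.get(key)
        if code is None:
            continue  # no key to deduplicate on: the row is silently excluded
        kept.setdefault(code, row)
    return list(kept.values())


def ensure_code(row, key, fallback_column):
    # Hypothetical fix: fill a missing code from another always-populated column.
    return {**row, key: row.get(key) or str(row[fallback_column])}


rows = [
    {"sourceaccessioncode": "A1", "stemCount_ID": 1},
    {"sourceaccessioncode": None, "stemCount_ID": 2},
    {"sourceaccessioncode": None, "stemCount_ID": 3},
]
fixed = [ensure_code(r, "sourceaccessioncode", "stemCount_ID") for r in rows]

# Comparing the imported count against the input row count exposes the loss.
print(len(rows), len(dedup_on(rows, "sourceaccessioncode")))   # 3 1
print(len(rows), len(dedup_on(fixed, "sourceaccessioncode")))  # 3 3
```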
copyright scrub: inputs/: removed data provider-owned schema and documentation files, which are not BIEN copyright and should not be part of what is submitted for open-sourcing. these files will remain accessible via the web interface (fs.vegpath.org), but will not be in the repository.
inputs/CVS/run: `make .../reinstall`: documented vegbiendev runtime (45 min)
removed inputs/CVS/cvs-archive-2012-12-04.schema.sql, which has been replaced by cvs-eep-archive-2013-10-22-VegBIEN.schema.sql
added inputs/CVS/_src/cvs-eep-archive-2013-10-22-VegBIEN.zip.url
added inputs/CVS/cvs-eep-archive-2013-10-22-VegBIEN.schema.sql
inputs/CVS/run: documented `make .../reinstall` runtime (25 min)
added inputs/CVS/_src/cvs-eep-archive-2013-10-22-VegBIEN.schema.sql
added inputs/CVS/_src/cvs-eep-archive-2013-10-22-VegBIEN.schema.sql.run, which makes the SQL suitable for PostgreSQL
mappings/VegCore-VegBIEN.csv: mapped taxon_determination__is_current, taxon_determination__is_original
bugfix: mappings/VegCore-VegBIEN.csv: main taxondetermination: use [!isoriginal=true] instead of [!isoriginal] so that adding a manual isoriginal field does not prevent this selector from matching
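The difference between the two selectors can be modeled with two predicates. This reflects our reading only: it treats a taxondetermination as a dict of string fields and assumes `[!isoriginal]` means "no isoriginal field" while `[!isoriginal=true]` means "isoriginal is not true". The exact selector semantics in bin/map may differ:

```python
def matches_absent(node: dict) -> bool:
    # Assumed meaning of [!isoriginal]: match only when no isoriginal field exists.
    return "isoriginal" not in node


def matches_not_true(node: dict) -> bool:
    # Assumed meaning of [!isoriginal=true]: match unless isoriginal is exactly "true".
    return node.get("isoriginal") != "true"


cases = {
    "no field": {},
    "manual isoriginal=false": {"isoriginal": "false"},
    "original determination": {"isoriginal": "true"},
}
for label, node in cases.items():
    print(f"{label}: [!isoriginal]={matches_absent(node)}, "
          f"[!isoriginal=true]={matches_not_true(node)}")
```

Under these assumptions, only the `=true` form still matches the main determination once a manual `isoriginal=false` field is added.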
mappings/VegCore.htm: regenerated from wiki. added taxon_determination__is_current, taxon_determination__is_original.
inputs/CVS/_src/: added refresh from Mike Lee
fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
inputs/CVS/plot_/map.csv: realLatitude, realLongitude: remapped to UNUSED because these columns are actually empty
inputs/CVS/taxonObservation_/map.csv: collector_ID: remapped it to UNUSED and removed the join to party via it, like in VegBank
inputs/CVS/: deleted stemLocation_, because the CVS stemLocation table is empty (unlike VegBank)
inputs/CVS/import_order.txt: added plantConcept_/ so it would get automapped after switching to new-style import
inputs/CVS/taxonObservation_/map.csv: denorm_{tri,quad}*: mapped to infraspecificRank*, infraspecificEpithet*
inputs/CVS/taxonObservation_/map.csv: infraspecific ranks: remapped to EQUIV#to:species (which is the speciesBinomial), because these actually contain the full taxonomic name at that rank, like VegBank
inputs/CVS/taxonObservation_/map.csv: genus: documented that, unlike in VegBank, it does not include the genus author
inputs/CVS/taxonObservation_/map.csv: denorm_* terms _alt-ed with normalized terms: use DUPLICATE#of instead where possible. documented where and why _alt was necessary (this applies to a few rows for division, genus).
bugfix: inputs/CVS/taxonObservation_/map.csv: species: remapped to speciesBinomial, not specificEpithet (like for VegBank). however, note that denorm_species is in fact the epithet, unlike VegBank.
fix: inputs/CVS/taxonObservation_/postprocess.sql: removed {} around denorm_genus to match the normalized genus
inputs/CVS/taxonObservation_/map.csv: removed unnecessary alts for terms that don't have a duplicate denorm* or hierarchical field
fix: inputs/CVS/taxonObservation_/postprocess.sql: fix 1 row that has denorm_kingdom != Kingdom (i.e. both NOT NULL but not the same)
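The condition this fix targets (both NOT NULL but not the same) can be checked before and after postprocessing. The sketch below expresses it in Python over illustrative rows. It is a hypothetical check, not the actual postprocess.sql:

```python
def conflicting_rows(rows, denorm_col="denorm_kingdom", norm_col="Kingdom"):
    """Return rows where both columns are populated (NOT NULL) but disagree."""
    return [
        row for row in rows
        if row.get(denorm_col) is not None
        and row.get(norm_col) is not None
        and row[denorm_col] != row[norm_col]
    ]


rows = [
    {"denorm_kingdom": "Plantae", "Kingdom": "Plantae"},  # agree
    {"denorm_kingdom": None, "Kingdom": "Plantae"},       # one side NULL: not a conflict
    {"denorm_kingdom": "Fungi", "Kingdom": "Plantae"},    # conflict
]
print(len(conflicting_rows(rows)))  # 1
```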
bugfix: inputs/CVS/plot_/create.sql: like for VegBank, need to compare place.*PLOT_ID*, not PLOTPLACE_ID, with plot.PLOT_ID
bugfix: mappings/VegCore-VegBIEN.csv: nest all taxonoccurrences inside a stratum event, so that the parent locationevent is always fully populated before child locationevents point to it. (previously, a stub parent event was created when the child event was imported first, which blocked the fully-populated parent event from being inserted later on.) this uses auto-folding (for VegBank/CVS) and auto-forwarding (for other datasources) to prune empty stratum events for taxonoccurrences that don't have strata. (see wiki.vegpath.org/Auto-folding, wiki.vegpath.org/Auto-forwarding for more info about these normalization techniques.) note that the inserted row counts stay exactly the same for all datasources except VegBank (which was being fixed), indicating that this significant change to the mappings did not change the semantics of the import of taxonoccurrences.
bugfix: mappings/VegCore-VegBIEN.csv: stratum's locationevent: link this to the parent locationevent, so that the parent locationevent's information (such as locationeventcontributors) is accessible to the stratum's locationevent
inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with a .new suffix
inputs/CVS/stemLocation_/test.xml.ref: set inserted row count back. it had changed because $version was still set in the environment, and this was causing a non-empty public schema to be used as the testing schema.
inputs/CVS/stemLocation_/test.xml.ref: updated inserted row count