bugfix: inputs/SALVIAS/party_code_party_/create.sql: need to remove duplicate entries in party_code_party
inputs/SALVIAS/party_code_party_/map.csv: mapped fullname->event_participant_name for use by other tables
mapped inputs/SALVIAS/party_code_party_/
inputs/SALVIAS/_MySQL/salvias_plots.*.sql: refreshed. this adds the party and party_code_party tables Brad provided for mapping the plot contributors.
fix: inputs/SALVIAS/salvias_plots.~.clean_up.sql: Delete rows that do not satisfy foreign key constraints: also need to do this for plotObservations, since the refreshed data contains dangling rows for that as well
inputs/SALVIAS/run_: documented *.sql install runtime (3 min), as separate from the full `datasrc_make reinstall` runtime (3.5 min)
inputs/SALVIAS/run_: refresh(): `datasrc_make reinstall`: updated runtime. documented that runtimes are from starscream.
added inputs/SALVIAS/run_, which includes a refresh() target
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
bugfix: inputs/.TNRS/schema.sql: scrubbed_family: Name_matched_accepted_family was missing from the TNRS results at one point, so we are now using Family_matched as a workaround to populate this. the workaround is for accepted names only, because no-opinion names do not have an Accepted_name_family to prepend to the scrubbed name to parse.
inputs/.TNRS/schema.sql: reexported from live DB, which changes the element order
inputs/VegBank/import_order.txt: added projectcontributor_
inputs/VegBank/projectcontributor_/map.csv, postprocess.sql: added project_participant
added inputs/VegBank/projectcontributor_/
inputs/VegBank/vegbank.~.clean_up.sql: projectcontributor.surname: prepend table name to avoid join collisions
inputs/VegBank/vegbank.~.clean_up.sql, inputs/CVS/cvs.~.clean_up.sql: Prevent "column name specified more than once" errors when tables are joined: put tables in alphabetical order for consistency
inputs/publishable datasources.xlsx: updated
inputs/datasource_release_status.xlsx: renamed to `publishable datasources.xlsx` to match the spreadsheet title
inputs/VegBank/^taxon_observation.**.sample/create.sql, map.csv: added new project columns
inputs/VegBank/taxon_observation.**/postprocess.sql: added the project table
mapped inputs/VegBank/project/, which includes the projectName for attribution
inputs/CVS/^taxon_observation.**.sample/create.sql, map.csv: added new project columns
inputs/CVS/taxon_observation.**/postprocess.sql: added the project table
inputs/CVS/project/map.csv: mapped stopDate->projectEndDate
mapped inputs/CVS/project/, which includes the projectName for attribution
inputs/VegBIEN/Redmine/svn/.htaccess: updated to use the much faster direct repository URL rather than the Redmine web interface, now that the repository itself is publicly accessible in addition to the Redmine view of it
fix: inputs/TEX/Specimen*/map.csv, postprocess.sql: habitat: also placed in occurrenceRemarks so that this field gets parsed for growth form information, as requested by Brad (wiki.vegpath.org/TEX_validation#2013-2-26)
fix: inputs/TEX/Specimen*/map.csv: mapped constant values for specimenHolderInstitutions, country. these have to be added with `rm=1 ./inputs/TEX/Specimen.../run postprocess`.
bugfix: inputs/TEX/Specimen2/map.csv: mapped BARCODE to accessionNumber so that we have a unique ID for each row
inputs/datasource_release_status.xlsx: updated
inputs/CVS/^taxon_observation.**.sample/create.sql: added Mike Lee's additional plots used to validate confidentiality-related fields (wiki.vegpath.org/CVS_validation#plots-to-include)
bugfix: inputs/CVS/^taxon_observation.**.sample/create.sql: include taxonName in the subset of columns that's imported for the validation, because it is _alt-ed with scientificName for forming the TNRS input name. this is unique to CVS, which is why it was not part of the validation subset copied from the VegBank subset.
bugfix: inputs/.TNRS/schema.sql: granted bien_read SELECT access to derived views as well as the core tnrs table
updated inputs/datasource_release_status.xlsx
added inputs/datasource_release_status.xlsx, export of Google spreadsheet at https://docs.google.com/spreadsheet/ccc?key=0ArZXrTAXd-TYdDRRb2RxYi11TWZrQVh5bVdKOURCeFE
fix: inputs/CVS/^taxon_observation.**.sample/: added _no_import because this table duplicates part of what's imported from taxon_observation.**
bugfix: inputs/VegBank/plot/: added _no_import because this table is left-joined and should not be imported separately
bugfix: inputs/{.NCBI,CTFS}/*.src/: added _no_import because these tables are left-joined and should not be imported separately
inputs/import.stats.xls: removed table names from datasources where only one table is imported
fix: inputs/import.stats.xls: removed deleted tables from current import
inputs/import.stats.xls: updated import times
inputs/GBIF/raw_occurrence_record_plants/map.csv: row_num: remapped to plain *row_num, like the other datasources that have this field
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: rerun time: noted that this is only fast after manual vacuuming of the table (to remove the deleted rows from the index). autovacuum apparently does not run, although it should.
inputs/GBIF/raw_occurrence_record_plants/test.xml.ref: reran test, which added yearCollected/monthCollected/dayCollected
inputs/CVS/plantConcept_/create.sql: documented runtime (3 min)
inputs/CTFS/*.src/: added test.xml.ref
inputs/CTFS/*.src/: added VegBIEN.csv
bugfix: inputs/CTFS/TaxonOccurrence*/map.csv: things mapped to taxonObservationID: remapped to taxonOccurrenceID since taxonObservationID is not mapped to anything in VegBIEN (denormalized VegCore doesn't distinguish between taxon occurrences and taxon observations of them)
bugfix: inputs/ARIZ/~.clean_up.sql: prevent "column already exists" errors when there is an input column of the same name as an output column
inputs/.geoscrub/import_order.txt: added county_centroids so that it would be installed by new-style import
inputs/FIA/TREE/run: documented import() runtime (1.5 h), which includes table cleanup runtime (1 h)
inputs/GBIF/raw_occurrence_record_plants/run: updated import() runtime (same), documented table cleanup runtime (1.5 h)
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: CREATE INDEX ... specimenHolderInstitutions: documented runtime (45 min)
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: documented runtime (3.5 min)
bugfix: inputs/CTFS/import_order.txt: added *.src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the columns to be renamed to VegCore names in */create.sql. note that VegCore taxonOccurrenceID has been renamed to taxonObservationID since this was last run.
inputs/.geoscrub/run: documented import() runtime (20 min)
bugfix: inputs/.NCBI/import_order.txt: added nodes.src, names.src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the columns to be renamed to VegCore names in nodes/create.sql.
bugfix: inputs/input.Makefile: install: for new-style datasources, use the associated runscript instead (the old-style install target will not do everything that's needed for a new-style datasource)
inputs/FIA/COND/postprocess.sql: filtering formula: documented that this was created by Brad, and provided the URL to it on nimoy
inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible
inputs/CVS/^taxon_observation.**.sample/create.sql: uncommented identifiedBy since this is now part of taxonObservation_
fix: inputs/CVS/observation_community/create.sql: communityName: populate from commConcept.commName instead, because commInterpretation.commname is not always populated. this requires left-joining to commConcept.
inputs/CVS/observation_community/map.csv: updated output column names to new input column names, to avoid later output column collisions
inputs/CVS/observation_community/header.csv, map.csv: updated input column names for cvs.~.clean_up.sql renamings
inputs/CVS/cvs.~.clean_up.sql: commClass, commConcept fields: prepend table name to avoid inter-table collisions upon join
added inputs/CVS/observation_community/, as for VegBank
inputs/CVS/cvs.~.clean_up.sql: commClass.dba_src_ID: prepend table name to avoid inter-table collisions upon join
added inputs/CVS/observationContributor_/, which adds the people collecting the plot
inputs/CVS/cvs.~.clean_up.sql: observationContributor.dba_src_ID: prepended table name to avoid collision when left-joining to party
bugfix: inputs/input.Makefile: %/header.csv: errexit the command so that errors won't scroll by, which in this case requires `set -o pipefail`
fix: inputs/CVS/taxonObservation_/create.sql: mapped identifiedBy, which involves joining to party
inputs/CVS/cvs.~.clean_up.sql: don't rename taxonInterpretation.PARTY_ID, so that this can be USING-joined to party in inputs/CVS/taxonObservation_/create.sql
inputs/CVS/^taxon_observation.**.sample/map.csv: synced output columns to input columns (which removes the extra *s)
fix: inputs/CVS/plot_/postprocess.sql: locality: include the site name (authorLocation), because this is part of the unique specification of the place that was sampled, and Bob wants this to be included in VegBIEN
inputs/CVS/^taxon_observation.**.sample/create.sql: removed parentLocationID, since this is unused in CVS
bugfix: inputs/input.Makefile: `%/install: %/create.sql`: errexit the command so that errors won't scroll by, which in this case requires `set -o pipefail`
inputs/VegBank/plot/postprocess.sql: locality: include the site name (authorlocation), because this is part of the unique specification of the place that was sampled
fix: inputs/CVS/taxon_observation.**/map.csv: omit authorPlantName because it is not specific to the taxonInterpretation row (this is in a separate taxonInterpretation for the original determination instead)
fix: inputs/CVS/plot_/map.csv: PARENT_ID: remapped to UNUSED, to clarify that subplots are not implemented through this field
inputs/input.Makefile: scrub: clarified that using & (background process) also ignores TNRS errors (the primary purpose of &, of course, is to run asynchronously)
inputs/.geoscrub/geoscrub_output/run: import() runtime: added starscream runtime (20 min)
inputs/.geoscrub/geoscrub_output/run: documented import() runtime (15 min)
inputs/.geoscrub/Source/map.csv: source__modified_date: updated for current run
**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: updated upload time (30 s)
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: export_(): updated runtime (25 s)
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: make(): derived/biengeo/geoscrub.sh: documented runtime (2.5 h)
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: don't connect to DB as the root user, because this is not needed now that the geoscrub schema is owned by the bien user. this avoids a sudo password prompt at the end of the geoscrubbing run.
bugfix: inputs/input.Makefile: $(import): except in a full-database import, errexit so that the import will stop on an error and not let it scroll by
added inputs/CVS/^taxon_observation.**.sample/, used for the extract. note that the column list is slightly different from the one for VegBank.
inputs/CVS/taxonObservation_/map.csv: removed taxonObservation_-- prefix from terms that do not need to be table-specific (like for VegBank)
fix: inputs/CVS/taxonObservation_/map.csv: plantConcept_ columns: synced input and output column names to their names in plantConcept_
inputs/CVS/plantConcept_/map.csv: removed plantConcept_-- prefix from terms that do not need to be table-specific (like for VegBank)
bugfix: inputs/CVS/import_order.txt: added taxon_observation.**
inputs/CVS/: don't import joined tables, because they are now imported in the taxon_observation.** left-join instead
inputs/CVS/: added taxon_observation.** left-join of the tables, using the steps at http://wiki.vegpath.org/Left-joining_a_datasource. this involves renaming taxonOccurrenceID->taxonOccurrenceID__overall_plot so that it can then be joined together with aggregateOrganismObservationID to create the full taxonOccurrenceID (as in VegBank).
inputs/CVS/stemCount_/map.csv: remapped stratum_ID->*STRATUM_ID so it would match up with stratum.*STRATUM_ID