inputs/VegBank/taxon_observation.**/test.xml.ref: updated inserted row count
inputs/input.Makefile: add!: verify/: also svn:ignore *.tsv, *.txt
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
inputs/VegBank/import_order.txt: added projectcontributor_
inputs/VegBank/projectcontributor_/map.csv, postprocess.sql: added project_participant
added inputs/VegBank/projectcontributor_/
inputs/VegBank/vegbank.~.clean_up.sql: projectcontributor.surname: prepend table name to avoid join collisions
inputs/VegBank/vegbank.~.clean_up.sql, inputs/CVS/cvs.~.clean_up.sql: Prevent "column name specified more than once" errors when tables are joined: put tables in alphabetical order for consistency
inputs/VegBank/^taxon_observation.**.sample/create.sql, map.csv: added new project columns
inputs/VegBank/taxon_observation.**/postprocess.sql: added the project table
mapped inputs/VegBank/project/, which includes the projectName for attribution
bugfix: inputs/VegBank/plot/: added _no_import because this table is left-joined and should not be imported separately
inputs/VegBank/plot/postprocess.sql: locality: include the site name (authorlocation), because this is part of the unique specification of the place that was sampled
**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)
copyright scrub: inputs/: removed data provider-owned schema and documentation files, which are not BIEN copyright and should not be part of what is submitted for open-sourcing. these files will remain accessible via the web interface (fs.vegpath.org), but will not be in the repository.
inputs/VegBank/stemlocation_/header.csv: updated from reinstalling stemlocation_
inputs/VegBank/^taxon_observation.**.sample/test.xml.ref: updated inserted row count, now that CVS plots have been removed
bugfix: inputs/VegBank/: need to remove inter-datasource duplicates from plot instead of the left-joined plot_ table, because the fkeys needed to do the cascading deletes are all to the plot table. this requires doing the column-renaming and postprocessing on plot before it's left-joined.
inputs/VegBank/plot_/create.sql: updated runtime (5 s) for previous bugfix
bugfix: inputs/VegBank/import_order.txt: updated name of ^taxon_observation.**.sample table
fix: inputs/VegBank/^taxon_observation.**.sample/create.sql: moved continent before country
inputs/VegBank/^taxon_observation.**.sample/create.sql: added missing columns that were recently mapped to VegBIEN (identifiedBy)
inputs/VegBank/^taxon_observation.**.sample/create.sql: synced column order to analytical_plot
inputs/VegBank/taxonobservation_/map.csv, postprocess.sql: mapped identifiedBy (the join_words() of identifiedBy_first, etc.)
inputs/VegBank/taxonobservation_/create.sql: also join party_id to get the identifiedBy (not mapped yet). note that the inserted row count changes, because taxonobservation_ does not yet have a pkey to do a stable ordering with.
inputs/VegBank/vegbank.~.clean_up.sql: taxoninterpretation.party_id: don't rename to taxoninterpretation_party_id, so that this can be used directly in taxonobservation_/create.sql with a USING join
inputs/VegBank/taxonobservation_/create.sql: join taxonobservation to taxoninterpretation (as in CVS) instead of vice versa, since taxonobservation is the primary, operative table. having VegBank and CVS do things the same way helps ensure that fixes in one can transfer easily to the other.
inputs/VegBank/^taxon_observation.**.sample/create.sql: synced with taxon_observation.**
(for r11396) fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
mappings/VegCore-VegBIEN.csv: mapped taxon_determination__is_current, taxon_determination__is_original
inputs/VegBank/taxonobservation_/map.csv: originalinterpretation, currentinterpretation: removed table name prefix so these would automap
bugfix: inputs/VegBank/plot_/postprocess.sql: coordinateUncertaintyInMeters__from_fuzzing: need to convert km to m in the fuzzing radii. updated derived cols runtimes.
inputs/VegBank/plot_/postprocess.sql: remove duplicated CVS plots (2323 of 7079 CVS plots are removed by this)
fix: inputs/VegBank/taxonobservation_/map.csv: remapped authorplantname to OMIT because these are not specific to the taxoninterpretation row (this is in a separate taxoninterpretation for the original determination instead). see wiki.vegpath.org/Spot-checking#2013-10-10 > Mike Lee's conference call feedback.
fix: inputs/VegBank/taxonobservation_/map.csv: remapped int_* to OMIT because these are not specific to the taxoninterpretation row (this is in a separate taxoninterpretation for the original determination instead). see wiki.vegpath.org/Spot-checking#2013-10-10 > Mike Lee's conference call feedback.
fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
inputs/VegBank/plot_/create.sql: documented runtime (5 min)
inputs/VegBank/verify/input_cols.txt, inputs/VegBank/+taxon_observation.**.sample/create.sql: updated to match taxon_observation.** columns
bugfix: inputs/VegBank/+taxon_observation.**.sample/: renamed to ^taxon_observation.**.sample because a leading + has a special meaning to bash (it indicates a shell option, and you will get an error "invalid option name"), as well as to make (it indicates that a recipe command invokes make recursively)
bugfix: inputs/VegBank/taxon_observation.**/header.csv: updated for observation_/map.csv bugfix, which added new hasobservationsynonym field. this fixes a strange test bug caused by the taxon_observation.**/map.csv column list being mismatched/misaligned with what was in the underlying tables. (column mismatches will often cause unexplainable errors in unrelated sections of code the same way that buffer overflows do in C++.)
bugfix: inputs/VegBank/taxon_observation.**.sample/: renamed to +taxon_observation.**.sample so that the -expansion of taxon_observation.* doesn't add taxon_observation.**.sample (which causes it to attempt to install taxon_observation.**.sample before taxon_observation.** is installed)
bugfix: inputs/VegBank/observation_/header.csv, map.csv: updated for refresh, which inserts hasobservationsynonym at the end of the observation table
inputs/VegBank/taxon_observation.**.sample/create.sql: reordered columns in the same order as analytical_plot, for easier validation
inputs/VegBank/taxon_observation.**.sample/create.sql: include only the subset of columns that is imported to VegBIEN
inputs/VegBank/taxon_observation.**.sample/test.xml.ref: updated inserted row count (which was most likely generated before the output column names had been set to the input column names)
added inputs/VegBank/verify/input_cols.include.txt, with runscript to generate it
inputs/VegBank/verify/input_cols.unmapped.txt*: renamed to input_cols.exclude.txt* because this now includes mapped columns as well
inputs/VegBank/verify/input_cols.unmapped.txt.run: remove unmapped join columns, since these would be included in the extract
inputs/VegBank/verify/input_cols.unmapped.txt.run: take input directly from input_cols.txt to avoid needing to first copy and paste it into input_cols.unmapped.txt
inputs/VegBank/verify/input_cols.unmapped.txt.run: added back deliberately excluded columns (DUPLICATE#of:..., etc.) so that the # of rows in the file can be subtracted from the total # of columns to get the # of input columns that would be included in the extract
added inputs/VegBank/verify/input_cols.txt, input_cols.unmapped.txt (with runscript to filter input_cols.unmapped.txt)
inputs/VegBank/stratum/postprocess.sql: added pkey
inputs/VegBank/taxonobservation_/postprocess.sql: added __parent index on locationID to facilitate the LEFT JOINs used to create the validation input
inputs/VegBank/observation_/postprocess.sql: added __parent index on locationID to facilitate the LEFT JOINs used to create the validation input
inputs/VegBank/import_order.txt: added taxon_observation.**.sample so it will automatically be kept up to date
inputs/VegBank/taxon_observation.**.sample/create.sql: set runtime (1 s)
inputs/VegBank/: added taxon_observation.**.sample subset of plots to use in the validation. this avoids the need to import all of VegBank just to validate a few of the plots.
inputs/VegBank/taxon_observation.**/: updated for data refresh
inputs/VegBank/plantconcept_/: mapped columns, since this is now included in import_order.txt and therefore gets processed by the column-renaming runscripts. note that this means that in taxonobservation_/map.csv, the plantconcept_ input column names need to be changed to what they are mapped to.
inputs/VegBank/taxonobservation_/create.sql: updated runtime (20 s)
inputs/VegBank/plantconcept_/create.sql: documented runtime (21 min)
bugfix: inputs/VegBank/plantconcept_/: added new-style import files
bugfix: inputs/VegBank/import_order.txt: added plantconcept_, because new-style import needs it to be explicitly listed in import_order.txt in order to run it
inputs/VegBank/run: refresh(): added usage
inputs/VegBank/run: refresh(): documented that this should be run on vegbiendev
inputs/VegBank/_archive/2012-8-30/: svn:ignore the data exports
inputs/VegBank/run: added refresh() target
inputs/VegBank/: refreshed VegBank so that all of Mike Lee's sample plots would be included in the input data. (VegBank was last refreshed from the live DB on 2012-8-30.) split vegbank.sql into vegbank.schema.sql and vegbank.data.sql so that the schema can be examined and imported separately, like for MySQL datasources. inputs/VegBank/vegbank.~.clean_up.sql: commented out setting comminterpretation.commname to NOT NULL, because after the refresh it is now NULL in 10 rows, where commconcept_id is also NULL.
inputs/VegBank/: added observation__community/
inputs/VegBank/vegbank.~.clean_up.sql: commclass.commcode,commname: rename to prevent collisions
inputs/VegBank/vegbank.~.clean_up.sql: indicate required column comminterpretation.commname
inputs/VegBank/vegbank.~.clean_up.sql: commconcept.commname: rename to prevent collision with commname.commname
bugfix: mappings/VegCore-VegBIEN.csv: nest all taxonoccurrences inside a stratum event, so that the parent locationevent is always fully populated before child locationevents point to it. (previously, a stub parent event was created when the child event was imported first, which blocked the fully-populated parent event from being inserted later on.) this uses auto-folding (for VegBank/CVS) and auto-forwarding (for other datasources) to prune empty stratum events for taxonoccurrences that don't have strata. (see wiki.vegpath.org/Auto-folding, wiki.vegpath.org/Auto-forwarding for more info about these normalization techniques.) note that the inserted row counts stay exactly the same for all datasources except VegBank (which was being fixed), indicating that this signficant change to the mappings did not change the semantics of the import of taxonoccurrences.
inputs/VegBank/observationcontributor_/test.xml.ref: updated inserted row count
bugfix: mappings/VegCore-VegBIEN.csv: stratum's locationevent: link this to the parent locationevent, so that the parent locationevent's information (such as locationeventcontributors) is accessible to the stratum's locationevent
bugfix: inputs/VegBank/taxon_observation.**/postprocess.sql: inlined _join() so that taxon_observation.** wouldn't get cascadingly deleted whenever the util schema (where this normally resides) gets reinstalled
added inputs/VegBank/observationcontributor_/
bugfix: inputs/VegBank/: taxonOccurrenceID: include the aggregateOrganismObservationID in this so that there is one taxonoccurrence for each stratum's taxonImportance. this allows the different strata to have separate taxonoccurrences that are associated with the stratum-specific locationevents, rather than all being lumped into one taxonoccurrence, with only one locationevent.
mappings/VegCore-VegBIEN.csv: mapped stratum__name
mappings/VegCore.htm: regenerated from wiki. added Stratum table.
bugfix: inputs/VegBank/import_order.txt: added stratum
inputs/VegBank/taxon_observation.**/postprocess.sql: added stratum, stratumtype to the left-join
inputs/VegBank/stemcount_/map.csv: stratum_id: removed table prefix so it can be used as a join column
inputs/VegBank/: mapped stratum
inputs/VegBank/: mapped stratumtype
inputs/VegBank/taxonobservation_/map.csv: taxonomic ranks not in VegCore: removed table prefix so they will be automapped (they are globally unique)
inputs//: don't import joined tables, because they are now imported in the taxon_observation.** left-join instead
inputs/VegBank/taxon_observation.**/postprocess.sql: run mk_subset_by_row_num_func() to add a subset function that uses sort_col. this is used by column-based import, and also provides a common subsetting/sorting API for all the left-joined views. test.xml.ref: the inserted row count most likely changes because the sort order changes.
bugfix: inputs/VegBank/stemlocation_/map.csv: remapped stemcount-related fields to OMIT, so that these don't collide with fields of the same name in stemcount_ when they are left-joined together in taxon_observation.** . having the same name causes these to be incorrectly interpreted as shared fkey columns in the NATURAL JOIN (and without the NATURAL JOIN, they would instead be collision errors).
bugfix: inputs/VegBank/stemlocation_/postprocess.sql: added missing index on aggregateOrganismObservationID, needed for the 1:many portion of the taxon_observation.** left-join
inputs/VegBank/stemcount_/postprocess.sql: moved stemcount___parent index before the derived columns section because it does not depend on them
bugfix: inputs/VegBank/stemcount_/postprocess.sql: added missing index on taxonOccurrenceID, needed for the 1:many portion of the taxon_observation.** left-join
bugfix: inputs/VegBank/taxon_observation.**/postprocess.sql: added sort_col (=identificationID) at beginning because column-based import will always sort a view by the first column, which may lead to slow query plans if the first column is not a joined table's pkey
inputs/VegBank/taxon_observation.**/postprocess.sql: documented that there is no row_num because left-join to stemcount_, stemlocation_ adds rows to each taxonobservation_