/trunk/inputs/VegBank - Changes - BIEN 3 - NCEAS Projects

root/trunk/inputs/VegBank @ 12786

svn:ignore: *

#	Date	Author	Comment
12786	03/20/2014 10:40 PM	Aaron Marcuse-Kubitza	inputs/VegBank/run: documented `datasrc_make sql/install` runtime (25 min)
12783	03/20/2014 08:13 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plantconcept_/create.sql: updated runtime (25 min, ~same)
12779	03/20/2014 07:58 PM	Aaron Marcuse-Kubitza	*{.sh,run}: use new begin_target instead of `echo_func; set_make_vars`
12776	03/20/2014 07:47 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plot/postprocess.sql: remove institutions that we have direct data for: CVS: updated runtime (same)
12758	03/18/2014 05:47 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/plot/postprocess.sql: use CVS.plot_ instead because that has the renamed staging table columns, and is compatible with auto-renaming of the SQL script columns
12753	03/18/2014 05:10 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**/test.xml.ref: updated inserted row count
12018	02/02/2014 12:49 AM	Aaron Marcuse-Kubitza	inputs/input.Makefile: add!: verify/: also svn:ignore .tsv, .txt
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
11961	01/15/2014 10:18 AM	Aaron Marcuse-Kubitza	inputs/VegBank/import_order.txt: added projectcontributor_
11960	01/15/2014 10:11 AM	Aaron Marcuse-Kubitza	inputs/VegBank/projectcontributor_/map.csv, postprocess.sql: added project_participant
11957	01/15/2014 09:41 AM	Aaron Marcuse-Kubitza	added inputs/VegBank/projectcontributor_/
11956	01/15/2014 09:29 AM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.clean_up.sql: projectcontributor.surname: prepend table name to avoid join collisions
11955	01/15/2014 09:23 AM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.clean_up.sql, inputs/CVS/cvs.~.clean_up.sql: Prevent "column name specified more than once" errors when tables are joined: put tables in alphabetical order for consistency
11934	12/20/2013 04:41 PM	Aaron Marcuse-Kubitza	inputs/VegBank/^taxon_observation.**.sample/create.sql, map.csv: added new project columns
11933	12/20/2013 04:31 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**/postprocess.sql: added the project table
11932	12/20/2013 04:25 PM	Aaron Marcuse-Kubitza	mapped inputs/VegBank/project/, which includes the projectName for attribution
11904	12/11/2013 10:42 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/plot/: added _no_import because this table is left-joined and should not be imported separately
11801	12/03/2013 06:51 AM	Aaron Marcuse-Kubitza	inputs/VegBank/plot/postprocess.sql: locality: include the site name (authorlocation), because this is part of the unique specification of the place that was sampled
11788	11/26/2013 11:11 PM	Aaron Marcuse-Kubitza	**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)
11705	11/21/2013 12:24 AM	Aaron Marcuse-Kubitza	copyright scrub: inputs/: removed data provider-owned schema and documentation files, which are not BIEN copyright and should not be part of what is submitted for open-sourcing. these files will remain accessible via the web interface (fs.vegpath.org), but will not be in the repository.
11679	11/18/2013 04:27 AM	Aaron Marcuse-Kubitza	inputs/VegBank/stemlocation_/header.csv: updated from reinstalling stemlocation_
11604	11/09/2013 02:20 AM	Aaron Marcuse-Kubitza	inputs/VegBank/^taxon_observation.**.sample/test.xml.ref: updated inserted row count, now that CVS plots have been removed
11601	11/08/2013 10:28 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/: need to remove inter-datasource duplicates from plot instead of the left-joined plot_ table, because the fkeys needed to do the cascading deletes are all to the plot table. this requires doing the column-renaming and postprocessing on plot before it's left-joined.
11600	11/08/2013 09:57 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plot_/create.sql: updated runtime (5 s) for previous bugfix
11539	10/31/2013 07:51 AM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/import_order.txt: updated name of ^taxon_observation.**.sample table
11538	10/31/2013 07:16 AM	Aaron Marcuse-Kubitza	fix: inputs/VegBank/^taxon_observation.**.sample/create.sql: moved continent before country
11537	10/31/2013 06:54 AM	Aaron Marcuse-Kubitza	inputs/VegBank/^taxon_observation.**.sample/create.sql: added missing columns that were recently mapped to VegBIEN (identifiedBy)
11536	10/31/2013 06:52 AM	Aaron Marcuse-Kubitza	inputs/VegBank/^taxon_observation.**.sample/create.sql: synced column order to analytical_plot
11535	10/31/2013 06:49 AM	Aaron Marcuse-Kubitza	inputs/VegBank/^taxon_observation.**.sample/create.sql: synced column order to analytical_plot
11534	10/31/2013 06:47 AM	Aaron Marcuse-Kubitza	inputs/VegBank/taxonobservation_/map.csv, postprocess.sql: mapped identifiedBy (the join_words() of identifiedBy_first, etc.)
11524	10/31/2013 02:46 AM	Aaron Marcuse-Kubitza	inputs/VegBank/taxonobservation_/map.csv, postprocess.sql: mapped identifiedBy (the join_words() of identifiedBy_first, etc.)
11523	10/31/2013 02:34 AM	Aaron Marcuse-Kubitza	inputs/VegBank/taxonobservation_/create.sql: also join party_id to get the identifiedBy (not mapped yet). note that the inserted row count changes, because taxonobservation_ does not yet have a pkey to do a stable ordering with.
11521	10/31/2013 02:06 AM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.clean_up.sql: taxoninterpretation.party_id: don't rename to taxoninterpretation_party_id, so that this can be used directly in taxonobservation_/create.sql with a USING join
11520	10/31/2013 01:52 AM	Aaron Marcuse-Kubitza	inputs/VegBank/taxonobservation_/create.sql: join taxonobservation to taxoninterpretation (as in CVS) instead of vice versa, since taxonobservation is the primary, operative table. having VegBank and CVS do things the same way helps ensure that fixes in one can transfer easily to the other.
11518	10/31/2013 01:30 AM	Aaron Marcuse-Kubitza	inputs/VegBank/^taxon_observation.**.sample/create.sql: synced with taxon_observation.**
11517	10/31/2013 01:22 AM	Aaron Marcuse-Kubitza	(for r11396) fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
11514	10/30/2013 11:03 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: mapped taxon_determination__is_current, taxon_determination__is_original
11513	10/30/2013 09:49 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: mapped taxon_determination__is_current, taxon_determination__is_original
11511	10/30/2013 09:07 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxonobservation_/map.csv: originalinterpretation, currentinterpretation: removed table name prefix so these would automap
11488	10/30/2013 04:23 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/plot_/postprocess.sql: coordinateUncertaintyInMeters__from_fuzzing: need to convert km to m in the fuzzing radii. updated derived cols runtimes.
11487	10/30/2013 04:05 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plot_/postprocess.sql: remove duplicated CVS plots (2323 of 7079 CVS plots are removed by this)
11439	10/25/2013 09:24 AM	Aaron Marcuse-Kubitza	fix: inputs/VegBank/taxonobservation_/map.csv: remapped authorplantname to OMIT because these are not specific to the taxoninterpretation row (this is in a separate taxoninterpretation for the original determination instead). see wiki.vegpath.org/Spot-checking#2013-10-10 > Mike Lee's conference call feedback.
11438	10/25/2013 09:22 AM	Aaron Marcuse-Kubitza	fix: inputs/VegBank/taxonobservation_/map.csv: remapped int_* to OMIT because these are not specific to the taxoninterpretation row (this is in a separate taxoninterpretation for the original determination instead). see wiki.vegpath.org/Spot-checking#2013-10-10 > Mike Lee's conference call feedback.
11396	10/21/2013 07:14 PM	Aaron Marcuse-Kubitza	fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
11265	10/13/2013 12:10 AM	Aaron Marcuse-Kubitza	inputs/VegBank/plot_/create.sql: documented runtime (5 min)
11261	10/12/2013 04:20 PM	Aaron Marcuse-Kubitza	inputs/VegBank/verify/input_cols.txt, inputs/VegBank/+taxon_observation.**.sample/create.sql: updated to match taxon_observation.** columns
11260	10/12/2013 04:16 PM	Aaron Marcuse-Kubitza	inputs/VegBank/verify/input_cols.txt, inputs/VegBank/+taxon_observation.**.sample/create.sql: updated to match taxon_observation.** columns
11257	10/12/2013 03:05 PM	Aaron Marcuse-Kubitza	inputs/VegBank/verify/input_cols.txt, inputs/VegBank/+taxon_observation.**.sample/create.sql: updated to match taxon_observation.** columns
11256	10/12/2013 03:03 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/+taxon_observation.**.sample/: renamed to ^taxon_observation.**.sample because a leading + has a special meaning to bash (it indicates a shell option, and you will get an error "invalid option name"), as well as to make (it indicates that a recipe command invokes make recursively)
11255	10/12/2013 02:14 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/taxon_observation.**/header.csv: updated for observation_/map.csv bugfix, which added new hasobservationsynonym field. this fixes a strange test bug caused by the taxon_observation.**/map.csv column list being mismatched/misaligned with what was in the underlying tables. (column mismatches will often cause unexplainable errors in unrelated sections of code the same way that buffer overflows do in C++.)
11254	10/12/2013 02:01 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/taxon_observation.**.sample/: renamed to +taxon_observation.**.sample so that the *-expansion of taxon_observation.* doesn't add taxon_observation.**.sample (which causes it to attempt to install taxon_observation.**.sample before taxon_observation.** is installed)
11249	10/10/2013 06:50 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/observation_/header.csv, map.csv: updated for refresh, which inserts hasobservationsynonym at the end of the observation table
11248	10/10/2013 05:46 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**.sample/create.sql: reordered columns in the same order as analytical_plot, for easier validation
11244	10/10/2013 02:40 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**.sample/create.sql: include only the subset of columns that is imported to VegBIEN
11243	10/10/2013 02:32 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**.sample/test.xml.ref: updated inserted row count (which was most likely generated before the output column names had been set to the input column names)
11242	10/10/2013 01:55 PM	Aaron Marcuse-Kubitza	added inputs/VegBank/verify/input_cols.include.txt, with runscript to generate it
11241	10/10/2013 01:26 PM	Aaron Marcuse-Kubitza	inputs/VegBank/verify/input_cols.unmapped.txt: renamed to input_cols.exclude.txt because this now includes mapped columns as well
11240	10/10/2013 01:18 PM	Aaron Marcuse-Kubitza	inputs/VegBank/verify/input_cols.unmapped.txt: renamed to input_cols.exclude.txt because this now includes mapped columns as well
11239	10/10/2013 01:11 PM	Aaron Marcuse-Kubitza	inputs/VegBank/verify/input_cols.unmapped.txt.run: remove unmapped join columns, since these would be included in the extract
11238	10/10/2013 01:09 PM	Aaron Marcuse-Kubitza	inputs/VegBank/verify/input_cols.unmapped.txt.run: take input directly from input_cols.txt to avoid needing to first copy and paste it into input_cols.unmapped.txt
11237	10/10/2013 01:03 PM	Aaron Marcuse-Kubitza	inputs/VegBank/verify/input_cols.unmapped.txt.run: added back deliberately excluded columns (DUPLICATE#of:..., etc.) so that the # of rows in the file can be subtracted from the total # of columns to get the # of input columns that would be included in the extract
11235	10/10/2013 12:23 PM	Aaron Marcuse-Kubitza	added inputs/VegBank/verify/input_cols.txt, input_cols.unmapped.txt (with runscript to filter input_cols.unmapped.txt)
11233	10/10/2013 08:18 AM	Aaron Marcuse-Kubitza	inputs/VegBank/stratum/postprocess.sql: added pkey
11232	10/10/2013 08:05 AM	Aaron Marcuse-Kubitza	inputs/VegBank/taxonobservation_/postprocess.sql: added __parent index on locationID to facilitate the LEFT JOINs used to create the validation input
11231	10/10/2013 07:54 AM	Aaron Marcuse-Kubitza	inputs/VegBank/observation_/postprocess.sql: added __parent index on locationID to facilitate the LEFT JOINs used to create the validation input
11230	10/10/2013 07:45 AM	Aaron Marcuse-Kubitza	inputs/VegBank/import_order.txt: added taxon_observation.**.sample so it will automatically be kept up to date
11229	10/10/2013 07:32 AM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**.sample/create.sql: set runtime (1 s)
11228	10/10/2013 07:30 AM	Aaron Marcuse-Kubitza	inputs/VegBank/: added taxon_observation.**.sample subset of plots to use in the validation. this avoids the need to import all of VegBank just to validate a few of the plots.
11225	10/09/2013 06:31 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**/: updated for data refresh
11224	10/09/2013 06:25 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plantconcept_/: mapped columns, since this is now included in import_order.txt and therefore gets processed by the column-renaming runscripts. note that this means that in taxonobservation_/map.csv, the plantconcept_ input column names need to be changed to what they are mapped to.
11223	10/09/2013 06:16 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxonobservation_/create.sql: updated runtime (20 s)
11178	10/09/2013 08:54 AM	Aaron Marcuse-Kubitza	inputs/VegBank/plantconcept_/create.sql: documented runtime (21 min)
11177	10/09/2013 08:28 AM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/plantconcept_/: added new-style import files
11176	10/09/2013 08:27 AM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/import_order.txt: added plantconcept_, because new-style import needs it to be explicitly listed in import_order.txt in order to run it
11175	10/09/2013 08:24 AM	Aaron Marcuse-Kubitza	inputs/VegBank/run: refresh(): added usage
11174	10/09/2013 08:22 AM	Aaron Marcuse-Kubitza	inputs/VegBank/run: refresh(): documented that this should be run on vegbiendev
11173	10/09/2013 07:58 AM	Aaron Marcuse-Kubitza	inputs/VegBank/_archive/2012-8-30/: svn:ignore the data exports
11172	10/09/2013 07:57 AM	Aaron Marcuse-Kubitza	inputs/VegBank/run: added refresh() target
11171	10/09/2013 07:54 AM	Aaron Marcuse-Kubitza	inputs/VegBank/: refreshed VegBank so that all of Mike Lee's sample plots would be included in the input data. (VegBank was last refreshed from the live DB on 2012-8-30.) split vegbank.sql into vegbank.schema.sql and vegbank.data.sql so that the schema can be examined and imported separately, like for MySQL datasources. inputs/VegBank/vegbank.~.clean_up.sql: commented out setting comminterpretation.commname to NOT NULL, because after the refresh it is now NULL in 10 rows, where commconcept_id is also NULL.
11160	10/02/2013 05:56 AM	Aaron Marcuse-Kubitza	inputs/VegBank/: added observation__community/
11159	10/02/2013 05:49 AM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.clean_up.sql: commclass.commcode,commname: rename to prevent collisions
11158	10/02/2013 04:26 AM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.clean_up.sql: indicate required column comminterpretation.commname
11157	10/02/2013 04:20 AM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.clean_up.sql: commconcept.commname: rename to prevent collision with commname.commname
11107	09/29/2013 08:58 PM	Aaron Marcuse-Kubitza	bugfix: mappings/VegCore-VegBIEN.csv: nest all taxonoccurrences inside a stratum event, so that the parent locationevent is always fully populated before child locationevents point to it. (previously, a stub parent event was created when the child event was imported first, which blocked the fully-populated parent event from being inserted later on.) this uses auto-folding (for VegBank/CVS) and auto-forwarding (for other datasources) to prune empty stratum events for taxonoccurrences that don't have strata. (see wiki.vegpath.org/Auto-folding, wiki.vegpath.org/Auto-forwarding for more info about these normalization techniques.) note that the inserted row counts stay exactly the same for all datasources except VegBank (which was being fixed), indicating that this significant change to the mappings did not change the semantics of the import of taxonoccurrences.
11106	09/29/2013 08:37 PM	Aaron Marcuse-Kubitza	inputs/VegBank/observationcontributor_/test.xml.ref: updated inserted row count
11105	09/28/2013 10:40 PM	Aaron Marcuse-Kubitza	bugfix: mappings/VegCore-VegBIEN.csv: stratum's locationevent: link this to the parent locationevent, so that the parent locationevent's information (such as locationeventcontributors) is accessible to the stratum's locationevent
11104	09/28/2013 09:08 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/taxon_observation.**/postprocess.sql: inlined _join() so that taxon_observation.** wouldn't get cascadingly deleted whenever the util schema (where this normally resides) gets reinstalled
11098	09/28/2013 06:53 AM	Aaron Marcuse-Kubitza	added inputs/VegBank/observationcontributor_/
11095	09/28/2013 05:04 AM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/: taxonOccurrenceID: include the aggregateOrganismObservationID in this so that there is one taxonoccurrence for each stratum's taxonImportance. this allows the different strata to have separate taxonoccurrences that are associated with the stratum-specific locationevents, rather than all being lumped into one taxonoccurrence, with only one locationevent.
11082	09/24/2013 02:14 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: mapped stratum__name
11071	09/22/2013 08:25 PM	Aaron Marcuse-Kubitza	mappings/VegCore.htm: regenerated from wiki. added Stratum table.
11024	09/19/2013 06:49 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/import_order.txt: added stratum
11023	09/19/2013 06:48 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**/postprocess.sql: added stratum, stratumtype to the left-join
11022	09/19/2013 06:46 PM	Aaron Marcuse-Kubitza	inputs/VegBank/stemcount_/map.csv: stratum_id: removed table prefix so it can be used as a join column
11021	09/19/2013 06:45 PM	Aaron Marcuse-Kubitza	inputs/VegBank/: mapped stratum
11020	09/19/2013 06:39 PM	Aaron Marcuse-Kubitza	inputs/VegBank/: mapped stratumtype
11016	09/19/2013 11:31 AM	Aaron Marcuse-Kubitza	inputs/VegBank/taxonobservation_/map.csv: taxonomic ranks not in VegCore: removed table prefix so they will be automapped (they are globally unique)
11013	09/19/2013 02:55 AM	Aaron Marcuse-Kubitza	inputs//: don't import joined tables, because they are now imported in the taxon_observation.** left-join instead
11012	09/19/2013 01:22 AM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**/postprocess.sql: run mk_subset_by_row_num_func() to add a subset function that uses sort_col. this is used by column-based import, and also provides a common subsetting/sorting API for all the left-joined views. test.xml.ref: the inserted row count most likely changes because the sort order changes.
11009	09/18/2013 11:53 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/stemlocation_/map.csv: remapped stemcount-related fields to OMIT, so that these don't collide with fields of the same name in stemcount_ when they are left-joined together in taxon_observation.** . having the same name causes these to be incorrectly interpreted as shared fkey columns in the NATURAL JOIN (and without the NATURAL JOIN, they would instead be collision errors).
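
The NATURAL JOIN problem described in r11009 can be illustrated with a small sketch. It is not taken from the repository: it uses Python's built-in `sqlite3` module as a stand-in for the project's database, and the table contents and the `note`, `location_note`, and `x_m` columns are hypothetical. It shows that a column sharing a name in both tables is silently treated as a join key, so rows stop matching, and that giving the column a distinct name (comparable in effect to remapping it to OMIT) restores the join on the real key only.

```python
import sqlite3


def natural_join_matches(rename_colliding_column: bool) -> int:
    """Count left-side rows that find a partner in a NATURAL LEFT JOIN.

    All table and column contents here are hypothetical illustration data.
    """
    conn = sqlite3.connect(":memory:")
    # When not renamed, both tables have a column called "note", which the
    # NATURAL JOIN treats as a second join key alongside stemcount_id.
    right_note_col = "location_note" if rename_colliding_column else "note"
    conn.executescript(f"""
        CREATE TABLE stemcount_ (stemcount_id INTEGER, note TEXT);
        CREATE TABLE stemlocation_ (
            stemcount_id INTEGER, {right_note_col} TEXT, x_m REAL
        );
        INSERT INTO stemcount_ VALUES (1, 'count A'), (2, 'count B');
        INSERT INTO stemlocation_ VALUES (1, 'loc A', 1.5), (2, 'loc B', 2.5);
    """)
    rows = conn.execute(
        "SELECT x_m FROM stemcount_ NATURAL LEFT JOIN stemlocation_"
    ).fetchall()
    conn.close()
    # x_m exists only in the right-hand table, so NULL means "no match".
    return sum(1 for (x_m,) in rows if x_m is not None)


if __name__ == "__main__":
    print("matches with colliding column name:", natural_join_matches(False))
    print("matches after renaming the column:", natural_join_matches(True))
```

With the colliding name, the join also requires the two `note` values to be equal, so no rows match. With an explicit join in place of the NATURAL JOIN, the duplicate name would surface as a column-name collision instead, as the commit message notes.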
