/trunk/inputs - Changes - BIEN 3 - NCEAS Projects

root/trunk/inputs @ 11992

svn:ignore: .~*

#	Date	Author	Comment
11992	01/22/2014 01:06 PM	Aaron Marcuse-Kubitza	bugfix: inputs/SALVIAS/import_order.txt: added party_code_party_
11991	01/22/2014 12:50 PM	Aaron Marcuse-Kubitza	bugfix: inputs/SALVIAS/party_code_party_/create.sql: need to remove duplicate entries in party_code_party
11988	01/22/2014 11:10 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS/party_code_party_/map.csv: mapped fullname->event_participant_name for use by other tables
11987	01/22/2014 10:34 AM	Aaron Marcuse-Kubitza	mapped inputs/SALVIAS/party_code_party_/
11983	01/20/2014 10:12 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/_MySQL/salvias_plots.*.sql: refreshed. this adds the party and party_code_party tables Brad provided for mapping the plot contributors.
11982	01/20/2014 10:10 PM	Aaron Marcuse-Kubitza	fix: inputs/SALVIAS/salvias_plots.~.clean_up.sql: Delete rows that do not satisfy foreign key constraints: also need to do this for plotObservations, since the refreshed data contains dangling rows for that as well
11981	01/20/2014 10:08 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/run_: documented *.sql install runtime (3 min), as separate from the full `datasrc_make reinstall` runtime (3.5 min)
11980	01/20/2014 10:07 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/run_: refresh(): `datasrc_make reinstall`: updated runtime. documented that runtimes are from starscream.
11979	01/20/2014 08:09 PM	Aaron Marcuse-Kubitza	added inputs/SALVIAS/run_, which includes a refresh() target
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. Instead, reuse your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
11965	01/16/2014 01:22 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: scrubbed_family: Name_matched_accepted_family was missing from the TNRS results at one point, so we are now using Family_matched as a workaround to populate this. the workaround is for accepted names only, because no-opinion names do not have an Accepted_name_family to prepend to the scrubbed name to parse.
11964	01/16/2014 01:19 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: reexported from live DB, which changes the element order
11961	01/15/2014 10:18 AM	Aaron Marcuse-Kubitza	inputs/VegBank/import_order.txt: added projectcontributor_
11960	01/15/2014 10:11 AM	Aaron Marcuse-Kubitza	inputs/VegBank/projectcontributor_/map.csv, postprocess.sql: added project_participant
11957	01/15/2014 09:41 AM	Aaron Marcuse-Kubitza	added inputs/VegBank/projectcontributor_/
11956	01/15/2014 09:29 AM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.clean_up.sql: projectcontributor.surname: prepend table name to avoid join collisions
11955	01/15/2014 09:23 AM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.clean_up.sql, inputs/CVS/cvs.~.clean_up.sql: Prevent "column name specified more than once" errors when tables are joined: put tables in alphabetical order for consistency
11943	01/14/2014 08:34 PM	Aaron Marcuse-Kubitza	inputs/publishable datasources.xlsx: updated
11942	01/14/2014 08:31 PM	Aaron Marcuse-Kubitza	inputs/datasource_release_status.xlsx: renamed to `publishable datasources.xlsx` to match the spreadsheet title
11934	12/20/2013 04:41 PM	Aaron Marcuse-Kubitza	inputs/VegBank/^taxon_observation.**.sample/create.sql, map.csv: added new project columns
11933	12/20/2013 04:31 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**/postprocess.sql: added the project table
11932	12/20/2013 04:25 PM	Aaron Marcuse-Kubitza	mapped inputs/VegBank/project/, which includes the projectName for attribution
11931	12/20/2013 02:56 PM	Aaron Marcuse-Kubitza	inputs/CVS/^taxon_observation.**.sample/create.sql, map.csv: added new project columns
11930	12/20/2013 02:44 PM	Aaron Marcuse-Kubitza	inputs/CVS/taxon_observation.**/postprocess.sql: added the project table
11929	12/20/2013 02:42 PM	Aaron Marcuse-Kubitza	inputs/CVS/project/map.csv: mapped stopDate->projectEndDate
11928	12/20/2013 02:35 PM	Aaron Marcuse-Kubitza	mapped inputs/CVS/project/, which includes the projectName for attribution
11927	12/20/2013 01:25 AM	Aaron Marcuse-Kubitza	inputs/VegBIEN/Redmine/svn/.htaccess: updated to use much faster direct repository URL rather than Redmine web interface, now that the repository itself is publicly accessible in addition to the Redmine view of it
11924	12/20/2013 12:28 AM	Aaron Marcuse-Kubitza	fix: inputs/TEX/Specimen*/map.csv, postprocess.sql: habitat: also placed in occurrenceRemarks so that this field gets parsed for growth form information, as requested by Brad (wiki.vegpath.org/TEX_validation#2013-2-26)
11923	12/19/2013 11:49 PM	Aaron Marcuse-Kubitza	fix: inputs/TEX/Specimen*/map.csv: mapped constant values for specimenHolderInstitutions, country. these have to be added with `rm=1 ./inputs/TEX/Specimen.../run postprocess`.
11922	12/19/2013 11:42 PM	Aaron Marcuse-Kubitza	bugfix: inputs/TEX/Specimen2/map.csv: mapped BARCODE to accessionNumber so that we have a unique ID for each row
11920	12/17/2013 08:06 AM	Aaron Marcuse-Kubitza	inputs/datasource_release_status.xlsx: updated
11917	12/16/2013 07:05 PM	Aaron Marcuse-Kubitza	inputs/CVS/^taxon_observation.**.sample/create.sql: added Mike Lee's additional plots used to validate confidentiality-related fields (wiki.vegpath.org/CVS_validation#plots-to-include)
11916	12/16/2013 06:00 PM	Aaron Marcuse-Kubitza	bugfix: inputs/CVS/^taxon_observation.**.sample/create.sql: include taxonName in the subset of columns that's imported for the validation, because it is _alt-ed with scientificName for forming the TNRS input name. this is unique to CVS, which is why it was not part of the validation subset copied from the VegBank subset.
11912	12/16/2013 01:43 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: granted bien_read SELECT access to derived views as well as the core tnrs table
11911	12/15/2013 05:30 PM	Aaron Marcuse-Kubitza	updated inputs/datasource_release_status.xlsx
11910	12/15/2013 05:27 PM	Aaron Marcuse-Kubitza	added inputs/datasource_release_status.xlsx, export of Google spreadsheet at https://docs.google.com/spreadsheet/ccc?key=0ArZXrTAXd-TYdDRRb2RxYi11TWZrQVh5bVdKOURCeFE
11905	12/11/2013 10:54 PM	Aaron Marcuse-Kubitza	fix: inputs/CVS/^taxon_observation..sample/: added _no_import because this table duplicates part of what's imported from taxon_observation.
11904	12/11/2013 10:42 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/plot/: added _no_import because this table is left-joined and should not be imported separately
11903	12/11/2013 10:40 PM	Aaron Marcuse-Kubitza	bugfix: inputs/{.NCBI,CTFS}/*.src/: added _no_import because these tables are left-joined and should not be imported separately
11902	12/11/2013 09:56 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: removed table names from datasources where only one table is imported
11901	12/11/2013 09:52 PM	Aaron Marcuse-Kubitza	fix: inputs/import.stats.xls: removed deleted tables from current import
11900	12/11/2013 09:51 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: updated import times
11888	12/10/2013 06:35 AM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record_plants/map.csv: row_num: remapped to plain *row_num, like the other datasources that have this field
11887	12/10/2013 06:31 AM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: rerun time: noted that this is only fast after manual vacuuming of the table (to remove the deleted rows from the index). autovacuum apparently does not run, although it should.
11881	12/09/2013 07:24 PM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record_plants/test.xml.ref: reran test, which added yearCollected/monthCollected/dayCollected
11880	12/09/2013 07:23 PM	Aaron Marcuse-Kubitza	inputs/CVS/plantConcept_/create.sql: documented runtime (3 min)
11879	12/09/2013 06:59 PM	Aaron Marcuse-Kubitza	inputs/CTFS/*.src/: added test.xml.ref
11878	12/09/2013 06:58 PM	Aaron Marcuse-Kubitza	inputs/CTFS/*.src/: added VegBIEN.csv
11877	12/09/2013 06:56 PM	Aaron Marcuse-Kubitza	bugfix: inputs/CTFS/TaxonOccurrence*/map.csv: things mapped to taxonObservationID: remapped to taxonOccurrenceID since taxonObservationID is not mapped to anything in VegBIEN (denormalized VegCore doesn't distinguish between taxon occurrences and taxon observations of them)
11876	12/09/2013 05:46 PM	Aaron Marcuse-Kubitza	bugfix: inputs/ARIZ/~.clean_up.sql: prevent "column already exists" errors when there is an input column of the same name as an output column
11873	12/09/2013 04:16 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/import_order.txt: added county_centroids so that it would be installed by new-style import
11871	12/09/2013 03:37 PM	Aaron Marcuse-Kubitza	inputs/FIA/TREE/run: documented import() runtime (1.5 h), which includes table cleanup runtime (1 h)
11869	12/09/2013 02:43 PM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record_plants/run: updated import() runtime (same), documented table cleanup runtime (1.5 h)
11868	12/09/2013 02:38 PM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: CREATE INDEX ... specimenHolderInstitutions: documented runtime (45 min)
11867	12/09/2013 02:28 PM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: documented runtime (3.5 min)
11865	12/06/2013 07:46 AM	Aaron Marcuse-Kubitza	bugfix: inputs/CTFS/import_order.txt: added .src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the names to be renamed to VegCore names in /create.sql. note that VegCore taxonOccurrenceID has been renamed to taxonObservationID since this was last run.
11864	12/06/2013 06:56 AM	Aaron Marcuse-Kubitza	inputs/.geoscrub/run: documented import() runtime (20 min)
11863	12/06/2013 06:12 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.NCBI/import_order.txt: added nodes.src, names.src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the names to be renamed to VegCore names in nodes/create.sql.
11849	12/06/2013 02:44 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: install: for new-style datasources, use the associated runscript instead (the old-style install target will not do everything that's needed for a new-style datasource)
11847	12/06/2013 12:51 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: install: for new-style datasources, use the associated runscript instead (the old-style install target will not do everything that's needed for a new-style datasource)
11843	12/05/2013 11:38 PM	Aaron Marcuse-Kubitza	inputs/FIA/COND/postprocess.sql: filtering formula: documented that this was created by Brad, and provided the URL to it on nimoy
11842	12/05/2013 12:27 PM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible
11841	12/05/2013 12:19 PM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible
11820	12/04/2013 04:57 PM	Aaron Marcuse-Kubitza	inputs/CVS/^taxon_observation.**.sample/create.sql: uncommented identifiedBy since this is now part of taxonObservation_
11819	12/04/2013 04:08 PM	Aaron Marcuse-Kubitza	fix: inputs/CVS/observation_community/create.sql: communityName: populate from commConcept.commName instead, because commInterpretation.commname is not always populated. this requires left-joining to commConcept.
11818	12/04/2013 03:58 PM	Aaron Marcuse-Kubitza	inputs/CVS/observation_community/map.csv: updated output column names to new input column names, to avoid later output column collisions
11817	12/04/2013 03:42 PM	Aaron Marcuse-Kubitza	inputs/CVS/observation_community/header.csv, map.csv: updated input column names for cvs.~.clean_up.sql renamings
11815	12/04/2013 04:19 AM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: commClass, commConcept fields: prepend table name to avoid inter-table collisions upon join
11814	12/04/2013 03:43 AM	Aaron Marcuse-Kubitza	added inputs/CVS/observation_community/, as for VegBank
11813	12/04/2013 03:32 AM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: commClass.dba_src_ID: prepend table name to avoid inter-table collisions upon join
11812	12/03/2013 04:32 PM	Aaron Marcuse-Kubitza	added inputs/CVS/observationContributor_/, which adds the people collecting the plot
11811	12/03/2013 04:02 PM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: observationContributor.dba_src_ID: prepended table name to avoid collision when left-joining to party
11810	12/03/2013 03:44 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: %/header.csv: errexit the command so that errors won't scroll by, which in this case requires `set -o pipefail`
11809	12/03/2013 02:57 PM	Aaron Marcuse-Kubitza	fix: inputs/CVS/taxonObservation_/create.sql: mapped identifiedBy, which involves joining to party
11808	12/03/2013 02:35 PM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: don't rename taxonInterpretation.PARTY_ID, so that this can be USING-joined to party in inputs/CVS/taxonObservation_/create.sql
11805	12/03/2013 08:25 AM	Aaron Marcuse-Kubitza	inputs/CVS/^taxon_observation.*.sample/map.csv: synced output columns to input columns (which removes the extra s)
11804	12/03/2013 08:00 AM	Aaron Marcuse-Kubitza	fix: inputs/CVS/plot_/postprocess.sql: locality: include the site name (authorLocation), because this is part of the unique specification of the place that was sampled, and Bob wants this to be included in VegBIEN
11803	12/03/2013 07:58 AM	Aaron Marcuse-Kubitza	inputs/CVS/^taxon_observation.**.sample/create.sql: removed parentLocationID, since this is unused in CVS
11802	12/03/2013 07:45 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: `%/install: %/create.sql`: errexit the command so that errors won't scroll by, which in this case requires `set -o pipefail`
11801	12/03/2013 06:51 AM	Aaron Marcuse-Kubitza	inputs/VegBank/plot/postprocess.sql: locality: include the site name (authorlocation), because this is part of the unique specification of the place that was sampled
11799	12/03/2013 05:19 AM	Aaron Marcuse-Kubitza	fix: inputs/CVS/taxon_observation.**/map.csv: omit authorPlantName because it is not specific to the taxonInterpretation row (this is in a separate taxonInterpretation for the original determination instead)
11796	12/02/2013 02:46 PM	Aaron Marcuse-Kubitza	fix: inputs/CVS/plot_/map.csv: PARENT_ID: remapped to UNUSED, to clarify that subplots are not implemented through this field
11794	11/27/2013 11:04 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: scrub: clarified that using & (background process) also ignores TNRS errors (the primary purpose of & , of course, is to run asynchronously)
11792	11/27/2013 09:24 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/run: import() runtime: added starscream runtime (20 min)
11790	11/27/2013 08:33 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/run: documented import() runtime (15 min)
11789	11/26/2013 11:18 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/Source/map.csv: source__modified_date: updated for current run
11788	11/26/2013 11:11 PM	Aaron Marcuse-Kubitza	**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)
11786	11/26/2013 11:07 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: updated upload time (30 s)
11785	11/26/2013 11:00 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: export_(): updated runtime (25 s)
11782	11/26/2013 09:57 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: make(): derived/biengeo/geoscrub.sh: documented runtime (2.5 h)
11781	11/26/2013 09:45 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: don't connect to DB as the root user, because this is not needed now that the geoscrub schema is owned by the bien user. this avoids a sudo password prompt at the end of the geoscrubbing run.
11777	11/26/2013 02:23 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: $(import): except in a full-database import, errexit so that the import will stop on an error and not let it scroll by
11776	11/26/2013 01:55 PM	Aaron Marcuse-Kubitza	added inputs/CVS/^taxon_observation.**.sample/, used for the extract. note that the column list is slightly different than for VegBank.
11775	11/26/2013 01:42 PM	Aaron Marcuse-Kubitza	inputs/CVS/taxonObservation_/map.csv: removed taxonObservation_-- prefix from terms that do not need to be table-specific (like for VegBank)
11774	11/26/2013 01:32 PM	Aaron Marcuse-Kubitza	fix: inputs/CVS/taxonObservation_/map.csv: plantConcept_ columns: synced input and output column names to their names in plantConcept_
11773	11/26/2013 01:30 PM	Aaron Marcuse-Kubitza	fix: inputs/CVS/taxonObservation_/map.csv: plantConcept_ columns: synced input and output column names to their names in plantConcept_
11772	11/26/2013 01:26 PM	Aaron Marcuse-Kubitza	inputs/CVS/plantConcept_/map.csv: removed plantConcept_-- prefix from terms that do not need to be table-specific (like for VegBank)
11761	11/26/2013 05:56 AM	Aaron Marcuse-Kubitza	bugfix: inputs/CVS/import_order.txt: added taxon_observation.**
11760	11/26/2013 05:54 AM	Aaron Marcuse-Kubitza	inputs/CVS/: don't import joined tables, because they are now imported in the taxon_observation.** left-join instead
11759	11/26/2013 05:53 AM	Aaron Marcuse-Kubitza	inputs/CVS/: added taxon_observation.** left-join of the tables, using the steps at http://wiki.vegpath.org/Left-joining_a_datasource. this involves renaming taxonOccurrenceID->taxonOccurrenceID__overall_plot so that it can then be joined together with aggregateOrganismObservationID to create the full taxonOccurrenceID (as in VegBank).
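
Revisions 11802 and 11810 above note that making a piped command exit on error requires `set -o pipefail`. The following is a minimal bash sketch of why, not the actual Makefile recipe: `failing_pipeline` is a hypothetical stand-in for a real pipeline whose first stage fails. By default a pipeline reports only the status of its last command, so an earlier failure is hidden.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a real pipeline: the first stage fails,
# the last stage succeeds.
failing_pipeline() { false | cat; }

# Default behavior: the pipeline's status is that of the last command (cat),
# so the failure of the first stage goes unnoticed.
set +o pipefail
if failing_pipeline; then status_default=0; else status_default=$?; fi

# With pipefail: the pipeline's status is that of the rightmost command that
# failed, so an errexit-style check can stop on it.
set -o pipefail
if failing_pipeline; then status_pipefail=0; else status_pipefail=$?; fi
set +o pipefail

echo "default=$status_default pipefail=$status_pipefail"
```

Run under bash, this prints `default=0 pipefail=1`: only the second form gives the caller a nonzero status to act on.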
