/ - Changes - BIEN 3 - NCEAS Projects

root @ 6002

#	Date	Author	Comment
6002	11/05/2012 08:48 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: subplot locationevent: Only populate parent locationevent's location unique IDs if a subplot #/subplotID is actually specified. (The lack of a location unique ID will cause the parent locationevent's location to be removed, as well as the parent locationevent itself if there is no parent locationevent unique ID.) This fixes a bug where top-level plots in datasources that provide a nullable subplot #/subplotID were incorrectly getting connected to parent locationevents.
6001	11/05/2012 08:34 PM	Aaron Marcuse-Kubitza	mappings/verify.plots.sql: Added # locations, # location events, which also include subplots
6000	11/05/2012 08:17 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/verify/plots.ref.sql: Standardized line wrapping to put ; on its own line in multi-line queries and at the end of the line in one-line queries
5999	11/05/2012 08:13 PM	Aaron Marcuse-Kubitza	mappings/verify.plots.sql: Added "top-level" to # locations, # location events names because these queries do not include all locations
5998	11/05/2012 07:52 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/verify/plots.ref.sql: # locations: Fixed count to include all plotMetadata entries instead of just unique coordinates, because each plotMetadata entry now gets its own location
5997	11/05/2012 07:34 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: matched place: When any of the accepted* names is specified, set matched_place_id=0 (self-reference) to indicate that the matched place is an accepted (scrubbed) place
5996	11/05/2012 05:30 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: Use the canonical (scrubbed) place instead of the datasource's place where available. Note that canon_place_id is always populated (with a self-reference if needed), to enable using a just a LEFT JOIN for this.
5995	11/05/2012 05:14 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: Use datasource_* prefix instead of reference_* prefix that tables had gotten when datasource_id was renamed to reference_id
5994	11/05/2012 05:04 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/verify/plots.ref: Regenerated on vegbiendev using the PostgreSQL 9.1 pg_catalog.default collation
5993	11/05/2012 04:49 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Setting pkeys of missing rows: Fixed bug in column-based import where when input rows match multiple output rows in one of this iteration's input tables, the into table's pkey constraint is violated because full_in_table contains multiple entries for an input pkey
5992	11/05/2012 04:42 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place: Added index on matched_place_id to facilitate looking up places by matched_place_id in place_set_canon_place_id()
5991	11/05/2012 04:26 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added geoscrub_input derived table with associated view. Build geoscrub_input as part of make_analytical_db().
5990	11/05/2012 03:56 PM	Aaron Marcuse-Kubitza	sql.py: Added table_has_pkey()
5989	11/05/2012 03:13 PM	Aaron Marcuse-Kubitza	sql.py: table_pkey_col(): For PostgreSQL DBs, use pg_catalog via index_cols() and table_pkey_index(), in order to use the search_path to look up the table. This fixes a bug where the pkey would be selected from information_schema.table_constraints in random order, and this order sometimes returned the corresponding table in the public schema but sometimes in other schemas, such as VegBank. This became a problem now that VegBIEN has a place table, which conflicts with VegBank's place table. (Most other VegBank tables that are mapped to have been renamed in VegBIEN.)
5988	11/05/2012 02:41 PM	Aaron Marcuse-Kubitza	sql.py: add_pkey_or_index(): Use new add_pkey_index()
5987	11/05/2012 02:41 PM	Aaron Marcuse-Kubitza	sql.py: Added add_pkey_index()
5986	11/05/2012 02:29 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times
5985	11/05/2012 02:15 PM	Aaron Marcuse-Kubitza	import_times: Use $'' quoting to expand tab, in order to also work on Mac OS X
5984	11/02/2012 06:14 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place: Added place_set_canon_place_id_on_() triggers, analogous to taxonlabel.taxonlabel_2_set_canon_label_id_on_()
5983	11/02/2012 06:12 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place: Added canon_place_id
5982	11/02/2012 06:04 PM	Aaron Marcuse-Kubitza	mappings/verify.plots.sql, inputs/SALVIAS/verify/plots.ref.sql: Renamed project column to projectname to avoid confusion with the project table
5981	11/02/2012 06:03 PM	Aaron Marcuse-Kubitza	input.Makefile: Verification of import: Fixed bug where needed to make .ref .PRECIOUS instead of verify/.ref
5980	11/02/2012 05:57 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/verify/plots.ref.sql: Use project_name instead of project_id everywhere, not just as the ID to list for each project, to match the corresponding VegBIEN queries
5979	11/02/2012 05:55 PM	Aaron Marcuse-Kubitza	input.Makefile: Verification of import: Make verify/%.ref .PRECIOUS because there must always be a .ref for the make rules to work
5978	11/02/2012 05:34 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/verify/plots.ref.sql: Use project_name instead of project_id as the ID to list for each project, since the SALVIAS projects table is now mapped to the DB and the project_name is used as project.projectname
5977	11/02/2012 05:18 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: subplots: Also complete the locationevent/location diamond (subplot event -> {subplot location, parent plot event} -> parent plot location) when an eventDate or range is specified, as this is also an identifying field for locationevent. This fixes a bug where subplots data without explicit plot events (such as SALVIAS and TEAM) was not being connected to the appropriate parent plot event as well as parent plot location. This should fix the SALVIAS verification # location events, which should include only parent plots' locationevents to correspond with # locations, which only includes parent plots' locations, and uses locationevent.parent_id being NULL to determine what is a parent plot event.
5976	11/02/2012 04:45 PM	Aaron Marcuse-Kubitza	input.Makefile: Verification of import: $(verify): Use $(inDatasrc) so that the full standard search_path, including any custom public schema, is made available to the VegBIEN-side verification script
5975	11/02/2012 04:33 PM	Aaron Marcuse-Kubitza	mappings/verify.plots.sql: # locations: Fixed bug where need to use location instead of locationevent
5974	11/02/2012 04:30 PM	Aaron Marcuse-Kubitza	mappings/verify.specimens.sql: Updated for current schema
5973	11/02/2012 04:26 PM	Aaron Marcuse-Kubitza	input.Makefile: Verification of import: verify/%.out: Made the target .PRECIOUS so that partial output will be saved in case of error to help debugging
5972	11/02/2012 04:13 PM	Aaron Marcuse-Kubitza	mappings/verify.plots.sql: Updated for current schema
5971	11/02/2012 04:01 PM	Aaron Marcuse-Kubitza	Updated validation/BIEN2_Analytical_DB_overview.docx with Brad's revision
5970	11/02/2012 03:58 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonlabel_set_canon_label_id(): Fixed bug where always need to set canon_label_id on insert (because it must be a self-reference rather than NULL if there is no matched_label_id), but wasn't being set when no matched_label_id because the IS DISTINCT FROM check returned false
5969	11/02/2012 03:42 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: decimalLatitude/Longitude->geoscrub input coordinates: Also set to NULL if 0 here, not just for the coordinates linked to the datasource's place instance
5968	11/02/2012 03:37 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Switched back to using run_query_into()'s add_pkey_ option now that it uses sql.add_pkey_or_index() instead of sql.add_pkey()
5967	11/02/2012 03:35 PM	Aaron Marcuse-Kubitza	sql.py: run_query_into(): add_pkey_: Use add_pkey_or_index() instead of add_pkey(). This should prevent errors for input rows which match multiple output rows in VegBIEN.
5966	11/02/2012 03:24 PM	Aaron Marcuse-Kubitza	README.TXT: Data import: After import, removed steps to reinstall tnrs_accepted, tnrs_other because this would overwrite the versions of these staging tables that were used to build the current version of the database. (The backup should store a snapshot of the database's input data as it was used to build the database.) Note that these tables will still be reinstalled when the next import starts.
5965	11/02/2012 02:54 PM	Aaron Marcuse-Kubitza	Added validation/BIEN2_Analytical_DB_overview.docx
5964	11/02/2012 02:54 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times
5963	11/02/2012 12:00 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/tnrs_*/create.sql: Fixed bug where need to COALESCE the WHERE condition with false to ensure that a boolean value, not NULL, is negated by NOT. Otherwise, tnrs_other's query would exclude rows for which the condition returned NULL rather than false.
5962	11/02/2012 11:43 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: make_analytical_db(): Truncate tables before (re-)populating them in case make_analytical_db() has already been run
5961	11/02/2012 11:40 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: Updated to use reference instead of party to store the datasource name
5960	11/02/2012 11:28 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Renamed make_analytical_stem() back to make_analytical_db()
5959	11/01/2012 10:52 AM	Aaron Marcuse-Kubitza	import_all: After starting geoscrub import in the background, wait for make commands to scroll by before starting NCBI import
5958	11/01/2012 10:39 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: matched place: Also map verbatim place's geoscrub-related fields to the matched place, to link up with geoscrub's corresponding input place
5957	11/01/2012 10:22 AM	Aaron Marcuse-Kubitza	import_all: Removed explicit by_col=1 from datasources that don't require it for proper import. (It will still be set if the user provides it on the command line.)
5956	11/01/2012 10:18 AM	Aaron Marcuse-Kubitza	schemas/functions.sql: Removed duplicate previous version of _and() with fewer parameters
5955	11/01/2012 10:14 AM	Aaron Marcuse-Kubitza	Regenerated vegbien.ERD exports
5954	11/01/2012 10:11 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: method: Made reference_id NOT NULL so the method table would automatically be datasource-scoped
5953	11/01/2012 10:09 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Renamed creator_ids to reference_id since they are now fkeys to reference
5952	11/01/2012 10:04 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Made creator_ids an fkey to reference instead of party, so that datasources are stored separately from people and to allow adding reference-type metadata (URL, copyright, etc.) for each datasource
5951	11/01/2012 09:51 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: reference: Added unique index
5950	11/01/2012 09:47 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Remapped accordingTo to taxonconcept.concept_reference_id
5949	11/01/2012 09:46 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Don't populate taxonlabel.creator_id using identifiedBy, because that was meant for creating a taxonconcept when the accordingTo was not specified, but taxonconcepts are now modeled differently using a taxonconcept subclass of taxonlabel with a concept_reference_id
5948	11/01/2012 09:36 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonverbatim: Renamed binomial to taxonname because this field is actually the taxonomic name without the author, not just the genus+specific epithet binomial
5947	11/01/2012 09:29 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Renamed Binomial to TaxonName because this field can store more ranks than just the genus+specificEpithet binomial (that goes in speciesBinomial)
5946	11/01/2012 09:22 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Renamed taxonName to taxonNameOrEpithet for clarity
5945	11/01/2012 09:16 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_aggregate_view: Added growthForm, cultivated, cultivatedBasis from analytical_stem. Note that this will create multiple entries for a taxon name when some taxonoccurrences are marked as cultivated and others aren't, or different taxonoccurrences have different growthform entries.
5944	11/01/2012 09:01 AM	Aaron Marcuse-Kubitza	import_all: Added geoscrub import, which can happen concurrently with NCBI/TNRS but must come before the main datasources for the matched places to link up properly
5943	11/01/2012 08:59 AM	Aaron Marcuse-Kubitza	import_all: Documented that TNRS import must come after NCBI for cross links to be made
5942	11/01/2012 08:53 AM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: `%.sql: _MySQL/%.sql`, etc.: Only run if target does not exist, to avoid regenerating the target when a .make script which generates the target's prerequisite is checked out from svn (causing its mod time to be newer than the target)
5941	11/01/2012 08:24 AM	Aaron Marcuse-Kubitza	Added inputs/.geoscrub/
5940	11/01/2012 08:21 AM	Aaron Marcuse-Kubitza	input.Makefile: SVN: $(svnFilesGlob): Added /, _MySQL/ subdirs
5939	11/01/2012 08:20 AM	Aaron Marcuse-Kubitza	lib/common.Makefile: SVN: $(add*): Fixed bug where need to use --depth=empty to ensure directory contents are not added unless explicitly listed
5938	11/01/2012 08:09 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped georeferenceValid, latLongInCountry, latLongInStateProvince
5937	11/01/2012 08:05 AM	Aaron Marcuse-Kubitza	schemas/functions.sql: Added _and()
5936	11/01/2012 07:40 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped distanceToCountry_km, distanceToStateProvince_km
5935	11/01/2012 07:39 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added _km_to_m()
5934	11/01/2012 07:36 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added _km_to_m()
5933	11/01/2012 07:31 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place: Added georeference_valid, distance_to_country_m, distance_to_state_m
5932	11/01/2012 07:18 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: matched taxonlabel's ancestors: Only create the cross links to NCBI if the name is accepted (taxonIsCanonical)
5931	11/01/2012 06:55 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: accepted* taxonomic terms: If is canonical name, don't populate a separate accepted taxonlabel because the taxonlabel itself is accepted. If is not canonical name, populate matched_label_id directly when the row is created instead of waiting until after all rows are created and using _taxonlabel_set_matched_label_id(). This is possible now that accepted taxonlabels occur only in TNRS's tnrs_accepted table, which is specially marked as taxonIsCanonical. Avoiding _taxonlabel_set_matched_label_id() should fix the slowdown that occurs when it's used on large tables.
5930	11/01/2012 06:48 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: taxonIsCanonical: Moved then=0 into non-identifying _if statement attributes
5929	11/01/2012 06:41 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped taxonIsCanonical
5928	11/01/2012 06:41 AM	Aaron Marcuse-Kubitza	bin/map: map_rows(): map_table(): Fixed bug where metadata values were being removed prematurely, by passing them through
5927	11/01/2012 06:40 AM	Aaron Marcuse-Kubitza	bin/map: map_rows(): Fixed bug where metadata values were being passed to functions that expected columns, by placing them directly in the XML import tree and then removing them from the mappings
5926	11/01/2012 05:51 AM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/new_terms.csv: Filter out metadata values, which are not new terms even though they are in the same map column as the input columns
5925	11/01/2012 05:46 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/tnrs_accepted/map.csv: Set taxonIsCanonical to true, using new metadata value syntax
5924	11/01/2012 05:40 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added taxonIsCanonical
5923	11/01/2012 05:37 AM	Aaron Marcuse-Kubitza	input.Makefile: SVN: $(svnFiles): Added _MySQL files that would otherwise be excluded with $(filter-out _% ...)
5922	11/01/2012 05:34 AM	Aaron Marcuse-Kubitza	inputs/*/: Placed all logs/ subdirs and import_order.txt under version control, using `make inputs/.../add`
5921	11/01/2012 05:31 AM	Aaron Marcuse-Kubitza	input.Makefile: SVN: add: Factored list of files to add out into $(svnFiles) var
5920	11/01/2012 05:30 AM	Aaron Marcuse-Kubitza	Added inputs/.TNRS/tnrs/header.csv
5919	11/01/2012 05:28 AM	Aaron Marcuse-Kubitza	input.Makefile: SVN: $(svnFilesGlob): Added *.make
5918	11/01/2012 05:21 AM	Aaron Marcuse-Kubitza	input.Makefile: SVN: add: Fixed bug where _-prefixed dirs (such as _MySQL) were incorrectly being included in the list of files to put under version control
5917	11/01/2012 05:15 AM	Aaron Marcuse-Kubitza	Calls to `make inputs/.TNRS/cleanup`: Do `make inputs/.TNRS/tnrs_accepted/reinstall; make inputs/.TNRS/tnrs_other/reinstall` instead to use new split TNRS tables
5916	11/01/2012 05:04 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: Removed no longer needed tnrs_accepted_names index, now that the accepted names have been split into a separate table
5915	11/01/2012 05:00 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/: Split tnrs table into tnrs_accepted and tnrs_other, with tnrs_accepted imported first. This will eventually allow TNRS data to be imported without needing to use _taxonlabel_set_matched_label_id(), which is extremely slow due to an unknown bug in PostgreSQL's handling of UPDATEs on large tables (possibly limited to UPDATEs inside functions).
5914	11/01/2012 05:00 AM	Aaron Marcuse-Kubitza	input.Makefile: SVN: add: Fixed bug where logs/*.log.sql was incorrectly being included in the list of files to put under version control
5913	11/01/2012 04:53 AM	Aaron Marcuse-Kubitza	lib/common.Makefile: Filesystem: $(wildcard/): Fixed bug where {} exprs weren't handled correctly when some branches contained *, by using `shopt -s nullglob` and bash directly instead of trying to emulate it with grep
5912	11/01/2012 04:44 AM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: `%/install: %/create.sql`: Also provide the schema to create.sql in :schema
5911	11/01/2012 04:16 AM	Aaron Marcuse-Kubitza	bin/map: Added support for including literal metadata values in the map spreadsheet, by prefixing them with ':'
5910	11/01/2012 03:56 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added distanceToCountry_km, distanceToStateProvince_km
5909	11/01/2012 03:45 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added distanceToCountry_m, distanceToStateProvince_m
5908	11/01/2012 03:36 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude. Mapped decimalLatitude/Longitude to matched place's coordinates when acceptedDecimalLatitude/Longitude not provided (as is the case for the geoscrub table).
5907	11/01/2012 03:22 AM	Aaron Marcuse-Kubitza	input.Makefile: SVN: $(svnFilesGlob): Added schema.sql
5906	11/01/2012 02:57 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Replaced georeferenceInvalid with latLongDomainValid, latLongDomainInvalid because this field is actually just whether the coordinates are in the range of valid decimal values, not whether they are within the named place reference. Re-sourced georeferenceValid to bien_web.observation#isGeovalid rather than isValidLatLong. VegCore-VegBIEN.csv: Replaced now-removed georeferenceInvalid with latLongInCountry for locationID redirection checks.
5905	11/01/2012 01:54 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Map locationID to place.placecode instead when geovalidation columns are provided
5904	11/01/2012 01:06 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: accepted* taxonlabel: Removed ancestor hierarchy because this is populated, in much greater detail, when the accepted name is imported as an input name and the TNRS-parsed components are available
5903	11/01/2012 12:55 AM	Aaron Marcuse-Kubitza	sql.py: distinct_table(): Use DISTINCT ON instead of a unique index and insert_select()'s ignore mode to remove duplicate rows. This uses whichever sorting method PostgreSQL deems to be fastest instead of requiring the use of a B-tree index. Since most of the slower operations in TNRS's import are distinct_table() calls, this should speed up the TNRS import, which is a bottleneck for the DB import as a whole because the TNRS import must complete before other datasources can be imported.

Project

General

Profile