/ - Changes - BIEN 3 - NCEAS Projects

root @ 6063

#	Date	Author	Comment
6063	11/08/2012 07:47 AM	Aaron Marcuse-Kubitza	psql_vegbien: Use bash because it supports substitutions
6062	11/08/2012 07:46 AM	Aaron Marcuse-Kubitza	psql_vegbien: Set the search_path to $out_schemas set by vegbien_dest. This will enable running any psql_vegbien script on a schema other than public.
6061	11/08/2012 07:07 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: Changed inner joins on non-datasource taxonlabels to LEFT JOINs, to ensure that an entry is included in the analytical DB even if there was no matched taxonlabel. In theory, this shouldn't be necessary, because every taxonlabel should have a canonical taxonlabel since canon_label_id is auto-populated to the taxonlabel_id if there is no matched_label_id; there should be a taxonverbatim for every datasource and accepted taxonlabel because datasources link to taxonlabel via taxonverbatim and TNRS populates a taxonverbatim for every accepted taxonlabel; and there should be a parsed taxonlabel for every datasource taxonlabel because the mappings populate it.
6060	11/08/2012 06:56 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: Removed join on specimenreplicate, because it isn't used in the analytical DB. Each specimen will still get an entry in analytical_*, because it gets its own location.
6059	11/08/2012 06:45 AM	Aaron Marcuse-Kubitza	README.TXT: Data import: Before starting import, added step to run `make inputs/upload` and reinstall newly-uploaded datasources
6058	11/08/2012 03:22 AM	Aaron Marcuse-Kubitza	README.TXT: Schema changes: Remember to update the following files with any renamings: Added mappings/verify.*.sql
6057	11/08/2012 02:21 AM	Aaron Marcuse-Kubitza	README.TXT: Data import: make_analytical_db: Documented how to view progress in log file
6056	11/08/2012 02:18 AM	Aaron Marcuse-Kubitza	make_analytical_db: Run all commands synchronously so the log file output doesn't become jumbled
6055	11/08/2012 02:16 AM	Aaron Marcuse-Kubitza	make_analytical_db: Fixed bug where log file needed to be appended to instead of overwritten
6054	11/08/2012 02:15 AM	Aaron Marcuse-Kubitza	make_analytical_db: Wrap each individual command in `set -x` to avoid echoing low-level commands such as sleep, wait
6053	11/08/2012 02:02 AM	Aaron Marcuse-Kubitza	make_analytical_db: Moved log file to inputs/analytical_db/logs/make_analytical_db.log.sql so it would be synced along with the other import logs
6052	11/08/2012 01:57 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/tnrs/tnrs.make: Output the time at which it's run, so this is included in the log file
6051	11/08/2012 01:53 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/tnrs/tnrs.make: Moved log file to logs/tnrs.make.log.sql so it would automatically be synced along with the other import logs
6050	11/08/2012 01:49 AM	Aaron Marcuse-Kubitza	make_analytical_db: Moved log file to inputs/analytical_db/logs/make_analytical_db.log.sql so it would be synced along with the other import logs
6049	11/08/2012 01:40 AM	Aaron Marcuse-Kubitza	inputs/Makefile: Import logs: $(rsyncLogs): Always download the TNRS daemon log, rather than requiring tnrs_log=1 to be specified to download it
6048	11/08/2012 01:37 AM	Aaron Marcuse-Kubitza	make_analytical_db: Output the time at which it's run, so this is included in the log file
6047	11/08/2012 01:36 AM	Aaron Marcuse-Kubitza	make_analytical_db: Store output in schemas/make_analytical_db.log
6046	11/08/2012 01:24 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Removed no longer used make_analytical_db(). Use bin/make_analytical_db instead.
6045	11/08/2012 01:23 AM	Aaron Marcuse-Kubitza	make_analytical_db: Use new psql_verbose_vegbien
6044	11/08/2012 01:22 AM	Aaron Marcuse-Kubitza	Added psql_verbose_vegbien
6043	11/08/2012 01:18 AM	Aaron Marcuse-Kubitza	make_analytical_db: Use psql_script_vegbien, which contains the necessary psq options, instead of setting those options manually
6042	11/08/2012 01:15 AM	Aaron Marcuse-Kubitza	make_analytical_db: Run the SQL commands directly with psql instead of using the make_analytical_db() function. This provides incremental results and avoids running all commands in one transaction, thus preventing pgAdmin from freezing when the user attempts to access a table used in analytical DB creation (because the TRUNCATE statement fully locks the table until the entire analytical DB is built).
6041	11/08/2012 12:46 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: make_analytical_db(): Added make_family_higher_plant_group()
6040	11/08/2012 12:17 AM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times. Fixed input row counts and import times to include derived data, such as TNRS and geoscrub, which adds to the import time and therefore should be considered in the import's speed. (TNRS was already being included in the import time for some, but not all, imports.)
6039	11/06/2012 05:02 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_*: Added higherPlantGroup
6038	11/06/2012 04:06 PM	Aaron Marcuse-Kubitza	pg_dump_vegbien: When not in data mode, use --inserts so the INSERT statements generated can be run directly in pgAdmin, they can be directly translated to MySQL, and an empty COPY FROM statement isn't generated for every table
6037	11/06/2012 04:05 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added family_higher_plant_group lookup table, generated with make_family_higher_plant_group() using the contents of higher_plant_group_nodes
6036	11/06/2012 03:40 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added family_higher_plant_group lookup table, generated with make_family_higher_plant_group() using the contents of higher_plant_group_nodes
6035	11/06/2012 03:23 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Always map taxonNameOrEpithet to taxonomicname, now that it's globally unique at all ranks in the datasource that provides it (NCBI)
6034	11/06/2012 03:18 PM	Aaron Marcuse-Kubitza	inputs/.NCBI/nodes/create.sql: Make name_txt completely globally unique by removing all duplicates, not just duplicate genera
6033	11/06/2012 03:17 PM	Aaron Marcuse-Kubitza	inputs/.NCBI/nodes/create.sql: Make name_txt (mostly) globally unique by removing several other kingdoms/superkingdoms, not just Animalia
6032	11/06/2012 02:56 PM	Aaron Marcuse-Kubitza	inputs/.NCBI/nodes/create.sql: Making genus globally unique: Moved comment with kingdom name to line with DELETE, and put "delete cascades to descendants" comment on its own line
6031	11/06/2012 01:57 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added higher_plant_group_nodes lookup table
6030	11/06/2012 01:57 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added higher_plant_group_nodes lookup table
6029	11/06/2012 01:49 PM	Aaron Marcuse-Kubitza	schemas/Makefile: DDLs: $(pg_dump): Turn off schema-only mode so that lookup table contents are included as well
6028	11/06/2012 01:47 PM	Aaron Marcuse-Kubitza	pg_dump_vegbien: Except in (raw) data mode, filter out pg_catalog.setval() statements (only lookup table contents should be preserved)
6027	11/06/2012 01:39 PM	Aaron Marcuse-Kubitza	lib/PostgreSQL-MySQL.csv: Remove SELECT pg_catalog.setval() statements
6026	11/06/2012 01:27 PM	Aaron Marcuse-Kubitza	lib/PostgreSQL-MySQL.csv: Remove COPY statements
6025	11/06/2012 01:18 PM	Aaron Marcuse-Kubitza	pg_dump_vegbien: $schema flag defaults to on to export only schema
6024	11/06/2012 01:08 PM	Aaron Marcuse-Kubitza	pg_dump_vegbien: Fixed bug where schema name var can't be named $schema because that would conflict with the $schema flag
6023	11/06/2012 01:03 PM	Aaron Marcuse-Kubitza	root Makefile: VegBIEN DB: DB and bien user: db: Run inputs/.TNRS/install before schemas/install because public.tnrs_input_name depends on TNRS.tnrs
6022	11/06/2012 12:33 PM	Aaron Marcuse-Kubitza	pg_dump_vegbien: Only include just the schema if new $schema flag is set. This causes constant lookup tables to automatically be exported for DB schemas, which was not fully supported by the $data flag because it also forced $owners to be enabled.
6021	11/06/2012 12:17 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added higher_plant_group enum
6020	11/06/2012 11:25 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added higherPlantGroup
6019	11/06/2012 10:40 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_*: Added isNewWorld
6018	11/06/2012 10:32 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added isNewWorld
6017	11/06/2012 10:21 AM	Aaron Marcuse-Kubitza	inputs/newWorld/: Added geoscrub.schema.~.changes.sql to add a unique constraint on countryNameStd and change isNewWorld's type to boolean
6016	11/06/2012 09:59 AM	Aaron Marcuse-Kubitza	Added inputs/newWorld/
6015	11/06/2012 09:55 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Moved Auto-create the map spreadsheets after Install the staging tables, because for DB-only tables, creating the map spreadsheets requires the header.csv generated by the staging tables install
6014	11/06/2012 09:48 AM	Aaron Marcuse-Kubitza	input.Makefile: SVN: add: Also set _MySQL/'s svn:ignore
6013	11/06/2012 09:30 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS/_MySQL/salvias_plots.*.sql.make: Use new my2pg_export
6012	11/06/2012 09:27 AM	Aaron Marcuse-Kubitza	inputs/.geoscrub/_MySQL/geoscrub.*.sql.make: Use new my2pg_export
6011	11/06/2012 09:25 AM	Aaron Marcuse-Kubitza	Added my2pg_export
6010	11/05/2012 11:16 PM	Aaron Marcuse-Kubitza	Regenerated vegbien.ERD exports
6009	11/05/2012 11:14 PM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: Added analytical_stem, analytical_aggregate to ERD
6008	11/05/2012 10:49 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_*: Added georeferenceValid, distanceToCountry_km, distanceToStateProvince_km
6007	11/05/2012 10:34 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added _m_to_km()
6006	11/05/2012 10:06 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_cleaned_unique/map.csv: Removed geoscrubID->locationID mapping because it allowed multiple occurrences of the same placename/coordinates combination, which was causing problems when datasources try to link up to the geoscrub places (and end up matching multiple output rows for each input row)
6005	11/05/2012 09:54 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Removed assertion that into's row count be at least full_in_table's row count, because now that DISTINCT ON is used to satisfy the into table pkey, this is no longer necessarily true
6004	11/05/2012 09:15 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: method: method_unique: Added reference_id to make method unique within the datasource instead of globally unique within VegBIEN
6003	11/05/2012 09:12 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added samplingProtocolID
6002	11/05/2012 08:48 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: subplot locationevent: Only populate parent locationevent's location unique IDs if a subplot #/subplotID is actually specified. (The lack of a location unique ID will cause the parent locationevent's location to be removed, as well as the parent locationevent itself if there is no parent locationevent unique ID.) This fixes a bug where top-level plots in datasources that provide a nullable subplot #/subplotID were incorrectly getting connected to parent locationevents.
6001	11/05/2012 08:34 PM	Aaron Marcuse-Kubitza	mappings/verify.plots.sql: Added # locations, # location events, which also include subplots
6000	11/05/2012 08:17 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/verify/plots.ref.sql: Standardized line wrapping to put ; on its own line in multi-line queries and at the end of the line in one-line queries
5999	11/05/2012 08:13 PM	Aaron Marcuse-Kubitza	mappings/verify.plots.sql: Added "top-level" to # locations, # location events names because these queries do not include all locations
5998	11/05/2012 07:52 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/verify/plots.ref.sql: # locations: Fixed count to include all plotMetadata entries instead of just unique coordinates, because each plotMetadata entry now gets its own location
5997	11/05/2012 07:34 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: matched place: When any of the accepted* names is specified, set matched_place_id=0 (self-reference) to indicate that the matched place is an accepted (scrubbed) place
5996	11/05/2012 05:30 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: Use the canonical (scrubbed) place instead of the datasource's place where available. Note that canon_place_id is always populated (with a self-reference if needed), to enable using a just a LEFT JOIN for this.
5995	11/05/2012 05:14 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: Use datasource_* prefix instead of reference_* prefix that tables had gotten when datasource_id was renamed to reference_id
5994	11/05/2012 05:04 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/verify/plots.ref: Regenerated on vegbiendev using the PostgreSQL 9.1 pg_catalog.default collation
5993	11/05/2012 04:49 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Setting pkeys of missing rows: Fixed bug in column-based import where when input rows match multiple output rows in one of this iteration's input tables, the into table's pkey constraint is violated because full_in_table contains multiple entries for an input pkey
5992	11/05/2012 04:42 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place: Added index on matched_place_id to facilitate looking up places by matched_place_id in place_set_canon_place_id()
5991	11/05/2012 04:26 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added geoscrub_input derived table with associated view. Build geoscrub_input as part of make_analytical_db().
5990	11/05/2012 03:56 PM	Aaron Marcuse-Kubitza	sql.py: Added table_has_pkey()
5989	11/05/2012 03:13 PM	Aaron Marcuse-Kubitza	sql.py: table_pkey_col(): For PostgreSQL DBs, use pg_catalog via index_cols() and table_pkey_index(), in order to use the search_path to look up the table. This fixes a bug where the pkey would be selected from information_schema.table_constraints in random order, and this order sometimes returned the corresponding table in the public schema but sometimes in other schemas, such as VegBank. This became a problem now that VegBIEN has a place table, which conflicts with VegBank's place table. (Most other VegBank tables that are mapped to have been renamed in VegBIEN.)
5988	11/05/2012 02:41 PM	Aaron Marcuse-Kubitza	sql.py: add_pkey_or_index(): Use new add_pkey_index()
5987	11/05/2012 02:41 PM	Aaron Marcuse-Kubitza	sql.py: Added add_pkey_index()
5986	11/05/2012 02:29 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times
5985	11/05/2012 02:15 PM	Aaron Marcuse-Kubitza	import_times: Use $'' quoting to expand tab, in order to also work on Mac OS X
5984	11/02/2012 06:14 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place: Added place_set_canon_place_id_on_() triggers, analogous to taxonlabel.taxonlabel_2_set_canon_label_id_on_()
5983	11/02/2012 06:12 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place: Added canon_place_id
5982	11/02/2012 06:04 PM	Aaron Marcuse-Kubitza	mappings/verify.plots.sql, inputs/SALVIAS/verify/plots.ref.sql: Renamed project column to projectname to avoid confusion with the project table
5981	11/02/2012 06:03 PM	Aaron Marcuse-Kubitza	input.Makefile: Verification of import: Fixed bug where needed to make .ref .PRECIOUS instead of verify/.ref
5980	11/02/2012 05:57 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/verify/plots.ref.sql: Use project_name instead of project_id everywhere, not just as the ID to list for each project, to match the corresponding VegBIEN queries
5979	11/02/2012 05:55 PM	Aaron Marcuse-Kubitza	input.Makefile: Verification of import: Make verify/%.ref .PRECIOUS because there must always be a .ref for the make rules to work
5978	11/02/2012 05:34 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/verify/plots.ref.sql: Use project_name instead of project_id as the ID to list for each project, since the SALVIAS projects table is now mapped to the DB and the project_name is used as project.projectname
5977	11/02/2012 05:18 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: subplots: Also complete the locationevent/location diamond (subplot event -> {subplot location, parent plot event} -> parent plot location) when an eventDate or range is specified, as this is also an identifying field for locationevent. This fixes a bug where subplots data without explicit plot events (such as SALVIAS and TEAM) was not being connected to the appropriate parent plot event as well as parent plot location. This should fix the SALVIAS verification # location events, which should include only parent plots' locationevents to correspond with # locations, which only includes parent plots' locations, and uses locationevent.parent_id being NULL to determine what is a parent plot event.
5976	11/02/2012 04:45 PM	Aaron Marcuse-Kubitza	input.Makefile: Verification of import: $(verify): Use $(inDatasrc) so that the full standard search_path, including any custom public schema, is made available to the VegBIEN-side verification script
5975	11/02/2012 04:33 PM	Aaron Marcuse-Kubitza	mappings/verify.plots.sql: # locations: Fixed bug where need to use location instead of locationevent
5974	11/02/2012 04:30 PM	Aaron Marcuse-Kubitza	mappings/verify.specimens.sql: Updated for current schema
5973	11/02/2012 04:26 PM	Aaron Marcuse-Kubitza	input.Makefile: Verification of import: verify/%.out: Made the target .PRECIOUS so that partial output will be saved in case of error to help debugging
5972	11/02/2012 04:13 PM	Aaron Marcuse-Kubitza	mappings/verify.plots.sql: Updated for current schema
5971	11/02/2012 04:01 PM	Aaron Marcuse-Kubitza	Updated validation/BIEN2_Analytical_DB_overview.docx with Brad's revision
5970	11/02/2012 03:58 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonlabel_set_canon_label_id(): Fixed bug where always need to set canon_label_id on insert (because it must be a self-reference rather than NULL if there is no matched_label_id), but wasn't being set when no matched_label_id because the IS DISTINCT FROM check returned false
5969	11/02/2012 03:42 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: decimalLatitude/Longitude->geoscrub input coordinates: Also set to NULL if 0 here, not just for the coordinates linked to the datasource's place instance
5968	11/02/2012 03:37 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Switched back to using run_query_into()'s add_pkey_ option now that it uses sql.add_pkey_or_index() instead of sql.add_pkey()
5967	11/02/2012 03:35 PM	Aaron Marcuse-Kubitza	sql.py: run_query_into(): add_pkey_: Use add_pkey_or_index() instead of add_pkey(). This should prevent errors for input rows which match multiple output rows in VegBIEN.
5966	11/02/2012 03:24 PM	Aaron Marcuse-Kubitza	README.TXT: Data import: After import, removed steps to reinstall tnrs_accepted, tnrs_other because this would overwrite the versions of these staging tables that were used to build the current version of the database. (The backup should store a snapshot of the database's input data as it was used to build the database.) Note that these tables will still be reinstalled when the next import starts.
5965	11/02/2012 02:54 PM	Aaron Marcuse-Kubitza	Added validation/BIEN2_Analytical_DB_overview.docx
5964	11/02/2012 02:54 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times

Project

General

Profile