input.Makefile: Staging tables installation: If $schema_only option is set, only install .sql files ending in schema.sql
inputs/Makefile: $(rsyncLogs): Use $(rsync) instead of $(rsync*) now that it supports excluding just temp files and .svn rather than all .*
lib/common.Makefile: rsync: $(rsync): Exclude .svn, #, and .DS_Store rather than all .* because dirs beginning with . created by the user (such as .NCBI, .TNRS) should be included in the sync
Added inputs/REMIB/Specimen.src/.map.csv.last_cleanup
Added inputs/bien_web/observation/+header.csv
input.Makefile: Staging tables installation: $(dbExports): When putting schemas first, don't require a . before "schema" to allow the entire filename to be schema.sql
inputs/test_taxonomic_names/_scrub/public.test_taxonomic_names.sql, TNRS.sql: Regenerated with schema and mappings changes
inputs/.TNRS/tnrs/map.csv: Added _nullIf filter to remove "Unknown" values for Accepted_name_family
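
If _nullIf behaves like SQL's NULLIF(), the filter is equivalent to something like the following (illustrative only; table name taken from the map.csv path):

    -- map "Unknown" to NULL for Accepted_name_family
    SELECT NULLIF("Accepted_name_family", 'Unknown')
    FROM tnrs;
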
README.TXT: Generate the local TNRS cache from the test_taxonomic_names rather than syncing it with the vegbiendev TNRS cache, so that the automated test's inserted row count stays the same regardless of the contents of the full-DB TNRS cache
README.TXT: Backups: Added TNRS cache section
inputs/.TNRS/tnrs/test.xml.ref: Accepted inserted row count using TNRS cache created from test_taxonomic_names. Using a standard set of names for the test ensures that the inserted row count will not change when the full-DB TNRS cache changes.
inputs/.TNRS/schema.sql: tnrs_accepted_names: Prepend the Accepted_name_family to the taxonomic name that will be submitted back to TNRS for parsing, because TNRS input names now always include the family when it's provided
inputs/.TNRS/schema.sql: tnrs_accepted_names: Use simpler array_to_string() instead of || and COALESCE to put together the taxonomic name that will be submitted back to TNRS for parsing. Note that this requires defining an IMMUTABLE wrapper function for array_to_string(), because pg_catalog.array_to_string() is declared STABLE but indexes require functions to be IMMUTABLE (http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg156323.html).
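
A minimal sketch of such a wrapper, created in the datasource's schema (see also the sql/install entry below: a call must be schema-qualified for this wrapper to be chosen over pg_catalog's):

    -- IMMUTABLE wrapper so the expression can be used in an index;
    -- the 2-arg form also skips NULL elements, replacing ||/COALESCE
    CREATE FUNCTION array_to_string(anyarray, text) RETURNS text
      LANGUAGE sql IMMUTABLE
      AS $$ SELECT pg_catalog.array_to_string($1, $2) $$;
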
inputs/.TNRS/schema.sql: Don't hardcode the schema name
input.Makefile: Staging tables installation: sql/install: Provide the datasource's schema to the script in :schema, so it can refer to its own elements explicitly when it's not possible to rely on the search_path. This is the case for functions that have the same signature as (and are intended to replace) a pg_catalog function, because the pg_catalog function will be used in preference to the datasource function regardless of the search_path.
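
Presumably the script is run through psql with something like -v schema=<datasource>, so an install script can interpolate the variable; a hypothetical use:

    -- schema-qualify the datasource's own wrapper, since
    -- pg_catalog.array_to_string() would otherwise win
    -- regardless of the search_path
    SELECT :"schema".array_to_string(ARRAY['Fabaceae', 'Acacia'], ' ');
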
input.Makefile: Staging tables installation: $(cleanup): If a cleanup.sql is provided, only run it and don't do default cleanup, to allow tables to override rather than just add to default cleanup operations. This prevents the automatic replacement of certain strings (sql_io.null_strs) with NULL on TNRS, and keeps the TNRS cache mostly as it was output by the TNRS service. Note that empty strings are still replaced with NULL by COPY FROM in sql_io.append_csv(). This is necessary for TNRS import to work properly, because although '' generally means NULL, it is not treated that way by PostgreSQL.
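
The last point is easy to verify; PostgreSQL treats the empty string and NULL as distinct values:

    SELECT '' IS NULL;                    -- f: '' is a value, not NULL
    SELECT NULLIF(''::text, '') IS NULL;  -- t: NULLIF() does the replacement
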
input.Makefile: Staging tables installation: Moved custom cleanup.sql cleanup operations into the main $(cleanup) function, so that custom cleanup operations run whenever any target (such as %/install) invokes $(cleanup), not just when invoked manually through %/cleanup
sql.py: parse_exception(): function MissingCastException: If first param's type is anyelement (for polymorphic function, which had mismatched arg types), use type text, as all types can cast to it
sql_io.py: cast(): Set the created function's value param type to anyelement to support any input type, not just text
mappings/VegCore-VegBIEN.csv: Only prepend the family to the concatenated scientificName for TNRS if it ends in -aceae (using _taxon_family_require_std()), to avoid sending unsupported, nonstandard families to TNRS which it will place in Unmatched_terms
schemas/vegbien.sql: Added _taxon_family_require_std()
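
A minimal sketch of _taxon_family_require_std(), assuming it simply passes through families with the standard -aceae suffix and returns NULL otherwise:

    CREATE FUNCTION _taxon_family_require_std(family text) RETURNS text
      LANGUAGE sql IMMUTABLE
      AS $$ SELECT CASE WHEN lower($1) LIKE '%aceae' THEN $1 END $$;
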
mappings/VegCore-VegBIEN.csv: Prepend the family to the concatenated scientificName input to TNRS, so that TNRS can use it to disambiguate the genus
tnrs_db: Making TNRS request: Fixed bug where needed to remove else block now that there is no except block
tnrs.py: retrieval_request_template: Turn on taxonomic_constraint (to match family before genus) and source_sorting (to always return any result from the first source before returning results from any other sources, regardless of match %)
Regenerated vegbien.ERD exports
mappings/VegCore.csv: speciesBinomial: Changed definition to genus+specificEpithet, not genus+species, to match the scientific meaning of specificEpithet vs. species
schemas/vegbien.sql: taxonverbatim: Renamed species to specific_epithet to avoid confusion with the scientific meaning of species (genus+specificEpithet), since this field contains just the specific epithet
input.Makefile: Verification of import: verify: Use tables from the verify/*.ref files themselves rather than from the datasource's subdirs, in order to match the tables in mappings/verify.*.sql
schemas/vegbien.sql: analytical_db_view: Added stemobservation.tag, stemobservation.height_m for use in plot change over time analysis <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Plot_change_over_time_analysis>
schemas/vegbien.sql: analytical_db_view: Fixed typo in scientificNameWithMorphospecies
schemas/vegbien.sql: analytical_db_view: Renamed columns to VegCore names (https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/mappings/VegCore.csv)
mappings/VegCore.csv: Added cultivatedBasis
mappings/VegCore.csv: Added scientificNameWithMorphospecies
mappings/VegCore.csv: Added speciesBinomial
schemas/vegbien.sql: analytical_db_view: Generate species by concatenating genus and specific epithet, since according to Brad this field is actually the binomial, not the specificEpithet
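
Illustratively, using the specific_epithet column name from the taxonverbatim rename above and the NULL-skipping 2-arg array_to_string():

    -- sketch: species as the genus + specificEpithet binomial
    SELECT array_to_string(ARRAY[genus, specific_epithet], ' ') AS species
    FROM taxonverbatim;
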
schemas/vegbien.sql: Removed no longer used plot_change_over_time view. Use one of the queries at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Plot_change_over_time_analysis> instead.
mappings/VegCore-VegBIEN.csv: location: Populate sourceaccessioncode with locationID + subplot when subplot is unique only within the parent plot, so that location always has a sourceaccessioncode to use as the plotCode in analytical_db_view
lib/PostgreSQL-MySQL.csv: Remove views because they can contain arbitrary expressions, whose syntax may not be compatible with MySQL
schemas/vegbien.sql: analytical_db_view: Use location.sourceaccessioncode as plotCode instead of authorlocationcode because authorlocationcode isn't globally unique (for subplots, it's only unique within the parent plot)
schemas/vegbien.sql: plantobservation: Made aggregateoccurrence_id optional when sourceaccessioncode is specified, so that plantobservation doesn't get pruned away in datasource tables that link just a stemobservation to a plantobservation (and therefore don't provide an aggregateoccurrence to satisfy the previous aggregateoccurrence_id NOT NULL constraint)
schemas/vegbien.sql: aggregateoccurrence: Made taxonoccurrence_id optional when sourceaccessioncode is specified, so that aggregateoccurrence doesn't get pruned away in datasource tables that link just a stemobservation to a plantobservation (and therefore don't provide a taxonoccurrence to satisfy the previous taxonoccurrence_id NOT NULL constraint)
schemas/vegbien.sql: taxonoccurrence: Added taxonoccurrence_required_key check constraint to ensure that all taxonoccurrences are properly identified, and empty taxonoccurrences are properly pruned. This fixes a bug where taxon-only and stem-only data did not properly prune the taxonoccurrence that would otherwise get created because it's included in the mappings.
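
A hypothetical shape for this constraint (the actual set of identifying columns is an assumption):

    -- require every taxonoccurrence to have at least one identifying key,
    -- so empty stub rows can be detected and pruned
    ALTER TABLE taxonoccurrence
      ADD CONSTRAINT taxonoccurrence_required_key
      CHECK (sourceaccessioncode IS NOT NULL OR authortaxoncode IS NOT NULL);
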
sql_io.py: put_table(): insert_into_pkeys(): Use new sql.add_pkey_or_index() instead of sql.add_pkey() in order to just print a warning if for some reason there were duplicate entries for an input row in the iteration's pkeys table. This should provide a workaround for bugs (often in the schema itself, related to its unique indexes) that cause an input row to match multiple output rows when joining on the output table using the unique constraint's columns.
sql.py: Added add_pkey_or_index()
sql.py: parse_exception(): Parse "could not create unique index ... Key is duplicated" errors as DuplicateKeyException
sql.py: parse_exception(): DuplicateKeyException: Factored out creation of DuplicateKeyException into helper function
inputs/import.stats.xls: Updated import times
tnrs_db: Removed tnrs.InvalidResponse exception handler that retries the query, because the current code does not track which names have been submitted to but not yet processed by TNRS, so the error would continue to happen repeatedly
schemas/vegbien.sql: location: Added index on parent_id to speed up plot change over time joins
schemas/vegbien.sql: location: Added index on creator_id to speed up analytical_db_view joins
schemas/vegbien.sql: stemobservation: Added index on plantobservation_id to speed up analytical_db_view joins
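
The three indexes above are plain single-column indexes; sketches with assumed names:

    CREATE INDEX location_parent_id ON location (parent_id);
    CREATE INDEX location_creator_id ON location (creator_id);
    CREATE INDEX stemobservation_plantobservation_id
      ON stemobservation (plantobservation_id);
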
schemas/vegbien.sql: Added initial plot_change_over_time view
Added inputs/bien_web/
schemas/vegbien.sql: analytical_db_view: Reordered taxonoccurrence.growthform to put it after the bien_web.observation fields
schemas/vegbien.sql: analytical_db_view: Include taxonoccurrence.growthform
schemas/vegbien.sql: analytical_db_view: Generate taxonMorphospecies by concatenating the scientificName to the morphospecies
schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to take taxonomic name components from the accepted taxonlabel's taxonverbatim instead of the datasource's taxonverbatim, which does not contain the accepted name
schemas/vegbien.sql: analytical_db_view: identifiedBy: Added NULLIF to keep empty strings out of the analytical DB
schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to take morphospecies from the parsed taxonlabel's taxonverbatim, where it has been parsed out, instead of the datasource's taxonverbatim, which has it as part of the verbatim input name
schemas/vegbien.sql: analytical_db_view: Added stemobservation.xposition_m, yposition_m
inputs/.TNRS/tnrs/map.csv: Added new Time_submitted field
inputs/REMIB/Specimen/header.csv: Regenerated for new staging tables format
inputs/.TNRS/tnrs/test.xml.ref: Accepted correct inserted row count, which most likely became detached from the primary row count when the TNRS cache was cleared and repopulated with test data
schemas/vegbien.sql: analytical_db_view: Reordered joins in path order, putting datasource before location. This will enable more naturally reusing the SELECT query for other analyses.
mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Do not include rank in the mapping, because taxonomicname is globally unique and thus rank isn't needed to look up the NCBI taxonlabel
mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Also attach TNRS genus to NCBI backbone. This causes attachment to be made with as many of family and genus as are provided and have an entry in NCBI.
mappings/VegCore-VegBIEN.csv: family -> NCBI backbone: Removed extra path after _if statement's cond/_exists
mappings/VegCore-VegBIEN.csv: Instead of connecting the acceptedFamily to the NCBI backbone, connect the family for the TNRS matched taxonlabel. This connects more families and also connects the same set of fields as will be connected for the genus.
mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Fixed bug where needed to attach accepted family to NCBI using taxonomicname, which is globally unique, rather than taxonepithet, which is only unique within the parent taxon
inputs/.TNRS/tnrs/: Added Time_submitted column at beginning and populate it in tnrs_db with the time the batch TNRS request was submitted
csvs.py: RowNumFilter: Use new ColInsertFilter
csvs.py: Added ColInsertFilter
schemas/vegbien.sql: Removed no longer used _is_higher_taxon(). Use _has_taxonomic_name() or _taxonomic_name_is_epithet() instead.
mappings/VegCore-VegBIEN.csv: taxonName->taxonepithet: Use new _taxonomic_name_is_epithet() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field
schemas/vegbien.sql: Added _taxonomic_name_is_epithet()
mappings/VegCore-VegBIEN.csv: taxonName->taxonomicname: Use new _has_taxonomic_name() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field
schemas/vegbien.sql: Added _has_taxonomic_name() for lower taxon ranks that typically have a globally unique taxonomic name
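
Minimal sketches of both predicates, assuming taxonrank is an enum declared from higher to lower ranks, so that enum comparison follows rank order:

    -- genus and below typically have globally unique taxonomic names
    CREATE FUNCTION _has_taxonomic_name(rank taxonrank) RETURNS boolean
      LANGUAGE sql IMMUTABLE
      AS $$ SELECT $1 >= 'genus'::taxonrank $$;

    -- below genus, the epithet is only unique within the parent taxon
    CREATE FUNCTION _taxonomic_name_is_epithet(rank taxonrank) RETURNS boolean
      LANGUAGE sql IMMUTABLE
      AS $$ SELECT $1 > 'genus'::taxonrank $$;
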
schemas/functions.sql: Removed unit conversion functions that take a text input, since casts to the parameter type (double precision) are now automatically performed by sql_io.put_table(), using sql.parse_exception()'s function MissingCastException parsing
mappings/VegCore-VegBIEN.csv: _is_higher_taxon() calls: Default to true if the rank can't be parsed to a taxonrank enum value
sql_io.py: put_table(): is_function: Moved definition of wrapper function inside try block of main loop because the creation of the empty pkeys table (whose row type is needed for the wrapper function) can itself produce MissingCastExceptions, which must be thrown inside the loop in order to be handled properly
db_xml.py: put(): Indicate no parent_ids_loc using no_parent_ids_loc sentinel instead of None to support parent_ids_locs that are equal to None (e.g. if the parent node had an error). Always forward parent_ids_loc to children with fkeys to parent, even on error, because the parent table may not be required for the child tables to be valid, such as for taxonomic-data-only datasets that nevertheless have nodes for the non-taxonomic tables in their mappings.
sql.py: parse_exception(): types cannot be matched MissingCastException: Use the first type as the type to cast to instead of text
sql.py: parse_exception(): InvalidValueException: Fixed bug in regexp where can't use .*? before (?:...)? surrounding matched value, because it prevents the value from being matched now that it is optional
inputs/.NCBI/nodes/header.csv: Updated for new staging table format, which includes a row_num column in each joined table
inputs/.NCBI/nodes/create.sql: Updated for new src table names
xml_func.py: process(): Pass on_error through to sql_io.put(). This fixes a bug in row-based import where DB errors in the xml_func.process() phase would abort the entire import instead of being tracked and having the return value set to None.
sql_io.py: put(): Pass on_error through to put_table()
sql_io.py: put_table(): log_exc(): Return False if removing all rows and have callers break the main loop so that no further exception-handling code is processed before the main loop is exited
sql.py: parse_exception(): InvalidValueException: Also match exceptions which don't provide a specific value but just indicate that a value was invalid, such as PL/Python's "day is out of range for month"
db_xml.py: put(): Inserting children with fkeys to parent: Don't do this if this node had an error and sql_io.put_table() returned None as the generated pkey. This fixes a bug where a node with an error will still try to create children with fkeys to parent, but pass None as the fkey to parent, which the recursive put() call will then incorrectly treat as there being no field with an fkey to parent at all rather than a field whose value is NULL. This causes function overload resolution to be unable to find the intended function, because it is missing a parameter.
sql.py: parse_exception(): function MissingCastException: Return the actual type of the function's 1st param, using new function_param0_type(), rather than just text
sql.py: parse_exception(): function MissingCastException: Fixed bug where can't return the function name as the name of what was missing the cast, because this must be a column
sql.py: Added function_param0_type()
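
function_param0_type() presumably reads the system catalogs; a sketch of the underlying query (lookup by bare name shown; the real lookup may be more precise):

    -- type name of a function's first parameter
    SELECT format_type(proargtypes[0], NULL)
    FROM pg_proc
    WHERE proname = 'my_function';  -- placeholder function name
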
sql.py: parse_exception(): function MissingCastException: Only treat DoesNotExistException as a MissingCastException if the query that was run did not already include a cast, to avoid infinite exception-handling recursion
sql.py: parse_exception(): function MissingCastException: Fixed bug where determining whether the exception is a MissingCastException rather than a DoesNotExistException needs to check whether the function exists rather than whether it's the same in the exception message as in the query that was run. The exception message will of course copy the function name verbatim from the query, so there is no information in the exception message itself to indicate whether the DoesNotExistException was caused by a missing cast or by a nonexistent function.