Project

General

Profile

Statistics
| Revision:

# Date Author Comment
5823 10/26/2012 08:18 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): "could not create unique index" DuplicateKeyException: Fixed bug where can't use make_DuplicateKeyException() because it tries to retrieve information about the index in question, but the index it was trying to create doesn't exist

5822 10/26/2012 08:10 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Renamed datasource's taxonverbatim to datasource_taxonverbatim to distinguish it from the other taxonverbatims that are joined on (parsed_taxonverbatim, accepted_taxonverbatim)

5821 10/26/2012 07:18 PM Aaron Marcuse-Kubitza

inputs/.NCBI/nodes/create.sql: Make genus (mostly) globally unique by removing kingdom Animalia, which has significant genus overlap with plants. This reduces the number of duplicated genera from 578 to 65 (determined with `SELECT name_txt, count(), array_agg(rank) FROM "NCBI".nodes GROUP BY name_txt HAVING count() > 1 AND 'genus' = ALL (array_agg(rank))`).

5820 10/26/2012 07:08 PM Aaron Marcuse-Kubitza

inputs/.NCBI/nodes/create.sql: Added foreign key on parent tax_id with covering index

5819 10/26/2012 07:06 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Added %/uninstall, %/reinstall to allow reinstalling individual tables

5818 10/26/2012 06:00 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ensure_cond(): When adding the failed condition to the errors table, also include the original, untranslated condition from the DB schema in addition to the translation of the condition into the input schema

5817 10/26/2012 05:45 PM Aaron Marcuse-Kubitza

sql_io.py: track_data_error(): Fixed bug where errors whose column had no srcs (indicated by () ) were incorrectly being ignored. This affected NOT NULL exceptions where the column was not provided by the dataset.

5816 10/26/2012 05:38 PM Aaron Marcuse-Kubitza

sql_gen.py: If no cols had srcs, return [] instead of the [()] that itertools.product() would have returned

5815 10/26/2012 05:38 PM Aaron Marcuse-Kubitza

sql_io.py: track_data_error(): Support errors with no columns by inserting a single entry with column set to NULL

5814 10/26/2012 05:35 PM Aaron Marcuse-Kubitza

strings.py: Added join()

5813 10/26/2012 05:00 PM Aaron Marcuse-Kubitza

sql_io.py: mk_errors_table(): Made "column" column nullable, because some errors (such as check constraint violations) don't have any corresponding columns if its columns weren't provided in the input data

5812 10/26/2012 04:35 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/test_scrub: `make inputs/.TNRS/reinstall`: Use new $schema_only option so that an empty TNRS schema is installed rather than one containing inputs/.TNRS/data.sql

5811 10/26/2012 04:34 PM Aaron Marcuse-Kubitza

inputs/.TNRS/: Added data.sql containing the test_taxonomic_names TNRS results, so that a new installation of VegBIEN will contain the necessary data to make the tests pass, including the TNRS import test

5810 10/26/2012 04:32 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: If $schema_only option is set, only install .sql files ending in schema.sql

5809 10/26/2012 04:24 PM Aaron Marcuse-Kubitza

inputs/Makefile: $(rsyncLogs): Use $(rsync) instead of $(rsync*) now that it supports excluding just temp files and .svn rather than all .*

5808 10/26/2012 04:21 PM Aaron Marcuse-Kubitza

lib/common.Makefile: rsync: $(rsync): Exclude .svn, #, and .DS_Store rather than all . because dirs beginning with . created by the user (such as .NCBI, .TNRS) should be included in the sync

5807 10/26/2012 04:18 PM Aaron Marcuse-Kubitza

Added inputs/REMIB/Specimen.src/.map.csv.last_cleanup

5806 10/26/2012 04:10 PM Aaron Marcuse-Kubitza

Added inputs/bien_web/observation/+header.csv

5805 10/26/2012 04:09 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(dbExports): When putting schemas first, don't require a . before "schema" to allow the entire filename to be schema.sql

5804 10/26/2012 03:44 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/_scrub/public.test_taxonomic_names.sql, TNRS.sql: Regenerated with schema and mappings changes

5803 10/26/2012 03:42 PM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/map.csv: Added _nullIf filter to remove "Unknown" values for Accepted_name_family

5802 10/26/2012 03:35 PM Aaron Marcuse-Kubitza

README.TXT: Generate the local TNRS cache from the test_taxonomic_names rather than syncing it with the vegbiendev TNRS cache, so that the automated test's inserted row count stays the same regardless of the contents of the full-DB TNRS cache

5801 10/26/2012 03:34 PM Aaron Marcuse-Kubitza

README.TXT: Backups: Added TNRS cache section

5800 10/26/2012 03:12 PM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/test.xml.ref: Accepted inserted row count using TNRS cache created from test_taxonomic_names. Using a standard set of names for the test ensures that the inserted row count will not change when the full-DB TNRS cache changes.

5799 10/26/2012 02:48 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_accepted_names: Prepend the Accepted_name_family to the taxonomic name that will be submitted back to TNRS for parsing, because TNRS input names now always include the family when it's provided

5798 10/26/2012 02:46 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_accepted_names: Use simpler array_to_string() instead of || and COALESCE to put together the taxonomic name that will be submitted back to TNRS for parsing. Note that this requires defining an IMMUTABLE wrapper function for array_to_string(), because pg_catalog.array_to_string() is declared STABLE but indexes require functions to be IMMUTABLE (http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg156323.html).

5797 10/26/2012 02:42 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: Don't hardcode the schema name

5796 10/26/2012 02:40 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: sql/install: Provide the datasource's schema to the script in :schema, so it can refer to its own elements explicitly when it's not possible to rely on the search_path. This is the case for functions that have the same signature as (and are intended to replace) a pg_catalog function, because the pg_catalog function will be used in preference to the datasource function regardless of the search_path.

5795 10/26/2012 02:06 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(cleanup): If a cleanup.sql is provided, only run it and don't do default cleanup, to allow tables to override rather than just add to default cleanup operations. This prevents the automatic replacement of certain strings (sql_io.null_strs) with NULL on TNRS, and keeps the TNRS cache mostly as it was output by the TNRS service. Note that empty strings are still replaced with NULL by COPY FROM in sql_io.append_csv(). This is necessary for TNRS import to work properly, because although '' generally means NULL, it is not treated that way by PostgreSQL.

5794 10/26/2012 01:53 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Moved custom cleanup.sql cleanup operations to main $(cleanup) function, so custom cleanup operations would run whenever any target (such as %/install) invokes $(cleanup), not just manually through %/cleanup

5793 10/26/2012 01:52 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Moved custom cleanup.sql cleanup operations to main $(cleanup) function, so custom cleanup operations would run whenever any target (such as %/install) invokes $(cleanup), not just manually through %/cleanup

5792 10/25/2012 05:16 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: If first param's type is anyelement (for polymorphic function, which had mismatched arg types), use type text, as all types can cast to it

5791 10/25/2012 05:12 PM Aaron Marcuse-Kubitza

sql_io.py: cast(): Set the created function's value param type to anyelement to support any input type, not just text

5790 10/25/2012 04:53 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Only prepend the family to the concatenated scientificName for TNRS if it ends in -aceae (using _taxon_family_require_std()), to avoid sending unsupported, nonstandard families to TNRS which it will place in Unmatched_terms

5789 10/25/2012 04:48 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _taxon_family_require_std()

5788 10/25/2012 04:09 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Prepend the family to the concatenated scientificName input to TNRS, so that TNRS can use it to disambiguate the genus

5787 10/25/2012 03:50 PM Aaron Marcuse-Kubitza

tnrs_db: Making TNRS request: Fixed bug where needed to remove else block now that there is no except block

5786 10/25/2012 03:45 PM Aaron Marcuse-Kubitza

tnrs.py: retrieval_request_template: Turn on taxonomic_constraint (to match family before genus) and source_sorting (to always return any result from the first source before returning results from any other sources, regardless of match %)

5785 10/25/2012 12:50 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

5784 10/25/2012 12:46 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: speciesBinomial: Changed definition to genus+specificEpithet, not genus+species, to match the scientific meaning of specificEpithet vs. species

5783 10/25/2012 12:45 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonverbatim: Renamed species to specific_epithet to avoid confusion with the scientific meaning of species (genus+specificEpithet), since this field contains just the specific epithet

5782 10/25/2012 12:30 PM Aaron Marcuse-Kubitza

input.Makefile: Verification of import: verify: Use tables from the verify/*.ref files themselves rather than from the datasource's subdirs, in order to match the tables in mappings/verify.*.sql

5781 10/25/2012 11:52 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Added stemobservation.tag, stemobservation.height_m for use in plot change over time analysis <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Plot_change_over_time_analysis>

5780 10/25/2012 11:45 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed typo in scientificNameWithMorphospecies

5779 10/25/2012 11:41 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Renamed columns to VegCore names (https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/mappings/VegCore.csv)

5778 10/25/2012 11:36 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added cultivatedBasis

5777 10/25/2012 11:34 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added scientificNameWithMorphospecies

5776 10/25/2012 10:55 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added speciesBinomial

5775 10/25/2012 10:49 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Generate species by concatenating genus and specific epithet, since according to Brad this field is actually the binomial, not the specificEpithet

5774 10/25/2012 10:41 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed no longer used plot_change_over_time view. Use one of the queries at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Plot_change_over_time_analysis> instead.

5773 10/25/2012 10:36 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: location: Populate sourceaccessioncode with locationID + subplot when subplot is unique only within the parent plot, so that location always has a sourceaccessioncode to use as the plotCode in analytical_db_view

5772 10/25/2012 10:07 AM Aaron Marcuse-Kubitza

lib/PostgreSQL-MySQL.csv: Remove views because they can contain arbitrary expressions, whose syntax may not be compatible with MySQL

5771 10/25/2012 10:04 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Use location.sourceaccessioncode as plotCode instead of authorlocationcode because authorlocationcode isn't globally unique (for subplots, it's only unique within the parent plot)

5770 10/25/2012 09:48 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: plantobservation: Made taxonoccurrence_id optional when sourceaccessioncode is specified, so that aggregateoccurrence doesn't get pruned away in datasource tables that link just a stemobservation to a plantobservation (and therefore don't provide a taxonoccurrence to satisfy the previous taxonoccurrence_id NOT NULL constraint)

5769 10/25/2012 09:47 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: aggregateoccurrence: Made taxonoccurrence_id optional when sourceaccessioncode is specified, so that aggregateoccurrence doesn't get pruned away in datasource tables that link just a stemobservation to a plantobservation (and therefore don't provide a taxonoccurrence to satisfy the previous taxonoccurrence_id NOT NULL constraint)

5768 10/25/2012 09:42 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: aggregateoccurrence: Made taxonoccurrence_id optional when sourceaccessioncode is specified, so that aggregateoccurrence doesn't get pruned away in datasource tables that link just a stemobservation to a plantobservation (and therefore don't provide a taxonoccurrence to satisfy the previous taxonoccurrence_id NOT NULL constraint)

5767 10/25/2012 09:31 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonoccurrence: Added taxonoccurrence_required_key check constraint to ensure that all taxonoccurrences are properly identified, and empty taxonoccurrences are properly pruned. This fixes a bug where taxon-only and stem-only data did not properly prune the taxonoccurrence that would otherwise get created because it's included in the mappings.

5766 10/25/2012 07:51 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): insert_into_pkeys(): Use new sql.add_pkey_or_index() instead of sql.add_pkey() in order to just print a warning if for some reason there were duplicate entries for an input row in the iteration's pkeys table. This should provide a workaround for bugs (often in the schema itself, related to its unique indexes) that cause an input row to match multiple output rows when joining on the output table using the unique constraint's columns.

5765 10/25/2012 07:44 AM Aaron Marcuse-Kubitza

sql.py: Added add_pkey_or_index()

5764 10/25/2012 07:32 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): Parse "could not create unique index ... Key is duplicated" errors as DuplicateKeyException

5763 10/25/2012 07:27 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): DuplicateKeyException: Factored out creation of DuplicateKeyException into helper function

5762 10/25/2012 07:20 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

5761 10/24/2012 06:31 PM Aaron Marcuse-Kubitza

tnrs_db: Removed tnrs.InvalidResponse exception handler that retries the query because the current query does not track which names have been submitted to but not processed by TNRS, so the error would continue to happen repeatedly

5760 10/24/2012 06:13 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: location: Added index on parent_id to speed up plot change over time joins

5759 10/24/2012 05:45 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: location: Added index on creator_id to speed up analytical_db_view joins

5758 10/24/2012 05:15 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: stemobservation: Added index on plantobservation_id to speed up analytical_db_view joins

5757 10/23/2012 01:08 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added initial plot_change_over_time view

5756 10/23/2012 12:53 PM Aaron Marcuse-Kubitza

Added inputs/bien_web/

5755 10/23/2012 12:43 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Reordered taxonoccurrence.growthform to put if after the bien_web.observation fields

5754 10/23/2012 12:32 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Include taxonoccurrence.growthform

5753 10/23/2012 12:27 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Generate taxonMorphospecies by concatenating the scientificName to the morphospecies

5752 10/23/2012 12:23 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to take taxonomic name components from the accepted taxonlabel's taxonverbatim instead of the datasource's taxonverbatim, which does not contain the accepted name

5751 10/23/2012 12:19 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: identifiedBy: Added NULLIF to keep empty strings out of the analytical DB

5750 10/23/2012 12:13 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to take taxonomic name components from the accepted taxonlabel's taxonverbatim instead of the datasource's taxonverbatim, which does not contain the accepted name

5749 10/23/2012 12:10 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to take morphospecies from the parsed taxonlabel's taxonverbatim, where it has been parsed out, instead of the datasource's taxonverbatim, which has it as part of the verbatim input name

5748 10/23/2012 11:58 AM Aaron Marcuse-Kubitza

analytical_db_view: Added stemobservation.xposition_m, yposition_m

5747 10/23/2012 11:46 AM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/map.csv: Added new Time_submitted field

5746 10/23/2012 11:45 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/header.csv: Regenerated for new staging tables format

5745 10/23/2012 11:41 AM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/test.xml.ref: Accepted correct inserted row count, which most likely became detached from the primary row count when the TNRS cache was cleared and repopulated with test data

5744 10/23/2012 11:28 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Reordered joins in path order, putting datasource before location. This will enable more naturally reusing the SELECT query for other analyses.

5743 10/23/2012 11:15 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Do not include rank in the mapping because taxonomicname is globally unique, and thus it isn't used in looking up the NCBI taxonlabel

5742 10/23/2012 11:05 AM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/_scrub/public.test_taxonomic_names.sql, TNRS.sql: Regenerated with schema and mappings changes

5741 10/23/2012 10:49 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Also attach TNRS genus to NCBI backbone. This causes attachment to be made with as many of family and genus as are provided and have an entry in NCBI.

5740 10/23/2012 10:45 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: family -> NCBI backbone: Removed extra path after _if statement's cond/_exists

5739 10/23/2012 10:39 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Instead of connecting the acceptedFamily to the NCBI backbone, connect the family for the TNRS matched taxonlabel. This connects more families and also connects the same set of fields as will be connected for the genus.

5738 10/23/2012 10:01 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Fixed bug where needed to attach accepted family to NCBI using taxonomicname, which is globally unique, rather than taxonepithet, which is only unique within the parent taxon

5737 10/23/2012 09:34 AM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/: Added Time_submitted column at beginning and populate it in tnrs_db with the time the batch TNRS request was submitted

5736 10/23/2012 09:08 AM Aaron Marcuse-Kubitza

csvs.py: RowNumFilter: Use new ColInsertFilter

5735 10/23/2012 09:08 AM Aaron Marcuse-Kubitza

csvs.py: Added ColInsertFilter

5734 10/23/2012 08:43 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed no longer used _is_higher_taxon(). Use _has_taxonomic_name() or _taxonomic_name_is_epithet() instead.

5733 10/23/2012 08:42 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonName->taxonepithet: Use new _taxonomic_name_is_epithet() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field

5732 10/23/2012 08:36 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _taxonomic_name_is_epithet()

5731 10/23/2012 08:33 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonName->taxonomicname: Use new _has_taxonomic_name() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field

5730 10/23/2012 08:30 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonName->taxonomicname: Use new _has_taxonomic_name() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field

5729 10/23/2012 08:25 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _has_taxonomic_name() for lower taxon ranks that typically have a globally unique taxonomic name

5728 10/23/2012 08:10 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Removed unit conversion functions that take a text input, since casts to the parameter type (double precision) are now automatically performed by sql_io.put_table(), using sql.parse_exception()'s function MissingCastException parsing

5727 10/23/2012 08:01 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: _is_higher_taxon() calls: Default to true if the rank can't be parsed to a taxonrank enum value

5726 10/23/2012 07:56 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): is_function: Moved definition of wrapper function inside try block of main loop because the creation of the empty pkeys table (whose row type is needed for the wrapper function) can itself produce MissingCastExceptions, which must be thrown inside the loop in order to be handled properly

5725 10/23/2012 07:05 AM Aaron Marcuse-Kubitza

db_xml.py: put(): Indicate no parent_ids_loc using no_parent_ids_loc sentinel instead of None to support parent_ids_locs that are equal to None (e.g. if the parent node had an error). Always forward parent_ids_loc to children with fkeys to parent, even on error, because the parent table may not be required for the child tables to be valid, such as for taxonomic-data-only datasets that nevertheless have nodes for the non-taxonomic tables in their mappings.

5724 10/23/2012 06:38 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): types cannot be matched MissingCastException: Use the first type as the type to cast to instead of text