mappings/VegCore-VegBIEN.csv: family -> NCBI backbone: Removed extra path after _if statement's cond/_exists
mappings/VegCore-VegBIEN.csv: Instead of connecting the acceptedFamily to the NCBI backbone, connect the family for the TNRS matched taxonlabel. This connects more families and also connects the same set of fields as will be connected for the genus.
mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Fixed bug where needed to attach accepted family to NCBI using taxonomicname, which is globally unique, rather than taxonepithet, which is only unique within the parent taxon
inputs/.TNRS/tnrs/: Added Time_submitted column at beginning and populate it in tnrs_db with the time the batch TNRS request was submitted
csvs.py: RowNumFilter: Use new ColInsertFilter
csvs.py: Added ColInsertFilter
schemas/vegbien.sql: Removed no longer used _is_higher_taxon(). Use _has_taxonomic_name() or _taxonomic_name_is_epithet() instead.
mappings/VegCore-VegBIEN.csv: taxonName->taxonepithet: Use new _taxonomic_name_is_epithet() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field
schemas/vegbien.sql: Added _taxonomic_name_is_epithet()
mappings/VegCore-VegBIEN.csv: taxonName->taxonomicname: Use new _has_taxonomic_name() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field
schemas/vegbien.sql: Added _has_taxonomic_name() for lower taxon ranks that typically have a globally unique taxonomic name
schemas/functions.sql: Removed unit conversion functions that take a text input, since casts to the parameter type (double precision) are now automatically performed by sql_io.put_table(), using sql.parse_exception()'s function MissingCastException parsing
mappings/VegCore-VegBIEN.csv: _is_higher_taxon() calls: Default to true if the rank can't be parsed to a taxonrank enum value
sql_io.py: put_table(): is_function: Moved definition of wrapper function inside try block of main loop because the creation of the empty pkeys table (whose row type is needed for the wrapper function) can itself produce MissingCastExceptions, which must be thrown inside the loop in order to be handled properly
db_xml.py: put(): Indicate no parent_ids_loc using no_parent_ids_loc sentinel instead of None to support parent_ids_locs that are equal to None (e.g. if the parent node had an error). Always forward parent_ids_loc to children with fkeys to parent, even on error, because the parent table may not be required for the child tables to be valid, such as for taxonomic-data-only datasets that nevertheless have nodes for the non-taxonomic tables in their mappings.
sql.py: parse_exception(): types cannot be matched MissingCastException: Use the first type as the type to cast to instead of text
sql.py: parse_exception(): InvalidValueException: Fixed bug in regexp where can't use .*? before (?:...)? surrounding matched value, because it prevents the value from being matched now that it is optional
inputs/.NCBI/nodes/header.csv: Updated for new staging table format, which includes a row_num column in each joined table
inputs/.NCBI/nodes/create.sql: Updated for new src table names
xml_func.py: process(): Pass on_error through to sql_io.put(). This fixes a bug in row-based import where DB errors in the xml_func.process() phase would abort the entire import instead of being tracked and having the return value set to None.
sql_io.py: put(): Pass on_error through to put_table()
sql_io.py: put_table(): log_exc(): Return False if removing all rows and have callers break the main loop so that no further exception-handling code is processed before the main loop is exited
sql.py: parse_exception(): InvalidValueException: Also match exceptions which don't provide a specific value but just indicate that a value was invalid, such as PL/Python's "day is out of range for month"
db_xml.py: put(): Inserting children with fkeys to parent: Don't do this if this node had an error and sql_io.put_table() returned None as the generated pkey. This fixes a bug where a node with an error will still try to create children with fkeys to parent, but pass None as the fkey to parent, which the recursive put() call will then incorrectly treat as there being no field with an fkey to parent at all rather than a field whose value is NULL. This causes function overload resolution to be unable to find the intended function, because it is missing a parameter.
sql.py: parse_exception(): function MissingCastException: Return the actual type of the function's 1st param, using new function_param0_type(), rather than just text
sql.py: parse_exception(): function MissingCastException: Fixed bug where can't return the function name as the name of what was missing the cast, because this must be a column
sql.py: Added function_param0_type()
sql.py: parse_exception(): function MissingCastException: Only treat DoesNotExistException as a MissingCastException if the query that was run did not already include a cast, to avoid infinite exception-handling recursion
sql.py: parse_exception(): function MissingCastException: Fixed bug where determining whether the exception is a MissingCastException rather than a DoesNotExistException needs to check whether the function exists rather than whether it's the same in the exception message as in the query that was run. The exception message will of course copy the function name verbatim from the query, so there is no information in the exception message itself to indicate whether the DoesNotExistException was caused by a missing cast or by a nonexistent function.
sql.py: parse_exception(): function MissingCastException: Documented that the regexp match to extract the function name also checks that a function signature with param types was matched, indicating a function call rather than cast to regproc. This check will also help avoid infinite recursion when function MissingCastException parsing calls database structure introspection functions.
sql.py: parse_exception(): function MissingCastException: Don't match quotes around the function name because this particular exception (incorrect param type) does not include them. Casts to regproc, which also produce a DoesNotExistException, include the quotes but do not indicate a MissingCastException.
sql.py: parse_exception(): function MissingCastException: Fixed bug where the 1st param's type in the exception's function signature is not actually the type the argument needs to have, because this is just the argument's current type
sql.py: parse_exception(): typed_name_re: Also match identifiers without quotes, such as functions in "No function matches the given name and argument types" errors. This fixes a bug where DoesNotExistExceptions could not be parsed as MissingCastExceptions when applicable because the DoesNotExistException pattern would not even match.
inputs/.NCBI/: Renamed higher_taxa to nodes because it currently doesn't just contain the higher taxa
schemas/vegbien.sql: taxonlabel: taxonlabel_2_set_canon_label_id(): Only run if matched_label_id has actually changed, to avoid infinite recursion when updating canon_label_id on labels that resolve to this label when there are cycles in the data
inputs/.NCBI/: Renamed names, nodes to *.src so they wouldn't get an automatic row_num column and can be used in higher_taxa's join
inputs/NCU-NCSC/Specimen/+header.csv: Fixed bug where needed ! at beginning to indicate a header override file, which prevents the following row from being treated as data
units.py: MissingUnitsException: Fixed bug where quantity is a Quantity object, not a string, and thus needs to be converted to a string using strings.ustr()
inputs/FIA/Organism/test.xml.ref: Accepted new test output now that FIA table is sorted in the order of the original CSV after staging table reinstallation
inputs/VegBank/taxonobservation_/create.sql: Removed dropping of row_num column, which is no longer added on non-CSV tables
input.Makefile: Staging tables installation: %/install: Moved "table-scope src table's row_num col" comment outside of define block so it wouldn't be echoed to stdout even when the table is not a src table
Added inputs/NCU-NCSC/Specimen/+header.csv header override to remove empty, unnamed column at end
inputs/*/*/header.csv: Regenerated for new staging tables format (which now includes a row_num column on every CSV table), as part of reinstalling staging tables
inputs/VegBank/vegbank.~.clean_up.sql: Fixed bug where DROP VIEW statements needed IF EXISTS because CASCADEs on previous DROP VIEWs may have already dropped the view in question
input.Makefile: Staging tables installation: %/install: Fixed bug where a .src table's row_num column needed to have the table name prefixed (making it globally unique) to allow joining the table with other tables
input.Makefile: Staging tables installation: sql/install: Fixed bug where $(logInstall) needed to be called with arguments, so that either > or >> would be used before the install log's filename
tnrs.py: submission_request_template: Use just Tropicos as the name source, as Brad says "GCC is for only one family (Asteraceae)" and USDA's "taxonomy is of lower quality and sometimes conflicts with Tropicos"
sql.py: parse_exception(): function MissingCastException: Support functions with named parameters
sql.py: parse_exception(): function MissingCastException: Support function names enclosed in quotes on the context line
mappings/VegCore-VegBIEN.csv: taxonName: Place it in taxonomicname instead of taxonepithet for lower taxa, because the only datasource that currently provides this field (NCBI) actually provides the full taxonomicname instead of the epithet at the current rank for lower taxa. (taxonomicname is not applicable to higher taxa because their names are not guaranteed to be globally unique.) taxonName may need to be renamed and/or redefined to account for this ambiguity in NCBI's usage.
mappings/VegCore-VegBIEN.csv: Do not include the taxonName in the concatenated taxonomicname because it is NOT globally unique. The same name may be used at different taxonomic ranks and mean different things, and lower taxa may have the name appear in multiple genera or species, meaning different things.
schemas/vegbien.sql: Added _is_higher_taxon()
README.TXT: Documentation: To import and scrub just the test taxonomic names: Added `make inputs/.TNRS/cleanup` after `make backups/TNRS.backup/restore` because the PostgreSQL collation may differ between vegbiendev's and the user's DB
sql.py: parse_exception(): DoesNotExistException: If item not found was a function and not found only because of a missing cast, raise MissingCastException instead. This should allow automatic casts to be added on function parameters as well as table columns.
schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to join to taxonverbatim on taxonverbatim_id (the pkey) instead of taxonlabel_id, which used to be the pkey but is now an fkey
inputs/test_taxonomic_names/test_scrub: Remove any previous version of public.test_taxonomic_names before renaming public to it
inputs/test_taxonomic_names/test_scrub: Fixed bug where public.sql export did not include the "CREATE SCHEMA public" statement, because pg_dump doesn't add it to backups, by using new schemas/rename/% make target to first rename the public schema and then exporting it
root Makefile: VegBIEN DB: Schemas: schemas/rotate: Use new schemas/rename/%
root Makefile: VegBIEN DB: Schemas: Added schemas/rename/% to rename the public schema
mappings/VegCore-VegBIEN.csv: Removed filter preventing taxonomicStatus from being placed in taxonlabel if a morphospecies was provided, because the morphospecies actually never goes in the matched taxonlabel, only the verbatim taxonlabel
mappings/VegCore-VegBIEN.csv: morphospecies: Also place it in the verbatim (input name's) taxonlabel. Note that it does not go in the matched name's taxonlabel, because that contains only fields from the matched name. The verbatim taxonlabel is thus a synonym of the matched taxonlabel where there is no morphospecies, or a child of it if there is a morphospecies.
mappings/VegCore-VegBIEN.csv: Do not place taxonomicStatus in taxonlabel if a morphospecies was provided, to prevent it from being incorrectly marked as accepted
mappings/VegCore-VegBIEN.csv: morphospecies -> taxonverbatim.morphospecies: Fixed bug where needed suffix with _if statement then clause
inputs/test_taxonomic_names/_scrub/public.sql, TNRS.sql: Regenerated with schema changes
pg_dump_vegbien: Added opts env var to allow specifying options to a Makefile command, which does not take positional arguments
README.TXT: Schema changes: files to update with any renamings: Removed tnrs_db because that is now abstracted from the schema through the tnrs_input_name view. Note that PostgreSQL will automatically update tnrs_input_name with any table or column renames, which is the significant advantage of using a view rather than a hardcoded query.
schemas/vegbien.sql: tnrs_input_name: Use DISTINCT instead of DISTINCT ON because there is only one column
tnrs_db: Use new tnrs_input_name view to avoid hardcoding changing schema information
inputs/test_taxonomic_names/test_scrub, README.TXT: Documented that `make schemas/public/reinstall` must come after TNRS restore to recreate the tnrs_input_name view, which has a dependency on the TNRS schema
schemas/vegbien.sql: Added tnrs_input_name view for use by tnrs_db
schemas/vegbien.sql: taxonlabel, taxonverbatim: Updated comments for new taxonlabel/taxonverbatim split
schemas/vegbien.sql: taxonlabel_update_ancestors(): Use aliased types (http://www.postgresql.org/docs/8.3/static/plpgsql-declarations.html#PLPGSQL-DECLARATION-TYPE) where possible
schemas/vegbien.sql: taxonlabel_update_ancestors(): Adding new parent's ancestors: Change unique_violations to warnings so they don't abort the import. unique_violations should never happen unless there are cycles of two or more nodes, but they seem to be happening nevertheless, so this will provide a workaround for that problem.
inputs/import.stats.xls: Updated import times
Regenerated vegbien.ERD exports
tnrs_db: Updated with schema changes
schemas/vegbien.sql: taxonverbatim: Removed subclass relationship to taxonlabel in order to allow multiple taxonverbatims to point to the same taxonlabel. This involves adding a taxonverbatim_id serial column and pointing all fkeys to taxonverbatim to that column.
schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to join on taxonverbatim before joining on taxonlabel, now that taxondetermination is linked directly to taxonverbatim. Interestingly, PostgreSQL did not flag this error when the schema was changed, but only when the schema was reloaded from the DDL.
schemas/vegbien.ERD.mwb: Moved taxonlabel to the right of taxonverbatim to make room for taxonverbatim to expand
schemas/vegbien.sql: Link taxondetermination to taxonverbatim (which is a subclass of taxonlabel) instead of directly to taxonlabel. This will enable later having multiple taxonverbatims for one taxonlabel.
schemas/vegbien.sql: taxonlabel: Renamed identifyingtaxonomicname to taxonomicname because the taxonomicname provided by the datasource is now in taxonverbatim, so there is no name collision. Note that both of these fields store the same type of information, but taxonlabel's is autogenerated while taxonverbatim's is verbatim (and is only set if provided by the datasource).
schemas/vegbien.sql: taxonlabel: Moved non-scoping fields to new taxonverbatim subclass table, which contains the component parts of the taxonlabel
schemas/vegbien.sql: taxonlabel: Renamed taxonlabel_2_propagate_canon_label_id() to taxonlabel_2_set_canon_label_id() for clarity
schemas/vegbien.sql: taxonlabel_2_propagate_canon_label_id(): If no matched taxonlabel, make self-reference. This fixes a bug in analytical_db_view where rows without a canon_label_id were excluded because they did not have a corresponding canonical taxonlabel.
schemas/vegbien.sql: taxonlabel_unique unique index: Removed binomial, author, taxonomicname, and morphospecies because these are now part of the identifyingtaxonomicname, which is also in the unique index
schemas/vegbien.sql: taxonlabel: Require either an identifyingtaxonomicname or a taxonepithet. The NCBI inserted row count decreases by one because this prunes off a taxonlabel created for a parent node which was not contained in the first two rows (remember that NCBI taxa are not in dependency order, so parents are often imported after children).
mappings/VegCore-VegBIEN.csv: Also generate the identifyingtaxonomicname for the original* taxondetermination's taxonlabel
schemas/vegbien.sql: taxonlabel: Renamed taxonomicnamewithauthor to taxonomicname because it is equivalent to Darwin Core's scientificName
mappings/VegCore-VegBIEN.csv: Also include morphospecies in the identifyingtaxonomicname, except for the matched TNRS taxonlabel, which should not contain morphospecies information
mappings/VegCore-VegBIEN.csv: Mapped acceptedScientificName
mappings/VegCore-VegBIEN.csv: Also create the identifyingtaxonomicname on the verbatim taxonlabel supplied by the datasource, in addition to on the TNRS input taxonlabel that the verbatim taxonlabel is matched up with
mappings/VegCore-VegBIEN.csv: Expanded brace expressions for putting together the identifyingtaxonomicname
mappings/VegCore-VegBIEN.csv: Always generate the concatenated identifyingtaxonomicname, even for higher taxa, to ensure that this field is always populated. Note that this will cause names of higher taxa to be scrubbed by TNRS, but this is usually not a problem because such names either have no match or no close enough match based on the name only. Naming conventions generally cause names at different ranks to be different, so collisions with lower ranks should not be a problem.
tnrs_db: Fixed bug where needed to remove internal identifyingtaxonomicname duplicates as well as duplicates with existing Name_submitted values, to avoid violating the TNRS.tnrs pkey constraint when the scrubbed names are later inserted. Note that the taxonlabel_0_unique_identifying_name unique index is not sufficient to prevent internal duplicates, because it includes the creator_id (and thus allows multiple instances of the same name defined by different creators).
sql.py: mk_select(): Don't add table0 to order_by with no table, because this could cause it not to match a corresponding DISTINCT ON column with no explicit table. PostgreSQL apparently does not treat a column with no explicit table and a column with the applicable table as identical for purposes of ORDER BY/DISTINCT ON checking, even when they refer to the same physical column.
sql.py: mk_select(): order_by defaults to first distinct_on column when distinct_on provided