# Date Author Comment
5737 10/23/2012 09:34 AM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/: Added Time_submitted column at beginning and populate it in tnrs_db with the time the batch TNRS request was submitted

5736 10/23/2012 09:08 AM Aaron Marcuse-Kubitza

csvs.py: RowNumFilter: Use new ColInsertFilter

5735 10/23/2012 09:08 AM Aaron Marcuse-Kubitza

csvs.py: Added ColInsertFilter

5734 10/23/2012 08:43 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed no longer used _is_higher_taxon(). Use _has_taxonomic_name() or _taxonomic_name_is_epithet() instead.

5733 10/23/2012 08:42 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonName->taxonepithet: Use new _taxonomic_name_is_epithet() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field

5732 10/23/2012 08:36 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _taxonomic_name_is_epithet()

5731 10/23/2012 08:33 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonName->taxonomicname: Use new _has_taxonomic_name() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field

5730 10/23/2012 08:30 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonName->taxonomicname: Use new _has_taxonomic_name() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field

5729 10/23/2012 08:25 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _has_taxonomic_name() for lower taxon ranks that typically have a globally unique taxonomic name

5728 10/23/2012 08:10 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Removed unit conversion functions that take a text input, since casts to the parameter type (double precision) are now automatically performed by sql_io.put_table(), using sql.parse_exception()'s function MissingCastException parsing

5727 10/23/2012 08:01 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: _is_higher_taxon() calls: Default to true if the rank can't be parsed to a taxonrank enum value

5726 10/23/2012 07:56 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): is_function: Moved definition of wrapper function inside try block of main loop because the creation of the empty pkeys table (whose row type is needed for the wrapper function) can itself produce MissingCastExceptions, which must be thrown inside the loop in order to be handled properly

5725 10/23/2012 07:05 AM Aaron Marcuse-Kubitza

db_xml.py: put(): Indicate no parent_ids_loc using no_parent_ids_loc sentinel instead of None to support parent_ids_locs that are equal to None (e.g. if the parent node had an error). Always forward parent_ids_loc to children with fkeys to parent, even on error, because the parent table may not be required for the child tables to be valid, such as for taxonomic-data-only datasets that nevertheless have nodes for the non-taxonomic tables in their mappings.
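
The sentinel approach described in this entry can be sketched roughly as follows. This is an illustration only, not the code in db_xml.py: the function describe_parent() and its return strings are hypothetical, and only the name no_parent_ids_loc comes from the log entry. The point is that a dedicated sentinel object lets None remain a legitimate value meaning "the parent exists but its ID is NULL".

```python
# Hypothetical sketch; only the name no_parent_ids_loc appears in the log entry.
no_parent_ids_loc = object()  # unique marker that can never equal a real value


def describe_parent(parent_ids_loc=no_parent_ids_loc):
    """Report how a child node should treat its fkey-to-parent field."""
    if parent_ids_loc is no_parent_ids_loc:
        # The caller passed nothing: there is no fkey-to-parent field at all.
        return 'no fkey to parent'
    if parent_ids_loc is None:
        # The parent had an error: the field exists, but its value is NULL.
        return 'fkey to parent is NULL'
    return 'fkey to parent from %r' % (parent_ids_loc,)
```

Comparing with `is` rather than `==` is what makes the sentinel safe, since no other object can be identical to it.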

5724 10/23/2012 06:38 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): types cannot be matched MissingCastException: Use the first type as the type to cast to instead of text

5723 10/23/2012 05:59 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): InvalidValueException: Fixed bug in regexp where can't use .*? before (?:...)? surrounding matched value, because it prevents the value from being matched now that it is optional
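
The regexp pitfall in this entry can be reproduced with a small example. The patterns below are assumptions made for illustration and are not the ones used in sql.py; the message text follows PostgreSQL's usual "invalid input syntax" wording. A lazy `.*?` placed directly before an optional group lets the engine succeed by matching nothing for both, so the value is never captured.

```python
import re

msg = 'invalid input syntax for type date: "2012-13-45"'

# Buggy shape: the lazy .*? matches the empty string, and the optional group
# is then allowed to match nothing, so the value is silently skipped.
buggy = re.compile(r'invalid input syntax.*?(?:: "(.+?)")?')

# One possible fix: replace .*? with an explicit pattern for the text that
# precedes the value, so the optional group is tried at the right position.
fixed = re.compile(r'invalid input syntax for type \w+(?:: "(.+?)")?')

buggy_value = buggy.match(msg).group(1)
fixed_value = fixed.match(msg).group(1)
# The fixed pattern still matches messages that carry no value.
no_value = fixed.match('invalid input syntax for type date').group(1)
```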

5722 10/23/2012 05:52 AM Aaron Marcuse-Kubitza

inputs/.NCBI/nodes/header.csv: Updated for new staging table format, which includes a row_num column in each joined table

5721 10/23/2012 05:51 AM Aaron Marcuse-Kubitza

inputs/.NCBI/nodes/create.sql: Updated for new src table names

5720 10/23/2012 05:36 AM Aaron Marcuse-Kubitza

xml_func.py: process(): Pass on_error through to sql_io.put(). This fixes a bug in row-based import where DB errors in the xml_func.process() phase would abort the entire import instead of being tracked and having the return value set to None.

5719 10/23/2012 05:33 AM Aaron Marcuse-Kubitza

sql_io.py: put(): Pass on_error through to put_table()

5718 10/23/2012 05:19 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): log_exc(): Return False if removing all rows and have callers break the main loop so that no further exception-handling code is processed before the main loop is exited

5717 10/23/2012 05:17 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): InvalidValueException: Also match exceptions which don't provide a specific value but just indicate that a value was invalid, such as PL/Python's "day is out of range for month"

5716 10/23/2012 04:39 AM Aaron Marcuse-Kubitza

db_xml.py: put(): Inserting children with fkeys to parent: Don't do this if this node had an error and sql_io.put_table() returned None as the generated pkey. This fixes a bug where a node with an error will still try to create children with fkeys to parent, but pass None as the fkey to parent, which the recursive put() call will then incorrectly treat as there being no field with an fkey to parent at all rather than a field whose value is NULL. This causes function overload resolution to be unable to find the intended function, because it is missing a parameter.

5715 10/23/2012 04:34 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Return the actual type of the function's 1st param, using new function_param0_type(), rather than just text

5714 10/23/2012 04:31 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Fixed bug where can't return the function name as the name of what was missing the cast, because this must be a column

5713 10/23/2012 04:28 AM Aaron Marcuse-Kubitza

sql.py: Added function_param0_type()

5712 10/23/2012 04:26 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Only treat DoesNotExistException as a MissingCastException if the query that was run did not already include a cast, to avoid infinite exception-handling recursion

5711 10/23/2012 04:24 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Fixed bug where determining whether the exception is a MissingCastException rather than a DoesNotExistException needs to check whether the function exists rather than whether it's the same in the exception message as in the query that was run. The exception message will of course copy the function name verbatim from the query, so there is no information in the exception message itself to indicate whether the DoesNotExistException was caused by a missing cast or by a nonexistent function.

5710 10/23/2012 04:19 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Documented that the regexp match to extract the function name also checks that a function signature with param types was matched, indicating a function call rather than cast to regproc. This check will also help avoid infinite recursion when function MissingCastException parsing calls database structure introspection functions.

5709 10/23/2012 04:15 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Don't match quotes around the function name because this particular exception (incorrect param type) does not include them. Casts to regproc, which also produce a DoesNotExistException, include the quotes but do not indicate a MissingCastException.

5708 10/23/2012 04:12 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Fixed bug where the 1st param's type in the exception's function signature is not actually the type the argument needs to have, because this is just the argument's current type

5707 10/23/2012 04:04 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): typed_name_re: Also match identifiers without quotes, such as functions in "No function matches the given name and argument types" errors. This fixes a bug where DoesNotExistExceptions could not be parsed as MissingCastExceptions when applicable because the DoesNotExistException pattern would not even match.

5706 10/23/2012 03:57 AM Aaron Marcuse-Kubitza

inputs/.NCBI/: Renamed higher_taxa to nodes because it currently doesn't just contain the higher taxa

5705 10/23/2012 01:24 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel: taxonlabel_2_set_canon_label_id(): Only run if matched_label_id has actually changed, to avoid infinite recursion when updating canon_label_id on labels that resolve to this label when there are cycles in the data

5704 10/23/2012 01:21 AM Aaron Marcuse-Kubitza

inputs/.NCBI/: Renamed higher_taxa to nodes because it currently doesn't just contain the higher taxa

5703 10/23/2012 12:57 AM Aaron Marcuse-Kubitza

inputs/.NCBI/: Renamed higher_taxa to nodes because it currently doesn't just contain the higher taxa

5702 10/23/2012 12:49 AM Aaron Marcuse-Kubitza

inputs/.NCBI/: Renamed names, nodes to *.src so they wouldn't get an automatic row_num column and can be used in higher_taxa's join

5701 10/23/2012 12:38 AM Aaron Marcuse-Kubitza

inputs/NCU-NCSC/Specimen/+header.csv: Fixed bug where needed ! at beginning to indicate a header override file, which prevents the following row from being treated as data

5700 10/23/2012 12:36 AM Aaron Marcuse-Kubitza

units.py: MissingUnitsException: Fixed bug where quantity is a Quantity object, not a string, and thus needs to be converted to a string using strings.ustr()

5699 10/23/2012 12:25 AM Aaron Marcuse-Kubitza

inputs/FIA/Organism/test.xml.ref: Accepted new test output now that FIA table is sorted in the order of the original CSV after staging table reinstallation

5698 10/23/2012 12:24 AM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/create.sql: Removed dropping of row_num column, which is no longer added on non-CSV tables

5697 10/23/2012 12:22 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: %/install: Moved "table-scope src table's row_num col" comment outside of define block so it wouldn't be echoed to stdout even when the table is not a src table

5696 10/23/2012 12:17 AM Aaron Marcuse-Kubitza

Added inputs/NCU-NCSC/Specimen/+header.csv header override to remove empty, unnamed column at end

5695 10/23/2012 12:05 AM Aaron Marcuse-Kubitza

inputs/*/*/header.csv: Regenerated for new staging tables format (which now includes a row_num column on every CSV table), as part of reinstalling staging tables

5694 10/22/2012 11:59 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: Fixed bug where DROP VIEW statements needed IF EXISTS because CASCADEs on previous DROP VIEWs may have already dropped the view in question

5693 10/22/2012 11:57 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: %/install: Fixed bug where a .src table's row_num column needed to have the table name prefixed (making it globally unique) to allow joining the table with other tables

5692 10/22/2012 11:31 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: sql/install: Fixed bug where $(logInstall) needed to be called with arguments, so that either > or >> would be used before the install log's filename

5691 10/22/2012 08:22 PM Aaron Marcuse-Kubitza

tnrs.py: submission_request_template: Use just Tropicos as the name source, as Brad says "GCC is for only one family (Asteraceae)" and USDA's "taxonomy is of lower quality and sometimes conflicts with Tropicos"

5690 10/19/2012 06:20 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Support functions with named parameters

5689 10/19/2012 06:18 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Support function names enclosed in quotes on the context line

5688 10/19/2012 06:15 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonName: Place it in taxonomicname instead of taxonepithet for lower taxa, because the only datasource that currently provides this field (NCBI) actually provides the full taxonomicname instead of the epithet at the current rank for lower taxa. (taxonomicname is not applicable to higher taxa because their names are not guaranteed to be globally unique.) taxonName may need to be renamed and/or redefined to account for this ambiguity in NCBI's usage.

5687 10/19/2012 06:14 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Do not include the taxonName in the concatenated taxonomicname because it is NOT globally unique. The same name may be used at different taxonomic ranks and mean different things, and lower taxa may have the name appear in multiple genera or species, meaning different things.

5686 10/19/2012 06:04 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Do not include the taxonName in the concatenated taxonomicname because it is NOT globally unique. The same name may be used at different taxonomic ranks and mean different things, and lower taxa may have the name appear in multiple genera or species, meaning different things.

5685 10/19/2012 05:57 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _is_higher_taxon()

5684 10/19/2012 05:52 PM Aaron Marcuse-Kubitza

README.TXT: Documentation: To import and scrub just the test taxonomic names: Added `make inputs/.TNRS/cleanup` after `make backups/TNRS.backup/restore` because the PostgreSQL collation may differ between vegbiendev's and the user's DB

5683 10/19/2012 05:50 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): DoesNotExistException: If item not found was a function and not found only because of a missing cast, raise MissingCastException instead. This should allow automatic casts to be added on function parameters as well as table columns.

5682 10/19/2012 05:28 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to join to taxonverbatim on taxonverbatim_id (the pkey) instead of taxonlabel_id, which used to be the pkey but is now an fkey

5681 10/19/2012 05:22 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/test_scrub: Remove any previous version of public.test_taxonomic_names before renaming public to it

5680 10/19/2012 05:19 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/test_scrub: Fixed bug where public.sql export did not include the "CREATE SCHEMA public" statement, because pg_dump doesn't add it to backups, by using new schemas/rename/% make target to first rename the public schema and then exporting it

5679 10/19/2012 05:12 PM Aaron Marcuse-Kubitza

root Makefile: VegBIEN DB: Schemas: schemas/rotate: Use new schemas/rename/%

5678 10/19/2012 05:12 PM Aaron Marcuse-Kubitza

root Makefile: VegBIEN DB: Schemas: Added schemas/rename/% to rename the public schema

5677 10/19/2012 04:54 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Removed filter preventing taxonomicStatus from being placed in taxonlabel if a morphospecies was provided, because the morphospecies actually never goes in the matched taxonlabel, only the verbatim taxonlabel

5676 10/19/2012 04:50 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: morphospecies: Also place it in the verbatim (input name's) taxonlabel. Note that it does not go in the matched name's taxonlabel, because that contains only fields from the matched name. The verbatim taxonlabel is thus a synonym of the matched taxonlabel where there is no morphospecies, or a child of it if there is a morphospecies.

5675 10/19/2012 04:36 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Do not place taxonomicStatus in taxonlabel if a morphospecies was provided, to prevent it from being incorrectly marked as accepted

5674 10/19/2012 04:25 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: morphospecies -> taxonverbatim.morphospecies: Fixed bug where needed suffix with _if statement then clause

5673 10/19/2012 04:23 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/_scrub/public.sql, TNRS.sql: Regenerated with schema changes

5672 10/19/2012 03:45 PM Aaron Marcuse-Kubitza

pg_dump_vegbien: Added opts env var to allow specifying options to a Makefile command, which does not take positional arguments

5671 10/19/2012 03:37 PM Aaron Marcuse-Kubitza

README.TXT: Schema changes: files to update with any renamings: Removed tnrs_db because that is now abstracted from the schema through the tnrs_input_name view. Note that PostgreSQL will automatically update tnrs_input_name with any table or column renames, which is the significant advantage of using a view rather than a hardcoded query.

5670 10/19/2012 03:35 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: tnrs_input_name: Use DISTINCT instead of DISTINCT ON because there is only one column

5669 10/19/2012 03:34 PM Aaron Marcuse-Kubitza

tnrs_db: Use new tnrs_input_name view to avoid hardcoding changing schema information

5668 10/19/2012 03:25 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/test_scrub, README.TXT: Documented that `make schemas/public/reinstall` must come after TNRS restore to recreate the tnrs_input_name view, which has a dependency on the TNRS schema

5667 10/19/2012 03:23 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added tnrs_input_name view for use by tnrs_db

5666 10/19/2012 12:53 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel, taxonverbatim: Updated comments for new taxonlabel/taxonverbatim split

5665 10/19/2012 12:42 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel_update_ancestors(): Use aliased types (http://www.postgresql.org/docs/8.3/static/plpgsql-declarations.html#PLPGSQL-DECLARATION-TYPE) where possible

5664 10/19/2012 12:37 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel_update_ancestors(): Adding new parent's ancestors: Change unique_violations to warnings so they don't abort the import. unique_violations should never happen unless there are cycles of two or more nodes, but they seem to be happening nevertheless, so this will provide a workaround to that problem.

5663 10/19/2012 12:18 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

5662 10/18/2012 04:58 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

5661 10/18/2012 04:55 PM Aaron Marcuse-Kubitza

tnrs_db: Updated with schema changes

5660 10/18/2012 04:54 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonverbatim: Removed subclass relationship to taxonlabel in order to allow multiple taxonverbatims to point to the same taxonlabel. This involves adding a taxonverbatim_id serial column and pointing all fkeys to taxonverbatim to that column.

5659 10/18/2012 04:43 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to join on taxonverbatim before joining on taxonlabel, now that taxondetermination is linked directly to taxonverbatim. Interestingly, PostgreSQL did not flag this error when the schema was changed, but only when the schema was reloaded from the DDL.

5658 10/18/2012 04:30 PM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: Moved taxonlabel to the right of taxonverbatim to make room for taxonverbatim to expand

5657 10/18/2012 04:21 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Link taxondetermination to taxonverbatim (which is a subclass of taxonlabel) instead of directly to taxonlabel. This will enable later having multiple taxonverbatims for one taxonlabel.

5656 10/18/2012 04:04 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel: Renamed identifyingtaxonomicname to taxonomicname because the taxonomicname provided by the datasource is now in taxonverbatim, so there is no name collision. Note that both of these fields store the same type of information, but taxonlabel's is autogenerated while taxonverbatim's is verbatim (and is only set if provided by the datasource).

5655 10/18/2012 03:57 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel: Moved non-scoping fields to new taxonverbatim subclass table, which contains the component parts of the taxonlabel

5654 10/18/2012 03:06 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel: Renamed taxonlabel_2_propagate_canon_label_id() to taxonlabel_2_set_canon_label_id() for clarity

5653 10/18/2012 03:04 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel_2_propagate_canon_label_id(): If no matched taxonlabel, make self-reference. This fixes a bug in analytical_db_view where rows without a canon_label_id were excluded because they did not have a corresponding canonical taxonlabel.

5652 10/18/2012 02:53 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel_unique unique index: Removed binomial, author, taxonomicname, and morphospecies because these are now part of the identifyingtaxonomicname, which is also in the unique index

5651 10/18/2012 02:44 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel: Require either an identifyingtaxonomicname or a taxonepithet. The NCBI inserted row count decreases by one because this prunes off a taxonlabel created for a parent node which was not contained in the first two rows (remember that NCBI taxa are not in dependency order, so parents are often imported after children).

5650 10/18/2012 02:41 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Also generate the identifyingtaxonomicname for the original* taxondetermination's taxonlabel

5649 10/18/2012 02:31 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel: Renamed taxonomicnamewithauthor to taxonomicname because it is equivalent to Darwin Core's scientificName

5648 10/18/2012 02:25 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Also include morphospecies in the identifyingtaxonomicname, except for the matched TNRS taxonlabel, which should not contain morphospecies information

5647 10/18/2012 02:14 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped acceptedScientificName

5646 10/18/2012 01:51 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Also create the identifyingtaxonomicname on the verbatim taxonlabel supplied by the datasource, in addition to on the TNRS input taxonlabel that the verbatim taxonlabel is matched up with

5645 10/18/2012 01:46 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Expanded brace expressions for putting together the identifyingtaxonomicname

5644 10/18/2012 01:21 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Always generate the concatenated identifyingtaxonomicname, even for higher taxa, to ensure that this field is always populated. Note that this will cause names of higher taxa to be scrubbed by TNRS, but this is usually not a problem because such names either have no match or not a close enough match based on the name only. Naming conventions generally cause names at different ranks to be different, so that collisions with lower ranks should not be a problem.

5643 10/18/2012 01:05 PM Aaron Marcuse-Kubitza

tnrs_db: Fixed bug where needed to remove internal identifyingtaxonomicname duplicates as well as duplicates with existing Name_submitted values, to avoid violating the TNRS.tnrs pkey constraint when the scrubbed names are later inserted. Note that the taxonlabel_0_unique_identifying_name unique index is not sufficient to prevent internal duplicates, because it includes the creator_id (and thus allows multiple instances of the same name defined by different creators).
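
The two kinds of duplicate removal described in this entry can be sketched in a few lines. The function name and arguments are hypothetical and the real tnrs_db script works against the database rather than in-memory lists; this only shows the logic of filtering out names that were already submitted and names that repeat within the new batch.

```python
def new_names_to_submit(candidate_names, already_submitted):
    """Return candidates in order, minus prior submissions and internal repeats."""
    seen = set(already_submitted)  # names that already exist as Name_submitted
    result = []
    for name in candidate_names:
        if name in seen:
            continue  # either submitted earlier or repeated within this batch
        seen.add(name)  # remember it so a later repeat is dropped
        result.append(name)
    return result
```

For example, given the candidates Poa annua, Quercus alba, Poa annua, Acer rubrum and an existing submission of Quercus alba, only Poa annua and Acer rubrum remain, each once.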

5642 10/18/2012 01:01 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): Don't add table0 to order_by with no table, because this could cause it not to match a corresponding DISTINCT ON column with no explicit table. PostgreSQL apparently does not treat a column with no explicit table and a column with the applicable table as identical for purposes of ORDER BY/DISTINCT ON checking, even when they refer to the same physical column.

5641 10/18/2012 12:53 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): order_by defaults to first distinct_on column when distinct_on provided
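
The default described in this entry follows from PostgreSQL's rule that the DISTINCT ON expressions must match the leftmost ORDER BY expressions. A minimal sketch of that default, assuming a much simpler interface than the real sql.mk_select() (the function below is a hypothetical stand-in that builds a plain string and does no identifier quoting):

```python
def mk_select_sketch(table, cols, distinct_on=(), order_by=None):
    """Build a SELECT string; a simplified stand-in, not sql.mk_select()."""
    if distinct_on and order_by is None:
        # Default ORDER BY to the first DISTINCT ON column so the two agree.
        order_by = distinct_on[0]
    query = 'SELECT'
    if distinct_on:
        query += ' DISTINCT ON (' + ', '.join(distinct_on) + ')'
    query += ' ' + ', '.join(cols) + ' FROM ' + table
    if order_by is not None:
        # The column is written exactly as given, with no table prefix added.
        query += ' ORDER BY ' + order_by
    return query
```

Leaving the column name unprefixed in ORDER BY mirrors the adjacent r5642 entry, which notes that a table-qualified column may not be treated as matching an unqualified DISTINCT ON column.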

5640 10/18/2012 12:36 PM Aaron Marcuse-Kubitza

tnrs_db: Updated with schema changes

5639 10/18/2012 12:33 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel: Renamed taxonomicnamewithauthor to taxonomicname because it is equivalent to Darwin Core's scientificName

5638 10/18/2012 12:25 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel: Renamed taxonomicname to binomial because it excludes the author