Project

General

Profile

Statistics
| Revision:

# Date Author Comment
5781 10/25/2012 11:52 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Added stemobservation.tag, stemobservation.height_m for use in plot change over time analysis <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Plot_change_over_time_analysis>

5780 10/25/2012 11:45 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed typo in scientificNameWithMorphospecies

5779 10/25/2012 11:41 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Renamed columns to VegCore names (https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/mappings/VegCore.csv)

5778 10/25/2012 11:36 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added cultivatedBasis

5777 10/25/2012 11:34 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added scientificNameWithMorphospecies

5776 10/25/2012 10:55 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added speciesBinomial

5775 10/25/2012 10:49 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Generate species by concatenating genus and specific epithet, since according to Brad this field is actually the binomial, not the specificEpithet

5774 10/25/2012 10:41 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed no longer used plot_change_over_time view. Use one of the queries at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Plot_change_over_time_analysis> instead.

5773 10/25/2012 10:36 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: location: Populate sourceaccessioncode with locationID + subplot when subplot is unique only within the parent plot, so that location always has a sourceaccessioncode to use as the plotCode in analytical_db_view

5772 10/25/2012 10:07 AM Aaron Marcuse-Kubitza

lib/PostgreSQL-MySQL.csv: Remove views because they can contain arbitrary expressions, whose syntax may not be compatible with MySQL

5771 10/25/2012 10:04 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Use location.sourceaccessioncode as plotCode instead of authorlocationcode because authorlocationcode isn't globally unique (for subplots, it's only unique within the parent plot)

5770 10/25/2012 09:48 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: plantobservation: Made taxonoccurrence_id optional when sourceaccessioncode is specified, so that aggregateoccurrence doesn't get pruned away in datasource tables that link just a stemobservation to a plantobservation (and therefore don't provide a taxonoccurrence to satisfy the previous taxonoccurrence_id NOT NULL constraint)

5769 10/25/2012 09:47 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: aggregateoccurrence: Made taxonoccurrence_id optional when sourceaccessioncode is specified, so that aggregateoccurrence doesn't get pruned away in datasource tables that link just a stemobservation to a plantobservation (and therefore don't provide a taxonoccurrence to satisfy the previous taxonoccurrence_id NOT NULL constraint)

5768 10/25/2012 09:42 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: aggregateoccurrence: Made taxonoccurrence_id optional when sourceaccessioncode is specified, so that aggregateoccurrence doesn't get pruned away in datasource tables that link just a stemobservation to a plantobservation (and therefore don't provide a taxonoccurrence to satisfy the previous taxonoccurrence_id NOT NULL constraint)

5767 10/25/2012 09:31 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonoccurrence: Added taxonoccurrence_required_key check constraint to ensure that all taxonoccurrences are properly identified, and empty taxonoccurrences are properly pruned. This fixes a bug where taxon-only and stem-only data did not properly prune the taxonoccurrence that would otherwise get created because it's included in the mappings.

5766 10/25/2012 07:51 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): insert_into_pkeys(): Use new sql.add_pkey_or_index() instead of sql.add_pkey() in order to just print a warning if for some reason there were duplicate entries for an input row in the iteration's pkeys table. This should provide a workaround for bugs (often in the schema itself, related to its unique indexes) that cause an input row to match multiple output rows when joining on the output table using the unique constraint's columns.

5765 10/25/2012 07:44 AM Aaron Marcuse-Kubitza

sql.py: Added add_pkey_or_index()

5764 10/25/2012 07:32 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): Parse "could not create unique index ... Key is duplicated" errors as DuplicateKeyException

5763 10/25/2012 07:27 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): DuplicateKeyException: Factored out creation of DuplicateKeyException into helper function

5762 10/25/2012 07:20 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

5761 10/24/2012 06:31 PM Aaron Marcuse-Kubitza

tnrs_db: Removed tnrs.InvalidResponse exception handler that retries the query because the current query does not track which names have been submitted to but not processed by TNRS, so the error would continue to happen repeatedly

5760 10/24/2012 06:13 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: location: Added index on parent_id to speed up plot change over time joins

5759 10/24/2012 05:45 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: location: Added index on creator_id to speed up analytical_db_view joins

5758 10/24/2012 05:15 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: stemobservation: Added index on plantobservation_id to speed up analytical_db_view joins

5757 10/23/2012 01:08 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added initial plot_change_over_time view

5756 10/23/2012 12:53 PM Aaron Marcuse-Kubitza

Added inputs/bien_web/

5755 10/23/2012 12:43 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Reordered taxonoccurrence.growthform to put if after the bien_web.observation fields

5754 10/23/2012 12:32 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Include taxonoccurrence.growthform

5753 10/23/2012 12:27 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Generate taxonMorphospecies by concatenating the scientificName to the morphospecies

5752 10/23/2012 12:23 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to take taxonomic name components from the accepted taxonlabel's taxonverbatim instead of the datasource's taxonverbatim, which does not contain the accepted name

5751 10/23/2012 12:19 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: identifiedBy: Added NULLIF to keep empty strings out of the analytical DB

5750 10/23/2012 12:13 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to take taxonomic name components from the accepted taxonlabel's taxonverbatim instead of the datasource's taxonverbatim, which does not contain the accepted name

5749 10/23/2012 12:10 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to take morphospecies from the parsed taxonlabel's taxonverbatim, where it has been parsed out, instead of the datasource's taxonverbatim, which has it as part of the verbatim input name

5748 10/23/2012 11:58 AM Aaron Marcuse-Kubitza

analytical_db_view: Added stemobservation.xposition_m, yposition_m

5747 10/23/2012 11:46 AM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/map.csv: Added new Time_submitted field

5746 10/23/2012 11:45 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/header.csv: Regenerated for new staging tables format

5745 10/23/2012 11:41 AM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/test.xml.ref: Accepted correct inserted row count, which most likely became detached from the primary row count when the TNRS cache was cleared and repopulated with test data

5744 10/23/2012 11:28 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Reordered joins in path order, putting datasource before location. This will enable more naturally reusing the SELECT query for other analyses.

5743 10/23/2012 11:15 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Do not include rank in the mapping because taxonomicname is globally unique, and thus it isn't used in looking up the NCBI taxonlabel

5742 10/23/2012 11:05 AM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/_scrub/public.test_taxonomic_names.sql, TNRS.sql: Regenerated with schema and mappings changes

5741 10/23/2012 10:49 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Also attach TNRS genus to NCBI backbone. This causes attachment to be made with as many of family and genus as are provided and have an entry in NCBI.

5740 10/23/2012 10:45 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: family -> NCBI backbone: Removed extra path after _if statement's cond/_exists

5739 10/23/2012 10:39 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Instead of connecting the acceptedFamily to the NCBI backbone, connect the family for the TNRS matched taxonlabel. This connects more families and also connects the same set of fields as will be connected for the genus.

5738 10/23/2012 10:01 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Fixed bug where needed to attach accepted family to NCBI using taxonomicname, which is globally unique, rather than taxonepithet, which is only unique within the parent taxon

5737 10/23/2012 09:34 AM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/: Added Time_submitted column at beginning and populate it in tnrs_db with the time the batch TNRS request was submitted

5736 10/23/2012 09:08 AM Aaron Marcuse-Kubitza

csvs.py: RowNumFilter: Use new ColInsertFilter

5735 10/23/2012 09:08 AM Aaron Marcuse-Kubitza

csvs.py: Added ColInsertFilter

5734 10/23/2012 08:43 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed no longer used _is_higher_taxon(). Use _has_taxonomic_name() or _taxonomic_name_is_epithet() instead.

5733 10/23/2012 08:42 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonName->taxonepithet: Use new _taxonomic_name_is_epithet() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field

5732 10/23/2012 08:36 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _taxonomic_name_is_epithet()

5731 10/23/2012 08:33 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonName->taxonomicname: Use new _has_taxonomic_name() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field

5730 10/23/2012 08:30 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonName->taxonomicname: Use new _has_taxonomic_name() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field

5729 10/23/2012 08:25 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _has_taxonomic_name() for lower taxon ranks that typically have a globally unique taxonomic name

5728 10/23/2012 08:10 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Removed unit conversion functions that take a text input, since casts to the parameter type (double precision) are now automatically performed by sql_io.put_table(), using sql.parse_exception()'s function MissingCastException parsing

5727 10/23/2012 08:01 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: _is_higher_taxon() calls: Default to true if the rank can't be parsed to a taxonrank enum value

5726 10/23/2012 07:56 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): is_function: Moved definition of wrapper function inside try block of main loop because the creation of the empty pkeys table (whose row type is needed for the wrapper function) can itself produce MissingCastExceptions, which must be thrown inside the loop in order to be handled properly

5725 10/23/2012 07:05 AM Aaron Marcuse-Kubitza

db_xml.py: put(): Indicate no parent_ids_loc using no_parent_ids_loc sentinel instead of None to support parent_ids_locs that are equal to None (e.g. if the parent node had an error). Always forward parent_ids_loc to children with fkeys to parent, even on error, because the parent table may not be required for the child tables to be valid, such as for taxonomic-data-only datasets that nevertheless have nodes for the non-taxonomic tables in their mappings.

5724 10/23/2012 06:38 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): types cannot be matched MissingCastException: Use the first type as the type to cast to instead of text

5723 10/23/2012 05:59 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): InvalidValueException: Fixed bug in regexp where can't use .*? before (?:...)? surrounding matched value, because it prevents the value from being matched now that it is optional

5722 10/23/2012 05:52 AM Aaron Marcuse-Kubitza

inputs/.NCBI/nodes/header.csv: Updated for new staging table format, which includes a row_num column in each joined table

5721 10/23/2012 05:51 AM Aaron Marcuse-Kubitza

inputs/.NCBI/nodes/create.sql: Updated for new src table names

5720 10/23/2012 05:36 AM Aaron Marcuse-Kubitza

xml_func.py: process(): Pass on_error through to sql_io.put(). This fixes a bug in row-based import where DB errors in the xml_func.process() phase would abort the entire import instead of being tracked and having the return value set to None.

5719 10/23/2012 05:33 AM Aaron Marcuse-Kubitza

sql_io.py: put(): Pass on_error through to put_table()

5718 10/23/2012 05:19 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): log_exc(): Return False if removing all rows and have callers break the main loop so that no further exception-handling code is processed before the main loop is exited

5717 10/23/2012 05:17 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): InvalidValueException: Also match exceptions which don't provide a specific value but just indicate that a value was invalid, such as PL/Python's "day is out of range for month"

5716 10/23/2012 04:39 AM Aaron Marcuse-Kubitza

db_xml.py: put(): Inserting children with fkeys to parent: Don't do this if this node had an error and sql_io.put_table() returned None as the generated pkey. This fixes a bug where a node with an error will still try to create children with fkeys to parent, but pass None as the fkey to parent, which the recursive put() call will then incorrectly treat as there being no field with an fkey to parent at all rather than a field whose value is NULL. This causes function overload resolution to be unable to find the intended function, because it is missing a parameter.

5715 10/23/2012 04:34 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Return the actual type of the function's 1st param, using new function_param0_type(), rather than just text

5714 10/23/2012 04:31 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Fixed bug where can't return the function name as the name of what was missing the cast, because this must be a column

5713 10/23/2012 04:28 AM Aaron Marcuse-Kubitza

sql.py: Added function_param0_type()

5712 10/23/2012 04:26 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Only treat DoesNotExistException as a MissingCastException if the query that was run did not already include a cast, to avoid infinite exception-handling recursion

5711 10/23/2012 04:24 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Fixed bug where determining whether the exception is a MissingCastException rather than a DoesNotExistException needs to check whether the function exists rather than whether it's the same in the exception message as in the query that was run. The exception message will of course copy the function name verbatim from the query, so there is no information in the exception message itself to indicate whether the DoesNotExistException was caused by a missing cast or by a nonexistent function.

5710 10/23/2012 04:19 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Documented that the regexp match to extract the function name also checks that a function signature with param types was matched, indicating a function call rather than cast to regproc. This check will also help avoid infinite recursion when function MissingCastException parsing calls database structure introspection functions.

5709 10/23/2012 04:15 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Don't match quotes around the function name because this particular exception (incorrect param type) does not include them. Casts to regproc, which also produce a DoesNotExistException, include the quotes but do not indicate a MissingCastException.

5708 10/23/2012 04:12 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Fixed bug where the 1st param's type in the exception's function signature is not actually the type the argument needs to have, because this is just the argument's current type

5707 10/23/2012 04:04 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): typed_name_re: Also match identifiers without quotes, such as functions in "No function matches the given name and argument types" errors. This fixes a bug where DoesNotExistExceptions could not be parsed as MissingCastExceptions when applicable because the DoesNotExistException pattern would not even match.

5706 10/23/2012 03:57 AM Aaron Marcuse-Kubitza

inputs/.NCBI/: Renamed higher_taxa to nodes because it currently doesn't just contain the higher taxa

5705 10/23/2012 01:24 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel: taxonlabel_2_set_canon_label_id(): Only run if matched_label_id has actually changed, to avoid infinite recursion when updating canon_label_id on labels that resolve to this label when there are cycles in the data

5704 10/23/2012 01:21 AM Aaron Marcuse-Kubitza

inputs/.NCBI/: Renamed higher_taxa to nodes because it currently doesn't just contain the higher taxa

5703 10/23/2012 12:57 AM Aaron Marcuse-Kubitza

inputs/.NCBI/: Renamed higher_taxa to nodes because it currently doesn't just contain the higher taxa

5702 10/23/2012 12:49 AM Aaron Marcuse-Kubitza

inputs/.NCBI/: Renamed names, nodes to *.src so they wouldn't get an automatic row_num column and can be used in higher_taxa's join

5701 10/23/2012 12:38 AM Aaron Marcuse-Kubitza

inputs/NCU-NCSC/Specimen/+header.csv: Fixed bug where needed ! at beginning to indicate a header override file, which prevents the following row from being treated as data

5700 10/23/2012 12:36 AM Aaron Marcuse-Kubitza

units.py: MissingUnitsException: Fixed bug where quantity is a Quantity object, not a string, and thus needs to be converted to a string using strings.ustr()

5699 10/23/2012 12:25 AM Aaron Marcuse-Kubitza

inputs/FIA/Organism/test.xml.ref: Accepted new test output now that FIA table is sorted in the order of the original CSV after staging table reinstallation

5698 10/23/2012 12:24 AM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/create.sql: Removed dropping of row_num column, which is no longer added on non-CSV tables

5697 10/23/2012 12:22 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: %/install: Moved "table-scope src table's row_num col" comment outside of define block so it wouldn't be echoed to stdout even when the table is not a src table

5696 10/23/2012 12:17 AM Aaron Marcuse-Kubitza

Added inputs/NCU-NCSC/Specimen/+header.csv header override to remove empty, unnamed column at end

5695 10/23/2012 12:05 AM Aaron Marcuse-Kubitza

inputs/*/*/header.csv: Regenerated for new staging tables format (which now includes a row_num column on every CSV table), as part of reinstalling staging tables

5694 10/22/2012 11:59 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: Fixed bug where DROP VIEW statements needed IF EXISTS because CASCADEs on previous DROP VIEWs may have already dropped the view in question

5693 10/22/2012 11:57 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: %/install: Fixed bug where a .src table's row_num column needed to have the table name prefixed (making it globally unique) to allow joining the table with other tables

5692 10/22/2012 11:31 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: sql/install: Fixed bug where $(logInstall) needed to be called with arguments, so that either > or >> would be used before the install log's filename

5691 10/22/2012 08:22 PM Aaron Marcuse-Kubitza

tnrs.py: submission_request_template: Use just Tropicos as the name source, as Brad says "GCC is for only one family (Asteraceae)" and USDA's "taxonomy is of lower quality and sometimes conflicts with Tropicos"

5690 10/19/2012 06:20 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Support functions with named parameters

5689 10/19/2012 06:18 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Support function names enclosed in quotes on the context line

5688 10/19/2012 06:15 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonName: Place it in taxonomicname instead of taxonepithet for lower taxa, because the only datasource that currently provides this field (NCBI) actually provides the full taxonomicname instead of the epithet at the current rank for lower taxa. (taxonomicname is not applicable to higher taxa because their names are not guaranteed to be globally unique.) taxonName may need to be renamed and/or redefined to account for this ambiguity in NCBI's usage.

5687 10/19/2012 06:14 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Do not include the taxonName in the concatenated taxonomicname because it is NOT globally unique. The same name may be used at different taxonomic ranks and mean different things, and lower taxa may have the name appear in multiple genuses or species, meaning different things.

5686 10/19/2012 06:04 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Do not include the taxonName in the concatenated taxonomicname because it is NOT globally unique. The same name may be used at different taxonomic ranks and mean different things, and lower taxa may have the name appear in multiple genuses or species, meaning different things.

5685 10/19/2012 05:57 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _is_higher_taxon()

5684 10/19/2012 05:52 PM Aaron Marcuse-Kubitza

README.TXT: Documentation: To import and scrub just the test taxonomic names: Added `make inputs/.TNRS/cleanup` after `make backups/TNRS.backup/restore` because the PostgreSQL collation may differ between vegbiendev's and the user's DB

5683 10/19/2012 05:50 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): DoesNotExistException: If item not found was a function and not found only because of a missing cast, raise MissingCastException instead. This should allow automatic casts to be added on function parameters as well as table columns.

5682 10/19/2012 05:28 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to join to taxonverbatim on taxonverbatim_id (the pkey) instead of taxonlabel_id, which used to be the pkey but is now an fkey