inputs/.TNRS/schema.sql: Removed no longer needed tnrs_accepted_names index, now that the accepted names have been split into a separate table
inputs/.TNRS/: Split tnrs table into tnrs_accepted and tnrs_other, with tnrs_accepted imported first. This will eventually allow TNRS data to be imported without needing to use _taxonlabel_set_matched_label_id(), which is extremely slow due to an unknown bug in PostgreSQL's handling of UPDATEs on large tables (possibly limited to UPDATEs inside functions).
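For illustration, a split like this can be expressed as two filtered CREATE TABLE AS statements; a minimal sketch only, assuming an Accepted_name column as in TNRS output (the actual create.sql, table, and column names may differ):

    -- hypothetical sketch: rows with an accepted name go into tnrs_accepted, the rest into tnrs_other
    CREATE TABLE tnrs_accepted AS
    SELECT * FROM tnrs WHERE "Accepted_name" IS NOT NULL AND "Accepted_name" != '';
    CREATE TABLE tnrs_other AS
    SELECT * FROM tnrs WHERE "Accepted_name" IS NULL OR "Accepted_name" = '';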
input.Makefile: SVN: add: Fixed bug where logs/*.log.sql was incorrectly being included in the list of files to put under version control
lib/common.Makefile: Filesystem: $(wildcard/): Fixed bug where {} exprs weren't handled correctly when some branches contained *, by using `shopt -s nullglob` and bash directly instead of trying to emulate it with grep
input.Makefile: Staging tables installation: `%/install: %/create.sql`: Also provide the schema to create.sql in :schema
bin/map: Added support for including literal metadata values in the map spreadsheet, by prefixing them with ':'
mappings/VegCore.csv: Added distanceToCountry_km, distanceToStateProvince_km
mappings/VegCore.csv: Added distanceToCountry_m, distanceToStateProvince_m
mappings/VegCore-VegBIEN.csv: Mapped acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude. Mapped decimalLatitude/Longitude to matched place's coordinates when acceptedDecimalLatitude/Longitude not provided (as is the case for the geoscrub table).
input.Makefile: SVN: $(svnFilesGlob): Added schema.sql
mappings/VegCore.csv: Replaced georeferenceInvalid with latLongDomainValid, latLongDomainInvalid because this field actually indicates only whether the coordinates are within the valid range of decimal values, not whether they fall within the named place. Re-sourced georeferenceValid to bien_web.observation#isGeovalid rather than isValidLatLong. VegCore-VegBIEN.csv: Replaced the now-removed georeferenceInvalid with latLongInCountry for locationID redirection checks.
mappings/VegCore-VegBIEN.csv: Map locationID to place.placecode instead when geovalidation columns are provided
mappings/VegCore-VegBIEN.csv: accepted* taxonlabel: Removed ancestor hierarchy because this is populated, in much greater detail, when the accepted name is imported as an input name and the TNRS-parsed components are available
sql.py: distinct_table(): Use DISTINCT ON instead of a unique index and insert_select()'s ignore mode to remove duplicate rows. This uses whichever sorting method PostgreSQL deems to be fastest instead of requiring the use of a B-tree index. Since most of the slower operations in TNRS's import are distinct_table() calls, this should speed up the TNRS import, which is a bottleneck for the DB import as a whole because the TNRS import must complete before other datasources can be imported.
sql.py: distinct_table(): Changed comment about distinct_on column index to include just the input table, so that the function does not guarantee a unique index on the output table's distinct_on columns
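For reference, the DISTINCT ON form described above looks roughly like the following (table and column names are placeholders):

    -- deduplicate on the distinct_on columns without needing a unique B-tree index;
    -- the ORDER BY must lead with the DISTINCT ON expressions
    CREATE TEMP TABLE distinct_rows AS
    SELECT DISTINCT ON (col1, col2) *
    FROM in_table
    ORDER BY col1, col2;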
mappings/VegCore.csv: Added acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude
mappings/VegCore.csv: Renamed latLongValid, latLongInvalid to georeferenceValid, georeferenceInvalid to correspond to DwC term georeferenceVerificationStatus
mappings/VegCore.csv: Added latLongValid, latLongInvalid, latLongInCountry, latLongInStateProvince
input.Makefile: Staging tables installation: Treat any .sql file whose name contains (not just ends with) "schema" as a schema file and sort it before other .sql files
schemas/vegbien.sql: Functions containing UPDATE statements: Use quote_nullable() instead of quote_literal() to properly encode NULL values
schemas/vegbien.sql: Functions containing UPDATE statements: Use PL/pgSQL's EXECUTE statement to avoid caching query plans. This is necessary because as the table grows over time, the optimal query plan may change.
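Together with the previous entry, the pattern looks roughly like this sketch (function, table, and column names are placeholders): quote_nullable() renders a NULL value as the keyword NULL rather than propagating NULL through the whole string as quote_literal() would, and EXECUTE forces the UPDATE to be re-planned on each call:

    CREATE OR REPLACE FUNCTION example_set_col(id_ integer, val text) RETURNS void AS $$
    BEGIN
        -- dynamic SQL: the UPDATE is planned per call, and a NULL val becomes the keyword NULL
        EXECUTE 'UPDATE some_table SET some_col = '||quote_nullable(val)
            ||' WHERE some_table_id = '||quote_nullable(id_);
    END;
    $$ LANGUAGE plpgsql;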
sql_io.py: put_table(): ensure_cond(): When deleting rows that do not satisfy the condition, handle sql.DoesNotExistExceptions caused by columns in the condition that were not replaced with NULL. These occur when out_table is a function, and the columns of the table the condition refers to therefore can't be found using out_table.
sql_io.py: put_table(): Calling function: Do not cache the function call, because it may be retried after error handling
sql_gen.py: NotCond: Treat a condition that evaluates to NULL as false instead, so that the boolean effect of the condition is completely inverted
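In SQL terms, the effect is roughly the difference between NOT cond (which stays NULL when cond is NULL) and coalescing first, e.g.:

    -- plain NOT of a NULL condition is still NULL; coalescing to false first fully inverts it
    SELECT NOT COALESCE(NULL::boolean, false);  -- true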
sql_gen.py: null_as_str: Use new null instead of hardcoding 'NULL'
sql_gen.py: Added null
sql.py: run_query(): Give failed EXPLAIN approximately the log_level of its query, so that queries which produce an error in the EXPLAIN before the query itself is even run will still be logged
sql_io.py: put_table(): sql.DatabaseErrors: Factored exception-handling code out into handle_unknown_exc(), for use by other exception handlers
sql_io.py: put_table(): ensure_cond(): is_function: Fixed bug where can't replace out_table_cols with NULL because out_table is a function, not a table
my2pg*: Turn off escape_string_warning because \-escaped strings are standard in MySQL
my2pg.data: Turn off standard_conforming_strings like in my2pg
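Both correspond to standard PostgreSQL settings that such a translation can emit at the top of its output, e.g.:

    SET standard_conforming_strings = off;  -- MySQL dumps use \-escaped strings
    SET escape_string_warning = off;        -- so each such string doesn't trigger a warning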
my2pg: Also remove any CHARACTER SET modifier on a column definition
schemas/vegbien.sql: analytical_aggregate_view: Make size classes cumulative ranges (stems above a certain DBH) rather than bins, per Brad's request
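A sketch of the cumulative form, as opposed to non-overlapping bins (thresholds are placeholders and the column name is assumed from the analytical_stem entries below):

    -- each class counts stems at or above a DBH threshold, so larger stems appear in every class
    SELECT count(CASE WHEN "diameterBreastHeight_cm" >= 2.5 THEN 1 END) AS stems_ge_2_5cm
         , count(CASE WHEN "diameterBreastHeight_cm" >= 10  THEN 1 END) AS stems_ge_10cm
    FROM analytical_stem;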
input.Makefile: SVN: add: Add header override files with any extension, not just .csv
README.TXT: Datasource setup: Replaced manual `svn add` commands with one `make inputs/<datasrc>/add` before committing to add newly-created files
input.Makefile: SVN: add: Also add any .sql when it's in a subdir. This applies to create.sql, cleanup.sql, etc.
lib/common.Makefile: SVN: Added $(add*)
input.Makefile: SVN: add: Also add any newly-created files which should be under version control
input.Makefile: Fixed bug where _MySQL/%.sql files weren't being built from their associated .make files, by adding a special `%.sql: .sql.make` rule to override `.sql: _MySQL/%.sql`
input.Makefile: `%: .make`: Factored the $(if $(wildcard $@)... test out into $(make_script) so that all `: %.make`-like rules can use it directly
lib/forwarding.Makefile: $(subdirs): Use all folders other than ../ ./ .svn/ instead of listing folders that start with . explicitly
schemas/vegbien.sql: analytical_stem_view: Use accepted_taxonlabel.taxonomicname instead of accepted_taxonverbatim.taxonomicname in order to have the family prepended
Regenerated vegbien.ERD exports
schemas/vegbien.sql: placerank: Reordered in path order, using <http://rs.tdwg.org/dwc/terms/#dcindex> and <http://vegbank.org/vegbank/views/dba_fielddescription_detail.jsp?view=detail&wparam=1415&entity=dba_fielddescription&params=1415> as a guide. Documented the source of the values.
schemas/vegbien.sql: placename: Removed non-name-related fields, because placename is designed only to store a hierarchy of placenames, not additional place information
schemas/vegbien.sql: Moved placedescription from placename to place (and renamed it to description), because it applies to the place itself, not the name for the place
schemas/vegbien.sql: analytical_aggregate_view: Added coverPercent, which is the sum of all coverPercents for that species
schemas/vegbien.sql: analytical_aggregate_view: Include all analytical_stem species, not just those whose stems have non-NULL DBH
schemas/vegbien.sql: Renamed aggregated_analytical_db to analytical_aggregate to match the name of analytical_stem
schemas/vegbien.sql: Renamed analytical_db to analytical_stem since this contains just the individual stems, not the aggregated data in the main analytical DB
schemas/vegbien.sql: Removed no longer used locationcoords
schemas/vegbien.sql: analytical_db_view: Use new coordinates instead of locationcoords
mappings/VegCore-VegBIEN.csv: Remapped latitude/longitude to new coordinates table
schemas/vegbien.sql: place: Added coordinates_id
schemas/vegbien.sql: Added coordinates table
schemas/vegbien.sql: place: Removed municipality, site because they are not used in the geoscrubbing
schemas/vegbien.sql: place: Place custom hierarchy of placenames in placename table instead of in otherranks field
schemas/vegbien.sql: place.matched_place_id: Changed comment to say that places are linked in a three-level (instead of two-level) hierarchy of datasource place -> verbatim place -> accepted place, and this field contains the closest match
schemas/vegbien.sql: Renamed placepath to place since this contains primary information about the place, including the reference to the canonical place
schemas/vegbien.sql: Renamed place to placename since it refers just to a name for a place, without coordinates
schemas/vegbien.sql: analytical_db_view: Exclude original taxondeterminations, so that there is only one taxondetermination for each taxonoccurrence
schemas/vegbien.sql: make_analytical_db(): Also make new aggregated_analytical_db
schemas/vegbien.sql: sync_analytical_db_to_view(): DROP TABLE: Use IF EXISTS in case analytical_db table has already been deleted, or not yet created
schemas/vegbien.sql: Added aggregated_analytical_db_view and materialized table aggregated_analytical_db (synced using sync_aggregated_analytical_db_to_view())
lib/PostgreSQL-MySQL.csv: custom types: Also match column names enclosed in ``
schemas/vegbien.sql: Store materialized analytical_db table in schema so aggregating views can reference it. Added sync_analytical_db_to_view() to maintain analytical_db table.
schemas/vegbank.ERD.pdf: Restored to VegBank ERD, which had gotten overwritten when the vegbien.ERD exports were regenerated
schemas/vegbien.sql: analytical_db_view: Reordered columns in path order
schemas/: Moved unit conversion functions from functions.sql to vegbien.sql so the unit conversion functions used by analytical_db_view wouldn't need to be stored both in functions.sql and in vegbien.sql. (All unit conversion functions used by analytical_db_view must be stored in the public schema so that analytical_db_view doesn't get cascadingly deleted when the functions schema is reinstalled.)
schemas/vegbien.sql: analytical_db_view: Use public._m2_to_ha() instead of functions._m2_to_ha()
schemas/vegbien.sql: Copied _m2_to_ha() to public schema for use by analytical_db_view
schemas/vegbien.sql: analytical_db_view: Added diameterBreastHeight_cm
schemas/functions.sql, vegbien.sql: Added _m_to_cm()
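These unit-conversion helpers are presumably thin wrappers; a sketch of what _m_to_cm() and _m2_to_ha() might look like (the exact signatures are assumptions):

    -- assumed signatures; only the conversion factors (x100 for m->cm, /10000 for m2->ha) are certain
    CREATE OR REPLACE FUNCTION _m_to_cm(value double precision) RETURNS double precision AS
    $$ SELECT $1 * 100 $$ LANGUAGE sql IMMUTABLE;

    CREATE OR REPLACE FUNCTION _m2_to_ha(value double precision) RETURNS double precision AS
    $$ SELECT $1 / 10000 $$ LANGUAGE sql IMMUTABLE;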
schemas/vegbien.sql: Copied _cm_to_m() to public schema for use by new aggregated_analytical_db_view
schemas/vegbien.sql: analytical_db_view: datasource table: Fixed bug where need to filter by creator_id = party_id in order to use just root parties (datasources)
tnrs_db: Fetching names to scrub: Omit sql.select() fields param because it will be filled in with its default value
import_all: Pass command-line args (such as make vars) to all commands, not just with_all, so that a custom public schema is properly used by all commands
inputs/.NCBI/nodes/create.sql: Make genus completely globally unique by removing duplicates. Note that only duplicates with ranks at or below the genus level need be removed, which for this dataset is just genus and subgenus.
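One way such a removal could be written, using the name_txt and rank columns referenced elsewhere in this log (a sketch only; the actual create.sql logic may differ, e.g. in whether one copy of each duplicated name is kept):

    -- drop every genus/subgenus row whose name is shared, leaving genus globally unique
    DELETE FROM "NCBI".nodes
    WHERE rank IN ('genus', 'subgenus')
      AND name_txt IN (
        SELECT name_txt FROM "NCBI".nodes
        WHERE rank IN ('genus', 'subgenus')
        GROUP BY name_txt HAVING count(*) > 1
      );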
schemas/vegbien.sql: taxonlabel: taxonlabel_required_key constraint: Also allow taxonlabels with just a sourceaccessioncode, to support looking up parent taxonlabels using just their sourceaccessioncode (e.g. in NCBI)
mappings/VegCore-VegBIEN.csv: matched taxonlabel: Don't include taxonName in the concatenated taxonomicname. This also prevents the creation of the matched taxonlabel entirely when only the taxonName is provided.
mappings/VegCore-VegBIEN.csv: Don't create matched taxonlabel if taxonName was provided. This fixes a bug where an NCBI node was incorrectly pointing to a TNRS name, when the reference should only be the other way around. This may also fix the TNRS slowdown, if it was caused by circular matched_label_id references.
schemas/vegbien.sql: taxonlabel_2_set_canon_label_id_on_insert(): Fixed bug where also need to set canon_label_id based on matched_label_id here, not just in taxonlabel_2_set_canon_label_id_on_update(), because the matched_label_id could be specified when the taxonlabel is first created
schemas/vegbien.sql: taxonlabel_2_set_canon_label_id_on_*(): Fixed bug where need to use := instead of = to perform assignment of canon_label_id
schemas/tree_cross-links.sql: Updated for schema changes
schemas/vegbien.sql: taxonlabel_update_ancestors(): Include ancestors for both parent_id and matched_label_id rather than just one or the other. This avoids needing to delete existing ancestors for the parent_id when a matched_label_id is added and overrides it. This should reduce the TNRS import time if the slowdown was due to the need to delete parent_id ancestors when later adding a matched_label_id (which only occurs in a separate step in the TNRS datasource).
sql_io.py: put_table(): ensure_cond(): Fixed bug where the check for whether any rows failed cond did not verify that cur != None (cur is None when cond == sql_gen.true_expr) before accessing cur.rowcount
sql_gen.py: simplify_expr(): Don't require () around NULL IS NULL and NULL IS NOT NULL because extra parentheses are not provided in index conditions, only in check constraint conditions
inputs/import.stats.xls: Updated import times. The TNRS import has slowed down significantly, possibly due to a bug in the autopopulation of the taxonlabel_relationship table when the input data contains cycles.
sql_io.py: put_table(): Assertion that into and full_in_table have the same row count: Allow into to have more rows than full_in_table, in case an input row matched multiple output rows. This should not happen for a properly-configured database, but seems to happen periodically nevertheless (currently, to the MO datasource) and should not abort the import when it does.
sql.py: parse_exception(): "could not create unique index" DuplicateKeyException: Fixed bug where can't use make_DuplicateKeyException() because it tries to retrieve information about the index in question, but the index it was trying to create doesn't exist
schemas/vegbien.sql: analytical_db_view: Renamed datasource's taxonverbatim to datasource_taxonverbatim to distinguish it from the other taxonverbatims that are joined on (parsed_taxonverbatim, accepted_taxonverbatim)
inputs/.NCBI/nodes/create.sql: Make genus (mostly) globally unique by removing kingdom Animalia, which has significant genus overlap with plants. This reduces the number of duplicated genera from 578 to 65 (determined with `SELECT name_txt, count(*), array_agg(rank) FROM "NCBI".nodes GROUP BY name_txt HAVING count(*) > 1 AND 'genus' = ALL (array_agg(rank))`).
inputs/.NCBI/nodes/create.sql: Added foreign key on parent tax_id with covering index
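In create.sql terms this is roughly the following (constraint and column names are assumptions based on the NCBI nodes dump, and tax_id is assumed to already be the primary key):

    -- assumed column names (tax_id, parent_tax_id)
    ALTER TABLE "NCBI".nodes ADD FOREIGN KEY (parent_tax_id) REFERENCES "NCBI".nodes (tax_id);
    -- PostgreSQL does not index the referencing side automatically, so add an index covering the FK column
    CREATE INDEX nodes_parent_tax_id ON "NCBI".nodes (parent_tax_id);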
input.Makefile: Staging tables installation: Added %/uninstall, %/reinstall to allow reinstalling individual tables
sql_io.py: put_table(): ensure_cond(): When adding the failed condition to the errors table, also include the original, untranslated condition from the DB schema in addition to the translation of the condition into the input schema
sql_io.py: track_data_error(): Fixed bug where errors whose column had no srcs (indicated by () ) were incorrectly being ignored. This affected NOT NULL exceptions where the column was not provided by the dataset.