sql_io.py: put_table(): Switched back to using run_query_into()'s add_pkey_ option now that it uses sql.add_pkey_or_index() instead of sql.add_pkey()
sql.py: run_query_into(): add_pkey_: Use add_pkey_or_index() instead of add_pkey(). This should prevent errors for input rows which match multiple output rows in VegBIEN.
README.TXT: Data import: After import, removed steps to reinstall tnrs_accepted, tnrs_other because this would overwrite the versions of these staging tables that were used to build the current version of the database. (The backup should store a snapshot of the database's input data as it was used to build the database.) Note that these tables will still be reinstalled when the next import starts.
Added validation/BIEN2_Analytical_DB_overview.docx
inputs/import.stats.xls: Updated import times
inputs/.TNRS/tnrs_*/create.sql: Fixed bug where need to COALESCE the WHERE condition with false to ensure that a boolean value, not NULL, is negated by NOT. Otherwise, tnrs_other's query would exclude rows for which the condition returned NULL rather than false.
schemas/vegbien.sql: make_analytical_db(): Truncate tables before (re-)populating them in case make_analytical_db() has already been run
schemas/vegbien.sql: analytical_stem_view: Updated to use reference instead of party to store the datasource name
schemas/vegbien.sql: Renamed make_analytical_stem() back to make_analytical_db()
import_all: After starting geoscrub import in the background, wait for make commands to scroll by before starting NCBI import
mappings/VegCore-VegBIEN.csv: matched place: Also map verbatim place's geoscrub-related fields to the matched place, to link up with geoscrub's corresponding input place
import_all: Removed explicit by_col=1 from datasources that don't require it for proper import. (It will still be set if the user provides it on the command line.)
schemas/functions.sql: Removed duplicate previous version of _and() with fewer parameters
Regenerated vegbien.ERD exports
schemas/vegbien.sql: method: Made reference_id NOT NULL so the method table would automatically be datasource-scoped
mappings/VegCore-VegBIEN.csv: Renamed creator_ids to reference_id since they are now fkeys to reference
schemas/vegbien.sql: Made creator_ids an fkey to reference instead of party, so that datasources are stored separately from people and to allow adding reference-type metadata (URL, copyright, etc.) for each datasource
schemas/vegbien.sql: reference: Added unique index
mappings/VegCore-VegBIEN.csv: Remapped accordingTo to taxonconcept.concept_reference_id
mappings/VegCore-VegBIEN.csv: Don't populate taxonlabel.creator_id using identifiedBy, because that was meant for creating a taxonconcept when the accordingTo was not specified, but taxonconcepts are now modeled differently using a taxonconcept subclass of taxonlabel with a concept_reference_id
schemas/vegbien.sql: taxonverbatim: Renamed binomial to taxonname because this field is actually the taxonomic name without the author, not just the genus+specific epithet binomial
mappings/VegCore.csv: Renamed *Binomial to *TaxonName because this field can store more ranks than just the genus+specificEpithet binomial (that goes in speciesBinomial)
mappings/VegCore.csv: Renamed taxonName to taxonNameOrEpithet for clarity
schemas/vegbien.sql: analytical_aggregate_view: Added growthForm, cultivated, cultivatedBasis from analytical_stem. Note that this will create multiple entries for a taxon name when some taxonoccurrences are marked as cultivated and others aren't, or different taxonoccurrences have different growthform entries.
import_all: Added geoscrub import, which can happen concurrently with NCBI/TNRS but must come before the main datasources for the matched places to link up properly
import_all: Documented that TNRS import must come after NCBI for cross links to be made
input.Makefile: Staging tables installation: `%.sql: _MySQL/%.sql`, etc.: Only run if target does not exist, to avoid regenerating the target when a .make script which generates the target's prerequisite is checked out from svn (causing its mod time to be newer than the target)
Added inputs/.geoscrub/
input.Makefile: SVN: $(*svnFilesGlob): Added */, _MySQL/ subdirs
lib/common.Makefile: SVN: $(add*): Fixed bug where need to use --depth=empty to ensure directory contents are not added unless explicitly listed
mappings/VegCore-VegBIEN.csv: Mapped georeferenceValid, latLongInCountry, latLongInStateProvince
schemas/functions.sql: Added _and()
mappings/VegCore-VegBIEN.csv: Mapped distanceToCountry_km, distanceToStateProvince_km
schemas/vegbien.sql: Added _km_to_m()
schemas/vegbien.sql: place: Added georeference_valid, distance_to_country_m, distance_to_state_m
mappings/VegCore-VegBIEN.csv: matched taxonlabel's ancestors: Only create the cross links to NCBI if the name is accepted (taxonIsCanonical)
mappings/VegCore-VegBIEN.csv: accepted* taxonomic terms: If is canonical name, don't populate a separate accepted taxonlabel because the taxonlabel itself is accepted. If is not canonical name, populate matched_label_id directly when the row is created instead of waiting until after all rows are created and using _taxonlabel_set_matched_label_id(). This is possible now that accepted taxonlabels occur only in TNRS's tnrs_accepted table, which is specially marked as taxonIsCanonical. Avoiding _taxonlabel_set_matched_label_id() should fix the slowdown that occurs when it's used on large tables.
mappings/VegCore-VegBIEN.csv: taxonIsCanonical: Moved then=0 into non-identifying _if statement attributes
mappings/VegCore-VegBIEN.csv: Mapped taxonIsCanonical
bin/map: map_rows(): map_table(): Fixed bug where metadata values were being removed prematurely, by passing them through
bin/map: map_rows(): Fixed bug where metadata values were being passed to functions that expected columns, by placing them directly in the XML import tree and then removing them from the mappings
input.Makefile: Maps validation: %/new_terms.csv: Filter out metadata values, which are not new terms even though they are in the same map column as the input columns
inputs/.TNRS/tnrs_accepted/map.csv: Set taxonIsCanonical to true, using new metadata value syntax
mappings/VegCore.csv: Added taxonIsCanonical
input.Makefile: SVN: $(svnFiles): Added _MySQL files that would otherwise be excluded with $(filter-out _% ...)
inputs/*/: Placed all logs/ subdirs and import_order.txt under version control, using `make inputs/.../add`
input.Makefile: SVN: add: Factored list of files to add out into $(svnFiles) var
Added inputs/.TNRS/tnrs/header.csv
input.Makefile: SVN: $(svnFilesGlob): Added *.make
input.Makefile: SVN: add: Fixed bug where _-prefixed dirs (such as _MySQL) were incorrectly being included in the list of files to put under version control
Calls to `make inputs/.TNRS/cleanup`: Do `make inputs/.TNRS/tnrs_accepted/reinstall; make inputs/.TNRS/tnrs_other/reinstall` instead to use new split TNRS tables
inputs/.TNRS/schema.sql: Removed no longer needed tnrs_accepted_names index, now that the accepted names have been split into a separate table
inputs/.TNRS/: Split tnrs table into tnrs_accepted and tnrs_other, with tnrs_accepted imported first. This will eventually allow TNRS data to be imported without needing to use _taxonlabel_set_matched_label_id(), which is extremely slow due to an unknown bug in PostgreSQL's handling of UPDATEs on large tables (possibly limited to UPDATEs inside functions).
input.Makefile: SVN: add: Fixed bug where logs/*.log.sql was incorrectly being included in the list of files to put under version control
lib/common.Makefile: Filesystem: $(wildcard/): Fixed bug where {} exprs weren't handled correctly when some branches contained *, by using `shopt -s nullglob` and bash directly instead of trying to emulate it with grep
input.Makefile: Staging tables installation: `%/install: %/create.sql`: Also provide the schema to create.sql in :schema
bin/map: Added support for including literal metadata values in the map spreadsheet, by prefixing them with ':'
mappings/VegCore.csv: Added distanceToCountry_km, distanceToStateProvince_km
mappings/VegCore.csv: Added distanceToCountry_m, distanceToStateProvince_m
mappings/VegCore-VegBIEN.csv: Mapped acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude. Mapped decimalLatitude/Longitude to matched place's coordinates when acceptedDecimalLatitude/Longitude not provided (as is the case for the geoscrub table).
input.Makefile: SVN: $(svnFilesGlob): Added schema.sql
mappings/VegCore.csv: Replaced georeferenceInvalid with latLongDomainValid, latLongDomainInvalid because this field is actually just whether the coordinates are in the range of valid decimal values, not whether they are within the named place reference. Re-sourced georeferenceValid to bien_web.observation#isGeovalid rather than isValidLatLong. VegCore-VegBIEN.csv: Replaced now-removed georeferenceInvalid with latLongInCountry for locationID redirection checks.
mappings/VegCore-VegBIEN.csv: Map locationID to place.placecode instead when geovalidation columns are provided
mappings/VegCore-VegBIEN.csv: accepted* taxonlabel: Removed ancestor hierarchy because this is populated, in much greater detail, when the accepted name is imported as an input name and the TNRS-parsed components are available
sql.py: distinct_table(): Use DISTINCT ON instead of a unique index and insert_select()'s ignore mode to remove duplicate rows. This uses whichever sorting method PostgreSQL deems to be fastest instead of requiring the use of a B-tree index. Since most of the slower operations in TNRS's import are distinct_table() calls, this should speed up the TNRS import, which is a bottleneck for the DB import as a whole because the TNRS import must complete before other datasources can be imported.
sql.py: distinct_table(): Changed comment about distinct_on column index to include just the input table, so that the function does not guarantee a unique index on the output table's distinct_on columns
mappings/VegCore.csv: Added acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude
mappings/VegCore.csv: Renamed latLongValid, latLongInvalid to georeferenceValid, georeferenceInvalid to correspond to DwC term georeferenceVerificationStatus
mappings/VegCore.csv: Added latLongValid, latLongInvalid, latLongInCountry, latLongInStateProvince
input.Makefile: Staging tables installation: Treat any .sql file whose name contains (not just ends with) "schema" as a schema file and sort it before other .sql files
schemas/vegbien.sql: Functions containing UPDATE statements: Use quote_nullable() instead of quote_literal() to properly encode NULL values
schemas/vegbien.sql: Functions containing UPDATE statements: Use PL/pgSQL's EXECUTE statement to avoid caching query plans. This is necessary because as the table grows over time, the optimal query plan may change.
sql_io.py: put_table(): ensure_cond(): When deleting rows rows that do not satisfy the condition, handle sql.DoesNotExistExceptions caused by columns in the condition that were not replaced with NULL. These occur when out_table is a function, and the columns of the table the condition relates to therefore can't be found using out_table.
sql_io.py: put_table(): Calling function: Do not cache the function call, because it may be retried after error handling
sql_gen.py: NotCond: Treat a condition that evaluates to NULL as false instead, so that the boolean effect of the condition is completely inverted
sql_gen.py: null_as_str: Use new null instead of hardcoding 'NULL'
sql_gen.py: Added null
sql.py: run_query(): Give failed EXPLAIN approximately the log_level of its query, so that queries which produce an error in the EXPLAIN before the query itself is even run will still be logged
sql_io.py: put_table(): sql.DatabaseErrors: Factored exception-handling code out into handle_unknown_exc(), for use by other exception handlers
sql_io.py: put_table(): ensure_cond(): is_function: Fixed bug where can't replace out_table_cols with NULL because out_table is a function, not a table
my2pg*: Turn off escape_string_warning because \-escaped strings are standard in MySQL
my2pg.data: Turn off standard_conforming_strings like in my2pg
my2pg: Also remove any CHARACTER SET modifier on a column definition
schemas/vegbien.sql: analytical_aggregate_view: Make size classes cumulative ranges (stems above a certain DBH) rather than bins, per Brad's request
input.Makefile: SVN: add: Add header override files with any extension, not just .csv
README.TXT: Datasource setup: Replaced manual `svn add` commands with one `make inputs/<datasrc>/add` before committing to add newly-created files
input.Makefile: SVN: add: Also add any .sql, *when it's in a subdir. This applies to create.sql, cleanup.sql, etc.
lib/common.Makefile: SVN: Added $(add*)
input.Makefile: SVN: add: Also add any newly-created files which should be under version control
input.Makefile: Fixed bug where _MySQL/%.sql files weren't being built from associated .make files by adding special `%.sql: .sql.make` rule to override `.sql: _MySQL/%.sql`
input.Makefile: `%: .make`: Factored $(if $(wildcard $@)... test out into $(make_script) so all `: %.make`-like rules could use it directly
lib/forwarding.Makefile: $(subdirs): Use all folders other than ../ ./ .svn/ instead of listing folders that start with . explicitly
schemas/vegbien.sql: analytical_stem_view: Use accepted_taxonlabel.taxonomicname instead of accepted_taxonverbatim.taxonomicname in order to have the family prepended
schemas/vegbien.sql: placerank: Reordered in path order, using <http://rs.tdwg.org/dwc/terms/#dcindex> and <http://vegbank.org/vegbank/views/dba_fielddescription_detail.jsp?view=detail&wparam=1415&entity=dba_fielddescription¶ms=1415> as a guide. Documented the source of the values.
schemas/vegbien.sql: placename: Removed non-name-related fields, because placename is designed only to store a hierarchy of placenames, not additional place information
schemas/vegbien.sql: Moved placedescription from placename to place (and renamed it to description), because it applies to the place itself, not the name for the place
schemas/vegbien.sql: analytical_aggregate_view: Added coverPercent, which is the sum of all coverPercents for that species