mappings/VegCore-VegBIEN.csv: Don't create NCBI crosslinks for the matched taxonomic name. These crosslinks are no longer needed now that TNRS provides a separate accepted name on which crosslinks can be made.
schemas/vegbien.sql: unscrubbed_taxondetermination_view: Include the accepted name's row next to the matched name's row instead of merging the two together into one TNRS row, to allow including separate taxondeterminations for the matched and accepted names. Added Max_score from TNRS.tnrs.
schemas/vegbien.sql: taxondetermination_set_iscurrent(): Added the new determinationtype accepted to the sort order
mappings/VegCore-VegBIEN.csv: Mapped the accepted* taxonomic name fields, now to a separate accepted taxondetermination
mappings/VegCore.csv: Regenerated from wiki
schemas/vegbien.sql: taxondetermination_set_iscurrent(): Changed TNRS determinationtype from computer to matched, to allow for a separate accepted determinationtype
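  For illustration, a minimal sketch of how a sort order over these determinationtypes could pick the current determination. The ranking, the UPDATE form, and the column names are assumptions, not the actual body of taxondetermination_set_iscurrent():

      -- hypothetical sketch: mark the highest-priority determination per
      -- taxonoccurrence as current; the relative priorities are illustrative
      UPDATE taxondetermination t
      SET    iscurrent = (t.taxondetermination_id = best.taxondetermination_id)
      FROM  (
          SELECT DISTINCT ON (taxonoccurrence_id)
                 taxonoccurrence_id, taxondetermination_id
          FROM   taxondetermination
          ORDER BY taxonoccurrence_id,
                   CASE determinationtype  -- lower number = higher priority
                       WHEN 'accepted' THEN 1
                       WHEN 'matched'  THEN 2
                       ELSE 3
                   END,
                   determinationdate DESC NULLS LAST
      ) best
      WHERE t.taxonoccurrence_id = best.taxonoccurrence_id;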
schemas/vegbien.sql: taxonlabel: Removed creationdate, which duplicates taxondetermination.determinationdate
schemas/vegbien.sql: analytical_stem_view: isNewWorld: Removed no longer needed COALESCE to false, because newWorldCountries now uses false where applicable instead of NULL. This also ensures that isNewWorld will be NULL if there is no country name to test, which was not the case in the previous workaround.
Added inputs/newWorld/newWorldCountries/ with postprocess.sql that sets isNewWorld to false wherever it's NULL. (The input table only marks New World countries as true, but doesn't mark non-New World countries as false.)
schemas/vegbien.sql: analytical_stem_view: isNewWorld: Fixed bug where "newWorldCountries"."isNewWorld" needed to be COALESCEd to false, because it is only set to a boolean for New World countries
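  The approach in the preceding three entries, sketched roughly. The column names and the exact contents of postprocess.sql and analytical_stem_view are assumptions:

      -- hypothetical inputs/newWorld/newWorldCountries/postprocess.sql: the
      -- source table only marks New World countries as true, so fill in false
      -- for every other country it lists
      UPDATE "newWorldCountries"
      SET    "isNewWorld" = false
      WHERE  "isNewWorld" IS NULL;

      -- analytical_stem_view can then use the flag as-is (no COALESCE), so a
      -- row with no country to test keeps isNewWorld NULL instead of becoming
      -- false; the join below is illustrative, with an assumed country column
      SELECT place.country, "newWorldCountries"."isNewWorld"
      FROM   place
      LEFT JOIN "newWorldCountries" ON "newWorldCountries".country = place.country;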
README.TXT: Full database import: freeing disk space: Updated import schema size, which is smaller due to the removed CTFS staging tables, removed duplicate rows, and possibly fewer index holes
README.TXT: Full database import: After running `make schemas/$version/publish`, added `unset version` to make sure future version-dependent commands use the public schema
schemas/vegbien.sql: taxon_trait_view: Fixed bug where measurementUnit needed to be set to trait.units, not name
schemas/vegbien.sql: provider_count_view: Don't set default values for sourcetype/observationtype, because the appropriate values are now set for all top-level inputs and these defaults are not applicable for data owners not in geoscrub.herbaria
inputs/bien2_traits/Source/map.csv: Mapped observationType
schemas/vegbien.sql: taxondetermination: Removed taxondetermination_computer_min_fit CHECK constraint, whose functionality is now duplicated by unscrubbed_taxondetermination_view's Max_score filter condition. The score threshold value should only be maintained in one place, namely unscrubbed_taxondetermination_view.
schemas/vegbien.sql: unscrubbed_taxondetermination_view: Fixed bug where names that will be rejected by taxondetermination's constraints needed to be filtered out; otherwise, these names stay in unscrubbed_taxondetermination_view and are repeatedly reimported
inputs/.TNRS/schema.sql: tnrs: Added Max_score column for use in filtering out names that will be rejected by taxondetermination's constraints
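  A hedged sketch of the resulting filter; the threshold value and the exact WHERE clause in unscrubbed_taxondetermination_view are assumptions:

      -- hypothetical: only pass TNRS matches whose Max_score is high enough to
      -- satisfy taxondetermination's constraints; 0.5 is an illustrative value,
      -- not the project's actual threshold
      SELECT tnrs."Name_submitted"
      FROM   "TNRS".tnrs
      WHERE  tnrs."Max_score" >= 0.5;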
inputs/.TNRS/schema.sql: Renamed tnrs_populate_accepted_scientific_name() trigger to tnrs_populate_derived_fields() to accommodate additional derived fields
tnrs_db: Support multiple appended columns in the tnrs table
csvs.py: ColInsertFilter: Support adding multiple consecutive columns
schemas/functions.sql: _max(), _min(): Put $n params all on one line to match other aggregating functions
schemas/functions.sql: _max(), _min(): Use PostgreSQL built-in functions GREATEST, LEAST instead of a query with aggregating functions
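  A sketch of the rewritten functions, assuming a simple two-parameter signature; the real functions in schemas/functions.sql may take more parameters:

      CREATE OR REPLACE FUNCTION functions._max(anyelement, anyelement)
        RETURNS anyelement LANGUAGE sql IMMUTABLE
        AS $$ SELECT GREATEST($1, $2) $$;
      -- GREATEST/LEAST ignore NULL arguments and return NULL only if all
      -- arguments are NULL, so no aggregate query is needed
      CREATE OR REPLACE FUNCTION functions._min(anyelement, anyelement)
        RETURNS anyelement LANGUAGE sql IMMUTABLE
        AS $$ SELECT LEAST($1, $2) $$;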
README.TXT: Added Single datasource import section with commands to import/reimport/scrub just a datasource rather than the full DB
schemas/vegbien.sql: taxondetermination: taxondetermination_set_iscurrent_on_delete() trigger: Fixed bug where the foreign key exception that occurs during a cascading delete needed to be suppressed: the associated taxonoccurrence has already been deleted, which prevents any other taxondeterminations of that taxonoccurrence from being updated
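  Roughly what suppressing the exception might look like in PL/pgSQL; the body below, including the inner UPDATE, is a hypothetical stand-in for the actual trigger function:

      CREATE OR REPLACE FUNCTION taxondetermination_set_iscurrent_on_delete()
        RETURNS trigger LANGUAGE plpgsql AS $$
      BEGIN
          BEGIN
              -- hypothetical stand-in for the trigger's real update of the
              -- remaining determinations of the deleted row's taxonoccurrence
              UPDATE taxondetermination SET iscurrent = iscurrent
              WHERE  taxonoccurrence_id = OLD.taxonoccurrence_id;
          EXCEPTION WHEN foreign_key_violation THEN
              NULL;  -- cascading delete: the taxonoccurrence is already gone,
                     -- so there is nothing left to update
          END;
          RETURN OLD;
      END;
      $$;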
input.Makefile: Taxonomic scrubbing: Added reimport_scrub
input.Makefile: Import to VegBIEN: Added reimport
input.Makefile: Taxonomic scrubbing: Added rescrub
input.Makefile: Taxonomic scrubbing: Added scrub target and use it in import_scrub
input.Makefile: Import to VegBIEN: Moved import, rm to top of section since they are top-level targets and don't depend on the variables defined for %/import
input.Makefile: Moved rm to Import to VegBIEN section
input.Makefile: Moved taxonomic scrubbing targets to separate Taxonomic scrubbing section
inputs/import.stats.xls: Updated import times
schemas/vegbien.sql: provider_count_view: Include only sources with at least one row. Currently (as of r7023), all entries in BIEN2's geoscrub.herbaria are also in VegBIEN, so the filter is not yet necessary, but switching to bien3_adb.ih could create source entries without data rows which should be excluded from the providers list.
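  A hedged sketch of such a filter; the table and column names are assumptions about provider_count_view's shape, not its actual definition:

      -- hypothetical: list each data provider with its row count, excluding
      -- providers that contributed no rows
      SELECT source.shortname, count(analytical_stem.datasource) AS row_count
      FROM   source
      LEFT JOIN analytical_stem ON analytical_stem.datasource = source.shortname
      GROUP BY source.shortname
      HAVING count(analytical_stem.datasource) > 0;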
import_all: Output the PIDs of the import_scrub and after_import processes, so those processes can be managed without shell job control. This is useful if the connection to the remote shell running the import is lost, which prevents using job control on the import processes.
input.Makefile: Import to VegBIEN: import_scrub: Run `make scrub` in the background, to allow the import to continue with the next table rather than having to wait for the current table to be scrubbed
inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Moved waitself call to top of script
inputs/import.stats.xls: Added Postprocessing section for use with the next import
inputs/import.stats.xls: Updated import times. Total does not yet include postprocessing.
import_times: Add blank line before "Postprocessing logs" to separate it from the input logs
import_times: Separate out the postprocessing logs (e.g. public.unscrubbed_taxondetermination_view), as the import times in these logs are not aggregated together (each input has its own run of the postprocessing script)
root Makefile: Datasources: import: Use new import_scrub instead of import (input.Makefile)
import_all: Use new import_scrub (input.Makefile) instead of import, which avoids needing to start background processes for tnrs-remake and scrub-remake
inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Fixed bug where tnrs.make's lockfile needed to be used instead, because importing must not run while tnrs.make is scrubbing. tnrs.make leaves tnrs in an incomplete state while running, because the accepted names are parsed after their matched names, so using a separate lockfile would cause some accepted names to be missing.
input.Makefile: Import to VegBIEN: Added import_scrub, which runs `make scrub` after the import
root Makefile: Datasources: Added scrub, which runs tnrs-remake and scrub-remake
inputs/.TNRS/*/*.make: Only allow one instance of the script to be running at any time, by using new waitself
waitpid, lockfile: Changed $interval default to 5s to work with smaller imports, where less waiting is needed
Added waitself
bin/lockfile: Include the PID in the lockfile to avoid the need to manually remove lockfiles. On Mac, this requires using shlock instead of lockfile.
Added bin/lockfile
Added pid2name
Added name2pids
waitpid: Use `ps` instead of /proc to also work on Mac
inputs/.TNRS/tnrs/tnrs.make: Fixed bug where special handling was needed to support being run as a .make script
inputs/.geoscrub/_src/README.TXT: Added dates for e-mails from Jim
inputs/.geoscrub/_src/README.TXT: Added e-mail from Jim about repository with scripts to generate the geoscrub_output table
schemas/vegbien.sql: unscrubbed_taxondetermination_view: Fixed bug where tnrs_accepted.Name_submitted IS NOT NULL needed to be used rather than tnrs_accepted.* IS NOT NULL, because tnrs_accepted.* (which PostgreSQL expands plain tnrs_accepted to) checks each field of the tnrs_accepted tuple rather than checking whether the tuple itself is NULL
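  The distinction, shown on a throwaway row value (the behavior below is standard PostgreSQL; only the choice of column above is specific to this view):

      -- for a row value, IS NULL / IS NOT NULL test every field, so a row with
      -- a mix of NULL and non-NULL fields fails both tests:
      SELECT ROW(1, NULL) IS NULL     AS all_fields_null,      -- false
             ROW(1, NULL) IS NOT NULL AS all_fields_non_null;  -- false
      -- testing a NOT NULL column such as Name_submitted instead reliably
      -- detects whether the left join found a tnrs_accepted row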
inputs/.TNRS/schema.sql: Added back tnrs+accepted view, which is useful for debugging the import of the TNRS results
inputs/REMIB/Specimen/postprocess.sql: Added back ARIZ, NY because some REMIB specimens for these datasources are not yet in the datasources themselves
Added inputs/REMIB/Specimen/postprocess.sql to remove institutions that we have direct data for
Placed inputs/REMIB/_archive/ under version control
Added inputs/SpeciesLink/Specimen/postprocess.sql to remove institutions that we have direct data for
Placed inputs/SpeciesLink/_archive/ under version control
input.Makefile: $(import?): Renamed $public_import option to $full_import because it applies to any import of all datasources, not just a public import on vegbiendev
schemas/vegbien.sql: analytical_stem_view: Changed `WHERE COALESCE` to a join condition to enable using the taxondetermination_single_current_determination index, which produces the filtered rows directly. Note that this index will not be used for full-database imports, because the query planner uses hash joins everywhere instead of nested loops.
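  A before/after sketch under assumed names; the actual view and the definition of the taxondetermination_single_current_determination partial index may differ:

      -- before (assumed form): the filter runs on the join output
      --   ... LEFT JOIN taxondetermination USING (taxonoccurrence_id)
      --   WHERE COALESCE(taxondetermination.iscurrent, false)

      -- after: the same predicate as a join condition, so the planner can fetch
      -- just the matching rows via the partial index on iscurrent
      SELECT taxonoccurrence.taxonoccurrence_id
      FROM   taxonoccurrence
      JOIN   taxondetermination
             ON  taxondetermination.taxonoccurrence_id = taxonoccurrence.taxonoccurrence_id
             AND taxondetermination.iscurrent;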
db_xml.py: put_table(): Fixed bug where, for views, start (the OFFSET clause) should not be advanced after each chunk, because views are typically dynamic and will contain a new set of rows after the first set is imported
sql.py: Added view_exists()
inputs/.TNRS/schema.sql: Removed no longer used tnrs_canon. unscrubbed_taxondetermination_view uses its definition directly instead.
schemas/vegbien.sql: unscrubbed_taxondetermination_view: Added comment from tnrs_canon
schemas/vegbien.sql: unscrubbed_taxondetermination_view: Do the tnrs_canon joins manually instead of using tnrs_canon, to allow PostgreSQL to use a nested loop join on just the needed tnrs rows instead of a hash self-join of all tnrs rows. The query planner is not yet advanced enough to integrate the view's SELECT into the top-level join list on its own, which would accomplish this change automatically.
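  Approximately what the manual self-join looks like; Name_submitted and Accepted_scientific_name are taken from the surrounding entries, while the column list and join form are assumptions:

      SELECT tnrs.*, tnrs_accepted."Name_submitted" AS accepted_name_submitted
      FROM   "TNRS".tnrs
      LEFT JOIN "TNRS".tnrs tnrs_accepted
             ON tnrs_accepted."Name_submitted" = tnrs."Accepted_scientific_name";
      -- with an index on Name_submitted, the planner can nested-loop into just
      -- the accepted-name rows it needs, instead of hashing the whole table
      -- against itself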
inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: rowsAdded(): Look at last 100 rows instead of last 10, because rows are added to the log file each time the script waits and the "Inserted # new rows" message must be in the tailed rows
inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: rowsAdded(): Fixed bug where the log file's existence needed to be tested before using it in tail: if tail fails and causes rowsAdded to return false, that error exit status is indistinguishable from the false that means no rows were added, so the script keeps going
inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Fixed bug where special handling was needed to support being run as a .make script
input.Makefile: Editing import: Added unscrub to remove TNRS taxondeterminations
psql_script_vegbien: Added no_query_results option to hide results of calls to void functions
schemas/vegbien.sql: Added delete_scrubbed_taxondeterminations()
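  A sketch of what such a function might contain; the real delete_scrubbed_taxondeterminations() in schemas/vegbien.sql may filter differently or take parameters:

      CREATE OR REPLACE FUNCTION delete_scrubbed_taxondeterminations()
        RETURNS void LANGUAGE sql AS $$
          -- remove the TNRS-generated determinations so they can be regenerated
          DELETE FROM taxondetermination
          WHERE determinationtype IN ('matched', 'accepted');
      $$;

  Calling it via SELECT returns a void result, which is the kind of output the no_query_results option above hides.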
root Makefile: python-Darwin: Added instructions to install dateutil for Python 3 as well as Python 2, for use in PL/Python functions
root Makefile: python-Darwin: Added note that Python 2 comes preinstalled
Added inputs/GBIF/Specimen/postprocess.sql to remove institutions that we have direct data for
import_all: Run disown_all after background processes have been created, so that they will not be aborted if the shell exits (e.g. due to a broken connection). Note that with_all processes are automatically disowned as they are created, but other processes, such as after_import, were not.
inputs/.TNRS/schema.sql: Removed no longer used array_to_string(). The IMMUTABLE wrapper is only needed for index conditions and other places that require an IMMUTABLE function.
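  For context, a sketch of the kind of wrapper being removed; the name and signature below are illustrative, not the actual removed function:

      -- the built-in array_to_string() is only STABLE, so an index expression
      -- (which requires IMMUTABLE functions) needs a thin shim like this:
      CREATE OR REPLACE FUNCTION array_to_string_immutable(anyarray, text)
        RETURNS text LANGUAGE sql IMMUTABLE
        AS $$ SELECT array_to_string($1, $2) $$;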
input.Makefile: Maps validation: %/new_terms.csv: Filter out terms that map to UNUSED, because these are not mappings that are useful as VegCore synonyms
README.TXT: Data import: Checking free disk space: Updated import schema size to 110GB
Added inputs/Madidi/_README.TXT
new_terms.csv: Regenerated
inputs/Madidi/new_terms.csv: Regenerated
inputs/Madidi/_archive/2010-1-2/: Set svn:ignore
inputs/Madidi/_README.TXT: Archived to _archive/2010-1-2/
inputs/Madidi/: Refreshed. Note that new export has a completely new schema.
mappings/VegCore-VegBIEN.csv: fieldNumber (authorEventCode): Fixed bug where locationevent.authorlocationcode should have been authoreventcode
Added inputs/Madidi/map.csv, created from new_terms.csv
inputs/Madidi/_archive/: Set svn:ignore
csvs.py: sniff(): TSVs: Don't turn off quoting, because some TSVs (such as Madidi.IndividualObservation) do quote fields
csvs.py: TsvReader: Use csv.reader.next() when possible to support quoted fields, such as in Madidi.IndividualObservation