inputs/input.Makefile: add!: verify/: also svn:ignore *.tsv, *.txt
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
bugfix: inputs/.TNRS/schema.sql: scrubbed_family: Name_matched_accepted_family was missing from the TNRS results at one point, so we are now using Family_matched as a workaround to populate this. the workaround is for accepted names only, as no opinion names do not have an Accepted_name_family to prepend to the scrubbed name to parse.
inputs/.TNRS/schema.sql: reexported from live DB, which changes the element order
bugfix: inputs/.TNRS/schema.sql: granted bien_read SELECT access to derived views as well as the core tnrs table
inputs/.TNRS/schema.sql: updated runtime (30 min) and rowcount (+2 million)
fix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: set this to false if Taxonomic_status is Invalid
inputs/.TNRS/schema.sql: added map_taxonomic_status()
inputs/.TNRS/schema.sql, data.sql: updated for PostgreSQL 9.3
inputs/.TNRS/schema.sql: tnrs_populate_fields(): regenerate the derived cols: updated runtime (40 min)
inputs/.TNRS/schema.sql: tnrs: removed no longer used Accepted_scientific_name. use scrubbed_unique_taxon_name instead.
inputs/.TNRS/schema.sql: MatchedTaxon, etc.: removed no longer used acceptedScientificName (from tnrs.Accepted_scientific_name). use scrubbed_unique_taxon_name instead.
inputs/.TNRS/schema.sql: removed no longer used AcceptedTaxon. use taxon_scrub.scrubbed_unique_taxon_name.* instead.
inputs/.TNRS/schema.sql: removed no longer used ScrubbedTaxon. use taxon_scrub instead.
inputs/.TNRS/schema.sql: added taxon_scrub, which combines ValidMatchedTaxon with scrubbed_unique_taxon_name.* instead of AcceptedTaxon
inputs/.TNRS/schema.sql: ValidMatchedTaxon: synced to MatchedTaxon
fix: inputs/.TNRS/schema.sql: scrubbed_taxon_name_with_author: renamed to scrubbed_unique_taxon_name because this also contains the family, and is therefore different from just the taxon name with author
inputs/.TNRS/schema.sql: MatchedTaxon: added scrubbed_taxon_name_with_author
inputs/.TNRS/schema.sql: tnrs: removed Is_homonym, since this did not take into account the never_homonym status (when the author disambiguates) or the ability of a non-homonym at a lower rank to override a homonym at a higher rank. taking these into account just produces the value of is_valid_match.
inputs/.TNRS/schema.sql: tnrs: removed Is_plant, since this functionality is now provided by is_valid_match. note that whether a name is a plant is not meaningful for TNRS, because it can match only plant names (thus a "non-plant" is actually a non-match).
inputs/.TNRS/schema.sql: tnrs: added scrubbed_taxon_name_with_author derived column, which uses the matched name when an accepted name is not available
inputs/.TNRS/schema.sql: tnrs: removed no longer used Max_score. use is_valid_match to determine validity instead.
bugfix: lib/runscripts/file.pg.sql.run: export_(): exclude Source and related tables so that these will be re-created by the staging tables installation instead, ensuring that they are always in sync with the Source/ subdir
inputs/.TNRS/data.sql: updated for new derived columns
inputs/.TNRS/schema.sql: removed no longer used score_ok(). use tnrs.Is_plant instead. (the threshold is still documented in tnrs_populate_fields().)
inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: don't consider Max_score because Is_plant will always be false when the Max_score is insufficient (<0.8)
inputs/.TNRS/schema.sql: schema comment: added steps to remake schema.sql and back up the new TNRS schema. documented that these steps should be run on vegbiendev.
inputs/.TNRS/schema.sql: schema comment: added steps to determine what changes need to be made on vegbiendev
inputs/.TNRS/schema.sql: tnrs_populate_fields(): regenerate the derived cols: updated runtimes (~same)
inputs/.TNRS/schema.sql: tnrs: moved instructions to apply schema changes on vegbiendev to the TNRS schema, because this applies to all elements in the TNRS schema, not just the tnrs table
inputs/.TNRS/schema.sql: score_ok(): don't make it STRICT because this prevents it from being inlined
inputs/.TNRS/schema.sql: tnrs: removed no longer used tnrs_score_ok index. use tnrs__valid_match instead.
bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: documented that this excludes homonyms because these are not valid matches (i.e. TNRS provides a name, but the name is not meaningful because it is not unambiguous)
bugfix: inputs/.TNRS/schema.sql: ValidMatchedTaxon: exclude inter-kingdom homonyms because these are not valid matches (i.e. TNRS provides a name, but the name is not meaningful because it is not unambiguous). this uses taxon_scrub__is_valid_match instead of score_ok() to determine whether the result should be included.
inputs/.TNRS/schema.sql: MatchedTaxon: added is_valid_match
inputs/.TNRS/schema.sql: tnrs: added tnrs__valid_match index to facilitate joining to only valid matches
inputs/.TNRS/schema.sql: tnrs: added is_valid_match derived column, to make it easier to select from only those TNRS results that can safely be used as a scrubbed name
fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix
inputs/.TNRS/schema.sql: added covering indexes on foreign keys where needed. this enables rows to be cascadingly deleted without a full table scan.
inputs/.TNRS/schema.sql: tnrs: instructions for when changing this table's schema: updated to use new `inputs/.TNRS/data.sql.run refresh`
inputs/.TNRS/data.sql.run: added refresh() target which runs inputs/test_taxonomic_names/test_scrub
inputs/.TNRS/schema.sql: tnrs: updated steps to run when changing this table's schema, to use new TNRS editing workflow
inputs/.TNRS/data.sql: re-ran TNRS using `inputs/test_taxonomic_names/test_scrub; rm=1 inputs/.TNRS/data.sql.run export_`
inputs/.TNRS/data.sql: generate from the DB using `rm=1 inputs/.TNRS/data.sql.run export_` instead of being a hand-edited file
added inputs/.TNRS/data.sql.run for syncing data.sql directly with the DB without needing to use inputs/test_taxonomic_names/test_scrub just to export the sample data. (however, when modifying the tnrs table, it may still be easier to generate new sample data using test_scrub rather than refactoring the table in place.)
added lib/runscripts/schema.pg.sql.run and use it in inputs/.TNRS/schema.sql.run
inputs/.TNRS/schema.sql: generate from the DB using `rm=1 inputs/.TNRS/schema.sql.run export_` instead of being a hand-edited file. this makes it much easier to edit the (now frequently-changing) TNRS schema directly in pgAdmin (which is graphical), rather than having to manually copy SQL changes from pgAdmin to the file.
inputs/.TNRS/schema.sql.run: export_(): added usage
added inputs/.TNRS/schema.sql.run, which syncs schema.sql with the DB
inputs/.TNRS/schema.sql: moved source code comments to in-schema COMMENT ON comments so all the info in schema.sql is in the DB
inputs/.TNRS/schema.sql: views that use * as the column list: added comments to indicate that this is the case, so that the views can be updated in place rather than only by reinstalling the TNRS schema
inputs/.TNRS/schema.sql: tnrs: util.set_col_types() runtime: updated for most recent ALTER COLUMN TYPE command (9 min)
inputs/.TNRS/schema.sql: tnrs.Time_submitted: renamed to batch and added fkey to batch.id. this requires including the batch table in inputs/.TNRS/data.sql, so that the fkey is satisfied (batch entries are already added by bin/tnrs_db.
inputs/.TNRS/schema.sql: batch: reset name of id_by_time unique constraint since this field is now in the batch table
inputs/.TNRS/schema.sql: download_settings: renamed to batch_download_settings because this table is actually specific to the batch, and it does not make sense to have a download settings file without a batch
inputs/.TNRS/schema.sql: download_settings.id: added fkey to batch.id to create a 1:1 relationship with optional participation by download_settings. note that this relationship happens to be the same as SQL inheritance, as used in VegCore, but in this case, the 1:1 relationship is not related to inheritance.
inputs/.TNRS/schema.sql: client_version: added table, column comments with info on how to retrieve each value
inputs/.TNRS/schema.sql: added client_version table for svn revisions, with fkey from batch
inputs/.TNRS/schema.sql: added batch table and moved download_settings.time_submitted, id_by_time to it since these are not related to the download_settings file
inputs/.TNRS/schema.sql: added VegCore-style id column as the primary key, instead of using time_submitted directly. this enables always using the same name for the pkey. the pkey is now autopopulated from time_submitted in a trigger, using helper column id_by_time. the user is now also able to specify their own globally-unique ID that is not based on the time_submitted.
inputs/.TNRS/schema.sql: download_settings comment: changed name of button to Download settings, which had gotten auto-replaced to download_settings
inputs/.TNRS/schema.sql: Download settings table: renamed to download_settings because although Download settings is the verbatim name of the button that this info comes from, it is not necessary to name the table a particular way in order to match up data to it correctly, so we can just use the standard naming convention (wiki.vegpath.org/u-name#format) and avoid the need to enclose the name in ""
inputs/.TNRS/schema.sql: added Download settings table, which stores data from http://tnrs.iplantcollaborative.org/TNRSapp.html > Submit List > results section > Download settings > settings.txt
inputs/.TNRS/Source/map.csv: mapped datasetURL
inputs/.TNRS/: switched to new-style import. because this does not have data subdirs (data comes from the TNRS client), this is just a matter of adding ./run.
inputs/.TNRS/Source/: switched to new-style import. this had been missed when all the Source/ subdirs were batch-switched to new-style import.
inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtimes. it now takes 25 min instead of 16 min to regenerate the derived cols.
inputs/*/*/map.csv for CSV tables with a row_num column: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table
bugfix: inputs/*/Source/map.csv: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table. the staging table column renaming is now used by all Source tables.
inputs/*/: added table.run for use by the table subdirs in new-style import. datasources without table subdirs do not need this.
inputs/*/: added top-level Makefile which includes inputs/input.Makefile, so that make can be run directly on the datasrc dir without needing to specify `--makefile=../input.Makefile` (see input.Makefile $(selfMake))
bugfix: inputs/*/Source/: use installed staging table (with blank-line data.csv) in order to also work with new-style import. this also fixes a benign diff between the by-row and by-col test outputs, where row-based import would not import the Source/ entries because there was not at least one row in the input. note that in order to ensure that all datasources are properly run, you need to check `svn st|sort` against the datasource schema names to see if any are missing.
added inputs/.TNRS/grants.sql, with statements to provide SELECT access to bien_read. these statements must be in grants.sql to avoid them being filtered out by pg_dump_limit.
inputs/.TNRS/schema.sql: AcceptedTaxon: removed Annotations entry because the accepted name only contains name elements, not additional text (vegpath.org/cf_aff)
inputs/.TNRS/schema.sql: tnrs: vegbiendev update steps: added `make backups/TNRS.backup-remake` to back up TNRS before making changes to it. this provides a more recent restore point than the last import in case the changes mess things up. (however, the last import's backup is usually sufficient unless TNRS has been run since then.)
inputs/.TNRS/schema.sql: tnrs_populate_fields(): added VACUUM ANALYZE and runtime (50 s)
inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtime (16 min)
inputs/.TNRS/schema.sql: tnrs: documented that when changing this table's schema, you must also make the same changes on vegbiendev. included sample util.set_col_types() call with runtime (4 min).
bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): need to schema-qualify invoked functions
bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): Is_homonym: use the *_is_homonym flag for whichever of genus or family (in that order) is NOT NULL, rather than horizontal-ORing potentially NULL values together
bugfix: inputs/.TNRS/schema.sql: family_is_homonym(), genus_is_homonym(): need to return NULL instead of false when input family/genus is NULL. EXISTS does not support this, so STRICT is used to provide this functionality automatically.
inputs/.TNRS/schema.sql: added family_is_homonym(), genus_is_homonym() and use them in tnrs_populate_fields()
inputs/.TNRS/schema.sql: score_ok(): changed to IMMUTABLE and STRICT
inputs/.TNRS/schema.sql: tnrs_populate_fields(): never_homonym: use Author_score threshold to exclude matches that are too fuzzy to confirm the presence of a plant name author
bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): *_is_homonym: also need to check that there was no Author_matched (i.e. that it could be a homonym). Is_homonym: use new never_homonym var.
inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtime (18 min)
inputs/.TNRS/schema.sql: tnrs_populate_fields(): when changing this function: UPDATE statement: include TNRS schema since it may not be in the search_path
inputs/.TNRS/schema.sql: tnrs_populate_fields(): Is_plant: also consider homonyms using new family_is_homonym, genus_is_homonym (see wiki.vegpath.org/Result_filtering#taxon_is_plant)
inputs/.TNRS/schema.sql: tnrs: added Is_homonym derived col (uses IRMNG.family_homonym_epithet, genus_homonym_epithet)
inputs/.TNRS/schema.sql: tnrs_populate_fields(): removed no longer needed casts of *_score to double precision
inputs/.TNRS/schema.sql: tnrs: *_score: changed type to double precision because these fields are always floats. this also avoids the need to manually cast them to double precision each time they are used.
inputs/.TNRS/schema.sql: tnrs: Name_number: changed type to integer so it would sort numerically
inputs/.TNRS/schema.sql: added pkey on Time_submitted, Name_number
inputs/.TNRS/schema.sql: changed Name_submitted pkey to a unique constraint to allow adding a pkey on Time_submitted, Name_number instead