lib/tnrs.py: switched to downloading all matches per name, as is needed to implement #917. note that this will break the parts of the schema that use the tnrs table, until Brad's match-picking algorithm can be implemented, but this tradeoff is necessary to be able to begin scrubbing sooner (Martha; wiki.vegpath.org/2014-05-29_conference_call#TNRS)
schemas/vegbien.sql: tnrs_input_name: don't scrub accepted names, as using multiple matches per name no longer provides a single accepted name to scrub. instead, the Accepted_* fields can be whitespace-split to generate the same columns that would have been generated by the scrubbing (and without the overhead of the extra TNRS call).
fix: inputs/.TNRS/schema.sql: added back index on Name_submitted, which is needed for tnrs_input_name to work properly (now that there is no automatic index created by a unique constraint)
fix: inputs/.TNRS/schema.sql: tnrs: removed unique constraint on Name_submitted, Name_matched because there can be more than one match with the same Name_matched (but different accepted names, etc.)
fix: inputs/.TNRS/schema.sql: tnrs.tnrs__valid_match index: made it non-unique to allow multiple matches per name, as is needed to implement #917
bugfix: inputs/.TNRS/schema.sql: tnrs__match_num__fill(): only fill if not set, to support case where tnrs is being restored from a .sql file (where match_num is already set)
inputs/.TNRS/schema.sql: tnrs: documented runtime to add a constraint (3 min)
inputs/.TNRS/schema.sql: unique constraint on Name_submitted: added Name_matched to allow multiple matches per name, as is needed to implement #917
inputs/.TNRS/schema.sql: tnrs: documented how to populate a new column
inputs/.TNRS/schema.sql: tnrs: pkey: use match_num instead of Name_number to allow multiple matches per name, as is needed to implement #917
inputs/.TNRS/schema.sql: tnrs.match_num: made it NOT NULL now that it's populated
inputs/.TNRS/schema.sql: tnrs: populate match_num
inputs/.TNRS/schema.sql: tnrs: documented how to add and remove columns
inputs/.TNRS/schema.sql: made COMMENTs start on their own line, using the steps at wiki.vegpath.org/Postgres_queries#make-COMMENTs-start-on-their-own-line
inputs/.TNRS/schema.sql: tnrs: added match_num
inputs/.TNRS/data.sql.run: refresh(): documented runtime (1 min)
inputs/.TNRS/schema.sql: added tnrs__match_num__next()
inputs/.TNRS/schema.sql: added tnrs__batch_begin() trigger to populate the match_num (match sort order)
inputs/.TNRS/schema.sql: taxon_scrub.scrubbed_unique_taxon_name.*: added scrubbed_taxon_name_with_author, needed by Jeff Ott's analysis (wiki.vegpath.org/Data_requests)
inputs/.TNRS/schema.sql: taxon_scrub: added scrubbed_morphospecies_binomial, analogous to accepted_morphospecies_binomial for scrubbed_*
inputs/.TNRS/schema.sql: taxon_scrub: documented how to modify it
inputs/.TNRS/schema.sql: added taxon_scrub_modify()
inputs/.TNRS/schema.sql: MatchedTaxon_modify(): use simpler util.recreate_view()
inputs/.TNRS/schema.sql: MatchedTaxon_modify(): documented usage
inputs/.TNRS/schema.sql: MatchedTaxon_modify(): removed no longer needed DROP VIEW statement
fix: schemas/util.sql: force_recreate(): renamed to just recreate(), because "force" normally implies that things will be deleted, which this function does not do
fix: inputs/.TNRS/schema.sql: MatchedTaxon.taxonomicStatus: filter using map_taxonomic_status() so that the corrected value is available in the normalized DB, not just analytical_stem
inputs/.TNRS/schema.sql: MatchedTaxon: to modify: use new MatchedTaxon_modify(), which eliminates the work of putting together the dependent views
inputs/.TNRS/schema.sql: added MatchedTaxon_modify()
bugfix: inputs/.TNRS/schema.sql: map_taxonomic_status(): need to use accepted name instead of scrubbed name (which also includes no-opinion names), as described at http://wiki.vegpath.org/2013-11-14_conference_call#taxonomic-fields. this used to be the accepted name, but got switched when the concatenated name was also used to store the matched name for no-opinion names.
inputs/.TNRS/schema.sql: MatchedTaxon: documented how to modify it (using util.force_recreate())
inputs/.TNRS/schema.sql: MatchedTaxon, etc.: added accepted_morphospecies_binomial derived field
inputs/.TNRS/schema.sql: MatchedTaxon.Accepted_name_species: mapped to accepted_species_binomial
fix: inputs/.TNRS/schema.sql: COMMENTs: always include newline before and after
bugfix: inputs/.TNRS/schema.sql: taxon_scrub, etc.: undid rename of accepted name columns to scrubbed_* (r13435), because these are actually not the same (scrubbed_* is the combination of accepted and no-opinion names). the accepted name columns will now be named accepted_*, following the standard naming scheme.
fix: inputs/.TNRS/schema.sql: taxon_scrub, etc.: scrubbed_*: use columns from MatchedTaxon whenever possible, to as much as possible avoid the need to join to taxon_scrub.scrubbed_unique_taxon_name.*
bugfix: inputs/.TNRS/grants.sql: added GRANT statements from schema.sql because these aren't run by `make inputs/.TNRS/reinstall`
inputs/input.Makefile: add: verify/: also svn:ignore *.log
fix: lib/runscripts/file.pg.sql.run: removed include of in_datasrc_dir.run, because this location does not apply to all .sql export scripts
*{.sh,run}: use new begin_target instead of `echo_func; set_make_vars`
inputs/input.Makefile: add!: verify/: also svn:ignore *.tsv, *.txt
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
bugfix: inputs/.TNRS/schema.sql: scrubbed_family: Name_matched_accepted_family was missing from the TNRS results at one point, so we are now using Family_matched as a workaround to populate this. the workaround is for accepted names only, as no opinion names do not have an Accepted_name_family to prepend to the scrubbed name to parse.
inputs/.TNRS/schema.sql: reexported from live DB, which changes the element order
bugfix: inputs/.TNRS/schema.sql: granted bien_read SELECT access to derived views as well as the core tnrs table
inputs/.TNRS/schema.sql: updated runtime (30 min) and rowcount (+2 million)
fix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: set this to false if Taxonomic_status is Invalid
inputs/.TNRS/schema.sql: added map_taxonomic_status()
inputs/.TNRS/schema.sql, data.sql: updated for PostgreSQL 9.3
inputs/.TNRS/schema.sql: tnrs_populate_fields(): regenerate the derived cols: updated runtime (40 min)
inputs/.TNRS/schema.sql: tnrs: removed no longer used Accepted_scientific_name. use scrubbed_unique_taxon_name instead.
inputs/.TNRS/schema.sql: MatchedTaxon, etc.: removed no longer used acceptedScientificName (from tnrs.Accepted_scientific_name). use scrubbed_unique_taxon_name instead.
inputs/.TNRS/schema.sql: removed no longer used AcceptedTaxon. use taxon_scrub.scrubbed_unique_taxon_name.* instead.
inputs/.TNRS/schema.sql: removed no longer used ScrubbedTaxon. use taxon_scrub instead.
inputs/.TNRS/schema.sql: added taxon_scrub, which combines ValidMatchedTaxon with scrubbed_unique_taxon_name.* instead of AcceptedTaxon
inputs/.TNRS/schema.sql: ValidMatchedTaxon: synced to MatchedTaxon
fix: inputs/.TNRS/schema.sql: scrubbed_taxon_name_with_author: renamed to scrubbed_unique_taxon_name because this also contains the family, and is therefore different from just the taxon name with author
inputs/.TNRS/schema.sql: MatchedTaxon: added scrubbed_taxon_name_with_author
inputs/.TNRS/schema.sql: tnrs: removed Is_homonym, since this did not take into account the never_homonym status (when the author disambiguates) or the ability of a non-homonym at a lower rank to override a homonym at a higher rank. taking these into account just produces the value of is_valid_match.
inputs/.TNRS/schema.sql: tnrs: removed Is_plant, since this functionality is now provided by is_valid_match. note that whether a name is a plant is not meaningful for TNRS, because it can match only plant names (thus a "non-plant" is actually a non-match).
inputs/.TNRS/schema.sql: tnrs: added scrubbed_taxon_name_with_author derived column, which uses the matched name when an accepted name is not available
inputs/.TNRS/schema.sql: tnrs: removed no longer used Max_score. use is_valid_match to determine validity instead.
bugfix: lib/runscripts/file.pg.sql.run: export_(): exclude Source and related tables so that these will be re-created by the staging tables installation instead, ensuring that they are always in sync with the Source/ subdir
inputs/.TNRS/data.sql: updated for new derived columns
inputs/.TNRS/schema.sql: removed no longer used score_ok(). use tnrs.Is_plant instead. (the threshold is still documented in tnrs_populate_fields().)
inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: don't consider Max_score because Is_plant will always be false when the Max_score is insufficient (<0.8)
inputs/.TNRS/schema.sql: schema comment: added steps to remake schema.sql and back up the new TNRS schema. documented that these steps should be run on vegbiendev.
inputs/.TNRS/schema.sql: schema comment: added steps to determine what changes need to be made on vegbiendev
inputs/.TNRS/schema.sql: tnrs_populate_fields(): regenerate the derived cols: updated runtimes (~same)
inputs/.TNRS/schema.sql: tnrs: moved instructions to apply schema changes on vegbiendev to the TNRS schema, because this applies to all elements in the TNRS schema, not just the tnrs table
inputs/.TNRS/schema.sql: score_ok(): don't make it STRICT because this prevents it from being inlined
inputs/.TNRS/schema.sql: tnrs: removed no longer used tnrs_score_ok index. use tnrs__valid_match instead.
bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: documented that this excludes homonyms because these are not valid matches (i.e. TNRS provides a name, but the name is not meaningful because it is not unambiguous)
bugfix: inputs/.TNRS/schema.sql: ValidMatchedTaxon: exclude inter-kingdom homonyms because these are not valid matches (i.e. TNRS provides a name, but the name is not meaningful because it is not unambiguous). this uses taxon_scrub__is_valid_match instead of score_ok() to determine whether the result should be included.
inputs/.TNRS/schema.sql: MatchedTaxon: added is_valid_match
inputs/.TNRS/schema.sql: tnrs: added tnrs__valid_match index to facilitate joining to only valid matches
inputs/.TNRS/schema.sql: tnrs: added is_valid_match derived column, to make it easier to select from only those TNRS results that can safely be used as a scrubbed name
fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix
inputs/.TNRS/schema.sql: added covering indexes on foreign keys where needed. this enables rows to be cascadingly deleted without a full table scan.
inputs/.TNRS/schema.sql: tnrs: instructions for when changing this table's schema: updated to use new `inputs/.TNRS/data.sql.run refresh`
inputs/.TNRS/data.sql.run: added refresh() target which runs inputs/test_taxonomic_names/test_scrub
inputs/.TNRS/schema.sql: tnrs: updated steps to run when changing this table's schema, to use new TNRS editing workflow
inputs/.TNRS/data.sql: re-ran TNRS using `inputs/test_taxonomic_names/test_scrub; rm=1 inputs/.TNRS/data.sql.run export_`
inputs/.TNRS/data.sql: generate from the DB using `rm=1 inputs/.TNRS/data.sql.run export_` instead of being a hand-edited file
added inputs/.TNRS/data.sql.run for syncing data.sql directly with the DB without needing to use inputs/test_taxonomic_names/test_scrub just to export the sample data. (however, when modifying the tnrs table, it may still be easier to generate new sample data using test_scrub rather than refactoring the table in place.)
added lib/runscripts/schema.pg.sql.run and use it in inputs/.TNRS/schema.sql.run
inputs/.TNRS/schema.sql: generate from the DB using `rm=1 inputs/.TNRS/schema.sql.run export_` instead of being a hand-edited file. this makes it much easier to edit the (now frequently-changing) TNRS schema directly in pgAdmin (which is graphical), rather than having to manually copy SQL changes from pgAdmin to the file.
inputs/.TNRS/schema.sql.run: export_(): added usage
added inputs/.TNRS/schema.sql.run, which syncs schema.sql with the DB
inputs/.TNRS/schema.sql: moved source code comments to in-schema COMMENT ON comments so all the info in schema.sql is in the DB
inputs/.TNRS/schema.sql: views that use * as the column list: added comments to indicate that this is the case, so that the views can be updated in place rather than only by reinstalling the TNRS schema
inputs/.TNRS/schema.sql: tnrs: util.set_col_types() runtime: updated for most recent ALTER COLUMN TYPE command (9 min)
inputs/.TNRS/schema.sql: tnrs.Time_submitted: renamed to batch and added fkey to batch.id. this requires including the batch table in inputs/.TNRS/data.sql, so that the fkey is satisfied (batch entries are already added by bin/tnrs_db.
inputs/.TNRS/schema.sql: batch: reset name of id_by_time unique constraint since this field is now in the batch table
inputs/.TNRS/schema.sql: download_settings: renamed to batch_download_settings because this table is actually specific to the batch, and it does not make sense to have a download settings file without a batch