/trunk/inputs/.TNRS - Changes - BIEN 3 - NCEAS Projects

root/trunk/inputs/.TNRS @ 13601

svn:ignore: *

#	Date	Author	Comment
13591	06/02/2014 04:50 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: switched to downloading all matches per name, as is needed to implement #917. note that this will break the parts of the schema that use the tnrs table, until Brad's match-picking algorithm can be implemented, but this tradeoff is necessary to be able to begin scrubbing sooner (Martha; wiki.vegpath.org/2014-05-29_conference_call#TNRS)
13590	06/02/2014 04:35 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: tnrs_input_name: don't scrub accepted names, as using multiple matches per name no longer provides a single accepted name to scrub. instead, the Accepted_* fields can be whitespace-split to generate the same columns that would have been generated by the scrubbing (and without the overhead of the extra TNRS call).
13589	06/02/2014 04:27 AM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: added back index on Name_submitted, which is needed for tnrs_input_name to work properly (now that there is no automatic index created by a unique constraint)
13587	06/02/2014 03:43 AM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: tnrs: removed unique constraint on Name_submitted, Name_matched because there can be more than one match with the same Name_matched (but different accepted names, etc.)
13586	06/01/2014 09:00 PM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: tnrs.tnrs__valid_match index: made it non-unique to allow multiple matches per name, as is needed to implement #917
13585	06/01/2014 05:00 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: tnrs__match_num__fill(): only fill if not set, to support case where tnrs is being restored from a .sql file (where match_num is already set)
13584	06/01/2014 04:36 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: documented runtime to add a constraint (3 min)
13583	06/01/2014 04:35 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: unique constraint on Name_submitted: added Name_matched to allow multiple matches per name, as is needed to implement #917
13582	06/01/2014 03:44 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: documented how to populate a new column
13581	06/01/2014 03:41 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: pkey: use match_num instead of Name_number to allow multiple matches per name, as is needed to implement #917
13580	05/31/2014 10:31 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs.match_num: made it NOT NULL now that it's populated
13579	05/31/2014 10:28 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: populate match_num
13578	05/31/2014 10:25 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: populate match_num
13577	05/31/2014 09:50 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: documented how to add and remove columns
13575	05/31/2014 08:58 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: made COMMENTs start on their own line, using the steps at wiki.vegpath.org/Postgres_queries#make-COMMENTs-start-on-their-own-line
13573	05/31/2014 08:10 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: added match_num
13572	05/31/2014 08:06 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/data.sql.run: refresh(): documented runtime (1 min)
13570	05/31/2014 06:44 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added tnrs__match_num__next()
13567	05/30/2014 06:34 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added tnrs__batch_begin() trigger to populate the match_num (match sort order)
13540	05/27/2014 10:13 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: taxon_scrub.scrubbed_unique_taxon_name.*: added scrubbed_taxon_name_with_author, needed by Jeff Ott's analysis (wiki.vegpath.org/Data_requests)
13533	05/27/2014 12:28 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: taxon_scrub: added scrubbed_morphospecies_binomial, analogous to accepted_morphospecies_binomial for scrubbed_*
13532	05/27/2014 12:13 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: taxon_scrub: added scrubbed_morphospecies_binomial, analogous to accepted_morphospecies_binomial for scrubbed_*
13531	05/26/2014 11:54 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: taxon_scrub: documented how to modify it
13528	05/26/2014 11:20 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added taxon_scrub_modify()
13527	05/23/2014 06:17 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon_modify(): use simpler util.recreate_view()
13526	05/23/2014 06:15 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon_modify(): documented usage
13518	05/21/2014 07:30 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon_modify(): removed no longer needed DROP VIEW statement
13512	05/21/2014 05:50 PM	Aaron Marcuse-Kubitza	fix: schemas/util.sql: force_recreate(): renamed to just recreate(), because "force" normally implies that things will be deleted, which this function does not do
13508	05/21/2014 04:25 PM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: MatchedTaxon.taxonomicStatus: filter using map_taxonomic_status() so that the corrected value is available in the normalized DB, not just analytical_stem
13507	05/21/2014 04:05 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon: to modify: use new MatchedTaxon_modify(), which eliminates the work of putting together the dependent views
13506	05/21/2014 03:53 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added MatchedTaxon_modify()
13503	05/21/2014 04:13 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: map_taxonomic_status(): need to use accepted name instead of scrubbed name (which also includes no-opinion names), as described at http://wiki.vegpath.org/2013-11-14_conference_call#taxonomic-fields. this used to be the accepted name, but got switched when the concatenated name was also used to store the matched name for no-opinion names.
13501	05/21/2014 01:27 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon: documented how to modify it (using util.force_recreate())
13498	05/20/2014 05:46 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon, etc.: added accepted_morphospecies_binomial derived field
13444	05/13/2014 04:50 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon.Accepted_name_species: mapped to accepted_species_binomial
13443	05/13/2014 04:09 AM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: COMMENTs: always include newline before and after
13441	05/13/2014 03:46 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: taxon_scrub, etc.: undid rename of accepted name columns to scrubbed_* (r13435), because these are actually not the same (scrubbed_* is the combination of accepted and no-opinion names). the accepted name columns will now be named accepted_*, following the standard naming scheme.
13439	05/13/2014 03:13 AM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: taxon_scrub, etc.: scrubbed_: use columns from MatchedTaxon whenever possible, to as much as possible avoid the need to join to taxon_scrub.scrubbed_unique_taxon_name.
13437	05/13/2014 02:29 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/grants.sql: added GRANT statements from schema.sql because these aren't run by `make inputs/.TNRS/reinstall`
13401	05/03/2014 02:03 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: add: verify/: also svn:ignore *.log
13372	05/01/2014 01:29 PM	Aaron Marcuse-Kubitza	fix: lib/runscripts/file.pg.sql.run: removed include of in_datasrc_dir.run, because this location does not apply to all .sql export scripts
12779	03/20/2014 07:58 PM	Aaron Marcuse-Kubitza	*{.sh,run}: use new begin_target instead of `echo_func; set_make_vars`
12018	02/02/2014 12:49 AM	Aaron Marcuse-Kubitza	inputs/input.Makefile: add!: verify/: also svn:ignore .tsv, .txt
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
11965	01/16/2014 01:22 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: scrubbed_family: Name_matched_accepted_family was missing from the TNRS results at one point, so we are now using Family_matched as a workaround to populate this. the workaround is for accepted names only, as no opinion names do not have an Accepted_name_family to prepend to the scrubbed name to parse.
11964	01/16/2014 01:19 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: reexported from live DB, which changes the element order
11912	12/16/2013 01:43 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: granted bien_read SELECT access to derived views as well as the core tnrs table
11715	11/21/2013 11:08 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: updated runtime (30 min) and rowcount (+2 million)
11711	11/21/2013 09:04 AM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: set this to false if Taxonomic_status is Invalid
11709	11/21/2013 08:49 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added map_taxonomic_status()
11708	11/21/2013 08:48 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql, data.sql: updated for PostgreSQL 9.3
11647	11/13/2013 02:48 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): regenerate the derived cols: updated runtime (40 min)
11643	11/10/2013 07:02 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: removed no longer used Accepted_scientific_name. use scrubbed_unique_taxon_name instead.
11642	11/10/2013 07:00 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon, etc.: removed no longer used acceptedScientificName (from tnrs.Accepted_scientific_name). use scrubbed_unique_taxon_name instead.
11641	11/10/2013 06:43 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: removed no longer used AcceptedTaxon. use taxon_scrub.scrubbed_unique_taxon_name.* instead.
11637	11/10/2013 05:55 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: removed no longer used ScrubbedTaxon. use taxon_scrub instead.
11634	11/10/2013 04:11 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added taxon_scrub, which combines ValidMatchedTaxon with scrubbed_unique_taxon_name.* instead of AcceptedTaxon
11633	11/10/2013 03:38 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: ValidMatchedTaxon: synced to MatchedTaxon
11632	11/10/2013 03:22 PM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: scrubbed_taxon_name_with_author: renamed to scrubbed_unique_taxon_name because this also contains the family, and is therefore different from just the taxon name with author
11631	11/10/2013 01:50 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon: added scrubbed_taxon_name_with_author
11630	11/10/2013 01:23 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: removed Is_homonym, since this did not take into account the never_homonym status (when the author disambiguates) or the ability of a non-homonym at a lower rank to override a homonym at a higher rank. taking these into account just produces the value of is_valid_match.
11629	11/10/2013 01:19 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: removed Is_plant, since this functionality is now provided by is_valid_match. note that whether a name is a plant is not meaningful for TNRS, because it can match only plant names (thus a "non-plant" is actually a non-match).
11628	11/10/2013 01:06 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: added scrubbed_taxon_name_with_author derived column, which uses the matched name when an accepted name is not available
11627	11/10/2013 09:44 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: removed no longer used Max_score. use is_valid_match to determine validity instead.
11626	11/10/2013 12:09 AM	Aaron Marcuse-Kubitza	bugfix: lib/runscripts/file.pg.sql.run: export_(): exclude Source and related tables so that these will be re-created by the staging tables installation instead, ensuring that they are always in sync with the Source/ subdir
11625	11/10/2013 12:08 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/data.sql: updated for new derived columns
11624	11/10/2013 12:04 AM	Aaron Marcuse-Kubitza	bugfix: lib/runscripts/file.pg.sql.run: export_(): exclude Source and related tables so that these will be re-created by the staging tables installation instead, ensuring that they are always in sync with the Source/ subdir
11619	11/09/2013 04:47 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: removed no longer used score_ok(). use tnrs.Is_plant instead. (the threshold is still documented in tnrs_populate_fields().)
11618	11/09/2013 04:45 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: don't consider Max_score because Is_plant will always be false when the Max_score is insufficient (<0.8)
11617	11/09/2013 04:20 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: schema comment: added steps to remake schema.sql and back up the new TNRS schema. documented that these steps should be run on vegbiendev.
11616	11/09/2013 04:16 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: schema comment: added steps to determine what changes need to be made on vegbiendev
11615	11/09/2013 04:01 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): regenerate the derived cols: updated runtimes (~same)
11614	11/09/2013 03:54 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: moved instructions to apply schema changes on vegbiendev to the TNRS schema, because this applies to all elements in the TNRS schema, not just the tnrs table
11613	11/09/2013 03:30 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: score_ok(): don't make it STRICT because this prevents it from being inlined
11612	11/09/2013 03:24 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: removed no longer used tnrs_score_ok index. use tnrs__valid_match instead.
11611	11/09/2013 03:09 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: documented that this excludes homonyms because these are not valid matches (i.e. TNRS provides a name, but the name is not meaningful because it is not unambiguous)
11610	11/09/2013 03:07 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: ValidMatchedTaxon: exclude inter-kingdom homonyms because these are not valid matches (i.e. TNRS provides a name, but the name is not meaningful because it is not unambiguous). this uses taxon_scrub__is_valid_match instead of score_ok() to determine whether the result should be included.
11609	11/09/2013 02:56 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: ValidMatchedTaxon: synced to MatchedTaxon
11608	11/09/2013 02:55 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon: added is_valid_match
11607	11/09/2013 02:52 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: added tnrs__valid_match index to facilitate joining to only valid matches
11606	11/09/2013 02:48 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: added is_valid_match derived column, to make it easier to select from only those TNRS results that can safely be used as a scrubbed name
11396	10/21/2013 07:14 PM	Aaron Marcuse-Kubitza	fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
10866	09/04/2013 11:06 PM	Aaron Marcuse-Kubitza	inputs///test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix
10793	08/29/2013 02:07 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added covering indexes on foreign keys where needed. this enables rows to be cascadingly deleted without a full table scan.
10790	08/27/2013 10:52 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: instructions for when changing this table's schema: updated to use new `inputs/.TNRS/data.sql.run refresh`
10789	08/27/2013 10:50 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/data.sql.run: added refresh() target which runs inputs/test_taxonomic_names/test_scrub
10787	08/27/2013 10:32 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: updated steps to run when changing this table's schema, to use new TNRS editing workflow
10786	08/27/2013 10:14 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/data.sql: re-ran TNRS using `inputs/test_taxonomic_names/test_scrub; rm=1 inputs/.TNRS/data.sql.run export_`
10783	08/27/2013 09:53 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/data.sql: generate from the DB using `rm=1 inputs/.TNRS/data.sql.run export_` instead of being a hand-edited file
10782	08/27/2013 09:50 PM	Aaron Marcuse-Kubitza	added inputs/.TNRS/data.sql.run for syncing data.sql directly with the DB without needing to use inputs/test_taxonomic_names/test_scrub just to export the sample data. (however, when modifying the tnrs table, it may still be easier to generate new sample data using test_scrub rather than refactoring the table in place.)
10779	08/27/2013 09:25 PM	Aaron Marcuse-Kubitza	added lib/runscripts/schema.pg.sql.run and use it in inputs/.TNRS/schema.sql.run
10778	08/27/2013 09:18 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: generate from the DB using `rm=1 inputs/.TNRS/schema.sql.run export_` instead of being a hand-edited file. this makes it much easier to edit the (now frequently-changing) TNRS schema directly in pgAdmin (which is graphical), rather than having to manually copy SQL changes from pgAdmin to the file.
10777	08/27/2013 09:15 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql.run: export_(): added usage
10776	08/27/2013 09:12 PM	Aaron Marcuse-Kubitza	added inputs/.TNRS/schema.sql.run, which syncs schema.sql with the DB
10754	08/27/2013 01:54 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: moved source code comments to in-schema COMMENT ON comments so all the info in schema.sql is in the DB
10753	08/27/2013 01:47 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: views that use * as the column list: added comments to indicate that this is the case, so that the views can be updated in place rather than only by reinstalling the TNRS schema
10747	08/27/2013 12:49 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: util.set_col_types() runtime: updated for most recent ALTER COLUMN TYPE command (9 min)
10746	08/27/2013 12:25 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs.Time_submitted: renamed to batch and added fkey to batch.id. this requires including the batch table in inputs/.TNRS/data.sql, so that the fkey is satisfied (batch entries are already added by bin/tnrs_db).
10741	08/26/2013 07:48 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: batch: reset name of id_by_time unique constraint since this field is now in the batch table
10740	08/26/2013 07:46 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: download_settings: renamed to batch_download_settings because this table is actually specific to the batch, and it does not make sense to have a download settings file without a batch
