/trunk/inputs/.TNRS - Changes - BIEN 3 - NCEAS Projects

root/trunk/inputs/.TNRS @ 13372

svn:ignore: *

#	Date	Author	Comment
13372	05/01/2014 01:29 PM	Aaron Marcuse-Kubitza	fix: lib/runscripts/file.pg.sql.run: removed include of in_datasrc_dir.run, because this location does not apply to all .sql export scripts
12779	03/20/2014 07:58 PM	Aaron Marcuse-Kubitza	*{.sh,run}: use new begin_target instead of `echo_func; set_make_vars`
12018	02/02/2014 12:49 AM	Aaron Marcuse-Kubitza	inputs/input.Makefile: add!: verify/: also svn:ignore .tsv, .txt
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
11965	01/16/2014 01:22 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: scrubbed_family: Name_matched_accepted_family was missing from the TNRS results at one point, so we are now using Family_matched as a workaround to populate this. the workaround is for accepted names only, because "No opinion" names do not have an Accepted_name_family to prepend to the scrubbed name to parse.
11964	01/16/2014 01:19 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: re-exported from live DB, which changes the element order
11912	12/16/2013 01:43 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: granted bien_read SELECT access to derived views as well as the core tnrs table
11715	11/21/2013 11:08 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: updated runtime (30 min) and rowcount (+2 million)
11711	11/21/2013 09:04 AM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: set this to false if Taxonomic_status is Invalid
11709	11/21/2013 08:49 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added map_taxonomic_status()
11708	11/21/2013 08:48 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql, data.sql: updated for PostgreSQL 9.3
11647	11/13/2013 02:48 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): regenerate the derived cols: updated runtime (40 min)
11643	11/10/2013 07:02 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: removed no longer used Accepted_scientific_name. use scrubbed_unique_taxon_name instead.
11642	11/10/2013 07:00 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon, etc.: removed no longer used acceptedScientificName (from tnrs.Accepted_scientific_name). use scrubbed_unique_taxon_name instead.
11641	11/10/2013 06:43 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: removed no longer used AcceptedTaxon. use taxon_scrub.scrubbed_unique_taxon_name.* instead.
11637	11/10/2013 05:55 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: removed no longer used ScrubbedTaxon. use taxon_scrub instead.
11634	11/10/2013 04:11 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added taxon_scrub, which combines ValidMatchedTaxon with scrubbed_unique_taxon_name.* instead of AcceptedTaxon
11633	11/10/2013 03:38 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: ValidMatchedTaxon: synced to MatchedTaxon
11632	11/10/2013 03:22 PM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: scrubbed_taxon_name_with_author: renamed to scrubbed_unique_taxon_name because this also contains the family, and is therefore different from just the taxon name with author
11631	11/10/2013 01:50 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon: added scrubbed_taxon_name_with_author
11630	11/10/2013 01:23 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: removed Is_homonym, since this did not take into account the never_homonym status (when the author disambiguates) or the ability of a non-homonym at a lower rank to override a homonym at a higher rank. taking these into account just produces the value of is_valid_match.
11629	11/10/2013 01:19 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: removed Is_plant, since this functionality is now provided by is_valid_match. note that whether a name is a plant is not meaningful for TNRS, because it can match only plant names (thus a "non-plant" is actually a non-match).
11628	11/10/2013 01:06 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: added scrubbed_taxon_name_with_author derived column, which uses the matched name when an accepted name is not available
11627	11/10/2013 09:44 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: removed no longer used Max_score. use is_valid_match to determine validity instead.
11626	11/10/2013 12:09 AM	Aaron Marcuse-Kubitza	bugfix: lib/runscripts/file.pg.sql.run: export_(): exclude Source and related tables so that these will be re-created by the staging tables installation instead, ensuring that they are always in sync with the Source/ subdir
11625	11/10/2013 12:08 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/data.sql: updated for new derived columns
11624	11/10/2013 12:04 AM	Aaron Marcuse-Kubitza	bugfix: lib/runscripts/file.pg.sql.run: export_(): exclude Source and related tables so that these will be re-created by the staging tables installation instead, ensuring that they are always in sync with the Source/ subdir
11619	11/09/2013 04:47 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: removed no longer used score_ok(). use tnrs.Is_plant instead. (the threshold is still documented in tnrs_populate_fields().)
11618	11/09/2013 04:45 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: don't consider Max_score because Is_plant will always be false when the Max_score is insufficient (<0.8)
11617	11/09/2013 04:20 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: schema comment: added steps to remake schema.sql and back up the new TNRS schema. documented that these steps should be run on vegbiendev.
11616	11/09/2013 04:16 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: schema comment: added steps to determine what changes need to be made on vegbiendev
11615	11/09/2013 04:01 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): regenerate the derived cols: updated runtimes (~same)
11614	11/09/2013 03:54 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: moved instructions to apply schema changes on vegbiendev to the TNRS schema, because this applies to all elements in the TNRS schema, not just the tnrs table
11613	11/09/2013 03:30 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: score_ok(): don't make it STRICT because this prevents it from being inlined
11612	11/09/2013 03:24 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: removed no longer used tnrs_score_ok index. use tnrs__valid_match instead.
11611	11/09/2013 03:09 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: documented that this excludes homonyms because these are not valid matches (i.e. TNRS provides a name, but the name is not meaningful because it is ambiguous)
11610	11/09/2013 03:07 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: ValidMatchedTaxon: exclude inter-kingdom homonyms because these are not valid matches (i.e. TNRS provides a name, but the name is not meaningful because it is ambiguous). this uses taxon_scrub__is_valid_match instead of score_ok() to determine whether the result should be included.
11609	11/09/2013 02:56 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: ValidMatchedTaxon: synced to MatchedTaxon
11608	11/09/2013 02:55 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon: added is_valid_match
11607	11/09/2013 02:52 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: added tnrs__valid_match index to facilitate joining to only valid matches
11606	11/09/2013 02:48 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: added is_valid_match derived column, to make it easier to select from only those TNRS results that can safely be used as a scrubbed name
11396	10/21/2013 07:14 PM	Aaron Marcuse-Kubitza	fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
10866	09/04/2013 11:06 PM	Aaron Marcuse-Kubitza	inputs///test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix
10793	08/29/2013 02:07 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added covering indexes on foreign keys where needed. this enables rows to be cascadingly deleted without a full table scan.
10790	08/27/2013 10:52 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: instructions for when changing this table's schema: updated to use new `inputs/.TNRS/data.sql.run refresh`
10789	08/27/2013 10:50 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/data.sql.run: added refresh() target which runs inputs/test_taxonomic_names/test_scrub
10787	08/27/2013 10:32 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: updated steps to run when changing this table's schema, to use new TNRS editing workflow
10786	08/27/2013 10:14 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/data.sql: re-ran TNRS using `inputs/test_taxonomic_names/test_scrub; rm=1 inputs/.TNRS/data.sql.run export_`
10783	08/27/2013 09:53 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/data.sql: generate from the DB using `rm=1 inputs/.TNRS/data.sql.run export_` instead of being a hand-edited file
10782	08/27/2013 09:50 PM	Aaron Marcuse-Kubitza	added inputs/.TNRS/data.sql.run for syncing data.sql directly with the DB without needing to use inputs/test_taxonomic_names/test_scrub just to export the sample data. (however, when modifying the tnrs table, it may still be easier to generate new sample data using test_scrub rather than refactoring the table in place.)
10779	08/27/2013 09:25 PM	Aaron Marcuse-Kubitza	added lib/runscripts/schema.pg.sql.run and use it in inputs/.TNRS/schema.sql.run
10778	08/27/2013 09:18 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: generate from the DB using `rm=1 inputs/.TNRS/schema.sql.run export_` instead of being a hand-edited file. this makes it much easier to edit the (now frequently-changing) TNRS schema directly in pgAdmin (which is graphical), rather than having to manually copy SQL changes from pgAdmin to the file.
10777	08/27/2013 09:15 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql.run: export_(): added usage
10776	08/27/2013 09:12 PM	Aaron Marcuse-Kubitza	added inputs/.TNRS/schema.sql.run, which syncs schema.sql with the DB
10754	08/27/2013 01:54 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: moved source code comments to in-schema COMMENT ON comments so all the info in schema.sql is in the DB
10753	08/27/2013 01:47 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: views that use * as the column list: added comments to indicate that this is the case, so that the views can be updated in place rather than only by reinstalling the TNRS schema
10747	08/27/2013 12:49 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: util.set_col_types() runtime: updated for most recent ALTER COLUMN TYPE command (9 min)
10746	08/27/2013 12:25 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs.Time_submitted: renamed to batch and added fkey to batch.id. this requires including the batch table in inputs/.TNRS/data.sql, so that the fkey is satisfied (batch entries are already added by bin/tnrs_db).
10741	08/26/2013 07:48 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: batch: reset name of id_by_time unique constraint since this field is now in the batch table
10740	08/26/2013 07:46 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: download_settings: renamed to batch_download_settings because this table is actually specific to the batch, and it does not make sense to have a download settings file without a batch
10739	08/26/2013 07:32 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: download_settings.id: added fkey to batch.id to create a 1:1 relationship with optional participation by download_settings. note that this relationship happens to be the same as SQL inheritance, as used in VegCore, but in this case, the 1:1 relationship is not related to inheritance.
10738	08/26/2013 06:30 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: client_version: added table, column comments with info on how to retrieve each value
10737	08/26/2013 06:28 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added client_version table for svn revisions, with fkey from batch
10736	08/26/2013 06:23 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added batch table and moved download_settings.time_submitted, id_by_time to it since these are not related to the download_settings file
10728	08/26/2013 01:49 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added VegCore-style id column as the primary key, instead of using time_submitted directly. this enables always using the same name for the pkey. the pkey is now autopopulated from time_submitted in a trigger, using helper column id_by_time. the user is now also able to specify their own globally-unique ID that is not based on the time_submitted.
10727	08/25/2013 11:22 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: download_settings comment: changed name of button to Download settings, which had gotten auto-replaced to download_settings
10726	08/25/2013 11:08 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: Download settings table: renamed to download_settings because although Download settings is the verbatim name of the button that this info comes from, it is not necessary to name the table a particular way in order to match up data to it correctly, so we can just use the standard naming convention (wiki.vegpath.org/u-name#format) and avoid the need to enclose the name in ""
10725	08/24/2013 06:00 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added Download settings table, which stores data from http://tnrs.iplantcollaborative.org/TNRSapp.html > Submit List > results section > Download settings > settings.txt
10724	08/24/2013 04:07 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/Source/map.csv: mapped datasetURL
10294	07/17/2013 04:54 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/: switched to new-style import. because this does not have data subdirs (data comes from the TNRS client), this is just a matter of adding ./run.
10293	07/17/2013 04:53 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/Source/: switched to new-style import. this had been missed when all the Source/ subdirs were batch-switched to new-style import.
10280	07/14/2013 03:26 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtimes. it now takes 25 min instead of 16 min to regenerate the derived cols.
10209	07/10/2013 02:32 AM	Aaron Marcuse-Kubitza	inputs///map.csv for CSV tables with a row_num column: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table
10208	07/10/2013 02:27 AM	Aaron Marcuse-Kubitza	bugfix: inputs/*/Source/map.csv: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table. the staging table column renaming is now used by all Source tables.
10199	07/09/2013 04:44 PM	Aaron Marcuse-Kubitza	bugfix: inputs/*/Source/map.csv: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table. the staging table column renaming is now used by all Source tables.
10179	07/06/2013 05:39 PM	Aaron Marcuse-Kubitza	inputs/*/: added table.run for use by the table subdirs in new-style import. datasources without table subdirs do not need this.
10178	07/06/2013 05:35 PM	Aaron Marcuse-Kubitza	inputs/*/: added top-level Makefile which includes inputs/input.Makefile, so that make can be run directly on the datasrc dir without needing to specify `--makefile=../input.Makefile` (see input.Makefile $(selfMake))
10170	07/06/2013 02:26 PM	Aaron Marcuse-Kubitza	bugfix: inputs/*/Source/: use installed staging table (with blank-line data.csv) in order to also work with new-style import. this also fixes a benign diff between the by-row and by-col test outputs, where row-based import would not import the Source/ entries because there was not at least one row in the input. note that in order to ensure that all datasources are properly run, you need to check `svn st|sort` against the datasource schema names to see if any are missing.
10108	06/28/2013 04:48 PM	Aaron Marcuse-Kubitza	added inputs/.TNRS/grants.sql, with statements to provide SELECT access to bien_read. these statements must be in grants.sql to avoid them being filtered out by pg_dump_limit.
10094	06/27/2013 03:47 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: AcceptedTaxon: removed Annotations entry because the accepted name only contains name elements, not additional text (vegpath.org/cf_aff)
9994	06/20/2013 06:10 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: vegbiendev update steps: added `make backups/TNRS.backup-remake` to back up TNRS before making changes to it. this provides a more recent restore point than the last import in case the changes mess things up. (however, the last import's backup is usually sufficient unless TNRS has been run since then.)
9993	06/20/2013 05:53 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): added VACUUM ANALYZE and runtime (50 s)
9992	06/20/2013 05:42 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtime (16 min)
9990	06/20/2013 04:49 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: documented that when changing this table's schema, you must also make the same changes on vegbiendev. included sample util.set_col_types() call with runtime (4 min).
9989	06/20/2013 03:58 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtime (16 min)
9988	06/20/2013 03:32 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): need to schema-qualify invoked functions
9987	06/20/2013 03:29 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): Is_homonym: use the *_is_homonym flag for whichever of genus or family (in that order) is NOT NULL, rather than horizontal-ORing potentially NULL values together
9986	06/20/2013 03:22 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: family_is_homonym(), genus_is_homonym(): need to return NULL instead of false when input family/genus is NULL. EXISTS does not support this, so STRICT is used to provide this functionality automatically.
9985	06/20/2013 03:19 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: added family_is_homonym(), genus_is_homonym() and use them in tnrs_populate_fields()
9984	06/20/2013 03:15 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: score_ok(): changed to IMMUTABLE and STRICT
9983	06/20/2013 03:14 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtime (16 min)
9982	06/20/2013 02:41 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): never_homonym: use Author_score threshold to exclude matches that are too fuzzy to confirm the presence of a plant name author
9981	06/20/2013 02:38 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): *_is_homonym: also need to check that there was no Author_matched (i.e. that it could be a homonym). Is_homonym: use new never_homonym var.
9980	06/20/2013 02:18 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtime (18 min)
9974	06/20/2013 08:20 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): when changing this function: UPDATE statement: include TNRS schema since it may not be in the search_path
9973	06/20/2013 08:14 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): Is_plant: also consider homonyms using new family_is_homonym, genus_is_homonym (see wiki.vegpath.org/Result_filtering#taxon_is_plant)
9972	06/20/2013 08:03 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: added Is_homonym derived col (uses IRMNG.family_homonym_epithet, genus_homonym_epithet)
9914	06/18/2013 06:21 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_fields(): removed no longer needed casts of *_score to double precision
9913	06/18/2013 06:06 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: *_score: changed type to double precision because these fields are always floats. this also avoids the need to manually cast them to double precision each time they are used.
9908	06/18/2013 04:27 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs: Name_number: changed type to integer so it would sort numerically