Project

General

Profile

# Date Author Comment
11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

11965 01/16/2014 01:22 AM Aaron Marcuse-Kubitza

bugfix: inputs/.TNRS/schema.sql: scrubbed_family: Name_matched_accepted_family was missing from the TNRS results at one point, so we are now using Family_matched as a workaround to populate this. the workaround is for accepted names only, as no opinion names do not have an Accepted_name_family to prepend to the scrubbed name to parse.

11964 01/16/2014 01:19 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: reexported from live DB, which changes the element order

11912 12/16/2013 01:43 PM Aaron Marcuse-Kubitza

bugfix: inputs/.TNRS/schema.sql: granted bien_read SELECT access to derived views as well as the core tnrs table

11715 11/21/2013 11:08 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: updated runtime (30 min) and rowcount (+2 million)

11711 11/21/2013 09:04 AM Aaron Marcuse-Kubitza

fix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: set this to false if Taxonomic_status is Invalid

11709 11/21/2013 08:49 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: added map_taxonomic_status()

11708 11/21/2013 08:48 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql, data.sql: updated for PostgreSQL 9.3

11647 11/13/2013 02:48 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): regenerate the derived cols: updated runtime (40 min)

11643 11/10/2013 07:02 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: removed no longer used Accepted_scientific_name. use scrubbed_unique_taxon_name instead.

11642 11/10/2013 07:00 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: MatchedTaxon, etc.: removed no longer used acceptedScientificName (from tnrs.Accepted_scientific_name). use scrubbed_unique_taxon_name instead.

11641 11/10/2013 06:43 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: removed no longer used AcceptedTaxon. use taxon_scrub.scrubbed_unique_taxon_name.* instead.

11637 11/10/2013 05:55 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: removed no longer used ScrubbedTaxon. use taxon_scrub instead.

11634 11/10/2013 04:11 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: added taxon_scrub, which combines ValidMatchedTaxon with scrubbed_unique_taxon_name.* instead of AcceptedTaxon

11633 11/10/2013 03:38 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: ValidMatchedTaxon: synced to MatchedTaxon

11632 11/10/2013 03:22 PM Aaron Marcuse-Kubitza

fix: inputs/.TNRS/schema.sql: scrubbed_taxon_name_with_author: renamed to scrubbed_unique_taxon_name because this also contains the family, and is therefore different from just the taxon name with author

11631 11/10/2013 01:50 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: MatchedTaxon: added scrubbed_taxon_name_with_author

11630 11/10/2013 01:23 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: removed Is_homonym, since this did not take into account the never_homonym status (when the author disambiguates) or the ability of a non-homonym at a lower rank to override a homonym at a higher rank. taking these into account just produces the value of is_valid_match.

11629 11/10/2013 01:19 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: removed Is_plant, since this functionality is now provided by is_valid_match. note that whether a name is a plant is not meaningful for TNRS, because it can match only plant names (thus a "non-plant" is actually a non-match).

11628 11/10/2013 01:06 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: added scrubbed_taxon_name_with_author derived column, which uses the matched name when an accepted name is not available

11627 11/10/2013 09:44 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: removed no longer used Max_score. use is_valid_match to determine validity instead.

11624 11/10/2013 12:04 AM Aaron Marcuse-Kubitza

bugfix: lib/runscripts/file.pg.sql.run: export_(): exclude Source and related tables so that these will be re-created by the staging tables installation instead, ensuring that they are always in sync with the Source/ subdir

11619 11/09/2013 04:47 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: removed no longer used score_ok(). use tnrs.Is_plant instead. (the threshold is still documented in tnrs_populate_fields().)

11618 11/09/2013 04:45 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: don't consider Max_score because Is_plant will always be false when the Max_score is insufficient (<0.8)

11617 11/09/2013 04:20 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: schema comment: added steps to remake schema.sql and back up the new TNRS schema. documented that these steps should be run on vegbiendev.

11616 11/09/2013 04:16 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: schema comment: added steps to determine what changes need to be made on vegbiendev

11615 11/09/2013 04:01 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): regenerate the derived cols: updated runtimes (~same)

11614 11/09/2013 03:54 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: moved instructions to apply schema changes on vegbiendev to the TNRS schema, because this applies to all elements in the TNRS schema, not just the tnrs table

11613 11/09/2013 03:30 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: score_ok(): don't make it STRICT because this prevents it from being inlined

11612 11/09/2013 03:24 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: removed no longer used tnrs_score_ok index. use tnrs__valid_match instead.

11611 11/09/2013 03:09 PM Aaron Marcuse-Kubitza

bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: documented that this excludes homonyms because these are not valid matches (i.e. TNRS provides a name, but the name is not meaningful because it is not unambiguous)

11610 11/09/2013 03:07 PM Aaron Marcuse-Kubitza

bugfix: inputs/.TNRS/schema.sql: ValidMatchedTaxon: exclude inter-kingdom homonyms because these are not valid matches (i.e. TNRS provides a name, but the name is not meaningful because it is not unambiguous). this uses taxon_scrub__is_valid_match instead of score_ok() to determine whether the result should be included.

11609 11/09/2013 02:56 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: ValidMatchedTaxon: synced to MatchedTaxon

11608 11/09/2013 02:55 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: MatchedTaxon: added is_valid_match

11607 11/09/2013 02:52 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: added tnrs__valid_match index to facilitate joining to only valid matches

11606 11/09/2013 02:48 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: added is_valid_match derived column, to make it easier to select from only those TNRS results that can safely be used as a scrubbed name

10793 08/29/2013 02:07 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: added covering indexes on foreign keys where needed. this enables rows to be cascadingly deleted without a full table scan.

10790 08/27/2013 10:52 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: instructions for when changing this table's schema: updated to use new `inputs/.TNRS/data.sql.run refresh`

10787 08/27/2013 10:32 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: updated steps to run when changing this table's schema, to use new TNRS editing workflow

10778 08/27/2013 09:18 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: generate from the DB using `rm=1 inputs/.TNRS/schema.sql.run export_` instead of being a hand-edited file. this makes it much easier to edit the (now frequently-changing) TNRS schema directly in pgAdmin (which is graphical), rather than having to manually copy SQL changes from pgAdmin to the file.

10754 08/27/2013 01:54 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: moved source code comments to in-schema COMMENT ON comments so all the info in schema.sql is in the DB

10753 08/27/2013 01:47 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: views that use * as the column list: added comments to indicate that this is the case, so that the views can be updated in place rather than only by reinstalling the TNRS schema

10747 08/27/2013 12:49 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: util.set_col_types() runtime: updated for most recent ALTER COLUMN TYPE command (9 min)

10746 08/27/2013 12:25 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs.Time_submitted: renamed to batch and added fkey to batch.id. this requires including the batch table in inputs/.TNRS/data.sql, so that the fkey is satisfied (batch entries are already added by bin/tnrs_db.

10741 08/26/2013 07:48 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: batch: reset name of id_by_time unique constraint since this field is now in the batch table

10740 08/26/2013 07:46 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: download_settings: renamed to batch_download_settings because this table is actually specific to the batch, and it does not make sense to have a download settings file without a batch

10739 08/26/2013 07:32 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: download_settings.id: added fkey to batch.id to create a 1:1 relationship with optional participation by download_settings. note that this relationship happens to be the same as SQL inheritance, as used in VegCore, but in this case, the 1:1 relationship is not related to inheritance.

10738 08/26/2013 06:30 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: client_version: added table, column comments with info on how to retrieve each value

10737 08/26/2013 06:28 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: added client_version table for svn revisions, with fkey from batch

10736 08/26/2013 06:23 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: added batch table and moved download_settings.time_submitted, id_by_time to it since these are not related to the download_settings file

10728 08/26/2013 01:49 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: added VegCore-style id column as the primary key, instead of using time_submitted directly. this enables always using the same name for the pkey. the pkey is now autopopulated from time_submitted in a trigger, using helper column id_by_time. the user is now also able to specify their own globally-unique ID that is not based on the time_submitted.

10727 08/25/2013 11:22 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: download_settings comment: changed name of button to Download settings, which had gotten auto-replaced to download_settings

10726 08/25/2013 11:08 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: Download settings table: renamed to download_settings because although Download settings is the verbatim name of the button that this info comes from, it is not necessary to name the table a particular way in order to match up data to it correctly, so we can just use the standard naming convention (wiki.vegpath.org/u-name#format) and avoid the need to enclose the name in ""

10725 08/24/2013 06:00 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: added Download settings table, which stores data from http://tnrs.iplantcollaborative.org/TNRSapp.html > Submit List > results section > Download settings > settings.txt

10280 07/14/2013 03:26 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtimes. it now takes 25 min instead of 16 min to regenerate the derived cols.

10094 06/27/2013 03:47 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: AcceptedTaxon: removed Annotations entry because the accepted name only contains name elements, not additional text (vegpath.org/cf_aff)

9994 06/20/2013 06:10 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: vegbiendev update steps: added `make backups/TNRS.backup-remake` to back up TNRS before making changes to it. this provides a more recent restore point than the last import in case the changes mess things up. (however, the last import's backup is usually sufficient unless TNRS has been run since then.)

9993 06/20/2013 05:53 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): added VACUUM ANALYZE and runtime (50 s)

9992 06/20/2013 05:42 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtime (16 min)

9990 06/20/2013 04:49 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: documented that when changing this table's schema, you must also make the same changes on vegbiendev. included sample util.set_col_types() call with runtime (4 min).

9989 06/20/2013 03:58 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtime (16 min)

9988 06/20/2013 03:32 PM Aaron Marcuse-Kubitza

bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): need to schema-qualify invoked functions

9987 06/20/2013 03:29 PM Aaron Marcuse-Kubitza

bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): Is_homonym: use the *_is_homonym flag for whichever of genus or family (in that order) is NOT NULL, rather than horizontal-ORing potentially NULL values together

9986 06/20/2013 03:22 PM Aaron Marcuse-Kubitza

bugfix: inputs/.TNRS/schema.sql: family_is_homonym(), genus_is_homonym(): need to return NULL instead of false when input family/genus is NULL. EXISTS does not support this, so STRICT is used to provide this functionality automatically.

9985 06/20/2013 03:19 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: added family_is_homonym(), genus_is_homonym() and use them in tnrs_populate_fields()

9984 06/20/2013 03:15 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: score_ok(): changed to IMMUTABLE and STRICT

9983 06/20/2013 03:14 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtime (16 min)

9982 06/20/2013 02:41 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): never_homonym: use Author_score threshold to exclude matches that are too fuzzy to confirm the presence of a plant name author

9981 06/20/2013 02:38 PM Aaron Marcuse-Kubitza

bugfix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): *_is_homonym: also need to check that there was no Author_matched (i.e. that it could be a homonym). Is_homonym: use new never_homonym var.

9980 06/20/2013 02:18 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): updated runtime (18 min)

9974 06/20/2013 08:20 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): when changing this function: UPDATE statement: include TNRS schema since it may not be in the search_path

9973 06/20/2013 08:14 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): Is_plant: also consider homonyms using new family_is_homonym, genus_is_homonym (see wiki.vegpath.org/Result_filtering#taxon_is_plant)

9972 06/20/2013 08:03 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: added Is_homonym derived col (uses IRMNG.family_homonym_epithet, genus_homonym_epithet)

9914 06/18/2013 06:21 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): removed no longer needed casts of *_score to double precision

9913 06/18/2013 06:06 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: *_score: changed type to double precision because these fields are always floats. this also avoids the need to manually cast them to double precision each time they are used.

9908 06/18/2013 04:27 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: Name_number: changed type to integer so it would sort numerically

9907 06/18/2013 04:24 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: added pkey on Time_submitted, Name_number

9906 06/18/2013 04:21 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: changed Name_submitted pkey to a unique constraint to allow adding a pkey on Time_submitted, Name_number instead

9905 06/18/2013 04:14 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: Time_submitted, Name_number: added NOT NULL constraints so that they can be used in a unique constraint

9767 06/06/2013 04:54 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): documented runtime (17 min)

9765 06/06/2013 04:29 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): documented that when changing this function, you must regenerate the derived cols using `UPDATE tnrs SET "Name_submitted" = "Name_submitted"`

9764 06/06/2013 04:21 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_fields(): Is_plant: must match family as Family_score = 1 (as discussed during conference call vegpath.org/wiki/2013-05-30_conference_call#postprocess-TNRS-results-to-exclude-animals-with-genus-homonyms) instead of as Family_matched IS NOT NULL (as listed in Brad's formula at vegpath.org/wiki/Result_filtering#TNRS-results) because TNRS transforms animal to plant families via fuzzy matching, necessitating a Family_score check to ensure an exact match to a plant family that was not transformed from an animal family

9763 06/06/2013 03:29 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: added Is_plant derived field, which is populated using the formula at vegpath.org/wiki/Result_filtering#TNRS-results . note that the homonym filtering is currently excluded until we determine whether we can get direct access to the IRMNG homonyms database (http://www.cmar.csiro.au/datacentre/irmng/homonyms.htm). note also that changes to the TNRS schema cannot be fully tested until any TNRS client bugs are fixed, because the data.sql updater requires a working TNRS client to regenerate the sample data.

9762 06/06/2013 02:42 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: updated for current TSV schema: renamed Accepted_species->Accepted_name_species, Accepted_family->Accepted_name_family

9760 06/06/2013 02:17 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: fixed whitespace

9759 06/06/2013 02:15 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: added MatchedTaxon view, which now just renames the columns but does not filter the results, and use it in ValidMatchedTaxon

9758 06/06/2013 02:11 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: MatchedTaxon: renamed to ValidMatchedTaxon since this view actually contains only the names with a valid match

9756 06/06/2013 11:08 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: documented that when changing this table's schema, you must regenerate data.sql using `inputs/test_taxonomic_names/test_scrub`

9616 05/29/2013 05:22 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: MatchedTaxon: filter out rows where Max_score was not high enough to use the TNRS result as a match. removed now-duplicated filter for this in AcceptedTaxon.

9615 05/29/2013 05:19 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: ScrubbedTaxon: removed extra ; at end of WHERE clause

9541 05/23/2013 10:35 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: MatchedTaxon: matchedFamily: use Accepted_family when the Name_matched_accepted_family is not provided, as it's omitted by the current TNRS CSV schema

9529 05/23/2013 03:33 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql, data.sql: updated TNRS CSV columns to preserve Name_matched_accepted_family even though it isn't present in the current TNRS CSVs. this way, Name_matched_accepted_family can still be used for previously-scrubbed names, and family_matched can be added back to analytical_stem_view. (now that bin/tnrs_db uses an explicit columns list in COPY TO, the absence of a column in the CSV is no longer a problem.)

9513 05/23/2013 01:19 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs.Time_submitted: set default to now() (the timestamp of the start of the current transaction, http://www.postgresql.org/docs/9.1/static/functions-datetime.html) so that it would automatically be populated when rows are added. note that because the start of the current transaction instead of the exact time at insertion is used, all rows inserted in the same transaction (e.g. as part of the same batch) will have the same value for this, linking them together.

9512 05/23/2013 01:10 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_derived_fields(): renamed to tnrs_populate_fields() so it can be used to populate other fields as well

9493 05/21/2013 07:37 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql, data.sql: updated for new TNRS CSV columns (see bug at https://pods.iplantcollaborative.org/jira/browse/TNRS-183). note that these columns may eventually change back (comment by Naim at https://pods.iplantcollaborative.org/jira/browse/TNRS-183#comment-34444).

7848 03/05/2013 11:07 PM Aaron Marcuse-Kubitza

*.sql: concat_ws() calls: Fixed bug where need to surround it with NULLIF because concat_ws() returns '' instead of the required NULL when supplied with only NULL args

7845 03/05/2013 10:05 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: ScrubbedTaxon: Include only matches with a match score of at least 0.8 (score_ok())

7844 03/05/2013 09:59 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: Added back score_ok() and tnrs.tnrs_score_ok index

7843 03/05/2013 09:53 PM Aaron Marcuse-Kubitza

*.sql: Replaced concat_delim() with concat_ws(), which is built-in as of PostgreSQL 9.1 (http://stackoverflow.com/questions/1943433/postgresql-concat-ws-like-function)

7833 03/05/2013 07:20 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: MatchedTaxon: Renamed matchedTaxonomicStatus to preferred name taxonomicStatus