Project

General

Profile

Statistics
| Revision:

# Date Author Comment
5878 10/30/2012 03:55 PM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: Also add any newly-created files which should be under version control

5877 10/30/2012 03:35 PM Aaron Marcuse-Kubitza

input.Makefile: Fixed bug where _MySQL/%.sql files weren't being built from associated .make files by adding special `%.sql: .sql.make` rule to override `.sql: _MySQL/%.sql`

5876 10/30/2012 03:33 PM Aaron Marcuse-Kubitza

input.Makefile: `%: .make`: Factored $(if $(wildcard $@)... test out into $(make_script) so all `: %.make`-like rules could use it directly

5875 10/30/2012 03:09 PM Aaron Marcuse-Kubitza

lib/forwarding.Makefile: $(subdirs): Use all folders other than ../ ./ .svn/ instead of listing folders that start with . explicitly

5874 10/30/2012 02:31 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: Use accepted_taxonlabel.taxonomicname instead of accepted_taxonverbatim.taxonomicname in order to have the family prepended

5873 10/30/2012 12:41 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

5872 10/30/2012 12:38 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: placerank: Reordered in path order, using <http://rs.tdwg.org/dwc/terms/#dcindex> and <http://vegbank.org/vegbank/views/dba_fielddescription_detail.jsp?view=detail&wparam=1415&entity=dba_fielddescription&params=1415> as a guide. Documented the source of the values.

5871 10/30/2012 12:26 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: placename: Removed non-name-related fields, because placename is designed only to store a hierarchy of placenames, not additional place information

5870 10/30/2012 12:23 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Moved placedescription from placename to place (and renamed it to description), because it applies to the place itself, not the name for the place

5869 10/30/2012 12:16 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_aggregate_view: Added coverPercent, which is the sum of all coverPercents for that species

5868 10/30/2012 12:13 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_aggregate_view: Added coverPercent, which is the sum of all coverPercents for that species

5867 10/30/2012 12:03 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_aggregate_view: Include all analytical_stem species, not just those whose stems have non-NULL DBH

5866 10/30/2012 11:57 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Renamed aggregated_analytical_db to analytical_aggregate to match the name of analytical_stem

5865 10/30/2012 11:55 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Renamed analytical_db to analytical_stem since this contains just the individual stems, not the aggregated data in the main analytical DB

5864 10/30/2012 11:52 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Renamed analytical_db to analytical_stem since this contains just the individual stems, not the aggregated data in the main analytical DB

5863 10/30/2012 11:38 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed no longer used locationcoords

5862 10/30/2012 11:35 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Use new coordinates instead of locationcoords

5861 10/30/2012 11:23 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Remapped latitude/longitude to new coordinates table

5860 10/30/2012 11:15 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place: Added coordinates_id

5859 10/30/2012 11:01 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added coordinates table

5858 10/30/2012 10:40 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place: Removed municipality, site because they are not used in the geoscrubbing

5857 10/30/2012 10:19 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place: Place custom hierarchy of placenames in placename table instead of in otherranks field

5856 10/30/2012 10:04 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place.matched_place_id: Changed comment to say that places are linked in a three-level (instead of two-level) hierarchy of datasource place -> verbatim place -> accepted place, and this field contains the closest match

5855 10/30/2012 09:54 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Renamed placepath to place since this contains primary information about the place, including the reference to the canonical place

5854 10/30/2012 09:42 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Renamed place to placename since it refers just to a name for a place, without coordinates

5853 10/30/2012 07:18 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Exclude original taxondeterminations, so that there is only one taxondetermination for each taxonoccurrence

5852 10/30/2012 07:03 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: make_analytical_db(): Also make new aggregated_analytical_db

5851 10/30/2012 07:02 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: sync_analytical_db_to_view(): DROP TABLE: Use IF EXISTS in case analytical_db table has already been deleted, or not yet created

5850 10/30/2012 07:01 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added aggregated_analytical_db_view and materialized table aggregated_analytical_db (synced using sync_aggregated_analytical_db_to_view())

5849 10/30/2012 07:01 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added aggregated_analytical_db_view and materialized table aggregated_analytical_db (synced using sync_aggregated_analytical_db_to_view())

5848 10/30/2012 06:56 AM Aaron Marcuse-Kubitza

lib/PostgreSQL-MySQL.csv: custom types: Also match column names enclosed in ``

5847 10/30/2012 06:49 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Store materialized analytical_db table in schema so aggregating views can reference it. Added sync_analytical_db_to_view() to maintain analytical_db table.

5846 10/30/2012 06:30 AM Aaron Marcuse-Kubitza

schemas/vegbank.ERD.pdf: Restored to VegBank ERD, which had gotten overwritten when the vegbien.ERD exports were regenerated

5845 10/30/2012 06:23 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Reordered columns in path order

5844 10/30/2012 06:04 AM Aaron Marcuse-Kubitza

schemas/: Moved unit conversion functions from functions.sql to vegbien.sql so the unit conversion functions used by analytical_db_view wouldn't need to be stored both in functions.sql and in vegbien.sql. (All unit conversion functions used by analytical_db_view must be stored in the public schema so that analytical_db_view doesn't get cascadingly deleted when the functions schema is reinstalled.)

5843 10/30/2012 05:52 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Use public._m2_to_ha() instead of functions._m2_to_ha()

5842 10/30/2012 05:51 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Copied _m2_to_ha() to public schema for use by analytical_db_view

5841 10/30/2012 05:40 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Added diameterBreastHeight_cm

5840 10/30/2012 05:38 AM Aaron Marcuse-Kubitza

schemas/functions.sql, vegbien.sql: Added _m_to_cm()

5839 10/30/2012 05:07 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Copied _cm_to_m() to public schema for use by new aggregated_analytical_db_view

5838 10/30/2012 04:19 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: datasource table: Fixed bug where need to filter by creator_id = party_id in order to use just root parties (datasources)

5837 10/30/2012 03:40 AM Aaron Marcuse-Kubitza

tnrs_db: Fetching names to scrub: Omit sql.select() fields param because it will be filled in with its default value

5836 10/30/2012 03:29 AM Aaron Marcuse-Kubitza

import_all: Pass command-line args (such as make vars) to all commands, not just with_all, so that a custom public schema is properly used by all commands

5835 10/30/2012 02:57 AM Aaron Marcuse-Kubitza

inputs/.NCBI/nodes/create.sql: Make genus completely globally unique by removing duplicates. Note that only duplicates with ranks at or below the genus level need be removed, which for this dataset is just genus and subgenus.

5834 10/30/2012 02:00 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel: taxonlabel_required_key constraint: Also allow taxonlabels with just a sourceaccessioncode, to support looking up parent taxonlabels using just their sourceaccessioncode (e.g. in NCBI)

5833 10/30/2012 01:23 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: matched taxonlabel: Don't include taxonName in the concatenated taxonomicname. This also prevents the creation of the matched taxonlabel entirely when only the taxonName is provided.

5832 10/30/2012 01:20 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Don't create matched taxonlabel if taxonName was provided. This fixes a bug where an NCBI node was incorrectly pointing to a TNRS name, when the reference should only be the other way around. This may also fix the TNRS slowdown, if it was caused by circular matched_label_id references.

5831 10/30/2012 12:47 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel_2_set_canon_label_id_on_insert(): Fixed bug where also need to set canon_label_id based on matched_label_id here, not just in taxonlabel_2_set_canon_label_id_on_update(), because the matched_label_id could be specified when the taxonlabel is first created

5830 10/30/2012 12:34 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel_2_set_canon_label_id_on_*(): Fixed bug where need to use := instead of = to perform assignment of canon_label_id

5829 10/30/2012 12:17 AM Aaron Marcuse-Kubitza

schemas/tree_cross-links.sql: Updated for schema changes

5828 10/30/2012 12:16 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel_update_ancestors(): Include ancestors for both parent_id and matched_label_id rather than just one or the other. This avoids needing to delete existing ancestors for the parent_id when a matched_label_id is added and overrides it. This should reduce the TNRS import time if the slowdown was due to the need to delete parent_id ancestors when later adding a matched_label_id (which only occurs in a separate step in the TNRS datasource).

5827 10/30/2012 12:07 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ensure_cond(): Fixed bug where test if any rows failed cond did not check if cur != None (which is the case when cond == sql_gen.true_expr) before checking cur.rowcount

5826 10/29/2012 10:26 PM Aaron Marcuse-Kubitza

sql_gen.py: simplify_expr(): Don't require () around NULL IS NULL and NULL IS NOT NULL because extra parentheses are not provided in index conditions, only in check constraint conditions

5825 10/29/2012 10:06 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times. The TNRS import has slowed down significantly, possibly due to a bug in the autopopulation of the taxonlabel_relationship table when the input data contains cycles.

5824 10/29/2012 09:37 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Assertion that into and full_in_table have the same row count: Allow into to have more rows than full_in_table, in case an input row matched multiple output rows. This should not happen for a properly-configured database, but seems to happen periodically nevertheless (currently, to the MO datasource) and should not abort the import when it does.

5823 10/26/2012 08:18 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): "could not create unique index" DuplicateKeyException: Fixed bug where can't use make_DuplicateKeyException() because it tries to retrieve information about the index in question, but the index it was trying to create doesn't exist

5822 10/26/2012 08:10 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Renamed datasource's taxonverbatim to datasource_taxonverbatim to distinguish it from the other taxonverbatims that are joined on (parsed_taxonverbatim, accepted_taxonverbatim)

5821 10/26/2012 07:18 PM Aaron Marcuse-Kubitza

inputs/.NCBI/nodes/create.sql: Make genus (mostly) globally unique by removing kingdom Animalia, which has significant genus overlap with plants. This reduces the number of duplicated genera from 578 to 65 (determined with `SELECT name_txt, count(), array_agg(rank) FROM "NCBI".nodes GROUP BY name_txt HAVING count() > 1 AND 'genus' = ALL (array_agg(rank))`).

5820 10/26/2012 07:08 PM Aaron Marcuse-Kubitza

inputs/.NCBI/nodes/create.sql: Added foreign key on parent tax_id with covering index

5819 10/26/2012 07:06 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Added %/uninstall, %/reinstall to allow reinstalling individual tables

5818 10/26/2012 06:00 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ensure_cond(): When adding the failed condition to the errors table, also include the original, untranslated condition from the DB schema in addition to the translation of the condition into the input schema

5817 10/26/2012 05:45 PM Aaron Marcuse-Kubitza

sql_io.py: track_data_error(): Fixed bug where errors whose column had no srcs (indicated by () ) were incorrectly being ignored. This affected NOT NULL exceptions where the column was not provided by the dataset.

5816 10/26/2012 05:38 PM Aaron Marcuse-Kubitza

sql_gen.py: If no cols had srcs, return [] instead of the [()] that itertools.product() would have returned

5815 10/26/2012 05:38 PM Aaron Marcuse-Kubitza

sql_io.py: track_data_error(): Support errors with no columns by inserting a single entry with column set to NULL

5814 10/26/2012 05:35 PM Aaron Marcuse-Kubitza

strings.py: Added join()

5813 10/26/2012 05:00 PM Aaron Marcuse-Kubitza

sql_io.py: mk_errors_table(): Made "column" column nullable, because some errors (such as check constraint violations) don't have any corresponding columns if its columns weren't provided in the input data

5812 10/26/2012 04:35 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/test_scrub: `make inputs/.TNRS/reinstall`: Use new $schema_only option so that an empty TNRS schema is installed rather than one containing inputs/.TNRS/data.sql

5811 10/26/2012 04:34 PM Aaron Marcuse-Kubitza

inputs/.TNRS/: Added data.sql containing the test_taxonomic_names TNRS results, so that a new installation of VegBIEN will contain the necessary data to make the tests pass, including the TNRS import test

5810 10/26/2012 04:32 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: If $schema_only option is set, only install .sql files ending in schema.sql

5809 10/26/2012 04:24 PM Aaron Marcuse-Kubitza

inputs/Makefile: $(rsyncLogs): Use $(rsync) instead of $(rsync*) now that it supports excluding just temp files and .svn rather than all .*

5808 10/26/2012 04:21 PM Aaron Marcuse-Kubitza

lib/common.Makefile: rsync: $(rsync): Exclude .svn, #, and .DS_Store rather than all . because dirs beginning with . created by the user (such as .NCBI, .TNRS) should be included in the sync

5807 10/26/2012 04:18 PM Aaron Marcuse-Kubitza

Added inputs/REMIB/Specimen.src/.map.csv.last_cleanup

5806 10/26/2012 04:10 PM Aaron Marcuse-Kubitza

Added inputs/bien_web/observation/+header.csv

5805 10/26/2012 04:09 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(dbExports): When putting schemas first, don't require a . before "schema" to allow the entire filename to be schema.sql

5804 10/26/2012 03:44 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/_scrub/public.test_taxonomic_names.sql, TNRS.sql: Regenerated with schema and mappings changes

5803 10/26/2012 03:42 PM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/map.csv: Added _nullIf filter to remove "Unknown" values for Accepted_name_family

5802 10/26/2012 03:35 PM Aaron Marcuse-Kubitza

README.TXT: Generate the local TNRS cache from the test_taxonomic_names rather than syncing it with the vegbiendev TNRS cache, so that the automated test's inserted row count stays the same regardless of the contents of the full-DB TNRS cache

5801 10/26/2012 03:34 PM Aaron Marcuse-Kubitza

README.TXT: Backups: Added TNRS cache section

5800 10/26/2012 03:12 PM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/test.xml.ref: Accepted inserted row count using TNRS cache created from test_taxonomic_names. Using a standard set of names for the test ensures that the inserted row count will not change when the full-DB TNRS cache changes.

5799 10/26/2012 02:48 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_accepted_names: Prepend the Accepted_name_family to the taxonomic name that will be submitted back to TNRS for parsing, because TNRS input names now always include the family when it's provided

5798 10/26/2012 02:46 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_accepted_names: Use simpler array_to_string() instead of || and COALESCE to put together the taxonomic name that will be submitted back to TNRS for parsing. Note that this requires defining an IMMUTABLE wrapper function for array_to_string(), because pg_catalog.array_to_string() is declared STABLE but indexes require functions to be IMMUTABLE (http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg156323.html).

5797 10/26/2012 02:42 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: Don't hardcode the schema name

5796 10/26/2012 02:40 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: sql/install: Provide the datasource's schema to the script in :schema, so it can refer to its own elements explicitly when it's not possible to rely on the search_path. This is the case for functions that have the same signature as (and are intended to replace) a pg_catalog function, because the pg_catalog function will be used in preference to the datasource function regardless of the search_path.

5795 10/26/2012 02:06 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(cleanup): If a cleanup.sql is provided, only run it and don't do default cleanup, to allow tables to override rather than just add to default cleanup operations. This prevents the automatic replacement of certain strings (sql_io.null_strs) with NULL on TNRS, and keeps the TNRS cache mostly as it was output by the TNRS service. Note that empty strings are still replaced with NULL by COPY FROM in sql_io.append_csv(). This is necessary for TNRS import to work properly, because although '' generally means NULL, it is not treated that way by PostgreSQL.

5794 10/26/2012 01:53 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Moved custom cleanup.sql cleanup operations to main $(cleanup) function, so custom cleanup operations would run whenever any target (such as %/install) invokes $(cleanup), not just manually through %/cleanup

5793 10/26/2012 01:52 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Moved custom cleanup.sql cleanup operations to main $(cleanup) function, so custom cleanup operations would run whenever any target (such as %/install) invokes $(cleanup), not just manually through %/cleanup

5792 10/25/2012 05:16 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: If first param's type is anyelement (for polymorphic function, which had mismatched arg types), use type text, as all types can cast to it

5791 10/25/2012 05:12 PM Aaron Marcuse-Kubitza

sql_io.py: cast(): Set the created function's value param type to anyelement to support any input type, not just text

5790 10/25/2012 04:53 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Only prepend the family to the concatenated scientificName for TNRS if it ends in -aceae (using _taxon_family_require_std()), to avoid sending unsupported, nonstandard families to TNRS which it will place in Unmatched_terms

5789 10/25/2012 04:48 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _taxon_family_require_std()

5788 10/25/2012 04:09 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Prepend the family to the concatenated scientificName input to TNRS, so that TNRS can use it to disambiguate the genus

5787 10/25/2012 03:50 PM Aaron Marcuse-Kubitza

tnrs_db: Making TNRS request: Fixed bug where needed to remove else block now that there is no except block

5786 10/25/2012 03:45 PM Aaron Marcuse-Kubitza

tnrs.py: retrieval_request_template: Turn on taxonomic_constraint (to match family before genus) and source_sorting (to always return any result from the first source before returning results from any other sources, regardless of match %)

5785 10/25/2012 12:50 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

5784 10/25/2012 12:46 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: speciesBinomial: Changed definition to genus+specificEpithet, not genus+species, to match the scientific meaning of specificEpithet vs. species

5783 10/25/2012 12:45 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonverbatim: Renamed species to specific_epithet to avoid confusion with the scientific meaning of species (genus+specificEpithet), since this field contains just the specific epithet

5782 10/25/2012 12:30 PM Aaron Marcuse-Kubitza

input.Makefile: Verification of import: verify: Use tables from the verify/*.ref files themselves rather than from the datasource's subdirs, in order to match the tables in mappings/verify.*.sql

5781 10/25/2012 11:52 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Added stemobservation.tag, stemobservation.height_m for use in plot change over time analysis <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Plot_change_over_time_analysis>

5780 10/25/2012 11:45 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Fixed typo in scientificNameWithMorphospecies

5779 10/25/2012 11:41 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Renamed columns to VegCore names (https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/mappings/VegCore.csv)