Changes - BIEN 3 - NCEAS Projects

root @ 5896

#	Date	Author	Comment
5896	10/31/2012 10:10 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Functions containing UPDATE statements: Use PL/pgSQL's EXECUTE statement to avoid caching query plans. This is necessary because as the table grows over time, the optimal query plan may change.
5895	10/31/2012 10:05 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ensure_cond(): When deleting rows that do not satisfy the condition, handle sql.DoesNotExistExceptions caused by columns in the condition that were not replaced with NULL. These occur when out_table is a function, and the columns of the table the condition relates to therefore can't be found using out_table.
5894	10/31/2012 09:59 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Calling function: Do not cache the function call, because it may be retried after error handling
5893	10/31/2012 09:58 PM	Aaron Marcuse-Kubitza	sql_gen.py: NotCond: Treat a condition that evaluates to NULL as false instead, so that the boolean effect of the condition is completely inverted
5892	10/31/2012 09:42 PM	Aaron Marcuse-Kubitza	sql_gen.py: null_as_str: Use new null instead of hardcoding 'NULL'
5891	10/31/2012 09:41 PM	Aaron Marcuse-Kubitza	sql_gen.py: Added null
5890	10/31/2012 09:40 PM	Aaron Marcuse-Kubitza	sql.py: run_query(): Give failed EXPLAIN approximately the log_level of its query, so that queries which produce an error in the EXPLAIN before the query itself is even run will still be logged
5889	10/31/2012 08:45 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): sql.DatabaseErrors: Factored exception-handling code out into handle_unknown_exc(), for use by other exception handlers
5888	10/31/2012 08:39 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ensure_cond(): is_function: Fixed bug where can't replace out_table_cols with NULL because out_table is a function, not a table
5887	10/30/2012 04:59 PM	Aaron Marcuse-Kubitza	my2pg*: Turn off escape_string_warning because \-escaped strings are standard in MySQL
5886	10/30/2012 04:58 PM	Aaron Marcuse-Kubitza	my2pg.data: Turn off standard_conforming_strings like in my2pg
5885	10/30/2012 04:42 PM	Aaron Marcuse-Kubitza	my2pg: Also remove any CHARACTER SET modifier on a column definition
5884	10/30/2012 04:26 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_aggregate_view: Make size classes cumulative ranges (stems above a certain DBH) rather than bins, per Brad's request
5883	10/30/2012 04:26 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_aggregate_view: Make size classes cumulative ranges (stems above a certain DBH) rather than bins, per Brad's request
5882	10/30/2012 04:18 PM	Aaron Marcuse-Kubitza	input.Makefile: SVN: add: Add header override files with any extension, not just .csv
5881	10/30/2012 04:15 PM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Replaced manual `svn add` commands with one `make inputs/<datasrc>/add` before committing to add newly-created files
5880	10/30/2012 04:00 PM	Aaron Marcuse-Kubitza	input.Makefile: SVN: add: Also add any *.sql, when it's in a subdir**. This applies to create.sql, cleanup.sql, etc.
5879	10/30/2012 03:58 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: SVN: Added $(add*)
5878	10/30/2012 03:55 PM	Aaron Marcuse-Kubitza	input.Makefile: SVN: add: Also add any newly-created files which should be under version control
5877	10/30/2012 03:35 PM	Aaron Marcuse-Kubitza	input.Makefile: Fixed bug where _MySQL/%.sql files weren't being built from associated .make files by adding special `%.sql: %.sql.make` rule to override `%.sql: _MySQL/%.sql`
5876	10/30/2012 03:33 PM	Aaron Marcuse-Kubitza	input.Makefile: `%: .make`: Factored $(if $(wildcard $@)... test out into $(make_script) so all `: %.make`-like rules could use it directly
5875	10/30/2012 03:09 PM	Aaron Marcuse-Kubitza	lib/forwarding.Makefile: $(subdirs): Use all folders other than ../ ./ .svn/ instead of listing folders that start with . explicitly
5874	10/30/2012 02:31 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: Use accepted_taxonlabel.taxonomicname instead of accepted_taxonverbatim.taxonomicname in order to have the family prepended
5873	10/30/2012 12:41 PM	Aaron Marcuse-Kubitza	Regenerated vegbien.ERD exports
5872	10/30/2012 12:38 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: placerank: Reordered in path order, using <http://rs.tdwg.org/dwc/terms/#dcindex> and <http://vegbank.org/vegbank/views/dba_fielddescription_detail.jsp?view=detail&wparam=1415&entity=dba_fielddescription&params=1415> as a guide. Documented the source of the values.
5871	10/30/2012 12:26 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: placename: Removed non-name-related fields, because placename is designed only to store a hierarchy of placenames, not additional place information
5870	10/30/2012 12:23 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Moved placedescription from placename to place (and renamed it to description), because it applies to the place itself, not the name for the place
5869	10/30/2012 12:16 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_aggregate_view: Added coverPercent, which is the sum of all coverPercents for that species
5868	10/30/2012 12:13 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_aggregate_view: Added coverPercent, which is the sum of all coverPercents for that species
5867	10/30/2012 12:03 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_aggregate_view: Include all analytical_stem species, not just those whose stems have non-NULL DBH
5866	10/30/2012 11:57 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Renamed aggregated_analytical_db to analytical_aggregate to match the name of analytical_stem
5865	10/30/2012 11:55 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Renamed analytical_db to analytical_stem since this contains just the individual stems, not the aggregated data in the main analytical DB
5864	10/30/2012 11:52 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Renamed analytical_db to analytical_stem since this contains just the individual stems, not the aggregated data in the main analytical DB
5863	10/30/2012 11:38 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Removed no longer used locationcoords
5862	10/30/2012 11:35 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_db_view: Use new coordinates instead of locationcoords
5861	10/30/2012 11:23 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Remapped latitude/longitude to new coordinates table
5860	10/30/2012 11:15 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place: Added coordinates_id
5859	10/30/2012 11:01 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added coordinates table
5858	10/30/2012 10:40 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place: Removed municipality, site because they are not used in the geoscrubbing
5857	10/30/2012 10:19 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place: Place custom hierarchy of placenames in placename table instead of in otherranks field
5856	10/30/2012 10:04 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place.matched_place_id: Changed comment to say that places are linked in a three-level (instead of two-level) hierarchy of datasource place -> verbatim place -> accepted place, and this field contains the closest match
5855	10/30/2012 09:54 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Renamed placepath to place since this contains primary information about the place, including the reference to the canonical place
5854	10/30/2012 09:42 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Renamed place to placename since it refers just to a name for a place, without coordinates
5853	10/30/2012 07:18 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_db_view: Exclude original taxondeterminations, so that there is only one taxondetermination for each taxonoccurrence
5852	10/30/2012 07:03 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: make_analytical_db(): Also make new aggregated_analytical_db
5851	10/30/2012 07:02 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: sync_analytical_db_to_view(): DROP TABLE: Use IF EXISTS in case analytical_db table has already been deleted, or not yet created
5850	10/30/2012 07:01 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added aggregated_analytical_db_view and materialized table aggregated_analytical_db (synced using sync_aggregated_analytical_db_to_view())
5849	10/30/2012 07:01 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added aggregated_analytical_db_view and materialized table aggregated_analytical_db (synced using sync_aggregated_analytical_db_to_view())
5848	10/30/2012 06:56 AM	Aaron Marcuse-Kubitza	lib/PostgreSQL-MySQL.csv: custom types: Also match column names enclosed in ``
5847	10/30/2012 06:49 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Store materialized analytical_db table in schema so aggregating views can reference it. Added sync_analytical_db_to_view() to maintain analytical_db table.
5846	10/30/2012 06:30 AM	Aaron Marcuse-Kubitza	schemas/vegbank.ERD.pdf: Restored to VegBank ERD, which had gotten overwritten when the vegbien.ERD exports were regenerated
5845	10/30/2012 06:23 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_db_view: Reordered columns in path order
5844	10/30/2012 06:04 AM	Aaron Marcuse-Kubitza	schemas/: Moved unit conversion functions from functions.sql to vegbien.sql so the unit conversion functions used by analytical_db_view wouldn't need to be stored both in functions.sql and in vegbien.sql. (All unit conversion functions used by analytical_db_view must be stored in the public schema so that analytical_db_view doesn't get cascadingly deleted when the functions schema is reinstalled.)
5843	10/30/2012 05:52 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_db_view: Use public._m2_to_ha() instead of functions._m2_to_ha()
5842	10/30/2012 05:51 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Copied _m2_to_ha() to public schema for use by analytical_db_view
5841	10/30/2012 05:40 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_db_view: Added diameterBreastHeight_cm
5840	10/30/2012 05:38 AM	Aaron Marcuse-Kubitza	schemas/functions.sql, vegbien.sql: Added _m_to_cm()
5839	10/30/2012 05:07 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Copied _cm_to_m() to public schema for use by new aggregated_analytical_db_view
5838	10/30/2012 04:19 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_db_view: datasource table: Fixed bug where need to filter by creator_id = party_id in order to use just root parties (datasources)
5837	10/30/2012 03:40 AM	Aaron Marcuse-Kubitza	tnrs_db: Fetching names to scrub: Omit sql.select() fields param because it will be filled in with its default value
5836	10/30/2012 03:29 AM	Aaron Marcuse-Kubitza	import_all: Pass command-line args (such as make vars) to all commands, not just with_all, so that a custom public schema is properly used by all commands
5835	10/30/2012 02:57 AM	Aaron Marcuse-Kubitza	inputs/.NCBI/nodes/create.sql: Make genus completely globally unique by removing duplicates. Note that only duplicates with ranks at or below the genus level need be removed, which for this dataset is just genus and subgenus.
5834	10/30/2012 02:00 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonlabel: taxonlabel_required_key constraint: Also allow taxonlabels with just a sourceaccessioncode, to support looking up parent taxonlabels using just their sourceaccessioncode (e.g. in NCBI)
5833	10/30/2012 01:23 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: matched taxonlabel: Don't include taxonName in the concatenated taxonomicname. This also prevents the creation of the matched taxonlabel entirely when only the taxonName is provided.
5832	10/30/2012 01:20 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Don't create matched taxonlabel if taxonName was provided. This fixes a bug where an NCBI node was incorrectly pointing to a TNRS name, when the reference should only be the other way around. This may also fix the TNRS slowdown, if it was caused by circular matched_label_id references.
5831	10/30/2012 12:47 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonlabel_2_set_canon_label_id_on_insert(): Fixed bug where also need to set canon_label_id based on matched_label_id here, not just in taxonlabel_2_set_canon_label_id_on_update(), because the matched_label_id could be specified when the taxonlabel is first created
5830	10/30/2012 12:34 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonlabel_2_set_canon_label_id_on_*(): Fixed bug where need to use := instead of = to perform assignment of canon_label_id
5829	10/30/2012 12:17 AM	Aaron Marcuse-Kubitza	schemas/tree_cross-links.sql: Updated for schema changes
5828	10/30/2012 12:16 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonlabel_update_ancestors(): Include ancestors for both parent_id and matched_label_id rather than just one or the other. This avoids needing to delete existing ancestors for the parent_id when a matched_label_id is added and overrides it. This should reduce the TNRS import time if the slowdown was due to the need to delete parent_id ancestors when later adding a matched_label_id (which only occurs in a separate step in the TNRS datasource).
5827	10/30/2012 12:07 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ensure_cond(): Fixed bug where test if any rows failed cond did not check if cur != None (which is the case when cond == sql_gen.true_expr) before checking cur.rowcount
5826	10/29/2012 10:26 PM	Aaron Marcuse-Kubitza	sql_gen.py: simplify_expr(): Don't require () around NULL IS NULL and NULL IS NOT NULL because extra parentheses are not provided in index conditions, only in check constraint conditions
5825	10/29/2012 10:06 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times. The TNRS import has slowed down significantly, possibly due to a bug in the autopopulation of the taxonlabel_relationship table when the input data contains cycles.
5824	10/29/2012 09:37 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Assertion that into and full_in_table have the same row count: Allow into to have more rows than full_in_table, in case an input row matched multiple output rows. This should not happen for a properly-configured database, but seems to happen periodically nevertheless (currently, to the MO datasource) and should not abort the import when it does.
5823	10/26/2012 08:18 PM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): "could not create unique index" DuplicateKeyException: Fixed bug where can't use make_DuplicateKeyException() because it tries to retrieve information about the index in question, but the index it was trying to create doesn't exist
5822	10/26/2012 08:10 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_db_view: Renamed datasource's taxonverbatim to datasource_taxonverbatim to distinguish it from the other taxonverbatims that are joined on (parsed_taxonverbatim, accepted_taxonverbatim)
5821	10/26/2012 07:18 PM	Aaron Marcuse-Kubitza	inputs/.NCBI/nodes/create.sql: Make genus (mostly) globally unique by removing kingdom Animalia, which has significant genus overlap with plants. This reduces the number of duplicated genera from 578 to 65 (determined with `SELECT name_txt, count(*), array_agg(rank) FROM "NCBI".nodes GROUP BY name_txt HAVING count(*) > 1 AND 'genus' = ALL (array_agg(rank))`).
5820	10/26/2012 07:08 PM	Aaron Marcuse-Kubitza	inputs/.NCBI/nodes/create.sql: Added foreign key on parent tax_id with covering index
5819	10/26/2012 07:06 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: Added %/uninstall, %/reinstall to allow reinstalling individual tables
5818	10/26/2012 06:00 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ensure_cond(): When adding the failed condition to the errors table, also include the original, untranslated condition from the DB schema in addition to the translation of the condition into the input schema
5817	10/26/2012 05:45 PM	Aaron Marcuse-Kubitza	sql_io.py: track_data_error(): Fixed bug where errors whose column had no srcs (indicated by () ) were incorrectly being ignored. This affected NOT NULL exceptions where the column was not provided by the dataset.
5816	10/26/2012 05:38 PM	Aaron Marcuse-Kubitza	sql_gen.py: If no cols had srcs, return [] instead of the [()] that itertools.product() would have returned
5815	10/26/2012 05:38 PM	Aaron Marcuse-Kubitza	sql_io.py: track_data_error(): Support errors with no columns by inserting a single entry with column set to NULL
5814	10/26/2012 05:35 PM	Aaron Marcuse-Kubitza	strings.py: Added join()
5813	10/26/2012 05:00 PM	Aaron Marcuse-Kubitza	sql_io.py: mk_errors_table(): Made "column" column nullable, because some errors (such as check constraint violations) don't have any corresponding columns if their columns weren't provided in the input data
5812	10/26/2012 04:35 PM	Aaron Marcuse-Kubitza	inputs/test_taxonomic_names/test_scrub: `make inputs/.TNRS/reinstall`: Use new $schema_only option so that an empty TNRS schema is installed rather than one containing inputs/.TNRS/data.sql
5811	10/26/2012 04:34 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/: Added data.sql containing the test_taxonomic_names TNRS results, so that a new installation of VegBIEN will contain the necessary data to make the tests pass, including the TNRS import test
5810	10/26/2012 04:32 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: If $schema_only option is set, only install .sql files ending in schema.sql
5809	10/26/2012 04:24 PM	Aaron Marcuse-Kubitza	inputs/Makefile: $(rsyncLogs): Use $(rsync) instead of $(rsync) now that it supports excluding just temp files and .svn rather than all .
5808	10/26/2012 04:21 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: rsync: $(rsync): Exclude .svn, #, and .DS_Store rather than all . because dirs beginning with . created by the user (such as .NCBI, .TNRS) should be included in the sync
5807	10/26/2012 04:18 PM	Aaron Marcuse-Kubitza	Added inputs/REMIB/Specimen.src/.map.csv.last_cleanup
5806	10/26/2012 04:10 PM	Aaron Marcuse-Kubitza	Added inputs/bien_web/observation/+header.csv
5805	10/26/2012 04:09 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: $(dbExports): When putting schemas first, don't require a . before "schema" to allow the entire filename to be schema.sql
5804	10/26/2012 03:44 PM	Aaron Marcuse-Kubitza	inputs/test_taxonomic_names/_scrub/public.test_taxonomic_names.sql, TNRS.sql: Regenerated with schema and mappings changes
5803	10/26/2012 03:42 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/tnrs/map.csv: Added _nullIf filter to remove "Unknown" values for Accepted_name_family
5802	10/26/2012 03:35 PM	Aaron Marcuse-Kubitza	README.TXT: Generate the local TNRS cache from the test_taxonomic_names rather than syncing it with the vegbiendev TNRS cache, so that the automated test's inserted row count stays the same regardless of the contents of the full-DB TNRS cache
5801	10/26/2012 03:34 PM	Aaron Marcuse-Kubitza	README.TXT: Backups: Added TNRS cache section
5800	10/26/2012 03:12 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/tnrs/test.xml.ref: Accepted inserted row count using TNRS cache created from test_taxonomic_names. Using a standard set of names for the test ensures that the inserted row count will not change when the full-DB TNRS cache changes.
5799	10/26/2012 02:48 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_accepted_names: Prepend the Accepted_name_family to the taxonomic name that will be submitted back to TNRS for parsing, because TNRS input names now always include the family when it's provided
5798	10/26/2012 02:46 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_accepted_names: Use simpler array_to_string() instead of || and COALESCE to put together the taxonomic name that will be submitted back to TNRS for parsing. Note that this requires defining an IMMUTABLE wrapper function for array_to_string(), because pg_catalog.array_to_string() is declared STABLE but indexes require functions to be IMMUTABLE (http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg156323.html).
5797	10/26/2012 02:42 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: Don't hardcode the schema name
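
Note on revision 5893 (sql_gen.py: NotCond): in SQL's three-valued logic, NOT applied to NULL is still NULL, so a row whose condition evaluates to NULL is selected by neither the condition nor its plain negation. The sketch below models this in Python, with None standing in for NULL. The helper names (sql_not, not_cond, selected) are hypothetical and are not taken from sql_gen.py. Treating NULL as false before inverting corresponds to something like NOT COALESCE(cond, false) in SQL, but the log does not say which form sql_gen.py actually emits.

```python
def sql_not(value):
    """SQL's NOT under three-valued logic: NOT NULL is NULL (None here)."""
    return None if value is None else not value


def not_cond(value):
    """Invert a condition after treating NULL as false (assumed approach)."""
    return not (False if value is None else value)


def selected(value):
    """A WHERE clause keeps a row only when its condition is exactly true."""
    return value is True


# One row per possible condition result: true, false, NULL.
rows = [True, False, None]

kept_by_cond = [selected(v) for v in rows]
kept_by_plain_not = [selected(sql_not(v)) for v in rows]
kept_by_not_cond = [selected(not_cond(v)) for v in rows]
```

With the plain negation, the NULL row is kept by neither side. With the NULL-as-false inversion, every row is kept by exactly one of the condition and its inverse, which is the "completely inverted" effect the commit message describes.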

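Note on revision 5816 (sql_gen.py): the [()] mentioned there is standard itertools behaviour, since product() over zero iterables yields a single empty tuple. The sketch below shows that behaviour and one way to return [] instead. The function name src_combinations and its input shape (one sequence of srcs per column) are assumptions made for illustration, not the actual sql_gen.py code.

```python
import itertools


def src_combinations(cols_srcs):
    """Return each combination of one src per column, or [] if no column has srcs."""
    with_srcs = [srcs for srcs in cols_srcs if srcs]
    if not with_srcs:
        # Avoid the single empty tuple that product() would otherwise yield.
        return []
    return list(itertools.product(*with_srcs))


# product() over zero iterables: one result, the empty tuple.
raw = list(itertools.product())
```

Returning [] lets a caller distinguish "no srcs at all" from "one combination" with an ordinary emptiness check.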