Project

General

Profile

Statistics
| Revision:

# Date Author Comment
5987 11/05/2012 02:41 PM Aaron Marcuse-Kubitza

sql.py: Added add_pkey_index()

5986 11/05/2012 02:29 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

5985 11/05/2012 02:15 PM Aaron Marcuse-Kubitza

import_times: Use $'' quoting to expand tab, in order to also work on Mac OS X

5984 11/02/2012 06:14 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place: Added place_set_canon_place_id_on_*() triggers, analogous to taxonlabel.taxonlabel_2_set_canon_label_id_on_*()

5983 11/02/2012 06:12 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place: Added canon_place_id

5982 11/02/2012 06:04 PM Aaron Marcuse-Kubitza

mappings/verify.plots.sql, inputs/SALVIAS/verify/plots.ref.sql: Renamed project column to projectname to avoid confusion with the project table

5981 11/02/2012 06:03 PM Aaron Marcuse-Kubitza

input.Makefile: Verification of import: Fixed bug where needed to make .ref .PRECIOUS instead of verify/.ref

5980 11/02/2012 05:57 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/verify/plots.ref.sql: Use project_name instead of project_id everywhere, not just as the ID to list for each project, to match the corresponding VegBIEN queries

5979 11/02/2012 05:55 PM Aaron Marcuse-Kubitza

input.Makefile: Verification of import: Make verify/%.ref .PRECIOUS because there must always be a .ref for the make rules to work

5978 11/02/2012 05:34 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/verify/plots.ref.sql: Use project_name instead of project_id as the ID to list for each project, since the SALVIAS projects table is now mapped to the DB and the project_name is used as project.projectname

5977 11/02/2012 05:18 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: subplots: Also complete the locationevent/location diamond (subplot event -> {subplot location, parent plot event} -> parent plot location) when an eventDate or range is specified, as this is also an identifying field for locationevent. This fixes a bug where subplots data without explicit plot events (such as SALVIAS and TEAM) was not being connected to the appropriate parent plot event as well as parent plot location. This should fix the SALVIAS verification # location events, which should include only parent plots' locationevents to correspond with # locations, which only includes parent plots' locations, and uses locationevent.parent_id being NULL to determine what is a parent plot event.

5976 11/02/2012 04:45 PM Aaron Marcuse-Kubitza

input.Makefile: Verification of import: $(verify): Use $(inDatasrc) so that the full standard search_path, including any custom public schema, is made available to the VegBIEN-side verification script

5975 11/02/2012 04:33 PM Aaron Marcuse-Kubitza

mappings/verify.plots.sql: # locations: Fixed bug where need to use location instead of locationevent

5974 11/02/2012 04:30 PM Aaron Marcuse-Kubitza

mappings/verify.specimens.sql: Updated for current schema

5973 11/02/2012 04:26 PM Aaron Marcuse-Kubitza

input.Makefile: Verification of import: verify/%.out: Made the target .PRECIOUS so that partial output will be saved in case of error to help debugging

5972 11/02/2012 04:13 PM Aaron Marcuse-Kubitza

mappings/verify.plots.sql: Updated for current schema

5971 11/02/2012 04:01 PM Aaron Marcuse-Kubitza

Updated validation/BIEN2_Analytical_DB_overview.docx with Brad's revision

5970 11/02/2012 03:58 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel_set_canon_label_id(): Fixed bug where always need to set canon_label_id on insert (because it must be a self-reference rather than NULL if there is no matched_label_id), but wasn't being set when no matched_label_id because the IS DISTINCT FROM check returned false

5969 11/02/2012 03:42 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: decimalLatitude/Longitude->geoscrub input coordinates: Also set to NULL if 0 here, not just for the coordinates linked to the datasource's place instance

5968 11/02/2012 03:37 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Switched back to using run_query_into()'s add_pkey_ option now that it uses sql.add_pkey_or_index() instead of sql.add_pkey()

5967 11/02/2012 03:35 PM Aaron Marcuse-Kubitza

sql.py: run_query_into(): add_pkey_: Use add_pkey_or_index() instead of add_pkey(). This should prevent errors for input rows which match multiple output rows in VegBIEN.

5966 11/02/2012 03:24 PM Aaron Marcuse-Kubitza

README.TXT: Data import: After import, removed steps to reinstall tnrs_accepted, tnrs_other because this would overwrite the versions of these staging tables that were used to build the current version of the database. (The backup should store a snapshot of the database's input data as it was used to build the database.) Note that these tables will still be reinstalled when the next import starts.

5965 11/02/2012 02:54 PM Aaron Marcuse-Kubitza

Added validation/BIEN2_Analytical_DB_overview.docx

5964 11/02/2012 02:54 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

5963 11/02/2012 12:00 PM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs_*/create.sql: Fixed bug where need to COALESCE the WHERE condition with false to ensure that a boolean value, not NULL, is negated by NOT. Otherwise, tnrs_other's query would exclude rows for which the condition returned NULL rather than false.

5962 11/02/2012 11:43 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: make_analytical_db(): Truncate tables before (re-)populating them in case make_analytical_db() has already been run

5961 11/02/2012 11:40 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: Updated to use reference instead of party to store the datasource name

5960 11/02/2012 11:28 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Renamed make_analytical_stem() back to make_analytical_db()

5959 11/01/2012 10:52 AM Aaron Marcuse-Kubitza

import_all: After starting geoscrub import in the background, wait for make commands to scroll by before starting NCBI import

5958 11/01/2012 10:39 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: matched place: Also map verbatim place's geoscrub-related fields to the matched place, to link up with geoscrub's corresponding input place

5957 11/01/2012 10:22 AM Aaron Marcuse-Kubitza

import_all: Removed explicit by_col=1 from datasources that don't require it for proper import. (It will still be set if the user provides it on the command line.)

5956 11/01/2012 10:18 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Removed duplicate previous version of _and() with fewer parameters

5955 11/01/2012 10:14 AM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

5954 11/01/2012 10:11 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: method: Made reference_id NOT NULL so the method table would automatically be datasource-scoped

5953 11/01/2012 10:09 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Renamed creator_ids to reference_id since they are now fkeys to reference

5952 11/01/2012 10:04 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Made creator_ids an fkey to reference instead of party, so that datasources are stored separately from people and to allow adding reference-type metadata (URL, copyright, etc.) for each datasource

5951 11/01/2012 09:51 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: reference: Added unique index

5950 11/01/2012 09:47 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Remapped accordingTo to taxonconcept.concept_reference_id

5949 11/01/2012 09:46 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Don't populate taxonlabel.creator_id using identifiedBy, because that was meant for creating a taxonconcept when the accordingTo was not specified, but taxonconcepts are now modeled differently using a taxonconcept subclass of taxonlabel with a concept_reference_id

5948 11/01/2012 09:36 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonverbatim: Renamed binomial to taxonname because this field is actually the taxonomic name without the author, not just the genus+specific epithet binomial

5947 11/01/2012 09:29 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Renamed *Binomial to *TaxonName because this field can store more ranks than just the genus+specificEpithet binomial (that goes in speciesBinomial)

5946 11/01/2012 09:22 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Renamed taxonName to taxonNameOrEpithet for clarity

5945 11/01/2012 09:16 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_aggregate_view: Added growthForm, cultivated, cultivatedBasis from analytical_stem. Note that this will create multiple entries for a taxon name when some taxonoccurrences are marked as cultivated and others aren't, or different taxonoccurrences have different growthform entries.

5944 11/01/2012 09:01 AM Aaron Marcuse-Kubitza

import_all: Added geoscrub import, which can happen concurrently with NCBI/TNRS but must come before the main datasources for the matched places to link up properly

5943 11/01/2012 08:59 AM Aaron Marcuse-Kubitza

import_all: Documented that TNRS import must come after NCBI for cross links to be made

5942 11/01/2012 08:53 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%.sql: _MySQL/%.sql`, etc.: Only run if target does not exist, to avoid regenerating the target when a .make script which generates the target's prerequisite is checked out from svn (causing its mod time to be newer than the target)

5941 11/01/2012 08:24 AM Aaron Marcuse-Kubitza

Added inputs/.geoscrub/

5940 11/01/2012 08:21 AM Aaron Marcuse-Kubitza

input.Makefile: SVN: $(*svnFilesGlob): Added */, _MySQL/ subdirs

5939 11/01/2012 08:20 AM Aaron Marcuse-Kubitza

lib/common.Makefile: SVN: $(add*): Fixed bug where need to use --depth=empty to ensure directory contents are not added unless explicitly listed

5938 11/01/2012 08:09 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped georeferenceValid, latLongInCountry, latLongInStateProvince

5937 11/01/2012 08:05 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added _and()

5936 11/01/2012 07:40 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped distanceToCountry_km, distanceToStateProvince_km

5935 11/01/2012 07:39 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _km_to_m()

5934 11/01/2012 07:36 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _km_to_m()

5933 11/01/2012 07:31 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place: Added georeference_valid, distance_to_country_m, distance_to_state_m

5932 11/01/2012 07:18 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: matched taxonlabel's ancestors: Only create the cross links to NCBI if the name is accepted (taxonIsCanonical)

5931 11/01/2012 06:55 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: accepted* taxonomic terms: If is canonical name, don't populate a separate accepted taxonlabel because the taxonlabel itself is accepted. If is not canonical name, populate matched_label_id directly when the row is created instead of waiting until after all rows are created and using _taxonlabel_set_matched_label_id(). This is possible now that accepted taxonlabels occur only in TNRS's tnrs_accepted table, which is specially marked as taxonIsCanonical. Avoiding _taxonlabel_set_matched_label_id() should fix the slowdown that occurs when it's used on large tables.

5930 11/01/2012 06:48 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonIsCanonical: Moved then=0 into non-identifying _if statement attributes

5929 11/01/2012 06:41 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped taxonIsCanonical

5928 11/01/2012 06:41 AM Aaron Marcuse-Kubitza

bin/map: map_rows(): map_table(): Fixed bug where metadata values were being removed prematurely, by passing them through

5927 11/01/2012 06:40 AM Aaron Marcuse-Kubitza

bin/map: map_rows(): Fixed bug where metadata values were being passed to functions that expected columns, by placing them directly in the XML import tree and then removing them from the mappings

5926 11/01/2012 05:51 AM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: %/new_terms.csv: Filter out metadata values, which are not new terms even though they are in the same map column as the input columns

5925 11/01/2012 05:46 AM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs_accepted/map.csv: Set taxonIsCanonical to true, using new metadata value syntax

5924 11/01/2012 05:40 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added taxonIsCanonical

5923 11/01/2012 05:37 AM Aaron Marcuse-Kubitza

input.Makefile: SVN: $(svnFiles): Added _MySQL files that would otherwise be excluded with $(filter-out _% ...)

5922 11/01/2012 05:34 AM Aaron Marcuse-Kubitza

inputs/*/: Placed all logs/ subdirs and import_order.txt under version control, using `make inputs/.../add`

5921 11/01/2012 05:31 AM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: Factored list of files to add out into $(svnFiles) var

5920 11/01/2012 05:30 AM Aaron Marcuse-Kubitza

Added inputs/.TNRS/tnrs/header.csv

5919 11/01/2012 05:28 AM Aaron Marcuse-Kubitza

input.Makefile: SVN: $(svnFilesGlob): Added *.make

5918 11/01/2012 05:21 AM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: Fixed bug where _-prefixed dirs (such as _MySQL) were incorrectly being included in the list of files to put under version control

5917 11/01/2012 05:15 AM Aaron Marcuse-Kubitza

Calls to `make inputs/.TNRS/cleanup`: Do `make inputs/.TNRS/tnrs_accepted/reinstall; make inputs/.TNRS/tnrs_other/reinstall` instead to use new split TNRS tables

5916 11/01/2012 05:04 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: Removed no longer needed tnrs_accepted_names index, now that the accepted names have been split into a separate table

5915 11/01/2012 05:00 AM Aaron Marcuse-Kubitza

inputs/.TNRS/: Split tnrs table into tnrs_accepted and tnrs_other, with tnrs_accepted imported first. This will eventually allow TNRS data to be imported without needing to use _taxonlabel_set_matched_label_id(), which is extremely slow due to an unknown bug in PostgreSQL's handling of UPDATEs on large tables (possibly limited to UPDATEs inside functions).

5914 11/01/2012 05:00 AM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: Fixed bug where logs/*.log.sql was incorrectly being included in the list of files to put under version control

5913 11/01/2012 04:53 AM Aaron Marcuse-Kubitza

lib/common.Makefile: Filesystem: $(wildcard/): Fixed bug where {} exprs weren't handled correctly when some branches contained *, by using `shopt -s nullglob` and bash directly instead of trying to emulate it with grep

5912 11/01/2012 04:44 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Also provide the schema to create.sql in :schema

5911 11/01/2012 04:16 AM Aaron Marcuse-Kubitza

bin/map: Added support for including literal metadata values in the map spreadsheet, by prefixing them with ':'

5910 11/01/2012 03:56 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added distanceToCountry_km, distanceToStateProvince_km

5909 11/01/2012 03:45 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added distanceToCountry_m, distanceToStateProvince_m

5908 11/01/2012 03:36 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude. Mapped decimalLatitude/Longitude to matched place's coordinates when acceptedDecimalLatitude/Longitude not provided (as is the case for the geoscrub table).

5907 11/01/2012 03:22 AM Aaron Marcuse-Kubitza

input.Makefile: SVN: $(svnFilesGlob): Added schema.sql

5906 11/01/2012 02:57 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Replaced georeferenceInvalid with latLongDomainValid, latLongDomainInvalid because this field is actually just whether the coordinates are in the range of valid decimal values, not whether they are within the named place reference. Re-sourced georeferenceValid to bien_web.observation#isGeovalid rather than isValidLatLong. VegCore-VegBIEN.csv: Replaced now-removed georeferenceInvalid with latLongInCountry for locationID redirection checks.

5905 11/01/2012 01:54 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Map locationID to place.placecode instead when geovalidation columns are provided

5904 11/01/2012 01:06 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: accepted* taxonlabel: Removed ancestor hierarchy because this is populated, in much greater detail, when the accepted name is imported as an input name and the TNRS-parsed components are available

5903 11/01/2012 12:55 AM Aaron Marcuse-Kubitza

sql.py: distinct_table(): Use DISTINCT ON instead of a unique index and insert_select()'s ignore mode to remove duplicate rows. This uses whichever sorting method PostgreSQL deems to be fastest instead of requiring the use of a B-tree index. Since most of the slower operations in TNRS's import are distinct_table() calls, this should speed up the TNRS import, which is a bottleneck for the DB import as a whole because the TNRS import must complete before other datasources can be imported.

5902 11/01/2012 12:36 AM Aaron Marcuse-Kubitza

sql.py: distinct_table(): Changed comment about distinct_on column index to include just the input table, so that the function does not guarantee a unique index on the output table's distinct_on columns

5901 11/01/2012 12:15 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude

5900 10/31/2012 11:57 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Renamed latLongValid, latLongInvalid to georeferenceValid, georeferenceInvalid to correspond to DwC term georeferenceVerificationStatus

5899 10/31/2012 11:45 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added latLongValid, latLongInvalid, latLongInCountry, latLongInStateProvince

5898 10/31/2012 11:14 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Treat any .sql file whose name contains (not just ends with) "schema" as a schema file and sort it before other .sql files

5897 10/31/2012 10:17 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Functions containing UPDATE statements: Use quote_nullable() instead of quote_literal() to properly encode NULL values

5896 10/31/2012 10:10 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Functions containing UPDATE statements: Use PL/pgSQL's EXECUTE statement to avoid caching query plans. This is necessary because as the table grows over time, the optimal query plan may change.

5895 10/31/2012 10:05 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ensure_cond(): When deleting rows rows that do not satisfy the condition, handle sql.DoesNotExistExceptions caused by columns in the condition that were not replaced with NULL. These occur when out_table is a function, and the columns of the table the condition relates to therefore can't be found using out_table.

5894 10/31/2012 09:59 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Calling function: Do not cache the function call, because it may be retried after error handling

5893 10/31/2012 09:58 PM Aaron Marcuse-Kubitza

sql_gen.py: NotCond: Treat a condition that evaluates to NULL as false instead, so that the boolean effect of the condition is completely inverted

5892 10/31/2012 09:42 PM Aaron Marcuse-Kubitza

sql_gen.py: null_as_str: Use new null instead of hardcoding 'NULL'

5891 10/31/2012 09:41 PM Aaron Marcuse-Kubitza

sql_gen.py: Added null

5890 10/31/2012 09:40 PM Aaron Marcuse-Kubitza

sql.py: run_query(): Give failed EXPLAIN approximately the log_level of its query, so that queries which produce an error in the EXPLAIN before the query itself is even run will still be logged

5889 10/31/2012 08:45 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): sql.DatabaseErrors: Factored exception-handling code out into handle_unknown_exc(), for use by other exception handlers

5888 10/31/2012 08:39 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ensure_cond(): is_function: Fixed bug where can't replace out_table_cols with NULL because out_table is a function, not a table