Activity

From 10/10/2012 to 11/08/2012

11/08/2012

06:38 PM Revision 6098: root Makefile: VegBIEN DB: Schemas: Run all schema installs and uninstalls using no_search_path=1, so that the schemas in the automatic search_path are not required for the command to run
Aaron Marcuse-Kubitza
06:37 PM Revision 6097: psql_vegbien: Added $no_search_path option to turn off the automatic SET search_path directive
Aaron Marcuse-Kubitza
06:11 PM Revision 6096: schemas/vegbien.sql: taxonverbatim: Added growthform, for a growth form based on the taxon name rather than one provided with the input data's taxonoccurrence
Aaron Marcuse-Kubitza
06:00 PM Revision 6095: schemas/vegbien.ERD.mwb: Fixed lines
Aaron Marcuse-Kubitza
05:47 PM Revision 6094: inputs/SALVIAS/plotMetadata/: LEFT JOINed with lookup_MethodCode to create plotMetadata_
Aaron Marcuse-Kubitza
04:52 PM Revision 6093: schemas/vegbien.sql: threatened_taxonlabel_view: Fixed bug where DISTINCT was needed on the resulting taxonlabel_id, because some descendants apparently appear in multiple threatened taxonlabels' subtrees
Aaron Marcuse-Kubitza
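Why DISTINCT is needed in Revision 6093 can be seen with a small Python sketch (not the actual view definition): when one threatened taxonlabel is an ancestor of another, collecting each one's subtree yields the shared descendants more than once. The `parent_of` mapping and both function names are hypothetical.

```python
from collections import defaultdict

def descendants(children, root):
    """Yield root and all of its descendants (depth-first)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(children.get(node, ()))

def threatened_labels(parent_of, threatened, distinct=True):
    """Collect every label in the subtrees of the threatened labels."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)
    rows = [d for t in threatened for d in descendants(children, t)]
    # Without de-duplication, labels under two threatened ancestors repeat
    return sorted(set(rows)) if distinct else sorted(rows)

# hypothetical tree: 1 is a genus, 2 a species under it, 3 and 4 its infraspecific taxa
parent_of = {1: None, 2: 1, 3: 2, 4: 2}
print(threatened_labels(parent_of, [1, 2], distinct=False))  # [1, 2, 2, 3, 3, 4, 4]
print(threatened_labels(parent_of, [1, 2]))                  # [1, 2, 3, 4]
```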
04:42 PM Revision 6092: schemas/vegbien.sql: analytical_*: Added threatened, using new threatened_taxonlabel lookup table
Aaron Marcuse-Kubitza
04:12 PM Revision 6091: schemas/vegbien.sql: reference_by_shortname(): Fixed bug where params in sql-language functions needed to be referenced using $-syntax
Aaron Marcuse-Kubitza
04:07 PM Revision 6090: schemas/vegbien.sql: threatened_taxonlabel_view: Use new reference_by_shortname()
Aaron Marcuse-Kubitza
03:45 PM Revision 6089: root Makefile: VegBIEN DB: Schemas: public: schemas/public/uninstall: Fixed bug where psql_vegbien needed to be run without public in the search_path, because it may have already been deleted
Aaron Marcuse-Kubitza
03:44 PM Revision 6088: root Makefile: VegBIEN DB: Schemas: public: schemas/public/install: Fixed bug where psql_vegbien needed to be run without public in the search_path, because it doesn't exist yet. This is done by setting public to the empty string (deleting it).
Aaron Marcuse-Kubitza
03:42 PM Revision 6087: vegbien_dest: $schemas: Don't include the , before $public if it has been set to the empty string (deleted)
Aaron Marcuse-Kubitza
03:27 PM Revision 6086: schemas/vegbien.sql: Added reference_by_shortname(). Using this function instead of the manual query should force the query planner to evaluate this expression first, rather than complexly reordering joins to place this nested select as a filter condition.
Aaron Marcuse-Kubitza
03:00 PM Revision 6085: schemas/vegbien.sql: Added threatened_taxonlabel derived table with generating view threatened_taxonlabel_view
Aaron Marcuse-Kubitza
02:48 PM Revision 6084: Updated inputs/UNCC/Specimen/test.xml.ref inserted row count
Aaron Marcuse-Kubitza
01:38 PM Revision 6083: mappings/VegCore.csv: Added threatened
Aaron Marcuse-Kubitza
01:21 PM Revision 6082: inputs/VegBank/vegbank.~.clean_up.sql: Remove private columns (plot.reallatitude, reallongitude) that should not be publicly visible
Aaron Marcuse-Kubitza
01:13 PM Revision 6081: inputs/CVS/Organism/map.csv: Removed now-dropped realLatitude, realLongitude
Aaron Marcuse-Kubitza
01:12 PM Revision 6080: inputs/CVS/Organism/map.csv: Removed now-dropped realLatitude, realLongitude
Aaron Marcuse-Kubitza
01:12 PM Revision 6079: Added inputs/CVS/Organism/postprocess.sql to drop private realLatitude, realLongitude columns
Aaron Marcuse-Kubitza
01:10 PM Revision 6078: input.Makefile: Staging tables installation: Added back postprocess.sql, which is now used for one-time dropping of private columns that should not be publicly visible
Aaron Marcuse-Kubitza
12:47 PM Revision 6077: input.Makefile: Maps building: %/.map.csv.last_cleanup: $(dict) canon/translate: Use new $(translate?)
Aaron Marcuse-Kubitza
12:45 PM Revision 6076: input.Makefile: Maps building: %/.map.csv.last_cleanup: Added $(srcDict) as a prerequisite, so that .last_cleanup will be re-run if it changes. Added empty $(srcDict) target in case it doesn't exist.
Aaron Marcuse-Kubitza
12:39 PM Revision 6075: inputs/bien_web/observation/map.csv: Omit *_index because they are placeholder columns created by the MySQL to PostgreSQL translation
Aaron Marcuse-Kubitza
12:37 PM Revision 6074: input.Makefile: Maps building: %/.map.csv.last_cleanup: Fixed bug where canon/translate using $(srcDict) must only be done if $(srcDict) exists for the datasource
Aaron Marcuse-Kubitza
12:26 PM Revision 6073: inputs/bien_web/observation/: Regenerated from actual bien_web.observation schema on nimoy, which has additional columns
Aaron Marcuse-Kubitza
12:24 PM Revision 6072: input.Makefile: SVN: $(svnFilesGlob): Added top-level map.csv, which can be used to apply a datasource-global data dictionary to all tables
Aaron Marcuse-Kubitza
12:18 PM Revision 6071: input.Makefile: Maps building: %/.map.csv.last_cleanup: Also apply any map.csv at the top level of the datasource directory. This can be used to apply a datasource-global data dictionary to all tables.
Aaron Marcuse-Kubitza
12:01 PM Revision 6070: my2pg: Also remove column comments. Note that these cannot be translated by sed, because PostgreSQL only allows setting column comments in a separate statement, not inline with the column's entry in the CREATE TABLE statement, and sed can only make replacements within a single input line.
Aaron Marcuse-Kubitza
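As an illustration of Revision 6070's reasoning, the hypothetical Python sketch below translates (rather than removes) inline MySQL column comments. Emitting each `COMMENT ON COLUMN` statement requires remembering the table name from an earlier `CREATE TABLE` line, which is state that a per-line sed substitution does not carry. The regexes are simplified assumptions (for example, they ignore escaped quotes) and are not taken from my2pg.

```python
import re

# Simplified patterns (assumptions): ignore escaped quotes and table-level comments.
CREATE_RE = re.compile(r"CREATE TABLE\s+`?(\w+)`?")
COLUMN_RE = re.compile(r"\s*`?(\w+)`?\s")
COMMENT_RE = re.compile(r"\s+COMMENT\s+'([^']*)'")

def translate_comments(mysql_ddl):
    """Strip inline column comments and return (ddl, COMMENT ON COLUMN statements)."""
    table = None
    lines, comments = [], []
    for line in mysql_ddl.splitlines():
        create = CREATE_RE.search(line)
        if create:
            table = create.group(1)  # state carried across lines
        comment = COMMENT_RE.search(line)
        column = COLUMN_RE.match(line)
        if comment and column and table:
            comments.append(
                f"COMMENT ON COLUMN {table}.{column.group(1)} IS '{comment.group(1)}';")
            line = COMMENT_RE.sub("", line)
        lines.append(line)
    return "\n".join(lines), comments

ddl = "CREATE TABLE `plot` (\n  `lat` double COMMENT 'latitude',\n  `id` int\n);"
pg_ddl, pg_comments = translate_comments(ddl)
print(pg_ddl)
print(pg_comments)  # ["COMMENT ON COLUMN plot.lat IS 'latitude';"]
```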
11:28 AM Revision 6069: mappings/VegCore.csv: Removed incorrect duplicate entry for verbatimSubgenus
Aaron Marcuse-Kubitza
10:58 AM Revision 6068: schemas/vegbien.sql: _taxon_family_require_std(): Fixed bug where name needed to be lowercased before checking if it ended in -aceae, to support family names that are uppercase. Note that this bug caused the family not to be prepended to the TNRS input name for datasources with uppercase family names, so the next DB import will likely produce a number of unscrubbed TNRS input names which now have the uppercase family prepended.
Aaron Marcuse-Kubitza
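A minimal Python model of the fix in Revision 6068, assuming the function simply checks for an -aceae ending; `family_require_std()` and `tnrs_input_name()` are hypothetical stand-ins for the SQL logic, not its actual implementation.

```python
def family_require_std(family):
    """Return `family` only if it has a standard -aceae ending (case-insensitive)."""
    # Lowercasing first lets uppercase names such as "FABACEAE" pass the check
    if family is not None and family.lower().endswith("aceae"):
        return family
    return None

def tnrs_input_name(family, taxon_name):
    """Prepend a standard family name, when there is one, to the TNRS input name."""
    std = family_require_std(family)
    return f"{std} {taxon_name}" if std else taxon_name

print(tnrs_input_name("FABACEAE", "Inga edulis"))     # FABACEAE Inga edulis
print(tnrs_input_name("Leguminosae", "Inga edulis"))  # Inga edulis
```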
10:17 AM Revision 6067: inputs/.TNRS/tnrs/tnrs.make: Fixed bug where the log file path needed to be referenced relative to the make script itself, because otherwise the log file would go in inputs/.TNRS/logs/tnrs.make.log.sql
Aaron Marcuse-Kubitza
10:07 AM Revision 6066: inputs/.TNRS/tnrs/tnrs.make: Fixed bug where just the basename of $0 needed to be used as the log file name
Aaron Marcuse-Kubitza
09:51 AM Revision 6065: Added inputs/IUCN/
Aaron Marcuse-Kubitza
09:51 AM Revision 6064: input.Makefile: SVN: add: Added _src/ (when it exists). $(_svnFilesGlob): Added .url, .pdf files in _src/.
Aaron Marcuse-Kubitza
07:47 AM Revision 6063: psql_vegbien: Use bash because it supports substitutions
Aaron Marcuse-Kubitza
07:46 AM Revision 6062: psql_vegbien: Set the search_path to $out_schemas set by vegbien_dest. This will enable running any psql_vegbien script on a schema other than public.
Aaron Marcuse-Kubitza
07:07 AM Revision 6061: schemas/vegbien.sql: analytical_stem_view: Changed inner joins on non-datasource taxonlabels to LEFT JOINs, to ensure that an entry is included in the analytical DB even if there was no matched taxonlabel. In theory, this shouldn't be necessary, because every taxonlabel should have a canonical taxonlabel since canon_label_id is auto-populated to the taxonlabel_id if there is no matched_label_id; there should be a taxonverbatim for every datasource and accepted taxonlabel because datasources link to taxonlabel via taxonverbatim and TNRS populates a taxonverbatim for every accepted taxonlabel; and there should be a parsed taxonlabel for every datasource taxonlabel because the mappings populate it.
Aaron Marcuse-Kubitza
06:56 AM Revision 6060: schemas/vegbien.sql: analytical_stem_view: Removed join on specimenreplicate, because it isn't used in the analytical DB. Each specimen will still get an entry in analytical_*, because it gets its own location.
Aaron Marcuse-Kubitza
06:45 AM Revision 6059: README.TXT: Data import: Before starting import, added step to run `make inputs/upload` and reinstall newly-uploaded datasources
Aaron Marcuse-Kubitza
03:22 AM Revision 6058: README.TXT: Schema changes: Remember to update the following files with any renamings: Added mappings/verify.*.sql
Aaron Marcuse-Kubitza
02:21 AM Revision 6057: README.TXT: Data import: make_analytical_db: Documented how to view progress in log file
Aaron Marcuse-Kubitza
02:18 AM Revision 6056: make_analytical_db: Run all commands synchronously so the log file output doesn't become jumbled
Aaron Marcuse-Kubitza
02:16 AM Revision 6055: make_analytical_db: Fixed bug where log file needed to be appended to instead of overwritten
Aaron Marcuse-Kubitza
02:15 AM Revision 6054: make_analytical_db: Wrap each individual command in `set -x` to avoid echoing low-level commands such as sleep, wait
Aaron Marcuse-Kubitza
02:02 AM Revision 6053: make_analytical_db: Moved log file to inputs/analytical_db/logs/make_analytical_db.log.sql so it would be synced along with the other import logs
Aaron Marcuse-Kubitza
01:57 AM Revision 6052: inputs/.TNRS/tnrs/tnrs.make: Output the time at which it's run, so this is included in the log file
Aaron Marcuse-Kubitza
01:53 AM Revision 6051: inputs/.TNRS/tnrs/tnrs.make: Moved log file to logs/tnrs.make.log.sql so it would automatically be synced along with the other import logs
Aaron Marcuse-Kubitza
01:49 AM Revision 6050: make_analytical_db: Moved log file to inputs/analytical_db/logs/make_analytical_db.log.sql so it would be synced along with the other import logs
Aaron Marcuse-Kubitza
01:40 AM Revision 6049: inputs/Makefile: Import logs: $(rsyncLogs): Always download the TNRS daemon log, rather than requiring tnrs_log=1 to be specified to download it
Aaron Marcuse-Kubitza
01:37 AM Revision 6048: make_analytical_db: Output the time at which it's run, so this is included in the log file
Aaron Marcuse-Kubitza
01:36 AM Revision 6047: make_analytical_db: Store output in schemas/make_analytical_db.log
Aaron Marcuse-Kubitza
01:24 AM Revision 6046: schemas/vegbien.sql: Removed no longer used make_analytical_db(). Use bin/make_analytical_db instead.
Aaron Marcuse-Kubitza
01:23 AM Revision 6045: make_analytical_db: Use new psql_verbose_vegbien
Aaron Marcuse-Kubitza
01:22 AM Revision 6044: Added psql_verbose_vegbien
Aaron Marcuse-Kubitza
01:18 AM Revision 6043: make_analytical_db: Use psql_script_vegbien, which contains the necessary psql options, instead of setting those options manually
Aaron Marcuse-Kubitza
01:15 AM Revision 6042: make_analytical_db: Run the SQL commands directly with psql instead of using the make_analytical_db() function. This provides incremental results and avoids running all commands in one transaction, thus preventing pgAdmin from freezing when the user attempts to access a table used in analytical DB creation (because the TRUNCATE statement fully locks the table until the entire analytical DB is built).
Aaron Marcuse-Kubitza
12:46 AM Revision 6041: schemas/vegbien.sql: make_analytical_db(): Added make_family_higher_plant_group()
Aaron Marcuse-Kubitza
12:17 AM Revision 6040: inputs/import.stats.xls: Updated import times. Fixed input row counts and import times to include derived data, such as TNRS and geoscrub, which adds to the import time and therefore should be considered in the import's speed. (TNRS was already being included in the import time for some, but not all, imports.)
Aaron Marcuse-Kubitza

11/06/2012

05:02 PM Revision 6039: schemas/vegbien.sql: analytical_*: Added higherPlantGroup
Aaron Marcuse-Kubitza
04:06 PM Revision 6038: pg_dump_vegbien: When not in data mode, use --inserts so the INSERT statements generated can be run directly in pgAdmin, they can be directly translated to MySQL, and an empty COPY FROM statement isn't generated for every table
Aaron Marcuse-Kubitza
04:05 PM Revision 6037: schemas/vegbien.sql: Added family_higher_plant_group lookup table, generated with make_family_higher_plant_group() using the contents of higher_plant_group_nodes
Aaron Marcuse-Kubitza
03:40 PM Revision 6036: schemas/vegbien.sql: Added family_higher_plant_group lookup table, generated with make_family_higher_plant_group() using the contents of higher_plant_group_nodes
Aaron Marcuse-Kubitza
03:23 PM Revision 6035: mappings/VegCore-VegBIEN.csv: Always map taxonNameOrEpithet to taxonomicname, now that it's globally unique at all ranks in the datasource that provides it (NCBI)
Aaron Marcuse-Kubitza
03:18 PM Revision 6034: inputs/.NCBI/nodes/create.sql: Make name_txt completely globally unique by removing all duplicates, not just duplicate genera
Aaron Marcuse-Kubitza
03:17 PM Revision 6033: inputs/.NCBI/nodes/create.sql: Make name_txt (mostly) globally unique by removing several other kingdoms/superkingdoms, not just Animalia
Aaron Marcuse-Kubitza
02:56 PM Revision 6032: inputs/.NCBI/nodes/create.sql: Making genus globally unique: Moved comment with kingdom name to line with DELETE, and put "delete cascades to descendants" comment on its own line
Aaron Marcuse-Kubitza
01:57 PM Revision 6031: schemas/vegbien.sql: Added higher_plant_group_nodes lookup table
Aaron Marcuse-Kubitza
01:57 PM Revision 6030: schemas/vegbien.sql: Added higher_plant_group_nodes lookup table
Aaron Marcuse-Kubitza
01:49 PM Revision 6029: schemas/Makefile: DDLs: $(pg_dump): Turn off schema-only mode so that lookup table contents are included as well
Aaron Marcuse-Kubitza
01:47 PM Revision 6028: pg_dump_vegbien: Except in (raw) data mode, filter out pg_catalog.setval() statements (only lookup table contents should be preserved)
Aaron Marcuse-Kubitza
01:39 PM Revision 6027: lib/PostgreSQL-MySQL.csv: Remove SELECT pg_catalog.setval() statements
Aaron Marcuse-Kubitza
01:27 PM Revision 6026: lib/PostgreSQL-MySQL.csv: Remove COPY statements
Aaron Marcuse-Kubitza
01:18 PM Revision 6025: pg_dump_vegbien: $schema flag defaults to on to export only schema
Aaron Marcuse-Kubitza
01:08 PM Revision 6024: pg_dump_vegbien: Fixed bug where the schema name var couldn't be named $schema, because that would conflict with the $schema flag
Aaron Marcuse-Kubitza
01:03 PM Revision 6023: root Makefile: VegBIEN DB: DB and bien user: db: Run inputs/.TNRS/install before schemas/install because public.tnrs_input_name depends on TNRS.tnrs
Aaron Marcuse-Kubitza
12:33 PM Revision 6022: pg_dump_vegbien: Include only the schema if the new $schema flag is set. This causes constant lookup tables to automatically be exported for DB schemas, which was not fully supported by the $data flag because it also forced $owners to be enabled.
Aaron Marcuse-Kubitza
12:17 PM Revision 6021: schemas/vegbien.sql: Added higher_plant_group enum
Aaron Marcuse-Kubitza
11:25 AM Revision 6020: mappings/VegCore.csv: Added higherPlantGroup
Aaron Marcuse-Kubitza
10:40 AM Revision 6019: schemas/vegbien.sql: analytical_*: Added isNewWorld
Aaron Marcuse-Kubitza
10:32 AM Revision 6018: mappings/VegCore.csv: Added isNewWorld
Aaron Marcuse-Kubitza
10:21 AM Revision 6017: inputs/newWorld/: Added geoscrub.schema.~.changes.sql to add a unique constraint on countryNameStd and change isNewWorld's type to boolean
Aaron Marcuse-Kubitza
09:59 AM Revision 6016: Added inputs/newWorld/
Aaron Marcuse-Kubitza
09:55 AM Revision 6015: README.TXT: Datasource setup: Moved "Auto-create the map spreadsheets" after "Install the staging tables", because for DB-only tables, creating the map spreadsheets requires the header.csv generated by the staging tables install
Aaron Marcuse-Kubitza
09:48 AM Revision 6014: input.Makefile: SVN: add: Also set _MySQL/'s svn:ignore
Aaron Marcuse-Kubitza
09:30 AM Revision 6013: inputs/SALVIAS/_MySQL/salvias_plots.*.sql.make: Use new my2pg_export
Aaron Marcuse-Kubitza
09:27 AM Revision 6012: inputs/.geoscrub/_MySQL/geoscrub.*.sql.make: Use new my2pg_export
Aaron Marcuse-Kubitza
09:25 AM Revision 6011: Added my2pg_export
Aaron Marcuse-Kubitza

11/05/2012

11:16 PM Revision 6010: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
11:14 PM Revision 6009: schemas/vegbien.ERD.mwb: Added analytical_stem, analytical_aggregate to ERD
Aaron Marcuse-Kubitza
10:49 PM Revision 6008: schemas/vegbien.sql: analytical_*: Added georeferenceValid, distanceToCountry_km, distanceToStateProvince_km
Aaron Marcuse-Kubitza
10:34 PM Revision 6007: schemas/vegbien.sql: Added _m_to_km()
Aaron Marcuse-Kubitza
10:06 PM Revision 6006: inputs/.geoscrub/geoscrub_cleaned_unique/map.csv: Removed geoscrubID->locationID mapping because it allowed multiple occurrences of the same placename/coordinates combination, which was causing problems when datasources tried to link up to the geoscrub places (and ended up matching multiple output rows for each input row)
Aaron Marcuse-Kubitza
09:54 PM Revision 6005: sql_io.py: put_table(): Removed assertion that into's row count be at least full_in_table's row count, because now that DISTINCT ON is used to satisfy the into table pkey, this is no longer necessarily true
Aaron Marcuse-Kubitza
09:15 PM Revision 6004: schemas/vegbien.sql: method: method_unique: Added reference_id to make method unique within the datasource instead of globally unique within VegBIEN
Aaron Marcuse-Kubitza
09:12 PM Revision 6003: mappings/VegCore.csv: Added samplingProtocolID
Aaron Marcuse-Kubitza
08:48 PM Revision 6002: mappings/VegCore-VegBIEN.csv: subplot locationevent: Only populate parent locationevent's location unique IDs if a subplot #/subplotID is actually specified. (The lack of a location unique ID will cause the parent locationevent's location to be removed, as well as the parent locationevent itself if there is no parent locationevent unique ID.) This fixes a bug where top-level plots in datasources that provide a nullable subplot #/subplotID were incorrectly getting connected to parent locationevents.
Aaron Marcuse-Kubitza
08:34 PM Revision 6001: mappings/verify.plots.sql: Added # locations, # location events, which also include subplots
Aaron Marcuse-Kubitza
08:17 PM Revision 6000: inputs/SALVIAS/verify/plots.ref.sql: Standardized line wrapping to put ; on its own line in multi-line queries and at the end of the line in one-line queries
Aaron Marcuse-Kubitza
08:13 PM Revision 5999: mappings/verify.plots.sql: Added "top-level" to # locations, # location events names because these queries do not include all locations
Aaron Marcuse-Kubitza
07:52 PM Revision 5998: inputs/SALVIAS/verify/plots.ref.sql: # locations: Fixed count to include all plotMetadata entries instead of just unique coordinates, because each plotMetadata entry now gets its own location
Aaron Marcuse-Kubitza
07:34 PM Revision 5997: mappings/VegCore-VegBIEN.csv: matched place: When any of the accepted* names is specified, set matched_place_id=0 (self-reference) to indicate that the matched place is an accepted (scrubbed) place
Aaron Marcuse-Kubitza
05:30 PM Revision 5996: schemas/vegbien.sql: analytical_stem_view: Use the canonical (scrubbed) place instead of the datasource's place where available. Note that canon_place_id is always populated (with a self-reference if needed), to enable using just a LEFT JOIN for this.
Aaron Marcuse-Kubitza
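A minimal SQLite sketch (via Python's sqlite3, standing in for PostgreSQL) of why an always-populated canon_place_id allows a plain LEFT JOIN in Revision 5996: an unmatched place points at itself, so the join always yields a name without any COALESCE() fallback. The table layout and place names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place (
    place_id INTEGER PRIMARY KEY,
    name TEXT,
    canon_place_id INTEGER  -- never NULL: self-reference when unmatched
);
-- 1 is a scrubbed place, 2 a verbatim place matched to it, 3 is unmatched
INSERT INTO place VALUES (1, 'Bolivia', 1);
INSERT INTO place VALUES (2, 'bolivia (verbatim)', 1);
INSERT INTO place VALUES (3, 'Unmatched Hill', 3);
""")

rows = conn.execute("""
    SELECT p.place_id, canon.name
    FROM place p
    LEFT JOIN place canon ON canon.place_id = p.canon_place_id
    ORDER BY p.place_id
""").fetchall()
print(rows)  # [(1, 'Bolivia'), (2, 'Bolivia'), (3, 'Unmatched Hill')]
```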
05:14 PM Revision 5995: schemas/vegbien.sql: analytical_stem_view: Use datasource_* prefix instead of reference_* prefix that tables had gotten when datasource_id was renamed to reference_id
Aaron Marcuse-Kubitza
05:04 PM Revision 5994: inputs/SALVIAS/verify/plots.ref: Regenerated on vegbiendev using the PostgreSQL 9.1 pg_catalog.default collation
Aaron Marcuse-Kubitza
04:49 PM Revision 5993: sql_io.py: put_table(): Setting pkeys of missing rows: Fixed bug in column-based import: when input rows match multiple output rows in one of this iteration's input tables, the into table's pkey constraint is violated because full_in_table contains multiple entries for an input pkey
Aaron Marcuse-Kubitza
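The pkey violation described in Revision 5993 can be reproduced with a hypothetical full_in_table in which one input row matches two output rows. The sketch emulates PostgreSQL's `DISTINCT ON (key)` in Python by keeping the first row per key in input order (an assumption; put_table()'s actual ordering is not shown here). It also shows why, as Revision 6005 notes, the into table can end up with fewer rows than full_in_table.

```python
import sqlite3

def distinct_on_first(rows):
    """Keep the first row for each leading key, like DISTINCT ON (key) in input order."""
    seen, kept = set(), []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            kept.append(row)
    return kept

def load_into(rows):
    """Insert rows into a table whose pkey is the input pkey; return its row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE into_t (in_id INTEGER PRIMARY KEY, out_id INTEGER)")
    conn.executemany("INSERT INTO into_t VALUES (?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM into_t").fetchone()[0]

# input row 2 matches two output rows (20 and 21)
full_in_table = [(1, 10), (2, 20), (2, 21), (3, 30)]

try:
    load_into(full_in_table)
except sqlite3.IntegrityError as exc:
    print("pkey violated:", exc)

print(load_into(distinct_on_first(full_in_table)))  # 3, fewer than the 4 input rows
```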
04:42 PM Revision 5992: schemas/vegbien.sql: place: Added index on matched_place_id to facilitate looking up places by matched_place_id in place_set_canon_place_id()
Aaron Marcuse-Kubitza
04:26 PM Revision 5991: schemas/vegbien.sql: Added geoscrub_input derived table with associated view. Build geoscrub_input as part of make_analytical_db().
Aaron Marcuse-Kubitza
03:56 PM Revision 5990: sql.py: Added table_has_pkey()
Aaron Marcuse-Kubitza
03:13 PM Revision 5989: sql.py: table_pkey_col(): For PostgreSQL DBs, use pg_catalog via index_cols() and table_pkey_index(), in order to use the search_path to look up the table. This fixes a bug where the pkey would be selected from information_schema.table_constraints in random order, and this order sometimes returned the corresponding table in the public schema but sometimes in other schemas, such as VegBank. This became a problem now that VegBIEN has a place table, which conflicts with VegBank's place table. (Most other VegBank tables that are mapped to have been renamed in VegBIEN.)
Aaron Marcuse-Kubitza
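A simplified Python model of the unqualified-name lookup that Revision 5989 relies on: PostgreSQL resolves a table name against the schemas in search_path order, so the result is deterministic, unlike taking whichever matching row information_schema happens to return first. The catalog contents are hypothetical, and the model ignores the implicit pg_catalog and pg_temp schemas.

```python
def resolve_table(name, search_path, catalog):
    """Return the first schema on search_path that contains table `name`, or None."""
    for schema in search_path:
        if name in catalog.get(schema, set()):
            return schema
    return None

# hypothetical catalog: both schemas define a table named "place"
catalog = {
    "public": {"place", "taxonlabel"},
    "VegBank": {"place", "plot"},
}
print(resolve_table("place", ["public", "VegBank"], catalog))  # public
print(resolve_table("plot", ["public", "VegBank"], catalog))   # VegBank
```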
02:41 PM Revision 5988: sql.py: add_pkey_or_index(): Use new add_pkey_index()
Aaron Marcuse-Kubitza
02:41 PM Revision 5987: sql.py: Added add_pkey_index()
Aaron Marcuse-Kubitza
02:29 PM Revision 5986: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza
02:15 PM Revision 5985: import_times: Use $'' quoting to expand tab, in order to also work on Mac OS X
Aaron Marcuse-Kubitza

11/02/2012

06:14 PM Revision 5984: schemas/vegbien.sql: place: Added place_set_canon_place_id_on_*() triggers, analogous to taxonlabel.taxonlabel_2_set_canon_label_id_on_*()
Aaron Marcuse-Kubitza
06:12 PM Revision 5983: schemas/vegbien.sql: place: Added canon_place_id
Aaron Marcuse-Kubitza
06:04 PM Revision 5982: mappings/verify.plots.sql, inputs/SALVIAS/verify/plots.ref.sql: Renamed project column to projectname to avoid confusion with the project table
Aaron Marcuse-Kubitza
06:03 PM Revision 5981: input.Makefile: Verification of import: Fixed bug where %.ref, not verify/%.ref, needed to be made .PRECIOUS
Aaron Marcuse-Kubitza
05:57 PM Revision 5980: inputs/SALVIAS/verify/plots.ref.sql: Use project_name instead of project_id everywhere, not just as the ID to list for each project, to match the corresponding VegBIEN queries
Aaron Marcuse-Kubitza
05:55 PM Revision 5979: input.Makefile: Verification of import: Make verify/%.ref .PRECIOUS because there must always be a .ref for the make rules to work
Aaron Marcuse-Kubitza
05:34 PM Revision 5978: inputs/SALVIAS/verify/plots.ref.sql: Use project_name instead of project_id as the ID to list for each project, since the SALVIAS projects table is now mapped to the DB and the project_name is used as project.projectname
Aaron Marcuse-Kubitza
05:18 PM Revision 5977: mappings/VegCore-VegBIEN.csv: subplots: Also complete the locationevent/location diamond (subplot event -> {subplot location, parent plot event} -> parent plot location) when an eventDate or range is specified, as this is also an identifying field for locationevent. This fixes a bug where subplot data without explicit plot events (such as SALVIAS and TEAM) was not being connected to the appropriate parent plot *event* as well as parent plot location. This should fix the SALVIAS verification # location events, which should include only parent plots' locationevents to correspond with # locations, which only includes parent plots' locations. A parent plot event is identified by a NULL locationevent.parent_id.
Aaron Marcuse-Kubitza
04:45 PM Revision 5976: input.Makefile: Verification of import: $(verify): Use $(inDatasrc) so that the full standard search_path, including any custom public schema, is made available to the VegBIEN-side verification script
Aaron Marcuse-Kubitza
04:33 PM Revision 5975: mappings/verify.plots.sql: # locations: Fixed bug where location needed to be used instead of locationevent
Aaron Marcuse-Kubitza
04:30 PM Revision 5974: mappings/verify.specimens.sql: Updated for current schema
Aaron Marcuse-Kubitza
04:26 PM Revision 5973: input.Makefile: Verification of import: verify/%.out: Made the target .PRECIOUS so that partial output will be saved in case of error to help debugging
Aaron Marcuse-Kubitza
04:13 PM Revision 5972: mappings/verify.plots.sql: Updated for current schema
Aaron Marcuse-Kubitza
04:01 PM Revision 5971: Updated validation/BIEN2_Analytical_DB_overview.docx with Brad's revision
Aaron Marcuse-Kubitza
03:58 PM Revision 5970: schemas/vegbien.sql: taxonlabel_set_canon_label_id(): Fixed bug where canon_label_id must always be set on insert (because it must be a self-reference rather than NULL if there is no matched_label_id), but wasn't being set when there was no matched_label_id because the IS DISTINCT FROM check returned false
Aaron Marcuse-Kubitza
03:42 PM Revision 5969: mappings/VegCore-VegBIEN.csv: decimalLatitude/Longitude->geoscrub input coordinates: Also set to NULL if 0 here, not just for the coordinates linked to the datasource's place instance
Aaron Marcuse-Kubitza
03:37 PM Revision 5968: sql_io.py: put_table(): Switched back to using run_query_into()'s add_pkey_ option now that it uses sql.add_pkey_or_index() instead of sql.add_pkey()
Aaron Marcuse-Kubitza
03:35 PM Revision 5967: sql.py: run_query_into(): add_pkey_: Use add_pkey_or_index() instead of add_pkey(). This should prevent errors for input rows which match multiple output rows in VegBIEN.
Aaron Marcuse-Kubitza
03:24 PM Revision 5966: README.TXT: Data import: After import, removed steps to reinstall tnrs_accepted, tnrs_other because this would overwrite the versions of these staging tables that were used to build the current version of the database. (The backup should store a snapshot of the database's input data as it was used to build the database.) Note that these tables will still be reinstalled when the next import starts.
Aaron Marcuse-Kubitza
02:54 PM Revision 5965: Added validation/BIEN2_Analytical_DB_overview.docx
Aaron Marcuse-Kubitza
02:54 PM Revision 5964: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza
12:00 PM Revision 5963: inputs/.TNRS/tnrs_*/create.sql: Fixed bug where the WHERE condition needed to be COALESCE()d with false to ensure that a boolean value, not NULL, is negated by NOT. Otherwise, tnrs_other's query would exclude rows for which the condition returned NULL rather than false.
Aaron Marcuse-Kubitza
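The NULL behavior behind Revision 5963 can be shown with SQLite through Python's sqlite3 (PostgreSQL's three-valued logic behaves the same way here; SQLite just uses 1/0 for true/false). The `names` table and `is_accepted` column are hypothetical stand-ins for the TNRS condition: a row whose condition is NULL falls into neither half of the split unless the condition is COALESCE()d with false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, is_accepted INTEGER)")
# 1 = true, 0 = false, NULL = condition could not be evaluated
conn.executemany("INSERT INTO names VALUES (?, ?)", [(1, 1), (2, 0), (3, None)])

def ids(where):
    """Return the ids selected by the given WHERE condition."""
    return [r[0] for r in conn.execute(f"SELECT id FROM names WHERE {where} ORDER BY id")]

accepted = ids("is_accepted")
other_naive = ids("NOT is_accepted")               # NOT NULL is NULL: row 3 dropped
other_fixed = ids("NOT COALESCE(is_accepted, 0)")  # NULL treated as false: row 3 kept
print(accepted, other_naive, other_fixed)  # [1] [2] [2, 3]
```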
11:43 AM Revision 5962: schemas/vegbien.sql: make_analytical_db(): Truncate tables before (re-)populating them in case make_analytical_db() has already been run
Aaron Marcuse-Kubitza
11:40 AM Revision 5961: schemas/vegbien.sql: analytical_stem_view: Updated to use reference instead of party to store the datasource name
Aaron Marcuse-Kubitza
11:28 AM Revision 5960: schemas/vegbien.sql: Renamed make_analytical_stem() back to make_analytical_db()
Aaron Marcuse-Kubitza

11/01/2012

10:52 AM Revision 5959: import_all: After starting geoscrub import in the background, wait for make commands to scroll by before starting NCBI import
Aaron Marcuse-Kubitza
10:39 AM Revision 5958: mappings/VegCore-VegBIEN.csv: matched place: Also map verbatim place's geoscrub-related fields to the matched place, to link up with geoscrub's corresponding input place
Aaron Marcuse-Kubitza
10:22 AM Revision 5957: import_all: Removed explicit by_col=1 from datasources that don't require it for proper import. (It will still be set if the user provides it on the command line.)
Aaron Marcuse-Kubitza
10:18 AM Revision 5956: schemas/functions.sql: Removed duplicate previous version of _and() with fewer parameters
Aaron Marcuse-Kubitza
10:14 AM Revision 5955: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
10:11 AM Revision 5954: schemas/vegbien.sql: method: Made reference_id NOT NULL so the method table would automatically be datasource-scoped
Aaron Marcuse-Kubitza
10:09 AM Revision 5953: mappings/VegCore-VegBIEN.csv: Renamed creator_ids to reference_id since they are now fkeys to reference
Aaron Marcuse-Kubitza
10:04 AM Revision 5952: schemas/vegbien.sql: Made creator_ids an fkey to reference instead of party, so that datasources are stored separately from people and to allow adding reference-type metadata (URL, copyright, etc.) for each datasource
Aaron Marcuse-Kubitza
09:51 AM Revision 5951: schemas/vegbien.sql: reference: Added unique index
Aaron Marcuse-Kubitza
09:47 AM Revision 5950: mappings/VegCore-VegBIEN.csv: Remapped accordingTo to taxonconcept.concept_reference_id
Aaron Marcuse-Kubitza
09:46 AM Revision 5949: mappings/VegCore-VegBIEN.csv: Don't populate taxonlabel.creator_id using identifiedBy, because that was meant for creating a taxonconcept when the accordingTo was not specified, but taxonconcepts are now modeled differently using a taxonconcept subclass of taxonlabel with a concept_reference_id
Aaron Marcuse-Kubitza
09:36 AM Revision 5948: schemas/vegbien.sql: taxonverbatim: Renamed binomial to taxonname because this field is actually the taxonomic name without the author, not just the genus+specific epithet binomial
Aaron Marcuse-Kubitza
09:29 AM Revision 5947: mappings/VegCore.csv: Renamed *Binomial to *TaxonName because this field can store more ranks than just the genus+specificEpithet binomial (that goes in speciesBinomial)
Aaron Marcuse-Kubitza
09:22 AM Revision 5946: mappings/VegCore.csv: Renamed taxonName to taxonNameOrEpithet for clarity
Aaron Marcuse-Kubitza
09:16 AM Revision 5945: schemas/vegbien.sql: analytical_aggregate_view: Added growthForm, cultivated, cultivatedBasis from analytical_stem. Note that this will create multiple entries for a taxon name when some taxonoccurrences are marked as cultivated and others aren't, or different taxonoccurrences have different growthform entries.
Aaron Marcuse-Kubitza
09:01 AM Revision 5944: import_all: Added geoscrub import, which can happen concurrently with NCBI/TNRS but must come before the main datasources for the matched places to link up properly
Aaron Marcuse-Kubitza
08:59 AM Revision 5943: import_all: Documented that TNRS import must come after NCBI for cross links to be made
Aaron Marcuse-Kubitza
08:53 AM Revision 5942: input.Makefile: Staging tables installation: `%.sql: _MySQL/%.sql`, etc.: Only run if target does not exist, to avoid regenerating the target when a .make script which generates the target's prerequisite is checked out from svn (causing its mod time to be newer than the target)
Aaron Marcuse-Kubitza
08:24 AM Revision 5941: Added inputs/.geoscrub/
Aaron Marcuse-Kubitza
08:21 AM Revision 5940: input.Makefile: SVN: $(*svnFilesGlob): Added */, _MySQL/ subdirs
Aaron Marcuse-Kubitza
08:20 AM Revision 5939: lib/common.Makefile: SVN: $(add*): Fixed bug where --depth=empty needed to be used to ensure directory contents are not added unless explicitly listed
Aaron Marcuse-Kubitza
08:09 AM Revision 5938: mappings/VegCore-VegBIEN.csv: Mapped georeferenceValid, latLongInCountry, latLongInStateProvince
Aaron Marcuse-Kubitza
08:05 AM Revision 5937: schemas/functions.sql: Added _and()
Aaron Marcuse-Kubitza
07:40 AM Revision 5936: mappings/VegCore-VegBIEN.csv: Mapped distanceToCountry_km, distanceToStateProvince_km
Aaron Marcuse-Kubitza
07:39 AM Revision 5935: schemas/vegbien.sql: Added _km_to_m()
Aaron Marcuse-Kubitza
07:36 AM Revision 5934: schemas/vegbien.sql: Added _km_to_m()
Aaron Marcuse-Kubitza
07:31 AM Revision 5933: schemas/vegbien.sql: place: Added georeference_valid, distance_to_country_m, distance_to_state_m
Aaron Marcuse-Kubitza
07:18 AM Revision 5932: mappings/VegCore-VegBIEN.csv: matched taxonlabel's ancestors: Only create the cross links to NCBI if the name is accepted (taxonIsCanonical)
Aaron Marcuse-Kubitza
06:55 AM Revision 5931: mappings/VegCore-VegBIEN.csv: accepted* taxonomic terms: If the name is canonical, don't populate a separate accepted taxonlabel, because the taxonlabel itself is accepted. If the name is not canonical, populate matched_label_id directly when the row is created instead of waiting until after all rows are created and using _taxonlabel_set_matched_label_id(). This is possible now that accepted taxonlabels occur only in TNRS's tnrs_accepted table, which is specially marked as taxonIsCanonical. Avoiding _taxonlabel_set_matched_label_id() should fix the slowdown that occurs when it's used on large tables.
Aaron Marcuse-Kubitza
06:48 AM Revision 5930: mappings/VegCore-VegBIEN.csv: taxonIsCanonical: Moved then=0 into non-identifying _if statement attributes
Aaron Marcuse-Kubitza
06:41 AM Revision 5929: mappings/VegCore-VegBIEN.csv: Mapped taxonIsCanonical
Aaron Marcuse-Kubitza
06:41 AM Revision 5928: bin/map: map_rows(): map_table(): Fixed bug where metadata values were being removed prematurely, by passing them through
Aaron Marcuse-Kubitza
06:40 AM Revision 5927: bin/map: map_rows(): Fixed bug where metadata values were being passed to functions that expected columns, by placing them directly in the XML import tree and then removing them from the mappings
Aaron Marcuse-Kubitza
05:51 AM Revision 5926: input.Makefile: Maps validation: %/new_terms.csv: Filter out metadata values, which are not new terms even though they are in the same map column as the input columns
Aaron Marcuse-Kubitza
05:46 AM Revision 5925: inputs/.TNRS/tnrs_accepted/map.csv: Set taxonIsCanonical to true, using new metadata value syntax
Aaron Marcuse-Kubitza
05:40 AM Revision 5924: mappings/VegCore.csv: Added taxonIsCanonical
Aaron Marcuse-Kubitza
05:37 AM Revision 5923: input.Makefile: SVN: $(svnFiles): Added _MySQL files that would otherwise be excluded with $(filter-out _% ...)
Aaron Marcuse-Kubitza
05:34 AM Revision 5922: inputs/*/: Placed all logs/ subdirs and import_order.txt under version control, using `make inputs/.../add`
Aaron Marcuse-Kubitza
05:31 AM Revision 5921: input.Makefile: SVN: add: Factored list of files to add out into $(svnFiles) var
Aaron Marcuse-Kubitza
05:30 AM Revision 5920: Added inputs/.TNRS/tnrs/header.csv
Aaron Marcuse-Kubitza
05:28 AM Revision 5919: input.Makefile: SVN: $(svnFilesGlob): Added *.make
Aaron Marcuse-Kubitza
05:21 AM Revision 5918: input.Makefile: SVN: add: Fixed bug where _-prefixed dirs (such as _MySQL) were incorrectly being included in the list of files to put under version control
Aaron Marcuse-Kubitza
05:15 AM Revision 5917: Calls to `make inputs/.TNRS/cleanup`: Do `make inputs/.TNRS/tnrs_accepted/reinstall; make inputs/.TNRS/tnrs_other/reinstall` instead to use new split TNRS tables
Aaron Marcuse-Kubitza
05:04 AM Revision 5916: inputs/.TNRS/schema.sql: Removed no longer needed tnrs_accepted_names index, now that the accepted names have been split into a separate table
Aaron Marcuse-Kubitza
05:00 AM Revision 5915: inputs/.TNRS/: Split tnrs table into tnrs_accepted and tnrs_other, with tnrs_accepted imported first. This will eventually allow TNRS data to be imported without needing to use _taxonlabel_set_matched_label_id(), which is extremely slow due to an unknown bug in PostgreSQL's handling of UPDATEs on large tables (possibly limited to UPDATEs inside functions).
Aaron Marcuse-Kubitza
05:00 AM Revision 5914: input.Makefile: SVN: add: Fixed bug where logs/*.log.sql was incorrectly being included in the list of files to put under version control
Aaron Marcuse-Kubitza
04:53 AM Revision 5913: lib/common.Makefile: Filesystem: $(wildcard/): Fixed bug where {} exprs weren't handled correctly when some branches contained *, by using `shopt -s nullglob` and bash directly instead of trying to emulate it with grep
Aaron Marcuse-Kubitza
04:44 AM Revision 5912: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Also provide the schema to create.sql in :schema
Aaron Marcuse-Kubitza
04:16 AM Revision 5911: bin/map: Added support for including literal metadata values in the map spreadsheet, by prefixing them with ':'
Aaron Marcuse-Kubitza
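The ':' prefix from Revision 5911 (used in Revision 5925 to set taxonIsCanonical to true) can be pictured with a minimal sketch. It assumes, hypothetically, that a map is a list of (input column, output term) pairs and that a ':'-prefixed entry in the input column is a literal value for its output term. The function name split_metadata and the column name Accepted_name are illustrative only; this is not bin/map's actual code.

```python
from typing import Dict, List, Tuple

def split_metadata(
    mappings: List[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate ':'-prefixed literal metadata values from ordinary column mappings.

    Returns (columns, metadata): columns maps input column -> output term;
    metadata maps output term -> literal value (with the ':' stripped).
    """
    columns: Dict[str, str] = {}
    metadata: Dict[str, str] = {}
    for in_col, out_term in mappings:
        if in_col.startswith(":"):
            metadata[out_term] = in_col[1:]  # a literal value, not a column name
        else:
            columns[in_col] = out_term
    return columns, metadata

# A tnrs_accepted-style map that sets taxonIsCanonical to a constant
columns, metadata = split_metadata(
    [("Accepted_name", "scientificName"), (":true", "taxonIsCanonical")]
)
print(columns)   # {'Accepted_name': 'scientificName'}
print(metadata)  # {'taxonIsCanonical': 'true'}
```

Keeping the literal values out of the column mappings is also why Revision 5926 filters them out of new_terms.csv: they share the map's input column but are not input terms.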
03:56 AM Revision 5910: mappings/VegCore.csv: Added distanceToCountry_km, distanceToStateProvince_km
Aaron Marcuse-Kubitza
03:45 AM Revision 5909: mappings/VegCore.csv: Added distanceToCountry_m, distanceToStateProvince_m
Aaron Marcuse-Kubitza
03:36 AM Revision 5908: mappings/VegCore-VegBIEN.csv: Mapped acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude. Mapped decimalLatitude/Longitude to matched place's coordinates when acceptedDecimalLatitude/Longitude not provided (as is the case for the geoscrub table).
Aaron Marcuse-Kubitza
03:22 AM Revision 5907: input.Makefile: SVN: $(svnFilesGlob): Added *schema*.sql
Aaron Marcuse-Kubitza
02:57 AM Revision 5906: mappings/VegCore.csv: Replaced georeferenceInvalid with latLongDomainValid, latLongDomainInvalid because this field is actually just whether the coordinates are in the range of valid decimal values, not whether they are within the named place reference. Re-sourced georeferenceValid to bien_web.observation#isGeovalid rather than isValidLatLong. VegCore-VegBIEN.csv: Replaced now-removed georeferenceInvalid with latLongInCountry for locationID redirection checks.
Aaron Marcuse-Kubitza
01:54 AM Revision 5905: mappings/VegCore-VegBIEN.csv: Map locationID to place.placecode instead when geovalidation columns are provided
Aaron Marcuse-Kubitza
01:06 AM Revision 5904: mappings/VegCore-VegBIEN.csv: accepted* taxonlabel: Removed ancestor hierarchy because this is populated, in much greater detail, when the accepted name is imported as an input name and the TNRS-parsed components are available
Aaron Marcuse-Kubitza
12:55 AM Revision 5903: sql.py: distinct_table(): Use DISTINCT ON instead of a unique index and insert_select()'s ignore mode to remove duplicate rows. This uses whichever sorting method PostgreSQL deems to be fastest instead of requiring the use of a B-tree index. Since most of the slower operations in TNRS's import are distinct_table() calls, this should speed up the TNRS import, which is a bottleneck for the DB import as a whole because the TNRS import must complete before other datasources can be imported.
Aaron Marcuse-Kubitza
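Revision 5903 relies on PostgreSQL's DISTINCT ON, which keeps only the first row of each group of rows sharing the DISTINCT ON expressions, in ORDER BY order. The in-memory Python emulation below illustrates those semantics only; it is not sql.py's distinct_table(), and the sample rows are made up.

```python
from itertools import groupby
from typing import Any, Callable, Iterable, List

def distinct_on(
    rows: Iterable[Any],
    key: Callable[[Any], Any],
    order: Callable[[Any], Any] = lambda row: 0,
) -> List[Any]:
    """Emulate SELECT DISTINCT ON (key) ... ORDER BY key, order.

    Rows are sorted by (key, order) and the first row of each key group is kept.
    As in PostgreSQL, the DISTINCT ON expression must lead the sort order.
    With the default order, ties keep their input order (sorted() is stable).
    """
    ordered = sorted(rows, key=lambda row: (key(row), order(row)))
    return [next(group) for _, group in groupby(ordered, key=key)]

rows = [("Inga", 2), ("Piper", 1), ("Inga", 1)]
print(distinct_on(rows, key=lambda r: r[0], order=lambda r: r[1]))
# [('Inga', 1), ('Piper', 1)]
```

Because duplicates are removed by sorting and scanning, no unique index on the distinct columns is needed, which matches the commit's rationale for dropping the unique-index approach.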
12:36 AM Revision 5902: sql.py: distinct_table(): Changed comment about distinct_on column index to include just the input table, so that the function does not guarantee a unique index on the output table's distinct_on columns
Aaron Marcuse-Kubitza
12:15 AM Revision 5901: mappings/VegCore.csv: Added acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude
Aaron Marcuse-Kubitza

10/31/2012

11:57 PM Revision 5900: mappings/VegCore.csv: Renamed latLongValid, latLongInvalid to georeferenceValid, georeferenceInvalid to correspond to DwC term georeferenceVerificationStatus
Aaron Marcuse-Kubitza
11:45 PM Revision 5899: mappings/VegCore.csv: Added latLongValid, latLongInvalid, latLongInCountry, latLongInStateProvince
Aaron Marcuse-Kubitza
11:14 PM Revision 5898: input.Makefile: Staging tables installation: Treat any .sql file whose name contains (not just ends with) "schema" as a schema file and sort it before other .sql files
Aaron Marcuse-Kubitza
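Revision 5898 (together with Revisions 5805 and 5810) sorts schema files ahead of other .sql files during staging-table installation. The real rule lives in input.Makefile; the Python sketch below only mirrors the ordering described in these entries, and the filenames are hypothetical.

```python
from typing import List

def order_sql_files(filenames: List[str], schema_only: bool = False) -> List[str]:
    """Order .sql files so that schema files come first.

    A file counts as a schema file if "schema" appears anywhere in its name.
    With schema_only=True, keep only files whose names end in "schema.sql".
    """
    if schema_only:
        filenames = [name for name in filenames if name.endswith("schema.sql")]
    # False sorts before True, and sorted() is stable, so schema files move
    # to the front while each group keeps its original relative order.
    return sorted(filenames, key=lambda name: "schema" not in name)

files = ["data.sql", "create.sql", "schema.sql", "tnrs.schema.sql"]
print(order_sql_files(files))
# ['schema.sql', 'tnrs.schema.sql', 'data.sql', 'create.sql']
print(order_sql_files(files, schema_only=True))
# ['schema.sql', 'tnrs.schema.sql']
```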
10:17 PM Revision 5897: schemas/vegbien.sql: Functions containing UPDATE statements: Use quote_nullable() instead of quote_literal() to properly encode NULL values
Aaron Marcuse-Kubitza
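The difference in Revision 5897 matters because dynamic SQL is built by string concatenation: quote_literal(NULL) returns NULL, and NULL concatenated with anything is NULL, so the whole query string disappears, and PL/pgSQL's EXECUTE cannot run a NULL query string. quote_nullable(NULL) instead returns the keyword NULL. The Python sketch below models SQL NULL as None; the quoting is simplified (backslashes are not handled) and the UPDATE text is hypothetical.

```python
from typing import Optional

def quote_literal(value: Optional[str]) -> Optional[str]:
    """Simplified PostgreSQL quote_literal(): NULL in, NULL out.

    Only doubles single quotes; the real function also handles backslashes.
    """
    if value is None:
        return None
    return "'" + value.replace("'", "''") + "'"

def quote_nullable(value: Optional[str]) -> str:
    """Like quote_literal(), but NULL becomes the SQL keyword NULL."""
    quoted = quote_literal(value)
    return "NULL" if quoted is None else quoted

def sql_concat(*parts: Optional[str]) -> Optional[str]:
    """SQL ||: a single NULL operand makes the whole result NULL."""
    return None if any(part is None for part in parts) else "".join(parts)

prefix = "UPDATE taxonlabel SET matched_label_id = "
print(sql_concat(prefix, quote_literal(None)))   # None: nothing left to EXECUTE
print(sql_concat(prefix, quote_nullable(None)))  # UPDATE taxonlabel SET matched_label_id = NULL
```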
10:10 PM Revision 5896: schemas/vegbien.sql: Functions containing UPDATE statements: Use PL/pgSQL's EXECUTE statement to avoid caching query plans. This is necessary because as the table grows over time, the optimal query plan may change.
Aaron Marcuse-Kubitza
10:05 PM Revision 5895: sql_io.py: put_table(): ensure_cond(): When deleting rows that do not satisfy the condition, handle sql.DoesNotExistExceptions caused by columns in the condition that were not replaced with NULL. These occur when out_table is a function, so the columns of the table the condition relates to can't be found using out_table.
Aaron Marcuse-Kubitza
09:59 PM Revision 5894: sql_io.py: put_table(): Calling function: Do not cache the function call, because it may be retried after error handling
Aaron Marcuse-Kubitza
09:58 PM Revision 5893: sql_gen.py: NotCond: Treat a condition that evaluates to NULL as false instead, so that the boolean effect of the condition is completely inverted
Aaron Marcuse-Kubitza
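The NULL handling in Revision 5893 follows from SQL's three-valued logic: NOT NULL is still NULL, and a WHERE clause keeps only rows whose condition is true, so a row whose condition is NULL is excluded by both the condition and its plain negation. Treating NULL as false first (in SQL, for example, NOT COALESCE(cond, false)) makes the negation a true complement. Whether sql_gen.NotCond emits exactly that expression is an assumption; the Python below models NULL as None.

```python
from typing import Callable, List, Optional, Tuple

def sql_not(value: Optional[bool]) -> Optional[bool]:
    """Plain SQL NOT: NOT NULL is still NULL (unknown)."""
    return None if value is None else not value

def not_cond(value: Optional[bool]) -> bool:
    """NOT COALESCE(value, false): NULL counts as false before negating."""
    return not (False if value is None else value)

def where(rows: List[Tuple[str, Optional[bool]]],
          pred: Callable[[Optional[bool]], Optional[bool]]) -> List[str]:
    """A WHERE clause keeps only rows whose condition is exactly true."""
    return [row for row, cond in rows if pred(cond) is True]

rows = [("a", True), ("b", False), ("c", None)]
print(where(rows, lambda c: c))  # ['a']
print(where(rows, sql_not))      # ['b']  (row c appears in neither result)
print(where(rows, not_cond))     # ['b', 'c']
```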
09:42 PM Revision 5892: sql_gen.py: null_as_str: Use new null instead of hardcoding 'NULL'
Aaron Marcuse-Kubitza
09:41 PM Revision 5891: sql_gen.py: Added null
Aaron Marcuse-Kubitza
09:40 PM Revision 5890: sql.py: run_query(): Give failed EXPLAIN approximately the log_level of its query, so that queries which produce an error in the EXPLAIN before the query itself is even run will still be logged
Aaron Marcuse-Kubitza
08:45 PM Revision 5889: sql_io.py: put_table(): sql.DatabaseErrors: Factored exception-handling code out into handle_unknown_exc(), for use by other exception handlers
Aaron Marcuse-Kubitza
08:39 PM Revision 5888: sql_io.py: put_table(): ensure_cond(): is_function: Fixed bug: out_table_cols can't be replaced with NULL, because out_table is a function, not a table
Aaron Marcuse-Kubitza

10/30/2012

04:59 PM Revision 5887: my2pg*: Turn off escape_string_warning because \-escaped strings are standard in MySQL
Aaron Marcuse-Kubitza
04:58 PM Revision 5886: my2pg.data: Turn off standard_conforming_strings like in my2pg
Aaron Marcuse-Kubitza
04:42 PM Revision 5885: my2pg: Also remove any CHARACTER SET modifier on a column definition
Aaron Marcuse-Kubitza
04:26 PM Revision 5884: schemas/vegbien.sql: analytical_aggregate_view: Make size classes cumulative ranges (stems above a certain DBH) rather than bins, per Brad's request
Aaron Marcuse-Kubitza
04:26 PM Revision 5883: schemas/vegbien.sql: analytical_aggregate_view: Make size classes cumulative ranges (stems above a certain DBH) rather than bins, per Brad's request
Aaron Marcuse-Kubitza
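The size-class change in Revisions 5883/5884 switches from bins, where each stem is counted once, to cumulative ranges, where a stem is counted toward every class whose lower DBH bound it meets. The thresholds and DBH values below are hypothetical; the sketch only contrasts the two counting rules and is not the view's SQL.

```python
from typing import Dict, List, Optional

def binned_counts(dbhs: List[Optional[float]], thresholds: List[float]) -> Dict[float, int]:
    """Bins: each stem counts once, under the largest threshold <= its DBH."""
    counts = {t: 0 for t in thresholds}
    for dbh in dbhs:
        if dbh is None:
            continue
        lower = [t for t in thresholds if t <= dbh]
        if lower:
            counts[max(lower)] += 1
    return counts

def cumulative_counts(dbhs: List[Optional[float]], thresholds: List[float]) -> Dict[float, int]:
    """Cumulative classes: a stem counts toward every threshold it meets or exceeds."""
    return {t: sum(1 for dbh in dbhs if dbh is not None and dbh >= t) for t in thresholds}

dbhs = [1.2, 3.0, 12.5, 25.0, None]  # DBH in cm; None = not measured
thresholds = [1, 2.5, 10]            # hypothetical size-class lower bounds
print(binned_counts(dbhs, thresholds))      # {1: 1, 2.5: 1, 10: 2}
print(cumulative_counts(dbhs, thresholds))  # {1: 4, 2.5: 3, 10: 2}
```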
04:18 PM Revision 5882: input.Makefile: SVN: add: Add header override files with any extension, not just .csv
Aaron Marcuse-Kubitza
04:15 PM Revision 5881: README.TXT: Datasource setup: Replaced manual `svn add` commands with one `make inputs/<datasrc>/add` before committing to add newly-created files
Aaron Marcuse-Kubitza
04:00 PM Revision 5880: input.Makefile: SVN: add: Also add any *.sql, *when it's in a subdir*. This applies to create.sql, cleanup.sql, etc.
Aaron Marcuse-Kubitza
03:58 PM Revision 5879: lib/common.Makefile: SVN: Added $(add*)
Aaron Marcuse-Kubitza
03:55 PM Revision 5878: input.Makefile: SVN: add: Also add any newly-created files which should be under version control
Aaron Marcuse-Kubitza
03:35 PM Revision 5877: input.Makefile: Fixed bug where _MySQL/%.sql files weren't being built from associated .make files by adding special `%.sql: %.sql.make` rule to override `%.sql: _MySQL/%.sql`
Aaron Marcuse-Kubitza
03:33 PM Revision 5876: input.Makefile: `%: %.make`: Factored $(if $(wildcard $@)... test out into $(make_script) so all `%: %.make`-like rules could use it directly
Aaron Marcuse-Kubitza
03:09 PM Revision 5875: lib/forwarding.Makefile: $(subdirs): Use all folders other than ../ ./ .svn/ instead of listing folders that start with . explicitly
Aaron Marcuse-Kubitza
02:31 PM Revision 5874: schemas/vegbien.sql: analytical_stem_view: Use accepted_taxonlabel.taxonomicname instead of accepted_taxonverbatim.taxonomicname in order to have the family prepended
Aaron Marcuse-Kubitza
12:41 PM Revision 5873: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
12:38 PM Revision 5872: schemas/vegbien.sql: placerank: Reordered in path order, using <http://rs.tdwg.org/dwc/terms/#dcindex> and <http://vegbank.org/vegbank/views/dba_fielddescription_detail.jsp?view=detail&wparam=1415&entity=dba_fielddescription&params=1415> as a guide. Documented the source of the values.
Aaron Marcuse-Kubitza
12:26 PM Revision 5871: schemas/vegbien.sql: placename: Removed non-name-related fields, because placename is designed only to store a hierarchy of placenames, not additional place information
Aaron Marcuse-Kubitza
12:23 PM Revision 5870: schemas/vegbien.sql: Moved placedescription from placename to place (and renamed it to description), because it applies to the place itself, not the name for the place
Aaron Marcuse-Kubitza
12:16 PM Revision 5869: schemas/vegbien.sql: analytical_aggregate_view: Added coverPercent, which is the sum of all coverPercents for that species
Aaron Marcuse-Kubitza
12:13 PM Revision 5868: schemas/vegbien.sql: analytical_aggregate_view: Added coverPercent, which is the sum of all coverPercents for that species
Aaron Marcuse-Kubitza
12:03 PM Revision 5867: schemas/vegbien.sql: analytical_aggregate_view: Include all analytical_stem species, not just those whose stems have non-NULL DBH
Aaron Marcuse-Kubitza
11:57 AM Revision 5866: schemas/vegbien.sql: Renamed aggregated_analytical_db to analytical_aggregate to match the name of analytical_stem
Aaron Marcuse-Kubitza
11:55 AM Revision 5865: schemas/vegbien.sql: Renamed analytical_db to analytical_stem since this contains just the individual stems, not the aggregated data in the main analytical DB
Aaron Marcuse-Kubitza
11:52 AM Revision 5864: schemas/vegbien.sql: Renamed analytical_db to analytical_stem since this contains just the individual stems, not the aggregated data in the main analytical DB
Aaron Marcuse-Kubitza
11:38 AM Revision 5863: schemas/vegbien.sql: Removed no longer used locationcoords
Aaron Marcuse-Kubitza
11:35 AM Revision 5862: schemas/vegbien.sql: analytical_db_view: Use new coordinates instead of locationcoords
Aaron Marcuse-Kubitza
11:23 AM Revision 5861: mappings/VegCore-VegBIEN.csv: Remapped latitude/longitude to new coordinates table
Aaron Marcuse-Kubitza
11:15 AM Revision 5860: schemas/vegbien.sql: place: Added coordinates_id
Aaron Marcuse-Kubitza
11:01 AM Revision 5859: schemas/vegbien.sql: Added coordinates table
Aaron Marcuse-Kubitza
10:40 AM Revision 5858: schemas/vegbien.sql: place: Removed municipality, site because they are not used in the geoscrubbing
Aaron Marcuse-Kubitza
10:19 AM Revision 5857: schemas/vegbien.sql: place: Place custom hierarchy of placenames in placename table instead of in otherranks field
Aaron Marcuse-Kubitza
10:04 AM Revision 5856: schemas/vegbien.sql: place.matched_place_id: Changed comment to say that places are linked in a three-level (instead of two-level) hierarchy of datasource place -> verbatim place -> accepted place, and this field contains the closest match
Aaron Marcuse-Kubitza
09:54 AM Revision 5855: schemas/vegbien.sql: Renamed placepath to place since this contains primary information about the place, including the reference to the canonical place
Aaron Marcuse-Kubitza
09:42 AM Revision 5854: schemas/vegbien.sql: Renamed place to placename since it refers just to a name for a place, without coordinates
Aaron Marcuse-Kubitza
07:18 AM Revision 5853: schemas/vegbien.sql: analytical_db_view: Exclude original taxondeterminations, so that there is only one taxondetermination for each taxonoccurrence
Aaron Marcuse-Kubitza
07:03 AM Revision 5852: schemas/vegbien.sql: make_analytical_db(): Also make new aggregated_analytical_db
Aaron Marcuse-Kubitza
07:02 AM Revision 5851: schemas/vegbien.sql: sync_analytical_db_to_view(): DROP TABLE: Use IF EXISTS in case analytical_db table has already been deleted, or not yet created
Aaron Marcuse-Kubitza
07:01 AM Revision 5850: schemas/vegbien.sql: Added aggregated_analytical_db_view and materialized table aggregated_analytical_db (synced using sync_aggregated_analytical_db_to_view())
Aaron Marcuse-Kubitza
07:01 AM Revision 5849: schemas/vegbien.sql: Added aggregated_analytical_db_view and materialized table aggregated_analytical_db (synced using sync_aggregated_analytical_db_to_view())
Aaron Marcuse-Kubitza
06:56 AM Revision 5848: lib/PostgreSQL-MySQL.csv: custom types: Also match column names enclosed in ``
Aaron Marcuse-Kubitza
06:49 AM Revision 5847: schemas/vegbien.sql: Store materialized analytical_db table in schema so aggregating views can reference it. Added sync_analytical_db_to_view() to maintain analytical_db table.
Aaron Marcuse-Kubitza
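Revision 5847 (and the IF EXISTS fix in Revision 5851) keeps a materialized copy of a view as a real table that other views can reference. A minimal sketch of that drop-and-recreate pattern follows, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table, view, and column names are hypothetical, and this is not the actual sync_analytical_db_to_view() function.

```python
import sqlite3

def sync_table_to_view(conn: sqlite3.Connection, table: str, view: str) -> None:
    """Rebuild `table` as a snapshot of `view`.

    IF EXISTS lets this run whether or not the table currently exists.
    Identifiers are interpolated directly, so they must be trusted.
    """
    with conn:  # commits when the block finishes without an error
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" AS SELECT * FROM "{view}"')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stem (species TEXT, dbh_m REAL)")
conn.executemany("INSERT INTO stem VALUES (?, ?)", [("Inga", 0.12), ("Piper", 0.05)])
conn.execute("CREATE VIEW stem_view AS SELECT species, dbh_m * 100 AS dbh_cm FROM stem")

sync_table_to_view(conn, "analytical_stem", "stem_view")  # table does not exist yet
sync_table_to_view(conn, "analytical_stem", "stem_view")  # replaces the existing copy
print(conn.execute("SELECT count(*) FROM analytical_stem").fetchone())  # (2,)
```

The snapshot does not change when the underlying data changes, so it has to be re-synced after each import.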
06:30 AM Revision 5846: schemas/vegbank.ERD.pdf: Restored to VegBank ERD, which had gotten overwritten when the vegbien.ERD exports were regenerated
Aaron Marcuse-Kubitza
06:23 AM Revision 5845: schemas/vegbien.sql: analytical_db_view: Reordered columns in path order
Aaron Marcuse-Kubitza
06:04 AM Revision 5844: schemas/: Moved unit conversion functions from functions.sql to vegbien.sql so the unit conversion functions used by analytical_db_view wouldn't need to be stored both in functions.sql and in vegbien.sql. (All unit conversion functions used by analytical_db_view must be stored in the public schema so that analytical_db_view doesn't get cascadingly deleted when the functions schema is reinstalled.)
Aaron Marcuse-Kubitza
05:52 AM Revision 5843: schemas/vegbien.sql: analytical_db_view: Use public._m2_to_ha() instead of functions._m2_to_ha()
Aaron Marcuse-Kubitza
05:51 AM Revision 5842: schemas/vegbien.sql: Copied _m2_to_ha() to public schema for use by analytical_db_view
Aaron Marcuse-Kubitza
05:40 AM Revision 5841: schemas/vegbien.sql: analytical_db_view: Added diameterBreastHeight_cm
Aaron Marcuse-Kubitza
05:38 AM Revision 5840: schemas/functions.sql, vegbien.sql: Added _m_to_cm()
Aaron Marcuse-Kubitza
05:07 AM Revision 5839: schemas/vegbien.sql: Copied _cm_to_m() to public schema for use by new aggregated_analytical_db_view
Aaron Marcuse-Kubitza
04:19 AM Revision 5838: schemas/vegbien.sql: analytical_db_view: datasource table: Fixed bug: filter by creator_id = party_id in order to use only root parties (datasources)
Aaron Marcuse-Kubitza
03:40 AM Revision 5837: tnrs_db: Fetching names to scrub: Omit sql.select() fields param because it will be filled in with its default value
Aaron Marcuse-Kubitza
03:29 AM Revision 5836: import_all: Pass command-line args (such as make vars) to all commands, not just with_all, so that a custom public schema is properly used by all commands
Aaron Marcuse-Kubitza
02:57 AM Revision 5835: inputs/.NCBI/nodes/create.sql: Make genus completely globally unique by removing duplicates. Note that only duplicates with ranks at or below the genus level need be removed, which for this dataset is just genus and subgenus.
Aaron Marcuse-Kubitza
02:00 AM Revision 5834: schemas/vegbien.sql: taxonlabel: taxonlabel_required_key constraint: Also allow taxonlabels with just a sourceaccessioncode, to support looking up parent taxonlabels using just their sourceaccessioncode (e.g. in NCBI)
Aaron Marcuse-Kubitza
01:23 AM Revision 5833: mappings/VegCore-VegBIEN.csv: matched taxonlabel: Don't include taxonName in the concatenated taxonomicname. This also prevents the creation of the matched taxonlabel entirely when only the taxonName is provided.
Aaron Marcuse-Kubitza
01:20 AM Revision 5832: mappings/VegCore-VegBIEN.csv: Don't create matched taxonlabel if taxonName was provided. This fixes a bug where an NCBI node was incorrectly pointing to a TNRS name, when the reference should only be the other way around. This may also fix the TNRS slowdown, if it was caused by circular matched_label_id references.
Aaron Marcuse-Kubitza
12:47 AM Revision 5831: schemas/vegbien.sql: taxonlabel_2_set_canon_label_id_on_insert(): Fixed bug: also set canon_label_id based on matched_label_id here, not just in taxonlabel_2_set_canon_label_id_on_update(), because the matched_label_id could be specified when the taxonlabel is first created
Aaron Marcuse-Kubitza
12:34 AM Revision 5830: schemas/vegbien.sql: taxonlabel_2_set_canon_label_id_on_*(): Fixed bug: use := instead of = to assign canon_label_id
Aaron Marcuse-Kubitza
12:17 AM Revision 5829: schemas/tree_cross-links.sql: Updated for schema changes
Aaron Marcuse-Kubitza
12:16 AM Revision 5828: schemas/vegbien.sql: taxonlabel_update_ancestors(): Include ancestors for both parent_id and matched_label_id rather than just one or the other. This avoids needing to delete existing ancestors for the parent_id when a matched_label_id is added and overrides it. This should reduce the TNRS import time if the slowdown was due to the need to delete parent_id ancestors when later adding a matched_label_id (which only occurs in a separate step in the TNRS datasource).
Aaron Marcuse-Kubitza
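Revision 5828 makes a taxonlabel's ancestors include both the parent_id chain and the matched_label_id chain. Conceptually, that ancestor set is a graph walk over both kinds of links. The sketch below computes it with hypothetical ids held in plain dicts; it is not the taxonlabel_update_ancestors() trigger, which maintains the relationships in SQL. It also guards against cycles, which Revision 5825 names as a possible cause of the TNRS slowdown.

```python
from typing import Dict, Optional, Set

def ancestors(
    label_id: str,
    parent: Dict[str, Optional[str]],
    matched: Dict[str, Optional[str]],
) -> Set[str]:
    """Every label reachable from label_id via parent_id OR matched_label_id links.

    The found set keeps the walk finite even if the links contain a cycle;
    the starting label itself is never reported as its own ancestor.
    """
    found: Set[str] = set()
    stack = [label_id]
    while stack:
        current = stack.pop()
        for link in (parent.get(current), matched.get(current)):
            if link is not None and link != label_id and link not in found:
                found.add(link)
                stack.append(link)
    return found

# sp_v: verbatim species label; sp_a: the accepted species label it matches
parent = {"sp_v": "genus_v", "sp_a": "genus_a", "genus_a": "family_a"}
matched = {"sp_v": "sp_a"}
print(sorted(ancestors("sp_v", parent, matched)))
# ['family_a', 'genus_a', 'genus_v', 'sp_a']
```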
12:07 AM Revision 5827: sql_io.py: put_table(): ensure_cond(): Fixed bug where the test for whether any rows failed cond did not check that cur is not None (cur is None when cond == sql_gen.true_expr) before reading cur.rowcount
Aaron Marcuse-Kubitza

10/29/2012

10:26 PM Revision 5826: sql_gen.py: simplify_expr(): Don't require () around NULL IS NULL and NULL IS NOT NULL because extra parentheses are not provided in index conditions, only in check constraint conditions
Aaron Marcuse-Kubitza
10:06 PM Revision 5825: inputs/import.stats.xls: Updated import times. The TNRS import has slowed down significantly, possibly due to a bug in the autopopulation of the taxonlabel_relationship table when the input data contains cycles.
Aaron Marcuse-Kubitza
09:37 PM Revision 5824: sql_io.py: put_table(): Assertion that into and full_in_table have the same row count: Allow into to have more rows than full_in_table, in case an input row matched multiple output rows. This should not happen for a properly-configured database, but seems to happen periodically nevertheless (currently, to the MO datasource) and should not abort the import when it does.
Aaron Marcuse-Kubitza

10/26/2012

08:18 PM Revision 5823: sql.py: parse_exception(): "could not create unique index" DuplicateKeyException: Fixed bug: can't use make_DuplicateKeyException(), because it tries to retrieve information about the index in question, but the index that was being created doesn't exist
Aaron Marcuse-Kubitza
08:10 PM Revision 5822: schemas/vegbien.sql: analytical_db_view: Renamed datasource's taxonverbatim to datasource_taxonverbatim to distinguish it from the other taxonverbatims that are joined on (parsed_taxonverbatim, accepted_taxonverbatim)
Aaron Marcuse-Kubitza
07:18 PM Revision 5821: inputs/.NCBI/nodes/create.sql: Make genus (mostly) globally unique by removing kingdom Animalia, which has significant genus overlap with plants. This reduces the number of duplicated genera from 578 to 65 (determined with `SELECT name_txt, count(*), array_agg(rank) FROM "NCBI".nodes GROUP BY name_txt HAVING count(*) > 1 AND 'genus' = ALL (array_agg(rank))`).
Aaron Marcuse-Kubitza
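The query in Revision 5821 lists names that occur more than once where every occurrence has rank genus. An equivalent Python sketch over (name_txt, rank) pairs follows; the sample names are made up. One difference: SQL's = ALL would yield NULL rather than false if a rank were NULL, which this sketch does not model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def duplicated_genera(nodes: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Python equivalent of:
        SELECT name_txt, count(*), array_agg(rank) FROM nodes
        GROUP BY name_txt
        HAVING count(*) > 1 AND 'genus' = ALL (array_agg(rank))
    """
    ranks_by_name: Dict[str, List[str]] = defaultdict(list)
    for name, rank in nodes:
        ranks_by_name[name].append(rank)
    return {
        name: ranks
        for name, ranks in ranks_by_name.items()
        if len(ranks) > 1 and all(rank == "genus" for rank in ranks)
    }

nodes = [("Aus", "genus"), ("Aus", "genus"), ("Bus", "genus"),
         ("Cus", "genus"), ("Cus", "subgenus")]
print(duplicated_genera(nodes))  # {'Aus': ['genus', 'genus']}
```

Note that a genus/subgenus collision such as "Cus" is not reported by this query, which is consistent with Revision 5835 later also removing duplicates at the subgenus level.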
07:08 PM Revision 5820: inputs/.NCBI/nodes/create.sql: Added foreign key on parent tax_id with covering index
Aaron Marcuse-Kubitza
07:06 PM Revision 5819: input.Makefile: Staging tables installation: Added %/uninstall, %/reinstall to allow reinstalling individual tables
Aaron Marcuse-Kubitza
06:00 PM Revision 5818: sql_io.py: put_table(): ensure_cond(): When adding the failed condition to the errors table, also include the original, untranslated condition from the DB schema in addition to the translation of the condition into the input schema
Aaron Marcuse-Kubitza
05:45 PM Revision 5817: sql_io.py: track_data_error(): Fixed bug where errors whose column had no srcs (indicated by () ) were incorrectly being ignored. This affected NOT NULL exceptions where the column was not provided by the dataset.
Aaron Marcuse-Kubitza
05:38 PM Revision 5816: sql_gen.py: If no cols had srcs, return [] instead of the [()] that itertools.product() would have returned
Aaron Marcuse-Kubitza
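The edge case in Revision 5816 comes from itertools.product(): called with no iterables it yields exactly one empty tuple, so "no sources" would look like one combination of zero sources. A sketch of the guard follows; the function name src_combinations is hypothetical, since the entry does not name the sql_gen.py function involved.

```python
import itertools
from typing import List, Sequence, Tuple

def src_combinations(col_srcs: Sequence[Sequence[str]]) -> List[Tuple[str, ...]]:
    """Cross product of each column's sources, skipping columns with no sources.

    itertools.product() called with zero iterables yields one empty tuple,
    so the no-sources case is returned as [] explicitly.
    """
    with_srcs = [srcs for srcs in col_srcs if srcs]
    if not with_srcs:
        return []
    return list(itertools.product(*with_srcs))

print(list(itertools.product()))              # [()]
print(src_combinations([[], []]))             # []
print(src_combinations([["a", "b"], ["c"]]))  # [('a', 'c'), ('b', 'c')]
```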
05:38 PM Revision 5815: sql_io.py: track_data_error(): Support errors with no columns by inserting a single entry with column set to NULL
Aaron Marcuse-Kubitza
05:35 PM Revision 5814: strings.py: Added join()
Aaron Marcuse-Kubitza
05:00 PM Revision 5813: sql_io.py: mk_errors_table(): Made "column" column nullable, because some errors (such as check constraint violations) have no corresponding column when their columns weren't provided in the input data
Aaron Marcuse-Kubitza
04:35 PM Revision 5812: inputs/test_taxonomic_names/test_scrub: `make inputs/.TNRS/reinstall`: Use new $schema_only option so that an empty TNRS schema is installed rather than one containing inputs/.TNRS/data.sql
Aaron Marcuse-Kubitza
04:34 PM Revision 5811: inputs/.TNRS/: Added data.sql containing the test_taxonomic_names TNRS results, so that a new installation of VegBIEN will contain the necessary data to make the tests pass, including the TNRS import test
Aaron Marcuse-Kubitza
04:32 PM Revision 5810: input.Makefile: Staging tables installation: If $schema_only option is set, only install .sql files ending in schema.sql
Aaron Marcuse-Kubitza
04:24 PM Revision 5809: inputs/Makefile: $(rsyncLogs): Use $(rsync) instead of $(rsync*) now that it supports excluding just temp files and .svn rather than all .*
Aaron Marcuse-Kubitza
04:21 PM Revision 5808: lib/common.Makefile: rsync: $(rsync): Exclude .svn, *#, and .DS_Store rather than all .* because dirs beginning with . created by the user (such as .NCBI, .TNRS) should be included in the sync
Aaron Marcuse-Kubitza
04:18 PM Revision 5807: Added inputs/REMIB/Specimen.src/.map.csv.last_cleanup
Aaron Marcuse-Kubitza
04:10 PM Revision 5806: Added inputs/bien_web/observation/+header.csv
Aaron Marcuse-Kubitza
04:09 PM Revision 5805: input.Makefile: Staging tables installation: $(dbExports): When putting schemas first, don't require a . before "schema" to allow the entire filename to be schema.sql
Aaron Marcuse-Kubitza
03:44 PM Revision 5804: inputs/test_taxonomic_names/_scrub/public.test_taxonomic_names.sql, TNRS.sql: Regenerated with schema and mappings changes
Aaron Marcuse-Kubitza
03:42 PM Revision 5803: inputs/.TNRS/tnrs/map.csv: Added _nullIf filter to remove "Unknown" values for Accepted_name_family
Aaron Marcuse-Kubitza
03:35 PM Revision 5802: README.TXT: Generate the local TNRS cache from the test_taxonomic_names rather than syncing it with the vegbiendev TNRS cache, so that the automated test's inserted row count stays the same regardless of the contents of the full-DB TNRS cache
Aaron Marcuse-Kubitza
03:34 PM Revision 5801: README.TXT: Backups: Added TNRS cache section
Aaron Marcuse-Kubitza
03:12 PM Revision 5800: inputs/.TNRS/tnrs/test.xml.ref: Accepted inserted row count using TNRS cache created from test_taxonomic_names. Using a standard set of names for the test ensures that the inserted row count will not change when the full-DB TNRS cache changes.
Aaron Marcuse-Kubitza
02:48 PM Revision 5799: inputs/.TNRS/schema.sql: tnrs_accepted_names: Prepend the Accepted_name_family to the taxonomic name that will be submitted back to TNRS for parsing, because TNRS input names now always include the family when it's provided
Aaron Marcuse-Kubitza
02:46 PM Revision 5798: inputs/.TNRS/schema.sql: tnrs_accepted_names: Use simpler array_to_string() instead of || and COALESCE() to put together the taxonomic name that will be submitted back to TNRS for parsing. Note that this requires defining an IMMUTABLE wrapper function for array_to_string(), because pg_catalog.array_to_string() is declared STABLE but indexes require functions to be IMMUTABLE (http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg156323.html).
Aaron Marcuse-Kubitza
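Revision 5798 relies on the fact that array_to_string(array, delimiter), when called without a null_string argument, omits NULL array elements, so missing name parts don't leave stray delimiters. The Python sketch below shows that behavior next to a naive concatenation that turns each NULL into an empty string (hypothetical; the expression the commit replaced isn't shown). The IMMUTABLE wrapper the commit mentions is a SQL-level concern and isn't modeled here.

```python
from typing import Optional, Sequence

def array_to_string(parts: Sequence[Optional[str]], sep: str) -> str:
    """PostgreSQL array_to_string(arr, sep) without null_string: NULL elements are omitted."""
    return sep.join(part for part in parts if part is not None)

def naive_coalesce_concat(parts: Sequence[Optional[str]], sep: str) -> str:
    """Naive alternative: each NULL becomes '' but its delimiter remains."""
    return sep.join("" if part is None else part for part in parts)

parts = ["Fabaceae", "Inga", None]  # family, genus, missing specific epithet
print(repr(array_to_string(parts, " ")))        # 'Fabaceae Inga'
print(repr(naive_coalesce_concat(parts, " ")))  # 'Fabaceae Inga '
```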
02:42 PM Revision 5797: inputs/.TNRS/schema.sql: Don't hardcode the schema name
Aaron Marcuse-Kubitza
02:40 PM Revision 5796: input.Makefile: Staging tables installation: sql/install: Provide the datasource's schema to the script in :schema, so it can refer to its own elements explicitly when it's not possible to rely on the search_path. This is the case for functions that have the same signature as (and are intended to replace) a pg_catalog function, because the pg_catalog function will be used in preference to the datasource function regardless of the search_path.
Aaron Marcuse-Kubitza
02:06 PM Revision 5795: input.Makefile: Staging tables installation: $(cleanup): If a cleanup.sql is provided, *only* run it and don't do default cleanup, to allow tables to override rather than just add to default cleanup operations. This prevents the automatic replacement of certain strings (sql_io.null_strs) with NULL on TNRS, and keeps the TNRS cache mostly as it was output by the TNRS service. Note that empty strings are still replaced with NULL by COPY FROM in sql_io.append_csv(). This is necessary for TNRS import to work properly, because although '' generally means NULL, it is not treated that way by PostgreSQL.
Aaron Marcuse-Kubitza
01:53 PM Revision 5794: input.Makefile: Staging tables installation: Moved custom cleanup.sql cleanup operations to main $(cleanup) function, so custom cleanup operations would run whenever any target (such as %/install) invokes $(cleanup), not just manually through %/cleanup
Aaron Marcuse-Kubitza
01:52 PM Revision 5793: input.Makefile: Staging tables installation: Moved custom cleanup.sql cleanup operations to main $(cleanup) function, so custom cleanup operations would run whenever any target (such as %/install) invokes $(cleanup), not just manually through %/cleanup
Aaron Marcuse-Kubitza

10/25/2012

05:16 PM Revision 5792: sql.py: parse_exception(): function MissingCastException: If first param's type is anyelement (for polymorphic function, which had mismatched arg types), use type text, as all types can cast to it
Aaron Marcuse-Kubitza
05:12 PM Revision 5791: sql_io.py: cast(): Set the created function's value param type to anyelement to support any input type, not just text
Aaron Marcuse-Kubitza
04:53 PM Revision 5790: mappings/VegCore-VegBIEN.csv: Only prepend the family to the concatenated scientificName for TNRS if it ends in -aceae (using _taxon_family_require_std()), to avoid sending unsupported, nonstandard families to TNRS which it will place in Unmatched_terms
Aaron Marcuse-Kubitza
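Revisions 5788-5790 prepend the family to the name sent to TNRS only when it has the standard -aceae ending. A Python sketch of that rule follows. It assumes a case-sensitive suffix check and NULL for nonstandard families; the real _taxon_family_require_std() is a SQL function whose body isn't shown here.

```python
from typing import Optional

def taxon_family_require_std(family: Optional[str]) -> Optional[str]:
    """Keep a family name only if it has the standard -aceae ending; otherwise NULL."""
    if family is None or not family.endswith("aceae"):
        return None
    return family

def tnrs_input_name(family: Optional[str], name: str) -> str:
    """Prepend a standard family to the name submitted to TNRS, if there is one."""
    std_family = taxon_family_require_std(family)
    return name if std_family is None else f"{std_family} {name}"

print(tnrs_input_name("Fabaceae", "Inga edulis"))     # Fabaceae Inga edulis
print(tnrs_input_name("Leguminosae", "Inga edulis"))  # Inga edulis
print(tnrs_input_name(None, "Inga edulis"))           # Inga edulis
```

Under this rule, alternative family names that lack the -aceae ending (such as Leguminosae) are also dropped, not only misspelled or nonstandard ones.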
04:48 PM Revision 5789: schemas/vegbien.sql: Added _taxon_family_require_std()
Aaron Marcuse-Kubitza
04:09 PM Revision 5788: mappings/VegCore-VegBIEN.csv: Prepend the family to the concatenated scientificName input to TNRS, so that TNRS can use it to disambiguate the genus
Aaron Marcuse-Kubitza
03:50 PM Revision 5787: tnrs_db: Making TNRS request: Fixed bug: removed the else block, which is not allowed now that there is no except block
Aaron Marcuse-Kubitza
03:45 PM Revision 5786: tnrs.py: retrieval_request_template: Turn on taxonomic_constraint (to match family before genus) and source_sorting (to always return any result from the first source before returning results from any other sources, regardless of match %)
Aaron Marcuse-Kubitza
12:50 PM Revision 5785: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
12:46 PM Revision 5784: mappings/VegCore.csv: speciesBinomial: Changed definition to genus+specificEpithet, not genus+species, to match the scientific meaning of specificEpithet vs. species
Aaron Marcuse-Kubitza
12:45 PM Revision 5783: schemas/vegbien.sql: taxonverbatim: Renamed species to specific_epithet to avoid confusion with the scientific meaning of species (genus+specificEpithet), since this field contains just the specific epithet
Aaron Marcuse-Kubitza
12:30 PM Revision 5782: input.Makefile: Verification of import: verify: Use tables from the verify/*.ref files themselves rather than from the datasource's subdirs, in order to match the tables in mappings/verify.*.sql
Aaron Marcuse-Kubitza
11:52 AM Revision 5781: schemas/vegbien.sql: analytical_db_view: Added stemobservation.tag, stemobservation.height_m for use in plot change over time analysis <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Plot_change_over_time_analysis>
Aaron Marcuse-Kubitza
11:45 AM Revision 5780: schemas/vegbien.sql: analytical_db_view: Fixed typo in scientificNameWithMorphospecies
Aaron Marcuse-Kubitza
11:41 AM Revision 5779: schemas/vegbien.sql: analytical_db_view: Renamed columns to VegCore names (https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/mappings/VegCore.csv)
Aaron Marcuse-Kubitza
11:36 AM Revision 5778: mappings/VegCore.csv: Added cultivatedBasis
Aaron Marcuse-Kubitza
11:34 AM Revision 5777: mappings/VegCore.csv: Added scientificNameWithMorphospecies
Aaron Marcuse-Kubitza
10:55 AM Revision 5776: mappings/VegCore.csv: Added speciesBinomial
Aaron Marcuse-Kubitza
10:49 AM Revision 5775: schemas/vegbien.sql: analytical_db_view: Generate species by concatenating genus and specific epithet, since according to Brad this field is actually the binomial, not the specificEpithet
Aaron Marcuse-Kubitza
10:41 AM Revision 5774: schemas/vegbien.sql: Removed no longer used plot_change_over_time view. Use one of the queries at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Plot_change_over_time_analysis> instead.
Aaron Marcuse-Kubitza
10:36 AM Revision 5773: mappings/VegCore-VegBIEN.csv: location: Populate sourceaccessioncode with locationID + subplot when subplot is unique only within the parent plot, so that location always has a sourceaccessioncode to use as the plotCode in analytical_db_view
Aaron Marcuse-Kubitza
10:07 AM Revision 5772: lib/PostgreSQL-MySQL.csv: Remove views because they can contain arbitrary expressions, whose syntax may not be compatible with MySQL
Aaron Marcuse-Kubitza
10:04 AM Revision 5771: schemas/vegbien.sql: analytical_db_view: Use location.sourceaccessioncode as plotCode instead of authorlocationcode because authorlocationcode isn't globally unique (for subplots, it's only unique within the parent plot)
Aaron Marcuse-Kubitza
09:48 AM Revision 5770: schemas/vegbien.sql: plantobservation: Made taxonoccurrence_id optional when sourceaccessioncode is specified, so that aggregateoccurrence doesn't get pruned away in datasource tables that link just a stemobservation to a plantobservation (and therefore don't provide a taxonoccurrence to satisfy the previous taxonoccurrence_id NOT NULL constraint)
Aaron Marcuse-Kubitza
09:47 AM Revision 5769: schemas/vegbien.sql: aggregateoccurrence: Made taxonoccurrence_id optional when sourceaccessioncode is specified, so that aggregateoccurrence doesn't get pruned away in datasource tables that link just a stemobservation to a plantobservation (and therefore don't provide a taxonoccurrence to satisfy the previous taxonoccurrence_id NOT NULL constraint)
Aaron Marcuse-Kubitza
09:42 AM Revision 5768: schemas/vegbien.sql: aggregateoccurrence: Made taxonoccurrence_id optional when sourceaccessioncode is specified, so that aggregateoccurrence doesn't get pruned away in datasource tables that link just a stemobservation to a plantobservation (and therefore don't provide a taxonoccurrence to satisfy the previous taxonoccurrence_id NOT NULL constraint)
Aaron Marcuse-Kubitza
09:31 AM Revision 5767: schemas/vegbien.sql: taxonoccurrence: Added taxonoccurrence_required_key check constraint to ensure that all taxonoccurrences are properly identified, and empty taxonoccurrences are properly pruned. This fixes a bug where taxon-only and stem-only data did not properly prune the taxonoccurrence that would otherwise get created because it's included in the mappings.
Aaron Marcuse-Kubitza
07:51 AM Revision 5766: sql_io.py: put_table(): insert_into_pkeys(): Use new sql.add_pkey_or_index() instead of sql.add_pkey() in order to just print a warning if for some reason there were duplicate entries for an input row in the iteration's pkeys table. This should provide a workaround for bugs (often in the schema itself, related to its unique indexes) that cause an input row to match multiple output rows when joining on the output table using the unique constraint's columns.
Aaron Marcuse-Kubitza
07:44 AM Revision 5765: sql.py: Added add_pkey_or_index()
Aaron Marcuse-Kubitza
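The add_pkey_or_index() fallback from revisions 5765 and 5766 can be sketched as follows: try to add a unique index, and on duplicate keys, warn and create a plain index instead of aborting. This is a sketch under stated assumptions. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the signature, index names, and return values are hypothetical, not the actual sql.py API.

```python
import sqlite3
import warnings


def add_pkey_or_index(conn: sqlite3.Connection, table: str, col: str) -> str:
    """Hypothetical sketch: add a unique index on table.col, or, if the column
    contains duplicate entries, warn and add a non-unique index instead.

    Returns "unique" or "index" to report which kind was created.
    """
    try:
        conn.execute(f'CREATE UNIQUE INDEX "{table}_{col}_uniq" ON "{table}" ("{col}")')
        return "unique"
    except sqlite3.IntegrityError as exc:
        # Duplicate keys: keep the import going but still speed up joins.
        warnings.warn(f"{table}.{col} has duplicate keys ({exc}); adding plain index")
        conn.execute(f'CREATE INDEX "{table}_{col}_idx" ON "{table}" ("{col}")')
        return "index"
```

The design trades strictness for robustness. A duplicate in the pkeys table signals a schema bug, but a warning plus a plain index still allows the join to run.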
07:32 AM Revision 5764: sql.py: parse_exception(): Parse "could not create unique index ... Key is duplicated" errors as DuplicateKeyException
Aaron Marcuse-Kubitza
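Revisions 5763 and 5764 parse PostgreSQL's "could not create unique index ... Key ... is duplicated" error into a DuplicateKeyException. A minimal sketch of that parsing follows. The regexp and the helper name parse_duplicate_index() are assumptions, and only the general message shape (an error line plus a `DETAIL:  Key (cols)=(values) is duplicated.` line) follows PostgreSQL.

```python
import re
from typing import List, Optional


class DuplicateKeyException(Exception):
    """Sketch of an exception for a unique index that rejects duplicate keys."""

    def __init__(self, constraint: str, cols: List[str]):
        super().__init__(f"{constraint} ({', '.join(cols)})")
        self.constraint = constraint
        self.cols = cols


# Matches e.g.:
#   ERROR:  could not create unique index "t_a_key"
#   DETAIL:  Key (a)=(1) is duplicated.
_dup_index_re = re.compile(
    r'could not create unique index "(?P<name>[^"]+)"\s*'
    r'DETAIL:\s+Key \((?P<cols>[^)]*)\)=\(.*\) is duplicated'
)


def parse_duplicate_index(msg: str) -> Optional[DuplicateKeyException]:
    """Hypothetical helper: return a DuplicateKeyException, or None if msg doesn't match."""
    match = _dup_index_re.search(msg)
    if match is None:
        return None
    cols = [c.strip() for c in match.group("cols").split(",")]
    return DuplicateKeyException(match.group("name"), cols)
```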
07:27 AM Revision 5763: sql.py: parse_exception(): DuplicateKeyException: Factored out creation of DuplicateKeyException into helper function
Aaron Marcuse-Kubitza
07:20 AM Revision 5762: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza

10/24/2012

06:31 PM Revision 5761: tnrs_db: Removed tnrs.InvalidResponse exception handler that retries the query because the current query does not track which names have been submitted to but not processed by TNRS, so the error would continue to happen repeatedly
Aaron Marcuse-Kubitza
06:13 PM Revision 5760: schemas/vegbien.sql: location: Added index on parent_id to speed up plot change over time joins
Aaron Marcuse-Kubitza
05:45 PM Revision 5759: schemas/vegbien.sql: location: Added index on creator_id to speed up analytical_db_view joins
Aaron Marcuse-Kubitza
05:15 PM Revision 5758: schemas/vegbien.sql: stemobservation: Added index on plantobservation_id to speed up analytical_db_view joins
Aaron Marcuse-Kubitza

10/23/2012

01:08 PM Revision 5757: schemas/vegbien.sql: Added initial plot_change_over_time view
Aaron Marcuse-Kubitza
12:53 PM Revision 5756: Added inputs/bien_web/
Aaron Marcuse-Kubitza
12:43 PM Revision 5755: schemas/vegbien.sql: analytical_db_view: Reordered taxonoccurrence.growthform to put it after the bien_web.observation fields
Aaron Marcuse-Kubitza
12:32 PM Revision 5754: schemas/vegbien.sql: analytical_db_view: Include taxonoccurrence.growthform
Aaron Marcuse-Kubitza
12:27 PM Revision 5753: schemas/vegbien.sql: analytical_db_view: Generate taxonMorphospecies by concatenating the scientificName to the morphospecies
Aaron Marcuse-Kubitza
12:23 PM Revision 5752: schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to take taxonomic name components from the accepted taxonlabel's taxonverbatim instead of the datasource's taxonverbatim, which does not contain the accepted name
Aaron Marcuse-Kubitza
12:19 PM Revision 5751: schemas/vegbien.sql: analytical_db_view: identifiedBy: Added NULLIF() to keep empty strings out of the analytical DB
Aaron Marcuse-Kubitza
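SQL's `NULLIF(a, b)` returns NULL when `a = b` and `a` otherwise, so `NULLIF(x, '')` turns empty strings into NULL, as revision 5751 does for identifiedBy. The Python analogue below is only an illustration of that behavior. The helper name is hypothetical.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def nullif(value: Optional[T], sentinel: T) -> Optional[T]:
    """Python analogue of SQL NULLIF(value, sentinel): None if equal, else value.

    As in SQL, a NULL (None) input stays None.
    """
    return None if value == sentinel else value
```

For example, `nullif("", "")` is None, while `nullif("Smith", "")` is `"Smith"`.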
12:13 PM Revision 5750: schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to take taxonomic name components from the accepted taxonlabel's taxonverbatim instead of the datasource's taxonverbatim, which does not contain the accepted name
Aaron Marcuse-Kubitza
12:10 PM Revision 5749: schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to take morphospecies from the parsed taxonlabel's taxonverbatim, where it has been parsed out, instead of the datasource's taxonverbatim, which has it as part of the verbatim input name
Aaron Marcuse-Kubitza
11:58 AM Revision 5748: analytical_db_view: Added stemobservation.xposition_m, yposition_m
Aaron Marcuse-Kubitza
11:46 AM Revision 5747: inputs/.TNRS/tnrs/map.csv: Added new Time_submitted field
Aaron Marcuse-Kubitza
11:45 AM Revision 5746: inputs/REMIB/Specimen/header.csv: Regenerated for new staging tables format
Aaron Marcuse-Kubitza
11:41 AM Revision 5745: inputs/.TNRS/tnrs/test.xml.ref: Accepted correct inserted row count, which most likely became detached from the primary row count when the TNRS cache was cleared and repopulated with test data
Aaron Marcuse-Kubitza
11:28 AM Revision 5744: schemas/vegbien.sql: analytical_db_view: Reordered joins in path order, putting datasource before location. This will enable more naturally reusing the SELECT query for other analyses.
Aaron Marcuse-Kubitza
11:15 AM Revision 5743: mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Do not include rank in the mapping because taxonomicname is globally unique, and thus it isn't used in looking up the NCBI taxonlabel
Aaron Marcuse-Kubitza
11:05 AM Revision 5742: inputs/test_taxonomic_names/_scrub/public.test_taxonomic_names.sql, TNRS.sql: Regenerated with schema and mappings changes
Aaron Marcuse-Kubitza
10:49 AM Revision 5741: mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Also attach TNRS genus to NCBI backbone. This causes attachment to be made with as many of family and genus as are provided and have an entry in NCBI.
Aaron Marcuse-Kubitza
10:45 AM Revision 5740: mappings/VegCore-VegBIEN.csv: family -> NCBI backbone: Removed extra path after _if statement's cond/_exists
Aaron Marcuse-Kubitza
10:39 AM Revision 5739: mappings/VegCore-VegBIEN.csv: Instead of connecting the acceptedFamily to the NCBI backbone, connect the family for the TNRS matched taxonlabel. This connects more families and also connects the same set of fields as will be connected for the genus.
Aaron Marcuse-Kubitza
10:01 AM Revision 5738: mappings/VegCore-VegBIEN.csv: TNRS<->NCBI attachment: Fixed bug where needed to attach accepted family to NCBI using taxonomicname, which is globally unique, rather than taxonepithet, which is only unique within the parent taxon
Aaron Marcuse-Kubitza
09:34 AM Revision 5737: inputs/.TNRS/tnrs/: Added Time_submitted column at beginning and populate it in tnrs_db with the time the batch TNRS request was submitted
Aaron Marcuse-Kubitza
09:08 AM Revision 5736: csvs.py: RowNumFilter: Use new ColInsertFilter
Aaron Marcuse-Kubitza
09:08 AM Revision 5735: csvs.py: Added ColInsertFilter
Aaron Marcuse-Kubitza
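Revisions 5735 and 5736 factor a generic column-insertion filter out of the row-number filter. The following Python sketch shows that layering using plain generator functions over csv.reader rows. The csvs.py classes are filters with their own interfaces (including any header handling, which is omitted here), so these function names and signatures are assumptions.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, List


def col_insert_filter(rows: Iterable[List[str]], index: int,
                      value_for_row: Callable[[int, List[str]], str]) -> Iterator[List[str]]:
    """Hypothetical sketch: yield each row with a new column inserted at `index`.

    value_for_row receives the 0-based row position and the original row.
    """
    for i, row in enumerate(rows):
        new_row = list(row)  # copy so the caller's row is not mutated
        new_row.insert(index, value_for_row(i, row))
        yield new_row


def row_num_filter(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Prepend a 1-based row number, built on col_insert_filter."""
    return col_insert_filter(rows, 0, lambda i, _row: str(i + 1))


reader = csv.reader(io.StringIO("a,b\nc,d\n"))
print(list(row_num_filter(reader)))  # [['1', 'a', 'b'], ['2', 'c', 'd']]
```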
08:43 AM Revision 5734: schemas/vegbien.sql: Removed no longer used _is_higher_taxon(). Use _has_taxonomic_name() or _taxonomic_name_is_epithet() instead.
Aaron Marcuse-Kubitza
08:42 AM Revision 5733: mappings/VegCore-VegBIEN.csv: taxonName->taxonepithet: Use new _taxonomic_name_is_epithet() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field
Aaron Marcuse-Kubitza
08:36 AM Revision 5732: schemas/vegbien.sql: Added _taxonomic_name_is_epithet()
Aaron Marcuse-Kubitza
08:33 AM Revision 5731: mappings/VegCore-VegBIEN.csv: taxonName->taxonomicname: Use new _has_taxonomic_name() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field
Aaron Marcuse-Kubitza
08:30 AM Revision 5730: mappings/VegCore-VegBIEN.csv: taxonName->taxonomicname: Use new _has_taxonomic_name() instead of _is_higher_taxon(), because it's more specific to the filtering task for this field
Aaron Marcuse-Kubitza
08:25 AM Revision 5729: schemas/vegbien.sql: Added _has_taxonomic_name() for lower taxon ranks that typically have a globally unique taxonomic name
Aaron Marcuse-Kubitza
08:10 AM Revision 5728: schemas/functions.sql: Removed unit conversion functions that take a text input, since casts to the parameter type (double precision) are now automatically performed by sql_io.put_table(), using sql.parse_exception()'s function MissingCastException parsing
Aaron Marcuse-Kubitza
08:01 AM Revision 5727: mappings/VegCore-VegBIEN.csv: _is_higher_taxon() calls: Default to true if the rank can't be parsed to a taxonrank enum value
Aaron Marcuse-Kubitza
07:56 AM Revision 5726: sql_io.py: put_table(): is_function: Moved definition of wrapper function inside try block of main loop because the creation of the empty pkeys table (whose row type is needed for the wrapper function) can itself produce MissingCastExceptions, which must be thrown inside the loop in order to be handled properly
Aaron Marcuse-Kubitza
07:05 AM Revision 5725: db_xml.py: put(): Indicate no parent_ids_loc using no_parent_ids_loc sentinel instead of None to support parent_ids_locs that are equal to None (e.g. if the parent node had an error). Always forward parent_ids_loc to children with fkeys to parent, even on error, because the parent table may not be required for the child tables to be valid, such as for taxonomic-data-only datasets that nevertheless have nodes for the non-taxonomic tables in their mappings.
Aaron Marcuse-Kubitza
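Revision 5725 uses a sentinel so that "no parent_ids_loc given" is distinguishable from a parent_ids_loc that is legitimately None. The sketch below shows the standard Python sentinel idiom. The sentinel name comes from the commit message, but describe_parent() is a hypothetical illustration, not db_xml.put().

```python
from typing import Any

# A unique object that can never equal a real value, so it can mark
# "argument not provided" even when None is itself a meaningful value.
no_parent_ids_loc = object()


def describe_parent(parent_ids_loc: Any = no_parent_ids_loc) -> str:
    """Hypothetical illustration of telling the three cases apart."""
    if parent_ids_loc is no_parent_ids_loc:  # identity check, not ==
        return "no parent"
    if parent_ids_loc is None:  # e.g. the parent node had an error
        return "parent had an error"
    return f"parent at {parent_ids_loc}"
```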
06:38 AM Revision 5724: sql.py: parse_exception(): types cannot be matched MissingCastException: Use the first type as the type to cast to instead of text
Aaron Marcuse-Kubitza
05:59 AM Revision 5723: sql.py: parse_exception(): InvalidValueException: Fixed bug in regexp where .*? cannot be used before the (?:...)? surrounding the matched value, because it prevents the value from being matched now that the value is optional
Aaron Marcuse-Kubitza
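The regexp pitfall behind revision 5723 can be reproduced with Python's re module. A lazy `.*?` first matches the empty string, and the optional group can then be skipped, so the overall match succeeds without capturing the value. The patterns below are illustrative assumptions, not the actual sql.py regexp. The sample message follows PostgreSQL's "invalid input syntax" wording, and the `[^:]*` fix is just one way to avoid the problem.

```python
import re

msg = 'invalid input syntax for type double precision: "abc"'

# Buggy: .*? stops at the empty string, the optional group fails there and
# is skipped, and the unanchored pattern still reports a successful match.
buggy = re.compile(r'^invalid input syntax for type .*?(?:: "(?P<value>[^"]*)")?')

# One fix: consume the type name greedily up to the colon, so the optional
# group is tried exactly where the value starts.
fixed = re.compile(r'^invalid input syntax for type [^:]*(?:: "(?P<value>[^"]*)")?')

print(buggy.match(msg).group("value"))  # None
print(fixed.match(msg).group("value"))  # abc
```

With the fix, a message that has no value, such as `invalid input syntax for type date`, still matches, with `value` set to None.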
05:52 AM Revision 5722: inputs/.NCBI/nodes/header.csv: Updated for new staging table format, which includes a row_num column in each joined table
Aaron Marcuse-Kubitza
05:51 AM Revision 5721: inputs/.NCBI/nodes/create.sql: Updated for new src table names
Aaron Marcuse-Kubitza
05:36 AM Revision 5720: xml_func.py: process(): Pass on_error through to sql_io.put(). This fixes a bug in row-based import where DB errors in the xml_func.process() phase would abort the entire import instead of being tracked and having the return value set to None.
Aaron Marcuse-Kubitza
05:33 AM Revision 5719: sql_io.py: put(): Pass on_error through to put_table()
Aaron Marcuse-Kubitza
05:19 AM Revision 5718: sql_io.py: put_table(): log_exc(): Return False if removing all rows and have callers break the main loop so that no further exception-handling code is processed before the main loop is exited
Aaron Marcuse-Kubitza
05:17 AM Revision 5717: sql.py: parse_exception(): InvalidValueException: Also match exceptions which don't provide a specific value but just indicate that a value was invalid, such as PL/Python's "day is out of range for month"
Aaron Marcuse-Kubitza
04:39 AM Revision 5716: db_xml.py: put(): Inserting children with fkeys to parent: Don't do this if this node had an error and sql_io.put_table() returned None as the generated pkey. This fixes a bug where a node with an error will still try to create children with fkeys to parent, but pass None as the fkey to parent, which the recursive put() call will then incorrectly treat as there being no field with an fkey to parent at all rather than a field whose value is NULL. This causes function overload resolution to be unable to find the intended function, because it is missing a parameter.
Aaron Marcuse-Kubitza
04:34 AM Revision 5715: sql.py: parse_exception(): function MissingCastException: Return the actual type of the function's 1st param, using new function_param0_type(), rather than just text
Aaron Marcuse-Kubitza
04:31 AM Revision 5714: sql.py: parse_exception(): function MissingCastException: Fixed bug where the function name can't be returned as the name of what was missing the cast, because this must be a column
Aaron Marcuse-Kubitza
04:28 AM Revision 5713: sql.py: Added function_param0_type()
Aaron Marcuse-Kubitza
04:26 AM Revision 5712: sql.py: parse_exception(): function MissingCastException: Only treat DoesNotExistException as a MissingCastException if the query that was run did not already include a cast, to avoid infinite exception-handling recursion
Aaron Marcuse-Kubitza
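Revisions 5712 and 5728 describe retrying a failed call with an automatic cast, but only when no cast has already been applied, to avoid infinite exception-handling recursion. The Python sketch below uses TypeError as a stand-in for PostgreSQL's "function does not exist" error. The helper call_with_auto_cast() and the ft_to_m() conversion are hypothetical. In sql_io the cast is added to the SQL query rather than applied in Python.

```python
from typing import Any, Callable


def call_with_auto_cast(func: Callable[[Any], Any], value: Any,
                        cast_to: Callable[[Any], Any], already_cast: bool = False) -> Any:
    """Hypothetical sketch: call func(value); on a type mismatch, retry once
    with value cast via cast_to.

    already_cast plays the role of "the query already included a cast": a
    failure after casting is re-raised instead of retried, which keeps the
    exception handling from recursing forever.
    """
    try:
        return func(value)
    except TypeError:
        if already_cast:
            raise
        return call_with_auto_cast(func, cast_to(value), cast_to, already_cast=True)


def ft_to_m(feet: float) -> float:
    """Hypothetical unit conversion that expects a number, not text."""
    return feet * 0.3048


print(call_with_auto_cast(ft_to_m, "10", float))  # "10" is cast to 10.0, then converted
```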
04:24 AM Revision 5711: sql.py: parse_exception(): function MissingCastException: Fixed bug where determining whether the exception is a MissingCastException rather than a DoesNotExistException needs to check whether the function exists rather than whether it's the same in the exception message as in the query that was run. The exception message will of course copy the function name verbatim from the query, so there is no information in the exception message itself to indicate whether the DoesNotExistException was caused by a missing cast or by a nonexistent function.
Aaron Marcuse-Kubitza
04:19 AM Revision 5710: sql.py: parse_exception(): function MissingCastException: Documented that the regexp match to extract the function name also checks that a function signature with param types was matched, indicating a function call rather than cast to regproc. This check will also help avoid infinite recursion when function MissingCastException parsing calls database structure introspection functions.
Aaron Marcuse-Kubitza
04:15 AM Revision 5709: sql.py: parse_exception(): function MissingCastException: Don't match quotes around the function name because this particular exception (incorrect param type) does not include them. Casts to regproc, which also produce a DoesNotExistException, include the quotes but do not indicate a MissingCastException.
Aaron Marcuse-Kubitza
04:12 AM Revision 5708: sql.py: parse_exception(): function MissingCastException: Fixed bug where the 1st param's type in the exception's function signature is not actually the type the argument needs to have, because this is just the argument's current type
Aaron Marcuse-Kubitza
04:04 AM Revision 5707: sql.py: parse_exception(): typed_name_re: Also match identifiers without quotes, such as functions in "No function matches the given name and argument types" errors. This fixes a bug where DoesNotExistExceptions could not be parsed as MissingCastExceptions when applicable because the DoesNotExistException pattern would not even match.
Aaron Marcuse-Kubitza
03:57 AM Revision 5706: inputs/.NCBI/: Renamed higher_taxa to nodes because it currently doesn't just contain the higher taxa
Aaron Marcuse-Kubitza
01:24 AM Revision 5705: schemas/vegbien.sql: taxonlabel: taxonlabel_2_set_canon_label_id(): Only run if matched_label_id has actually changed, to avoid infinite recursion when updating canon_label_id on labels that resolve to this label when there are cycles in the data
Aaron Marcuse-Kubitza
01:21 AM Revision 5704: inputs/.NCBI/: Renamed higher_taxa to nodes because it currently doesn't just contain the higher taxa
Aaron Marcuse-Kubitza
12:57 AM Revision 5703: inputs/.NCBI/: Renamed higher_taxa to nodes because it currently doesn't just contain the higher taxa
Aaron Marcuse-Kubitza
12:49 AM Revision 5702: inputs/.NCBI/: Renamed names, nodes to *.src so they wouldn't get an automatic row_num column and can be used in higher_taxa's join
Aaron Marcuse-Kubitza
12:38 AM Revision 5701: inputs/NCU-NCSC/Specimen/+header.csv: Fixed bug where needed ! at beginning to indicate a header override file, which prevents the following row from being treated as data
Aaron Marcuse-Kubitza
12:36 AM Revision 5700: units.py: MissingUnitsException: Fixed bug where quantity is a Quantity object, not a string, and thus needs to be converted to a string using strings.ustr()
Aaron Marcuse-Kubitza
12:25 AM Revision 5699: inputs/FIA/Organism/test.xml.ref: Accepted new test output now that FIA table is sorted in the order of the original CSV after staging table reinstallation
Aaron Marcuse-Kubitza
12:24 AM Revision 5698: inputs/VegBank/taxonobservation_/create.sql: Removed dropping of row_num column, which is no longer added on non-CSV tables
Aaron Marcuse-Kubitza
12:22 AM Revision 5697: input.Makefile: Staging tables installation: %/install: Moved "table-scope src table's row_num col" comment outside of define block so it wouldn't be echoed to stdout even when the table is not a src table
Aaron Marcuse-Kubitza
12:17 AM Revision 5696: Added inputs/NCU-NCSC/Specimen/+header.csv header override to remove empty, unnamed column at end
Aaron Marcuse-Kubitza
12:05 AM Revision 5695: inputs/*/*/header.csv: Regenerated for new staging tables format (which now includes a row_num column on every CSV table), as part of reinstalling staging tables
Aaron Marcuse-Kubitza

10/22/2012

11:59 PM Revision 5694: inputs/VegBank/vegbank.~.clean_up.sql: Fixed bug where DROP VIEW statements needed IF EXISTS because CASCADEs on previous DROP VIEWs may have already dropped the view in question
Aaron Marcuse-Kubitza
11:57 PM Revision 5693: input.Makefile: Staging tables installation: %/install: Fixed bug where a .src table's row_num column needed to have the table name prefixed (making it globally unique) to allow joining the table with other tables
Aaron Marcuse-Kubitza
11:31 PM Revision 5692: input.Makefile: Staging tables installation: sql/install: Fixed bug where $(logInstall) needed to be called with arguments, so that either > or >> would be used before the install log's filename
Aaron Marcuse-Kubitza
08:22 PM Revision 5691: tnrs.py: submission_request_template: Use just Tropicos as the name source, as Brad says "GCC is for only one family (Asteraceae)" and USDA's "taxonomy is of lower quality and sometimes conflicts with Tropicos"
Aaron Marcuse-Kubitza

10/19/2012

06:20 PM Revision 5690: sql.py: parse_exception(): function MissingCastException: Support functions with named parameters
Aaron Marcuse-Kubitza
06:18 PM Revision 5689: sql.py: parse_exception(): function MissingCastException: Support function names enclosed in quotes on the context line
Aaron Marcuse-Kubitza
06:15 PM Revision 5688: mappings/VegCore-VegBIEN.csv: taxonName: Place it in taxonomicname instead of taxonepithet for lower taxa, because the only datasource that currently provides this field (NCBI) actually provides the full taxonomicname instead of the epithet at the current rank for lower taxa. (taxonomicname is not applicable to higher taxa because their names are not guaranteed to be globally unique.) taxonName may need to be renamed and/or redefined to account for this ambiguity in NCBI's usage.
Aaron Marcuse-Kubitza
06:14 PM Revision 5687: mappings/VegCore-VegBIEN.csv: Do not include the taxonName in the concatenated taxonomicname because it is NOT globally unique. The same name may be used at different taxonomic ranks and mean different things, and lower taxa may have the name appear in multiple genera or species, meaning different things.
Aaron Marcuse-Kubitza
06:04 PM Revision 5686: mappings/VegCore-VegBIEN.csv: Do not include the taxonName in the concatenated taxonomicname because it is NOT globally unique. The same name may be used at different taxonomic ranks and mean different things, and lower taxa may have the name appear in multiple genera or species, meaning different things.
Aaron Marcuse-Kubitza
05:57 PM Revision 5685: schemas/vegbien.sql: Added _is_higher_taxon()
Aaron Marcuse-Kubitza
05:52 PM Revision 5684: README.TXT: Documentation: To import and scrub just the test taxonomic names: Added `make inputs/.TNRS/cleanup` after `make backups/TNRS.backup/restore` because the PostgreSQL collation may differ between vegbiendev's and the user's DB
Aaron Marcuse-Kubitza
05:50 PM Revision 5683: sql.py: parse_exception(): DoesNotExistException: If item not found was a function and not found only because of a missing cast, raise MissingCastException instead. This should allow automatic casts to be added on function parameters as well as table columns.
Aaron Marcuse-Kubitza
05:28 PM Revision 5682: schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to join to taxonverbatim on taxonverbatim_id (the pkey) instead of taxonlabel_id, which used to be the pkey but is now an fkey
Aaron Marcuse-Kubitza
05:22 PM Revision 5681: inputs/test_taxonomic_names/test_scrub: Remove any previous version of public.test_taxonomic_names before renaming public to it
Aaron Marcuse-Kubitza
05:19 PM Revision 5680: inputs/test_taxonomic_names/test_scrub: Fixed bug where public.sql export did not include the "CREATE SCHEMA public" statement, because pg_dump doesn't add it to backups, by using new schemas/rename/% make target to first rename the public schema and then exporting it
Aaron Marcuse-Kubitza
05:12 PM Revision 5679: root Makefile: VegBIEN DB: Schemas: schemas/rotate: Use new schemas/rename/%
Aaron Marcuse-Kubitza
05:12 PM Revision 5678: root Makefile: VegBIEN DB: Schemas: Added schemas/rename/% to rename the public schema
Aaron Marcuse-Kubitza
04:54 PM Revision 5677: mappings/VegCore-VegBIEN.csv: Removed filter preventing taxonomicStatus from being placed in taxonlabel if a morphospecies was provided, because the morphospecies actually never goes in the *matched* taxonlabel, only the *verbatim* taxonlabel
Aaron Marcuse-Kubitza
04:50 PM Revision 5676: mappings/VegCore-VegBIEN.csv: morphospecies: Also place it in the verbatim (input name's) taxonlabel. Note that it does not go in the matched name's taxonlabel, because that contains only fields from the matched name. The verbatim taxonlabel is thus a synonym of the matched taxonlabel where there is no morphospecies, or a child of it if there is a morphospecies.
Aaron Marcuse-Kubitza
04:36 PM Revision 5675: mappings/VegCore-VegBIEN.csv: Do not place taxonomicStatus in taxonlabel if a morphospecies was provided, to prevent it from being incorrectly marked as accepted
Aaron Marcuse-Kubitza
04:25 PM Revision 5674: mappings/VegCore-VegBIEN.csv: morphospecies -> taxonverbatim.morphospecies: Fixed bug where needed suffix with _if statement then clause
Aaron Marcuse-Kubitza
04:23 PM Revision 5673: inputs/test_taxonomic_names/_scrub/public.sql, TNRS.sql: Regenerated with schema changes
Aaron Marcuse-Kubitza
03:45 PM Revision 5672: pg_dump_vegbien: Added opts env var to allow specifying options to a Makefile command, which does not take positional arguments
Aaron Marcuse-Kubitza
03:37 PM Revision 5671: README.TXT: Schema changes: files to update with any renamings: Removed tnrs_db because that is now abstracted from the schema through the tnrs_input_name view. Note that PostgreSQL will automatically update tnrs_input_name with any table or column renames, which is the significant advantage of using a view rather than a hardcoded query.
Aaron Marcuse-Kubitza
03:35 PM Revision 5670: schemas/vegbien.sql: tnrs_input_name: Use DISTINCT instead of DISTINCT ON because there is only one column
Aaron Marcuse-Kubitza
03:34 PM Revision 5669: tnrs_db: Use new tnrs_input_name view to avoid hardcoding changing schema information
Aaron Marcuse-Kubitza
03:25 PM Revision 5668: inputs/test_taxonomic_names/test_scrub, README.TXT: Documented that `make schemas/public/reinstall` must come after TNRS restore to recreate the tnrs_input_name view, which has a dependency on the TNRS schema
Aaron Marcuse-Kubitza
03:23 PM Revision 5667: schemas/vegbien.sql: Added tnrs_input_name view for use by tnrs_db
Aaron Marcuse-Kubitza
12:53 PM Revision 5666: schemas/vegbien.sql: taxonlabel, taxonverbatim: Updated comments for new taxonlabel/taxonverbatim split
Aaron Marcuse-Kubitza
12:42 PM Revision 5665: schemas/vegbien.sql: taxonlabel_update_ancestors(): Use aliased types (http://www.postgresql.org/docs/8.3/static/plpgsql-declarations.html#PLPGSQL-DECLARATION-TYPE) where possible
Aaron Marcuse-Kubitza
12:37 PM Revision 5664: schemas/vegbien.sql: taxonlabel_update_ancestors(): Adding new parent's ancestors: Change unique_violations to warnings so they don't abort the import. unique_violations should never happen unless there are cycles of two or more nodes, but they seem to be happening nevertheless, so this will provide a workaround to that problem.
Aaron Marcuse-Kubitza
12:18 PM Revision 5663: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza

10/18/2012

04:58 PM Revision 5662: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
04:55 PM Revision 5661: tnrs_db: Updated with schema changes
Aaron Marcuse-Kubitza
04:54 PM Revision 5660: schemas/vegbien.sql: taxonverbatim: Removed subclass relationship to taxonlabel in order to allow multiple taxonverbatims to point to the same taxonlabel. This involves adding a taxonverbatim_id serial column and pointing all fkeys to taxonverbatim to that column.
Aaron Marcuse-Kubitza
04:43 PM Revision 5659: schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to join on taxonverbatim before joining on taxonlabel, now that taxondetermination is linked directly to taxonverbatim. Interestingly, PostgreSQL did not flag this error when the schema was changed, but only when the schema was reloaded from the DDL.
Aaron Marcuse-Kubitza
04:30 PM Revision 5658: schemas/vegbien.ERD.mwb: Moved taxonlabel to the right of taxonverbatim to make room for taxonverbatim to expand
Aaron Marcuse-Kubitza
04:21 PM Revision 5657: schemas/vegbien.sql: Link taxondetermination to taxonverbatim (which is a subclass of taxonlabel) instead of directly to taxonlabel. This will enable later having multiple taxonverbatims for one taxonlabel.
Aaron Marcuse-Kubitza
04:04 PM Revision 5656: schemas/vegbien.sql: taxonlabel: Renamed identifyingtaxonomicname to taxonomicname because the taxonomicname provided by the datasource is now in taxonverbatim, so there is no name collision. Note that both of these fields store the same type of information, but taxonlabel's is autogenerated while taxonverbatim's is verbatim (and is only set if provided by the datasource).
Aaron Marcuse-Kubitza
03:57 PM Revision 5655: schemas/vegbien.sql: taxonlabel: Moved non-scoping fields to new taxonverbatim subclass table, which contains the component parts of the taxonlabel
Aaron Marcuse-Kubitza
03:06 PM Revision 5654: schemas/vegbien.sql: taxonlabel: Renamed taxonlabel_2_propagate_canon_label_id() to taxonlabel_2_set_canon_label_id() for clarity
Aaron Marcuse-Kubitza
03:04 PM Revision 5653: schemas/vegbien.sql: taxonlabel_2_propagate_canon_label_id(): If no matched taxonlabel, make self-reference. This fixes a bug in analytical_db_view where rows without a canon_label_id were excluded because they did not have a corresponding canonical taxonlabel.
Aaron Marcuse-Kubitza
02:53 PM Revision 5652: schemas/vegbien.sql: taxonlabel_unique unique index: Removed binomial, author, taxonomicname, and morphospecies because these are now part of the identifyingtaxonomicname, which is also in the unique index
Aaron Marcuse-Kubitza
02:44 PM Revision 5651: schemas/vegbien.sql: taxonlabel: Require either an identifyingtaxonomicname or a taxonepithet. The NCBI inserted row count decreases by one because this prunes off a taxonlabel created for a parent node which was not contained in the first two rows (remember that NCBI taxa are not in dependency order, so parents are often imported after children).
Aaron Marcuse-Kubitza
02:41 PM Revision 5650: mappings/VegCore-VegBIEN.csv: Also generate the identifyingtaxonomicname for the original* taxondetermination's taxonlabel
Aaron Marcuse-Kubitza
02:31 PM Revision 5649: schemas/vegbien.sql: taxonlabel: Renamed taxonomicnamewithauthor to taxonomicname because it is equivalent to Darwin Core's scientificName
Aaron Marcuse-Kubitza
02:25 PM Revision 5648: mappings/VegCore-VegBIEN.csv: Also include morphospecies in the identifyingtaxonomicname, except for the matched TNRS taxonlabel, which should not contain morphospecies information
Aaron Marcuse-Kubitza
02:14 PM Revision 5647: mappings/VegCore-VegBIEN.csv: Mapped acceptedScientificName
Aaron Marcuse-Kubitza
01:51 PM Revision 5646: mappings/VegCore-VegBIEN.csv: Also create the identifyingtaxonomicname on the verbatim taxonlabel supplied by the datasource, in addition to on the TNRS input taxonlabel that the verbatim taxonlabel is matched up with
Aaron Marcuse-Kubitza
01:46 PM Revision 5645: mappings/VegCore-VegBIEN.csv: Expanded brace expressions for putting together the identifyingtaxonomicname
Aaron Marcuse-Kubitza
01:21 PM Revision 5644: mappings/VegCore-VegBIEN.csv: Always generate the concatenated identifyingtaxonomicname, even for higher taxa, to ensure that this field is always populated. Note that this will cause names of higher taxa to be scrubbed by TNRS, but this is usually not a problem because such names either have no match or not a close enough match based on the name only. Naming conventions generally cause names at different ranks to be different, so that collisions with lower ranks should not be a problem.
Aaron Marcuse-Kubitza
01:05 PM Revision 5643: tnrs_db: Fixed bug where needed to remove internal identifyingtaxonomicname duplicates as well as duplicates with existing Name_submitted values, to avoid violating the TNRS.tnrs pkey constraint when the scrubbed names are later inserted. Note that the taxonlabel_0_unique_identifying_name unique index is not sufficient to prevent internal duplicates, because it includes the creator_id (and thus allows multiple instances of the same name defined by different creators).
Aaron Marcuse-Kubitza
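Revision 5643 removes both kinds of duplicates before submitting names to TNRS: names already present in TNRS.tnrs and repeats within the batch itself. tnrs_db does this against the database, so the following is only a Python sketch of the filtering logic, and the function name is hypothetical.

```python
from typing import Iterable, List


def names_to_submit(candidates: Iterable[str], already_submitted: Iterable[str]) -> List[str]:
    """Hypothetical sketch: drop names already submitted and internal duplicates,
    keeping the first occurrence of each name in its original order."""
    seen = set(already_submitted)  # existing Name_submitted values
    result = []
    for name in candidates:
        if name in seen:
            continue  # would violate the TNRS.tnrs pkey on insert
        seen.add(name)  # also catches later repeats within this batch
        result.append(name)
    return result
```

For example, `names_to_submit(["Quercus alba", "Pinus", "Quercus alba", "Acer"], ["Acer"])` returns `["Quercus alba", "Pinus"]`.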
01:01 PM Revision 5642: sql.py: mk_select(): Don't add table0 to order_by with no table, because this could cause it not to match a corresponding DISTINCT ON column with no explicit table. PostgreSQL apparently does not treat a column with no explicit table and a column with the applicable table as identical for purposes of ORDER BY/DISTINCT ON checking, even when they refer to the same physical column.
Aaron Marcuse-Kubitza
12:53 PM Revision 5641: sql.py: mk_select(): order_by defaults to first distinct_on column when distinct_on provided
Aaron Marcuse-Kubitza
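PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, which is why revision 5641 defaults order_by to the first distinct_on column. Per 5642, that column is left unqualified so it is textually identical to the DISTINCT ON column. The toy query builder below is a hypothetical sketch. Its signature is not the real sql.mk_select().

```python
from typing import Optional, Sequence


def mk_select(table: str, fields: Sequence[str] = ("*",),
              distinct_on: Sequence[str] = (), order_by: Optional[str] = None) -> str:
    """Hypothetical sketch of a SELECT builder with a DISTINCT ON-aware default."""
    parts = ["SELECT"]
    if distinct_on:
        parts.append("DISTINCT ON (" + ", ".join(distinct_on) + ")")
        if order_by is None:
            # Reuse the column exactly as written (no added table prefix),
            # so it matches the DISTINCT ON expression.
            order_by = distinct_on[0]
    parts.append(", ".join(fields))
    parts.append("FROM " + table)
    if order_by is not None:
        parts.append("ORDER BY " + order_by)
    return " ".join(parts)


print(mk_select("taxonlabel", ["taxonomicname"], distinct_on=["taxonomicname"]))
# SELECT DISTINCT ON (taxonomicname) taxonomicname FROM taxonlabel ORDER BY taxonomicname
```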
12:36 PM Revision 5640: tnrs_db: Updated with schema changes
Aaron Marcuse-Kubitza
12:33 PM Revision 5639: schemas/vegbien.sql: taxonlabel: Renamed taxonomicnamewithauthor to taxonomicname because it is equivalent to Darwin Core's scientificName
Aaron Marcuse-Kubitza
12:25 PM Revision 5638: schemas/vegbien.sql: taxonlabel: Renamed taxonomicname to binomial because it excludes the author
Aaron Marcuse-Kubitza
12:15 PM Revision 5637: schemas/vegbien.sql: taxonlabel.taxonomicname, taxonomicnamewithauthor comments: Corrected to show that taxonomicnamewithauthor is actually scientificName, while taxonomicname does not directly correspond to a DwC term (but would be the binomial)
Aaron Marcuse-Kubitza
12:13 PM Revision 5636: schemas/vegbien.sql: taxonlabel.taxonomicnamewithauthor comment: Removed no longer applicable 'Equivalent to "Name sec. x"'. The "sec" is now stored in taxonconcept.concept_reference_id.
Aaron Marcuse-Kubitza
12:10 PM Revision 5635: mappings/Makefile: .VegCore.csv.last_cleanup: Remove duplicate entries using uniq
Aaron Marcuse-Kubitza
12:09 PM Revision 5634: mappings/VegCore.csv: Removed duplicate entries using uniq
Aaron Marcuse-Kubitza
12:06 PM Revision 5633: mappings/VegCore.csv: Removed *scientificNameWithAuthorship, which are now represented by *scientificName
Aaron Marcuse-Kubitza
12:04 PM Revision 5632: mappings: Renamed *scientificNameWithAuthorship to *scientificName because scientificNameWithAuthorship is actually a synonym of DwC's scientificName ("The full scientific name, with authorship and date information if known" <http://rs.tdwg.org/dwc/terms/#scientificName>)
Aaron Marcuse-Kubitza
11:57 AM Revision 5631: mappings: Renamed *scientificName to *binomial because DwC defines the scientificName as "The full scientific name, with authorship and date information if known", but many datasources do not include the author in their scientific name, and the fields scientificName is mapped to in VegBIEN assume it does not include the author
Aaron Marcuse-Kubitza
11:44 AM Revision 5630: mappings/VegCore.csv: Added verbatimBinomial
Aaron Marcuse-Kubitza
11:41 AM Revision 5629: mappings/VegCore.csv: Redefined *binomial to "Taxonomic name without author", rather than genus+species
Aaron Marcuse-Kubitza
11:32 AM Revision 5628: schemas/vegbien.sql: taxonconcept.taxonlabel_id: Changed type from serial to integer because this is a subclass, and therefore each taxonconcept must first have a corresponding entry in taxonlabel
Aaron Marcuse-Kubitza
11:29 AM Revision 5627: schemas/vegbien.sql: Moved taxonlabel.concept_reference_id to new taxonconcept table, which is a subclass of taxonlabel that adds information about who the taxon concept is according to
Aaron Marcuse-Kubitza
11:13 AM Revision 5626: taxonlabel: Renamed accepted_label_id to canon_label_id to allow any taxonlabel to be the canonical taxonlabel for this taxonlabel, whether or not its status is accepted
Aaron Marcuse-Kubitza
11:01 AM Revision 5625: schemas/filter_ERD.csv: Remove the methodtaxonclass.submethod_id fkey to taxonlabel, to make room in the ERD for additional taxon tables
Aaron Marcuse-Kubitza
10:52 AM Revision 5624: schemas/vegbien.sql: establishmentmeans_dwc: Corrected source comment
Aaron Marcuse-Kubitza
10:51 AM Revision 5623: schemas/vegbien.sql: taxonomic_status enum: Added source comment
Aaron Marcuse-Kubitza
10:49 AM Revision 5622: schemas/vegbien.sql: taxonlabel_relationship: Added relationship, with relationship enum
Aaron Marcuse-Kubitza
10:21 AM Revision 5621: mappings/VegCore-VegBIEN.csv: Mapped taxonomicStatus
Aaron Marcuse-Kubitza
10:20 AM Revision 5620: inputs/.TNRS/tnrs/test.xml.ref: Updated inserted row count
Aaron Marcuse-Kubitza
10:15 AM Revision 5619: mappings/VegCore.csv: Removed duplicate entry for taxonomicStatus, which is also a DwC term
Aaron Marcuse-Kubitza
10:14 AM Revision 5618: mappings/VegCore.csv: Added taxonomicStatus
Aaron Marcuse-Kubitza
10:13 AM Revision 5617: schemas/vegbien.sql: taxonlabel: Added taxonstatus, with taxonomic_status enum
Aaron Marcuse-Kubitza
09:40 AM Revision 5616: schemas/vegbien.sql: taxonlabel.creator_id comment: Removed no longer accurate comment that this is the "according to" and "Name sec. x", which is now stored in concept_reference_id
Aaron Marcuse-Kubitza
09:37 AM Revision 5615: schemas/vegbien.sql: taxonlabel: Added concept_reference_id, which is the entity that defined the taxon concept (who the taxon label is according to)
Aaron Marcuse-Kubitza
09:22 AM Revision 5614: schemas/vegbien.ERD.mwb: Moved taxonlabel_relationship to the right of taxonlabel to provide room for taxonlabel to grow
Aaron Marcuse-Kubitza

10/17/2012

04:27 PM Revision 5613: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
04:25 PM Revision 5612: mappings/VegCore-VegBIEN.csv: Remapped morphospecies to new taxonlabel.morphospecies per today's conference call
Aaron Marcuse-Kubitza
04:23 PM Revision 5611: schemas/vegbien.sql: taxonlabel: Added separate morphospecies field per today's conference call, where it was decided it could not go in taxonepithet (the lowest-rank component of the name)
Aaron Marcuse-Kubitza
04:17 PM Revision 5610: schemas/vegbien.sql: Deleted taxonusage table per today's conference call, where it was decided that it was not needed
Aaron Marcuse-Kubitza
04:14 PM Revision 5609: schemas/vegbien.sql: Renamed taxonlabel_ancestor to taxonlabel_relationship per today's conference call, where it was decided that it would eventually contain asserted relationships (such as synonym and parent) in addition to autopopulated ancestor relationships
Aaron Marcuse-Kubitza
04:12 PM Revision 5608: schemas/vegbien.sql: Renamed taxonconcept to taxonlabel per today's conference call, where it was decided that taxonconcept contained too many unrelated fields to be purely a taxon concept
Aaron Marcuse-Kubitza
04:01 PM Revision 5607: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza
04:01 PM Revision 5606: inputs/test_taxonomic_names/_scrub/public.sql, TNRS.sql: Regenerated with schema changes
Aaron Marcuse-Kubitza
01:47 PM Revision 5605: schemas/vegbien.ERD.mwb: Fixed lines
Aaron Marcuse-Kubitza
01:45 PM Revision 5604: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
01:44 PM Revision 5603: schemas/vegbien.sql: taxonconcept_ancestor: Renamed taxonconcept_id to descendant_id to emphasize the direction of the relationship between the two taxonconcepts
Aaron Marcuse-Kubitza
01:35 PM Revision 5602: schemas/vegbien.ERD.mwb: Added taxonconcept_ancestor to the diagram since it is now a core table for storing taxonomic information
Aaron Marcuse-Kubitza
01:15 PM Revision 5601: mappings/VegCore-VegBIEN.csv: Mapped accordingTo to taxonconcept.creator_id, and have it take the place of identifiedBy when both are present
Aaron Marcuse-Kubitza
01:12 PM Revision 5600: mappings/VegCore-VegBIEN.csv: Remapped people's names split apart into name components in party to new party.fullname, which does not require splitting or make assumptions about the number of people who may be listed in a particular name field and which components of their name(s) are present
Aaron Marcuse-Kubitza
01:02 PM Revision 5599: schemas/vegbien.sql: party: Added fullname
Aaron Marcuse-Kubitza
12:55 PM Revision 5598: mappings/VegCore.csv: Added accordingTo
Aaron Marcuse-Kubitza
12:47 PM Revision 5597: inputs/.TNRS/tnrs/map.csv: Mapped Name_matched_url to scientificNameID, since the URL uniquely identifies the matched taxonconcept
Aaron Marcuse-Kubitza
12:43 PM Revision 5596: schemas/vegbien.sql: taxonconcept: Renamed taxonname to taxonepithet for clarity and to be consistent with TCS's use of "epithet" to denote what the taxonname was intended to be (http://www.tdwg.org/standards/117/download/#/UserGuidev_1.3.pdf)
Aaron Marcuse-Kubitza
12:18 PM Revision 5595: schemas/vegbien.sql: taxonconcept.creator_id: Documented that this is the concept reference for a taxon concept with an "according to", or the identifier's name for a nominal concept, and is equivalent to "Name sec. x"
Aaron Marcuse-Kubitza
11:50 AM Revision 5594: sql_io.py: import_csv(): Add a row_num column at the beginning of the table, which is autopopulated by csvs.RowNumFilter (it cannot be autopopulated by the serial datatype, because this does not support COPY FROM with a NULL-equivalent value in the serial field). This fixes a bug in csv2db where rows would not stay in inserted order upon querying the table, and would be returned in a different order each query, which prevented LIMIT/OFFSET based subsetting from returning consistent, nonoverlapping results. This occurs because PostgreSQL unfortunately does not return rows in inserted order (or any stable order: "If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.
Aaron Marcuse-Kubitza
11:43 AM Revision 5593: csvs.py: Added RowNumFilter, which adds a row # column at the beginning of each row
Aaron Marcuse-Kubitza
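Here is a minimal sketch of the row-numbering idea in r5593/r5594, assuming rows come from Python's standard `csv.reader`. `RowNumFilterSketch` is a hypothetical name, not the project's csvs.RowNumFilter. The number is assigned while reading, so the staging table can later be read back with `ORDER BY row_num` instead of relying on PostgreSQL's unspecified row order.

```python
import csv
import io


class RowNumFilterSketch:
    """Wrap a csv reader, prepending a 1-based row number to each data row.

    Hypothetical simplification of csvs.RowNumFilter; the header row is
    assumed to have been read off the reader before wrapping.
    """

    def __init__(self, reader):
        self.reader = reader
        self.row_num = 0

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self.reader)  # StopIteration at EOF ends iteration
        self.row_num += 1
        return [str(self.row_num)] + row


reader = csv.reader(io.StringIO("genus,species\nQuercus,alba\nPinus,taeda\n"))
header = ["row_num"] + next(reader)
rows = list(RowNumFilterSketch(reader))
# header == ['row_num', 'genus', 'species']
# rows == [['1', 'Quercus', 'alba'], ['2', 'Pinus', 'taeda']]
```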
11:42 AM Revision 5592: streams.py: LineCountStream, LineCountInputStream: Fixed bug where line_num was 1 too high because it started at 1 *and* was incremented *before* each line was returned. Line numbering now properly starts at 1: the initial line_num value is 0, which is incremented to 1 upon encountering the first line. This off-by-one behavior may have been needed for code that associates an error message with a line #, but such code should add 1 to the line_num to get the line # of the error *if* the error prevents the next line from being read by the LineCount*Stream.
Aaron Marcuse-Kubitza
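The corrected counting convention from r5592 can be sketched as follows. The class name and the readline()-only interface are assumptions for illustration. Not counting the empty EOF read is also an assumption. line_num starts at 0 and becomes 1 as soon as the first line is handed out, so after each read it equals the number of the line just returned.

```python
import io


class LineCountInputStreamSketch:
    """Count lines as they are read (hypothetical simplification of
    streams.LineCountInputStream as described in r5592)."""

    def __init__(self, stream):
        self.stream = stream
        self.line_num = 0  # nothing read yet

    def readline(self):
        line = self.stream.readline()
        if line != '':  # '' means EOF for file-like streams; not counted
            self.line_num += 1  # incremented before the line is returned
        return line


s = LineCountInputStreamSketch(io.StringIO("first\nsecond\n"))
s.readline()
print(s.line_num)  # 1: the line just returned is line 1
```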
11:04 AM Revision 5591: sql_io.py: import_csv(): Take a reader and header rather than a stream to allow callers to pass in a wrapped CSV reader for filtering, etc.
Aaron Marcuse-Kubitza
11:00 AM Revision 5590: sql_io.py: append_csv(): Take a reader and header rather than a stream_info and stream to allow callers to use the simpler csvs.reader_and_header() function. This also allows callers to pass in a wrapped CSV reader for filtering, etc.
Aaron Marcuse-Kubitza
10:44 AM Revision 5589: csv2db, tnrs_db: Removed ProgressInputStream wrapper around input stream, which is no longer needed (and causes overlapping output) now that sql_io.append_csv() prints # rows read
Aaron Marcuse-Kubitza
10:42 AM Revision 5588: sql_io.py: append_csv(): Wrap input stream in a ProgressInputStream that reports rows (rather than lines) read
Aaron Marcuse-Kubitza
10:40 AM Revision 5587: csvs.py: InputRewriter: Use new StreamFilter to translate StopIteration EOF to ''
Aaron Marcuse-Kubitza
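The EOF translation described in r5585 to r5587 can be illustrated with a hypothetical adapter. `StreamFilterSketch` is not the project's csvs.StreamFilter. It turns an iterator's StopIteration into the empty string that file-like readline() callers expect at end of input. The no-op close() mirrors the one r5574 added to csvs.Filter.

```python
class StreamFilterSketch:
    """Expose an iterator of lines through a readline()-style interface,
    translating StopIteration at EOF into '' (hypothetical sketch)."""

    def __init__(self, lines):
        self.lines = iter(lines)

    def readline(self):
        try:
            return next(self.lines)
        except StopIteration:
            return ''  # file-like EOF convention

    def close(self):
        pass  # nothing to release; lets the object stand in for a stream


s = StreamFilterSketch(["a,b\n"])
print(repr(s.readline()))  # 'a,b\n'
print(repr(s.readline()))  # ''
```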
10:36 AM Revision 5586: csvs.py: Added StreamFilter
Aaron Marcuse-Kubitza
10:36 AM Revision 5585: csvs.py: InputRewriter: Also support stream inputs which report EOF as '' instead of StopIteration
Aaron Marcuse-Kubitza
09:55 AM Revision 5584: sql_io.py: append_csv(): Removed no longer used INSERT mode, since all callers now use the default COPY FROM
Aaron Marcuse-Kubitza
09:53 AM Revision 5583: sql_io.py: import_csv(): Removed no longer needed manual setting of use_copy_from, which defaults to True in append_csv()
Aaron Marcuse-Kubitza
09:50 AM Revision 5582: csv2db: Removed no longer needed manual setting of use_copy_from, which defaults to True in sql_io.import_csv()
Aaron Marcuse-Kubitza
09:49 AM Revision 5581: csv2db: Removed no longer needed separate handling of sql.DatabaseErrors, because all recoverable errors caused by COPY FROM (EncodingException and ragged rows) are now handled or avoided
Aaron Marcuse-Kubitza
09:46 AM Revision 5580: csv2db: Handle EncodingException separately by changing the connection encoding to LATIN1 and retrying
Aaron Marcuse-Kubitza
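The retry strategy in r5580 can be sketched with hypothetical callables: `do_import` stands in for the COPY FROM step and `set_encoding` for DbConn.set_encoding(). With psycopg2, such a setter could plausibly wrap `connection.set_client_encoding()`. In PostgreSQL an error aborts the current transaction, so a real retry would also need a rollback first, which this sketch omits. LATIN1 assigns a character to every non-zero byte, so the retry avoids invalid-byte-sequence errors. The cost is that text that was really in another encoding may be mis-decoded.

```python
class EncodingErrorSketch(Exception):
    """Hypothetical stand-in for sql.EncodingException."""


def import_with_latin1_fallback(do_import, set_encoding):
    """Run do_import(); if it fails with an encoding error, switch the
    connection encoding to LATIN1 and retry once.

    Both callables are hypothetical and supplied by the caller.
    """
    try:
        return do_import()
    except EncodingErrorSketch:
        set_encoding('LATIN1')
        return do_import()  # a second failure propagates to the caller


state = {'encoding': 'UTF8', 'attempts': 0}

def fake_import():
    state['attempts'] += 1
    if state['encoding'] != 'LATIN1':
        raise EncodingErrorSketch('invalid byte sequence for encoding "UTF8"')
    return 'imported'

def fake_set_encoding(name):
    state['encoding'] = name

result = import_with_latin1_fallback(fake_import, fake_set_encoding)
# result == 'imported', state['attempts'] == 2
```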
09:45 AM Revision 5579: sql.py: DbConn: Added set_encoding()
Aaron Marcuse-Kubitza
09:32 AM Revision 5578: sql_io.py: append_csv(): Parse any exceptions generated by the COPY FROM using new sql.parse_exception()
Aaron Marcuse-Kubitza
09:28 AM Revision 5577: sql.py: run_query(): Factored exception parsing out into new parse_exception()
Aaron Marcuse-Kubitza
09:22 AM Revision 5576: sql.py: Added EncodingException and parse it in run_query()
Aaron Marcuse-Kubitza
09:14 AM Revision 5575: sql.py: Removed no longer used NameException
Aaron Marcuse-Kubitza
09:14 AM Revision 5574: csvs.py: Filter: Added empty close() method to support using it as a stream (such as with streams.ProgressInputStream)
Aaron Marcuse-Kubitza
09:01 AM Revision 5573: sql_io.py: append_csv(): Don't disable COPY FROM for TSVs, which are now supported using csvs.InputRewriter
Aaron Marcuse-Kubitza
08:59 AM Revision 5572: sql_io.py: append_csv(): COPY FROM: Wrap provided stream in standardizing stream to fix ragged rows (with unequal # columns) and nonstandard CSV dialects (such as TSV with \-escaped newlines)
Aaron Marcuse-Kubitza
08:56 AM Revision 5571: csvs.py: Added InputRewriter, which wraps a reader, writing each row back to CSV
Aaron Marcuse-Kubitza
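The rewriting idea in r5571/r5572 can be sketched in Python with the standard csv module. `InputRewriterSketch` is hypothetical, not csvs.InputRewriter. It reads rows in whatever dialect the wrapped reader was configured for (tab-delimited here) and hands each one back as a line of standard comma-separated CSV. The delimiter and quoting are then normalized before the data reaches COPY FROM.

```python
import csv
import io


class InputRewriterSketch:
    """Wrap a csv reader and re-serialize each row as standard CSV through a
    readline()-style interface (hypothetical simplification)."""

    def __init__(self, reader):
        self.reader = reader

    def readline(self):
        try:
            row = next(self.reader)
        except StopIteration:
            return ''  # EOF
        buf = io.StringIO()
        csv.writer(buf, lineterminator='\n').writerow(row)
        return buf.getvalue()


tsv = csv.reader(io.StringIO("a\tb,c\n"), delimiter='\t')
rw = InputRewriterSketch(tsv)
print(repr(rw.readline()))  # 'a,"b,c"\n': the embedded comma is now quoted
```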
08:54 AM Revision 5570: csvs.py: Added ColCtFilter, which gives all rows the same # columns
Aaron Marcuse-Kubitza
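A minimal sketch of the column-count normalization in r5570, assuming rows are Python lists from a csv reader. `ColCtFilterSketch` is a hypothetical name. Padding short rows with '' follows the commit's intent of giving all rows the same # columns. Truncating overlong rows is an assumption made here, and the real filter may handle them differently.

```python
class ColCtFilterSketch:
    """Give every row exactly col_ct columns (hypothetical simplification of
    csvs.ColCtFilter). Short rows are padded with ''; truncating long rows is
    an assumption made here for illustration."""

    def __init__(self, reader, col_ct):
        self.reader = reader
        self.col_ct = col_ct

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self.reader)[:self.col_ct]
        return row + [''] * (self.col_ct - len(row))


ragged = iter([['a'], ['a', 'b'], ['a', 'b', 'c']])
print(list(ColCtFilterSketch(ragged, 2)))
# [['a', ''], ['a', 'b'], ['a', 'b']]
```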
07:25 AM Revision 5569: sql_io.py: row_num_col_def: Changed type to integer so the row_num can be populated directly by the insert process
Aaron Marcuse-Kubitza
07:19 AM Revision 5568: sql_io.py: Added row_num_col_def for use by import_csv(). The row_num column will be necessary again because PostgreSQL unfortunately does not return rows in inserted order (or any stable order: "If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.
Aaron Marcuse-Kubitza

10/16/2012

10:58 PM Revision 5567: mappings/VegCore.csv: Removed unit-ambiguous height. Use height_m, height_ft instead.
Aaron Marcuse-Kubitza
10:57 PM Revision 5566: mappings/Veg+-VegCore.csv: Added height
Aaron Marcuse-Kubitza
10:57 PM Revision 5565: mappings/Veg+-VegCore.csv: Added height
Aaron Marcuse-Kubitza
10:52 PM Revision 5564: mappings/VegCore-VegBIEN.csv: Removed no longer used height mapping. Use height_m, height_ft instead.
Aaron Marcuse-Kubitza
10:39 PM Revision 5563: README.TXT: Data import: import_all: Added NCBI backbone to note about import_all not immediately returning control to the shell
Aaron Marcuse-Kubitza
10:30 PM Revision 5562: inputs/FIA/Organism/map.csv: Height: Remapped to height_ft, assuming units based on the range of values, the height of the tallest tree, and location inside the U.S.
Aaron Marcuse-Kubitza
10:23 PM Revision 5561: inputs/FIA/Organism/test.xml.ref: Accepted new inserted row count
Aaron Marcuse-Kubitza
10:01 PM Revision 5560: mappings/VegCore-VegBIEN.csv: Mapped height_ft
Aaron Marcuse-Kubitza
09:58 PM Revision 5559: schemas/functions.sql: Added _ft_to_m()
Aaron Marcuse-Kubitza
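The log does not show the body of the SQL function _ft_to_m(). As a Python illustration, the conversion presumably uses the international foot, defined as exactly 0.3048 m; that constant is an assumption about the real function. NULL handling is mirrored by passing None through unchanged.

```python
FT_TO_M = 0.3048  # international foot, exactly 0.3048 m by definition


def ft_to_m(value_ft):
    """Convert feet to meters; None passes through, mirroring SQL NULL.
    Python illustration only; the real _ft_to_m() is a SQL function."""
    if value_ft is None:
        return None
    return value_ft * FT_TO_M


print(round(ft_to_m(100), 4))  # 30.48
```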
09:52 PM Revision 5558: mappings/VegCore.csv: Added height_ft
Aaron Marcuse-Kubitza
09:38 PM Revision 5557: inputs/SALVIAS/stems/map.csv: stem_height_m: Remapped to height_m using units from <http://salvias.net/Documents/salvias_data_dictionary.html#Plot+data>
Aaron Marcuse-Kubitza
09:37 PM Revision 5556: inputs/SALVIAS-CSV/Organism/map.csv: stem_height_m: Re-sourced units to stem_height_m rather than height_m definition in SALVIAS data dictionary
Aaron Marcuse-Kubitza
09:29 PM Revision 5555: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
09:23 PM Revision 5554: schemas/vegbien.sql: taxonconcept: taxonconcept_update_ancestors() trigger: Fixed bug where matched_concept_id needed to be changed to NULL when equal to taxonconcept_id, to avoid including the node itself with its parent's ancestors (which would violate the taxonconcept_ancestor pkey)
Aaron Marcuse-Kubitza
09:19 PM Revision 5553: sql_io.py: put_table(): Ensuring into's out_pkey is different from in_pkey: Prepend "out." instead of out_table to avoid long column names for the output pkey
Aaron Marcuse-Kubitza
09:18 PM Revision 5552: sql_gen.py: concat(): Allow multiple "column" suffixes with "." when matching the existing suffix
Aaron Marcuse-Kubitza
08:47 PM Revision 5551: schemas/vegbien.sql: taxonconcept: taxonconcept_update_ancestors() trigger: Corrected comment explaining why we don't need an ON DELETE trigger to say that this is because the foreign key for *taxonconcept_ancestor.ancestor_id*, not taxonconcept.parent_id, is ON DELETE CASCADE. The auto-deletion will also occur if taxonconcept.parent_id is ON DELETE CASCADE, because taxonconcept_ancestor.taxonconcept_id is ON DELETE CASCADE, but it is not actually necessary to have cascading deletes on taxonconcept.parent_id (and SET NULL may in fact sometimes be more appropriate).
Aaron Marcuse-Kubitza
08:33 PM Revision 5550: schemas/tree_cross-links.sql: Removed header comments added by pgAdmin
Aaron Marcuse-Kubitza
08:30 PM Revision 5549: schemas/tree_cross-links.sql: Updated for new taxonconcept_update_ancestors() trigger
Aaron Marcuse-Kubitza
08:21 PM Revision 5548: schemas/vegbien.sql: taxonconcept: Rewrote taxonconcept() trigger to avoid completely reinserting the taxonconcept_ancestor entries of all descendants every time taxonconcept changes or using trigger recursion to find descendants. Instead, just delete the old parent's ancestors from each descendant and add the new parent's ancestors to it, using taxonconcept_ancestor itself (with the new taxonconcept_ancestor_descendants index) to find all descendants. As an additional optimization, only update taxonconcept_ancestor if the parent_id or matched_concept_id has actually changed. This fixes a bug in the NCBI import where inserting taxonconcepts out of dependency order caused taxonconcept_ancestor entries to be repeatedly regenerated, slowing the import down to a crawl.
Aaron Marcuse-Kubitza
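The incremental update in r5548 can be modeled in memory. Here taxonconcept_ancestor is a set of (descendant_id, ancestor_id) pairs, and None means "no parent". The function names are hypothetical and the sketch ignores matched_concept_id. When a node moves, the affected rows are the node plus every descendant found through the closure table itself. Each affected row loses the old parent's chain and gains the new parent's chain, so nodes can be linked in any order.

```python
def ancestors_of(pairs, node):
    return {a for (d, a) in pairs if d == node}


def descendants_of(pairs, node):
    # In the database this lookup is what the descendants index supports.
    return {d for (d, a) in pairs if a == node}


def reparent(pairs, node, old_parent, new_parent):
    """Update a closure table of (descendant, ancestor) pairs in place when
    node moves from old_parent to new_parent (hypothetical sketch)."""
    if old_parent == new_parent:
        return  # only update when the parent actually changed
    affected = {node} | descendants_of(pairs, node)
    old_chain = set() if old_parent is None else {old_parent} | ancestors_of(pairs, old_parent)
    new_chain = set() if new_parent is None else {new_parent} | ancestors_of(pairs, new_parent)
    for d in affected:
        for a in old_chain:
            pairs.discard((d, a))
        for a in new_chain:
            pairs.add((d, a))


pairs = set()
reparent(pairs, 'C', None, 'B')  # child linked before its parent has a parent
reparent(pairs, 'B', None, 'A')  # C picks up A through the closure table
# pairs == {('C', 'B'), ('B', 'A'), ('C', 'A')}
```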
07:42 PM Revision 5547: schemas/vegbien.sql: taxonconcept: Added taxonconcept_3_parent_id_avoid_self_ref() trigger to avoid recursive references in root taxonconcepts (taxonconcepts with no parent). This will simplify the new taxonconcept_update_ancestors() trigger.
Aaron Marcuse-Kubitza
06:32 PM Revision 5546: schemas/vegbien.sql: taxonconcept_ancestor: Added taxonconcept_ancestor_descendants index to support looking up all the descendants for a taxonconcept. This will be used by the new taxonconcept_update_ancestors() trigger, which will support inserting taxonconcepts out of dependency order (such as for NCBI).
Aaron Marcuse-Kubitza
04:35 PM Revision 5545: schemas/vegbien.sql: *_update_ancestors(): Made trigger deferred, so that it would run after all rows have been inserted in a bulk insert, such as during column-based import. This ensures that ancestors lists are not populated until all parents are inserted, which may occur out of order for datasources (such as NCBI) whose nodes are not in dependency order. (A node that newly acquires a parent will have to update all its descendants, which will then be updated again when its parent acquires its own parent.)
Aaron Marcuse-Kubitza
04:28 PM Revision 5544: lib/PostgreSQL-MySQL.csv: Also filter out constraint triggers in addition to regular triggers
Aaron Marcuse-Kubitza

10/15/2012

05:37 PM Revision 5543: inputs/Madidi/Organism/map.csv: Total height: Remapped to height_m, assuming units based on the range and precision of values
Aaron Marcuse-Kubitza
05:33 PM Revision 5542: inputs/VegBank/stemcount/map.csv: stemheight: Remapped to height_m using units from <http://vegbank.org/vegbank/views/dba_tabledescription_detail.jsp?view=detail&wparam=stemcount&entity=dba_tabledescription&where=where_tablename>
Aaron Marcuse-Kubitza
05:29 PM Revision 5541: inputs/SALVIAS/plotObservations/map.csv, inputs/SALVIAS-CSV/Organism/map.csv: height_m, stem_height_m: Remapped to height_m using units from <http://salvias.net/Documents/salvias_data_dictionary.html#Plot+data>
Aaron Marcuse-Kubitza
05:24 PM Revision 5540: mappings/VegCore-VegBIEN.csv: Mapped height_m
Aaron Marcuse-Kubitza
05:15 PM Revision 5539: mappings/VegCore.csv: Added height_m
Aaron Marcuse-Kubitza
04:20 PM Revision 5538: mappings/VegCore.csv, VegCore-VegBIEN.csv: Removed no longer used and unit-ambiguous organismX, organismY. Use organismX_m, organismY_m instead.
Aaron Marcuse-Kubitza
04:18 PM Revision 5537: inputs/VegBank/stemlocation/map.csv: stemxposition, stemyposition: Remapped to organismX_m/organismY_m using units from <http://vegbank.org/vegbank/views/dba_tabledescription_detail.jsp?view=detail&wparam=stemlocation&entity=dba_tabledescription&where=where_tablename>
Aaron Marcuse-Kubitza
04:06 PM Revision 5536: inputs/TEAM/*/map.csv: 1ha Plot X Coordinate, 1ha Plot Y Coordinate: Remapped to organismX_m/organismY_m using units from <https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/inputs/TEAM/_src/TEAM-DataPackage-20120920191251_3859/Vegetation+-+Trees+&+Lianas/Vegetation-Tree-and-Liana-Metadata-1.5.pdf>
Aaron Marcuse-Kubitza
03:59 PM Revision 5535: inputs/SALVIAS/plotObservations/map.csv, inputs/SALVIAS-CSV/Organism/map.csv: x_position, y_position: Remapped to organismX_m/organismY_m using units from <http://salvias.net/Documents/salvias_data_dictionary.html#Plot+data>
Aaron Marcuse-Kubitza
03:51 PM Revision 5534: inputs/Madidi/Organism/map.csv: Subplot X, Subplot Y: Remapped to organismX_m/organismY_m, assuming units based on the size of values relative to the plot area, which has units of ha
Aaron Marcuse-Kubitza
03:44 PM Revision 5533: inputs/CTFS/StemObservation/map.csv: x, y: Remapped to organismX_m/organismY_m, assuming units based on the size of values relative to plot area, which has units of ha
Aaron Marcuse-Kubitza
03:30 PM Revision 5532: mappings/VegCore-VegBIEN.csv: Mapped organismX_m, organismY_m
Aaron Marcuse-Kubitza
03:29 PM Revision 5531: mappings/VegCore.csv: Added organismX_m, organismY_m
Aaron Marcuse-Kubitza
03:23 PM Revision 5530: sql_io.py: put_table(): full_in_table: Create it using new sql.copy_table() instead of sql.run_query_into()
Aaron Marcuse-Kubitza
03:23 PM Revision 5529: sql.py: Added copy_table()
Aaron Marcuse-Kubitza
03:14 PM Revision 5528: sql.mk_select() calls: Removed no longer needed order_by=None when limit=0
Aaron Marcuse-Kubitza
03:11 PM Revision 5527: sql.py: mk_select(): Set order_by to None if limit == 0
Aaron Marcuse-Kubitza
03:09 PM Revision 5526: inputs/.TNRS/schema.sql: Documented that accepted names must be processed before any names that resolve to them, because the entry for the accepted name contains all the ranks parsed out but the resolved name of another entry contains just some ranks and the taxonomic name. Column-based import will do this automatically when the total # of rows is <= the partition_size (because _taxonconcept_set_matched_concept_id()'s accepted taxonconcept is created after the main taxonconcept), but TNRS has more rows than this so sorting is needed to ensure that all the accepted names are processed in the first partitions.
Aaron Marcuse-Kubitza
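r5526 only documents the ordering requirement. One way to satisfy it is a stable sort that puts self-accepted rows first, sketched here in Python with hypothetical 'name' and 'accepted_name' keys. The project itself would presumably do the ordering in SQL with an ORDER BY, so treat this as an illustration of the rule rather than the actual implementation.

```python
def accepted_first(rows):
    """Order rows so accepted names precede the names that resolve to them.

    A row is treated as accepted when it is its own accepted name. sorted()
    is stable, so the original order is kept within each group.
    """
    return sorted(rows, key=lambda r: r['name'] != r['accepted_name'])


rows = [
    {'name': 'Syn one', 'accepted_name': 'Acc one'},
    {'name': 'Acc one', 'accepted_name': 'Acc one'},
    {'name': 'Syn two', 'accepted_name': 'Acc one'},
]
print([r['name'] for r in accepted_first(rows)])
# ['Acc one', 'Syn one', 'Syn two']
```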
02:52 PM Revision 5525: sql.py: table_order_by(): Cache the order_by in table.order_by and propagate it when a LIKE table is created
Aaron Marcuse-Kubitza
02:51 PM Revision 5524: sql_gen.py: Table: Added order_by attr to cache the results of table_order_by()
Aaron Marcuse-Kubitza
02:36 PM Revision 5523: sql.select() calls: Removed order_by=None everywhere that a stable row order is required (i.e. consistent between selects, or consistent between table transformations). This causes several tests to return different inserted row counts, because the input table is now being accessed in pkey order instead of in table order. This fixes a bug where tables with more rows than ~100 would return different results for repeated calls of the same non-ordered select.
Aaron Marcuse-Kubitza
02:27 PM Revision 5522: sql.py: mk_select(): Use table_order_by() instead of table_pkey_col() to determine what column(s) to order by if order_by is set to order_by_pkey
Aaron Marcuse-Kubitza
02:26 PM Revision 5521: sql.py: Added table_pkey_index(), index_order_by(), table_cluster_on(), table_order_by()
Aaron Marcuse-Kubitza
01:10 PM Revision 5520: sql.py: Added index_exprs() and use it in index_cols()
Aaron Marcuse-Kubitza
01:08 PM Revision 5519: README.TXT: Data import: On local machine: Added `make inputs/.TNRS/cleanup`, which is necessary because the PostgreSQL collation may differ between vegbiendev's and your DB
Aaron Marcuse-Kubitza
12:24 PM Revision 5518: schemas/vegbien.sql: taxonconcept: taxonconcept_update_ancestors(): Use matched_concept_id's ancestors instead if available. (Recursively applied, this will use the ancestors of the accepted concept.) This facilitates finding all children of and matches to an accepted concept, which will all have an entry for that concept in taxonconcept_ancestor. Note that the concept's own parents will not be indexed in taxonconcept_ancestor, because only accepted ancestors are now stored in taxonconcept_ancestor. Documented that taxonconcept_ancestor now stores the *accepted* ancestors of a taxonconcept.
Aaron Marcuse-Kubitza
12:14 PM Revision 5517: schemas/vegbien.sql: taxonconcept: taxonconcept_2_propagate_accepted_concept_id(): Also update accepted_concept_id on concepts that resolve to this concept, which may have been created before this concept was marked as accepted if concepts are not imported in dependency order (accepted concepts first). Added index on matched_concept_id to speed up finding concepts that resolve to this concept.
Aaron Marcuse-Kubitza
12:10 PM Revision 5516: sql.py: mk_select(): order_by is order_by_pkey: Only order by the table's actual pkey, if it has one, rather than using the first column if it doesn't
Aaron Marcuse-Kubitza
12:08 PM Revision 5515: inputs/.TNRS/tnrs/test.xml.ref: Updated inserted row count
Aaron Marcuse-Kubitza
10:21 AM Revision 5514: db_xml.py: partition_size: Increased to 1,000,000 (>= NCBI.higher_taxa's size) so NCBI.higher_taxa can be imported completely in one partition. This is necessary because NCBI's taxonconcepts are not in dependency order (parents first), so a later partition cannot rely on the parents of its taxonconcepts having already been imported. Instead, all taxonconcepts must be imported at once, and then the parents of all taxonconcepts must be set in a separate step.
Aaron Marcuse-Kubitza
10:08 AM Revision 5513: mappings/VegCore-VegBIEN.csv: taxonconcept.parent_id when explicit parent provided: Set taxonconcept.parent_id using new _taxonconcept_set_parent_id() *after* creating the child taxonconcept, so that the parent_id will point to the already-inserted parent taxonconcept instead of creating a new, empty parent taxonconcept. This creates a two-step import, where first the taxonconcepts are imported, and then the parent_ids are matched up. This is necessary for column-based import because all the parent taxonconcepts are imported in a separate iteration from the child taxonconcepts with only their sourceaccessioncode, so this iteration must occur after the child taxonconcept iteration in order to match up with fully-populated taxonconcepts. Row-based import, on the other hand, does not require _taxonconcept_set_parent_id() but does require the taxonconcepts to be provided in dependency order (parents first), which is unfortunately not the case for NCBI.
Aaron Marcuse-Kubitza
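The two-step import in r5513 can be sketched in Python with a dictionary standing in for the taxonconcept table. The field names are hypothetical, and raising on a missing parent is a choice made here; the real import may behave differently. Step 1 creates every node without a parent. Step 2 matches each parent_id to a node that already exists, so children may appear before their parents in the input.

```python
def import_then_link(rows):
    """Two-step import: insert every node first, then resolve parent links.

    rows: iterable of dicts with a source accession code under 'id' and an
    optional 'parent_id' (hypothetical field names).
    """
    rows = list(rows)
    nodes = {}
    for row in rows:  # step 1: create nodes, parents not yet set
        nodes[row['id']] = {'parent': None}
    for row in rows:  # step 2: match parent_ids to already-inserted nodes
        parent = row.get('parent_id')
        if parent is not None:
            if parent not in nodes:
                raise KeyError('parent %r was never imported' % parent)
            nodes[row['id']]['parent'] = parent
    return nodes


nodes = import_then_link([
    {'id': 'species1', 'parent_id': 'genus1'},  # child listed before parent
    {'id': 'genus1', 'parent_id': 'family1'},
    {'id': 'family1'},
])
# nodes['species1']['parent'] == 'genus1'
```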
09:57 AM Revision 5512: schemas/vegbien.sql: *_update_ancestors(): Telling immediate children to update their ancestors lists: Exclude self to avoid infinite recursion
Aaron Marcuse-Kubitza
09:57 AM Revision 5511: schemas/vegbien.sql: *_update_ancestors(): Telling immediate children to update their ancestors lists: Exclude self to avoid infinite recursion
Aaron Marcuse-Kubitza
09:41 AM Revision 5510: schemas/vegbien.sql: Added _taxonconcept_set_parent_id()
Aaron Marcuse-Kubitza
09:37 AM Revision 5509: schemas/vegbien.sql: Renamed _set_matched_taxonconcept() to _taxonconcept_set_matched_concept_id() so that the function name is prefixed with the table it applies to
Aaron Marcuse-Kubitza
09:35 AM Revision 5508: db_xml.py: put(): Treat a child node which is a function (starts with _) as a child with fkey to parent rather than as a field in the table. Such a function accepts the table's pkey as one of its arguments.
Aaron Marcuse-Kubitza
09:05 AM Revision 5507: sql_gen.py: map_expr(): Don't replace an unquoted name when followed by ",", as it would be in an into table name for a function with multiple arguments (e.g. family in "_join_words(1=Field family, 2=Field name)")
Aaron Marcuse-Kubitza
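The rule in r5507 can be illustrated with a regular expression that uses a negative lookahead. `map_unquoted_name()` is a hypothetical helper, not sql_gen.map_expr(). It replaces whole-word occurrences of a name but skips any occurrence immediately followed by ",", as with `family` in the r5507 example.

```python
import re


def map_unquoted_name(expr, name, replacement):
    """Replace whole-word occurrences of name in expr, except where the name
    is immediately followed by ',' (hypothetical sketch)."""
    pattern = r'\b' + re.escape(name) + r'\b(?!,)'
    # A function replacement avoids re.sub's backslash handling in the
    # replacement string.
    return re.sub(pattern, lambda m: replacement, expr)


expr = '_join_words(1=Field family, 2=Field name)'
print(map_unquoted_name(expr, 'family', 'family_col'))  # unchanged
print(map_unquoted_name(expr, 'name', 'name_col'))
# _join_words(1=Field family, 2=Field name_col)
```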
08:49 AM Revision 5506: schemas/vegbien.sql: locationevent: Moved obsstartdate, obsenddate to top of table so they would be visible in the ERD
Aaron Marcuse-Kubitza
08:45 AM Revision 5505: sql_io.py: put_table(): ensure_cond(): track_data_error(): Concatenate the columns in the constraint together using , rather than adding a separate entry for each column, because the constraint is applicable to all columns together rather than to each column separately
Aaron Marcuse-Kubitza
08:26 AM Revision 5504: sql_io.py: put_table(): Renamed ignore_cond() to ensure_cond() for clarity
Aaron Marcuse-Kubitza
08:22 AM Revision 5503: import_all: Also import the NCBI tree of life, before the TNRS names
Aaron Marcuse-Kubitza
08:17 AM Revision 5502: mappings/VegCore-VegBIEN.csv: Also map acceptedFamily to the corresponding NCBI family
Aaron Marcuse-Kubitza
08:07 AM Revision 5501: lib/PostgreSQL-MySQL.csv: custom types: Also exclude time. Reordered excluded (built-in) types by name.
Aaron Marcuse-Kubitza
07:57 AM Revision 5500: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza
07:50 AM Revision 5499: schemas/vegbien.sql: Changed `timestamp with time zone` fields to `date` because time information is not stored in these fields, and it's confusing to have an arbitrary timezone (the server's timezone) and an arbitrary time (midnight) set for input data that only has a precision to the nearest day
Aaron Marcuse-Kubitza
07:43 AM Revision 5498: sql_gen.py: null_sentinels: Added entry for date
Aaron Marcuse-Kubitza
07:40 AM Revision 5497: lib/PostgreSQL-MySQL.csv: custom types: Also exclude date, datetime
Aaron Marcuse-Kubitza
07:11 AM Revision 5496: README.TXT: Documentation: To import and scrub just the test taxonomic names: Run `make backups/TNRS.backup/restore` in the background because it takes a while
Aaron Marcuse-Kubitza
06:45 AM Revision 5495: mappings/VegCore.csv: Re-sourced TaxonomicRankEnum fields to the official TCS schema rather than the TCS version in VegX
Aaron Marcuse-Kubitza
06:42 AM Revision 5494: schemas/vegbien.sql: taxonrank: Updated source to the TCS schema (rather than VegBank) for the new, expanded list. Note that although the list itself was compiled from the TCS version in VegX, the official TCS download does not differ from the VegX TCS in the TaxonomicRankEnum fields (the xs: namespace has just been replaced with xsd: by VegX).
Aaron Marcuse-Kubitza

10/12/2012

05:21 PM Revision 5493: schemas/vegbien.sql: analytical_db_view: taxonconcept: Join again on the accepted_concept_id in order to use the accepted taxonconcept rather than the verbatim taxonconcept from the datasource
Aaron Marcuse-Kubitza
05:14 PM Revision 5492: schemas/: svn:ignore log files
Aaron Marcuse-Kubitza
05:11 PM Revision 5491: Added inputs/.NCBI/. This uses many of the new schema and mappings features, such as taxonconcept.sourceaccessioncode and parentTaxonID
Aaron Marcuse-Kubitza
05:07 PM Revision 5490: mappings/VegCore-VegBIEN.csv: identifyingtaxonomicname: Don't create if taxonconcept has an explicit parent, because the taxonName (which is generally only a component of the full taxonomic name, e.g. specificEpithet) is not globally unique. Datasources that provide name components in such a way that levels at or below family can't be directly concatenated cannot currently receive an identifyingtaxonomicname for input to TNRS.
Aaron Marcuse-Kubitza
04:54 PM Revision 5489: mappings/VegCore-VegBIEN.csv: taxonName->identifyingtaxonomicname: Don't include the rank with the taxonName, because TNRS only allows the rank to be included in the taxonomic name if it's infraspecific (otherwise, it returns no match or an invalid match, because it treats the rank as an invalid term or as a name component)
Aaron Marcuse-Kubitza
04:48 PM Revision 5488: mappings/VegCore-VegBIEN.csv: Mapped taxonName to the TNRS input taxonconcept's identifyingtaxonomicname
Aaron Marcuse-Kubitza
04:28 PM Revision 5487: mappings/VegCore-VegBIEN.csv: Only forward taxonRank to the parent taxonconcept (which stores the infraspecific taxonconcept when the infraspecificEpithet is provided) if there is no explicit parent provided via parentTaxonID/etc.
Aaron Marcuse-Kubitza
04:09 PM Revision 5486: mappings/VegCore-VegBIEN.csv: Mapped parentScientificNameID, parentTaxonConceptID, parentTaxonID
Aaron Marcuse-Kubitza
04:03 PM Revision 5485: mappings/VegCore.csv: Added parentScientificNameID, parentTaxonConceptID, parentTaxonID
Aaron Marcuse-Kubitza
03:53 PM Revision 5484: input.Makefile: $(inDatasrc): Also include the vegbien_dest $schemas in the search_path, so that the datasource's SQL scripts (create.sql, etc.) can use VegBIEN functions and types
Aaron Marcuse-Kubitza
03:44 PM Revision 5483: lib/common.Makefile: Added $(comma)
Aaron Marcuse-Kubitza
02:41 PM Revision 5482: inputs/test_taxonomic_names/_scrub/public.sql: Regenerated with schema changes
Aaron Marcuse-Kubitza
02:38 PM Revision 5481: input.Makefile: Maps building: %/.map.csv.last_cleanup: Fixed bug where $(coreMap) needed to be included as a prerequisite: even though it is not used directly in this target's recipe, it is used by targets invoked via recursive make after the main recipe runs. In general, whenever targets forward commands to a recursive make target, they also need to forward those recursive targets' prerequisites by including them in their own prerequisites list.
Aaron Marcuse-Kubitza
02:29 PM Revision 5480: mappings/VegCore-VegBIEN.csv: Mapped taxonConceptID, taxonID, scientificNameID to taxonconcept.sourceaccessioncode. Note that taxonconcept stores all of these taxonomic entities, using creator_id+creationdate, taxonname+rank+parent_id, and identifyingtaxonomicname, respectively.
Aaron Marcuse-Kubitza
02:28 PM Revision 5479: mappings/VegCore-VegBIEN.csv: Mapped taxonConceptID, taxonID, scientificNameID to taxonconcept.sourceaccessioncode. Note that taxonconcept stores all of these taxonomic entities, using creator_id+creationdate, taxonname+rank+parent_id, and identifyingtaxonomicname, respectively.
Aaron Marcuse-Kubitza
02:13 PM Revision 5478: mappings/VegCore-VegBIEN.csv: Mapped taxonName
Aaron Marcuse-Kubitza
02:11 PM Revision 5477: mappings/VegCore.csv: Added taxonName
Aaron Marcuse-Kubitza
02:05 PM Revision 5476: schemas/vegbien.ERD.mwb: Fixed lines
Aaron Marcuse-Kubitza
01:55 PM Revision 5475: schemas/vegbien.sql: Copied functions in the functions schema that are also used by the public schema to the public schema, so that reinstalling the functions schema would not cause anything that depends on a function in it to be cascadingly deleted. Currently, this just affects analytical_db_view, which uses _fraction_to_percent().
Aaron Marcuse-Kubitza
01:44 PM Revision 5474: inputs/test_taxonomic_names/_scrub/public.sql: Regenerated with schema changes
Aaron Marcuse-Kubitza
01:36 PM Revision 5473: schemas/vegbien.sql: taxonconcept: Added taxonconcept_2_propagate_accepted_concept_id() trigger to auto-populate the accepted_concept_id
Aaron Marcuse-Kubitza
12:53 PM Revision 5472: schemas/vegbien.sql: taxonconcept.sourceaccessioncode: Added descriptive comment
Aaron Marcuse-Kubitza
12:53 PM Revision 5471: schemas/vegbien.sql: taxonconcept.accepted_concept_id: Added descriptive comment
Aaron Marcuse-Kubitza
12:48 PM Revision 5470: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
12:47 PM Revision 5469: schemas/vegbien.sql: taxonconcept: Added sourceaccessioncode, and allow it to scope the taxonconcept when provided
Aaron Marcuse-Kubitza
12:33 PM Revision 5468: inputs/test_taxonomic_names/_scrub/public.sql: Regenerated with schema changes
Aaron Marcuse-Kubitza
12:29 PM Revision 5467: schemas/vegbien.sql: taxonconcept: Renamed canon_concept_id to matched_concept_id, because this is actually the closest-match taxonconcept in the match hierarchy (datasource concept -> parsed concept -> matched concept -> accepted concept) rather than the accepted synonym, which goes in accepted_concept_id
Aaron Marcuse-Kubitza
05:51 AM Revision 5466: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
05:47 AM Revision 5465: schemas/vegbien.sql: taxonconcept: Renamed canon_concept_id to matched_concept_id, because this is actually the closest-match taxonconcept in the match hierarchy (datasource concept -> parsed concept -> matched concept -> accepted concept) rather than the accepted synonym, which goes in accepted_concept_id
Aaron Marcuse-Kubitza
05:34 AM Revision 5464: schemas/vegbien.sql: taxonconcept: Added accepted_concept_id
Aaron Marcuse-Kubitza
05:27 AM Revision 5463: schemas/vegbien.sql: taxonconcept.canon_concept_id: comment: Changed "accepted synonym" to "closest match", since canon_concept_id actually links concepts into a hierarchy: datasource concept -> parsed concept -> matched concept -> accepted concept
Aaron Marcuse-Kubitza
05:22 AM Revision 5462: schemas/vegbien.sql: taxonconcept: Added order # to trigger names so they run in a defined order (triggers are run in alphabetical order)
Aaron Marcuse-Kubitza
04:53 AM Revision 5461: README.TXT: Use new revision # in log filenames to get all the logs for an import. Changed <datetime> to <version> because the rotated public schema now also includes the svn revision.
Aaron Marcuse-Kubitza
04:44 AM Revision 5460: lib/common.Makefile: $(version): Include both the svn revision when make was started and the svn revision when the command is actually run (when these values differ), in case svn was updated between the time an import was started and the time a particular table started being imported. Because tables within a datasource are imported sequentially, an update may have happened before the last table started importing.
Aaron Marcuse-Kubitza
04:23 AM Revision 5459: Makefile: Moved setting of $(root) before include of lib/common.Makefile because it's used by lib/common.Makefile
Aaron Marcuse-Kubitza
04:21 AM Revision 5458: Factored OS section out from Makefile, input.Makefile into lib/common.Makefile
Aaron Marcuse-Kubitza
04:13 AM Revision 5457: Makefile, input.Makefile: Use new $(version), which unlike $(date) also includes the svn revision, to version log files, etc. This way, the working copy can be put back to the way it was at the time of a given import (excluding changes to nonversioned files). This also makes it easier to get all the log files for a particular import when different tables' imports started at different times.
Aaron Marcuse-Kubitza
04:08 AM Revision 5456: Makefile: Added $(root) for use with $(rootRevision)
Aaron Marcuse-Kubitza
04:08 AM Revision 5455: lib/common.Makefile: Added $(version), to replace $(date) for versioning log files, etc., and helper function $(rootRevision)
Aaron Marcuse-Kubitza
04:07 AM Revision 5454: lib/common.Makefile: Added $(revision)
Aaron Marcuse-Kubitza
04:04 AM Revision 5453: input.Makefile: Removed no longer used $(SED)
Aaron Marcuse-Kubitza
04:03 AM Revision 5452: lib/common.Makefile: Added $(sed)
Aaron Marcuse-Kubitza
03:58 AM Revision 5451: Factored $(date) out from Makefile, input.Makefile into lib/common.Makefile
Aaron Marcuse-Kubitza
03:18 AM Revision 5450: sql_io.py: put_table(): DuplicateKeyException: Fixed bug where indexes with conditions needed to have the input rows filtered by the condition, to prevent trying to retrieve an existing/inserted row using a join on the index columns when the index in fact does not apply. This fixes a bug in the import of taxonconcept where the taxonconcept_0_unique_identifying_name unique index has a condition that is not satisfied by input rows with no identifyingtaxonomicname, causing any input row with NULL in this column to match *all* taxonconcepts with a NULL identifyingtaxonomicname. This uses ignore_cond()'s new support for constraints that never failed.
Aaron Marcuse-Kubitza
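The partial-index fix can be modeled in plain Python. This is a hypothetical sketch, not sql_io.put_table() itself: rows are dicts, `match_existing()` is an illustrative name, and the condition of taxonconcept_0_unique_identifying_name is assumed here to amount to "identifyingtaxonomicname IS NOT NULL". Input rows that fail the index condition are dropped before matching, so a NULL key can no longer pick up every existing row whose key is also NULL.

```python
from typing import Callable, List, Optional, Tuple

Row = dict


def match_existing(
    input_rows: List[Row],
    existing_rows: List[Row],
    index_cols: List[str],
    index_cond: Optional[Callable[[Row], bool]] = None,
) -> List[Tuple[Row, Row]]:
    """Pair input rows with existing rows that share their unique-index key.

    If the index is partial (index_cond is given), input rows that do not
    satisfy the condition are filtered out first: the index does not cover
    them, so matching them on the index columns is meaningless.
    """
    if index_cond is not None:
        input_rows = [r for r in input_rows if index_cond(r)]
    matches = []
    for in_row in input_rows:
        key = tuple(in_row[c] for c in index_cols)
        for ex_row in existing_rows:
            # None == None is True here, like a NULL-tolerant join, so an
            # unfiltered NULL key would match every existing NULL-key row.
            if tuple(ex_row[c] for c in index_cols) == key:
                matches.append((in_row, ex_row))
    return matches


def has_name(row: Row) -> bool:
    """Assumed partial-index condition (hypothetical)."""
    return row["identifyingtaxonomicname"] is not None


existing = [
    {"id": 1, "identifyingtaxonomicname": None},
    {"id": 2, "identifyingtaxonomicname": None},
    {"id": 3, "identifyingtaxonomicname": "Quercus alba"},
]
inputs = [{"identifyingtaxonomicname": None},
          {"identifyingtaxonomicname": "Quercus alba"}]

print(len(match_existing(inputs, existing, ["identifyingtaxonomicname"])))            # 3
print(len(match_existing(inputs, existing, ["identifyingtaxonomicname"], has_name)))  # 1
```

Without the condition, the NULL-name input row matches both NULL-name taxonconcepts; with it, only the real name is matched.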
03:12 AM Revision 5449: sql_io.py: put_table(): ignore_cond(): Added support for constraints that never failed and therefore should not be required to simplify to a non-false value. As part of this, only track the failed constraint in the errors table if it actually failed at least once, based on the deleted row count or the `failed` param.
Aaron Marcuse-Kubitza
03:05 AM Revision 5448: sql_gen.py: map_expr(): Fixed bug where names were being replaced when they were inside another name. This occurred with combined names created by sql_io.into_table_name().
Aaron Marcuse-Kubitza
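A minimal sketch of whole-name replacement, assuming names are identifiers made of word characters and `$`; `map_names()` is a hypothetical stand-in for sql_gen.map_expr(), whose real matching rules (e.g. for quoted names) are not shown in this log. The lookarounds stop a name from being replaced when it is only part of a longer name, and a single regex pass keeps a replacement from being replaced again.

```python
import re


def map_names(expr: str, mapping: dict) -> str:
    """Replace each whole name in expr with mapping[name] (hypothetical helper).

    A name only matches when it is not directly preceded or followed by
    another identifier character, so 'taxon' inside 'taxonconcept' or
    'taxon_id' is left alone.
    """
    if not mapping:
        return expr
    # Longest names first, for a deterministic alternation order.
    alternation = "|".join(
        re.escape(name) for name in sorted(mapping, key=len, reverse=True))
    pattern = re.compile(r"(?<![\w$])(?:" + alternation + r")(?![\w$])")
    # One pass: text produced by a replacement is never re-scanned.
    return pattern.sub(lambda m: mapping[m.group(0)], expr)


print(map_names("taxon = taxonconcept.taxon_id", {"taxon": "t2"}))
# t2 = taxonconcept.taxon_id
```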
01:11 AM Revision 5447: sql.py: ConstraintException: message: Wrap condition in strings.as_tt()
Aaron Marcuse-Kubitza
12:30 AM Revision 5446: sql.py: run_query(): DuplicateKeyException: Also retrieve the index's condition using new index_cond()
Aaron Marcuse-Kubitza
12:28 AM Revision 5445: sql.py: Added index_cond()
Aaron Marcuse-Kubitza
12:11 AM Revision 5444: sql_io.py: put_table(): insert_into_pkeys(): Take a query as the param instead of sql.mk_select()'s params, to allow the caller to pass in any query without needing insert_into_pkeys() to manually pass through those args
Aaron Marcuse-Kubitza

10/11/2012

11:40 PM Revision 5443: sql.py: constraint_cond(): Fixed NotImplementedError message to apply to this function
Aaron Marcuse-Kubitza
09:36 PM Revision 5442: sql_io.py: put_table(): ignore_cond(): Log message: Replaced "don't" with "do not" so that the apostrophe doesn't break syntax highlighting when viewing the log file in a text editor
Aaron Marcuse-Kubitza
09:07 PM Revision 5441: input.Makefile: Staging tables installation: Don't delete %/header.csv on error, because header.csv is a byproduct rather than the primary output and is created roughly atomically
Aaron Marcuse-Kubitza
08:40 PM Revision 5440: schemas/vegbien.sql: *_ancestor tables: Added descriptive comment that these are ancestor cross link tables
Aaron Marcuse-Kubitza
08:23 PM Revision 5439: csvs.py: sniff(): Support multi-char delims using \t, such as \t|\t used by NCBI. Support custom line suffixes, such as \t| used by NCBI.
Aaron Marcuse-Kubitza
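For reference, NCBI's taxonomy dump files (for example names.dmp) separate fields with `\t|\t` and end each line with `\t|`. The sketch below parses such a line with str.split() once the delimiter, line suffix, and line ending are known; `split_line()` is a hypothetical helper, not csvs.py's actual interface.

```python
from typing import List


def split_line(line: str, delim: str = "\t|\t", line_suffix: str = "\t|",
               line_ending: str = "\n") -> List[str]:
    """Split one raw line on a (possibly multi-char) delimiter.

    The line ending is removed first, then the custom line suffix, so the
    suffix is not mistaken for part of the last field.
    """
    if line_ending and line.endswith(line_ending):
        line = line[:-len(line_ending)]
    if line_suffix and line.endswith(line_suffix):
        line = line[:-len(line_suffix)]
    return line.split(delim)


print(split_line("9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"))
# ['9606', 'Homo sapiens', '', 'scientific name']
```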
08:18 PM Revision 5438: csvs.py: TsvReader.next(): Remove only the autodetected line ending instead of any standard line ending. Note that this requires all header override files to use the same line ending as the CSV they override, which is now the case.
Aaron Marcuse-Kubitza
08:15 PM Revision 5437: csvs.py: is_tsv(): Support multi-char delimiters by checking only the first char of the delimiter
Aaron Marcuse-Kubitza
08:12 PM Revision 5436: csvs.py: sniff(): Also autodetect the line ending
Aaron Marcuse-Kubitza
08:11 PM Revision 5435: csvs.py: sniff(): Also autodetect the line ending
Aaron Marcuse-Kubitza
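Line-ending autodetection can be sketched as a scan for the first line break in a sample of the file. `detect_line_ending()` is hypothetical and assumes the first line is representative; csvs.sniff() may work differently. Note that a sample cut off right after a `\r` would be misread as a bare `\r` ending.

```python
def detect_line_ending(sample: str) -> str:
    """Return the line ending used by the first line of sample.

    Finds the first line-break character; a '\r' immediately followed by
    '\n' counts as a single '\r\n' ending. Defaults to '\n' when the
    sample contains no line break.
    """
    for i, ch in enumerate(sample):
        if ch == "\r":
            return "\r\n" if sample[i + 1:i + 2] == "\n" else "\r"
        if ch == "\n":
            return "\n"
    return "\n"


print(repr(detect_line_ending("a\tb\r\nc\td\r\n")))  # '\r\n'
```

Once the ending is known, a reader can strip exactly that ending from each line rather than any standard ending.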
08:02 PM Revision 5434: inputs/test_taxonomic_names/Taxon/+header.txt: Changed line endings to \r\n to match testNames.txt line endings. This will be necessary when the line ending is autodetected by csvs.sniff().
Aaron Marcuse-Kubitza
07:59 PM Revision 5433: csvs.py: TsvReader.next(): Renamed raw_contents var to line, since this is just the line with the ending removed
Aaron Marcuse-Kubitza
07:36 PM Revision 5432: strings.py: Replaced no longer used contains_any() with find_any(), which returns any found substring, or None if none of the substrings were found
Aaron Marcuse-Kubitza
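A reimplementation of the described find_any() behavior, for illustration only. Which substring is returned when several are present (here, the first one in the given order) is an assumption.

```python
from typing import Iterable, Optional


def find_any(haystack: str, needles: Iterable[str]) -> Optional[str]:
    """Return the first of needles found in haystack, or None if none is found."""
    for needle in needles:
        if needle in haystack:
            return needle
    return None


print(repr(find_any("9606\t|\tHomo sapiens", [",", "\t|\t"])))  # '\t|\t'
print(find_any("abc", ["x", "y"]))                            # None
```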
07:22 PM Revision 5431: csvs.py: Modify csv.Dialect._validate() to ignore "delimiter must be a 1-character string" errors, in order to support multi-char delimiters used by TsvReader
Aaron Marcuse-Kubitza
07:21 PM Revision 5430: csvs.py: Modify csv.Dialect._validate() to ignore "delimiter must be a 1-character string" errors, in order to support multi-char delimiters used by TsvReader
Aaron Marcuse-Kubitza
06:58 PM Revision 5429: csvs.py: TsvReader: Use str.split() instead of csv.reader().next() to parse the row, for efficiency and to support multi-char delimiters. This is possible because the TSV dialect doesn't use CSV parsing features other than the delimiter and newline-escaping (which is handled separately).
Aaron Marcuse-Kubitza
06:02 PM Revision 5428: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza

10/10/2012

11:43 AM Revision 5427: input.Makefile: $(exts): Added .dmp
Aaron Marcuse-Kubitza
11:43 AM Revision 5426: csvs.py: delims: Added |
Aaron Marcuse-Kubitza
11:28 AM Revision 5425: Removed no longer used inputs/.public/. Use inputs/.TNRS/ and inputs/.TNRS/tnrs/tnrs.make instead.
Aaron Marcuse-Kubitza
11:23 AM Revision 5424: README.TXT: Documentation: To import and scrub just the test taxonomic names: Added steps to restore the original DB when the test scrub is complete
Aaron Marcuse-Kubitza
11:22 AM Revision 5423: inputs/test_taxonomic_names/test_scrub: Also export the results to inputs/test_taxonomic_names/_scrub/
Aaron Marcuse-Kubitza
11:06 AM Revision 5422: inputs/test_taxonomic_names/test_scrub: Use regular for .. in loop with a list of what's being processed in each iteration (match_input_names, parse_accepted_names)
Aaron Marcuse-Kubitza
10:58 AM Revision 5421: inputs/.TNRS/tnrs/map.csv: Mapped Genus_score, Specific_epithet_score
Aaron Marcuse-Kubitza
10:56 AM Revision 5420: mappings/VegCore-VegBIEN.csv: Mapped matchedGenusFit_fraction, matchedSpeciesFit_fraction. Reordered canon_concept_fit_fraction _maxs in the order they would be used if _alt were being used instead.
Aaron Marcuse-Kubitza
10:52 AM Revision 5419: mappings/VegCore.csv: Added matchedSpeciesFit_fraction
Aaron Marcuse-Kubitza
10:47 AM Revision 5418: mappings/VegCore.csv: matchedFamilyFit_fraction: Source the "matched" to Family_matched, which is a closer fit than Name_matched. matchedGenusFit_fraction: Fixed Genus_matched source to use #detailed_download instead of #simple_download.
Aaron Marcuse-Kubitza
10:42 AM Revision 5417: mappings/VegCore.csv: Added matchedGenusFit_fraction
Aaron Marcuse-Kubitza
10:18 AM Revision 5416: README.TXT: Removed extra trailing whitespace
Aaron Marcuse-Kubitza
10:18 AM Revision 5415: README.TXT: Documentation: To import and scrub just the test taxonomic names: Use new inputs/test_taxonomic_names/test_scrub
Aaron Marcuse-Kubitza
10:17 AM Revision 5414: Added inputs/test_taxonomic_names/test_scrub
Aaron Marcuse-Kubitza
10:01 AM Revision 5413: schemas/vegbien.sql: taxonconcept: Renamed canon_taxonconcept_id to canon_concept_id to shorten the name, which is used often
Aaron Marcuse-Kubitza
09:45 AM Revision 5412: schemas/vegbien.sql: taxonconcept: Added taxonconcept_canon_concept_min_fit() trigger to remove the canon_concept_id link from insufficient matches. These occur when, e.g., a name in another language is approximated to a Latin name, or when the input name is not a proper taxon but TNRS provides a best-guess match anyway.
Aaron Marcuse-Kubitza
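The trigger's effect can be modeled as a function over one row's fields, in the style of a trigger that modifies the incoming row. This is a sketch under stated assumptions: the 0.8 threshold is a placeholder (the log does not give the real cutoff), and keeping the link when the fit is unknown is also an assumption.

```python
from typing import Optional

MIN_FIT = 0.8  # hypothetical threshold; the real cutoff is not stated


def apply_min_fit(row: dict, min_fit: float = MIN_FIT) -> dict:
    """Unlink canon_concept_id when the match's fit is below min_fit.

    Returns a new dict; a row with no known fit is left unchanged.
    """
    fit: Optional[float] = row.get("canon_concept_fit_fraction")
    if fit is not None and fit < min_fit:
        return dict(row, canon_concept_id=None)
    return row


print(apply_min_fit({"canon_concept_id": 7, "canon_concept_fit_fraction": 0.5}))
# {'canon_concept_id': None, 'canon_concept_fit_fraction': 0.5}
```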
09:42 AM Revision 5411: inputs/.TNRS/tnrs/map.csv: Mapped Family_score to new matchedFamilyFit_fraction
Aaron Marcuse-Kubitza
09:39 AM Revision 5410: mappings/VegCore-VegBIEN.csv: Use matchedFamilyFit_fraction as canon_concept_fit_fraction when greater than matchedTaxonFit_fraction, because if there is at least a matched family, there is a valid taxonconcept to attach to
Aaron Marcuse-Kubitza
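Using the family fit whenever it is greater than the taxon fit amounts to taking a maximum. The sketch below skips missing (NULL) values the way PostgreSQL's greatest() does; whether the _max() added to schemas/functions.sql treats NULLs the same way is an assumption, and `max_ignoring_null()` is a hypothetical name.

```python
from typing import Optional


def max_ignoring_null(*values: Optional[float]) -> Optional[float]:
    """Largest non-None value, or None if every value is None."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


matched_taxon_fit, matched_family_fit = 0.4, 0.9
print(max_ignoring_null(matched_taxon_fit, matched_family_fit))  # 0.9
print(max_ignoring_null(None, None))                             # None
```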
09:39 AM Revision 5409: xml_func.py: Simplifying functions: Added _min, _max as passthroughs
Aaron Marcuse-Kubitza
09:34 AM Revision 5408: schemas/functions.sql: Added _max(), _min()
Aaron Marcuse-Kubitza
09:21 AM Revision 5407: mappings/VegCore.csv: Added matchedFamilyFit_fraction
Aaron Marcuse-Kubitza
09:04 AM Revision 5406: mappings/VegCore-VegBIEN.csv: Remapped matchedTaxonFit_fraction to the verbatim* taxonconcept, because this is actually for the verbatim* concept's fit to the matched concept, not the matched concept's fit to the accepted concept
Aaron Marcuse-Kubitza
08:59 AM Revision 5405: inputs/.TNRS/tnrs/map.csv: Restored *-prefixed output terms for unmapped terms that had initially been mapped to OMIT but could reasonably match to something in the future. Continue mapping Name_number to OMIT because it isn't globally unique (it identifies the name only within one TNRS batch).
Aaron Marcuse-Kubitza
08:45 AM Revision 5404: inputs/.TNRS/tnrs/map.csv: Mapped Overall_score to new matchedTaxonFit_fraction
Aaron Marcuse-Kubitza
08:44 AM Revision 5403: mappings/VegCore-VegBIEN.csv: Mapped matchedTaxonFit_fraction to _set_canon_taxonconcept(canon_concept_fit_fraction)
Aaron Marcuse-Kubitza
08:37 AM Revision 5402: mappings/VegCore.csv: Added matchedTaxonFit_fraction
Aaron Marcuse-Kubitza
08:20 AM Revision 5401: schemas/vegbien.sql: _set_canon_taxonconcept(): Also set the canon_concept_fit_fraction
Aaron Marcuse-Kubitza
08:10 AM Revision 5400: schemas/vegbien.sql: taxonconcept: Added canon_concept_fit_fraction to store the closeness of fit of the canon_concept
Aaron Marcuse-Kubitza
07:55 AM Revision 5399: schemas/vegbien.sql: taxonconcept: Renamed canon_taxonconcept_id to canon_concept_id to shorten the name, which is used often
Aaron Marcuse-Kubitza
07:10 AM Revision 5398: sql.py: mk_update(): in_place: Convert columns of type character varying to text so that they can be merge-joined with text columns. Note that these two types are equivalent but not aliases of one another, so the explicit type change is needed.
Aaron Marcuse-Kubitza
07:07 AM Revision 5397: sql_gen.py: Added canon_type()
Aaron Marcuse-Kubitza
06:52 AM Revision 5396: sql.py: mk_update(): in_place: Factored retrieval of column type out into separate statement for clarity
Aaron Marcuse-Kubitza
06:27 AM Revision 5395: schemas/functions.sql: _join*(): Fixed bug where these functions returned '' instead of NULL when only NULL inputs were provided, because array_to_string() always returns a non-NULL string. Functions must always return NULL in place of '' to ensure that empty strings do not find their way into VegBIEN, and to prevent inconsistencies between row-based and column-based import (row-based import folds empty strings to NULL while column-based import relies on having a clean input table).
Aaron Marcuse-Kubitza
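The NULL-instead-of-empty-string rule, sketched in Python with None standing in for NULL. `join_nonnull()` is a hypothetical analogue of the _join*() functions; in SQL, a common way to get the same effect is to wrap the result in NULLIF(..., ''). Treating '' inputs like NULL is an added assumption.

```python
from typing import Optional


def join_nonnull(sep: str, *values: Optional[str]) -> Optional[str]:
    """Join the non-empty values with sep; return None instead of ''.

    None and '' inputs are skipped, so an input with no real values
    yields None rather than an empty string.
    """
    parts = [v for v in values if v not in (None, "")]
    return sep.join(parts) if parts else None


print(join_nonnull(" ", "Quercus", None, "alba"))  # Quercus alba
print(join_nonnull(" ", None, None))               # None
```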
06:10 AM Revision 5394: sql_io.py: cleanup_table(): Use sql.table_pkey_col() instead of sql.pkey_col() so that only an actual pkey column is removed from the list of columns to clean. This fixes a bug where the first column in the table was not cleaned up if there was no pkey. Note that this bug only affected newly re-created staging tables, because staging tables previously had a special row_num pkey column added if they did not already have a pkey. The row_num column is now added by column-based import instead.
Aaron Marcuse-Kubitza
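A minimal model of the column selection, assuming a table is described by its column list and an optional pkey name. The function names mirror sql.py's, but the bodies are illustrative sketches, not the real implementations.

```python
from typing import List, Optional


class DoesNotExistException(Exception):
    """Raised when the requested object (here, a pkey) does not exist."""


def table_pkey_col(cols: List[str], pkey: Optional[str]) -> str:
    """Return the actual pkey column, raising if the table has none."""
    if pkey is None:
        raise DoesNotExistException("table has no pkey")
    return pkey


def pkey_col(cols: List[str], pkey: Optional[str]) -> str:
    """Like table_pkey_col(), but fall back to the first column."""
    try:
        return table_pkey_col(cols, pkey)
    except DoesNotExistException:
        return cols[0]


def cols_to_clean(cols: List[str], pkey: Optional[str]) -> List[str]:
    """Every column except a real pkey column."""
    try:
        exclude = table_pkey_col(cols, pkey)
    except DoesNotExistException:
        return list(cols)  # no pkey: clean every column
    return [c for c in cols if c != exclude]


print(cols_to_clean(["a", "b"], None))  # ['a', 'b']
print(pkey_col(["a", "b"], None))       # a
```

Excluding pkey_col() instead would skip column 'a' of a pkey-less table, which is the bug this entry fixes.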
05:51 AM Revision 5393: sql.py: table_pkey_col(): Raise a DoesNotExistException if the table has no pkey
Aaron Marcuse-Kubitza
05:23 AM Revision 5392: sql.py: pkey_col(): Call table_pkey_col() directly rather than via pkey_name(). pkey_name(): Call pkey_col() instead of table_pkey_col() now that pkey_col() calls table_pkey_col().
Aaron Marcuse-Kubitza
05:14 AM Revision 5391: sql.py: pkey_col(): Documented that if there is no pkey, returns the first column in the table
Aaron Marcuse-Kubitza
05:13 AM Revision 5390: sql.py: pkey_col(): Specify recover directly as a kw_arg because it's the only kw_arg passed to pkey_name()
Aaron Marcuse-Kubitza
05:10 AM Revision 5389: sql.py: Added table_pkey_col() and use it in pkey_name()
Aaron Marcuse-Kubitza
05:01 AM Revision 5388: sql.py: Renamed pkey() to pkey_name()
Aaron Marcuse-Kubitza
04:45 AM Revision 5387: sql.py: Renamed pkey_col_() to pkey_col()
Aaron Marcuse-Kubitza
04:43 AM Revision 5386: sql.py: Removed no longer used pkey_col
Aaron Marcuse-Kubitza
04:43 AM Revision 5385: db_xml.py: cleanup_table(): Inline sql.pkey_col ('row_num') because this is the only place it's used
Aaron Marcuse-Kubitza
04:37 AM Revision 5384: cleanup_table(): Use new sql.table_cols() instead of sql.table_col_names()
Aaron Marcuse-Kubitza
04:36 AM Revision 5383: sql.py: Added table_cols()
Aaron Marcuse-Kubitza
04:16 AM Revision 5382: db_xml.py: put(): Fixed bug where the pkeys_loc table was truncated even though it may be the same as one of the in_tables. This can happen now that sql_io.put_table() passes through the actual input column instead of the joined-together input table's column when ignoring all rows.
Aaron Marcuse-Kubitza
03:33 AM Revision 5381: sql_io.py: put_table(): Resolving default value column: If ignoring all rows, use input cols directly instead of cols from joined-together input table. In addition to being simpler, this prevents the returned column's name from growing longer and longer as each iteration prepends its input table's name to the default value column name.
Aaron Marcuse-Kubitza
03:07 AM Revision 5380: sql_io.py: put_table(): Moved changing the table of the default value column from Resolving the default value column to Setting pkeys of missing rows, because the table change is only needed in this section
Aaron Marcuse-Kubitza
03:04 AM Revision 5379: sql_io.py: put_table(): Resolving default value column: Always call sql_gen.remove_col_rename() because it will just pass the value through if it's not a column
Aaron Marcuse-Kubitza
02:41 AM Revision 5378: sql_gen.py: simplify_parens(): Removed extra simplify_parens() at end because it is done in the final iteration that performs no other replacements, so it is not necessary to also do it explicitly
Aaron Marcuse-Kubitza
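A sketch of the fixed-point loop: replacements repeat until a pass changes nothing, so that final pass already covers what a trailing call would do. The simplified atom pattern (an identifier, number, or single-quoted string) is a hypothetical stand-in for sql_gen's atom_re. Parentheses directly after a name are left alone so that function calls survive.

```python
import re

# Hypothetical, simplified atom: identifier/number, or single-quoted string.
ATOM_RE = r"(?:[\w.$]+|'[^']*')"
# '(' not directly preceded by a name char, i.e. not a function call.
PAREN_ATOM = re.compile(r"(?<![\w$])\((" + ATOM_RE + r")\)")


def simplify_parens(expr: str) -> str:
    """Remove parentheses around single atoms until nothing more changes."""
    while True:
        new_expr = PAREN_ATOM.sub(r"\1", expr)
        if new_expr == expr:  # fixed point: this pass made no replacements
            return expr
        expr = new_expr


print(simplify_parens("f((x)) AND ((y))"))  # f(x) AND y
```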
02:30 AM Revision 5377: sql_io.py: put_table(): Replaced limit_ref integer with ignore_all_ref boolean, because it is no longer used as a select statement limit
Aaron Marcuse-Kubitza
02:29 AM Revision 5376: sql_io.py: put_table(): remove_all_rows(): Corrected "just create an empty pkeys table" comment to "just return the default value column"
Aaron Marcuse-Kubitza
02:27 AM Revision 5375: sql_io.py: put_table(): mk_main_select(): Removed setting limit to limit_ref[0], because an empty pkeys table is no longer created when ignoring all rows
Aaron Marcuse-Kubitza
02:19 AM Revision 5374: sql_io.py: put_table(): Setting pkeys of missing rows: Removed "limit_ref[0] == 0" check because this code is never reached in that case
Aaron Marcuse-Kubitza
02:16 AM Revision 5373: sql_io.py: put_table(): Ignoring all rows for unrecoverable errors: Even in multi-row mode, just return whatever the default value or column was, instead of creating an output table containing the default value filled in for every row. This also assists the optimization to skip empty levels of taxonconcepts, because it folds the empty level to that level's parent level rather than creating a whole new temp table with ultimately the same contents.
Aaron Marcuse-Kubitza
01:57 AM Revision 5372: sql_gen.py: not_false_re, not_true_re: Appended \b to ensure that true/false is only matched as a single word
Aaron Marcuse-Kubitza
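Why the trailing `\b` matters, shown with Python's re module. The patterns are hypothetical approximations of not_false_re and not_true_re, which are not reproduced in this log. They only illustrate word-boundary matching and ignore operator precedence.

```python
import re

# Hypothetical approximations: \b makes 'false'/'true' end at a word boundary.
not_false_re = re.compile(r"\bNOT\s+false\b", re.IGNORECASE)
not_true_re = re.compile(r"\bNOT\s+true\b", re.IGNORECASE)


def simplify_not(expr: str) -> str:
    """Rewrite 'NOT false' to true and 'NOT true' to false."""
    expr = not_false_re.sub("true", expr)
    return not_true_re.sub("false", expr)


print(simplify_not("x AND NOT false_flag AND NOT false"))
# x AND NOT false_flag AND true
```

Without the trailing `\b`, "NOT false_flag" would have been rewritten to "true_flag".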
01:56 AM Revision 5371: sql_gen.py: simplify_expr(): Also simplify "NOT false" to true
Aaron Marcuse-Kubitza
01:53 AM Revision 5370: sql_gen.py: simplify_expr(): Also simplify "NOT true" to false
Aaron Marcuse-Kubitza
01:24 AM Revision 5369: sql_io.py: put_table(): ignore_cond(): Changed "Ignoring rows where" message with the negated (filter-out) condition to "Ignoring rows that don't satisfy" with the filter condition for clarity
Aaron Marcuse-Kubitza
01:22 AM Revision 5368: sql_io.py: put_table(): ignore_cond(): If cond simplifies to false, remove all rows instead of filtering out individual rows which will all be filtered out. This optimization should improve import times of tables, such as taxonconcept, which use a check constraint instead of NOT NULL constraints to prevent empty rows. The taxonomic schema refactoring caused the creation of many more levels of taxonconcepts, many of which (such as variety, forma, cultivar) are empty for most datasources, so this optimization should also reduce overall import times for datasources that have any empty levels of taxonconcept. Note that this optimization is only possible now that sql_gen.simplify_expr() is able to simplify all the way to a single boolean value for the taxonconcept_required_key constraint.
Aaron Marcuse-Kubitza
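A sketch of the optimization's control flow, assuming the constraint's condition has already been simplified to SQL text. `filter_rows()` and the literal 'false' check are illustrative, not sql_io's API. When the condition is false, every row would be filtered out anyway, so all rows are dropped without evaluating the condition per row.

```python
from typing import Callable, List


def filter_rows(rows: List[dict], simplified_cond: str,
                row_pred: Callable[[dict], bool]) -> List[dict]:
    """Keep the rows that satisfy a constraint condition.

    simplified_cond is the condition's SQL text after simplification;
    row_pred evaluates the same condition on one row.
    """
    if simplified_cond == "false":
        # No row can satisfy it: drop them all in one step.
        return []
    return [r for r in rows if row_pred(r)]


rows = [{"taxonname": "Quercus"}, {"taxonname": None}]
print(filter_rows(rows, "taxonname IS NOT NULL",
                  lambda r: r["taxonname"] is not None))  # [{'taxonname': 'Quercus'}]
print(filter_rows(rows, "false", lambda r: True))         # []
```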
12:55 AM Revision 5367: Moved expression transforming functions from sql.py to sql_gen.py because they do not manipulate an actual database and merely generate SQL
Aaron Marcuse-Kubitza
12:38 AM Revision 5366: sql.py: Added true_expr, false_expr and use them where their values are used
Aaron Marcuse-Kubitza
12:34 AM Revision 5365: sql.py: simplify_expr(): Also simplify "AND true" expressions
Aaron Marcuse-Kubitza
12:30 AM Revision 5364: sql.py: simplify_expr(): Also simplify "AND false" expressions
Aaron Marcuse-Kubitza
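The "AND true" and "AND false" rules, sketched on a list of already-separated conjuncts rather than with the regexes sql_gen uses. `simplify_and()` is hypothetical and assumes any conjunct containing a lower-precedence operator (such as OR) is already parenthesized. Both rules hold under SQL's three-valued logic: x AND false is false even when x is NULL, and x AND true is x.

```python
from typing import List


def simplify_and(conjuncts: List[str]) -> str:
    """Simplify c1 AND c2 AND ... using the AND true / AND false rules.

    - any 'false' conjunct makes the whole conjunction false;
    - 'true' conjuncts are dropped (x AND true == x);
    - if nothing is left, the conjunction is true.
    """
    if any(c == "false" for c in conjuncts):
        return "false"
    rest = [c for c in conjuncts if c != "true"]
    if not rest:
        return "true"
    return " AND ".join(rest)


print(simplify_and(["a IS NOT NULL", "true"]))  # a IS NOT NULL
print(simplify_and(["x", "false"]))             # false
```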
12:19 AM Revision 5363: sql.py: Added atom_re and use it in simplify_parens()
Aaron Marcuse-Kubitza
12:19 AM Revision 5362: sql.py: Added or_re and use it in simplify_expr()
Aaron Marcuse-Kubitza
12:18 AM Revision 5361: sql.py: logic_op_re(): Added expr_re param for an expr on the other side of the operator
Aaron Marcuse-Kubitza