Activity - BIEN 3 - NCEAS Projects

Activity

From 10/26/2012 to 11/24/2012

11/24/2012

03:07 PM Revision 6447: root Makefile: PostgreSQL: $(postgresReload-Linux): Try chmoding both as your user and as the bien user: Aaron Marcuse-Kubitza
02:46 PM Revision 6446: input.Makefile: Testing: $(runTest): Ignore failed diffs when the test is compared to another test's output (e.g. in by_col mode): Aaron Marcuse-Kubitza
02:41 PM Revision 6445: bin/map: in_is_db: If table does not exist, set table to None so that db_xml.put_table() doesn't try to access it. This fixes a bug in metadata-only map spreadsheets under column-based import.: Aaron Marcuse-Kubitza
02:40 PM Revision 6444: db_xml.py: put_table(): Support None in_table by calling put() directly: Aaron Marcuse-Kubitza
02:29 PM Revision 6443: Removed no longer used geoscrub.*.sql. Use geoscrub_output instead.: Aaron Marcuse-Kubitza
02:27 PM Revision 6442: Removed no longer used geoscrub_cleaned_unique. Use geoscrub_output instead.: Aaron Marcuse-Kubitza
02:25 PM Revision 6441: Removed no longer used geoscrub_cultivated. Use analytical_stem_view.cultivated instead.: Aaron Marcuse-Kubitza
02:25 PM Revision 6440: Removed no longer used geoscrub_cultivated. Use analytical_stem_view.cultivated instead.: Aaron Marcuse-Kubitza
02:23 PM Revision 6439: schemas/vegbien.sql: analytical_stem_view: cultivated: Removed BIEN2's geoscrub_cultivated, which has now been replaced by the primary corresponding scripts (and never had particularly many matches to the locations in any case): Aaron Marcuse-Kubitza
02:14 PM Revision 6438: schemas/vegbien.sql: analytical_stem_view: cultivated: Use OR instead of _or() to combine cultivated_family_locations.country IS NOT NULL with the other values, because this field's false value should not be used in place of NULL if all the other values are NULL, as it would be with _or(). (cultivated_family_locations.country IS NOT NULL can indicate presence, but not absence, of cultivated status.): Aaron Marcuse-Kubitza
02:06 PM Revision 6437: schemas/functions.sql, vegbien.sql: _and(), _or(): Added comment comparing the function and the corresponding logical operator: Aaron Marcuse-Kubitza
01:50 PM Revision 6436: schemas/vegbien.sql: public: Added _or(), for use by analytical_stem_view: Aaron Marcuse-Kubitza
01:48 PM Revision 6435: schemas/vegbien.sql: analytical_stem_view: cultivated: Also set if family/country combination found in cultivated_family_locations: Aaron Marcuse-Kubitza
01:39 PM Revision 6434: schemas/vegbien.sql: cultivated_family_locations: Added data from nimoy:/home/boyle/bien2/geoscrub/cultivated/cult_by_taxon/flag_by_taxa.inc: Aaron Marcuse-Kubitza
01:33 PM Revision 6433: schemas/vegbien.sql: Added cultivated_family_locations to store locations where various taxon families are considered cultivated: Aaron Marcuse-Kubitza
01:24 PM Revision 6432: mappings/VegCore-VegBIEN.csv: Mapped locality description fields to location.iscultivated using _locationnarrative_is_cultivated(): Aaron Marcuse-Kubitza
01:23 PM Revision 6431: xml_func.py: Simplifying functions: Added passthru entries for _and, _or: Aaron Marcuse-Kubitza
01:06 PM Revision 6430: schemas/vegbien.sql: Added _locationnarrative_is_cultivated(): Aaron Marcuse-Kubitza
12:57 PM Revision 6429: lib/PostgreSQL-MySQL.csv: Change text to varchar(255) because text columns can't be used in indexes in MySQL: Aaron Marcuse-Kubitza
12:51 PM Revision 6428: lib/PostgreSQL-MySQL.csv: Resaved in Excel, which removed unnecessary quotes around fields: Aaron Marcuse-Kubitza
12:22 PM Revision 6427: schemas/vegbien.sql: analytical_aggregate: Added identifiedBy, which is no longer a scoping field (which would prevent scientificNameWithMorphospecies from being unique) now that there is only one taxondetermination for each taxonoccurrence: Aaron Marcuse-Kubitza
12:05 PM Revision 6426: schemas/vegbien.sql: analytical_stem_view: dateCollected: For plots data, use the locationevent obsstartdate instead of the collectiondate in order to group taxonoccurrences/stems from the same locationevent together: Aaron Marcuse-Kubitza
11:59 AM Revision 6425: schemas/vegbien.sql: analytical_* pkeys: Added dateCollected because the records are actually unique within the location*event*, not the location: Aaron Marcuse-Kubitza
11:57 AM Revision 6424: schemas/vegbien.sql: analytical_stem_view: Exclude records with no collectiondate or obsstartdate, which is required to uniquely identify a record: Aaron Marcuse-Kubitza
11:54 AM Revision 6423: analytical_stem_view: dateCollected: Use locationevent.obsstartdate when aggregateoccurrence.collectiondate is not provided: Aaron Marcuse-Kubitza
11:37 AM Revision 6422: schemas/vegbien.sql: analytical_stem_view: Include only the current taxondetermination for each taxonoccurrence, to avoid cross-joining taxondeterminations with stems and thus multiplying the number of rows for datasources that have multiple taxondeterminations per taxonoccurrence: Aaron Marcuse-Kubitza
11:33 AM Revision 6421: schemas/vegbien.sql: taxondetermination: Added AFTER trigger to set the current taxondetermination for the taxonoccurrence: Aaron Marcuse-Kubitza
11:11 AM Revision 6420: lib/PostgreSQL-MySQL.csv: Statements ending in ";": When matching any character, use .*? (with the (?s) flag) instead of [^;]* in order to allow embedded ; to be matched. This fixes a bug where a CREATE VIEW statement was not removed because it contained an embedded ; .: Aaron Marcuse-Kubitza
11:06 AM Revision 6419: schemas/vegbien.sql: taxondetermination: Added unique index to ensure that there is only one current determination for each taxonoccurrence: Aaron Marcuse-Kubitza
11:05 AM Revision 6418: lib/PostgreSQL-MySQL.csv: Remove indexes with WHERE clauses: Aaron Marcuse-Kubitza
10:34 AM Revision 6417: schemas/vegbien.sql: analytical_aggregate: Added primary key on institutionCode, plotName, scientificNameWithMorphospecies, recordNumber. Note that this makes these fields NOT NULL, which should not be a problem because there are inner joins instead of LEFT JOINs on most of the tables which provide them, and LEFT JOINed tables have their identifying fields combined to create a NOT NULL value.: Aaron Marcuse-Kubitza
10:27 AM Revision 6416: schemas/vegbien.sql: analytical_stem_view: recordNumber: Combine identifying fields in taxonoccurrence, plantobservation, and stemobservation to ensure that this field is unique within the plot and not NULL: Aaron Marcuse-Kubitza
10:23 AM Revision 6415: lib/PostgreSQL-MySQL.csv: Only match a statement-terminating ; when it's at the end of a line: Aaron Marcuse-Kubitza
10:02 AM Revision 6414: schemas/vegbien.sql: analytical_aggregate: Added primary key on institutionCode, plotName, scientificNameWithMorphospecies. Note that this makes these fields NOT NULL, which should not be a problem because there are inner joins instead of LEFT JOINs on the tables which provide them.: Aaron Marcuse-Kubitza
09:21 AM Revision 6413: db_xml.py: put(): _setDefault(): Delay the evaluation of each col_default's value until the col_default is actually retrieved. This fixes a bug in the source table mappings where the explicit source entry was being created *after* the col_default source entry, causing the initial entry, which did not have the additional fields populated, to be used instead.: Aaron Marcuse-Kubitza
09:14 AM Revision 6412: dicts.py: Added WrapDict, a dict that runs a function on each value retrieved: Aaron Marcuse-Kubitza
08:59 AM Revision 6411: db_xml.py: put(): _setDefault(): Fixed bug where need to copy col_defaults before calling update() on it, to avoid modifying the input value (which may be reused by the caller, expecting it to be unmodified): Aaron Marcuse-Kubitza
08:54 AM Revision 6410: db_xml.py: put(): col_defaults param: Fixed bug where need to use None as default value, because col_defaults will be modified by put() and the {} default value is a global instance: Aaron Marcuse-Kubitza
08:29 AM Revision 6409: mappings/VegCore-VegBIEN.csv: source table mappings: Set shortname to env var $source when it's not explicitly specified, because shortname is a required field of source: Aaron Marcuse-Kubitza
08:16 AM Revision 6408: db_xml.py: put(): Pass through the values of nodes which are text nodes: Aaron Marcuse-Kubitza
08:15 AM Revision 6407: db_xml.py: put(): put_(): Support _setDefault() values which are text nodes, by passing text strings through when put_() is run on all col_defaults entries: Aaron Marcuse-Kubitza
07:50 AM Revision 6406: db_xml.py: put(): _setDefault(): Support setting multiple col_defaults at once by using the param names themselves as the column names: Aaron Marcuse-Kubitza
07:47 AM Revision 6405: dicts.py: DictProxy: Implemented __delitem__(): Aaron Marcuse-Kubitza
07:32 AM Revision 6404: bin/map: update_in_label(): Removed hardcoded source_id col_default, which is now set in mappings/VegCore-VegBIEN.csv's output root: Aaron Marcuse-Kubitza
07:29 AM Revision 6403: mappings/VegCore-VegBIEN.csv: Set the source_id col_default to the datasource name using the new _setDefault() built-in function and _env(): Aaron Marcuse-Kubitza
07:25 AM Revision 6402: db_xml.py: put(): Added _setDefault() built-in function, which adds an entry to col_defaults: Aaron Marcuse-Kubitza
07:23 AM Revision 6401: xml_func.py: _env(): Fixed bug where need to retrieve actual string value of name param using xml_dom.NodeTextEntryIter instead of NodeEntryIter: Aaron Marcuse-Kubitza
07:20 AM Revision 6400: xml_func.py: _env(): Fixed bug where need to use xml_dom.replace_with_text() instead of xml_dom.replace() because replace() requires a DOM node: Aaron Marcuse-Kubitza
06:44 AM Revision 6399: bin/map: update_in_label(): Set $source env var to the in_label (datasource name), to make it available to _env(): Aaron Marcuse-Kubitza
06:43 AM Revision 6398: xml_func.py: Simplifying functions: Added _env(): Aaron Marcuse-Kubitza
06:05 AM Revision 6397: Added inputs/VegBank/Source/, containing referenceType metadata: Aaron Marcuse-Kubitza
06:00 AM Revision 6396: Added inputs/SpeciesLink/Source/, containing referenceType metadata: Aaron Marcuse-Kubitza
05:55 AM Revision 6395: Added inputs/SALVIAS*/Source/, containing referenceType metadata: Aaron Marcuse-Kubitza
05:47 AM Revision 6394: Added inputs/REMIB/Source/, containing referenceType metadata: Aaron Marcuse-Kubitza
05:41 AM Revision 6393: Added inputs/GBIF/Source/, containing referenceType metadata: Aaron Marcuse-Kubitza
05:34 AM Revision 6392: Added inputs/TEAM/Source/, containing referenceType metadata: Aaron Marcuse-Kubitza
05:33 AM Revision 6391: Placed inputs/TEAM/_src/Vegetation-Tree-and-Liana-Metadata-1.5.pdf under version control: Aaron Marcuse-Kubitza
05:27 AM Revision 6390: inputs/FIA/import_order.txt: Added Source, which needs to come before Organism: Aaron Marcuse-Kubitza
05:22 AM Revision 6389: Added inputs/Madidi/Source/, containing referenceType metadata: Aaron Marcuse-Kubitza
05:19 AM Revision 6388: Added inputs/FIA/Source/, containing referenceType metadata: Aaron Marcuse-Kubitza
05:14 AM Revision 6387: Added inputs/CVS/Source/, containing referenceType metadata: Aaron Marcuse-Kubitza
05:07 AM Revision 6386: Added inputs/CTFS/Source/, containing referenceType metadata: Aaron Marcuse-Kubitza
05:05 AM Revision 6385: bin/map: Support map spreadsheets containing only metadata mappings (with no corresponding staging table), by falling back to an empty table when the named table does not exist: Aaron Marcuse-Kubitza
04:19 AM Revision 6384: mappings/VegCore-VegBIEN.csv: institutionCode: Also map to the sourcename's matched source, which identifies whether the source is a herbarium: Aaron Marcuse-Kubitza
04:08 AM Revision 6383: schemas/vegbien.sql: source: Made shortname NOT NULL to ensure that all datasources have a globally-unique short name: Aaron Marcuse-Kubitza
03:33 AM Revision 6382: import_all: Added import of inputs/.herbaria/ before the main import: Aaron Marcuse-Kubitza
03:28 AM Revision 6381: Added inputs/.herbaria/: Aaron Marcuse-Kubitza
03:25 AM Revision 6380: input.Makefile: SVN: add: Also run %/add on all data subdirs: Aaron Marcuse-Kubitza
03:21 AM Revision 6379: input.Makefile: Existing maps discovery: Moved tables discovery to its own section, above SVN so it can be used by SVN: Aaron Marcuse-Kubitza
03:11 AM Revision 6378: mappings/VegCore.csv: referenceType: Fixed sort order: Aaron Marcuse-Kubitza
03:09 AM Revision 6377: mappings/VegCore-VegBIEN.csv: Mapped referenceType: Aaron Marcuse-Kubitza
03:06 AM Revision 6376: mappings/VegCore.csv: Added referenceType: Aaron Marcuse-Kubitza
02:10 AM Revision 6375: mappings/VegCore-VegBIEN.csv: institutionCode: Remap to source.shortname when specimen information is not provided, as is the case for geoscrub.herbaria on nimoy: Aaron Marcuse-Kubitza
01:47 AM Revision 6374: inputs/bien_web/observation/map.csv: Mapped observationID->occurrenceID: Aaron Marcuse-Kubitza
01:20 AM Revision 6373: README.TXT: Datasource setup: Add input data for each table present in the datasource: Added step to run `make inputs/<datasrc>/<table>/install` if the table is in a .sql export: Aaron Marcuse-Kubitza
01:17 AM Revision 6372: README.TXT: Datasource setup: MySQL inputs: Added step to install the export, which needs to happen before mapping individual tables: Aaron Marcuse-Kubitza
01:13 AM Revision 6371: README.TXT: Datasource setup: Add input data for each table present in the datasource: Replaced "CSV" with "CSV(s)" because there can be multiple CSV part files for one table: Aaron Marcuse-Kubitza
01:11 AM Revision 6370: README.TXT: Datasource setup: Add input data for each table present in the datasource: Don't add a CSV or create.sql file for tables that are in a .sql export: Aaron Marcuse-Kubitza
01:06 AM Revision 6369: README.TXT: Schema changes: Sync ERD with vegbien.sql schema: Changed instructions to just select tables with arrows next to them rather than all tables, because each table that's updated will have its lines reset and the number of lines that need to be fixed should be minimized: Aaron Marcuse-Kubitza
01:02 AM Revision 6368: README.TXT: Datasource setup: Accept the test cases: `make inputs/<datasrc>/test by_col=1`: Clarified that errors could indicate bugs in the *VegBIEN* unique constraints: Aaron Marcuse-Kubitza
12:59 AM Revision 6367: README.TXT: Data import: To remake analytical DB: Added explicit public schema setting since the analytical DB is often manually remade *after* the public schema has been renamed. Removed warnings that certain commands must be run after running make_analytical_db, because the "remake analytical DB" instructions no longer require this.: Aaron Marcuse-Kubitza
12:48 AM Revision 6366: README.TXT: Datasource setup: MySQL inputs: Added steps to export the database to a PostgreSQL-compatible .sql file, which can be directly used by the install process without the need to export each table as CSV: Aaron Marcuse-Kubitza
12:36 AM Revision 6365: README.TXT: Datasource setup: Choosing a table name: Documented that for .sql exports, you must use the name of the table in the DB export, not a suggested or custom name: Aaron Marcuse-Kubitza
12:34 AM Revision 6364: input.Makefile: Staging tables installation: $(dbExports): Also include the files that would be generated by running _MySQL/*.make and creating the corresponding PostgreSQL translations: Aaron Marcuse-Kubitza
12:18 AM Revision 6363: input.Makefile: Staging tables installation: Moved .sql export downloading and translation to separate Input data retrieval section: Aaron Marcuse-Kubitza

11/23/2012

11:41 PM Revision 6362: Added lib/MySQL.{data,schema}.sql.make templates to use in datasources' _MySQL/ dirs: Aaron Marcuse-Kubitza
10:38 PM Revision 6361: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza

11/21/2012

11:13 PM Revision 6360: schemas/vegbien.sql: analytical_stem_view: scientificNameWithMorphospecies: Changed to use Brad's formula, which concatenates genus and specific_epithet/morphospecies, and uses family if just the family is present, rather than using the full taxonomic name: Aaron Marcuse-Kubitza
11:05 PM Revision 6359: mappings/VegCore-VegBIEN.csv: Concatenated taxonlabel: Don't prepend family if the taxonName/scientificName itself is the family, so that the family is not duplicated in the concatenated taxonomic name: Aaron Marcuse-Kubitza
10:19 PM Revision 6358: schemas/functions.sql: _nullIf(): Removed NOT NULL constraint on null param, to support use a (nullable) column rather than a literal as the null-equivalent value: Aaron Marcuse-Kubitza
09:08 PM Revision 6357: xml_func.py: Simplifying functions: Added _nullIf(), to remove calls with no null value: Aaron Marcuse-Kubitza
09:00 PM Revision 6356: xml_dom.py: Added prune_parent(): Aaron Marcuse-Kubitza
08:51 PM Revision 6355: schemas/functions.sql: Added _or(): Aaron Marcuse-Kubitza
08:20 PM Revision 6354: schemas/functions.sql: Added _merge_words(): Aaron Marcuse-Kubitza
08:04 PM Revision 6353: schemas/vegbien.sql: analytical_*: Renamed geosourceValid to geovalid. (It had gotten renamed in the reference -> source rename.): Aaron Marcuse-Kubitza
08:00 PM Revision 6352: mappings/VegCore.csv: Renamed georeferenceValid to geovalid: Aaron Marcuse-Kubitza
07:48 PM Revision 6351: inputs/import.stats.xls: Updated import times. This now includes the Canadensys plants-related datasources HIBG, JBM, QFA, TRT, TRTE, UBC, VASCAN, and WIN.: Aaron Marcuse-Kubitza

11/20/2012

09:59 PM Revision 6350: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
09:42 PM Revision 6349: Added inputs/HIBG/: Aaron Marcuse-Kubitza
09:33 PM Revision 6348: Added inputs/JBM/: Aaron Marcuse-Kubitza
09:29 PM Revision 6347: Added inputs/VASCAN/: Aaron Marcuse-Kubitza
09:22 PM Revision 6346: Added inputs/WIN/: Aaron Marcuse-Kubitza
09:18 PM Revision 6345: Added inputs/UBC/: Aaron Marcuse-Kubitza
09:14 PM Revision 6344: Added inputs/TRTE/Specimen/: Aaron Marcuse-Kubitza
09:11 PM Revision 6343: Added inputs/QFA/: Aaron Marcuse-Kubitza
09:06 PM Revision 6342: Added inputs/TRT/: Aaron Marcuse-Kubitza
08:21 PM Revision 6341: schemas/vegbien.sql: Allow bien_read to SELECT from all tables in the public schema: Aaron Marcuse-Kubitza
08:10 PM Revision 6340: schemas/vegbien.sql: Allow bien_read to SELECT from analytical_aggregate, analytical_stem: Aaron Marcuse-Kubitza
08:09 PM Revision 6339: lib/PostgreSQL-MySQL.csv: Removed GRANT/REVOKE because SCHEMA GRANTs are not supported in MySQL: Aaron Marcuse-Kubitza
07:57 PM Revision 6338: pg_dump_vegbien: non-$owners mode: Removed --no-privileges in order to include GRANTs to other users: Aaron Marcuse-Kubitza
07:49 PM Revision 6337: root Makefile: PostgreSQL: $(postgresReload-Linux): Making schemas/*.conf world-readable: Fixed bug where need to do this as the bien user, which owns the files: Aaron Marcuse-Kubitza
07:46 PM Revision 6336: root Makefile: PostgreSQL: $(postgresReload-*): Make schemas/*.conf world-readable so it's readable by the postgres user, which the .conf installation is run as: Aaron Marcuse-Kubitza
07:43 PM Revision 6335: root Makefile: PostgreSQL: $(postgresReload-*): Also install pg_hba.conf: Aaron Marcuse-Kubitza
07:36 PM Revision 6334: root Makefile: PostgreSQL: Added postgres_reload to reload postgresql.conf and restart the DB: Aaron Marcuse-Kubitza
07:30 PM Revision 6333: root Makefile: PostgreSQL: postgres-*: Factored postgresql.conf installation out in to $(postgresReload-*): Aaron Marcuse-Kubitza
07:15 PM Revision 6332: schemas/: Synced pg_hba.conf and pg_hba.Mac.conf's bien entries, which adds phpPgAdmin support (template1 access) on the Mac and bien_read access on Linux: Aaron Marcuse-Kubitza
06:56 PM Revision 6331: root Makefile: VegBIEN DB: DB and users: Also create bien_read user for read-only access to the DB: Aaron Marcuse-Kubitza
06:53 PM Revision 6330: schemas/pg_hba.Mac.conf: Allow access to the bien group rather than just the bien user, which will include bien_read: Aaron Marcuse-Kubitza
06:35 PM Revision 6329: schemas/pg_hba.Mac.conf: Fixed bug where also need to allow password-based logins from the same machine, in order to work with pgAdmin: Aaron Marcuse-Kubitza
06:06 PM Revision 6328: schemas/vegbien.ERD.poster.pdf: Updated to 33x51in poster size and 0.25in margins: Aaron Marcuse-Kubitza
05:35 PM Revision 6327: README.TXT: Schema changes: Creating a poster of the ERD: Added section with the State St FedEx Kinkos' rates for posters ($10.25/sq ft laminated): Aaron Marcuse-Kubitza
05:29 PM Revision 6326: README.TXT: Schema changes: Creating a poster of the ERD: Changed "Measure the fractional height of the text onscreen" to "Determine the poster size": Aaron Marcuse-Kubitza
05:19 PM Revision 6325: Added schemas/vegbien.ERD.poster.pdf: Aaron Marcuse-Kubitza
04:10 PM Revision 6324: Added schemas/vegbien.ERD.poster.core.print_options.png: Aaron Marcuse-Kubitza
04:01 PM Revision 6323: Added schemas/vegbien.ERD.poster.core.pdf: Aaron Marcuse-Kubitza
03:29 PM Revision 6322: schemas/pg_hba.Mac.conf: Fixed bug where needed ident entry for postgres superuser: Aaron Marcuse-Kubitza
03:18 PM Revision 6321: Added config/bien_read_password: Aaron Marcuse-Kubitza
02:53 PM Revision 6320: README.TXT: Schema changes: Added instructions to calculate the minimum VegBIEN poster size (to make the text as least as big as on the VegBank ERD poster), which is 35x54in portrait: Aaron Marcuse-Kubitza

11/19/2012

08:01 PM Revision 6319: schemas/vegbien.sql: analytical_stem_view: cultivated: Use location.iscultivated when taxonoccurrence.iscultivated is not available: Aaron Marcuse-Kubitza
07:55 PM Revision 6318: Added inputs/FIA/FIA_COND_unique/, which contains the oldgrowth flag: Aaron Marcuse-Kubitza
07:53 PM Revision 6317: mappings/VegCore-VegBIEN.csv: Mapped oldGrowth: Aaron Marcuse-Kubitza
07:48 PM Revision 6316: schemas/functions.sql: Added _not(): Aaron Marcuse-Kubitza
07:43 PM Revision 6315: mappings/VegCore.csv: Added oldGrowth: Aaron Marcuse-Kubitza
07:36 PM Revision 6314: mappings/VegCore-VegBIEN.csv: Remapped cultivated to location when a TaxonOccurrence is not provided, indicating that the record is a plot: Aaron Marcuse-Kubitza
07:35 PM Revision 6313: mappings/VegCore-VegBIEN.csv: Remapped cultivated to location when a TaxonOccurrence is not provided, indicating that the record is a plot: Aaron Marcuse-Kubitza
07:25 PM Revision 6312: schemas/vegbien.sql: location: Added iscultivated for cases when entire plots rather than individual taxonoccurrences are marked as cultivated: Aaron Marcuse-Kubitza
07:17 PM Revision 6311: inputs/FIA/: Added FIA_COND table from nimoy.geoscrub and code to generate a unique plot table from it, including the oldgrowth calculated field: Aaron Marcuse-Kubitza
06:46 PM Revision 6310: Added inputs/FIA/Organism/postprocess.sql to cast PlotCD to a bigint: Aaron Marcuse-Kubitza
06:22 PM Revision 6309: my2pg: Also remove (#) after bigint: Aaron Marcuse-Kubitza
06:05 PM Revision 6308: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
06:03 PM Revision 6307: schemas/vegbien.ERD.mwb: Fixed lines: Aaron Marcuse-Kubitza
05:54 PM Revision 6306: schemas/vegbien.ERD.mwb: Fixed lines: Aaron Marcuse-Kubitza
05:54 PM Revision 6305: schemas/vegbien.sql: source: Renamed fulltext to citation because according to the VegBank data dictionary <http://vegbank.org/vegbank/views/dba_tabledescription_detail.jsp?view=detail&wparam=reference&entity=dba_tabledescription&where=where_tablename#fulltext> this is actually the full text *of the reference citation*, not of the reference itself (it would be unusual to store that in VegBank): Aaron Marcuse-Kubitza
05:48 PM Revision 6304: schemas/vegbien.sql: Removed no longer needed sourcejournal, which can be stored in source and pointed to via parent_id instead of sourcejournal_id. sourcejournal.journal maps to source.fulltext, issn to isbn, and abbreviation to shortname.: Aaron Marcuse-Kubitza
05:48 PM Revision 6303: mappings/VegCore-VegBIEN.csv: Mapped acceptedCounty, county to the matched place: Aaron Marcuse-Kubitza
05:41 PM Revision 6302: schemas/vegbien.sql: source: Added matched_source_id: Aaron Marcuse-Kubitza
05:34 PM Revision 6301: sql.py: parse_exception(): function MissingCastException: If 1st param is hstore, only perform the cast on the value param. This fixes a bug in _map() calls whose value is a non-text type, such as SALVIAS.plotMetadata.AccessCode.: Aaron Marcuse-Kubitza
05:32 PM Revision 6300: sql_io.py: cast(): Use sql_gen.Cast() to generate the cast, in order to take advantage of its support for casts to unknown: Aaron Marcuse-Kubitza
05:30 PM Revision 6299: sql_gen.py: Cast: Support casts to unknown by casting to text first: Aaron Marcuse-Kubitza
04:59 PM Revision 6298: schemas/postgresql.conf: Turn on the error log: Aaron Marcuse-Kubitza
04:58 PM Revision 6297: schemas/pg_hba.conf: Also grant the bien user access to template1, which is accessed by phpPgAdmin: Aaron Marcuse-Kubitza
04:24 PM Revision 6296: schemas/vegbien.sql: source: Added parent_id for nested sources, e.g. an article in a journal: Aaron Marcuse-Kubitza
04:23 PM Revision 6295: lib/forwarding.Makefile: $(subdirs): Also exclude .archive/: Aaron Marcuse-Kubitza
04:09 PM Revision 6294: mappings/VegCore-VegBIEN.csv: Mapped acceptedCounty, county to the matched place: Aaron Marcuse-Kubitza
04:08 PM Revision 6293: schemas/vegbien.ERD.mwb: Fixed lines: Aaron Marcuse-Kubitza
03:54 PM Revision 6292: Renamed inputs/_archive/ to .archive/ so it wouldn't be treated as a datasource: Aaron Marcuse-Kubitza
03:49 PM Revision 6291: README.TXT: Documentation: Redmine-formatted list of steps for column-based import: Use ACAD instead of QMOR, which was removed: Aaron Marcuse-Kubitza
03:45 PM Revision 6290: inputs/Makefile: Import logs: $(rsyncLogs): Include log files at any depth in the directory tree rather than just 1-2 levels deep. This adds log files whose containing directories have been moved to _archive/ directories.: Aaron Marcuse-Kubitza
03:29 PM Revision 6289: Added inputs/_archive/: Aaron Marcuse-Kubitza
03:27 PM Revision 6288: Removed inputs/QMOR/ because it's an insect collection: Aaron Marcuse-Kubitza
03:25 PM Revision 6287: schemas/vegbien.sql: projectcontributor: Removed surname, since this information is stored in party_id->party.surname: Aaron Marcuse-Kubitza
03:23 PM Revision 6286: schemas/vegbien.sql: projectcontributor: Removed cheatrole, since there is already a role field and this field was unused in VegBank: Aaron Marcuse-Kubitza
03:21 PM Revision 6285: schemas/vegbien.sql: role: Added values from projectcontributor.ROLE_ID <http://vegbank.org/vegbank/views/dba_tabledescription_detail.jsp?view=detail&wparam=projectcontributor&entity=dba_tabledescription&where=where_tablename#ROLE_ID>: Aaron Marcuse-Kubitza
03:17 PM Revision 6284: schemas/vegbien.sql: sourcecontributor: role: Changed type to role: Aaron Marcuse-Kubitza
03:15 PM Revision 6283: schemas/vegbien.sql: role enum: Added VegBank data dictionary values from <http://vegbank.org/vegbank/views/dba_fielddescription_detail.jsp?view=detail&wparam=1331&entity=dba_fielddescription&params=1331>: Aaron Marcuse-Kubitza
03:03 PM Revision 6282: schemas/vegbien.sql: sourcecontributor: Renamed position to order for consistency with the ERD definition <http://vegbank.org/vegbank/views/dba_tabledescription_detail.jsp?view=detail&wparam=referencecontributor&entity=dba_tabledescription&where=where_tablename#position> and disambiguation from other meanings of position which are similar to role: Aaron Marcuse-Kubitza
03:00 PM Revision 6281: schemas/vegbien.sql: sourcecontributor: Renamed roletype to role for consistency with the ERD definition <http://vegbank.org/vegbank/views/dba_tabledescription_detail.jsp?view=detail&wparam=referencecontributor&entity=dba_tabledescription&where=where_tablename#roleType>: Aaron Marcuse-Kubitza
02:53 PM Revision 6280: inputs/.geoscrub/geoscrub_output/map.csv: Mapped to county, acceptedCounty: Aaron Marcuse-Kubitza
02:52 PM Revision 6279: mappings/VegCore-VegBIEN.csv: Mapped acceptedCounty, county to the matched place: Aaron Marcuse-Kubitza
02:50 PM Revision 6278: mappings/VegCore.csv: Added acceptedCounty: Aaron Marcuse-Kubitza
02:42 PM Revision 6277: schemas/pg_hba.Mac.conf: Changed to match schemas/pg_hba.conf: Aaron Marcuse-Kubitza
02:37 PM Revision 6276: schemas/pg_hba.conf: Fixed bug where also need an IPv6 bien entry with md5 authentication, because the IPv4 md5 authentication does not apply to "localhost", which is translated to the IPv6 address ::1: Aaron Marcuse-Kubitza
02:27 PM Revision 6275: schemas/pg_hba.conf: Fixed bug where also need a *local* bien entry with md5 authentication, because the host-based md5 authentication applies only to literal IP addresses, not "localhost": Aaron Marcuse-Kubitza
02:08 PM Revision 6274: Added schemas/pg_hba.Mac.conf: Aaron Marcuse-Kubitza
02:01 PM Revision 6273: schemas/pg_hba.conf: Restrict all accesses to the server except the bien user accessing vegbien using ident or a password, and the postgres superuser logging in using ident: Aaron Marcuse-Kubitza
01:25 PM Revision 6272: inputs/.geoscrub/geoscrub_output/map.csv: Mapped countyvalidity to latLongInCounty: Aaron Marcuse-Kubitza
01:24 PM Revision 6271: schemas/functions.sql: _map(): Fixed bug where entries that map to NULL were incorrectly being treated as if the entry didn't exist. Note that -> returns NULL both if the entry's value is NULL and if the entry doesn't exist, so ? must be used to recheck the presence of the key in the hstore.: Aaron Marcuse-Kubitza
12:48 PM Revision 6270: mappings/VegCore-VegBIEN.csv: Mapped latLongInCounty: Aaron Marcuse-Kubitza
12:46 PM Revision 6269: mappings/VegCore.csv: Added latLongInCounty: Aaron Marcuse-Kubitza
12:43 PM Revision 6268: schemas/vegbien.sql: Added distance_to_county_m. Note that this can also be used to store latLongInCounty by mapping true to 0 and false to -1.: Aaron Marcuse-Kubitza
12:22 PM Revision 6267: schemas/pg_hba.conf: Changed trust authentication back to ident/md5. Not sure how it got set to trust since I used md5 when enabling remote access to the DB for the bien user.: Aaron Marcuse-Kubitza
12:08 PM Revision 6266: Added schemas/pg_hba.conf: Aaron Marcuse-Kubitza
11:48 AM Revision 6265: schemas/vegbien.sql: place: Removed placecode to prevent datasources from creating duplicate entries for the same place, with different placecodes. This was a problem with the original BIEN2 geoscrub dataset, which contained duplicates.: Aaron Marcuse-Kubitza
10:54 AM Revision 6264: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza

11/16/2012

07:06 PM Revision 6263: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
07:03 PM Revision 6262: schemas/vegbien.sql: analytical_stem_view: Fixed bug where need to join taxonoccurrence.collector_id to party because it's now an fkey rather than a literal name: Aaron Marcuse-Kubitza
06:58 PM Revision 6261: schemas/vegbien.sql: analytical_*: Added coordinateUncertaintyInMeters: Aaron Marcuse-Kubitza
06:34 PM Revision 6260: schemas/vegbien.sql: analytical_stem_view: Join to newWorldCountries on 2-digit ISO code instead of country name, to increase (BIEN2) newWorldCountries and GADM overlap: Aaron Marcuse-Kubitza
06:29 PM Revision 6259: psql_vegbien: Run with sh because it no longer needs bash support: Aaron Marcuse-Kubitza
06:28 PM Revision 6258: psql_script_vegbien: Fixed bug where needs to be run with bash instead of sh: Aaron Marcuse-Kubitza
06:27 PM Revision 6257: Added inputs/newWorld/iso_code_gadm/: Aaron Marcuse-Kubitza
06:16 PM Revision 6256: Added inputs/newWorld/_src/: Aaron Marcuse-Kubitza
06:15 PM Revision 6255: inputs/XAL/Specimen/map.csv: darwin:FieldNumber: Removed command to determine that field is unused, because UNUSED is a factual assertion that does not need a reason to be specified each time: Aaron Marcuse-Kubitza
06:11 PM Revision 6254: inputs/XAL/Specimen/map.csv: Remapped darwin:CoordinatePrecision to UNUSED: Aaron Marcuse-Kubitza
06:08 PM Revision 6253: inputs/NY/Specimen/map.csv: Remapped CoordinatePrecision to coordinateUncertaintyInMeters, assuming units of m based on the range and precision of values: Aaron Marcuse-Kubitza
06:03 PM Revision 6252: mappings/VegCore.csv: coordinatePrecision: Added units (degrees) to form coordinatePrecision_deg: Aaron Marcuse-Kubitza
06:00 PM Revision 6251: mappings/VegCore-VegBIEN.csv: Removed mapping for coordinatePrecision, which is not the same as coordsaccuracy_m. coordinatePrecision is instead "the precision of the coordinates" themselves in degrees (<http://rs.tdwg.org/dwc/terms/#coordinatePrecision>).: Aaron Marcuse-Kubitza
05:53 PM Revision 6250: schemas/vegbien.sql: coordinates: Changed coordinates.coordsaccuracy_deg units to m: Aaron Marcuse-Kubitza
05:51 PM Revision 6249: Regenerated inputs/bien_web/observation/test.xml.ref: Aaron Marcuse-Kubitza
05:17 PM Revision 6248: schemas/vegbien.ERD.mwb: Added projectcontributor, locationeventcontributor to ERD: Aaron Marcuse-Kubitza
05:02 PM Revision 6247: schemas/vegbien.sql: higher_plant_group_nodes: Added root->NULL mapping to store all the families that don't match any higher plant group: Aaron Marcuse-Kubitza
04:58 PM Revision 6246: schemas/vegbien.sql: higher_plant_group_nodes: Allow NULL values for higher_plant_group, to allow mapping all remaining families to NULL in family_higher_plant_group: Aaron Marcuse-Kubitza
04:09 PM Revision 6245: psql_vegbien: Fixed bug where did not display command prompt when run from command line, by moving automatic setting of search_path to psql_script_vegbien. psql_script_vegbien is now used instead of psql_vegbien wherever the search_path needs to be set, so removing this functionality from psql_vegbien is not a problem.: Aaron Marcuse-Kubitza
04:03 PM Revision 6244: input.Makefile: BIEN commands: $(psqlAsBien): Use psql_script_vegbien, which automatically adds the $(psqlOpts), instead of psql_vegbien: Aaron Marcuse-Kubitza
03:54 PM Revision 6243: schemas/functions.sql: _map(): Support any entry having the value '*' (not just the '*' entry), which passes through that value. Support an entry having the value '!', which raises an exception.: Aaron Marcuse-Kubitza
03:40 PM Revision 6242: inputs/SALVIAS/plotMetadata_/map.csv: AccessCode: Removed _map entry for 4, which does not apply to plots: Aaron Marcuse-Kubitza
03:07 PM Revision 6241: schemas/vegbien.ERD.mwb: Fixed lines: Aaron Marcuse-Kubitza
01:00 PM Revision 6240: schemas/vegbien.sql: locationevent: Added accesslevel: Aaron Marcuse-Kubitza
12:54 PM Revision 6239: inputs/SALVIAS/plotMetadata_/map.csv: Mapped AccessCode to dcterms:accessRights with appropriate _map filter: Aaron Marcuse-Kubitza
12:49 PM Revision 6238: Added inputs/.geoscrub/geoscrub_cleaned_unique/_no_import to disable geoscrub_cleaned_unique, since the new geoscrub_output supersedes it: Aaron Marcuse-Kubitza
12:47 PM Revision 6237: Added inputs/.geoscrub/geoscrub_output/: Aaron Marcuse-Kubitza
12:46 PM Revision 6236: Added inputs/.geoscrub/_src/README.TXT: Aaron Marcuse-Kubitza
12:29 PM Revision 6235: Regenerated inputs/bien_web/observation/VegBIEN.csv: Aaron Marcuse-Kubitza
12:24 PM Revision 6234: Added inputs/.geoscrub/_src/ to store Jim's geoscrub CSV: Aaron Marcuse-Kubitza
12:21 PM Revision 6233: schemas/functions.sql: _map(): Changed error message for an unmapped value to "Value not in map" rather than "Invalid map value", because an unmapped value is not necessarily explicitly invalid: Aaron Marcuse-Kubitza
12:16 PM Revision 6232: inputs/VegBank/plot_/map.csv: confidentialitystatus filter: Merged mappings for 0 with other public-equivalent fields. Note that fuzzed plots are still public, because the private columns have been removed.: Aaron Marcuse-Kubitza

11/15/2012

11:16 PM Revision 6231: inputs/VegBank/plot_/map.csv: Mapped confidentialitystatus to dcterms:accessRights with an appropriate _map filter: Aaron Marcuse-Kubitza
11:16 PM Revision 6230: mappings/VegCore-VegBIEN.csv: Mapped dcterms:accessRights: Aaron Marcuse-Kubitza
11:14 PM Revision 6229: schemas/functions.sql: _map(): Raise data_exception if value not in map and no default provided (not the same as a NULL default value): Aaron Marcuse-Kubitza
10:54 PM Revision 6228: mappings/VegCore-VegBIEN.csv: verbatimGrowthForm: Removed _map filter, which applied only to SALVIAS and has now been moved to the applicable SALVIAS tables: Aaron Marcuse-Kubitza
10:51 PM Revision 6227: inputs/SALVIAS*/plotObservations/map.csv: Remapped Habit to growthForm with _map filter applied: Aaron Marcuse-Kubitza
10:43 PM Revision 6226: sql_io.py: put_table(): Special handling for functions with hstore params: Fixed bug where need to unwrap literal values of mapping, which might be sql_gen.Literal objects: Aaron Marcuse-Kubitza
10:43 PM Revision 6225: sql_gen.py: Added get_value(): Aaron Marcuse-Kubitza
10:42 PM Revision 6224: dicts.py: join(): Added support for unhashable types, which are passed through. This adds support for SQL literal values which are dicts (hstores).: Aaron Marcuse-Kubitza
10:25 PM Revision 6223: xml_func.py: Removed no longer used _map(), which has been replaced by a corresponding DB function: Aaron Marcuse-Kubitza
10:22 PM Revision 6222: schemas/functions.sql: Added _map(), which uses the new hstore functionality. This expands _map() functionality to column-based import.: Aaron Marcuse-Kubitza
10:20 PM Revision 6221: root Makefile: VegBIEN DB: DB and bien user: mk_db: hstore extension: Fixed bug where need to use `CREATE EXTENSION hstore SCHEMA pg_catalog` instead of createlang, because hstore must be explicitly created in pg_catalog or else it will be created in the public schema instead, causing it to get deleted every time the public schema is reinstalled and cascading the delete to everything (including in other schemas) that uses hstore: Aaron Marcuse-Kubitza
10:04 PM Revision 6220: sql_io.py: put_table(): Added special handling for functions with hstore params. Note that although _map() doesn't exist yet as a DB function, this code must be in place before _map() is created to avoid param type mismatch errors.: Aaron Marcuse-Kubitza
08:57 PM Revision 6219: root Makefile: PostgreSQL: postgres-Linux: Changed plpython to plpython3 in order to install plpython3u: Aaron Marcuse-Kubitza
08:30 PM Revision 6218: schemas/py_functions.sql: _date(): Removed features that require dateutil, which is not available under plpython3u. This includes removing the now-unused date string parameter.: Aaron Marcuse-Kubitza
08:26 PM Revision 6217: mappings/VegCore-VegBIEN.csv: Removed _date/date, because _date using a string date argument is no longer supported under plpython3u (dateutil is missing). Note that PostgreSQL's own date parsing is sufficient for most dates, so this use of _date is not strictly necessary and removing it will improve import times.: Aaron Marcuse-Kubitza
08:12 PM Revision 6216: schemas/py_functions.sql: Replaced xrange() with range() for plpython3u: Aaron Marcuse-Kubitza
08:05 PM Revision 6215: root Makefile: Python: python-Linux: Also install python3, needed by plpython3u: Aaron Marcuse-Kubitza
08:04 PM Revision 6214: schemas/py_functions.sql: Updated except clause syntax for PostgreSQL 9.1.6: Aaron Marcuse-Kubitza
08:03 PM Revision 6213: schemas/*.sql: Updated for PostgreSQL 9.1.6, which has standard_conforming_strings = on (which affects \-escapes in string literals), escape_string_warning not explicitly set, and uses ALTER TABLE ONLY instead of ALTER TABLE: Aaron Marcuse-Kubitza
07:49 PM Revision 6212: README.TXT: Removed step to manually run make_analytical_db, now that this is done automatically by import_all. Added separate instructions to remake the analytical DB.: Aaron Marcuse-Kubitza
07:45 PM Revision 6211: import_all: Change to main directory make targets are run from. Use relative paths to bin/ commands, which is possible now that the current dir is set.: Aaron Marcuse-Kubitza
07:41 PM Revision 6210: import_all: Create a background process that waits until the import is done and then runs make_analytical_db: Aaron Marcuse-Kubitza
07:36 PM Revision 6209: Added waitpid: Aaron Marcuse-Kubitza
06:52 PM Revision 6208: import_all: Documented that `wait %1` waits for asynchronous commands: Aaron Marcuse-Kubitza
06:40 PM Revision 6207: root Makefile: VegBIEN DB: DB and bien user: mk_db: Also install hstore extension. Note that this is only supported by PostgreSQL 9.1+.: Aaron Marcuse-Kubitza
06:33 PM Revision 6206: input.Makefile: Editing import: Updated queries for current schema: Aaron Marcuse-Kubitza
06:27 PM Revision 6205: inputs/.geoscrub/geoscrub_cultivated/create.sql: Fixed bug where need to filter out NULL lat/longs because primary keys can't contain NULL values: Aaron Marcuse-Kubitza
06:17 PM Revision 6204: schemas/py_functions.sql: Changed function languages to plpython3u to match the new installed version. Note that plpythonu is not available on Mac under PostgreSQL 9.1.6.: Aaron Marcuse-Kubitza
05:59 PM Revision 6203: reinstall_all: Fixed bug where also need to include datasources starting with . such as .TNRS/, by using with_all's new $all option: Aaron Marcuse-Kubitza
05:58 PM Revision 6202: with_all: Added $all option to also include datasources starting with . such as .TNRS/. This is necessary for reinstall_all, which needs to install *all* datasources.: Aaron Marcuse-Kubitza
05:18 PM Revision 6201: root Makefile: PostgreSQL: $(pg_ctl-*): Fixed bug where need to pause for a few seconds after restarting PostgreSQL, to wait for the server to be ready to accept connections: Aaron Marcuse-Kubitza
05:12 PM Revision 6200: root Makefile: Installation: uninstall: Removed inputs/uninstall because the DB will be uninstalled anyway, so the inputs don't need to be individually removed first: Aaron Marcuse-Kubitza
05:11 PM Revision 6199: schemas/postgresql.Mac.conf: Added back unix_socket_directory setting, which is apparently still needed in PostgreSQL 9.1.6: Aaron Marcuse-Kubitza
05:06 PM Revision 6198: root Makefile: PostgreSQL: postgres-Linux: Also install postgresql.conf: Aaron Marcuse-Kubitza
04:54 PM Revision 6197: root Makefile: PostgreSQL: postgres-Darwin: Also install postgresql.Mac.conf: Aaron Marcuse-Kubitza
04:40 PM Revision 6196: root Makefile: PostgreSQL: $(macUsePostgresLib): Factored out PostgreSQL dir to $(macPostgresDir): Aaron Marcuse-Kubitza
04:38 PM Revision 6195: schemas/postgresql.Mac.conf: Updated to PostgreSQL 9.1.6's postgresql.conf: Aaron Marcuse-Kubitza
04:29 PM Revision 6194: root Makefile: Datasources: inputs/install: Fixed bug where need to `wait` after `. bin/reinstall_all` to wait for inputs to finish installing before installing the public schema. This is necessary because views in the public schema now have dependencies on some datasources, such as TNRS.: Aaron Marcuse-Kubitza
04:25 PM Revision 6193: root Makefile: PostgreSQL: $(psqlAsAdmin): Use new $(asAdmin): Aaron Marcuse-Kubitza
04:25 PM Revision 6192: root Makefile: VegBIEN DB: Schemas: schemas/public/install: Use $(psqlNoSearchPath) instead of $(psqlAsBien) because the search_path is set by vegbien.sql: Aaron Marcuse-Kubitza
04:16 PM Revision 6191: root Makefile: Datasources: Added inputs/install override which runs `. bin/reinstall_all` instead, in order to install all datasources simultaneously: Aaron Marcuse-Kubitza
04:03 PM Revision 6190: root Makefile: Python: python-Darwin: Added instructions to install Python 3.2 (Python 2 comes with Mac OS X, but Python 3.2 is needed for plpython3u): Aaron Marcuse-Kubitza
03:55 PM Revision 6189: root Makefile: VegBIEN DB: DB and bien user: mk_db: Updated for PostgreSQL 9.1.6 on the Mac, which only provides plpython3u (Python 3): Aaron Marcuse-Kubitza
03:54 PM Revision 6188: root Makefile: VegBIEN DB: DB and bien user: mk_db: Updated for PostgreSQL 9.1.6, which requires the DB name to be specified on the command line instead of in the $PGDATABASE env var set by postgres_vegbien. Fixed bug where need to run createlang as postgres superuser, because plpythonu is an untrusted language (with unrestricted access to the entire DB).: Aaron Marcuse-Kubitza
03:51 PM Revision 6187: root Makefile: PostgreSQL: postgres-Darwin: Updated for PostgreSQL 9.1.6, which requires some /usr/lib/ symlinks to be changed to newer versions installed in the PostgreSQL lib/ dir: Aaron Marcuse-Kubitza
03:49 PM Revision 6186: input.Makefile: $(psqlAsBien), csv2db: Turn off the automatic search_path where needed, because when the input is installed, the schemas in it may not exist yet: Aaron Marcuse-Kubitza
02:16 PM Revision 6185: schemas/vegbien.sql: place: Renamed geosource_valid to geovalid. (It had gotten renamed in the reference -> source rename.): Aaron Marcuse-Kubitza
02:12 PM Revision 6184: schemas/vegbien.sql: location: Renamed confidentialitystatus->accesslevel, confidentialityreason->accessconditions to match the corresponding fields in source. Note that accessconditions stores more than confidentialityreason did, because it can contain details about the accesslevel in addition to the reason for it.: Aaron Marcuse-Kubitza
02:07 PM Revision 6183: schemas/vegbien.sql: source.accesslevel, location.confidentialitystatus: Changed type to accesslevel: Aaron Marcuse-Kubitza
02:03 PM Revision 6182: schemas/vegbien.sql: Added accesslevel enum: Aaron Marcuse-Kubitza
01:51 PM Revision 6181: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza

11/14/2012

06:37 PM Revision 6180: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
06:30 PM Revision 6179: schemas/vegbien.sql: Renamed reference -> source to make this table more broadly applicable, and because this now stores the datasource metadata: Aaron Marcuse-Kubitza
06:19 PM Revision 6178: schemas/vegbien.sql: referencename: Scope it by top-level datasource, because institutionCodes (which map to this field) are not globally unique. This involves renaming the previous reference_id field, which was for the matched reference, to matched_reference_id, to allow a scoping reference_id field.: Aaron Marcuse-Kubitza
06:16 PM Revision 6177: mappings/VegCore-VegBIEN.csv: Made taxonoccurrence.verbatimcollectorname an fkey to party, and renamed it to collector_id: Aaron Marcuse-Kubitza
05:57 PM Revision 6176: inputs/VegBank/taxonobservation_/map.csv: Mapped new givenname, surname (from collector_id's party) to recordedBy: Aaron Marcuse-Kubitza
05:54 PM Revision 6175: inputs/VegBank/taxonobservation_/create.sql: Also join to collector_id's party to include collector name: Aaron Marcuse-Kubitza
05:53 PM Revision 6174: inputs/VegBank/vegbank.~.clean_up.sql: Rename taxoninterpretation.party_id to taxoninterpretation_party_id to make it globally unique when joining taxoninterpretation to other tables: Aaron Marcuse-Kubitza
05:48 PM Revision 6173: inputs/VegBank/vegbank.~.clean_up.sql: Rename party.d_obscount to party_d_obscount to make it globally unique when joining with other tables: Aaron Marcuse-Kubitza
05:43 PM Revision 6172: inputs/VegBank/vegbank.~.clean_up.sql: Rename taxoninterpretation.party_id to taxoninterpretation_party_id to make it globally unique when joining taxoninterpretation to other tables: Aaron Marcuse-Kubitza
05:35 PM Revision 6171: mappings/VegCore-VegBIEN.csv: Made taxonoccurrence.verbatimcollectorname an fkey to party, and renamed it to collector_id: Aaron Marcuse-Kubitza
05:32 PM Revision 6170: input.Makefile: Existing maps discovery: $(allTables): Fixed bug where need to remove extra whitespace before $(tables) when there are no $(joinedTables): Aaron Marcuse-Kubitza
05:32 PM Revision 6169: lib/mappings.Makefile: Checking if $(termsSubdirs) defined: Fixed bug where can't use ifndef because that checks if the variable is *empty*, not undefined. Need to use `ifeq ($(origin var),undefined)` instead.: Aaron Marcuse-Kubitza
05:11 PM Revision 6168: inputs/TEAM/V*/map.csv: Omit *Method, because it just contains "Derived" for a small fraction of the rows: Aaron Marcuse-Kubitza
04:47 PM Revision 6167: inputs/SALVIAS/: Updated to new salvias_plots export on nimoy, which has a different schema: Aaron Marcuse-Kubitza
04:03 PM Revision 6166: inputs/SALVIAS/salvias_plots.~.clean_up.sql: Moved Ensure globally unique column names to end to match VegBank order: Aaron Marcuse-Kubitza
03:54 PM Revision 6165: my2pg: *int types: Added mediumint: Aaron Marcuse-Kubitza
03:30 PM Revision 6164: Placed inputs/SALVIAS/_archive/ under version control: Aaron Marcuse-Kubitza
03:18 PM Revision 6163: inputs/SALVIAS/salvias_plots.~.clean_up.sql: Remove private data that should not be publicly visible, indicated by plotMetadata.AccessCode = 1: Aaron Marcuse-Kubitza
03:17 PM Revision 6162: inputs/SALVIAS/salvias_plots.~.clean_up.sql: Enable cascading deletes by adding the necessary fkeys: Aaron Marcuse-Kubitza
03:17 PM Revision 6161: Added inputs/SALVIAS/_src/salvias_data_access_controls.txt: Aaron Marcuse-Kubitza
02:26 PM Revision 6160: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
02:25 PM Revision 6159: inputs/.geoscrub/import_order.txt: Fixed bug where geoscrub_cultivated needs to be installed *after* geoscrub_cleaned_unique, not before as it would be with the default alphabetical sort order: Aaron Marcuse-Kubitza
02:24 PM Revision 6158: inputs/.geoscrub/geoscrub_cultivated/: Use _no_import file to exclude geoscrub_cultivated from the import, because it's used directly as a lookup table by analytical_stem rather than being imported. This ensures that there is no import log or input row count for geoscrub_cultivated in the import times, which would skew the import row count because the row count would be included even though no columns are mapped.: Aaron Marcuse-Kubitza
02:18 PM Revision 6157: input.Makefile: $(tables): Fixed bug where need to use $(importTables) instead of $(tables) in all places that should use only imported tables, rather than just in the import process itself: Aaron Marcuse-Kubitza
02:13 PM Revision 6156: input.Makefile: Import to VegBIEN: Added support for tables which should be installed but not imported, but which must be installed *after* tables which are imported rather than before. This currently applies to geoscrub.geoscrub_cultivated, which depends on geoscrub_cleaned_unique (and therefore must be installed after it), but which should not be imported because it's used directly as a lookup table by analytical_stem.: Aaron Marcuse-Kubitza
10:02 AM Revision 6155: inputs/VegBank/vegbank.~.clean_up.sql: Documented that plots with confidentialitystatus >= 4 are not deleted if their embargos have already expired. This applies to the Shenandoah NP data, which has confidentialitystatus = 5 but is no longer embargoed according to the embargo table: Aaron Marcuse-Kubitza

11/13/2012

08:10 PM Revision 6154: inputs/SALVIAS/: Mapped unmapped fields with a VegCore/VegBIEN equivalent. plotMetadata_/: Remapped life_zone to communityID because it is now _alt-ed together with vegetation_*, and thus not just a description with life_zone_code as its globally unique name.: Aaron Marcuse-Kubitza
07:35 PM Revision 6153: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
07:10 PM Revision 6152: schemas/vegbien.sql: referencetype: Added terms from reference.referencetype closed list in VegBank data dictionary. Cited sources in comment.: Aaron Marcuse-Kubitza
06:39 PM Revision 6151: schemas/vegbien.sql: reference.referencetype: Changed type to referencetype enum: Aaron Marcuse-Kubitza
06:38 PM Revision 6150: schemas/vegbien.sql: Added referencetype enum, containing VegBank's values in reference.referencetype as well as values for bien_web.datasource.aggregatorOrPrimary and bien_web.dataSourceNormalized.isHerbarium,isAggregator: Aaron Marcuse-Kubitza
06:23 PM Revision 6149: specimenreplicate: Made institution_id an fkey to referencename instead of party, to later be matched up with reference entries for each aggregator's subprovider: Aaron Marcuse-Kubitza
06:15 PM Revision 6148: schemas/vegbien.sql: referencename: Added referencename_unique unique index on name: Aaron Marcuse-Kubitza
06:00 PM Revision 6147: schemas/vegbien.sql: referencename: Made reference_id optional so it can be populated later when referencenames are scrubbed: Aaron Marcuse-Kubitza
05:58 PM Revision 6146: schemas/vegbien.sql: referencename: Renamed identifier to name because it is specifically any name for the reference, not necessarily an ID: Aaron Marcuse-Kubitza
05:53 PM Revision 6145: schemas/vegbien.sql: Renamed referencealtident to referencename to allow any verbatim reference name to go here, with reference containing the corresponding accepted reference name: Aaron Marcuse-Kubitza
05:50 PM Revision 6144: schemas/vegbien.sql: reference: Added accesslevel, accessconditions from bien_web.datasource: Aaron Marcuse-Kubitza
05:41 PM Revision 6143: schemas/vegbien.sql: address: Added street2 from bien_web.party.address2: Aaron Marcuse-Kubitza
05:38 PM Revision 6142: schemas/vegbien.sql: address: Renamed fields to bien_web.party names: Aaron Marcuse-Kubitza
05:12 PM Revision 6141: schemas/vegbien.sql: party: Added department from bien_web.party: Aaron Marcuse-Kubitza
05:06 PM Revision 6140: inputs/SALVIAS/plotMetadata_/map.csv: Mapped lookup_MethodCode_Description to new observationMeasure: Aaron Marcuse-Kubitza
05:06 PM Revision 6139: schemas/vegbien.sql: method: Made name optional when description or observationmeasure is specified: Aaron Marcuse-Kubitza
05:03 PM Revision 6138: schemas/vegbien.sql: method: method_unique: Include observationmeasure since the method name sometimes is not globally unique (e.g. in SALVIAS): Aaron Marcuse-Kubitza
04:58 PM Revision 6137: mappings/VegCore-VegBIEN.csv: Mapped observationMeasure: Aaron Marcuse-Kubitza
04:57 PM Revision 6136: mappings/VegCore.csv: observationMeasure: Added source to DwC samplingProtocol: Aaron Marcuse-Kubitza
04:54 PM Revision 6135: mappings/VegCore.csv: Added observationMeasure: Aaron Marcuse-Kubitza
04:40 PM Revision 6134: schemas/vegbien.ERD.mwb: Added family_higher_plant_group: Aaron Marcuse-Kubitza
04:28 PM Revision 6133: schemas/vegbien.sql: Removed VegBank-internal fields starting with d_: Aaron Marcuse-Kubitza
04:19 PM Revision 6132: schemas/vegbien.ERD.mwb: Moved tables so commclass would have more room. Moved revision back to original spot.: Aaron Marcuse-Kubitza
04:07 PM Revision 6131: schemas/filter_ERD.csv: Display referencecontributor->party connection in ERD: Aaron Marcuse-Kubitza
03:56 PM Revision 6130: schemas/vegbien.sql: Removed no longer used table referenceparty: Aaron Marcuse-Kubitza
03:54 PM Revision 6129: schemas/vegbien.sql: referencecontributor: Point to party instead of referenceparty, which duplicates party: Aaron Marcuse-Kubitza
03:51 PM Revision 6128: schemas/vegbien.sql: party: Added new suffix field to party_unique unique index: Aaron Marcuse-Kubitza
03:49 PM Revision 6127: schemas/vegbien.sql: party: Added fields from referenceparty. Note that referenceparty.type is named partytype.: Aaron Marcuse-Kubitza
03:25 PM Revision 6126: inputs/SALVIAS/salvias_plots.~.clean_up.sql: Rename lookup_MethodCode.Description to lookup_MethodCode_Description to make it globally unique when joined with plotMetadata: Aaron Marcuse-Kubitza
03:24 PM Revision 6125: input.Makefile: SVN: $(svnFilesGlob): Added root-level .sql files containing ~, which run additional commands after the original data is imported: Aaron Marcuse-Kubitza
03:22 PM Revision 6124: inputs/SALVIAS/_MySQL/: Updated svn:ignore from running `make inputs/SALVIAS/add`: Aaron Marcuse-Kubitza
02:30 PM Revision 6123: mappings/VegCore-VegBIEN.csv: matched place's coordinates: Fixed bug where coordinates entry itself needed to have its datasource (reference) set to geoscrub, in addition to the place entry that uses it, in order to match up properly with geoscrub's corresponding input place (whose coordinates as well as place are owned by the geoscrub datasource): Aaron Marcuse-Kubitza
02:22 PM Revision 6122: mappings/VegCore-VegBIEN.csv: matched place's coordinates: Fixed bug where coordinates mappings with and without matched_place_id=0 need to sort together in order to be merged, by prepending ".," to the place attrs list: Aaron Marcuse-Kubitza
02:22 PM Revision 6121: inputs/VegBank/plot_/test.xml.ref: Updated inserted row count: Aaron Marcuse-Kubitza
12:00 PM Revision 6120: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza

11/09/2012

08:30 PM Revision 6119: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
08:20 PM Revision 6118: inputs/Makefile: Input data: $(rsyncLogs): Also include logs from the datasource's top-level logs/ dir, which contains make_analytical_db.log.sql: Aaron Marcuse-Kubitza
08:09 PM Revision 6117: inputs/VegBank/vegbank.~.clean_up.sql: Remove still-embargoed plots: Aaron Marcuse-Kubitza
08:07 PM Revision 6116: inputs/VegBank/vegbank.~.clean_up.sql: Enable cascading deletes by setting all foreign keys to ON DELETE CASCADE: Aaron Marcuse-Kubitza
07:49 PM Revision 6115: Added inputs/VegBank/_src/vegbank.schema.sql.make and vegbank.schema.sql: Aaron Marcuse-Kubitza
07:48 PM Revision 6114: input.Makefile: Staging tables installation: sql/install: Use new pg_dump_limit to remove security and schema-setting commands: Aaron Marcuse-Kubitza
07:46 PM Revision 6113: Added pg_dump_limit to filter a PostgreSQL DB dump to remove security and schema-setting commands: Aaron Marcuse-Kubitza
06:37 PM Revision 6112: inputs/.geoscrub/geoscrub_cleaned_unique/create.sql: Removed no longer needed index on latitudeDecimalVerbatim, longitudeDecimalVerbatim, which is now on geoscrub_cultivated instead: Aaron Marcuse-Kubitza
06:32 PM Revision 6111: schemas/vegbien.sql: analytical_stem_view: Fixed bug where needed to join on new geoscrub_cultivated, not geoscrub, for all geoscrub-related information. geoscrub contains many duplicate records, causing one input row to match many rows in geoscrub, when there should only be one entry for each coordinate pair.: Aaron Marcuse-Kubitza
06:26 PM Revision 6110: Added inputs/.geoscrub/geoscrub_cultivated/: Aaron Marcuse-Kubitza
06:04 PM Revision 6109: inputs/.geoscrub/geoscrub_cleaned_unique/create.sql: Added index on latitudeDecimalVerbatim, longitudeDecimalVerbatim for use by analytical_stem_view: Aaron Marcuse-Kubitza
05:34 PM Revision 6108: inputs/newWorld/geoscrub.schema.~.changes.sql: Change countryNameStd type to text to allow merge-joining with place.country in analytical_stem_view: Aaron Marcuse-Kubitza
05:28 PM Revision 6107: inputs/newWorld/geoscrub.schema.~.changes.sql: ALTER TABLE ... ALTER COLUMN statement: Reformatted to allow adding additional ALTER COLUMN clauses: Aaron Marcuse-Kubitza
05:25 PM Revision 6106: inputs/.geoscrub/geoscrub_cleaned_unique/create.sql: Change latitudeDecimalVerbatim, longitudeDecimalVerbatim types to double precision to allow merge-joining with coordinates.latitude_deg, longitude_deg in analytical_stem_view: Aaron Marcuse-Kubitza
05:12 PM Revision 6105: README.TXT: Data import: Instead of using `make schemas/rotate` and then renaming the public schema to the correct name, just rename directly to the correct name using `make schemas/rename/...`. Use new import_name to determine the import name instead of manually finding the date in the first datasource's log file name.: Aaron Marcuse-Kubitza
05:06 PM Revision 6104: Added import_name, which gets the name of an import based on its log file names: Aaron Marcuse-Kubitza
04:50 PM Revision 6103: README.TXT: Data import: Moved checking that imports were successful before running make_analytical_db: Aaron Marcuse-Kubitza
04:41 PM Revision 6102: root Makefile: Installation: Fixed bug where schemas/install needed to happen *after* inputs/install because some of the public schema's views now depend on inputs: Aaron Marcuse-Kubitza
04:07 PM Revision 6101: schemas/vegbien.sql: analytical_stem_view: cultivatedBasis: Concatenate ''::text to geoscrub.isCultivatedReason so it will be cast to a text field both on PostgreSQL 9.1.1 (local machine), which removes any explicit cast to text when creating the view, and 9.1.6 (vegbiendev), which requires an explicit cast to text: Aaron Marcuse-Kubitza
03:49 PM Revision 6100: schemas/vegbien.sql: analytical_stem_view: cultivatedBasis: Use geoscrub.isCultivatedReason instead when geoscrub.isCultivated is used as the source for cultivated: Aaron Marcuse-Kubitza
12:53 PM Revision 6099: schemas/vegbien.sql: analytical_stem_view: Use geoscrub.isCultivated when taxonoccurrence.iscultivated is not provided (joining to geoscrub on the coordinates): Aaron Marcuse-Kubitza

11/08/2012

06:38 PM Revision 6098: root Makefile: VegBIEN DB: Schemas: Run all schema installs and uninstalls using no_search_path=1, so that the schemas in the automatic search_path are not required for the command to run: Aaron Marcuse-Kubitza
06:37 PM Revision 6097: psql_vegbien: Added $no_search_path option to turn off the automatic SET search_path directive: Aaron Marcuse-Kubitza
06:11 PM Revision 6096: schemas/vegbien.sql: taxonverbatim: Added growthform for growthform based on the taxon name rather than provided with the input data's taxonoccurrence: Aaron Marcuse-Kubitza
06:00 PM Revision 6095: schemas/vegbien.ERD.mwb: Fixed lines: Aaron Marcuse-Kubitza
05:47 PM Revision 6094: inputs/SALVIAS/plotMetadata/: LEFT JOINed with lookup_MethodCode to create plotMetadata_: Aaron Marcuse-Kubitza
04:52 PM Revision 6093: schemas/vegbien.sql: threatened_taxonlabel_view: Fixed bug where needed DISTINCT on resulting taxonlabel_id because some descendants apparently appear in multiple threatened taxonlabels' subtrees: Aaron Marcuse-Kubitza
04:42 PM Revision 6092: schemas/vegbien.sql: analytical_*: Added threatened, using new threatened_taxonlabel lookup table: Aaron Marcuse-Kubitza
04:12 PM Revision 6091: schemas/vegbien.sql: reference_by_shortname(): Fixed bug where need to use $-syntax to reference params in sql-language functions: Aaron Marcuse-Kubitza
04:07 PM Revision 6090: schemas/vegbien.sql: threatened_taxonlabel_view: Use new reference_by_shortname(): Aaron Marcuse-Kubitza
03:45 PM Revision 6089: root Makefile: VegBIEN DB: Schemas: public: schemas/public/uninstall: Fixed bug where need to run psql_vegbien without public in the search_path because it may have already been deleted: Aaron Marcuse-Kubitza
03:44 PM Revision 6088: root Makefile: VegBIEN DB: Schemas: public: schemas/public/install: Fixed bug where need to run psql_vegbien without public in the search_path because it doesn't exist, by setting public to the empty string (deleting it): Aaron Marcuse-Kubitza
03:42 PM Revision 6087: vegbien_dest: $schemas: Don't include the , before $public if it has been set to the empty string (deleted): Aaron Marcuse-Kubitza
03:27 PM Revision 6086: schemas/vegbien.sql: Added reference_by_shortname(). Using this function instead of the manual query should force the query planner to evaluate this expression first, rather than complexly reordering joins to place this nested select as a filter condition.: Aaron Marcuse-Kubitza
03:00 PM Revision 6085: schemas/vegbien.sql: Added threatened_taxonlabel derived table with generating view threatened_taxonlabel_view: Aaron Marcuse-Kubitza
02:48 PM Revision 6084: Updated inputs/UNCC/Specimen/test.xml.ref inserted row count: Aaron Marcuse-Kubitza
01:38 PM Revision 6083: mappings/VegCore.csv: Added threatened: Aaron Marcuse-Kubitza
01:21 PM Revision 6082: inputs/VegBank/vegbank.~.clean_up.sql: Remove private columns (plot.reallatitude, reallongitude) that should not be publicly visible: Aaron Marcuse-Kubitza
01:13 PM Revision 6081: inputs/CVS/Organism/map.csv: Removed now-dropped realLatitude, realLongitude: Aaron Marcuse-Kubitza
01:12 PM Revision 6080: inputs/CVS/Organism/map.csv: Removed now-dropped realLatitude, realLongitude: Aaron Marcuse-Kubitza
01:12 PM Revision 6079: Added inputs/CVS/Organism/postprocess.sql to drop private realLatitude, realLongitude columns: Aaron Marcuse-Kubitza
01:10 PM Revision 6078: input.Makefile: Staging tables installation: Added back postprocess.sql, which is now used for one-time dropping of private columns that should not be publicly visible: Aaron Marcuse-Kubitza
12:47 PM Revision 6077: input.Makefile: Maps building: %/.map.csv.last_cleanup: $(dict) canon/translate: Use new $(translate?): Aaron Marcuse-Kubitza
12:45 PM Revision 6076: input.Makefile: Maps building: %/.map.csv.last_cleanup: Added $(srcDict) as a prerequisite, so that .last_cleanup will be re-run if it changes. Added empty $(srcDict) target in case it doesn't exist.: Aaron Marcuse-Kubitza
12:39 PM Revision 6075: inputs/bien_web/observation/map.csv: Omit *_index because they are placeholder columns created by the MySQL to PostgreSQL translation: Aaron Marcuse-Kubitza
12:37 PM Revision 6074: input.Makefile: Maps building: %/.map.csv.last_cleanup: Fixed bug where can only canon/translate using $(srcDict) if it exists for the datasource: Aaron Marcuse-Kubitza
12:26 PM Revision 6073: inputs/bien_web/observation/: Regenerated from actual bien_web.observation schema on nimoy, which has additional columns: Aaron Marcuse-Kubitza
12:24 PM Revision 6072: input.Makefile: SVN: $(svnFilesGlob): Added top-level map.csv, which can be used to apply a datasource-global data dictionary to all tables: Aaron Marcuse-Kubitza
12:18 PM Revision 6071: input.Makefile: Maps building: %/.map.csv.last_cleanup: Also apply any map.csv at the top level of the datasource directory. This can be used to apply a datasource-global data dictionary to all tables.: Aaron Marcuse-Kubitza
12:01 PM Revision 6070: my2pg: Also remove column comments. Note that these cannot be translated by sed, because PostgreSQL only allows setting column comments in a separate statement, not inline with the column's entry in the CREATE TABLE statement, and sed can only make replacements contiguous with the input line.: Aaron Marcuse-Kubitza
11:28 AM Revision 6069: mappings/VegCore.csv: Removed incorrect duplicate entry for verbatimSubgenus: Aaron Marcuse-Kubitza
10:58 AM Revision 6068: schemas/vegbien.sql: _taxon_family_require_std(): Fixed bug where name needed to be lowercased before checking if it ended in -aceae, to support family names that are uppercase. Note that this resulted in the family not being prepended to the TNRS input name for datasources with uppercase family names, so the next DB import will likely produce a number of unscrubbed TNRS input names which now have the uppercase family prepended.: Aaron Marcuse-Kubitza
10:17 AM Revision 6067: inputs/.TNRS/tnrs/tnrs.make: Fixed bug where need to reference the log file path relative to the make script itself, because otherwise the log file would go in inputs/.TNRS/logs/tnrs.make.log.sql: Aaron Marcuse-Kubitza
10:07 AM Revision 6066: inputs/.TNRS/tnrs/tnrs.make: Fixed bug where need to use just the basename of $0 as the log file name: Aaron Marcuse-Kubitza
09:51 AM Revision 6065: Added inputs/IUCN/: Aaron Marcuse-Kubitza
09:51 AM Revision 6064: input.Makefile: SVN: add: Added _src/ (when it exists). $(_svnFilesGlob): Added .url, .pdf files in _src/.: Aaron Marcuse-Kubitza
07:47 AM Revision 6063: psql_vegbien: Use bash because it supports substitutions: Aaron Marcuse-Kubitza
07:46 AM Revision 6062: psql_vegbien: Set the search_path to $out_schemas set by vegbien_dest. This will enable running any psql_vegbien script on a schema other than public.: Aaron Marcuse-Kubitza
07:07 AM Revision 6061: schemas/vegbien.sql: analytical_stem_view: Changed inner joins on non-datasource taxonlabels to LEFT JOINs, to ensure that an entry is included in the analytical DB even if there was no matched taxonlabel. In theory, this shouldn't be necessary, because every taxonlabel should have a canonical taxonlabel since canon_label_id is auto-populated to the taxonlabel_id if there is no matched_label_id; there should be a taxonverbatim for every datasource and accepted taxonlabel because datasources link to taxonlabel via taxonverbatim and TNRS populates a taxonverbatim for every accepted taxonlabel; and there should be a parsed taxonlabel for every datasource taxonlabel because the mappings populate it.: Aaron Marcuse-Kubitza
06:56 AM Revision 6060: schemas/vegbien.sql: analytical_stem_view: Removed join on specimenreplicate, because it isn't used in the analytical DB. Each specimen will still get an entry in analytical_*, because it gets its own location.: Aaron Marcuse-Kubitza
06:45 AM Revision 6059: README.TXT: Data import: Before starting import, added step to run `make inputs/upload` and reinstall newly-uploaded datasources: Aaron Marcuse-Kubitza
03:56 AM Task #346: add georeferencing support to schema: Georeferencing information can be stored in the geovalidation place entry, which the original place is linked to via ... Aaron Marcuse-Kubitza
03:42 AM Task #292: VegBank metadata query mechanism: Added possible strategy Aaron Marcuse-Kubitza
03:32 AM Task #343 (Resolved): integrate TNRS into VegBIEN: Aaron Marcuse-Kubitza
03:32 AM Task #308 (Resolved): do a direct transfer of some public data from VegBank: Core fields in VegBank have been mapped. A recent full export of the live VegBank DB is used as the input. Aaron Marcuse-Kubitza
03:31 AM Task #335 (Rejected): Look into using Sybase Powerbuilder or IBM Enterprise Vision to map data: We are using map spreadsheets and auto-mapping instead, which work well so far and would not be easy to translate to ... Aaron Marcuse-Kubitza
03:30 AM Task #312: Finish importing SALVIAS data: Still need to map methods, probably using LEFT JOIN Aaron Marcuse-Kubitza
03:27 AM Task #365 (Rejected): retrieve taxonomic hierarchy in analytical layer by using dynamic queries to external sources: We're using a fixed version of the NCBI tree of life instead Aaron Marcuse-Kubitza
03:26 AM Task #424: Finish translating XML functions to SQL functions for column-based import: Most translated; still left:
* _map
* _range*
* _avg
* _compass
* a few others Aaron Marcuse-Kubitza
03:24 AM Task #440: aggregating validations of imports: Queries work for current schema Aaron Marcuse-Kubitza
03:22 AM Task #454 (Resolved): update summarizing queries for current schema: @make inputs/SALVIAS/verify/@ and @make inputs/NY/verify/@ work again Aaron Marcuse-Kubitza
03:22 AM Revision 6058: README.TXT: Schema changes: Remember to update the following files with any renamings: Added mappings/verify.*.sql: Aaron Marcuse-Kubitza
03:16 AM Task #476 (Rejected): develop map spreadsheet -> header override file translation utility: Shouldn't do this because it would prevent map spreadsheets from having multiple output locations for the same input ... Aaron Marcuse-Kubitza
03:11 AM Task #495 (Resolved): add separate datasource table rather than using party for this: Now using @reference@ for this Aaron Marcuse-Kubitza
03:07 AM Task #521 (Resolved): make place* tables use a structure similar to taxonconcept: Aaron Marcuse-Kubitza
02:21 AM Revision 6057: README.TXT: Data import: make_analytical_db: Documented how to view progress in log file: Aaron Marcuse-Kubitza
02:18 AM Revision 6056: make_analytical_db: Run all commands synchronously so the log file output doesn't become jumbled: Aaron Marcuse-Kubitza
02:16 AM Revision 6055: make_analytical_db: Fixed bug where log file needed to be appended to instead of overwritten: Aaron Marcuse-Kubitza
02:15 AM Revision 6054: make_analytical_db: Wrap each individual command in `set -x` to avoid echoing low-level commands such as sleep, wait: Aaron Marcuse-Kubitza
02:02 AM Revision 6053: make_analytical_db: Moved log file to inputs/analytical_db/logs/make_analytical_db.log.sql so it would be synced along with the other import logs: Aaron Marcuse-Kubitza
01:57 AM Revision 6052: inputs/.TNRS/tnrs/tnrs.make: Output the time at which it's run, so this is included in the log file: Aaron Marcuse-Kubitza
01:53 AM Revision 6051: inputs/.TNRS/tnrs/tnrs.make: Moved log file to logs/tnrs.make.log.sql so it would automatically be synced along with the other import logs: Aaron Marcuse-Kubitza
01:49 AM Revision 6050: make_analytical_db: Moved log file to inputs/analytical_db/logs/make_analytical_db.log.sql so it would be synced along with the other import logs: Aaron Marcuse-Kubitza
01:40 AM Revision 6049: inputs/Makefile: Import logs: $(rsyncLogs): Always download the TNRS daemon log, rather than requiring tnrs_log=1 to be specified to download it: Aaron Marcuse-Kubitza
01:37 AM Revision 6048: make_analytical_db: Output the time at which it's run, so this is included in the log file: Aaron Marcuse-Kubitza
01:36 AM Revision 6047: make_analytical_db: Store output in schemas/make_analytical_db.log: Aaron Marcuse-Kubitza
01:24 AM Revision 6046: schemas/vegbien.sql: Removed no longer used make_analytical_db(). Use bin/make_analytical_db instead.: Aaron Marcuse-Kubitza
01:23 AM Revision 6045: make_analytical_db: Use new psql_verbose_vegbien: Aaron Marcuse-Kubitza
01:22 AM Revision 6044: Added psql_verbose_vegbien: Aaron Marcuse-Kubitza
01:18 AM Revision 6043: make_analytical_db: Use psql_script_vegbien, which contains the necessary psq options, instead of setting those options manually: Aaron Marcuse-Kubitza
01:15 AM Revision 6042: make_analytical_db: Run the SQL commands directly with psql instead of using the make_analytical_db() function. This provides incremental results and avoids running all commands in one transaction, thus preventing pgAdmin from freezing when the user attempts to access a table used in analytical DB creation (because the TRUNCATE statement fully locks the table until the entire analytical DB is built).: Aaron Marcuse-Kubitza
12:46 AM Revision 6041: schemas/vegbien.sql: make_analytical_db(): Added make_family_higher_plant_group(): Aaron Marcuse-Kubitza
12:17 AM Revision 6040: inputs/import.stats.xls: Updated import times. Fixed input row counts and import times to include derived data, such as TNRS and geoscrub, which adds to the import time and therefore should be considered in the import's speed. (TNRS was already being included in the import time for some, but not all, imports.): Aaron Marcuse-Kubitza

11/06/2012

05:02 PM Revision 6039: schemas/vegbien.sql: analytical_*: Added higherPlantGroup: Aaron Marcuse-Kubitza
04:06 PM Revision 6038: pg_dump_vegbien: When not in data mode, use --inserts so the INSERT statements generated can be run directly in pgAdmin, they can be directly translated to MySQL, and an empty COPY FROM statement isn't generated for every table: Aaron Marcuse-Kubitza
04:05 PM Revision 6037: schemas/vegbien.sql: Added family_higher_plant_group lookup table, generated with make_family_higher_plant_group() using the contents of higher_plant_group_nodes: Aaron Marcuse-Kubitza
03:40 PM Revision 6036: schemas/vegbien.sql: Added family_higher_plant_group lookup table, generated with make_family_higher_plant_group() using the contents of higher_plant_group_nodes: Aaron Marcuse-Kubitza
03:23 PM Revision 6035: mappings/VegCore-VegBIEN.csv: Always map taxonNameOrEpithet to taxonomicname, now that it's globally unique at all ranks in the datasource that provides it (NCBI): Aaron Marcuse-Kubitza
03:18 PM Revision 6034: inputs/.NCBI/nodes/create.sql: Make name_txt completely globally unique by removing all duplicates, not just duplicate genera: Aaron Marcuse-Kubitza
03:17 PM Revision 6033: inputs/.NCBI/nodes/create.sql: Make name_txt (mostly) globally unique by removing several other kingdoms/superkingdoms, not just Animalia: Aaron Marcuse-Kubitza
02:56 PM Revision 6032: inputs/.NCBI/nodes/create.sql: Making genus globally unique: Moved comment with kingdom name to line with DELETE, and put "delete cascades to descendants" comment on its own line: Aaron Marcuse-Kubitza
01:57 PM Revision 6031: schemas/vegbien.sql: Added higher_plant_group_nodes lookup table: Aaron Marcuse-Kubitza
01:57 PM Revision 6030: schemas/vegbien.sql: Added higher_plant_group_nodes lookup table: Aaron Marcuse-Kubitza
01:49 PM Revision 6029: schemas/Makefile: DDLs: $(pg_dump): Turn off schema-only mode so that lookup table contents are included as well: Aaron Marcuse-Kubitza
01:47 PM Revision 6028: pg_dump_vegbien: Except in (raw) data mode, filter out pg_catalog.setval() statements (only lookup table contents should be preserved): Aaron Marcuse-Kubitza
01:39 PM Revision 6027: lib/PostgreSQL-MySQL.csv: Remove SELECT pg_catalog.setval() statements: Aaron Marcuse-Kubitza
01:27 PM Revision 6026: lib/PostgreSQL-MySQL.csv: Remove COPY statements: Aaron Marcuse-Kubitza
01:18 PM Revision 6025: pg_dump_vegbien: $schema flag defaults to on to export only schema: Aaron Marcuse-Kubitza
01:08 PM Revision 6024: pg_dump_vegbien: Fixed bug where schema name var can't be named $schema because that would conflict with the $schema flag: Aaron Marcuse-Kubitza
01:03 PM Revision 6023: root Makefile: VegBIEN DB: DB and bien user: db: Run inputs/.TNRS/install before schemas/install because public.tnrs_input_name depends on TNRS.tnrs: Aaron Marcuse-Kubitza
12:33 PM Revision 6022: pg_dump_vegbien: Only include just the schema if new $schema flag is set. This causes constant lookup tables to automatically be exported for DB schemas, which was not fully supported by the $data flag because it also forced $owners to be enabled.: Aaron Marcuse-Kubitza
12:17 PM Revision 6021: schemas/vegbien.sql: Added higher_plant_group enum: Aaron Marcuse-Kubitza
11:25 AM Revision 6020: mappings/VegCore.csv: Added higherPlantGroup: Aaron Marcuse-Kubitza
10:40 AM Revision 6019: schemas/vegbien.sql: analytical_*: Added isNewWorld: Aaron Marcuse-Kubitza
10:32 AM Revision 6018: mappings/VegCore.csv: Added isNewWorld: Aaron Marcuse-Kubitza
10:21 AM Revision 6017: inputs/newWorld/: Added geoscrub.schema.~.changes.sql to add a unique constraint on countryNameStd and change isNewWorld's type to boolean: Aaron Marcuse-Kubitza
09:59 AM Revision 6016: Added inputs/newWorld/: Aaron Marcuse-Kubitza
09:55 AM Revision 6015: README.TXT: Datasource setup: Moved Auto-create the map spreadsheets after Install the staging tables, because for DB-only tables, creating the map spreadsheets requires the header.csv generated by the staging tables install: Aaron Marcuse-Kubitza
09:48 AM Revision 6014: input.Makefile: SVN: add: Also set _MySQL/'s svn:ignore: Aaron Marcuse-Kubitza
09:30 AM Revision 6013: inputs/SALVIAS/_MySQL/salvias_plots.*.sql.make: Use new my2pg_export: Aaron Marcuse-Kubitza
09:27 AM Revision 6012: inputs/.geoscrub/_MySQL/geoscrub.*.sql.make: Use new my2pg_export: Aaron Marcuse-Kubitza
09:25 AM Revision 6011: Added my2pg_export: Aaron Marcuse-Kubitza

11/05/2012

11:16 PM Revision 6010: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
11:14 PM Revision 6009: schemas/vegbien.ERD.mwb: Added analytical_stem, analytical_aggregate to ERD: Aaron Marcuse-Kubitza
10:49 PM Revision 6008: schemas/vegbien.sql: analytical_*: Added georeferenceValid, distanceToCountry_km, distanceToStateProvince_km: Aaron Marcuse-Kubitza
10:34 PM Revision 6007: schemas/vegbien.sql: Added _m_to_km(): Aaron Marcuse-Kubitza
10:06 PM Revision 6006: inputs/.geoscrub/geoscrub_cleaned_unique/map.csv: Removed geoscrubID->locationID mapping because it allowed multiple occurrences of the same placename/coordinates combination, which was causing problems when datasources try to link up to the geoscrub places (and end up matching multiple output rows for each input row): Aaron Marcuse-Kubitza
09:54 PM Revision 6005: sql_io.py: put_table(): Removed assertion that into's row count be at least full_in_table's row count, because now that DISTINCT ON is used to satisfy the into table pkey, this is no longer necessarily true: Aaron Marcuse-Kubitza
09:15 PM Revision 6004: schemas/vegbien.sql: method: method_unique: Added reference_id to make method unique within the datasource instead of globally unique within VegBIEN: Aaron Marcuse-Kubitza
09:12 PM Revision 6003: mappings/VegCore.csv: Added samplingProtocolID: Aaron Marcuse-Kubitza
08:48 PM Revision 6002: mappings/VegCore-VegBIEN.csv: subplot locationevent: Only populate parent locationevent's location unique IDs if a subplot #/subplotID is actually specified. (The lack of a location unique ID will cause the parent locationevent's location to be removed, as well as the parent locationevent itself if there is no parent locationevent unique ID.) This fixes a bug where top-level plots in datasources that provide a nullable subplot #/subplotID were incorrectly getting connected to parent locationevents.: Aaron Marcuse-Kubitza
08:34 PM Revision 6001: mappings/verify.plots.sql: Added # locations, # location events, which also include subplots: Aaron Marcuse-Kubitza
08:17 PM Revision 6000: inputs/SALVIAS/verify/plots.ref.sql: Standardized line wrapping to put ; on its own line in multi-line queries and at the end of the line in one-line queries: Aaron Marcuse-Kubitza
08:13 PM Revision 5999: mappings/verify.plots.sql: Added "top-level" to # locations, # location events names because these queries do not include all locations: Aaron Marcuse-Kubitza
07:52 PM Revision 5998: inputs/SALVIAS/verify/plots.ref.sql: # locations: Fixed count to include all plotMetadata entries instead of just unique coordinates, because each plotMetadata entry now gets its own location: Aaron Marcuse-Kubitza
07:34 PM Revision 5997: mappings/VegCore-VegBIEN.csv: matched place: When any of the accepted* names is specified, set matched_place_id=0 (self-reference) to indicate that the matched place is an accepted (scrubbed) place: Aaron Marcuse-Kubitza
05:30 PM Revision 5996: schemas/vegbien.sql: analytical_stem_view: Use the canonical (scrubbed) place instead of the datasource's place where available. Note that canon_place_id is always populated (with a self-reference if needed), to enable using a just a LEFT JOIN for this.: Aaron Marcuse-Kubitza
05:14 PM Revision 5995: schemas/vegbien.sql: analytical_stem_view: Use datasource_* prefix instead of reference_* prefix that tables had gotten when datasource_id was renamed to reference_id: Aaron Marcuse-Kubitza
05:04 PM Revision 5994: inputs/SALVIAS/verify/plots.ref: Regenerated on vegbiendev using the PostgreSQL 9.1 pg_catalog.default collation: Aaron Marcuse-Kubitza
04:49 PM Revision 5993: sql_io.py: put_table(): Setting pkeys of missing rows: Fixed bug in column-based import where when input rows match multiple output rows in one of this iteration's input tables, the into table's pkey constraint is violated because full_in_table contains multiple entries for an input pkey: Aaron Marcuse-Kubitza
04:42 PM Revision 5992: schemas/vegbien.sql: place: Added index on matched_place_id to facilitate looking up places by matched_place_id in place_set_canon_place_id(): Aaron Marcuse-Kubitza
04:26 PM Revision 5991: schemas/vegbien.sql: Added geoscrub_input derived table with associated view. Build geoscrub_input as part of make_analytical_db().: Aaron Marcuse-Kubitza
03:56 PM Revision 5990: sql.py: Added table_has_pkey(): Aaron Marcuse-Kubitza
03:13 PM Revision 5989: sql.py: table_pkey_col(): For PostgreSQL DBs, use pg_catalog via index_cols() and table_pkey_index(), in order to use the search_path to look up the table. This fixes a bug where the pkey would be selected from information_schema.table_constraints in random order, and this order sometimes returned the corresponding table in the public schema but sometimes in other schemas, such as VegBank. This became a problem now that VegBIEN has a place table, which conflicts with VegBank's place table. (Most other VegBank tables that are mapped to have been renamed in VegBIEN.): Aaron Marcuse-Kubitza
02:41 PM Revision 5988: sql.py: add_pkey_or_index(): Use new add_pkey_index(): Aaron Marcuse-Kubitza
02:41 PM Revision 5987: sql.py: Added add_pkey_index(): Aaron Marcuse-Kubitza
02:29 PM Revision 5986: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
02:15 PM Revision 5985: import_times: Use $'' quoting to expand tab, in order to also work on Mac OS X: Aaron Marcuse-Kubitza

11/02/2012

06:14 PM Revision 5984: schemas/vegbien.sql: place: Added place_set_canon_place_id_on_*() triggers, analogous to taxonlabel.taxonlabel_2_set_canon_label_id_on_*(): Aaron Marcuse-Kubitza
06:12 PM Revision 5983: schemas/vegbien.sql: place: Added canon_place_id: Aaron Marcuse-Kubitza
06:04 PM Revision 5982: mappings/verify.plots.sql, inputs/SALVIAS/verify/plots.ref.sql: Renamed project column to projectname to avoid confusion with the project table: Aaron Marcuse-Kubitza
06:03 PM Revision 5981: input.Makefile: Verification of import: Fixed bug where needed to make %.ref .PRECIOUS instead of verify/%.ref: Aaron Marcuse-Kubitza
05:57 PM Revision 5980: inputs/SALVIAS/verify/plots.ref.sql: Use project_name instead of project_id everywhere, not just as the ID to list for each project, to match the corresponding VegBIEN queries: Aaron Marcuse-Kubitza
05:55 PM Revision 5979: input.Makefile: Verification of import: Make verify/%.ref .PRECIOUS because there must always be a .ref for the make rules to work: Aaron Marcuse-Kubitza
05:34 PM Revision 5978: inputs/SALVIAS/verify/plots.ref.sql: Use project_name instead of project_id as the ID to list for each project, since the SALVIAS projects table is now mapped to the DB and the project_name is used as project.projectname: Aaron Marcuse-Kubitza
05:18 PM Revision 5977: mappings/VegCore-VegBIEN.csv: subplots: Also complete the locationevent/location diamond (subplot event -> {subplot location, parent plot event} -> parent plot location) when an eventDate or range is specified, as this is also an identifying field for locationevent. This fixes a bug where subplots data without explicit plot events (such as SALVIAS and TEAM) was not being connected to the appropriate parent plot *event* as well as parent plot location. This should fix the SALVIAS verification # location events, which should include only parent plots' locationevents to correspond with # locations, which only includes parent plots' locations, and uses locationevent.parent_id being NULL to determine what is a parent plot event.: Aaron Marcuse-Kubitza
04:45 PM Revision 5976: input.Makefile: Verification of import: $(verify): Use $(inDatasrc) so that the full standard search_path, including any custom public schema, is made available to the VegBIEN-side verification script: Aaron Marcuse-Kubitza
04:33 PM Revision 5975: mappings/verify.plots.sql: # locations: Fixed bug where need to use location instead of locationevent: Aaron Marcuse-Kubitza
04:30 PM Revision 5974: mappings/verify.specimens.sql: Updated for current schema: Aaron Marcuse-Kubitza
04:26 PM Revision 5973: input.Makefile: Verification of import: verify/%.out: Made the target .PRECIOUS so that partial output will be saved in case of error to help debugging: Aaron Marcuse-Kubitza
04:13 PM Revision 5972: mappings/verify.plots.sql: Updated for current schema: Aaron Marcuse-Kubitza
04:01 PM Revision 5971: Updated validation/BIEN2_Analytical_DB_overview.docx with Brad's revision: Aaron Marcuse-Kubitza
03:58 PM Revision 5970: schemas/vegbien.sql: taxonlabel_set_canon_label_id(): Fixed bug where always need to set canon_label_id on insert (because it must be a self-reference rather than NULL if there is no matched_label_id), but wasn't being set when no matched_label_id because the IS DISTINCT FROM check returned false: Aaron Marcuse-Kubitza
03:42 PM Revision 5969: mappings/VegCore-VegBIEN.csv: decimalLatitude/Longitude->geoscrub input coordinates: Also set to NULL if 0 here, not just for the coordinates linked to the datasource's place instance: Aaron Marcuse-Kubitza
03:37 PM Revision 5968: sql_io.py: put_table(): Switched back to using run_query_into()'s add_pkey_ option now that it uses sql.add_pkey_or_index() instead of sql.add_pkey(): Aaron Marcuse-Kubitza
03:35 PM Revision 5967: sql.py: run_query_into(): add_pkey_: Use add_pkey_or_index() instead of add_pkey(). This should prevent errors for input rows which match multiple output rows in VegBIEN.: Aaron Marcuse-Kubitza
03:24 PM Revision 5966: README.TXT: Data import: After import, removed steps to reinstall tnrs_accepted, tnrs_other because this would overwrite the versions of these staging tables that were used to build the current version of the database. (The backup should store a snapshot of the database's input data as it was used to build the database.) Note that these tables will still be reinstalled when the next import starts.: Aaron Marcuse-Kubitza
02:54 PM Revision 5965: Added validation/BIEN2_Analytical_DB_overview.docx: Aaron Marcuse-Kubitza
02:54 PM Revision 5964: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
12:00 PM Revision 5963: inputs/.TNRS/tnrs_*/create.sql: Fixed bug where need to COALESCE() the WHERE condition with false to ensure that a boolean value, not NULL, is negated by NOT. Otherwise, tnrs_other's query would exclude rows for which the condition returned NULL rather than false.: Aaron Marcuse-Kubitza
11:43 AM Revision 5962: schemas/vegbien.sql: make_analytical_db(): Truncate tables before (re-)populating them in case make_analytical_db() has already been run: Aaron Marcuse-Kubitza
11:40 AM Revision 5961: schemas/vegbien.sql: analytical_stem_view: Updated to use reference instead of party to store the datasource name: Aaron Marcuse-Kubitza
11:28 AM Revision 5960: schemas/vegbien.sql: Renamed make_analytical_stem() back to make_analytical_db(): Aaron Marcuse-Kubitza

11/01/2012

10:52 AM Revision 5959: import_all: After starting geoscrub import in the background, wait for make commands to scroll by before starting NCBI import: Aaron Marcuse-Kubitza
10:39 AM Revision 5958: mappings/VegCore-VegBIEN.csv: matched place: Also map verbatim place's geoscrub-related fields to the matched place, to link up with geoscrub's corresponding input place: Aaron Marcuse-Kubitza
10:22 AM Revision 5957: import_all: Removed explicit by_col=1 from datasources that don't require it for proper import. (It will still be set if the user provides it on the command line.): Aaron Marcuse-Kubitza
10:18 AM Revision 5956: schemas/functions.sql: Removed duplicate previous version of _and() with fewer parameters: Aaron Marcuse-Kubitza
10:14 AM Revision 5955: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
10:11 AM Revision 5954: schemas/vegbien.sql: method: Made reference_id NOT NULL so the method table would automatically be datasource-scoped: Aaron Marcuse-Kubitza
10:09 AM Revision 5953: mappings/VegCore-VegBIEN.csv: Renamed creator_ids to reference_id since they are now fkeys to reference: Aaron Marcuse-Kubitza
10:04 AM Revision 5952: schemas/vegbien.sql: Made creator_ids an fkey to reference instead of party, so that datasources are stored separately from people and to allow adding reference-type metadata (URL, copyright, etc.) for each datasource: Aaron Marcuse-Kubitza
09:51 AM Revision 5951: schemas/vegbien.sql: reference: Added unique index: Aaron Marcuse-Kubitza
09:47 AM Revision 5950: mappings/VegCore-VegBIEN.csv: Remapped accordingTo to taxonconcept.concept_reference_id: Aaron Marcuse-Kubitza
09:46 AM Revision 5949: mappings/VegCore-VegBIEN.csv: Don't populate taxonlabel.creator_id using identifiedBy, because that was meant for creating a taxonconcept when the accordingTo was not specified, but taxonconcepts are now modeled differently using a taxonconcept subclass of taxonlabel with a concept_reference_id: Aaron Marcuse-Kubitza
09:36 AM Revision 5948: schemas/vegbien.sql: taxonverbatim: Renamed binomial to taxonname because this field is actually the taxonomic name without the author, not just the genus+specific epithet binomial: Aaron Marcuse-Kubitza
09:29 AM Revision 5947: mappings/VegCore.csv: Renamed *Binomial to *TaxonName because this field can store more ranks than just the genus+specificEpithet binomial (that goes in speciesBinomial): Aaron Marcuse-Kubitza
09:22 AM Revision 5946: mappings/VegCore.csv: Renamed taxonName to taxonNameOrEpithet for clarity: Aaron Marcuse-Kubitza
09:16 AM Revision 5945: schemas/vegbien.sql: analytical_aggregate_view: Added growthForm, cultivated, cultivatedBasis from analytical_stem. Note that this will create multiple entries for a taxon name when some taxonoccurrences are marked as cultivated and others aren't, or different taxonoccurrences have different growthform entries.: Aaron Marcuse-Kubitza
09:01 AM Revision 5944: import_all: Added geoscrub import, which can happen concurrently with NCBI/TNRS but must come before the main datasources for the matched places to link up properly: Aaron Marcuse-Kubitza
08:59 AM Revision 5943: import_all: Documented that TNRS import must come after NCBI for cross links to be made: Aaron Marcuse-Kubitza
08:53 AM Revision 5942: input.Makefile: Staging tables installation: `%.sql: _MySQL/%.sql`, etc.: Only run if target does not exist, to avoid regenerating the target when a .make script which generates the target's prerequisite is checked out from svn (causing its mod time to be newer than the target): Aaron Marcuse-Kubitza
08:24 AM Revision 5941: Added inputs/.geoscrub/: Aaron Marcuse-Kubitza
08:21 AM Revision 5940: input.Makefile: SVN: $(*svnFilesGlob): Added */, _MySQL/ subdirs: Aaron Marcuse-Kubitza
08:20 AM Revision 5939: lib/common.Makefile: SVN: $(add*): Fixed bug where need to use --depth=empty to ensure directory contents are not added unless explicitly listed: Aaron Marcuse-Kubitza
08:09 AM Revision 5938: mappings/VegCore-VegBIEN.csv: Mapped georeferenceValid, latLongInCountry, latLongInStateProvince: Aaron Marcuse-Kubitza
08:05 AM Revision 5937: schemas/functions.sql: Added _and(): Aaron Marcuse-Kubitza
07:40 AM Revision 5936: mappings/VegCore-VegBIEN.csv: Mapped distanceToCountry_km, distanceToStateProvince_km: Aaron Marcuse-Kubitza
07:39 AM Revision 5935: schemas/vegbien.sql: Added _km_to_m(): Aaron Marcuse-Kubitza
07:36 AM Revision 5934: schemas/vegbien.sql: Added _km_to_m(): Aaron Marcuse-Kubitza
07:31 AM Revision 5933: schemas/vegbien.sql: place: Added georeference_valid, distance_to_country_m, distance_to_state_m: Aaron Marcuse-Kubitza
07:18 AM Revision 5932: mappings/VegCore-VegBIEN.csv: matched taxonlabel's ancestors: Only create the cross links to NCBI if the name is accepted (taxonIsCanonical): Aaron Marcuse-Kubitza
06:55 AM Revision 5931: mappings/VegCore-VegBIEN.csv: accepted* taxonomic terms: If is canonical name, don't populate a separate accepted taxonlabel because the taxonlabel itself is accepted. If is not canonical name, populate matched_label_id directly when the row is created instead of waiting until after all rows are created and using _taxonlabel_set_matched_label_id(). This is possible now that accepted taxonlabels occur only in TNRS's tnrs_accepted table, which is specially marked as taxonIsCanonical. Avoiding _taxonlabel_set_matched_label_id() should fix the slowdown that occurs when it's used on large tables.: Aaron Marcuse-Kubitza
06:48 AM Revision 5930: mappings/VegCore-VegBIEN.csv: taxonIsCanonical: Moved then=0 into non-identifying _if statement attributes: Aaron Marcuse-Kubitza
06:41 AM Revision 5929: mappings/VegCore-VegBIEN.csv: Mapped taxonIsCanonical: Aaron Marcuse-Kubitza
06:41 AM Revision 5928: bin/map: map_rows(): map_table(): Fixed bug where metadata values were being removed prematurely, by passing them through: Aaron Marcuse-Kubitza
06:40 AM Revision 5927: bin/map: map_rows(): Fixed bug where metadata values were being passed to functions that expected columns, by placing them directly in the XML import tree and then removing them from the mappings: Aaron Marcuse-Kubitza
05:51 AM Revision 5926: input.Makefile: Maps validation: %/new_terms.csv: Filter out metadata values, which are not new terms even though they are in the same map column as the input columns: Aaron Marcuse-Kubitza
05:46 AM Revision 5925: inputs/.TNRS/tnrs_accepted/map.csv: Set taxonIsCanonical to true, using new metadata value syntax: Aaron Marcuse-Kubitza
05:40 AM Revision 5924: mappings/VegCore.csv: Added taxonIsCanonical: Aaron Marcuse-Kubitza
05:37 AM Revision 5923: input.Makefile: SVN: $(svnFiles): Added _MySQL files that would otherwise be excluded with $(filter-out _% ...): Aaron Marcuse-Kubitza
05:34 AM Revision 5922: inputs/*/: Placed all logs/ subdirs and import_order.txt under version control, using `make inputs/.../add`: Aaron Marcuse-Kubitza
05:31 AM Revision 5921: input.Makefile: SVN: add: Factored list of files to add out into $(svnFiles) var: Aaron Marcuse-Kubitza
05:30 AM Revision 5920: Added inputs/.TNRS/tnrs/header.csv: Aaron Marcuse-Kubitza
05:28 AM Revision 5919: input.Makefile: SVN: $(svnFilesGlob): Added *.make: Aaron Marcuse-Kubitza
05:21 AM Revision 5918: input.Makefile: SVN: add: Fixed bug where _-prefixed dirs (such as _MySQL) were incorrectly being included in the list of files to put under version control: Aaron Marcuse-Kubitza
05:15 AM Revision 5917: Calls to `make inputs/.TNRS/cleanup`: Do `make inputs/.TNRS/tnrs_accepted/reinstall; make inputs/.TNRS/tnrs_other/reinstall` instead to use new split TNRS tables: Aaron Marcuse-Kubitza
05:04 AM Revision 5916: inputs/.TNRS/schema.sql: Removed no longer needed tnrs_accepted_names index, now that the accepted names have been split into a separate table: Aaron Marcuse-Kubitza
05:00 AM Revision 5915: inputs/.TNRS/: Split tnrs table into tnrs_accepted and tnrs_other, with tnrs_accepted imported first. This will eventually allow TNRS data to be imported without needing to use _taxonlabel_set_matched_label_id(), which is extremely slow due to an unknown bug in PostgreSQL's handling of UPDATEs on large tables (possibly limited to UPDATEs inside functions).: Aaron Marcuse-Kubitza
05:00 AM Revision 5914: input.Makefile: SVN: add: Fixed bug where logs/*.log.sql was incorrectly being included in the list of files to put under version control: Aaron Marcuse-Kubitza
04:53 AM Revision 5913: lib/common.Makefile: Filesystem: $(wildcard/): Fixed bug where {} exprs weren't handled correctly when some branches contained *, by using `shopt -s nullglob` and bash directly instead of trying to emulate it with grep: Aaron Marcuse-Kubitza
04:44 AM Revision 5912: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Also provide the schema to create.sql in :schema: Aaron Marcuse-Kubitza
04:16 AM Revision 5911: bin/map: Added support for including literal metadata values in the map spreadsheet, by prefixing them with ':': Aaron Marcuse-Kubitza
03:56 AM Revision 5910: mappings/VegCore.csv: Added distanceToCountry_km, distanceToStateProvince_km: Aaron Marcuse-Kubitza
03:45 AM Revision 5909: mappings/VegCore.csv: Added distanceToCountry_m, distanceToStateProvince_m: Aaron Marcuse-Kubitza
03:36 AM Revision 5908: mappings/VegCore-VegBIEN.csv: Mapped acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude. Mapped decimalLatitude/Longitude to matched place's coordinates when acceptedDecimalLatitude/Longitude not provided (as is the case for the geoscrub table).: Aaron Marcuse-Kubitza
03:22 AM Revision 5907: input.Makefile: SVN: $(svnFilesGlob): Added *schema*.sql: Aaron Marcuse-Kubitza
02:57 AM Revision 5906: mappings/VegCore.csv: Replaced georeferenceInvalid with latLongDomainValid, latLongDomainInvalid because this field is actually just whether the coordinates are in the range of valid decimal values, not whether they are within the named place reference. Re-sourced georeferenceValid to bien_web.observation#isGeovalid rather than isValidLatLong. VegCore-VegBIEN.csv: Replaced now-removed georeferenceInvalid with latLongInCountry for locationID redirection checks.: Aaron Marcuse-Kubitza
01:54 AM Revision 5905: mappings/VegCore-VegBIEN.csv: Map locationID to place.placecode instead when geovalidation columns are provided: Aaron Marcuse-Kubitza
01:06 AM Revision 5904: mappings/VegCore-VegBIEN.csv: accepted* taxonlabel: Removed ancestor hierarchy because this is populated, in much greater detail, when the accepted name is imported as an input name and the TNRS-parsed components are available: Aaron Marcuse-Kubitza
12:55 AM Revision 5903: sql.py: distinct_table(): Use DISTINCT ON instead of a unique index and insert_select()'s ignore mode to remove duplicate rows. This uses whichever sorting method PostgreSQL deems to be fastest instead of requiring the use of a B-tree index. Since most of the slower operations in TNRS's import are distinct_table() calls, this should speed up the TNRS import, which is a bottleneck for the DB import as a whole because the TNRS import must complete before other datasources can be imported.: Aaron Marcuse-Kubitza
12:36 AM Revision 5902: sql.py: distinct_table(): Changed comment about distinct_on column index to include just the input table, so that the function does not guarantee a unique index on the output table's distinct_on columns: Aaron Marcuse-Kubitza
12:15 AM Revision 5901: mappings/VegCore.csv: Added acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude: Aaron Marcuse-Kubitza

10/31/2012

11:57 PM Revision 5900: mappings/VegCore.csv: Renamed latLongValid, latLongInvalid to georeferenceValid, georeferenceInvalid to correspond to DwC term georeferenceVerificationStatus: Aaron Marcuse-Kubitza
11:45 PM Revision 5899: mappings/VegCore.csv: Added latLongValid, latLongInvalid, latLongInCountry, latLongInStateProvince: Aaron Marcuse-Kubitza
11:14 PM Revision 5898: input.Makefile: Staging tables installation: Treat any .sql file whose name contains (not just ends with) "schema" as a schema file and sort it before other .sql files: Aaron Marcuse-Kubitza
10:17 PM Revision 5897: schemas/vegbien.sql: Functions containing UPDATE statements: Use quote_nullable() instead of quote_literal() to properly encode NULL values: Aaron Marcuse-Kubitza
10:10 PM Revision 5896: schemas/vegbien.sql: Functions containing UPDATE statements: Use PL/pgSQL's EXECUTE statement to avoid caching query plans. This is necessary because as the table grows over time, the optimal query plan may change.: Aaron Marcuse-Kubitza
10:05 PM Revision 5895: sql_io.py: put_table(): ensure_cond(): When deleting rows rows that do not satisfy the condition, handle sql.DoesNotExistExceptions caused by columns in the condition that were not replaced with NULL. These occur when out_table is a function, and the columns of the table the condition relates to therefore can't be found using out_table.: Aaron Marcuse-Kubitza
09:59 PM Revision 5894: sql_io.py: put_table(): Calling function: Do not cache the function call, because it may be retried after error handling: Aaron Marcuse-Kubitza
09:58 PM Revision 5893: sql_gen.py: NotCond: Treat a condition that evaluates to NULL as false instead, so that the boolean effect of the condition is completely inverted: Aaron Marcuse-Kubitza
09:42 PM Revision 5892: sql_gen.py: null_as_str: Use new null instead of hardcoding 'NULL': Aaron Marcuse-Kubitza
09:41 PM Revision 5891: sql_gen.py: Added null: Aaron Marcuse-Kubitza
09:40 PM Revision 5890: sql.py: run_query(): Give failed EXPLAIN approximately the log_level of its query, so that queries which produce an error in the EXPLAIN before the query itself is even run will still be logged: Aaron Marcuse-Kubitza
08:45 PM Revision 5889: sql_io.py: put_table(): sql.DatabaseErrors: Factored exception-handling code out into handle_unknown_exc(), for use by other exception handlers: Aaron Marcuse-Kubitza
08:39 PM Revision 5888: sql_io.py: put_table(): ensure_cond(): is_function: Fixed bug where can't replace out_table_cols with NULL because out_table is a function, not a table: Aaron Marcuse-Kubitza

10/30/2012

04:59 PM Revision 5887: my2pg*: Turn off escape_string_warning because \-escaped strings are standard in MySQL: Aaron Marcuse-Kubitza
04:58 PM Revision 5886: my2pg.data: Turn off standard_conforming_strings like in my2pg: Aaron Marcuse-Kubitza
04:42 PM Revision 5885: my2pg: Also remove any CHARACTER SET modifier on a column definition: Aaron Marcuse-Kubitza
04:26 PM Revision 5884: schemas/vegbien.sql: analytical_aggregate_view: Make size classes cumulative ranges (stems above a certain DBH) rather than bins, per Brad's request: Aaron Marcuse-Kubitza
04:26 PM Revision 5883: schemas/vegbien.sql: analytical_aggregate_view: Make size classes cumulative ranges (stems above a certain DBH) rather than bins, per Brad's request: Aaron Marcuse-Kubitza
04:18 PM Revision 5882: input.Makefile: SVN: add: Add header override files with any extension, not just .csv: Aaron Marcuse-Kubitza
04:15 PM Revision 5881: README.TXT: Datasource setup: Replaced manual `svn add` commands with one `make inputs/<datasrc>/add` before committing to add newly-created files: Aaron Marcuse-Kubitza
04:00 PM Revision 5880: input.Makefile: SVN: add: Also add any *.sql, *when it's in a subdir*. This applies to create.sql, cleanup.sql, etc.: Aaron Marcuse-Kubitza
03:58 PM Revision 5879: lib/common.Makefile: SVN: Added $(add*): Aaron Marcuse-Kubitza
03:55 PM Revision 5878: input.Makefile: SVN: add: Also add any newly-created files which should be under version control: Aaron Marcuse-Kubitza
03:35 PM Revision 5877: input.Makefile: Fixed bug where _MySQL/%.sql files weren't being built from associated .make files by adding special `%.sql: %.sql.make` rule to override `%.sql: _MySQL/%.sql`: Aaron Marcuse-Kubitza
03:33 PM Revision 5876: input.Makefile: `%: %.make`: Factored $(if $(wildcard $@)... test out into $(make_script) so all `%: %.make`-like rules could use it directly: Aaron Marcuse-Kubitza
03:09 PM Revision 5875: lib/forwarding.Makefile: $(subdirs): Use all folders other than ../ ./ .svn/ instead of listing folders that start with . explicitly: Aaron Marcuse-Kubitza
02:31 PM Revision 5874: schemas/vegbien.sql: analytical_stem_view: Use accepted_taxonlabel.taxonomicname instead of accepted_taxonverbatim.taxonomicname in order to have the family prepended: Aaron Marcuse-Kubitza
12:41 PM Revision 5873: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
12:38 PM Revision 5872: schemas/vegbien.sql: placerank: Reordered in path order, using <http://rs.tdwg.org/dwc/terms/#dcindex> and <http://vegbank.org/vegbank/views/dba_fielddescription_detail.jsp?view=detail&wparam=1415&entity=dba_fielddescription&params=1415> as a guide. Documented the source of the values.: Aaron Marcuse-Kubitza
12:26 PM Revision 5871: schemas/vegbien.sql: placename: Removed non-name-related fields, because placename is designed only to store a hierarchy of placenames, not additional place information: Aaron Marcuse-Kubitza
12:23 PM Revision 5870: schemas/vegbien.sql: Moved placedescription from placename to place (and renamed it to description), because it applies to the place itself, not the name for the place: Aaron Marcuse-Kubitza
12:16 PM Revision 5869: schemas/vegbien.sql: analytical_aggregate_view: Added coverPercent, which is the sum of all coverPercents for that species: Aaron Marcuse-Kubitza
12:13 PM Revision 5868: schemas/vegbien.sql: analytical_aggregate_view: Added coverPercent, which is the sum of all coverPercents for that species: Aaron Marcuse-Kubitza
12:03 PM Revision 5867: schemas/vegbien.sql: analytical_aggregate_view: Include all analytical_stem species, not just those whose stems have non-NULL DBH: Aaron Marcuse-Kubitza
11:57 AM Revision 5866: schemas/vegbien.sql: Renamed aggregated_analytical_db to analytical_aggregate to match the name of analytical_stem: Aaron Marcuse-Kubitza
11:55 AM Revision 5865: schemas/vegbien.sql: Renamed analytical_db to analytical_stem since this contains just the individual stems, not the aggregated data in the main analytical DB: Aaron Marcuse-Kubitza
11:52 AM Revision 5864: schemas/vegbien.sql: Renamed analytical_db to analytical_stem since this contains just the individual stems, not the aggregated data in the main analytical DB: Aaron Marcuse-Kubitza
11:38 AM Revision 5863: schemas/vegbien.sql: Removed no longer used locationcoords: Aaron Marcuse-Kubitza
11:35 AM Revision 5862: schemas/vegbien.sql: analytical_db_view: Use new coordinates instead of locationcoords: Aaron Marcuse-Kubitza
11:23 AM Revision 5861: mappings/VegCore-VegBIEN.csv: Remapped latitude/longitude to new coordinates table: Aaron Marcuse-Kubitza
11:15 AM Revision 5860: schemas/vegbien.sql: place: Added coordinates_id: Aaron Marcuse-Kubitza
11:01 AM Revision 5859: schemas/vegbien.sql: Added coordinates table: Aaron Marcuse-Kubitza
10:40 AM Revision 5858: schemas/vegbien.sql: place: Removed municipality, site because they are not used in the geoscrubbing: Aaron Marcuse-Kubitza
10:19 AM Revision 5857: schemas/vegbien.sql: place: Place custom hierarchy of placenames in placename table instead of in otherranks field: Aaron Marcuse-Kubitza
10:04 AM Revision 5856: schemas/vegbien.sql: place.matched_place_id: Changed comment to say that places are linked in a three-level (instead of two-level) hierarchy of datasource place -> verbatim place -> accepted place, and this field contains the closest match: Aaron Marcuse-Kubitza
09:54 AM Revision 5855: schemas/vegbien.sql: Renamed placepath to place since this contains primary information about the place, including the reference to the canonical place: Aaron Marcuse-Kubitza
09:42 AM Revision 5854: schemas/vegbien.sql: Renamed place to placename since it refers just to a name for a place, without coordinates: Aaron Marcuse-Kubitza
07:18 AM Revision 5853: schemas/vegbien.sql: analytical_db_view: Exclude original taxondeterminations, so that there is only one taxondetermination for each taxonoccurrence: Aaron Marcuse-Kubitza
07:03 AM Revision 5852: schemas/vegbien.sql: make_analytical_db(): Also make new aggregated_analytical_db: Aaron Marcuse-Kubitza
07:02 AM Revision 5851: schemas/vegbien.sql: sync_analytical_db_to_view(): DROP TABLE: Use IF EXISTS in case analytical_db table has already been deleted, or not yet created: Aaron Marcuse-Kubitza
07:01 AM Revision 5850: schemas/vegbien.sql: Added aggregated_analytical_db_view and materialized table aggregated_analytical_db (synced using sync_aggregated_analytical_db_to_view()): Aaron Marcuse-Kubitza
07:01 AM Revision 5849: schemas/vegbien.sql: Added aggregated_analytical_db_view and materialized table aggregated_analytical_db (synced using sync_aggregated_analytical_db_to_view()): Aaron Marcuse-Kubitza
06:56 AM Revision 5848: lib/PostgreSQL-MySQL.csv: custom types: Also match column names enclosed in ``: Aaron Marcuse-Kubitza
06:49 AM Revision 5847: schemas/vegbien.sql: Store materialized analytical_db table in schema so aggregating views can reference it. Added sync_analytical_db_to_view() to maintain analytical_db table.: Aaron Marcuse-Kubitza
06:30 AM Revision 5846: schemas/vegbank.ERD.pdf: Restored to VegBank ERD, which had gotten overwritten when the vegbien.ERD exports were regenerated: Aaron Marcuse-Kubitza
06:23 AM Revision 5845: schemas/vegbien.sql: analytical_db_view: Reordered columns in path order: Aaron Marcuse-Kubitza
06:04 AM Revision 5844: schemas/: Moved unit conversion functions from functions.sql to vegbien.sql so the unit conversion functions used by analytical_db_view wouldn't need to be stored both in functions.sql and in vegbien.sql. (All unit conversion functions used by analytical_db_view must be stored in the public schema so that analytical_db_view doesn't get cascadingly deleted when the functions schema is reinstalled.): Aaron Marcuse-Kubitza
05:52 AM Revision 5843: schemas/vegbien.sql: analytical_db_view: Use public._m2_to_ha() instead of functions._m2_to_ha(): Aaron Marcuse-Kubitza
05:51 AM Revision 5842: schemas/vegbien.sql: Copied _m2_to_ha() to public schema for use by analytical_db_view: Aaron Marcuse-Kubitza
05:40 AM Revision 5841: schemas/vegbien.sql: analytical_db_view: Added diameterBreastHeight_cm: Aaron Marcuse-Kubitza
05:38 AM Revision 5840: schemas/functions.sql, vegbien.sql: Added _m_to_cm(): Aaron Marcuse-Kubitza
05:07 AM Revision 5839: schemas/vegbien.sql: Copied _cm_to_m() to public schema for use by new aggregated_analytical_db_view: Aaron Marcuse-Kubitza
04:19 AM Revision 5838: schemas/vegbien.sql: analytical_db_view: datasource table: Fixed bug where need to filter by creator_id = party_id in order to use just root parties (datasources): Aaron Marcuse-Kubitza
03:40 AM Revision 5837: tnrs_db: Fetching names to scrub: Omit sql.select() fields param because it will be filled in with its default value: Aaron Marcuse-Kubitza
03:29 AM Revision 5836: import_all: Pass command-line args (such as make vars) to all commands, not just with_all, so that a custom public schema is properly used by all commands: Aaron Marcuse-Kubitza
02:57 AM Revision 5835: inputs/.NCBI/nodes/create.sql: Make genus completely globally unique by removing duplicates. Note that only duplicates with ranks at or below the genus level need be removed, which for this dataset is just genus and subgenus.: Aaron Marcuse-Kubitza
02:00 AM Revision 5834: schemas/vegbien.sql: taxonlabel: taxonlabel_required_key constraint: Also allow taxonlabels with just a sourceaccessioncode, to support looking up parent taxonlabels using just their sourceaccessioncode (e.g. in NCBI): Aaron Marcuse-Kubitza
01:23 AM Revision 5833: mappings/VegCore-VegBIEN.csv: matched taxonlabel: Don't include taxonName in the concatenated taxonomicname. This also prevents the creation of the matched taxonlabel entirely when only the taxonName is provided.: Aaron Marcuse-Kubitza
01:20 AM Revision 5832: mappings/VegCore-VegBIEN.csv: Don't create matched taxonlabel if taxonName was provided. This fixes a bug where an NCBI node was incorrectly pointing to a TNRS name, when the reference should only be the other way around. This may also fix the TNRS slowdown, if it was caused by circular matched_label_id references.: Aaron Marcuse-Kubitza
12:47 AM Revision 5831: schemas/vegbien.sql: taxonlabel_2_set_canon_label_id_on_insert(): Fixed bug where also need to set canon_label_id based on matched_label_id here, not just in taxonlabel_2_set_canon_label_id_on_update(), because the matched_label_id could be specified when the taxonlabel is first created: Aaron Marcuse-Kubitza
12:34 AM Revision 5830: schemas/vegbien.sql: taxonlabel_2_set_canon_label_id_on_*(): Fixed bug where need to use := instead of = to perform assignment of canon_label_id: Aaron Marcuse-Kubitza
12:17 AM Revision 5829: schemas/tree_cross-links.sql: Updated for schema changes: Aaron Marcuse-Kubitza
12:16 AM Revision 5828: schemas/vegbien.sql: taxonlabel_update_ancestors(): Include ancestors for both parent_id and matched_label_id rather than just one or the other. This avoids needing to delete existing ancestors for the parent_id when a matched_label_id is added and overrides it. This should reduce the TNRS import time if the slowdown was due to the need to delete parent_id ancestors when later adding a matched_label_id (which only occurs in a separate step in the TNRS datasource).: Aaron Marcuse-Kubitza
12:07 AM Revision 5827: sql_io.py: put_table(): ensure_cond(): Fixed bug where test if any rows failed cond did not check if cur != None (which is the case when cond == sql_gen.true_expr) before checking cur.rowcount: Aaron Marcuse-Kubitza

10/29/2012

10:26 PM Revision 5826: sql_gen.py: simplify_expr(): Don't require () around NULL IS NULL and NULL IS NOT NULL because extra parentheses are not provided in index conditions, only in check constraint conditions: Aaron Marcuse-Kubitza
10:06 PM Revision 5825: inputs/import.stats.xls: Updated import times. The TNRS import has slowed down significantly, possibly due to a bug in the autopopulation of the taxonlabel_relationship table when the input data contains cycles.: Aaron Marcuse-Kubitza
09:37 PM Revision 5824: sql_io.py: put_table(): Assertion that into and full_in_table have the same row count: Allow into to have more rows than full_in_table, in case an input row matched multiple output rows. This should not happen for a properly-configured database, but seems to happen periodically nevertheless (currently, to the MO datasource) and should not abort the import when it does.: Aaron Marcuse-Kubitza

10/26/2012

08:18 PM Revision 5823: sql.py: parse_exception(): "could not create unique index" DuplicateKeyException: Fixed bug where can't use make_DuplicateKeyException() because it tries to retrieve information about the index in question, but the index it was trying to create doesn't exist: Aaron Marcuse-Kubitza
08:10 PM Revision 5822: schemas/vegbien.sql: analytical_db_view: Renamed datasource's taxonverbatim to datasource_taxonverbatim to distinguish it from the other taxonverbatims that are joined on (parsed_taxonverbatim, accepted_taxonverbatim): Aaron Marcuse-Kubitza
07:18 PM Revision 5821: inputs/.NCBI/nodes/create.sql: Make genus (mostly) globally unique by removing kingdom Animalia, which has significant genus overlap with plants. This reduces the number of duplicated genera from 578 to 65 (determined with `SELECT name_txt, count(*), array_agg(rank) FROM "NCBI".nodes GROUP BY name_txt HAVING count(*) > 1 AND 'genus' = ALL (array_agg(rank))`).: Aaron Marcuse-Kubitza
07:08 PM Revision 5820: inputs/.NCBI/nodes/create.sql: Added foreign key on parent tax_id with covering index: Aaron Marcuse-Kubitza
07:06 PM Revision 5819: input.Makefile: Staging tables installation: Added %/uninstall, %/reinstall to allow reinstalling individual tables: Aaron Marcuse-Kubitza
06:00 PM Revision 5818: sql_io.py: put_table(): ensure_cond(): When adding the failed condition to the errors table, also include the original, untranslated condition from the DB schema in addition to the translation of the condition into the input schema: Aaron Marcuse-Kubitza
05:45 PM Revision 5817: sql_io.py: track_data_error(): Fixed bug where errors whose column had no srcs (indicated by () ) were incorrectly being ignored. This affected NOT NULL exceptions where the column was not provided by the dataset.: Aaron Marcuse-Kubitza
05:38 PM Revision 5816: sql_gen.py: If no cols had srcs, return [] instead of the [()] that itertools.product() would have returned: Aaron Marcuse-Kubitza
05:38 PM Revision 5815: sql_io.py: track_data_error(): Support errors with no columns by inserting a single entry with column set to NULL: Aaron Marcuse-Kubitza
05:35 PM Revision 5814: strings.py: Added join(): Aaron Marcuse-Kubitza
05:00 PM Revision 5813: sql_io.py: mk_errors_table(): Made "column" column nullable, because some errors (such as check constraint violations) don't have any corresponding columns if its columns weren't provided in the input data: Aaron Marcuse-Kubitza
04:35 PM Revision 5812: inputs/test_taxonomic_names/test_scrub: `make inputs/.TNRS/reinstall`: Use new $schema_only option so that an empty TNRS schema is installed rather than one containing inputs/.TNRS/data.sql: Aaron Marcuse-Kubitza
04:34 PM Revision 5811: inputs/.TNRS/: Added data.sql containing the test_taxonomic_names TNRS results, so that a new installation of VegBIEN will contain the necessary data to make the tests pass, including the TNRS import test: Aaron Marcuse-Kubitza
04:32 PM Revision 5810: input.Makefile: Staging tables installation: If $schema_only option is set, only install .sql files ending in schema.sql: Aaron Marcuse-Kubitza
04:24 PM Revision 5809: inputs/Makefile: $(rsyncLogs): Use $(rsync) instead of $(rsync*) now that it supports excluding just temp files and .svn rather than all .*: Aaron Marcuse-Kubitza
04:21 PM Revision 5808: lib/common.Makefile: rsync: $(rsync): Exclude .svn, *#, and .DS_Store rather than all .* because dirs beginning with . created by the user (such as .NCBI, .TNRS) should be included in the sync: Aaron Marcuse-Kubitza
04:18 PM Revision 5807: Added inputs/REMIB/Specimen.src/.map.csv.last_cleanup: Aaron Marcuse-Kubitza
04:10 PM Revision 5806: Added inputs/bien_web/observation/+header.csv: Aaron Marcuse-Kubitza
04:09 PM Revision 5805: input.Makefile: Staging tables installation: $(dbExports): When putting schemas first, don't require a . before "schema" to allow the entire filename to be schema.sql: Aaron Marcuse-Kubitza
03:44 PM Revision 5804: inputs/test_taxonomic_names/_scrub/public.test_taxonomic_names.sql, TNRS.sql: Regenerated with schema and mappings changes: Aaron Marcuse-Kubitza
03:42 PM Revision 5803: inputs/.TNRS/tnrs/map.csv: Added _nullIf filter to remove "Unknown" values for Accepted_name_family: Aaron Marcuse-Kubitza
03:35 PM Revision 5802: README.TXT: Generate the local TNRS cache from the test_taxonomic_names rather than syncing it with the vegbiendev TNRS cache, so that the automated test's inserted row count stays the same regardless of the contents of the full-DB TNRS cache: Aaron Marcuse-Kubitza
03:34 PM Revision 5801: README.TXT: Backups: Added TNRS cache section: Aaron Marcuse-Kubitza
03:12 PM Revision 5800: inputs/.TNRS/tnrs/test.xml.ref: Accepted inserted row count using TNRS cache created from test_taxonomic_names. Using a standard set of names for the test ensures that the inserted row count will not change when the full-DB TNRS cache changes.: Aaron Marcuse-Kubitza
02:48 PM Revision 5799: inputs/.TNRS/schema.sql: tnrs_accepted_names: Prepend the Accepted_name_family to the taxonomic name that will be submitted back to TNRS for parsing, because TNRS input names now always include the family when it's provided: Aaron Marcuse-Kubitza
02:46 PM Revision 5798: inputs/.TNRS/schema.sql: tnrs_accepted_names: Use simpler array_to_string() instead of || and COALESCE() to put together the taxonomic name that will be submitted back to TNRS for parsing. Note that this requires defining an IMMUTABLE wrapper function for array_to_string(), because pg_catalog.array_to_string() is declared STABLE but indexes require functions to be IMMUTABLE (http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg156323.html).: Aaron Marcuse-Kubitza
02:42 PM Revision 5797: inputs/.TNRS/schema.sql: Don't hardcode the schema name: Aaron Marcuse-Kubitza
02:40 PM Revision 5796: input.Makefile: Staging tables installation: sql/install: Provide the datasource's schema to the script in :schema, so it can refer to its own elements explicitly when it's not possible to rely on the search_path. This is the case for functions that have the same signature as (and are intended to replace) a pg_catalog function, because the pg_catalog function will be used in preference to the datasource function regardless of the search_path.: Aaron Marcuse-Kubitza
02:06 PM Revision 5795: input.Makefile: Staging tables installation: $(cleanup): If a cleanup.sql is provided, *only* run it and don't do default cleanup, to allow tables to override rather than just add to default cleanup operations. This prevents the automatic replacement of certain strings (sql_io.null_strs) with NULL on TNRS, and keeps the TNRS cache mostly as it was output by the TNRS service. Note that empty strings are still replaced with NULL by COPY FROM in sql_io.append_csv(). This is necessary for TNRS import to work properly, because although '' generally means NULL, it is not treated that way by PostgreSQL.: Aaron Marcuse-Kubitza
01:53 PM Revision 5794: input.Makefile: Staging tables installation: Moved custom cleanup.sql cleanup operations to main $(cleanup) function, so custom cleanup operations would run whenever any target (such as %/install) invokes $(cleanup), not just manually through %/cleanup: Aaron Marcuse-Kubitza
01:52 PM Revision 5793: input.Makefile: Staging tables installation: Moved custom cleanup.sql cleanup operations to main $(cleanup) function, so custom cleanup operations would run whenever any target (such as %/install) invokes $(cleanup), not just manually through %/cleanup: Aaron Marcuse-Kubitza

Also available in: Atom