# Date Author Comment
6493 11/30/2012 10:46 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: Changed JOINs to LEFT JOINs to include occurrences without taxondeterminations

6492 11/30/2012 10:21 AM Aaron Marcuse-Kubitza

export_analytical_db: Use 'NULL' as the NULL value instead of \N, because MySQL has problems with \N

6491 11/30/2012 09:57 AM Aaron Marcuse-Kubitza

publish_analytical_db: Load to bien3_adb instead of bien_web

6490 11/29/2012 05:41 PM Aaron Marcuse-Kubitza

README.TXT: Data import: Added step to export analytical DB

6489 11/29/2012 01:11 PM Aaron Marcuse-Kubitza

root Makefile: $(postgres-Linux): Fixed bug where need $(asAdmin) before commands to rename existing *.conf

6488 11/29/2012 01:01 PM Aaron Marcuse-Kubitza

root Makefile: $(postgres-Linux): Also install postgresql-contrib, which contains the hstore extension

6487 11/28/2012 06:18 PM Aaron Marcuse-Kubitza

Added inputs/NVS/

6486 11/28/2012 06:04 PM Aaron Marcuse-Kubitza

inputs/CVS/Organism/map.csv: Mapped accordingTo to "Weakley 2006"

6485 11/28/2012 06:02 PM Aaron Marcuse-Kubitza

inputs/NY/Specimen/map.csv: Omit UniqueNYInternalRecordNumber to avoid confusion since this is an internal-only ID. This makes InstitutionCode+CollectionCode+CatalogNumber the globally unique identifier instead.

6484 11/28/2012 06:00 PM Aaron Marcuse-Kubitza

README.TXT: Added Datasource refreshing section with instructions for refreshing VegBank

6483 11/28/2012 05:57 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Renamed taxonconcept.concept_source_id back to concept_reference_id

6482 11/28/2012 05:52 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Renamed soilobs to soilsample per working group discussion

6481 11/28/2012 05:27 PM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: verify: Fixed bug where need to use $ prefix before string to parse newline

6480 11/28/2012 05:27 PM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: verify: Fixed bug where need to use $ prefix before string to parse newline

6479 11/28/2012 05:25 PM Aaron Marcuse-Kubitza

inputs/NY/verify/: svn:ignore .csv files

6478 11/28/2012 05:25 PM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: Also svn:ignore .csv files

6477 11/28/2012 02:47 PM Aaron Marcuse-Kubitza

export_analytical_db: Export NULL as \N to work with MySQL

6476 11/28/2012 01:22 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_*: Added index on NOT NULL columns, starting with institutionCode

6475 11/28/2012 01:19 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_*: Removed primary keys and NOT NULL constraints on columns that sometimes have NULL values

6474 11/28/2012 01:08 PM Aaron Marcuse-Kubitza

publish_analytical_db: Added CSV dialect information

6473 11/28/2012 12:42 PM Aaron Marcuse-Kubitza

root Makefile: PostgreSQL: $(postgresReload-*): Rename existing *.conf to *.conf.old

6472 11/27/2012 06:44 PM Aaron Marcuse-Kubitza

publish_analytical_db: Use LOAD DATA LOCAL INFILE instead of LOAD DATA INFILE to avoid needing FILE permissions on bien_web

6471 11/27/2012 01:17 PM Aaron Marcuse-Kubitza

Added publish_analytical_db

6470 11/27/2012 12:43 PM Aaron Marcuse-Kubitza

export_analytical_db: Append the public schema version to the CSV filename

6469 11/27/2012 12:27 PM Aaron Marcuse-Kubitza

backups/Makefile: $(rsyncBackups): Added *.csv

6468 11/26/2012 06:12 PM Aaron Marcuse-Kubitza

Added export_analytical_db

6467 11/26/2012 06:10 PM Aaron Marcuse-Kubitza

backups/: Ignore _* and *.csv

6466 11/26/2012 01:35 PM Aaron Marcuse-Kubitza

make_analytical_db: mk_analytical_table(): Use explicit schema references everywhere. This fixes a bug where the TRUNCATE/INSERT steps on the public schema's table would reference the analytical_db view instead because they were not schema-scoped.

6465 11/26/2012 01:33 PM Aaron Marcuse-Kubitza

make_analytical_db: mk_analytical_table(): Factored table references in different schemas out into vars

6464 11/25/2012 09:31 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: recordNumber: Combine identifying fields in taxonoccurrence, plantobservation, and stemobservation to ensure that this field is unique within the plot and not NULL

6463 11/25/2012 09:13 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

6462 11/25/2012 08:52 PM Aaron Marcuse-Kubitza

make_analytical_db: Moved set -x () around just psql_verbose_vegbien so embedded $() expressions wouldn't also be in set -x (verbose) mode

6461 11/25/2012 08:49 PM Aaron Marcuse-Kubitza

make_analytical_db: Fixed bug where need to use bash instead of sh because vegbien_dest requires it

6460 11/25/2012 08:37 PM Aaron Marcuse-Kubitza

make_analytical_db: Factored analytical_* table creation code out into mk_analytical_table() function

6459 11/25/2012 08:28 PM Aaron Marcuse-Kubitza

make_analytical_db: Create analytical_db views pointing to the analytical_* versions in the public schema

6458 11/25/2012 08:21 PM Aaron Marcuse-Kubitza

vegbien_dest: $schemas: Removed analytical_db because views that will be added to it were shadowing public schema tables with the same names during population of those tables in make_analytical_db

6457 11/25/2012 07:47 PM Aaron Marcuse-Kubitza

vegbien_dest: Export $public, to make sure it's available to any invoked scripts as an env var

6456 11/25/2012 07:45 PM Aaron Marcuse-Kubitza

vegbien_dest: $schemas: Added analytical_db

6455 11/25/2012 07:38 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Added separate tab with stats for 2012-6~9. The Excel format apparently only supports 255 columns, so previous imports had been silently truncated. Note that once the 2012-10 imports reach column 255, a new tab will need to be created with the 2012-10+ imports.

6454 11/25/2012 07:20 PM Aaron Marcuse-Kubitza

bin/map: in_is_db: by_col: Clearing errors table: Skip this if the table has been set to None because it didn't exist (and thus was a metadata-only map spreadsheet)

6453 11/25/2012 06:54 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: scientificNameWithMorphospecies: Fixed bug where need to use the specific_epithet from the accepted_taxonverbatim rather than the parsed_taxonverbatim

6452 11/25/2012 06:45 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: scientificNameWithMorphospecies: Include the family any time the genus is not specified, instead of just when accepted_taxonlabel.rank = 'family'. These should have the same effect since TNRS includes the rank, but using COALESCE is clearer.

6451 11/25/2012 06:41 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: scientificNameWithMorphospecies: Changed to also include morphospecies when just the family is specified

6450 11/25/2012 06:35 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: Fixed bug where location.authorlocationcode needed to be used as the plotName when location.sourceaccessioncode was not provided, to ensure that plotName would be NOT NULL

6449 11/25/2012 06:20 PM Aaron Marcuse-Kubitza

inputs/FIA/import_order.txt: Fixed bug where FIA_COND_unique needed to be explicitly included in import_order.txt now that we're using import_order.txt to import the Source metadata table before the data tables

6448 11/25/2012 06:15 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

6447 11/24/2012 03:07 PM Aaron Marcuse-Kubitza

root Makefile: PostgreSQL: $(postgresReload-Linux): Try chmoding both as your user and as the bien user

6446 11/24/2012 02:46 PM Aaron Marcuse-Kubitza

input.Makefile: Testing: $(runTest): Ignore failed diffs when the test is compared to another test's output (e.g. in by_col mode)

6445 11/24/2012 02:41 PM Aaron Marcuse-Kubitza

bin/map: in_is_db: If table does not exist, set table to None so that db_xml.put_table() doesn't try to access it. This fixes a bug in metadata-only map spreadsheets under column-based import.

6444 11/24/2012 02:40 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Support None in_table by calling put() directly

6443 11/24/2012 02:29 PM Aaron Marcuse-Kubitza

Removed no longer used geoscrub.*.sql. Use geoscrub_output instead.

6442 11/24/2012 02:27 PM Aaron Marcuse-Kubitza

Removed no longer used geoscrub_cleaned_unique. Use geoscrub_output instead.

6441 11/24/2012 02:25 PM Aaron Marcuse-Kubitza

Removed no longer used geoscrub_cultivated. Use analytical_stem_view.cultivated instead.

6440 11/24/2012 02:25 PM Aaron Marcuse-Kubitza

Removed no longer used geoscrub_cultivated. Use analytical_stem_view.cultivated instead.

6439 11/24/2012 02:23 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: cultivated: Removed BIEN2's geoscrub_cultivated, which has now been replaced by the primary corresponding scripts (and never had particularly many matches to the locations in any case)

6438 11/24/2012 02:14 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: cultivated: Use OR instead of _or() to combine cultivated_family_locations.country IS NOT NULL with the other values, because this field's false value should not be used in place of NULL if all the other values are NULL, as it would be with _or(). (cultivated_family_locations.country IS NOT NULL can indicate presence, but not absence, of cultivated status.)
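Note on r6438: the distinction can be sketched in Python, with None standing in for NULL. This is only an illustration. The helper names are hypothetical, and the behavior of _or() is an assumption inferred from the comment above (that it ignores NULL arguments and returns NULL only when every argument is NULL), not taken from schemas/vegbien.sql.

```python
def sql_or(a, b):
    """SQL's three-valued OR, with None standing in for NULL."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None  # unknown: neither side is true and at least one is NULL
    return False


def null_ignoring_or(*values):
    """Assumed behavior of _or(): NULLs are ignored; NULL only if all are NULL."""
    known = [v for v in values if v is not None]
    if not known:
        return None
    return any(known)


# "country IS NOT NULL" is never NULL itself: it is either True or False.
family_listed_as_cultivated = False
other_cultivated_flags = None  # no other source says anything

with_operator = sql_or(other_cultivated_flags, family_listed_as_cultivated)
with_function = null_ignoring_or(other_cultivated_flags, family_listed_as_cultivated)
```

Under these assumptions, the operator leaves the result unknown (None) when the family is merely not listed, while the NULL-ignoring function would report a definite False, which is the absence claim the commit wants to avoid.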

6437 11/24/2012 02:06 PM Aaron Marcuse-Kubitza

schemas/functions.sql, vegbien.sql: _and(), _or(): Added comment comparing the function and the corresponding logical operator

6436 11/24/2012 01:50 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: public: Added _or(), for use by analytical_stem_view

6435 11/24/2012 01:48 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: cultivated: Also set if family/country combination found in cultivated_family_locations

6434 11/24/2012 01:39 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: cultivated_family_locations: Added data from nimoy:/home/boyle/bien2/geoscrub/cultivated/cult_by_taxon/flag_by_taxa.inc

6433 11/24/2012 01:33 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added cultivated_family_locations to store locations where various taxon families are considered cultivated

6432 11/24/2012 01:24 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped locality description fields to location.iscultivated using _locationnarrative_is_cultivated()

6431 11/24/2012 01:23 PM Aaron Marcuse-Kubitza

xml_func.py: Simplifying functions: Added passthru entries for _and, _or

6430 11/24/2012 01:06 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added _locationnarrative_is_cultivated()

6429 11/24/2012 12:57 PM Aaron Marcuse-Kubitza

lib/PostgreSQL-MySQL.csv: Change text to varchar(255) because text columns can't be used in indexes in MySQL

6428 11/24/2012 12:51 PM Aaron Marcuse-Kubitza

lib/PostgreSQL-MySQL.csv: Resaved in Excel, which removed unnecessary quotes around fields

6427 11/24/2012 12:22 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_aggregate: Added identifiedBy, which is no longer a scoping field (which would prevent scientificNameWithMorphospecies from being unique) now that there is only one taxondetermination for each taxonoccurrence

6426 11/24/2012 12:05 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: dateCollected: For plots data, use the locationevent obsstartdate instead of the collectiondate in order to group taxonoccurrences/stems from the same locationevent together

6425 11/24/2012 11:59 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_* pkeys: Added dateCollected because the records are actually unique within the location*event*, not the location

6424 11/24/2012 11:57 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: Exclude records with no collectiondate or obsstartdate, which is required to uniquely identify a record

6423 11/24/2012 11:54 AM Aaron Marcuse-Kubitza

analytical_stem_view: dateCollected: Use locationevent.obsstartdate when aggregateoccurrence.collectiondate is not provided

6422 11/24/2012 11:37 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: Include only the current taxondetermination for each taxonoccurrence, to avoid cross-joining taxondeterminations with stems and thus multiplying the number of rows for datasources that have multiple taxondeterminations per taxonoccurrence

6421 11/24/2012 11:33 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxondetermination: Added AFTER trigger to set the current taxondetermination for the taxonoccurrence

6420 11/24/2012 11:11 AM Aaron Marcuse-Kubitza

lib/PostgreSQL-MySQL.csv: Statements ending in ";": When matching any character, use .*? (with the (?s) flag) instead of [^;]* in order to allow embedded ; to be matched. This fixes a bug where a CREATE VIEW statement was not removed because it contained an embedded ; .
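Note on r6420: the effect of the two patterns can be sketched with Python's re module. The sample SQL and both patterns are illustrative assumptions rather than the actual entries in lib/PostgreSQL-MySQL.csv. They also assume the terminating ; is matched only at the end of a line, as described for r6415.

```python
import re

# A statement whose string literal contains an embedded ';'
sql = (
    "CREATE VIEW v AS\n"
    "    SELECT 'a;b' AS label;\n"
    "CREATE TABLE t (id integer);\n"
)

# Old style: [^;]* stops at the embedded ';', which is not at the end
# of a line, so the statement is never matched.
old_pattern = re.compile(r"CREATE VIEW [^;]*;$\n?", re.M)

# New style: lazy .*? with (?s) so that '.' also matches newlines and
# embedded ';' characters, up to the first ';' that ends a line.
new_pattern = re.compile(r"(?s)CREATE VIEW .*?;$\n?", re.M)

old_result = old_pattern.sub("", sql)
new_result = new_pattern.sub("", sql)
```

With the old pattern the CREATE VIEW statement is left in place. With the new one only the CREATE TABLE line remains.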

6419 11/24/2012 11:06 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxondetermination: Added unique index to ensure that there is only one current determination for each taxonoccurrence

6418 11/24/2012 11:05 AM Aaron Marcuse-Kubitza

lib/PostgreSQL-MySQL.csv: Remove indexes with WHERE clauses

6417 11/24/2012 10:34 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_aggregate: Added primary key on institutionCode, plotName, scientificNameWithMorphospecies, recordNumber. Note that this makes these fields NOT NULL, which should not be a problem because there are inner joins instead of LEFT JOINs on most of the tables which provide them, and LEFT JOINed tables have their identifying fields combined to create a NOT NULL value.

6416 11/24/2012 10:27 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: recordNumber: Combine identifying fields in taxonoccurrence, plantobservation, and stemobservation to ensure that this field is unique within the plot and not NULL

6415 11/24/2012 10:23 AM Aaron Marcuse-Kubitza

lib/PostgreSQL-MySQL.csv: Only match a statement-terminating ; when it's at the end of a line

6414 11/24/2012 10:02 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_aggregate: Added primary key on institutionCode, plotName, scientificNameWithMorphospecies. Note that this makes these fields NOT NULL, which should not be a problem because there are inner joins instead of LEFT JOINs on the tables which provide them.

6413 11/24/2012 09:21 AM Aaron Marcuse-Kubitza

db_xml.py: put(): _setDefault(): Delay the evaluation of each col_default's value until the col_default is actually retrieved. This fixes a bug in the source table mappings where the explicit source entry was being created after the col_default source entry, causing the initial entry, which did not have the additional fields populated, to be used instead.

6412 11/24/2012 09:14 AM Aaron Marcuse-Kubitza

dicts.py: Added WrapDict, a dict that runs a function on each value retrieved
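Note on r6412: a minimal sketch of a dict that runs a function on each retrieved value, and of how it allows delayed evaluation as used in r6413. This is not the dicts.py implementation. The constructor signature, the make_default() helper, and the sample key are hypothetical.

```python
class WrapDict(dict):
    """Hypothetical sketch: a dict that applies wrap_func to each value
    retrieved with [] (dict.get() and .values() are not wrapped here)."""

    def __init__(self, wrap_func, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wrap_func = wrap_func

    def __getitem__(self, key):
        return self.wrap_func(super().__getitem__(key))


calls = []  # records when each default is actually evaluated


def make_default(name):
    """Return a zero-argument function that computes the default lazily."""
    def thunk():
        calls.append(name)
        return name.upper()
    return thunk


# Store unevaluated defaults; evaluate each one only when it is looked up.
defaults = WrapDict(lambda thunk: thunk(), {"source_id": make_default("ny")})
calls_before_lookup = list(calls)
value = defaults["source_id"]
```

Nothing is computed when the dict is built, so an entry that depends on later state is evaluated only at the point of retrieval.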

6411 11/24/2012 08:59 AM Aaron Marcuse-Kubitza

db_xml.py: put(): _setDefault(): Fixed bug where need to copy col_defaults before calling update() on it, to avoid modifying the input value (which may be reused by the caller, expecting it to be unmodified)

6410 11/24/2012 08:54 AM Aaron Marcuse-Kubitza

db_xml.py: put(): col_defaults param: Fixed bug where need to use None as default value, because col_defaults will be modified by put() and the {} default value is a global instance
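Note on r6410 and r6411: both bugs come from standard Python behavior and can be reproduced in a few lines. The function names and the row/column names below are hypothetical stand-ins, not the actual put() in db_xml.py.

```python
def put_buggy(row, col_defaults={}):
    # The {} default is created once, when the function is defined,
    # so every call that omits col_defaults shares the same dict.
    col_defaults.setdefault("source_id", row["source"])
    return col_defaults


def put_fixed(row, col_defaults=None):
    # Use None as the default and build a fresh dict per call (r6410);
    # copy a caller-supplied dict before modifying it (r6411).
    col_defaults = {} if col_defaults is None else dict(col_defaults)
    col_defaults.setdefault("source_id", row["source"])
    return col_defaults


buggy_first = put_buggy({"source": "NY"})
buggy_second = put_buggy({"source": "CVS"})  # still sees the entry left by "NY"

fixed_first = put_fixed({"source": "NY"})
fixed_second = put_fixed({"source": "CVS"})

caller_defaults = {"reference_id": 1}
fixed_third = put_fixed({"source": "FIA"}, caller_defaults)
```

In the buggy version the second call returns the value left over from the first. In the fixed version each call starts clean and caller_defaults is left unmodified.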

6409 11/24/2012 08:29 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: source table mappings: Set shortname to env var $source when it's not explicitly specified, because shortname is a required field of source

6408 11/24/2012 08:16 AM Aaron Marcuse-Kubitza

db_xml.py: put(): Pass through the values of nodes which are text nodes

6407 11/24/2012 08:15 AM Aaron Marcuse-Kubitza

db_xml.py: put(): put_(): Support setDefault() values which are text nodes, by passing text strings through when put() is run on all col_defaults entries

6406 11/24/2012 07:50 AM Aaron Marcuse-Kubitza

db_xml.py: put(): _setDefault(): Support setting multiple col_defaults at once by using the param names themselves as the column names

6405 11/24/2012 07:47 AM Aaron Marcuse-Kubitza

dicts.py: DictProxy: Implemented delitem()

6404 11/24/2012 07:32 AM Aaron Marcuse-Kubitza

bin/map: update_in_label(): Removed hardcoded source_id col_default, which is now set in mappings/VegCore-VegBIEN.csv's output root

6403 11/24/2012 07:29 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Set the source_id col_default to the datasource name using the new _setDefault() built-in function and _env()

6402 11/24/2012 07:25 AM Aaron Marcuse-Kubitza

db_xml.py: put(): Added _setDefault() built-in function, which adds an entry to col_defaults

6401 11/24/2012 07:23 AM Aaron Marcuse-Kubitza

xml_func.py: _env(): Fixed bug where need to retrieve actual string value of name param using xml_dom.NodeTextEntryIter instead of NodeEntryIter

6400 11/24/2012 07:20 AM Aaron Marcuse-Kubitza

xml_func.py: _env(): Fixed bug where need to use xml_dom.replace_with_text() instead of xml_dom.replace() because replace() requires a DOM node

6399 11/24/2012 06:44 AM Aaron Marcuse-Kubitza

bin/map: update_in_label(): Set $source env var to the in_label (datasource name), to make it available to _env()

6398 11/24/2012 06:43 AM Aaron Marcuse-Kubitza

xml_func.py: Simplifying functions: Added _env()

6397 11/24/2012 06:05 AM Aaron Marcuse-Kubitza

Added inputs/VegBank/Source/, containing referenceType metadata

6396 11/24/2012 06:00 AM Aaron Marcuse-Kubitza

Added inputs/SpeciesLink/Source/, containing referenceType metadata

6395 11/24/2012 05:55 AM Aaron Marcuse-Kubitza

Added inputs/SALVIAS*/Source/, containing referenceType metadata

6394 11/24/2012 05:47 AM Aaron Marcuse-Kubitza

Added inputs/REMIB/Source/, containing referenceType metadata