# Date Author Comment
4570 09/11/2012 02:09 AM Aaron Marcuse-Kubitza

mappings/: Removed no longer needed Veg+-VegCore.to_self.csv, because multiple levels of mappings are no longer needed to get to the VegCore term

4569 09/11/2012 02:07 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: DescriptionOfSite: Mapped directly to locality rather than to locationNarrative to avoid needing multiple levels of mappings to get to the VegCore term

4568 09/11/2012 01:56 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Removed scientificNameAuthorship/_alt and scientificNameAuthorship redirection mappings, which were only used by SpeciesLink, which now has the necessary _alts in its own map.csv

4567 09/11/2012 01:48 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Removed dateCollected/_alt and dateCollected redirection mappings, which were only needed when multiple dateCollected fields were being combined in Veg+-VegCore.csv

4566 09/11/2012 01:45 AM Aaron Marcuse-Kubitza

mappings/: Moved year/month/dayCollected mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayCollected values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateCollected field is now higher than the year/month/dayCollected fields when both are specified, because the dateCollected field presumably contains verbatim text while the year/month/dayCollected fields contain parsed date parts.

4565 09/11/2012 01:32 AM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV/Organism/map.csv: Remapped census_date to eventDate, since it is not the start of a range

4564 09/11/2012 01:31 AM Aaron Marcuse-Kubitza

inputs/Madidi/Plot/map.csv: Remapped First evaluation to eventDate, since it is not necessarily the start of a range

4563 09/11/2012 01:23 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: startDate, endDate mappings: Removed _dateRangeStart/_dateRangeEnd filters because these are assumed to already be start and end dates of a range. (eventDate should be used for concatenated date ranges.)

4562 09/11/2012 01:09 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Don't map dateCollected to locationevent.obsstartdate/obsenddate because this is the date the specimen was collected, not the date (range) of the entire collection event. This distinction may not be meaningful for specimens data, but VegBIEN should reflect what the data provider designated. This also reduces the number of dateCollected-related mappings needed for any dateCollected-related field, such as year/month/dayCollected.

4561 09/11/2012 12:55 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Removed dateIdentified/_alt and dateIdentified redirection mappings, which were only needed when multiple dateIdentified fields were being combined in Veg+-VegCore.csv

4560 09/11/2012 12:50 AM Aaron Marcuse-Kubitza

mappings/: Moved year/month/dayIdentified mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayIdentified values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateIdentified field is now higher than the year/month/dayIdentified fields when both are specified, because the dateIdentified field presumably contains verbatim text while the year/month/dayIdentified fields contain parsed date parts.

4559 09/11/2012 12:34 AM Aaron Marcuse-Kubitza

mappings/: Moved verbatimGrowthForm filter mapping from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic

4558 09/11/2012 12:28 AM Aaron Marcuse-Kubitza

inputs/UNCC/Specimen/map.csv, inputs/NCU-NCSC/Specimen/map.csv: Remapped cultivated fields directly via new cultivated term, rather than via establishmentMeans

4557 09/11/2012 12:06 AM Aaron Marcuse-Kubitza

sql_io.py: mk_errors_table(): Don't cache the sql.table_exists() query, because the table will be created and its existence must be rechecked

4556 09/11/2012 12:02 AM Aaron Marcuse-Kubitza

sql.py: table_exists(): Allow caller to set whether query will be cached. This is useful if the table will later be created and its existence should be checked again.
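
The caching behavior described in r4555-r4557 can be illustrated with a minimal sketch. This is not the code in sql.py: the class `FakeDb`, its methods, and the `cacheable` parameter name are hypothetical, and the database is simulated with an in-memory set. It only shows why an existence check that will be repeated after the table is created should not be served from a cache.

```python
class FakeDb:
    """Hypothetical stand-in for a DB connection with a query cache."""

    def __init__(self):
        self.existing_tables = set()  # simulated database state
        self._cache = {}              # query key -> cached result
        self.queries_run = 0          # number of queries actually executed

    def run_query(self, key, compute, cacheable=True):
        # Serve from the cache only when the caller allows it
        if cacheable and key in self._cache:
            return self._cache[key]
        self.queries_run += 1
        result = compute()
        if cacheable:
            self._cache[key] = result
        return result

    def table_exists(self, table, cacheable=True):
        return self.run_query(('table_exists', table),
                              lambda: table in self.existing_tables,
                              cacheable)

    def create_table(self, table):
        self.existing_tables.add(table)


db = FakeDb()
first = db.table_exists('errors')                    # False, and now cached
db.create_table('errors')
stale = db.table_exists('errors')                    # still False (stale cache)
fresh = db.table_exists('errors', cacheable=False)   # True (rechecked)
```

A caller that is about to create the table, as mk_errors_table() does, would pass the uncached form so that the later check sees the new table.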

4555 09/11/2012 12:00 AM Aaron Marcuse-Kubitza

sql.py: tables(): Allow caller to set whether query will be cached

4554 09/10/2012 11:51 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped cultivated

4553 09/10/2012 11:47 PM Aaron Marcuse-Kubitza

inputs/TEAM/: Added _src/README.TXT with Brad's comments on which files to use

4552 09/10/2012 11:01 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added cultivated

4551 09/10/2012 10:35 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Removed manual VACUUM run because this is done as part of $(exportHeader), which calls $(cleanup)

4550 09/10/2012 10:34 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(cleanup): Append output to log

4549 09/10/2012 10:21 PM Aaron Marcuse-Kubitza

schemas/py_functions.sql: Added pass-through _date(timestamp) for datasource date columns that are already timestamps

4548 09/10/2012 10:12 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Fixed bug where embedded \ in ADD COLUMN statement was not removed by the shell, because single quotes do not remove embedded \s
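
The quoting rule behind this fix can be checked with Python's `shlex` module, which follows POSIX shell quoting rules in its default mode. This is only an illustration of the rule, not the recipe from input.Makefile; the sample strings are made up.

```python
import shlex

# Outside quotes, a backslash escapes the next character and is removed
unquoted = shlex.split(r'ADD\ COLUMN')

# Inside single quotes, a backslash is an ordinary character and is kept
single_quoted = shlex.split(r"'ADD\ COLUMN'")

print(unquoted)       # the backslash is gone
print(single_quoted)  # the backslash is still present
```

So a backslash that is meant to be consumed by the shell has to sit outside single quotes, or be stripped some other way before the statement reaches the database.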

4547 09/10/2012 09:55 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: Also rename taxonobservation.reference_id to taxonobservation_reference_id

4546 09/10/2012 09:51 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(logInstall*Add): Fixed bug where the -a flag for tee needed to be added only when tee was actually being used (in verbose mode), not when &> is used instead

4545 09/10/2012 09:49 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/header.csv: Updated for new renames in vegbank.~.clean_up.sql

4544 09/10/2012 09:34 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Also log the output of commands run after create.sql

4543 09/10/2012 09:30 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Factored $(call logInstall,$*/) out into $(logInstall*)

4542 09/10/2012 09:25 PM Aaron Marcuse-Kubitza

schemas/py_functions.sql: Added pass-through _dateRangeStart(timestamp), _dateRangeEnd(timestamp) for datasource date columns that are already timestamps

4541 09/10/2012 09:23 PM Aaron Marcuse-Kubitza

inputs/VegBank/plantconcept_/header.csv: Updated for new renames in vegbank.~.clean_up.sql

4540 09/10/2012 09:11 PM Aaron Marcuse-Kubitza

inputs/VegBank/plantconcept_/create.sql: Use new plantconcept_plantnames()

4539 09/10/2012 09:09 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.utils.sql: plantconcept_plantnames(): Use SQL SELECT query and WITH clause (http://www.postgresql.org/docs/8.4/static/queries-with.html) instead of temp table, because PostgreSQL does not support using temp tables inside functions that are called repeatedly (http://archives.postgresql.org/pgsql-general/2006-02/msg00516.php; it results in an "out of shared memory" error)

4538 09/10/2012 08:30 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.utils.sql: Removed hardcoded schema name, which is set dynamically by input.Makefile using `SET search_path`

4537 09/10/2012 08:26 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.utils.sql: Added plantconcept_plantnames()

4536 09/10/2012 07:28 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.utils.sql: plantconcept_ancestors(): Made function STABLE instead of IMMUTABLE because it accesses DB tables

4535 09/10/2012 07:21 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: Fixed bug where the original plantconcept table's columns needed to be renamed, rather than the derived table plantconcept_'s. Note that this script runs before any derived tables are created, so this would be the wrong place for these statements if the derived table's columns did need to be renamed.

4534 09/10/2012 07:05 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(dbExports): Sort each group of .sql files in lexical order, since $(wildcard) apparently does not sort them that way automatically on vegbiendev

4533 09/10/2012 06:53 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import. Corrected input row count of CTFS.TaxonOccurrence, which had been set to the inserted row count (which is right above it in the log file).

4532 09/10/2012 06:35 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonrank: Added comment documenting source of values

4531 09/07/2012 04:57 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/map.csv: Mapped observation_id to eventID

4530 09/07/2012 04:49 PM Aaron Marcuse-Kubitza

inputs/TEAM/: Added VL

4529 09/07/2012 04:43 PM Aaron Marcuse-Kubitza

inputs/VegBank/: Added taxonobservation_/

4528 09/07/2012 04:43 PM Aaron Marcuse-Kubitza

inputs/VegBank/: Added plantconcept_/

4527 09/07/2012 04:22 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Ignore errors if create.sql already added a primary key

4526 09/07/2012 04:12 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Provide the table name as a var (:table) to the query

4525 09/07/2012 03:56 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: Prevent "column name specified more than once" errors when tables are joined

4524 09/07/2012 03:55 PM Aaron Marcuse-Kubitza

to_do/timeline.doc: Updated to reflect additional time that validations will take, and analytical DB's dependency on it

4523 09/07/2012 02:54 PM Aaron Marcuse-Kubitza

Added validation/

4522 09/07/2012 12:56 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Time the install

4521 09/07/2012 12:54 PM Aaron Marcuse-Kubitza

inputs/VegBank/: Added plantconcept_/

4520 09/07/2012 12:35 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.utils.sql: plantconcept_ancestors(): Renamed ancestor_id output param to plantconcept_id for clarity and so it can be directly USING-joined with plantconcept on plantconcept_id

4519 09/07/2012 12:24 PM Aaron Marcuse-Kubitza

inputs/VegBank/: Added vegbank.~.utils.sql (which runs after vegbank.sql), for use by tables' create.sql scripts

4518 09/07/2012 10:57 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

4517 09/07/2012 10:43 AM Aaron Marcuse-Kubitza

inputs/VegBank/: Added observation_/

4516 09/07/2012 10:31 AM Aaron Marcuse-Kubitza

inputs/VegBank/: Added vegbank.~.clean_up.sql (which runs after vegbank.sql), to prevent "cannot alter type of a column used by a view or rule" errors

4515 09/07/2012 10:14 AM Aaron Marcuse-Kubitza

inputs/VegBank/: Added plot_/

4514 09/07/2012 10:13 AM Aaron Marcuse-Kubitza

inputs/VegBank/: Added plot_/

4513 09/07/2012 10:13 AM Aaron Marcuse-Kubitza

inputs/VegBank/: Added logs

4512 09/07/2012 10:12 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Log the output to the install log, just like for other %/install targets

4511 09/07/2012 10:06 AM Aaron Marcuse-Kubitza

vegbien_dest: schemas: Added public explicitly, even though it's already in the default search_path, in order to shadow any datasource's tables of the same name as a VegBIEN table (such as in VegBank). (VegBIEN tables are referenced without a schema, while datasource tables are referenced with a schema, so collisions are not a problem after this fix.)

4510 09/07/2012 09:55 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: sql/install: Fixed bug where a space was needed before \ at end of line, because one is not automatically added in a recipe command (although it's added elsewhere)

4509 09/07/2012 09:51 AM Aaron Marcuse-Kubitza

sql.py: run_query(): DuplicateException: Also match "of relation" part of error message, so that parsed column name does not contain "of relation"
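
A sketch of the parsing issue, assuming the PostgreSQL message has the form `column "..." of relation "..." already exists`. The regular expression and the function name `parse_duplicate_column()` are hypothetical and are not taken from sql.py. Matching the "of relation" part explicitly keeps it out of the captured column name.

```python
import re

# Assumed message format; the pattern actually used in sql.py is not shown
DUPLICATE_COLUMN_RE = re.compile(
    r'column "(?P<col>.+?)" of relation "(?P<table>.+?)" already exists')


def parse_duplicate_column(msg):
    """Return (column, table) from a duplicate-column error, else None."""
    match = DUPLICATE_COLUMN_RE.search(msg)
    if match is None:
        return None
    return match.group('col'), match.group('table')


parsed = parse_duplicate_column(
    'ERROR:  column "row_num" of relation "plot_" already exists')
```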

4508 09/07/2012 09:24 AM Aaron Marcuse-Kubitza

subtract: Made it case- and punctuation-insensitive

4507 09/07/2012 09:18 AM Aaron Marcuse-Kubitza

mappings/: Removed no longer needed Veg+.cs-VegBIEN.csv, which is now the same as Veg+-VegBIEN.csv, which was derived from it

4506 09/07/2012 09:16 AM Aaron Marcuse-Kubitza

join: Documented that it's case- and punctuation-insensitive.

4505 09/07/2012 09:16 AM Aaron Marcuse-Kubitza

bin/map: map_table(): Refactored to map simplified to original column names first and then determine column index for each original name, in order to avoid trying to recover the original name from a simplified name where multiple original names might collide onto the same simplified name. Documented that it's case- and punctuation-insensitive.
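
The lookup order described here can be sketched as follows. The body of `simplify()` is an assumption based on r4497-r4499 (lowercase, non-alphanumeric chars removed), and `lookup()` is a hypothetical helper, not the code in bin/map. Keeping a simplified-to-original map means that the original names are never reconstructed from a simplified name, even when several originals collide on it.

```python
import re


def simplify(name):
    """Assumed behavior: lowercase and drop non-alphanumeric characters."""
    return re.sub(r'[^0-9a-z]', '', name.lower())


def lookup(header, term):
    """Return [(original name, column index)] for header names matching term."""
    # Step 1: map each simplified name to all original names that produce it
    originals_by_simple = {}
    for orig in header:
        originals_by_simple.setdefault(simplify(orig), []).append(orig)
    # Step 2: determine the column index of each original name
    index_of = {orig: i for i, orig in enumerate(header)}
    return [(orig, index_of[orig])
            for orig in originals_by_simple.get(simplify(term), [])]


header = ['Plot ID', 'plot_id', 'Date Collected']
colliding = lookup(header, 'plotid')
single = lookup(header, 'DATE-COLLECTED')
```

Here 'Plot ID' and 'plot_id' both simplify to the same key, and both are returned with their own column indexes.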

4504 09/07/2012 09:11 AM Aaron Marcuse-Kubitza

intersect, union: Made case- and punctuation-insensitive. mappings/Veg+-VegBIEN.csv: Removed no longer needed duplicate entries for each first letter case, which must now be removed for case- and punctuation-insensitive intersect/union to work. Note that the SpeciesLink `svn diff` hides _alt entry 0, which contains one of the removed duplicate columns that appears in the diff.

4503 09/07/2012 08:42 AM Aaron Marcuse-Kubitza

bin/map: map_table(): Resolve all mappings and prefixes after applying maps.simplify()

4502 09/07/2012 08:37 AM Aaron Marcuse-Kubitza

inputs/SpeciesLink/Specimen/map.csv: _alt all scientificNameAuthorship synonyms together in one _alt

4501 09/07/2012 08:27 AM Aaron Marcuse-Kubitza

schemas/functions.sql: _alt(): Added extra numbered parameters. Eventually these will need to be converted to variadic args, but this will require special support from column-based import.

4500 09/07/2012 07:26 AM Aaron Marcuse-Kubitza

join: Use new maps.simplify()

4499 09/07/2012 07:26 AM Aaron Marcuse-Kubitza

maps.py: Added simplify()

4498 09/07/2012 07:23 AM Aaron Marcuse-Kubitza

join: Match terms with non-alphanumeric chars removed

4497 09/07/2012 07:15 AM Aaron Marcuse-Kubitza

join: Match terms case-insensitively

4496 09/06/2012 11:17 PM Aaron Marcuse-Kubitza

Added inputs/TEAM/

4495 09/06/2012 10:55 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Creating the into table: into_out_pkey: If is_function, just use "result" as the output column name, without prefixing the function name. This shortens the table names of function calls on function calls, which need a fixed column name to detect which columns are function results and use just the table names for those columns.

4494 09/06/2012 10:32 PM Aaron Marcuse-Kubitza

input.Makefile: Documentation: $(steps): Fixed bug where import make target needed to be changed to new single-table import target

4493 09/06/2012 09:38 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Changed LEFT JOINs to JOINs where tables contain information that's required for the analytical DB. This should also enable the PostgreSQL query planner to make additional join optimizations, in the hopes of avoiding disk-space-intensive hash joins.

4492 09/06/2012 08:42 PM Aaron Marcuse-Kubitza

Replaced repr() with strings.urepr() (or equivalent) everywhere needed, to avoid future UnicodeEncodeErrors

4491 09/06/2012 08:30 PM Aaron Marcuse-Kubitza

Replaced str() with strings.ustr() (or equivalent) everywhere needed, to avoid future UnicodeEncodeErrors

4490 09/06/2012 08:03 PM Aaron Marcuse-Kubitza

sql.py: map_expr(): Replacing without quotes: Don't match unquoted name where it's preceded or followed by '.', because this could be a '.' embedded in a punctuation-containing column name, such as those frequently used by column-based import. Note that because database-internal names currently do not contain punctuation, this situation only occurs when a database-internal expression (such as a check constraint condition) is replaced in two steps, and the first step introduces punctuation-containing column names into the expression.
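
A minimal sketch of the matching rule, assuming a regex-based replacement. The function `replace_unquoted()` is hypothetical and is not the implementation of map_expr(). The lookbehind and lookahead reject a name that is adjacent to a word character or to '.', so a name embedded in a punctuation-containing column name is left alone on the second replacement step.

```python
import re


def replace_unquoted(expr, name, repl):
    """Replace name in expr unless it touches a word character or '.'."""
    pattern = r'(?<![\w.])' + re.escape(name) + r'(?![\w.])'
    # A function is used as the replacement so that repl is inserted literally
    return re.sub(pattern, lambda match: repl, expr)


# Step 1 introduces a punctuation-containing column name
step1 = replace_unquoted('(size > 0)', 'size', '"Plot.size"')
# Step 2 must not match the 'size' that now follows a '.'
step2 = replace_unquoted(step1, 'size', '"other"')
```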

4489 09/06/2012 07:19 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: project: Don't require projectname to be specified when sourceaccessioncode is provided

4488 09/06/2012 07:14 PM Aaron Marcuse-Kubitza

sql_gen.py: ensure_not_null(): If type_ is set, cast the column to it if needed

4487 09/06/2012 06:56 PM Aaron Marcuse-Kubitza

README.TXT: Data import: Added testing steps to perform on local machine before running the import

4486 09/06/2012 06:49 PM Aaron Marcuse-Kubitza

README.TXT: Documentation: Redmine-formatted list of steps for column-based import: Updated make command for new table subdir name

4485 09/06/2012 06:27 PM Aaron Marcuse-Kubitza

sql.py: run_query(): Parse "types cannot be matched" error as MissingCastException to type text

4484 09/06/2012 06:10 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Creating the into table: Fixed bug where in_pkey and out_pkey names would collide if the output and input pkeys have the same name (as is the case for SALVIAS.projects). This entails changing out_pkey to new into_out_pkey wherever the into table's out_pkey is created or referenced.

4483 09/06/2012 05:06 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Combining output and input pkeys in inserted order: Changed sql_gen.Table to sql_gen.Col when creating the column references (they have a similar effect, so using the wrong type did not cause any tests to fail)

4482 09/06/2012 04:49 PM Aaron Marcuse-Kubitza

README.TXT: Added steps before the import to `svn up` and update the schemas

4481 09/06/2012 04:47 PM Aaron Marcuse-Kubitza

README.TXT: Merged Backups > After a new import and Data import sections into one Data import section that contains the steps to perform and back up an import. Note that many `svn diff` lines result from a change in indentation.

4480 09/06/2012 04:35 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Combining output and input pkeys in inserted order: Fixed bug where column references would be ambiguous if the output and input pkeys have the same name (as is the case for SALVIAS.projects)

4479 09/06/2012 04:21 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added _nullIf() overload where the type param has type text, to handle cases where row-based import auto-casts all args to text in response to a 'could not determine polymorphic type because input has type "unknown"' error

4478 09/06/2012 04:18 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: party: Removed party_datasource unique index because it was causing problems with column-based import (due to multiple unique indexes covering the same columns in different ways), and because it prevented creation of more than one party per organization

4477 09/06/2012 03:54 PM Aaron Marcuse-Kubitza

xml_func.py: _if(): Documented that it must be run to remove conditions that functions._if() can't handle

4476 09/06/2012 03:42 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Testing: Added step to test column-based import (by_col=1), because it is stricter about types than row-based import and sometimes fails when row-based import succeeds

4475 09/05/2012 09:18 AM Aaron Marcuse-Kubitza

schemas/functions.sql: _nullIf(): Polymorphically support other datatypes besides text

4474 09/05/2012 09:09 AM Aaron Marcuse-Kubitza

bin/map: Clearing errors table: Fixed bug where it was necessary to check whether sql_io.errors_table() returned None (indicating that the errors table didn't exist) before calling sql.drop_table()

4473 09/05/2012 09:04 AM Aaron Marcuse-Kubitza

bin/map: Clearing errors table: Fixed bug where sql.drop_table() needed to be used instead of sql.truncate() now that errors tables are not created until column-based import runs

4472 09/05/2012 08:54 AM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: $(missingMappingsCmd): Fixed bug where the system's sort needed to be used instead of bin/sort, now that bin/ is added to the PATH by this makefile

4471 09/05/2012 08:34 AM Aaron Marcuse-Kubitza

inputs/SALVIAS/verify/plots.ref: Regenerated on PostgreSQL staging tables. The orders have changed slightly because this is derived from a PostgreSQL translation of the queries, with corresponding changes in collations and NULL sort orders. The counts have also changed slightly, possibly due to the changes Brad made to the salvias_plots database on nimoy after the initial version was downloaded. (The current counts are correct according to the current salvias_plots database.)