/ - Changes - BIEN 3 - NCEAS Projects

root @ 4587

#	Date	Author	Comment
4587	09/11/2012 06:43 AM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: %/.map.csv.last_cleanup: Canonicalize map.csv using $(mappings)/$(via).vocab.csv
4586	09/11/2012 06:40 AM	Aaron Marcuse-Kubitza	Added canon
4585	09/11/2012 06:29 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.
4584	09/11/2012 05:49 AM	Aaron Marcuse-Kubitza	Added mappings/Veg+.vocab.csv
4583	09/11/2012 04:41 AM	Aaron Marcuse-Kubitza	inputs/GBIF/Specimen/map.csv: Remapped Original fields to new verbatim taxonomic terms
4582	09/11/2012 04:31 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.
4581	09/11/2012 04:23 AM	Aaron Marcuse-Kubitza	mappings/Veg+.terms.csv: Added min/max SlopeAspect/SlopeGradient
4580	09/11/2012 04:13 AM	Aaron Marcuse-Kubitza	inputs/VegBank/plot_/map.csv: Omit reallatitude/reallongitude because private data should not be placed in a public database
4579	09/11/2012 04:10 AM	Aaron Marcuse-Kubitza	inputs/CVS/Organism/map.csv: Omit realLatitude/realLongitude because private data should not be placed in a public database. Keeping VegBIEN free of restricted-access data allows anyone to run arbitrary queries on the database, without needing an entire security mechanism/front end just to manage users' read-only access to the data (as VegBank has). Note that the private coordinates are still accessible in the staging tables, so they will need to be locked down in order to make VegBIEN secure to public access.
4578	09/11/2012 03:16 AM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Remapped QuadratID to subplotID because the standard definition of an ID term is an ID that's unique within the datasource, and it's just CTFS's usage that makes it unique only within the plot
4577	09/11/2012 03:13 AM	Aaron Marcuse-Kubitza	inputs/CTFS/StemObservation/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID
4576	09/11/2012 03:09 AM	Aaron Marcuse-Kubitza	inputs/CTFS/SubplotObservation/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID
4575	09/11/2012 03:06 AM	Aaron Marcuse-Kubitza	inputs/CTFS/Subplot/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID. Omit QuadratName because QuadratID is used for the same purpose.
4574	09/11/2012 02:57 AM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Removed recordNumber/_alt and recordNumber redirection mappings so that Veg+-VegCore.csv contains only renamings, not business logic. Note that removing the global ordering of these fields does not affect the datasources which contain multiple recordNumber synonyms because they either have a custom ordering or one field is duplicated or unused.
4573	09/11/2012 02:49 AM	Aaron Marcuse-Kubitza	inputs/NY/Specimen/map.csv: Omit CollectorNumber because it is not used, so it does not need to be mapped
4572	09/11/2012 02:45 AM	Aaron Marcuse-Kubitza	inputs/ARIZ/Specimen/map.csv: Omit FieldNumber because it is identical to CollectorNumber, so it does not need to be mapped
4571	09/11/2012 02:19 AM	Aaron Marcuse-Kubitza	inputs/SpeciesLink/Specimen/map.csv: Added manual CollectorNumber mapping which places it after recordNumber/fieldNumber, so that mappings/Veg+-VegCore.csv doesn't need to maintain a global ordering between these fields and just needs to indicate their equivalency
4570	09/11/2012 02:09 AM	Aaron Marcuse-Kubitza	mappings/: Removed no longer needed Veg+-VegCore.to_self.csv, because multiple levels of mappings are no longer needed to get to the VegCore term
4569	09/11/2012 02:07 AM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: DescriptionOfSite: Mapped directly to locality rather than to locationNarrative to avoid needing multiple levels of mappings to get to the VegCore term
4568	09/11/2012 01:56 AM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Removed scientificNameAuthorship/_alt and scientificNameAuthorship redirection mappings, which were only used by SpeciesLink but it now has the necessary _alts in its own map.csv
4567	09/11/2012 01:48 AM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Removed dateCollected/_alt and dateCollected redirection mappings, which were only needed when multiple dateCollected fields were being combined in Veg+-VegCore.csv
4566	09/11/2012 01:45 AM	Aaron Marcuse-Kubitza	mappings/: Moved year/month/dayCollected mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayCollected values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateCollected field is now higher than the year/month/dayCollected fields when both are specified, because the dateCollected field presumably contains verbatim text while the year/month/dayCollected fields contain parsed date parts.
4565	09/11/2012 01:32 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS-CSV/Organism/map.csv: Remapped census_date to eventDate, since it is not the start of a range
4564	09/11/2012 01:31 AM	Aaron Marcuse-Kubitza	inputs/Madidi/Plot/map.csv: Remapped First evaluation to eventDate, since it is not necessarily the start of a range
4563	09/11/2012 01:23 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: startDate, endDate mappings: Removed _dateRangeStart/_dateRangeEnd filters because these are assumed to already be start and end dates of a range. (eventDate should be used for concatenated date ranges.)
4562	09/11/2012 01:09 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Don't map dateCollected to locationevent.obsstartdate/obsenddate because this is the date the specimen was collected, not the date (range) of the entire collection event. This distinction may not be meaningful for specimen data, but VegBIEN should reflect what the data provider designated. This also reduces the number of dateCollected-related mappings needed for any dateCollected-related field, such as year/month/dayCollected.
4561	09/11/2012 12:55 AM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Removed dateIdentified/_alt and dateIdentified redirection mappings, which were only needed when multiple dateIdentified fields were being combined in Veg+-VegCore.csv
4560	09/11/2012 12:50 AM	Aaron Marcuse-Kubitza	mappings/: Moved year/month/dayIdentified mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayIdentified values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateIdentified field is now higher than the year/month/dayIdentified fields when both are specified, because the dateIdentified field presumably contains verbatim text while the year/month/dayIdentified fields contain parsed date parts.
4559	09/11/2012 12:34 AM	Aaron Marcuse-Kubitza	mappings/: Moved verbatimGrowthForm filter mapping from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic
4558	09/11/2012 12:28 AM	Aaron Marcuse-Kubitza	inputs/UNCC/Specimen/map.csv, inputs/NCU-NCSC/Specimen/map.csv: Remapped cultivated fields directly via new cultivated term, rather than via establishmentMeans
4557	09/11/2012 12:06 AM	Aaron Marcuse-Kubitza	sql_io.py: mk_errors_table(): Don't cache the sql.table_exists() query, because the table will be created and its existence must be rechecked
4556	09/11/2012 12:02 AM	Aaron Marcuse-Kubitza	sql.py: table_exists(): Allow caller to set whether query will be cached. This is useful if the table will later be created and its existence should be checked again.
4555	09/11/2012 12:00 AM	Aaron Marcuse-Kubitza	sql.py: tables(): Allow caller to set whether query will be cached
4554	09/10/2012 11:51 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped cultivated
4553	09/10/2012 11:47 PM	Aaron Marcuse-Kubitza	inputs/TEAM/: Added _src/README.TXT with Brad's comments on which files to use
4552	09/10/2012 11:01 PM	Aaron Marcuse-Kubitza	mappings/Veg+.terms.csv: Added cultivated
4551	09/10/2012 10:35 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: `%/install: %/create.sql`: Removed manual VACUUM run because this is done as part of $(exportHeader), which calls $(cleanup)
4550	09/10/2012 10:34 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: $(cleanup): Append output to log
4549	09/10/2012 10:21 PM	Aaron Marcuse-Kubitza	schemas/py_functions.sql: Added pass-through _date(timestamp) for datasource date columns that are already timestamps
4548	09/10/2012 10:12 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: `%/install: %/create.sql`: Fixed bug where embedded \ in ADD COLUMN statement was not removed by the shell, because single quotes do not remove embedded \s
4547	09/10/2012 09:55 PM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.clean_up.sql: Also rename taxonobservation.reference_id to taxonobservation_reference_id
4546	09/10/2012 09:51 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: $(logInstall*Add): Fixed bug where the -a flag for tee needed to be added only when tee was actually being used (in verbose mode), not when &> is used instead
4545	09/10/2012 09:49 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxonobservation_/header.csv: Updated for new renames in vegbank.~.clean_up.sql
4544	09/10/2012 09:34 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: `%/install: %/create.sql`: Also log the output of commands run after create.sql
4543	09/10/2012 09:30 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: Factored $(call logInstall,$/) out into $(logInstall)
4542	09/10/2012 09:25 PM	Aaron Marcuse-Kubitza	schemas/py_functions.sql: Added pass-through _dateRangeStart(timestamp), _dateRangeEnd(timestamp) for datasource date columns that are already timestamps
4541	09/10/2012 09:23 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plantconcept_/header.csv: Updated for new renames in vegbank.~.clean_up.sql
4540	09/10/2012 09:11 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plantconcept_/create.sql: Use new plantconcept_plantnames()
4539	09/10/2012 09:09 PM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.utils.sql: plantconcept_plantnames(): Use SQL SELECT query and WITH clause (http://www.postgresql.org/docs/8.4/static/queries-with.html) instead of temp table, because PostgreSQL does not support using temp tables inside functions that are called repeatedly (http://archives.postgresql.org/pgsql-general/2006-02/msg00516.php; it results in an "out of shared memory" error)
4538	09/10/2012 08:30 PM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.utils.sql: Removed hardcoded schema name, which is set dynamically by input.Makefile using `SET search_path`
4537	09/10/2012 08:26 PM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.utils.sql: Added plantconcept_plantnames()
4536	09/10/2012 07:28 PM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.utils.sql: plantconcept_ancestors(): Made function STABLE instead of IMMUTABLE because it accesses DB tables
4535	09/10/2012 07:21 PM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.clean_up.sql: Fixed bug where the original plantconcept table's columns needed to be renamed, rather than the derived table plantconcept_'s. Note that this script runs before any derived tables are created, so this would be the wrong place for these statements if the derived table's columns did need to be renamed.
4534	09/10/2012 07:05 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: $(dbExports): Sort each group of .sql files in lexical order, since $(wildcard) apparently does not sort them that way automatically on vegbiendev
4533	09/10/2012 06:53 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated with stats from latest import. Corrected input row count of CTFS.TaxonOccurrence, which had been set to the inserted row count (which is right above it in the log file).
4532	09/10/2012 06:35 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonrank: Added comment documenting source of values
4531	09/07/2012 04:57 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxonobservation_/map.csv: Mapped observation_id to eventID
4530	09/07/2012 04:49 PM	Aaron Marcuse-Kubitza	inputs/TEAM/: Added VL
4529	09/07/2012 04:43 PM	Aaron Marcuse-Kubitza	inputs/VegBank/: Added taxonobservation_/
4528	09/07/2012 04:43 PM	Aaron Marcuse-Kubitza	inputs/VegBank/: Added plantconcept_/
4527	09/07/2012 04:22 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: `%/install: %/create.sql`: Ignore errors if create.sql already added a primary key
4526	09/07/2012 04:12 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: `%/install: %/create.sql`: Provide the table name as a var (:table) to the query
4525	09/07/2012 03:56 PM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.clean_up.sql: Prevent "column name specified more than once" errors when tables are joined
4524	09/07/2012 03:55 PM	Aaron Marcuse-Kubitza	to_do/timeline.doc: Updated to reflect additional time that validations will take, and analytical DB's dependency on it
4523	09/07/2012 02:54 PM	Aaron Marcuse-Kubitza	Added validation/
4522	09/07/2012 12:56 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: `%/install: %/create.sql`: Time the install
4521	09/07/2012 12:54 PM	Aaron Marcuse-Kubitza	inputs/VegBank/: Added plantconcept_/
4520	09/07/2012 12:35 PM	Aaron Marcuse-Kubitza	inputs/VegBank/vegbank.~.utils.sql: plantconcept_ancestors(): Renamed ancestor_id output param to plantconcept_id for clarity and so it can be directly USING-joined with plantconcept on plantconcept_id
4519	09/07/2012 12:24 PM	Aaron Marcuse-Kubitza	inputs/VegBank/: Added vegbank.~.utils.sql (which runs after vegbank.sql), for use by tables' create.sql scripts
4518	09/07/2012 10:57 AM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated with stats from latest import
4517	09/07/2012 10:43 AM	Aaron Marcuse-Kubitza	inputs/VegBank/: Added observation_/
4516	09/07/2012 10:31 AM	Aaron Marcuse-Kubitza	inputs/VegBank/: Added vegbank.~.clean_up.sql (which runs after vegbank.sql), to prevent "cannot alter type of a column used by a view or rule" errors
4515	09/07/2012 10:14 AM	Aaron Marcuse-Kubitza	inputs/VegBank/: Added plot_/
4514	09/07/2012 10:13 AM	Aaron Marcuse-Kubitza	inputs/VegBank/: Added plot_/
4513	09/07/2012 10:13 AM	Aaron Marcuse-Kubitza	inputs/VegBank/: Added logs
4512	09/07/2012 10:12 AM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: `%/install: %/create.sql`: Log the output to the install log, just like for other %/install targets
4511	09/07/2012 10:06 AM	Aaron Marcuse-Kubitza	vegbien_dest: schemas: Added public explicitly, even though it's already in the default search_path, in order to shadow any datasource's tables of the same name as a VegBIEN table (such as in VegBank). (VegBIEN tables are referenced without a schema, while datasource tables are referenced with a schema, so collisions are not a problem after this fix.)
4510	09/07/2012 09:55 AM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: sql/install: Fixed bug where a space was needed before \ at end of line, because one is not automatically added in a recipe command (although it's added elsewhere)
4509	09/07/2012 09:51 AM	Aaron Marcuse-Kubitza	sql.py: run_query(): DuplicateException: Also match "of relation" part of error message, so that parsed column name does not contain "of relation"
4508	09/07/2012 09:24 AM	Aaron Marcuse-Kubitza	subtract: Made it case- and punctuation-insensitive
4507	09/07/2012 09:18 AM	Aaron Marcuse-Kubitza	mappings/: Removed no longer needed Veg+.cs-VegBIEN.csv, which is now the same as Veg+-VegBIEN.csv which was derived from it
4506	09/07/2012 09:16 AM	Aaron Marcuse-Kubitza	join: Documented that it's case- and punctuation-insensitive.
4505	09/07/2012 09:16 AM	Aaron Marcuse-Kubitza	bin/map: map_table(): Refactored to map simplified to original column names first and then determine column index for each original name, in order to avoid trying to recover the original name from a simplified name where multiple original names might collide onto the same simplified name. Documented that it's case- and punctuation-insensitive.
4504	09/07/2012 09:11 AM	Aaron Marcuse-Kubitza	intersect, union: Made case- and punctuation-insensitive. mappings/Veg+-VegBIEN.csv: Removed no longer needed duplicate entries for each first letter case, which must now be removed for case- and punctuation-insensitive intersect/union to work. Note that the SpeciesLink `svn diff` hides _alt entry 0, which contains one of the removed duplicate columns that appears in the diff.
4503	09/07/2012 08:42 AM	Aaron Marcuse-Kubitza	bin/map: map_table(): Resolve all mappings and prefixes after applying maps.simplify()
4502	09/07/2012 08:37 AM	Aaron Marcuse-Kubitza	inputs/SpeciesLink/Specimen/map.csv: _alt all scientificNameAuthorship synonyms together in one _alt
4501	09/07/2012 08:27 AM	Aaron Marcuse-Kubitza	schemas/functions.sql: _alt(): Added extra numbered parameters. Eventually these will need to be converted to variadic args, but this will require special support from column-based import.
4500	09/07/2012 07:26 AM	Aaron Marcuse-Kubitza	join: Use new maps.simplify()
4499	09/07/2012 07:26 AM	Aaron Marcuse-Kubitza	maps.py: Added simplify()
4498	09/07/2012 07:23 AM	Aaron Marcuse-Kubitza	join: Match terms with non-alphanumeric chars removed
4497	09/07/2012 07:15 AM	Aaron Marcuse-Kubitza	join: Match terms case-insensitively
4496	09/06/2012 11:17 PM	Aaron Marcuse-Kubitza	Added inputs/TEAM/
4495	09/06/2012 10:55 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Creating the into table: into_out_pkey: If is_function, just use "result" as the output column name, without prefixing the function name. This shortens the table names of function calls on function calls, which need a fixed column name to detect which columns are function results and use just the table names for those columns.
4494	09/06/2012 10:32 PM	Aaron Marcuse-Kubitza	input.Makefile: Documentation: $(steps): Fixed bug where import make target needed to be changed to new single-table import target
4493	09/06/2012 09:38 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_db_view: Changed LEFT JOINs to JOINs where tables contain information that's required for the analytical DB. This should also enable the PostgreSQL query planner to make additional join optimizations, in the hopes of avoiding disk-space-intensive hash joins.
4492	09/06/2012 08:42 PM	Aaron Marcuse-Kubitza	Replaced repr() with strings.urepr() (or equivalent) everywhere needed, to avoid future UnicodeEncodeErrors
4491	09/06/2012 08:30 PM	Aaron Marcuse-Kubitza	Replaced str() with strings.ustr() (or equivalent) everywhere needed, to avoid future UnicodeEncodeErrors
4490	09/06/2012 08:03 PM	Aaron Marcuse-Kubitza	sql.py: map_expr(): Replacing without quotes: Don't match unquoted name where it's preceded or followed by '.', because this could be a '.' embedded in a punctuation-containing column name, such as those frequently used by column-based import. Note that because database-internal names currently do not contain punctuation, this situation only occurs when a database-internal expression (such as a check constraint condition) is replaced in two steps, and the first step introduces punctuation-containing column names into the expression.
4489	09/06/2012 07:19 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: project: Don't require projectname to be specified when sourceaccessioncode is provided
4488	09/06/2012 07:14 PM	Aaron Marcuse-Kubitza	sql_gen.py: ensure_not_null(): If type_ is set, cast the column to it if needed
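
Revisions 4497 through 4505 describe matching terms case- and punctuation-insensitively via maps.simplify(), and r4505 notes that simplified names are mapped back to original column names first, since several originals might collide onto one simplified name. The following is a minimal Python sketch of that idea under stated assumptions, not the project's actual code: the body of simplify() is a guess at the behavior the log describes, the helpers index_by_simplified() and resolve() are hypothetical names introduced here, and only ASCII letters and digits are kept.

```python
import re


def simplify(term):
    """Lowercase a term and drop every character that is not an ASCII
    letter or digit. Assumed behavior; the real maps.simplify() may differ."""
    return re.sub(r'[^a-z0-9]', '', term.lower())


def index_by_simplified(names):
    """Group original column names by their simplified form, so that
    collisions between originals are visible instead of silently merged."""
    index = {}
    for name in names:
        index.setdefault(simplify(name), []).append(name)
    return index


def resolve(term, names):
    """Return the original column name matching `term` case- and
    punctuation-insensitively, or None if there is no match.
    Raises ValueError when several original names collide."""
    matches = index_by_simplified(names).get(simplify(term), [])
    if len(matches) > 1:
        raise ValueError('ambiguous term %r matches %r' % (term, matches))
    return matches[0] if matches else None


if __name__ == '__main__':
    columns = ['CollectorNumber', 'recordNumber']
    print(resolve('collector_number', columns))
```

Looking up the original name through an index built from the originals, rather than trying to reverse a simplified name, is what lets a collision be reported as an error in this sketch.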