/ - Changes - BIEN 3 - NCEAS Projects

root @ 3777

#	Date	Author	Comment
3777	08/03/2012 04:59 PM	Aaron Marcuse-Kubitza	join: Fixed bug where, when an input is mapped to multiple outputs, the joined row for each output needed to be output separately using writer.writerow()
3776	08/03/2012 03:52 PM	Aaron Marcuse-Kubitza	sort_map: Remove duplicates resulting from multiple outputs for the same input. mappings/Makefile: $(mkSelfMap): Removed uniq now that sort_map does this.
3775	08/03/2012 03:24 PM	Aaron Marcuse-Kubitza	mappings/Makefile: $(mkSelfMap): Run uniq on the output to remove duplicates resulting from multiple outputs for the same input
3774	08/03/2012 03:10 PM	Aaron Marcuse-Kubitza	expand_braces: Also expand XPaths containing [], with up to one level of nesting (which is the most we currently use), because many {} XPaths do in fact contain []. Debug-print intermediate values when env var expand_braces_debug is true. Added usage message.
3773	08/02/2012 11:13 PM	Aaron Marcuse-Kubitza	expand_braces: Fixed bug where ./{ and brackets with commas inside {} are unparseable, and should not be expanded
3772	08/02/2012 11:05 PM	Aaron Marcuse-Kubitza	expand_braces: Fixed bug where `head -1` seemed to read more lines than just the first, causing EOF to be returned after the first line, by using `read` instead. Support data containing \r (such as Excel-dialect CSVs) by removing it. Fixed bug where ./{...} was not being properly escaped.
3771	08/02/2012 10:08 PM	Aaron Marcuse-Kubitza	Added expand_braces
3770	08/02/2012 09:12 PM	Aaron Marcuse-Kubitza	mappings: location: Removed centerlatitude/centerlongitude mappings because the lat/long should be in only one place: the locationdetermination. It is up to the database querier to decide which locationdetermination(s) to use as the coordinates for a plot/specimen.
3769	08/02/2012 08:54 PM	Aaron Marcuse-Kubitza	bin/map: input is CSV: Removed unused map_ var
3768	08/02/2012 08:50 PM	Aaron Marcuse-Kubitza	bin/map: Documented that it's multi-safe (supports an input appearing multiple times)
3767	08/02/2012 08:39 PM	Aaron Marcuse-Kubitza	subtract: Documented that it's multi-safe (supports an input appearing multiple times)
3766	08/02/2012 08:32 PM	Aaron Marcuse-Kubitza	join: Made it multi-safe (supports an input appearing multiple times)
3765	08/02/2012 08:30 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: Added empty clean target to make sure `make clean` always works
3764	08/02/2012 08:03 PM	Aaron Marcuse-Kubitza	root Makefile, input.Makefile: Maps validation: Treat missing join mappings differently from missing non-empty join mappings, because they indicate mapping to an invalid location, which is a bug. Factored maps validation code out into new lib/mappings.Makefile.
3763	08/02/2012 08:00 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: Added vars for chars not allowed in make targets. Added functions/vars to replace "_" with " ".
3762	08/02/2012 07:38 PM	Aaron Marcuse-Kubitza	root Makefile: Include lib/common.Makefile
3761	08/02/2012 07:37 PM	Aaron Marcuse-Kubitza	input.Makefile: Include lib/common.Makefile
3760	08/02/2012 06:48 PM	Aaron Marcuse-Kubitza	intersect: Documented that it's multi-safe (supports an input appearing multiple times)
3759	08/02/2012 06:42 PM	Aaron Marcuse-Kubitza	union: Documented that it's multi-safe (supports an input appearing multiple times)
3758	08/02/2012 06:00 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Moved shared /specimenreplicate root to mappings in preparation for reversing the XPaths so that parent table paths (such as location) don't contain a prefix for child tables (specimenreplicate, locationevent, etc.). This reversing will avoid the need to "ch_root" the child table map to obtain maps for parent tables with the prefixes removed, allowing all hierarchical levels to use the same map spreadsheet.
3757	08/02/2012 05:53 PM	Aaron Marcuse-Kubitza	ch_root: Support column headers without a root, for non-hierarchical formats such as DwC
3756	08/02/2012 05:45 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: rsync: Time the rsync operation
3755	08/02/2012 05:29 PM	Aaron Marcuse-Kubitza	in_place: Wrap EXIT handler in shell function so that "-escaping can easily be used on the temp file path
3754	08/02/2012 05:26 PM	Aaron Marcuse-Kubitza	in_place: Documented that it doesn't update the file on error
3753	08/02/2012 05:23 PM	Aaron Marcuse-Kubitza	DwC mappings: Removed ':/list/' root (full version: '::[@xmlns:dcterms=http://purl.org/dc/terms/]/list/') from map spreadsheets to simplify the boilerplate in each file. Since intermediate DwC XML files no longer need to be produced for automated tests, these roots are not needed.
3752	08/02/2012 04:46 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated with stats from latest import
3751	08/02/2012 04:40 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Moved independent-import data to separate tab so that it wouldn't get moved to the side whenever a new column of simultaneous-import data is inserted. It is also no longer updated, because all column-based imports are now done simultaneously.
3750	08/02/2012 04:32 PM	Aaron Marcuse-Kubitza	Use strings.ustr() or strings.urepr() everywhere that columns are stringified, in order to support column names with non-ASCII characters (such as in the Madidi data)
3749	08/02/2012 04:16 PM	Aaron Marcuse-Kubitza	strings.py: concat(): Convert args to raw (non-Unicode) strings first, so that multi-byte Unicode sequences are considered by # of bytes instead of # of chars. This is necessary because PostgreSQL truncates identifiers by # of bytes instead of # of chars, so that identifiers will actually be less than 63 chars long when some chars were multi-byte.
3748	08/02/2012 04:11 PM	Aaron Marcuse-Kubitza	strings.py: ustr(): Call str() method manually like urepr() to avoid Unicode errors when the returned string is non-ASCII
3747	08/02/2012 03:54 PM	Aaron Marcuse-Kubitza	strings.py: Added urepr() and use it in repr_no_u(), to better support repr() return values with non-ASCII characters. Avoiding repr() also provides a more complete stack trace in the case of such errors.
3746	08/01/2012 11:37 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: plantobservation: plantobservation_aggregateoccurrence_count_1() trigger: Don't raise an error if existing count was >1, because there are in fact datasets (notably SALVIAS) where input records for individual stems may themselves contain aggregate data (such as plant and stem counts). For this data, we have an anomalous condition where an aggregateoccurrence has count >1 but contains one plantobservation, due to the plant/stem count being included in the first stem's record. (See <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Data-interpretation-issues> for more info on this problem.) Note that our desired 1:1 relationship between aggregateoccurrence and plantobservation is still guaranteed by a constraint, but the anomalous data may still cause irregularities later on in the analysis.
3745	08/01/2012 10:55 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Ignoring all rows on unrecoverable errors: Also support the case where has_joins == True, by setting it to False so that the no-joins case is effectively used
3744	08/01/2012 10:32 AM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Moved Simultaneously above Independently because that is how we are now running the imports
3743	08/01/2012 10:21 AM	Aaron Marcuse-Kubitza	Regenerated vegbien.ERD exports
3742	08/01/2012 09:50 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: *_1_to_1 and *_unique_within_* unique indexes with a `WHERE sourceaccessioncode IS NULL` filter: Added IS NULL filters for other unique keys, so that these fallback indexes would only be used if there was no (or no other) way to uniquely identify their tables. For *_1_to_1 unique indexes, this is the case for specimens data.
3741	08/01/2012 09:48 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: *_1_to_1 and *_unique_within_* unique indexes with a `WHERE sourceaccessioncode IS NULL` filter: Added IS NULL filters for other unique keys, so that these fallback indexes would only be used if there was no (or no other) way to uniquely identify their tables. For *_1_to_1 unique indexes, this is the case for specimens data.
3740	08/01/2012 09:41 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: stemobservation: Replaced stemobservation_unique_code unique constraint with stemobservation_unique_within_plantobservation unique index that uses COALESCE and WHERE ... IS NOT NULL appropriately, to work with sql_gen's use of COALESCE indexes and (for the renaming) to better reflect what it does
3739	08/01/2012 09:36 AM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: Synced with schema
3738	08/01/2012 09:30 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: *_1_to_1 and *_unique_within_* unique indexes intended to operate only when sourceaccessioncode is NULL: Changed to use `sourceaccessioncode IS NULL` WHERE condition instead of COALESCE element, since the sourceaccessioncode is not actually needed for the uniquification (it is already globally unique within the datasource if it's not NULL; this just covers the case where it is NULL)
3737	08/01/2012 09:23 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: *_unique_within_* unique indexes used for 1:1 relationships: Renamed to *_1_to_1 to better reflect what they do
3736	08/01/2012 09:21 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: *_unique_within_* unique indexes used for 1:1 relationships: Renamed to *_1_to_1 to better reflect what they do
3735	08/01/2012 08:58 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: plantobservation: Corrected plantobservation_aggregateoccurrence_id_1_to_1's name to plantobservation_aggregateoccurrence_1_to_1 because it's 1:1 with aggregateoccurrence, not aggregateoccurrence_id. Made it a unique index for consistency with our general method of expressing unique constraints on potentially nullable columns.
3734	08/01/2012 08:54 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: specimenreplicate: Renamed specimenreplicate_unique_plantobservation to specimenreplicate_plantobservation_1_to_1 to better reflect what it does
3733	08/01/2012 08:50 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: locationevent unique indexes: Renamed to *_unique_within_* to better reflect what they do
3732	08/01/2012 08:34 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: location: Removed redundant location_unique_sourceaccessioncode unique constraint, which has been replaced by location_unique_within_datasource
3731	08/01/2012 08:31 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Reset foreign key constraint names to autogenerated defaults for consistency
3730	08/01/2012 08:27 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Renamed _unique_datasource unique indexes to _unique_within_datasource to better reflect what they do
3729	08/01/2012 08:25 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: locationevent: Renamed locationevent_unique_accessioncode to locationevent_unique_within_location to better reflect what it does
3728	08/01/2012 08:22 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: specimenreplicate: Renamed specimenreplicate_unique_accessioncode to specimenreplicate_unique_within_datasource to better reflect what it does
3727	08/01/2012 08:11 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: stemobservation: Renamed stemobservation_unique_accessioncode to stemobservation_unique_within_plantobservation and also apply it to NULL sourceaccessioncodes, so that a plantobservation can have a single stemobservation for its single stem's traits without needing a separate sourceaccessioncode for it
3726	08/01/2012 08:02 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: aggregateoccurrence: Removed redundant aggregateoccurrence_unique_accessioncode unique constraint, which has been replaced by aggregateoccurrence_unique_within_taxonoccurrence
3725	08/01/2012 07:43 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: plantnamescope: Added CHECK constraint to ensure that at least one key column is specified (an empty plantnamescope doesn't make sense; use NULL instead)
3724	08/01/2012 07:32 AM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: Synced with schema
3723	08/01/2012 07:23 AM	Aaron Marcuse-Kubitza	ch_root: Don't require both the input and output mappings to contain their respective new roots, since sometimes only one or the other root is being subset. This will occur, for example, in mappings that are flat on the input but normalized on the output, such as VegCSV.
3722	08/01/2012 07:06 AM	Aaron Marcuse-Kubitza	VegBIEN: Reversed aggregateoccurrence<->plantobservation relationship to point from plantobservation->aggregateoccurrence, so plantobservation could be scoped by aggregateoccurrence in the same way as all other core tables are scoped by their parent tables. This reversed direction was an anomaly due to the need to have a trigger auto-set aggregateoccurrence.count to 1 when there was an associated plantobservation. This was most easily accomplished on the aggregateoccurrence table itself, but required the reversed relationship. The trigger has now been reimplemented on plantobservation, which externally updates aggregateoccurrence.count.
3721	08/01/2012 06:53 AM	Aaron Marcuse-Kubitza	input.Makefile: Testing: diffing test outputs: Ignore changes in whitespace, due to e.g. different indent levels. This facilitates accepting tests when an element has been nested inside another element (or unnested), by showing only the opening and closing tags of the new outer element.
3720	08/01/2012 06:42 AM	Aaron Marcuse-Kubitza	dicts.py: DictProxy: Fixed bug where default value for inner param needed to be created in the constructor, or else every default instance would use and modify the same dictionary
3719	08/01/2012 06:26 AM	Aaron Marcuse-Kubitza	db_xml.py: put(): wrap_e(): Call augment_error() to add the current node to the error message
3718	08/01/2012 06:14 AM	Aaron Marcuse-Kubitza	db_xml.py: put(): Raise an error if there are multiple fields with the same name, instead of silently overwriting the first with the second. This generally indicates the need to use `:[@merge=1]` on the fields in question.
3717	08/01/2012 06:11 AM	Aaron Marcuse-Kubitza	dicts.py: Added OnceOnlyDict and helper exception KeyExistsError
3716	08/01/2012 06:10 AM	Aaron Marcuse-Kubitza	dicts.py: DictProxy: Added default value for inner param to facilitate creating empty wrapped dicts
3715	08/01/2012 05:48 AM	Aaron Marcuse-Kubitza	bin/map: out_is_db: row-based mode: Debug-log the processed XML tree produced by xml_func.process()
3714	08/01/2012 05:16 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Fixed bug where "Missing mapping for NOT NULL column" errors should actually be warnings, because sometimes the mappings include extra tables which aren't used by the dataset
3713	08/01/2012 05:12 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Fixed bug where "Missing mapping for NOT NULL column" errors should actually be warnings, because sometimes the mappings include extra tables which aren't used by the dataset
3712	08/01/2012 03:18 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: aggregateoccurrence: Added UNIQUE INDEX that makes an aggregateoccurrence unique within a taxonoccurrence. When the sourceaccessioncode isn't specified (as for individual organisms data, where this goes in plantobservation and taxonoccurrence), this ensures a 1:1 relationship between aggregateoccurrence and taxonoccurrence.
3711	08/01/2012 03:08 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonoccurrence: Added UNIQUE INDEX that makes a taxonoccurrence unique within a locationevent. When the sourceaccessioncode isn't specified (as for specimens data), this ensures a 1:1 relationship between taxonoccurrence and locationevent.
3710	08/01/2012 03:05 AM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: binomial (full) plantname: Also mapped to an alternative for taxonoccurrence.sourceaccessioncode, for aggregate plots data that distinguishes taxonoccurrences only by plantname (such as CVS)
3709	08/01/2012 02:23 AM	Aaron Marcuse-Kubitza	exc.py: e_msg(): Fixed bug where exceptions with nothing in e.args (such as StopIteration) caused a failed assertion. Fixed bug where exceptions with multiple values in e.args (such as certain IOErrors) caused a failed assertion.
3708	08/01/2012 01:27 AM	Aaron Marcuse-Kubitza	sql.py: flatten(): Documented that the query shouldn't be cached because the temp table will usually be truncated after use
3707	08/01/2012 01:05 AM	Aaron Marcuse-Kubitza	sql_gen.py: merge_not_null(): For clarity, use to_text() to represent NULL as the string 'NULL' instead of as the null sentinel for the column's type
3706	08/01/2012 01:02 AM	Aaron Marcuse-Kubitza	sql_gen.py: Added to_text() and helper value null_as_str
3705	08/01/2012 12:52 AM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: plantobservation: sourceaccessioncode, authorplantcode: Removed no longer needed mapping to specimenreplicate.sourceaccessioncode, since specimenreplicate for plots data is now identified by its plantobservation fkey, without needing its own sourceaccessioncode
3704	07/31/2012 10:41 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore_cond(): Fixed bug where, if is_literals, NULL needs to be returned instead of trying to filter invalid rows out of a nonexistent input table
3703	07/31/2012 09:57 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: Replaced "/}" (with unnecessary "/") with "}"
3702	07/31/2012 09:51 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: Replaced doubled "/"s with single "/"
3701	07/31/2012 09:05 PM	Aaron Marcuse-Kubitza	backups/Makefile: Added synchronization of backups with vegbiendev. Added downloading backups to the "After a new import" steps.
3700	07/31/2012 09:04 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: rsync: $(remote): Fixed bug where the inputs/ dir was hardcoded, when the remote dir name needed to be determined dynamically based on the Makefile dir
3699	07/31/2012 08:54 PM	Aaron Marcuse-Kubitza	backups/Makefile: Refactored to include lib/common.Makefile
3698	07/31/2012 08:46 PM	Aaron Marcuse-Kubitza	inputs/Makefile: Added download-logs to download import logs onto local machine and added it to the "After a new import" steps
3697	07/31/2012 08:36 PM	Aaron Marcuse-Kubitza	Moved generally useful targets and vars from inputs/Makefile to lib/common.Makefile and lib/forwarding.Makefile
3696	07/31/2012 08:04 PM	Aaron Marcuse-Kubitza	bin/map: Don't create unneeded /_ignore/inLabel element containing the datasource name because sql_io.put_table() now autopopulates the datasource_id
3695	07/31/2012 07:57 PM	Aaron Marcuse-Kubitza	schemas/functions.sql, py_functions.sql: Removed no longer needed relational functions, since sql_io.put_table() supports regular SQL functions
3694	07/30/2012 08:31 PM	Aaron Marcuse-Kubitza	inputs/Madidi/maps/VegX.plots.csv: Mapped all mappable columns
3693	07/30/2012 08:28 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: elevation, elevationrange: Added _rangeStart/_rangeEnd filter
3692	07/30/2012 08:19 PM	Aaron Marcuse-Kubitza	sql_io.py: Wrapping mapping in a sql_gen.ColDict: Documented that sql_gen.ColDict sanitizes both keys and values passed into it
3691	07/30/2012 08:18 PM	Aaron Marcuse-Kubitza	sql_gen.py: ColDict: Documented that anything that isn't a column is wrapped in a NamedCol
3690	07/30/2012 08:04 PM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Accepting the test cases: Added instructions for what to do if you get errors
3689	07/30/2012 06:09 PM	Aaron Marcuse-Kubitza	bin/map: Fixed bug where sql.function_exists() needed to be used to determine if something is a relational (now SQL) function, including in row-based mode, since that now uses sql_io.put_table(), which requires this. The bug fix relies on the new xml_func.process() feature that preserves unknown relational functions in case they are built-in functions rather than SQL functions.
3688	07/30/2012 06:04 PM	Aaron Marcuse-Kubitza	xml_func.py: process(): In row-based mode, when trying to evaluate function using DB, preserve unknown funcs because these might be built-in functions of db_xml.put(). The sql.DoesNotExistException should be raised again when db_xml.put() is run and it verifies whether the function is built-in or not (e.g. _simplifyPath is now built-in, for column-based support). See db_xml.put_special_funcs for built-in functions.
3687	07/30/2012 05:59 PM	Aaron Marcuse-Kubitza	db_xml.py: put(): Fixed bug where strings starting with "$" were interpreted as input columns in row-based mode (this should only apply to column-based mode). Explicitly store whether in row-based mode in is_literals var (similar to is_literals in sql_io.put_table()).
3686	07/30/2012 05:54 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): unrecoverable errors: Returning default value: is_literals: Remove column rename from default value so it doesn't get treated as a column by db_xml.put() (which is handled differently from a literal value)
3685	07/30/2012 03:53 PM	Aaron Marcuse-Kubitza	db_xml.py: put(): put_(): Removed no longer needed in_row_ct_ref param, which is only used by put_table(). Rewrapped function body.
3684	07/30/2012 03:46 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore(): literals: Only replace invalid literal with NULL or remove row if that column actually contains the invalid value in question. This handles the case where all columns are being ignore()d because the specific column couldn't be identified, and this was not the invalid column.
3683	07/30/2012 03:02 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: plot: Mapped note
3682	07/30/2012 02:32 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: plot: Added landform mapping
3681	07/30/2012 02:24 PM	Aaron Marcuse-Kubitza	schemas/vegbank.ERD.pdf: Auto-repaired with Adobe Reader so that the repair message doesn't pop up whenever it's opened
3680	07/30/2012 02:22 PM	Aaron Marcuse-Kubitza	schemas: Added vegbank.ERD.pdf so the VegBank ERD is easily accessible when mapping
3679	07/30/2012 01:51 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: project: Mapped sourceaccessioncode. This entailed adding a distinguishing suffix to the projectname input mapping.
3678	07/30/2012 01:31 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv, VegX-VegBIEN.stems.csv: Removed all manual mappings to datasource_id now that datasource_id is auto-populated, both on the VegBIEN output side and the DwC/VegX input side. This should greatly simplify many of the mappings!
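
Illustration for r3716, r3717 and r3720 (dicts.py): the sketch below shows, in general terms, the shared-default-dictionary bug described in r3720 and one way a "once only" dictionary like the one named in r3717 could behave. It is not the project's code. The class and exception names are taken from the commit messages, but every method body, the `inner` attribute, and the choice of KeyError as the base class are assumptions made for illustration, and the example is written for Python 3.

```python
class KeyExistsError(KeyError):
    """Raised when a key that is already present is assigned again.

    The name comes from r3717; subclassing KeyError is an assumption.
    """


class DictProxy:
    """Hypothetical stand-in for the wrapper mentioned in r3716/r3720."""

    def __init__(self, inner=None):
        # A default of `inner={}` would be evaluated only once, when the
        # function is defined, so every instance created without an argument
        # would share (and modify) the same dict. Creating the dict inside
        # the constructor gives each instance its own.
        if inner is None:
            inner = {}
        self.inner = inner

    def __getitem__(self, key):
        return self.inner[key]

    def __setitem__(self, key, value):
        self.inner[key] = value

    def __contains__(self, key):
        return key in self.inner


class OnceOnlyDict(DictProxy):
    """Rejects a second assignment to a key instead of overwriting it."""

    def __setitem__(self, key, value):
        if key in self.inner:
            raise KeyExistsError(key)
        super().__setitem__(key, value)


# Usage: two default instances do not share state, and a repeated key fails.
a, b = DictProxy(), DictProxy()
a["plot"] = 1
fields = OnceOnlyDict()
fields["sourceaccessioncode"] = "first"
try:
    fields["sourceaccessioncode"] = "second"
except KeyExistsError:
    duplicate_rejected = True
```

A dictionary of this kind is one way to surface the duplicate-field condition that r3718 reports as an error rather than silently overwriting the first value.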
