# Date Author Comment
3826 08/07/2012 05:38 AM Aaron Marcuse-Kubitza

mappings/Makefile: VegX-VegCSV.stems.csv: Clean up when edited using sort_map

3825 08/07/2012 05:27 AM Aaron Marcuse-Kubitza

Added mappings/VegCSV-VegBIEN.specimens.csv, which is generated from VegX-VegCSV.stems.csv

3824 08/07/2012 05:19 AM Aaron Marcuse-Kubitza

mappings/for_review: svn:ignore OpenOffice.org lock files

3823 08/07/2012 05:14 AM Aaron Marcuse-Kubitza

Added mappings/VegX-VegCSV.stems.csv. The initial version is autogenerated by joining the simplified VegBIEN XPaths of related maps.

3822 08/07/2012 05:05 AM Aaron Marcuse-Kubitza

join: Support discarding multiple outputs if they should be considered ambiguous

3821 08/07/2012 04:40 AM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: $(missingMappingsCmd): Support non-DwC mappings by matching entire line containing mapping, not just word characters. Remove any XML function so that merging of non-empty join mappings still works properly.

3820 08/07/2012 03:35 AM Aaron Marcuse-Kubitza

mappings/Makefile: Use new invert

3819 08/07/2012 03:35 AM Aaron Marcuse-Kubitza

Added invert

3818 08/07/2012 03:31 AM Aaron Marcuse-Kubitza

mappings/Makefile: for_review/VegBIEN-DwC2.specimens.csv: Include all comments column(s), not just the first

3817 08/07/2012 03:27 AM Aaron Marcuse-Kubitza

cols: Removed special handling of '+' because list_subset() now handles this col_num value itself, by appending the rest of the columns. Support intermixing int and '+' columns, by using new format.str2int_passthru().

3816 08/07/2012 03:23 AM Aaron Marcuse-Kubitza

util.py: list_subset(): Made an index of '+' append the rest of the list

3815 08/07/2012 03:21 AM Aaron Marcuse-Kubitza

format.py: Added str2int_passthru()
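
The behavior described in revisions 3815-3817 can be sketched roughly as follows. This is an illustration, not the code from format.py or util.py: the function bodies are assumptions written for Python 3, indexes are assumed to be 0-based, and "the rest of the list" is assumed to mean every item after the highest index selected so far.

```python
def str2int_passthru(value):
    """Return int(value) if the string parses as an integer, else value unchanged."""
    try:
        return int(value)
    except ValueError:
        return value  # e.g. '+' is passed through as-is


def list_subset(items, idxs):
    """Select items by index. A '+' index appends every item after the
    highest index selected so far (assumed meaning of "the rest")."""
    subset = []
    next_idx = 0  # first index not yet covered by an explicit selection
    for idx in idxs:
        if idx == '+':
            subset.extend(items[next_idx:])
            next_idx = len(items)
        else:
            subset.append(items[idx])
            next_idx = max(next_idx, idx + 1)
    return subset


# Intermixing int and '+' columns, as a cols-like caller might do
cols = [str2int_passthru(arg) for arg in ['0', '2', '+']]
print(list_subset(['a', 'b', 'c', 'd', 'e'], cols))  # ['a', 'c', 'd', 'e']
```

Passing '+' through unchanged lets the caller convert all arguments with one function and leave the interpretation of '+' to list_subset().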

3814 08/07/2012 03:16 AM Aaron Marcuse-Kubitza

cols: Changed value for all columns to '+' so that it wouldn't need to be shell-escaped as '*' was

3813 08/07/2012 01:42 AM Aaron Marcuse-Kubitza

review: Remove keys except last. This should increase the number of matches between human-readable VegBIEN XPaths of VegX and DwC2.

3812 08/07/2012 01:39 AM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Use :[] instead of [] for all XML functions, so that the XML function args will get removed by review

3811 08/07/2012 01:18 AM Aaron Marcuse-Kubitza

review: Remove XML functions. This should increase the number of matches between human-readable VegBIEN XPaths of VegX and DwC2.

3810 08/07/2012 12:34 AM Aaron Marcuse-Kubitza

mappings/Makefile: human-readable maps in for_review: Simplify just the output column so that the input column can be programmatically linked back to the original input names/XPaths

3809 08/07/2012 12:26 AM Aaron Marcuse-Kubitza

mappings/Makefile: Removed no longer used $(chRoot), $(cpReview)

3808 08/07/2012 12:23 AM Aaron Marcuse-Kubitza

Removed the human-readable mappings mappings/for_review/VegX-VegBIEN.plots.csv, VegX-VegBIEN.organisms.csv because these are now duplicates of VegX-VegBIEN.stems.csv

3807 08/07/2012 12:20 AM Aaron Marcuse-Kubitza

review: Support limiting the XPath simplifying to custom columns, rather than always the first two

3806 08/07/2012 12:12 AM Aaron Marcuse-Kubitza

review: Usage message: Fixed typo

3805 08/07/2012 12:10 AM Aaron Marcuse-Kubitza

Added mappings/for_review/VegBIEN-DwC2.specimens.csv, generated by inverting for_review/DwC2-VegBIEN.specimens.csv. This will be used to help translate VegX->VegCSV.

3804 08/06/2012 11:44 PM Aaron Marcuse-Kubitza

mappings: Made VegX-VegBIEN.organisms.csv, VegX-VegBIEN.plots.csv symlinks to VegX-VegBIEN.stems.csv instead of building them in the Makefile by copying VegX-VegBIEN.stems.csv, since these files are now always the same

3803 08/06/2012 09:29 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: _if that maps to specimenreplicate via plantobservation or voucher: Refactored to map right-hand side of _eq in the left-hand side mapping, rather than in all then/else mappings. Distinguish this _if statement from others using new name param.

3802 08/06/2012 09:16 PM Aaron Marcuse-Kubitza

xml_func.py: _if(): Documented that a `name` param can be added to distinguish separate _if statements

3801 08/06/2012 09:08 PM Aaron Marcuse-Kubitza

xml_func.py: _if(): Made cond optional. When it's not specified or None, it is treated as False. This supports cases where all elements of the condition are required but not mapped to.

3800 08/06/2012 08:50 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: _if that maps to specimenreplicate via plantobservation or voucher: Refactored to map voucherType directly into _if/cond/_eq/left rather than mapping it to a temporary _ignore location and retrieving it with _ref

3799 08/06/2012 08:47 PM Aaron Marcuse-Kubitza

xml_func.py: Removed no longer used _simplifyPath(), which is now a built-in function of db_xml.put()

3798 08/06/2012 08:36 PM Aaron Marcuse-Kubitza

xml_func.py: _eq(): Documented that '' (empty node) is returned if a value was not mapped to, not if a value was None, since None arguments are no longer removed by process() (now XML functions do this manually with conv_items())

3797 08/06/2012 08:19 PM Aaron Marcuse-Kubitza

xml_func.py: _ref(): Only display "XPath reference target missing" warning if target node does not exist, not if it exists but is empty

3796 08/06/2012 08:17 PM Aaron Marcuse-Kubitza

xpath.py: get(): reference expansion: Use get_1() and check for None result instead of using get(), which returns multiple nodes when we just want the first

3795 08/06/2012 07:39 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: Reversed XPaths so that they start with location instead of plantobservation

3794 08/06/2012 07:30 PM Aaron Marcuse-Kubitza

lib/common.Makefile: Added $(cp)

3793 08/06/2012 05:58 PM Aaron Marcuse-Kubitza

mappings/Makefile: Include lib/common.Makefile

3792 08/06/2012 05:57 PM Aaron Marcuse-Kubitza

lib/common.Makefile: Added $(CP)

3791 08/06/2012 05:36 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

3790 08/03/2012 09:59 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: Reversed input XPaths so that they start with plot instead of individualOrganismObservation as stem

3789 08/03/2012 09:57 PM Aaron Marcuse-Kubitza

inputs/CTFS: Disabled maps because CTFS is not yet compatible with reversed XPaths, but the effort required to make it compatible is not worth including in the current commit. We lose only 2 test rows of test VegX data by doing this, since the full CTFS VegX files were never able to be imported.

3788 08/03/2012 08:31 PM Aaron Marcuse-Kubitza

ch_root, ch_root_via: Documented that these are usually not idempotent operations

3787 08/03/2012 07:42 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: input (VegX) root: Removed tcs namespace URL to simplify the XPath reversing process. It isn't needed now that we don't generate intermediate XML documents in the automated tests (because intermediate formats are no longer required to be XML schemas).

3786 08/03/2012 07:16 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Reversed XPaths so that they start with location instead of specimenreplicate

3785 08/03/2012 07:00 PM Aaron Marcuse-Kubitza

README.TXT: WinMerge setup: Documented how to get to Compare Options page

3784 08/03/2012 06:59 PM Aaron Marcuse-Kubitza

README.TXT: WinMerge setup: Added step to set Whitespace to Ignore change

3783 08/03/2012 06:55 PM Aaron Marcuse-Kubitza

README.TXT: Moved WinMerge setup to separate section. Changed the "Moved block detection" link to point to the Configuration page.

3782 08/03/2012 06:32 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: Expanded {} expressions using expand_braces, so that each distinct output for the same input is on its own line, improving readability. This will also help enable search-and-replace reversing of XPaths for the re-rooting to location.
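
The effect of expanding {} expressions, so that each distinct output ends up on its own line, can be illustrated with a small Python sketch. This is not the expand_braces tool itself, whose implementation is not shown in this log. The function below is an assumption that handles only non-nested, comma-separated alternatives and does not treat commas inside [] specially; the sample path is hypothetical.

```python
import re

# Matches one {...} group that contains no nested braces
_BRACE_GROUP = re.compile(r'\{([^{}]*)\}')


def expand_braces(path):
    """Return one string per combination of {a,b,...} alternatives in path."""
    match = _BRACE_GROUP.search(path)
    if match is None:
        return [path]  # nothing left to expand
    expanded = []
    for alternative in match.group(1).split(','):
        # Substitute one alternative, then expand any remaining groups
        candidate = path[:match.start()] + alternative + path[match.end():]
        expanded.extend(expand_braces(candidate))
    return expanded


for line in expand_braces('location/{name,code}/value'):
    print(line)
# location/name/value
# location/code/value
```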

3781 08/03/2012 06:17 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: VegX XPaths: Expanded {} expressions using expand_braces, so that later use of expand_braces on the file would not affect the VegX output mappings of the inputs' via maps (VegX.organisms.csv, etc.)

3780 08/03/2012 05:54 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Expanded {} expressions using expand_braces, so that each distinct output for the same input is on its own line, improving readability. This will also help enable search-and-replace reversing of XPaths for the re-rooting to location.

3779 08/03/2012 05:52 PM Aaron Marcuse-Kubitza

README.TXT: Accepting test cases: Documented that when refactoring mappings, it's helpful to use WinMerge to detect moved lines

3778 08/03/2012 05:14 PM Aaron Marcuse-Kubitza

expand_braces: Fixed bug where the next line from stdin needed to be read in raw mode, so that \ is not parsed as an escape character

3777 08/03/2012 04:59 PM Aaron Marcuse-Kubitza

join: Fixed bug where, when an input mapped to multiple outputs, the joined row for each output needed to be output separately using writer.writerow()

3776 08/03/2012 03:52 PM Aaron Marcuse-Kubitza

sort_map: Remove duplicates resulting from multiple outputs for the same input. mappings/Makefile: $(mkSelfMap): Removed uniq now that sort_map does this.

3775 08/03/2012 03:24 PM Aaron Marcuse-Kubitza

mappings/Makefile: $(mkSelfMap): Run uniq on the output to remove duplicates resulting from multiple outputs for the same input

3774 08/03/2012 03:10 PM Aaron Marcuse-Kubitza

expand_braces: Also expand XPaths containing [], with up to one level of nesting (which is the most we currently use), because many {} XPaths do in fact contain []. Debug-print intermediate values when env var expand_braces_debug is true. Added usage message.

3773 08/02/2012 11:13 PM Aaron Marcuse-Kubitza

expand_braces: Fixed bug where ./{ and brackets with commas inside {} are unparseable, and should not be expanded

3772 08/02/2012 11:05 PM Aaron Marcuse-Kubitza

expand_braces: Fixed bug where `head -1` seemed to read more lines than just the first, causing EOF to be returned after the first line, by using `read` instead. Support data containing \r (such as Excel-dialect CSVs) by removing it. Fixed bug where ./{...} was not being properly escaped.

3771 08/02/2012 10:08 PM Aaron Marcuse-Kubitza

Added expand_braces

3770 08/02/2012 09:12 PM Aaron Marcuse-Kubitza

mappings: location: Removed centerlatitude/centerlongitude mappings because the lat/long should be in only one place: the locationdetermination. It is up to the database querier to decide which locationdetermination(s) to use as the coordinates for a plot/specimen.

3769 08/02/2012 08:54 PM Aaron Marcuse-Kubitza

bin/map: input is CSV: Removed unused map_ var

3768 08/02/2012 08:50 PM Aaron Marcuse-Kubitza

bin/map: Documented that it's multi-safe (supports an input appearing multiple times)

3767 08/02/2012 08:39 PM Aaron Marcuse-Kubitza

subtract: Documented that it's multi-safe (supports an input appearing multiple times)

3766 08/02/2012 08:32 PM Aaron Marcuse-Kubitza

join: Made it multi-safe (supports an input appearing multiple times)
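
What "multi-safe" means for a join of two maps can be sketched as below. This is a simplified Python illustration and not the join script: maps are modeled as lists of (input, output) pairs rather than CSV rows, the function name is hypothetical, and dropping rows with no match is an assumption.

```python
from collections import defaultdict


def join_maps(map_0, map_1):
    """Join A->B rows with B->C rows into A->C rows, emitting one row per
    matching output when an input appears multiple times in map_1."""
    outputs_by_input = defaultdict(list)
    for in_1, out_1 in map_1:
        outputs_by_input[in_1].append(out_1)  # keep every output, not just the last

    joined = []
    for in_0, out_0 in map_0:
        for out_1 in outputs_by_input.get(out_0, []):
            joined.append((in_0, out_1))  # one joined row per output
    return joined


map_0 = [('col1', 'mid1'), ('col2', 'mid2')]
map_1 = [('mid1', 'outA'), ('mid1', 'outB'), ('mid2', 'outC')]
print(join_maps(map_0, map_1))
# [('col1', 'outA'), ('col1', 'outB'), ('col2', 'outC')]
```

Storing a list of outputs per input, instead of a plain dict entry that a later row would overwrite, is what keeps repeated inputs from being lost.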

3765 08/02/2012 08:30 PM Aaron Marcuse-Kubitza

lib/common.Makefile: Added empty clean target to make sure `make clean` always works

3764 08/02/2012 08:03 PM Aaron Marcuse-Kubitza

root Makefile, input.Makefile: Maps validation: Treat missing join mappings differently from missing non-empty join mappings, because they indicate mapping to an invalid location, which is a bug. Factored maps validation code out into new lib/mappings.Makefile.

3763 08/02/2012 08:00 PM Aaron Marcuse-Kubitza

lib/common.Makefile: Added vars for chars not allowed in make targets. Added functions/vars to replace "_" with " ".

3762 08/02/2012 07:38 PM Aaron Marcuse-Kubitza

root Makefile: Include lib/common.Makefile

3761 08/02/2012 07:37 PM Aaron Marcuse-Kubitza

input.Makefile: Include lib/common.Makefile

3760 08/02/2012 06:48 PM Aaron Marcuse-Kubitza

intersect: Documented that it's multi-safe (supports an input appearing multiple times)

3759 08/02/2012 06:42 PM Aaron Marcuse-Kubitza

union: Documented that it's multi-safe (supports an input appearing multiple times)

3758 08/02/2012 06:00 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Moved shared /specimenreplicate root to mappings in preparation for reversing the XPaths so that parent table paths (such as location) don't contain a prefix for child tables (specimenreplicate, locationevent, etc.). This reversing will avoid the need to "ch_root" the child table map to obtain maps for parent tables with the prefixes removed, allowing all hierarchical levels to use the same map spreadsheet.

3757 08/02/2012 05:53 PM Aaron Marcuse-Kubitza

ch_root: Support column headers without a root, for non-hierarchical formats such as DwC

3756 08/02/2012 05:45 PM Aaron Marcuse-Kubitza

lib/common.Makefile: rsync: Time the rsync operation

3755 08/02/2012 05:29 PM Aaron Marcuse-Kubitza

in_place: Wrap EXIT handler in shell function so that "-escaping can easily be used on the temp file path

3754 08/02/2012 05:26 PM Aaron Marcuse-Kubitza

in_place: Documented that it doesn't update the file on error

3753 08/02/2012 05:23 PM Aaron Marcuse-Kubitza

DwC mappings: Removed ':/list/' root (full version: '::[@xmlns:dcterms=http://purl.org/dc/terms/]/list/') from map spreadsheets to simplify the boilerplate in each file. Since intermediate DwC XML files no longer need to be produced for automated tests, these roots are not needed.

3752 08/02/2012 04:46 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

3751 08/02/2012 04:40 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Moved independent-import data to separate tab so that it wouldn't get moved to the side whenever a new column of simultaneous-import data is inserted. It is also no longer updated, because all column-based imports are now done simultaneously.

3750 08/02/2012 04:32 PM Aaron Marcuse-Kubitza

Use strings.ustr() or strings.urepr() everywhere that columns are stringified, in order to support column names with non-ASCII characters (such as in the Madidi data)

3749 08/02/2012 04:16 PM Aaron Marcuse-Kubitza

strings.py: concat(): Convert args to raw (non-Unicode) strings first, so that multi-byte Unicode sequences are considered by # of bytes instead of # of chars. This is necessary because PostgreSQL truncates identifiers by # of bytes instead of # of chars, so that identifiers will actually be less than 63 chars long when some chars were multi-byte.
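
The byte-versus-character distinction in revision 3749 can be illustrated with a Python 3 sketch. It is not the concat() from strings.py: the function name, the signature, the UTF-8 encoding, and the choice to truncate only the first string are assumptions. The 63-byte limit is PostgreSQL's default maximum identifier length.

```python
def concat_bytes(str0, str1, max_bytes):
    """Concatenate str0 and str1, truncating str0 so that the UTF-8 encoded
    result is at most max_bytes long (assumes str1 alone fits)."""
    suffix_len = len(str1.encode('utf-8'))
    budget = max_bytes - suffix_len
    if budget < 0:
        raise ValueError('str1 alone exceeds max_bytes')
    # Cut by bytes, then drop any multi-byte character split by the cut
    prefix = str0.encode('utf-8')[:budget].decode('utf-8', errors='ignore')
    return prefix + str1


name = concat_bytes('é' * 40, '_id', 63)  # each 'é' is 2 bytes in UTF-8
print(len(name), len(name.encode('utf-8')))  # 33 63
```

The result holds only 33 characters even though it fills all 63 bytes, which is why counting characters would overestimate how much of the name fits.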

3748 08/02/2012 04:11 PM Aaron Marcuse-Kubitza

strings.py: ustr(): Call str() method manually like urepr() to avoid Unicode errors when the returning string is non-ASCII

3747 08/02/2012 03:54 PM Aaron Marcuse-Kubitza

strings.py: Added urepr() and use it in repr_no_u(), to better support repr() return values with non-ASCII characters. Avoiding repr() also provides a more complete stack trace in the case of such errors.

3746 08/01/2012 11:37 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: plantobservation: plantobservation_aggregateoccurrence_count_1() trigger: Don't raise an error if existing count was >1, because there are in fact datasets (notably SALVIAS) where input records for individual stems may themselves contain aggregate data (such as plant and stem counts). For this data, we have an anomalous condition where an aggregateoccurrence has count >1 but contains one plantobservation, due to the plant/stem count being included in the first stem's record. (See <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Data-interpretation-issues> for more info on this problem.) Note that our desired 1:1 relationship between aggregateoccurrence and plantobservation is still guaranteed by a constraint, but the anomalous data may still cause irregularities later on in the analysis.

3745 08/01/2012 10:55 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Ignoring all rows on unrecoverable errors: Also support the case where has_joins == True, by setting it to False so that the no-joins case is effectively used

3744 08/01/2012 10:32 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Moved Simultaneously above Independently because that is how we are now running the imports

3743 08/01/2012 10:21 AM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

3742 08/01/2012 09:50 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: 1_to_1 and *_unique_within unique indexes with a `WHERE sourceaccessioncode IS NULL` filter: Added IS NULL filters for other unique keys, so that these fallback indexes would only be used if there was no (or no other) way to uniquely identify their tables. For *_1_to_1 unique indexes, this is the case for specimens data.

3741 08/01/2012 09:48 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: 1_to_1 and *_unique_within unique indexes with a `WHERE sourceaccessioncode IS NULL` filter: Added IS NULL filters for other unique keys, so that these fallback indexes would only be used if there was no (or no other) way to uniquely identify their tables. For *_1_to_1 unique indexes, this is the case for specimens data.

3740 08/01/2012 09:41 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: stemobservation: Replaced stemobservation_unique_code unique constraint with stemobservation_unique_within_plantobservation unique index that uses COALESCE and WHERE ... IS NOT NULL appropriately, to work with sql_gen's use of COALESCE indexes and (for the renaming) to better reflect what it does

3739 08/01/2012 09:36 AM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: Synced with schema

3738 08/01/2012 09:30 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: 1_to_1 and *_unique_within unique indexes intended to operate only when sourceaccessioncode is NULL: Changed to use `sourceaccessioncode IS NULL` WHERE condition instead of COALESCE element, since the sourceaccessioncode is not actually needed for the uniquification (it is already globally unique within the datasource if it's not NULL; this just covers the case where it is NULL)

3737 08/01/2012 09:23 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: unique_within unique indexes used for 1:1 relationships: Renamed to __1_to_1 to better reflect what they do

3736 08/01/2012 09:21 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: unique_within unique indexes used for 1:1 relationships: Renamed to __1_to_1 to better reflect what they do

3735 08/01/2012 08:58 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: plantobservation: Corrected plantobservation_aggregateoccurrence_id_1_to_1's name to plantobservation_aggregateoccurrence_1_to_1 because it's 1:1 with aggregateoccurrence, not aggregateoccurrence_id. Made it a unique index for consistency with our general method of expressing unique constraints on potentially nullable columns.

3734 08/01/2012 08:54 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: specimenreplicate: Renamed specimenreplicate_unique_plantobservation to specimenreplicate_plantobservation_1_to_1 to better reflect what it does

3733 08/01/2012 08:50 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: locationevent unique indexes: Renamed to unique_within to better reflect what they do

3732 08/01/2012 08:34 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: location: Removed redundant location_unique_sourceaccessioncode unique constraint, which has been replaced by location_unique_within_datasource

3731 08/01/2012 08:31 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Reset foreign key constraint names to autogenerated defaults for consistency

3730 08/01/2012 08:27 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Renamed *_unique_datasource unique indexes to *_unique_within_datasource to better reflect what they do

3729 08/01/2012 08:25 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: locationevent: Renamed locationevent_unique_accessioncode to locationevent_unique_within_location to better reflect what it does

3728 08/01/2012 08:22 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: specimenreplicate: Renamed specimenreplicate_unique_accessioncode to specimenreplicate_unique_within_datasource to better reflect what it does

3727 08/01/2012 08:11 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: stemobservation: Renamed stemobservation_unique_accessioncode to stemobservation_unique_within_plantobservation and also apply it to NULL sourceaccessioncodes, so that a plantobservation can have a single stemobservation for its single stem's traits without needing a separate sourceaccessioncode for it