/ - Changes - BIEN 3 - NCEAS Projects

root @ 3843

#	Date	Author	Comment
3843	08/07/2012 09:08 AM	Aaron Marcuse-Kubitza	mappings/VegX-VegCSV.stems.csv: Resolved ambiguous terms that appeared twice on the output side
3842	08/07/2012 08:52 AM	Aaron Marcuse-Kubitza	mappings/VegX-VegCSV.stems.csv: Mapped VegX abioticObservation terms
3841	08/07/2012 08:36 AM	Aaron Marcuse-Kubitza	mappings/VegX-VegCSV.stems.csv: Mapped standard DwC terms
3840	08/07/2012 08:13 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv, DwC1-DwC2.specimens.csv: Sources: Replaced DwC with http://rs.tdwg.org/dwc/terms/, because DwC terms can come from many places but the DwC source referred specifically to this web page
3839	08/07/2012 08:06 AM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Corrected mapping for previousCatalogNumber
3838	08/07/2012 08:00 AM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Added source of datasources' custom terms
3837	08/07/2012 07:51 AM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Added source of DwC 1.2 (http://digir.net/schema/conceptual/darwin/2003/1.0/darwin2.xsd), aka DwC Classic, terms
3836	08/07/2012 07:43 AM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Added source of custom NY staging table terms in nimoy.bien2_staging.nybg_raw
3835	08/07/2012 07:27 AM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Added source of DwC 1.21 (http://digir.net/schema/conceptual/darwin/manis/1.21/darwin2.xsd) terms
3834	08/07/2012 07:02 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv, DwC1-DwC2.specimens.csv: Sources: Replaced DwC with http://rs.tdwg.org/dwc/terms/, because DwC terms can come from many places but the DwC source referred specifically to this web page
3833	08/07/2012 06:51 AM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Added source of remappings of DwC terms with /_alt added
3832	08/07/2012 06:46 AM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Added source of DwC terms with namespace removed
3831	08/07/2012 06:32 AM	Aaron Marcuse-Kubitza	mappings/VegX-VegCSV.stems.csv: Added "computer." before taxonomic terms whose VegX mapping used the "computer" role. (This is useful for datasources that supply separate determinations in the same row, such as SALVIAS.)
3830	08/07/2012 06:23 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Added Source column containing "DwC" for every field with an entry in the Order column, so that the source of the term can be tracked once we start combining DwC and VegCSV
3829	08/07/2012 06:07 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS*/maps/VegX.organisms.csv: Fixed missing join mappings for stemobservation-related fields
3828	08/07/2012 05:56 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Repopulated Order values for the few rows that had lost it in the process of copying and pasting mappings
3827	08/07/2012 05:49 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Added Source column containing "DwC" for every field with an entry in the Order column, so that the source of the term can be tracked once we start combining DwC and VegCSV
3826	08/07/2012 05:38 AM	Aaron Marcuse-Kubitza	mappings/Makefile: VegX-VegCSV.stems.csv: Clean up when edited using sort_map
3825	08/07/2012 05:27 AM	Aaron Marcuse-Kubitza	Added mappings/VegCSV-VegBIEN.specimens.csv, which is generated from VegX-VegCSV.stems.csv
3824	08/07/2012 05:19 AM	Aaron Marcuse-Kubitza	mappings/for_review: svn:ignore OpenOffice.org lock files
3823	08/07/2012 05:14 AM	Aaron Marcuse-Kubitza	Added mappings/VegX-VegCSV.stems.csv. The initial version is autogenerated by joining the simplified VegBIEN XPaths of related maps.
3822	08/07/2012 05:05 AM	Aaron Marcuse-Kubitza	join: Support discarding multiple outputs if they should be considered ambiguous
3821	08/07/2012 04:40 AM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: $(missingMappingsCmd): Support non-DwC mappings by matching entire line containing mapping, not just word characters. Remove any XML function so that merging of non-empty join mappings still works properly.
3820	08/07/2012 03:35 AM	Aaron Marcuse-Kubitza	mappings/Makefile: Use new invert
3819	08/07/2012 03:35 AM	Aaron Marcuse-Kubitza	Added invert
3818	08/07/2012 03:31 AM	Aaron Marcuse-Kubitza	mappings/Makefile: for_review/VegBIEN-DwC2.specimens.csv: Include all comments column(s), not just the first
3817	08/07/2012 03:27 AM	Aaron Marcuse-Kubitza	cols: Removed special handling of '+' because list_subset() now handles this col_num value itself, by appending the rest of the columns. Support intermixing int and '+' columns, by using new format.str2int_passthru().
3816	08/07/2012 03:23 AM	Aaron Marcuse-Kubitza	util.py: list_subset(): Made an index of '+' append the rest of the list
3815	08/07/2012 03:21 AM	Aaron Marcuse-Kubitza	format.py: Added str2int_passthru()
3814	08/07/2012 03:16 AM	Aaron Marcuse-Kubitza	cols: Changed value for all columns to '+' so that it wouldn't need to be shell-escaped as '*' was
3813	08/07/2012 01:42 AM	Aaron Marcuse-Kubitza	review: Remove keys except last. This should increase the number of matches between human-readable VegBIEN XPaths of VegX and DwC2.
3812	08/07/2012 01:39 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Use :[] instead of [] for all XML functions, so that the XML function args will get removed by review
3811	08/07/2012 01:18 AM	Aaron Marcuse-Kubitza	review: Remove XML functions. This should increase the number of matches between human-readable VegBIEN XPaths of VegX and DwC2.
3810	08/07/2012 12:34 AM	Aaron Marcuse-Kubitza	mappings/Makefile: human-readable maps in for_review: Simplify just the output column so that the input column can be programmatically linked back to the original input names/XPaths
3809	08/07/2012 12:26 AM	Aaron Marcuse-Kubitza	mappings/Makefile: Removed no longer used $(chRoot), $(cpReview)
3808	08/07/2012 12:23 AM	Aaron Marcuse-Kubitza	Removed the human-readable mappings mappings/for_review/VegX-VegBIEN.plots.csv, VegX-VegBIEN.organisms.csv because these are now duplicates of VegX-VegBIEN.stems.csv
3807	08/07/2012 12:20 AM	Aaron Marcuse-Kubitza	review: Support limiting the XPath simplifying to custom columns, rather than always the first two
3806	08/07/2012 12:12 AM	Aaron Marcuse-Kubitza	review: Usage message: Fixed typo
3805	08/07/2012 12:10 AM	Aaron Marcuse-Kubitza	Added mappings/for_review/VegBIEN-DwC2.specimens.csv, generated by inverting for_review/DwC2-VegBIEN.specimens.csv. This will be used to help translate VegX->VegCSV.
3804	08/06/2012 11:44 PM	Aaron Marcuse-Kubitza	mappings: Made VegX-VegBIEN.organisms.csv, VegX-VegBIEN.plots.csv symlinks to VegX-VegBIEN.stems.csv instead of building them in the Makefile by copying VegX-VegBIEN.stems.csv, since these files are now always the same
3803	08/06/2012 09:29 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: _if that maps to specimenreplicate via plantobservation or voucher: Refactored to map right-hand side of _eq in the left-hand side mapping, rather than in all then/else mappings. Distinguish this _if statement from others using new name param.
3802	08/06/2012 09:16 PM	Aaron Marcuse-Kubitza	xml_func.py: _if(): Documented that can add `name` param to distinguish separate _if statements
3801	08/06/2012 09:08 PM	Aaron Marcuse-Kubitza	xml_func.py: _if(): Made cond optional. When it's not specified or None, it is treated as False. This supports cases where all elements of the condition are required but not mapped to.
3800	08/06/2012 08:50 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: _if that maps to specimenreplicate via plantobservation or voucher: Refactored to map voucherType directly into _if/cond/_eq/left rather than mapping it to a temporary _ignore location and retrieving it with _ref
3799	08/06/2012 08:47 PM	Aaron Marcuse-Kubitza	xml_func.py: Removed no longer used _simplifyPath(), which is now a built-in function of db_xml.put()
3798	08/06/2012 08:36 PM	Aaron Marcuse-Kubitza	xml_func.py: _eq(): Documented that '' (empty node) is returned if a value was not mapped to, not if a value was None, since None arguments are no longer removed by process() (now XML functions do this manually with conv_items())
3797	08/06/2012 08:19 PM	Aaron Marcuse-Kubitza	xml_func.py: _ref(): Only display "XPath reference target missing" warning if target node does not exist, not if it exists but is empty
3796	08/06/2012 08:17 PM	Aaron Marcuse-Kubitza	xpath.py: get(): reference expansion: Use get_1() and check for None result instead of using get(), which returns multiple nodes when we just want the first
3795	08/06/2012 07:39 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: Reversed XPaths so that they start with location instead of plantobservation
3794	08/06/2012 07:30 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: Added $(cp)
3793	08/06/2012 05:58 PM	Aaron Marcuse-Kubitza	mappings/Makefile: Include lib/common.Makefile
3792	08/06/2012 05:57 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: Added $(CP)
3791	08/06/2012 05:36 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated with stats from latest import
3790	08/03/2012 09:59 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: Reversed input XPaths so that they start with plot instead of individualOrganismObservation as stem
3789	08/03/2012 09:57 PM	Aaron Marcuse-Kubitza	inputs/CTFS: Disabled maps because CTFS is not yet compatible with reversed XPaths, but the effort required to make it compatible is not worth including in the current commit. We lose only 2 test rows of test VegX data by doing this, since the full CTFS VegX files were never able to be imported.
3788	08/03/2012 08:31 PM	Aaron Marcuse-Kubitza	ch_root, ch_root_via: Documented that these are usually not idempotent operations
3787	08/03/2012 07:42 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: input (VegX) root: Removed tcs namespace URL to simplify the XPath reversing process. It isn't needed now that we don't generate intermediate XML documents in the automated tests (because intermediate formats are no longer required to be XML schemas).
3786	08/03/2012 07:16 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Reversed XPaths so that they start with location instead of specimenreplicate
3785	08/03/2012 07:00 PM	Aaron Marcuse-Kubitza	README.TXT: WinMerge setup: Documented how to get to Compare Options page
3784	08/03/2012 06:59 PM	Aaron Marcuse-Kubitza	README.TXT: WinMerge setup: Added step to set Whitespace to Ignore change
3783	08/03/2012 06:55 PM	Aaron Marcuse-Kubitza	README.TXT: Moved WinMerge setup to separate section. Changed Moved block detection link to the Configuration page.
3782	08/03/2012 06:32 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: Expanded {} expressions using expand_braces, so that each distinct output for the same input is on its own line, improving readability. This will also help enable search-and-replace reversing of XPaths for the re-rooting to location.
3781	08/03/2012 06:17 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: VegX XPaths: Expanded {} expressions using expand_braces, so that later use of expand_braces on the file would not affect the VegX output mappings of the inputs' via maps (VegX.organisms.csv, etc.)
3780	08/03/2012 05:54 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Expanded {} expressions using expand_braces, so that each distinct output for the same input is on its own line, improving readability. This will also help enable search-and-replace reversing of XPaths for the re-rooting to location.
3779	08/03/2012 05:52 PM	Aaron Marcuse-Kubitza	README.TXT: Accepting test cases: Documented that when refactoring mappings, it's helpful to use WinMerge to detect moved lines
3778	08/03/2012 05:14 PM	Aaron Marcuse-Kubitza	expand_braces: Fixed bug where needed to get next line from stdin in raw mode, so that \ won't be parsed as escape chars
3777	08/03/2012 04:59 PM	Aaron Marcuse-Kubitza	join: Fixed bug where when an input mapped to multiple outputs, the joined row for each output needed to be output separately using writer.writerow()
3776	08/03/2012 03:52 PM	Aaron Marcuse-Kubitza	sort_map: Remove duplicates resulting from multiple outputs for the same input. mappings/Makefile: $(mkSelfMap): Removed uniq now that sort_map does this.
3775	08/03/2012 03:24 PM	Aaron Marcuse-Kubitza	mappings/Makefile: $(mkSelfMap): Run uniq on the output to remove duplicates resulting from multiple outputs for the same input
3774	08/03/2012 03:10 PM	Aaron Marcuse-Kubitza	expand_braces: Also expand XPaths containing [], with up to one level of nesting (which is the most we currently use), because many {} XPaths do in fact contain []. Debug-print intermediate values when env var expand_braces_debug is true. Added usage message.
3773	08/02/2012 11:13 PM	Aaron Marcuse-Kubitza	expand_braces: Fixed bug where ./{ and brackets with commas inside {} are unparseable, and should not be expanded
3772	08/02/2012 11:05 PM	Aaron Marcuse-Kubitza	expand_braces: Fixed bug where `head -1` seemed to read more lines than just the first, causing EOF to be returned after the first line, by using `read` instead. Support data containing \r (such as Excel-dialect CSVs) by removing it. Fixed bug where ./{...} was not being properly escaped.
3771	08/02/2012 10:08 PM	Aaron Marcuse-Kubitza	Added expand_braces
3770	08/02/2012 09:12 PM	Aaron Marcuse-Kubitza	mappings: location: Removed centerlatitude/centerlongitude mappings because the lat/long should be in only one place: the locationdetermination. It is up to the database querier to decide which locationdetermination(s) to use as the coordinates for a plot/specimen.
3769	08/02/2012 08:54 PM	Aaron Marcuse-Kubitza	bin/map: input is CSV: Removed unused map_ var
3768	08/02/2012 08:50 PM	Aaron Marcuse-Kubitza	bin/map: Documented that it's multi-safe (supports an input appearing multiple times)
3767	08/02/2012 08:39 PM	Aaron Marcuse-Kubitza	subtract: Documented that it's multi-safe (supports an input appearing multiple times)
3766	08/02/2012 08:32 PM	Aaron Marcuse-Kubitza	join: Made it multi-safe (supports an input appearing multiple times)
3765	08/02/2012 08:30 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: Added empty clean target to make sure `make clean` always works
3764	08/02/2012 08:03 PM	Aaron Marcuse-Kubitza	root Makefile, input.Makefile: Maps validation: Treat missing join mappings differently from missing non-empty join mappings, because they indicate mapping to an invalid location, which is a bug. Factored maps validation code out into new lib/mappings.Makefile.
3763	08/02/2012 08:00 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: Added vars for chars not allowed in make targets. Added functions/vars to replace "_" with " ".
3762	08/02/2012 07:38 PM	Aaron Marcuse-Kubitza	root Makefile: Include lib/common.Makefile
3761	08/02/2012 07:37 PM	Aaron Marcuse-Kubitza	input.Makefile: Include lib/common.Makefile
3760	08/02/2012 06:48 PM	Aaron Marcuse-Kubitza	intersect: Documented that it's multi-safe (supports an input appearing multiple times)
3759	08/02/2012 06:42 PM	Aaron Marcuse-Kubitza	union: Documented that it's multi-safe (supports an input appearing multiple times)
3758	08/02/2012 06:00 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Moved shared /specimenreplicate root to mappings in preparation for reversing the XPaths so that parent table paths (such as location) don't contain a prefix for child tables (specimenreplicate, locationevent, etc.). This reversing will avoid the need to "ch_root" the child table map to obtain maps for parent tables with the prefixes removed, allowing all hierarchical levels to use the same map spreadsheet.
3757	08/02/2012 05:53 PM	Aaron Marcuse-Kubitza	ch_root: Support column headers without a root, for non-hierarchical formats such as DwC
3756	08/02/2012 05:45 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: rsync: Time the rsync operation
3755	08/02/2012 05:29 PM	Aaron Marcuse-Kubitza	in_place: Wrap EXIT handler in shell function so that "-escaping can easily be used on the temp file path
3754	08/02/2012 05:26 PM	Aaron Marcuse-Kubitza	in_place: Documented that doesn't update file on error
3753	08/02/2012 05:23 PM	Aaron Marcuse-Kubitza	DwC mappings: Removed ':/list/' root (full version: '::[@xmlns:dcterms=http://purl.org/dc/terms/]/list/') from map spreadsheets to simplify the boilerplate in each file. Since intermediate DwC XML files no longer need to be produced for automated tests, these roots are not needed.
3752	08/02/2012 04:46 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated with stats from latest import
3751	08/02/2012 04:40 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Moved independent-import data to separate tab so that it wouldn't get moved to the side whenever a new column of simultaneous-import data is inserted. It is also no longer updated, because all column-based imports are now done simultaneously.
3750	08/02/2012 04:32 PM	Aaron Marcuse-Kubitza	Use strings.ustr() or strings.urepr() everywhere that columns are stringified, in order to support column names with non-ASCII characters (such as in the Madidi data)
3749	08/02/2012 04:16 PM	Aaron Marcuse-Kubitza	strings.py: concat(): Convert args to raw (non-Unicode) strings first, so that multi-byte Unicode sequences are considered by # of bytes instead of # of chars. This is necessary because PostgreSQL truncates identifiers by # of bytes instead of # of chars, so that identifiers will actually be less than 63 chars long when some chars were multi-byte.
3748	08/02/2012 04:11 PM	Aaron Marcuse-Kubitza	strings.py: ustr(): Call str() method manually like urepr() to avoid Unicode errors when the returning string is non-ASCII
3747	08/02/2012 03:54 PM	Aaron Marcuse-Kubitza	strings.py: Added urepr() and use it in repr_no_u(), to better support repr() return values with non-ASCII characters. Avoiding repr() also provides a more complete stack trace in the case of such errors.
3746	08/01/2012 11:37 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: plantobservation: plantobservation_aggregateoccurrence_count_1() trigger: Don't raise an error if existing count was >1, because there are in fact datasets (notably SALVIAS) where input records for individual stems may themselves contain aggregate data (such as plant and stem counts). For this data, we have an anomalous condition where an aggregateoccurrence has count >1 but contains one plantobservation, due to the plant/stem count being included in the first stem's record. (See <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Data-interpretation-issues> for more info on this problem.) Note that our desired 1:1 relationship between aggregateoccurrence and plantobservation is still guaranteed by a constraint, but the anomalous data may still cause irregularities later on in the analysis.
3745	08/01/2012 10:55 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Ignoring all rows on unrecoverable errors: Also support the case where has_joins == True, by setting it to False so that the no-joins case is effectively used
3744	08/01/2012 10:32 AM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Moved Simultaneously above Independently because that is how we are now running the imports
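
Revisions 3814-3817 describe a '+' column value that stands for "the rest of the columns" and can be mixed with integer column numbers. The sketch below shows one way that behavior could work. It is not the project's actual code: the function bodies, the 0-based indexing, and the reading of "the rest" as "everything after the last explicitly selected index" are all assumptions made for illustration. Only the names list_subset() and str2int_passthru() come from the log.

```python
def str2int_passthru(value):
    """Return int(value) if value parses as an integer, else value unchanged.

    Assumed behavior: the log only says this lets int and '+' columns be mixed.
    """
    try:
        return int(value)
    except ValueError:
        return value


def list_subset(items, idxs):
    """Select items by index; an index of '+' appends the rest of the list.

    Assumption: "the rest" means everything after the last explicitly
    selected index (or the whole list if nothing was selected yet).
    """
    subset = []
    next_idx = 0  # position just after the last explicitly selected item
    for idx in idxs:
        if idx == '+':
            subset.extend(items[next_idx:])
        else:
            subset.append(items[idx])
            next_idx = idx + 1
    return subset


# Column specs arrive as strings on the command line (hypothetical usage)
row = ['a', 'b', 'c', 'd']
col_nums = [str2int_passthru(s) for s in ['0', '2', '+']]
selected = list_subset(row, col_nums)
```

Because '+' has no special meaning to the shell, a spec like this can be passed unquoted, which is the motivation revision 3814 gives for replacing '*'.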
