/ - Changes - BIEN 3 - NCEAS Projects

root @ 4065

#	Date	Author	Comment
4065	08/15/2012 10:43 AM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: occurrenceID: Mapped to specimenreplicate.sourceaccessioncode for mergeability with DwC
4064	08/15/2012 09:14 AM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: Mapped voucherType to indirect voucher _if statements' conditions
4063	08/15/2012 09:02 AM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: locationID: location.sourceaccessioncode mapping: Added /_alt suffix for mergeability with DwC
4062	08/15/2012 08:53 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: collectionID: Mapped to location.authorlocationcode as merge with collectionCode, the same way as it is for specimenreplicate.collectioncode_dwc
4061	08/15/2012 08:23 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: location: location_unique_within_datasource_by_authorlocationcode unique index: Added `parent_id IS NULL` condition so that an authorlocationcode is not unintentionally treated as globally unique when a parent location is available (which implies that the authorlocationcode is a subplot code)
4060	08/15/2012 08:20 AM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Added location.authorlocationcode mapping for mergeability with DwC
4059	08/15/2012 08:13 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: location.authorlocationcode mappings: Added /_alt/3 for mergeability with VegCSV mappings to same field
4058	08/15/2012 08:05 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Wrapped all mappings in direct voucher _if for mergeability with VegCSV
4057	08/15/2012 07:57 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Moved direct/indirect voucher _if inwards to wrap just the value of catalognumber_dwc, not the catalognumber_dwc field node, to match the corresponding VegCSV mapping
4056	08/15/2012 07:48 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Replaced _alt with _merge where applicable to avoid losing source data on import when multiple fields collide
4055	08/15/2012 07:46 AM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: Cleaned up using `make mappings/`
4054	08/15/2012 07:18 AM	Aaron Marcuse-Kubitza	schemas/functions.sql: join_strs_transform(): Use STRICT optimization to avoid needing to manually check if the state value or input value is NULL (http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html#AEN51596)
4053	08/15/2012 07:15 AM	Aaron Marcuse-Kubitza	schemas/functions.sql: join_strs(), join_strs_transform(): Reversed order of params to enable strict optimization, which replaces the state value with the first parameter, which used to be the delimiter (http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html#AEN51596)
4052	08/15/2012 07:07 AM	Aaron Marcuse-Kubitza	Renamed join_strs_transform_preserve_empty() to join_strs_transform() now that there are no other join_strs_transform_...() functions
4051	08/15/2012 07:06 AM	Aaron Marcuse-Kubitza	schemas/functions.sql: Removed no longer used join_strs_transform_fold_empty()
4050	08/15/2012 07:06 AM	Aaron Marcuse-Kubitza	schemas/functions.sql: join_strs() aggregate: Use join_strs_transform_preserve_empty() as an optimization because all our data has already had '' replaced with NULL by sql_io.cleanup_table() in csv2db. This will help speed up _merges now that they are performed on a large scale in the slowest datasource, SpeciesLink.
4049	08/15/2012 07:02 AM	Aaron Marcuse-Kubitza	bin/map: collision_suffix: Changed to use _merge instead of _alt to avoid losing source data on import when multiple fields collide
4048	08/15/2012 06:58 AM	Aaron Marcuse-Kubitza	bin/map: Preventing collisions if multiple inputs mapping to same output: Made collision suffix configurable so it can easily be changed
4047	08/15/2012 06:56 AM	Aaron Marcuse-Kubitza	bin/map: Preventing collisions if multiple inputs mapping to same output: Made collision suffix configurable so it can easily be changed
4046	08/15/2012 06:52 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence.sourceaccessioncode mappings: Added catalogNumber mapping, which takes precedence over recordNumber and is applicable to specimens data and direct vouchers. recordNumber should only be used as a last resort (before the taxon name) because this is collector-assigned and often not unique within anything.
4045	08/15/2012 06:34 AM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Moved direct/indirect voucher _ifs inwards to wrap just the value of catalognumber_dwc, not the catalognumber_dwc field node, so that a future SQL function implementation of _if only needs to concern itself with returning one value or another, not with handling XML subtrees. The previous moving of the _ifs in r3942 was intended to effect this, but the _ifs weren't moved in far enough to wrap just the value.
4044	08/15/2012 06:21 AM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: eventDate mappings: Removed collectiondate mapping because the eventDate refers only to the plot event. Added /_alt suffixes for mergeability with DwC.
4043	08/15/2012 06:15 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv, DwC1-DwC2.specimens.csv: Split eventDate into eventDate and dateCollected, where eventDate refers only to the date of the sampling event, but dateCollected also refers to the date the particular specimen was collected. (This distinction is important in merging with VegCSV, because in plots data, these two fields are distinct.) Remapped datasources with dateCollected-related fields to new dateCollected.
4042	08/15/2012 05:55 AM	Aaron Marcuse-Kubitza	bin/map: Run new xml_func.simplify() on the root before printing the put template, so that _alts and _merges with only one element for the current datasource will be printed in their simplified form (with the _alt/_merge removed). This facilitates automated testing after an _alt/_merge suffix has been added, because the put template provided as part of the automated test will only change for those datasources that actually have an entry for both mappings, which greatly reduces the number of tests that need to be accepted.
4041	08/15/2012 05:51 AM	Aaron Marcuse-Kubitza	xml_func.py: Added simplify()
4040	08/15/2012 05:45 AM	Aaron Marcuse-Kubitza	xpath.py: put_obj(): Use new get_values(), so that the returned nodes are not modified by XML tree transformations, such as those performed by xml_func.process()
4039	08/15/2012 05:43 AM	Aaron Marcuse-Kubitza	Added get_values()
4038	08/15/2012 05:41 AM	Aaron Marcuse-Kubitza	xml_dom.py: is_empty(): Treat whitespace-only text nodes (including text nodes containing empty strings) as empty. This will also support None equivalents in text nodes, because they are isspace_none_str, which is considered whitespace.
4037	08/15/2012 05:36 AM	Aaron Marcuse-Kubitza	xml_func.py: _map(): Don't remove None params, because they are valid values and must be supported. This will become an issue once empty strings in text nodes are considered equivalent to None.
4036	08/15/2012 05:33 AM	Aaron Marcuse-Kubitza	xml_func.py: _units(): Don't remove None params, because they are valid values and must be supported. This will become an issue once empty strings in text nodes are considered equivalent to None.
4035	08/15/2012 05:25 AM	Aaron Marcuse-Kubitza	xml_func.py: _name(): Fixed bug where None values needed to be passed through, and the case of no name parts handled, to properly support NULL propagation
4034	08/15/2012 05:08 AM	Aaron Marcuse-Kubitza	xml_dom.py: value(), set_value(): Use new strings.isspace_none_str as sentinel None equivalent, to support cloning text nodes containing a sentinel None
4033	08/15/2012 05:06 AM	Aaron Marcuse-Kubitza	xml_dom.py: value(), set_value(): Use new strings.isspace_none_str as sentinel None equivalent, to support cloning text nodes containing a sentinel None
4032	08/15/2012 05:04 AM	Aaron Marcuse-Kubitza	strings.py: Added isspace_none_str to support clone-safe sentinel str values that pass isspace()
4031	08/15/2012 04:51 AM	Aaron Marcuse-Kubitza	xml_dom.py: is_whitespace(): Also consider empty text nodes to be whitespace
4030	08/15/2012 04:47 AM	Aaron Marcuse-Kubitza	xml_dom.py: is_whitespace(): Support text nodes whose value() is None by using .nodeValue instead
4029	08/15/2012 04:44 AM	Aaron Marcuse-Kubitza	xml_dom.py: set_value(): Don't set the value of a text node to None by removing it, because this prevents the node from being reused. Instead use a sentinel string value to denote None, and map to and from it.
4028	08/15/2012 04:40 AM	Aaron Marcuse-Kubitza	strings.py: Added none_str and helper class NonInternedStr to support sentinel str values
4027	08/15/2012 04:19 AM	Aaron Marcuse-Kubitza	xml_dom.py: set_value(): Support setting the value of a text node to None, by removing it
4026	08/15/2012 03:44 AM	Aaron Marcuse-Kubitza	Removed trailing whitespace on non-empty lines
4025	08/15/2012 03:40 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): DuplicateKeyException: is_literals: Fixed bug where sql.select() needed to select on just the join_cols, not the whole mapping
4024	08/15/2012 03:14 AM	Aaron Marcuse-Kubitza	xml_func.py: process(): Removed support for no longer used structural functions
4023	08/15/2012 03:13 AM	Aaron Marcuse-Kubitza	xml_func.py: Removed no longer used structural functions
4022	08/15/2012 03:05 AM	Aaron Marcuse-Kubitza	mappings/for_review/DwC2-VegBIEN.specimens.fields.csv: input root: Removed DwC XML path info since DwC is now a CSV schema
4021	08/15/2012 02:57 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: eventDate: Also map to obsstartdate/obsenddate, since the collectiondate is also the event date for specimens data, and for mergeability with VegCSV
4020	08/15/2012 02:24 AM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: eventDate: Added mappings to obsstartdate/obsenddate, since users of this field (currently SALVIAS census_date) intend it as the plot event's date. Keep the mapping to collectiondate because a non-range plot event date is also the collectiondate of all organisms in that plot event.
4019	08/15/2012 02:05 AM	Aaron Marcuse-Kubitza	schemas/py_functions.sql: parse_date_range(): Always return a value for end date, even if string is not a date range. This enables using _dateRangeEnd() as a filter function on anything intended as an end date.
4018	08/15/2012 01:53 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: eventDate: collectiondate mapping: Removed _dateRangeStart filter because the eventDate (obsstartdate) is only valid as the date the specimen was collected if it is a single date, not a date range. (It is still valid as the obsstartdate/obsenddate if it's a range.)
4017	08/15/2012 01:49 AM	Aaron Marcuse-Kubitza	mappings/Veg+.terms.csv: Added dateCollected
4016	08/15/2012 12:45 AM	Aaron Marcuse-Kubitza	input via maps: Removed _date/date filter from date fields because the main mappings now have _date around all dates, so this filter is redundant
4015	08/15/2012 12:39 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS-CSV/maps/VegCSV.organisms.csv: census_date: Don't map directly to the year, as this field is allowed to be a full date even though our data sample contains only years. Note that _date/date will automatically detect plain years and treat them as years, and so will casts to timestamp.
4014	08/15/2012 12:33 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS*/maps/VegCSV.organisms.csv: census_date: Documented that this is for the subplot, not the organism, as all organisms in a subplot have the same value for it
4013	08/15/2012 12:09 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: verbatimLatitude/verbatimLongitude: Fixed mappings to use _alt/2 instead of _alt/1 to avoid collisions with decimalLatitude/decimalLongitude
4012	08/14/2012 11:54 PM	Aaron Marcuse-Kubitza	schemas/functions.sql: _merge(): Changed sort_orders to match the $-variable name instead of the function parameter name, so each line of the VALUES clause would use the same number for both
4011	08/14/2012 11:52 PM	Aaron Marcuse-Kubitza	schemas/functions.sql: _merge(): Filter out NULL values as optimization so DISTINCT ON only has to consider non-NULL values
4010	08/14/2012 11:48 PM	Aaron Marcuse-Kubitza	schemas/functions.sql: join_strs(): Return NULL if all strings were NULL or ''. This fixes unexpected behavior in _merge() where all elements are NULL but the return value is non-NULL.
4009	08/14/2012 11:32 PM	Aaron Marcuse-Kubitza	schemas/functions.sql: Added join_strs_transform_preserve_empty() and use it in join_strs_transform_fold_empty()
4008	08/14/2012 11:25 PM	Aaron Marcuse-Kubitza	schemas/functions.sql: Renamed join_strs_() to join_strs_transform_fold_empty() for clarity and to indicate that it's for use by the join_strs() aggregate
4007	08/14/2012 11:11 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: recordNumber: Added VegCSV mappings for it
4006	08/14/2012 10:51 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: occurrenceID: Added VegCSV mappings for it
4005	08/14/2012 10:44 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: mappings to /location/sourceaccessioncode: Added _alt to prioritize them properly
4004	08/14/2012 10:39 PM	Aaron Marcuse-Kubitza	inputs/UNCC/maps/DwC.specimens.csv: herbarium: Fixed mapping to go to institutionCode instead of collectionCode
4003	08/14/2012 10:36 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Remapped institutionCode/collectionCode/catalogNumber location mappings to location.authorlocationcode
4002	08/14/2012 09:50 PM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: Reset methodtaxonclass lines so that only one needs to be repositioned after syncing with the schema
4001	08/14/2012 09:31 PM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: locationID: Removed mapping to locationevent.sourceaccessioncode, because locationID relates to the plot, not the plot event. (The locationevent is scoped by the location when the sourceaccessioncode and authoreventcode are not specified, so duplicate elimination will still occur correctly.)
4000	08/14/2012 09:27 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Mapped locationID, for mergeability with VegCSV
3999	08/14/2012 09:04 PM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: plotName: Removed authoreventcode mapping because plotName relates to the plot, not the plot event. (The locationevent is scoped by the location when the authoreventcode is not specified, so duplicate elimination will still occur correctly.) Instead map only authoreventcode-related fields (currently CVS's authorObsCode) to authoreventcode, via DwC's (confusingly-named) fieldNumber ("An identifier given to the event in the field").
3998	08/14/2012 08:40 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: locationevent: locationevent_unique_within_location: Added authoreventcode to index. It was already in the locationevent_unique_within_parent_by_authoreventcode index, but also needed to be in the no-parent (non-subplot) index. This fixes locationevent duplicate elimination when a locationevent sourceaccessioncode is not specified.
3997	08/14/2012 08:27 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: location: location_unique_within_datasource unique index: Added COALESCE and `WHERE sourceaccessioncode IS NOT NULL` now that sourceaccessioncode is nullable. Renamed location_unique_within_datasource and location_unique_authorlocationcode to location_unique_within_datasource_by_... to show that both are alternatives for globally unique keys. schemas/vegbien.ERD.mwb: Moved elements slightly to reduce the number of lines that need to be repositioned after syncing with the schema.
3996	08/14/2012 07:35 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Mapped verbatimElevation and samplingProtocol, for mergeability with VegCSV
3995	08/14/2012 07:12 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated with stats from latest import
3994	08/13/2012 06:12 PM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: location unique keys: Map to a new parent location for the location, instead of a parent locationevent for the locationevent. This much simpler mapping (which does not require _alt or _merge) is possible now that the necessary unique indexes have been set up.
3993	08/13/2012 05:52 PM	Aaron Marcuse-Kubitza	Regenerated vegbien.ERD exports, now including both pages in vegbien.ERD.core.pdf. Renamed schemas/vegbien.ERD.core.pdf to vegbien.ERD.pdf because it now includes the full schema.
3992	08/13/2012 05:48 PM	Aaron Marcuse-Kubitza	schemas/filter_ERD.csv: Removed extraneous lines to improve readability. schemas/vegbien.ERD.mwb: Reconfigured elements to put only the most important ones in the core subset (the top page).
3991	08/13/2012 03:59 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: location: Made sourceaccessioncode optional if authorlocationcode is specified, since either of these fields can now serve as the unique key
3990	08/13/2012 03:39 PM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: Map to new location.authorlocationcode
3989	08/13/2012 03:23 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: location: Support uniquely specifying a location by its authorlocationcode
3988	08/13/2012 03:13 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: location: Added authorlocationcode to unique indexes
3987	08/13/2012 02:58 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: location: Added authorlocationcode
3986	08/13/2012 02:45 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: location: Added location_unique_within_parent_by_coords unique index that uses COALESCE, replacing location_unique_subplot_coords unique constraint
3985	08/13/2012 02:07 PM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: maximumElevationInMeters: Fixed bug where _rangeEnd filter needed to be removed because this only works on a field which can be either a range or the start of a range, such as minimumElevationInMeters (on an end-of-range field, a single value will be removed completely). Added _alt for mergeability with DwC. minimumElevationInMeters: Added elevationrange-to mapping using _rangeEnd for mergeability with DwC.
3984	08/13/2012 01:53 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: minimum/maximumElevationInMeters, minimum/maximumDepthInMeters: Remove any "ca." prefix from value. Doing this on all elevation/depth fields will make the DwC and VegCSV mappings mergeable.
3983	08/13/2012 01:04 PM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: locality: Mapped using same XPath as DwC, to enable merging
3982	08/13/2012 01:01 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Mapped individualCount. This will enable merging with VegCSV.
3981	08/13/2012 12:51 PM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: Cleaned up. This still needs to be run manually with `make mappings/` because the derived maps are symlinks rather than make targets, so make never touches the non-derived map and doesn't run its recipe in the automated tests
3980	08/13/2012 12:48 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxondetermination mappings: Removed iscurrent=true because it is not the role of the mappings to specify which taxondetermination is the current one. Eventually, the order of the determinations will need to be specified using a sort # or similar, and the DB will select the current one for queries to use. Ensure all mappings have :[isoriginal=true] so that they match up between DwC and VegCSV.
3979	08/13/2012 12:35 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxondetermination mappings: Ensure all mappings have :[iscurrent=true] or equivalent so that they sort together, and match up between DwC and VegCSV
3978	08/13/2012 12:19 PM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: individualCount: Disambiguated alternate meaning as stem count by changing stem count fields to map to new stemCount term, which maps to plantobservation.stemcount
3977	08/13/2012 12:12 PM	Aaron Marcuse-Kubitza	mappings/Veg+.terms.csv: Added stemCount
3976	08/13/2012 12:10 PM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: Cleaned up
3975	08/13/2012 12:01 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Mapped identificationQualifier. This will enable merging with VegCSV.
3974	08/13/2012 11:59 AM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: identificationQualifier (taxon fit): Removed mapping to prefix of binomial field, since that field should just contain what the datasource said was the binomial. It's TNRS's job to concatenate the taxon fit, etc. with the binomial and other name parts for name resolution.
3973	08/13/2012 11:27 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: fieldNumber: Remapped to authoreventcode because this is (confusingly) the author code for the event, according to the DwC definition
3972	08/13/2012 11:22 AM	Aaron Marcuse-Kubitza	inputs/NY, ARIZ: FieldNumber: Remapped to recordNumber because term usage was inconsistent with DwC definition. Datasources sometimes confuse this term, because it seems like the collection number, but is actually the author code for the event (VegBank's authorObsCode).
3971	08/13/2012 11:20 AM	Aaron Marcuse-Kubitza	schemas/vegbank.ERD.pdf: Restored to VegBank ERD, which had gotten overwritten when the vegbien.ERD exports were regenerated
3970	08/13/2012 10:58 AM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Removed Source column and source-related comments because this information is now maintained in mappings/Veg+.terms.csv
3969	08/13/2012 10:55 AM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Removed Source column because this information is now maintained in mappings/Veg+.terms.csv
3968	08/13/2012 10:49 AM	Aaron Marcuse-Kubitza	mappings/VegCSV-VegBIEN.specimens.csv: Removed Source column and source-related comments because this information is now maintained in mappings/Veg+.terms.csv
3967	08/13/2012 10:44 AM	Aaron Marcuse-Kubitza	Added mappings/Veg+.terms.csv, which will serve the purpose of listing all available terms with their source. This will remove the need to store the sources in the mappings, where they are out of place and difficult to maintain during refactoring.
3966	08/13/2012 10:37 AM	Aaron Marcuse-Kubitza	Added mappings/Veg+.terms.csv, which will serve the purpose of listing all available terms with their source. This will remove the need to store the sources in the mappings, where they are out of place and difficult to maintain during refactoring.
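
The join_strs() changes in r4010 and r4053-r4054 rely on how PostgreSQL treats a strict transition function whose initial state is NULL: rows with a NULL input are skipped, the first non-NULL value becomes the state, and the transition function only runs on later non-NULL rows. The following is a minimal Python sketch of that behavior, not the SQL in schemas/functions.sql. The function bodies and the (state, value, delim) argument order are assumptions inferred from the commit messages above.

```python
from typing import Iterable, Optional


def join_strs_transform(state: str, value: str, delim: str) -> str:
    """Hypothetical transition step: append delim and value to the state.

    Empty strings are preserved rather than folded away, matching the
    'preserve_empty' variant described in r4050.
    """
    return state + delim + value


def join_strs(values: Iterable[Optional[str]], delim: Optional[str]) -> Optional[str]:
    """Emulate a strict aggregate with a NULL (None) initial state."""
    state: Optional[str] = None
    for value in values:
        # Strict semantics: a row with any NULL input is ignored entirely.
        if value is None or delim is None:
            continue
        if state is None:
            # The first all-non-NULL row seeds the state with its first
            # argument, which is why the value must precede the delimiter.
            state = value
        else:
            state = join_strs_transform(state, value, delim)
    # If every row was skipped, the result stays NULL (compare r4010).
    return state


if __name__ == "__main__":
    print(join_strs(["a", None, "b"], "; "))
    print(join_strs([None, None], "; "))
```

With the delimiter as the first argument, the seeding step would have stored the delimiter as the initial state, which is the problem r4053 describes.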
