# Date Author Comment
11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

11396 10/21/2013 07:14 PM Aaron Marcuse-Kubitza

fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error

11249 10/10/2013 06:50 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/observation_/header.csv, map.csv: updated for refresh, which inserts hasobservationsynonym at the end of the observation table

11231 10/10/2013 07:54 AM Aaron Marcuse-Kubitza

inputs/VegBank/observation_/postprocess.sql: added __parent index on locationID to facilitate the LEFT JOINs used to create the validation input

11107 09/29/2013 08:58 PM Aaron Marcuse-Kubitza

bugfix: mappings/VegCore-VegBIEN.csv: nest all taxonoccurrences inside a stratum event, so that the parent locationevent is always fully populated before child locationevents point to it. (previously, a stub parent event was created when the child event was imported first, which blocked the fully-populated parent event from being inserted later on.) this uses auto-folding (for VegBank/CVS) and auto-forwarding (for other datasources) to prune empty stratum events for taxonoccurrences that don't have strata. (see wiki.vegpath.org/Auto-folding, wiki.vegpath.org/Auto-forwarding for more info about these normalization techniques.) note that the inserted row counts stay exactly the same for all datasources except VegBank (which was being fixed), indicating that this significant change to the mappings did not change the semantics of the import of taxonoccurrences.

11013 09/19/2013 02:55 AM Aaron Marcuse-Kubitza

inputs//: don't import joined tables, because they are now imported in the taxon_observation.** left-join instead

10944 09/12/2013 06:43 PM Aaron Marcuse-Kubitza

inputs/VegBank/: prepended the table name to each column name to prevent column collisions, using the steps at http://wiki.vegpath.org/Left-joining_a_datasource
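
As a rough illustration of why prefixing prevents collisions when tables are left-joined, here is a minimal Python sketch. The helper name `prefix_columns`, the `.` separator, and the column names are all assumptions made for the example; the actual steps are the ones on the wiki page cited above.

```python
def prefix_columns(table, columns, sep="."):
    """Return the column names qualified with their table name.

    The separator is an assumption for illustration only.
    """
    return [f"{table}{sep}{col}" for col in columns]


# Hypothetical columns: both tables have a "notes" column, which would
# collide in the header of a joined table.
observation_cols = prefix_columns("observation", ["observation_id", "notes"])
soilobs_cols = prefix_columns("soilobs", ["soilobs_id", "notes"])

# After prefixing, every name in the joined header is unique.
joined_header = observation_cols + soilobs_cols
```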

10943 09/12/2013 06:17 PM Aaron Marcuse-Kubitza

inputs/VegBank/: switched to new-style import, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource

10866 09/04/2013 11:06 PM Aaron Marcuse-Kubitza

inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix

10697 08/20/2013 08:49 PM Aaron Marcuse-Kubitza

inputs/VegBank/observation_/test.xml.ref: updated inserted row count

10674 08/18/2013 09:12 PM Aaron Marcuse-Kubitza

inputs/VegBank/observation_/postprocess.sql: added pkey

10627 08/08/2013 01:26 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/observation_/create.sql: ensure only one row per observation by selecting the first soilobs for each observation
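
The idea of keeping only the first soilobs per observation can be sketched in Python as follows. This is not the SQL in create.sql; the function name `first_per_key` and the sample rows are hypothetical, and "first" here simply means first in input order.

```python
def first_per_key(rows, key):
    """Keep only the first row seen for each distinct value of `key`."""
    first = {}
    for row in rows:
        # setdefault() stores the row only if the key has not been seen yet
        first.setdefault(row[key], row)
    return list(first.values())


# Hypothetical joined rows: observation 1 has two soilobs records.
rows = [
    {"observation_id": 1, "soilobs_id": 10},
    {"observation_id": 1, "soilobs_id": 11},
    {"observation_id": 2, "soilobs_id": 12},
]
one_per_observation = first_per_key(rows, "observation_id")
```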

10168 07/06/2013 02:22 PM Aaron Marcuse-Kubitza

inputs/*/*/logs: updated svn:ignore

8206 03/27/2013 08:23 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: subplotID,subplot -> location.sourceaccessioncode: Fixed bug where /_first is needed to handle the case where both subplotID and subplot are provided

8176 03/25/2013 09:01 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)
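
The mixed line endings described above can be reproduced with Python's standard csv module, whose writer terminates rows with \r\n by default but leaves a \n inside a cell untouched. The sketch below is only an illustration: `normalize_line_endings` is a hypothetical stand-in, not the project's fix_line_endings script.

```python
import csv
import io


def normalize_line_endings(text):
    """Convert CRLF and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Write one row containing a multiline cell. The row ends in \r\n while the
# cell-internal line break stays \n, so the output has mixed line endings.
buf = io.StringIO()
csv.writer(buf).writerow(["term", "line one\nline two"])
mixed = buf.getvalue()

# After normalization every line ending is \n.
normalized = normalize_line_endings(mixed)
```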

7469 02/05/2013 04:32 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv, inputs/*/*/map.csv: Applied term renamings from the new dynamically generated Veg+-VegCore.csv, which reflects the current state of the data dictionary. (Permanently switching to the new Veg+-VegCore.csv will be a separate change.) Updates to VegCore term names that have occurred since the data dictionary was created are now able to take effect, which involves remapping and inferring units on several fields.

7464 02/05/2013 03:40 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: locationID->location.sourceaccessioncode: Removed restriction that this mapping can't occur if geovalidation information is present. The locationID is no longer mapped to the place.sourceaccessioncode, so this filter is not necessary.

7463 02/05/2013 03:38 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Regenerated from wiki

7404 01/31/2013 04:01 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Regenerated from wiki

7215 01/14/2013 01:18 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: fieldNumber (authorEventCode): Fixed bug where locationevent.authorlocationcode should be authoreventcode

7009 12/21/2012 12:07 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: locationID/locationName + subplot -> location.sourceaccessioncode mapping: Fixed bug where subplot was incorrectly being mapped to this field even when there was no location*. (This field can only be populated if both location* and subplot are specified.) Also only map locationID for this, to avoid inconsistencies where one table supplies locationID+subplot, while another table supplies locationName+subplot, but they both get mapped to the same field, preventing plots from being matched up with their observations when creating the analytical_stem.

6992 12/20/2012 02:26 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: authortaxoncode mappings: Only use authorTaxonCode if there is no plant ID, because an individual plant gets its own taxonoccurrence and thus needs the taxonoccurrence's IDs to be unique to the plant, regardless of what the author designates as the taxonoccurrence code

6989 12/20/2012 01:23 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped authorTaxonCode

6482 11/28/2012 05:52 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Renamed soilobs to soilsample per working group discussion

6406 11/24/2012 07:50 AM Aaron Marcuse-Kubitza

db_xml.py: put(): _setDefault(): Support setting multiple col_defaults at once by using the param names themselves as the column names

6403 11/24/2012 07:29 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Set the source_id col_default to the datasource name using the new _setDefault() built-in function and _env()

6294 11/19/2012 04:09 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped acceptedCounty, county to the matched place

6217 11/15/2012 08:26 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Removed _date/date, because _date using a string date argument is no longer supported under plpython3u (dateutil is missing). Note that PostgreSQL's own date parsing is sufficient for most dates, so this use of _date is not strictly necessary and removing it will improve import times.

6002 11/05/2012 08:48 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: subplot locationevent: Only populate parent locationevent's location unique IDs if a subplot #/subplotID is actually specified. (The lack of a location unique ID will cause the parent locationevent's location to be removed, as well as the parent locationevent itself if there is no parent locationevent unique ID.) This fixes a bug where top-level plots in datasources that provide a nullable subplot #/subplotID were incorrectly getting connected to parent locationevents.

5977 11/02/2012 05:18 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: subplots: Also complete the locationevent/location diamond (subplot event -> {subplot location, parent plot event} -> parent plot location) when an eventDate or range is specified, as this is also an identifying field for locationevent. This fixes a bug where subplots data without explicit plot events (such as SALVIAS and TEAM) was not being connected to the appropriate parent plot event as well as parent plot location. This should fix the SALVIAS verification # location events, which should include only parent plots' locationevents to correspond with # locations, which only includes parent plots' locations, and uses locationevent.parent_id being NULL to determine what is a parent plot event.

5905 11/01/2012 01:54 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Map locationID to place.placecode instead when geovalidation columns are provided

5773 10/25/2012 10:36 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: location: Populate sourceaccessioncode with locationID + subplot when subplot is unique only within the parent plot, so that location always has a sourceaccessioncode to use as the plotCode in analytical_db_view

5176 10/02/2012 11:37 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode: Only populate if needed to distinguish the taxonoccurrence within a plot

5031 09/27/2012 12:33 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Don't add a row number column to the created table because it is now added automatically to the temp table by column-based import (row-based import now also does not require a pkey for DB inputs)

4987 09/25/2012 07:22 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Removed unnecessary /_first/# suffix for multiple terms in the same _exists expression, because _exists() only checks whether its node is non-empty, and it does not matter how many child nodes it contains

4979 09/25/2012 04:52 PM Aaron Marcuse-Kubitza

inputs/*/*/map.csv: Prefix a * to every term that's not in Veg+ for easy identification of unmapped terms when editing map.csv. Note that canon will remove the * when it finds a matching Veg+ term.

4911 09/21/2012 07:48 AM Aaron Marcuse-Kubitza

inputs/VegBank/observation_/map.csv: soilObs fields: Cited data dictionary source of units

4908 09/21/2012 07:03 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Soil component measurements: Added default units of percent (cmol_kg for cationExchangeCapacity). This involves translating the names everywhere and adding a _percent_to_fraction conversion in mappings/VegCore-VegBIEN.csv.
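
A percent-to-fraction conversion of the kind named above amounts to dividing by 100. The Python sketch below is an assumption about the arithmetic only, including the pass-through of missing values; it is not the implementation of _percent_to_fraction in the import code.

```python
def percent_to_fraction(value):
    """Convert a percentage (0-100) to a fraction (0-1); pass None through."""
    if value is None:
        return None  # assumed: a missing measurement stays missing
    return value / 100


half = percent_to_fraction(50)
missing = percent_to_fraction(None)
```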

4857 09/19/2012 09:31 PM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: %/new_terms.csv: Include the entire map spreadsheet row, so that each new term is listed together with its mapping. This facilitates adding new mappings to mappings/Veg+-VegCore.csv directly from any new_terms.csv. Note that the use of `sort -u` (in lib/mappings.Makefile) causes multiline comments to be separated, leading to spurious lines for each multiline comment line.
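
The effect of a line-based unique sort on multiline cells can be imitated in Python. This sketch uses made-up CSV content and plain code-point ordering (the real `sort -u` collates according to the locale), and contrasts it with a row-aware sort via the csv module.

```python
import csv
import io

# Made-up CSV with one multiline comment cell
csv_text = 'termB,"first line\nsecond line"\ntermA,short\n'

# Line-based unique sort: each physical line is treated as a record, so the
# two halves of the multiline cell are separated from each other.
line_sorted = sorted(set(csv_text.splitlines()))

# Row-aware unique sort: parse the CSV first, so the multiline cell stays
# inside its row.
row_sorted = sorted(set(tuple(row) for row in csv.reader(io.StringIO(csv_text))))
```

In the line-based result, `second line"` appears as a spurious entry of its own, which is the behavior the comment above describes.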

4833 09/19/2012 04:16 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode alternatives: Use _first instead of _alt because when one of these fields is present, it can be used directly even if it's sometimes NULL, without needing to spend a lot of time _alting together fields that won't be used. Datasources where the authortaxoncode is sometimes NULL usually have a separate sourceaccessioncode for the taxonoccurrence. (In the rare case that they don't, they should map a non-NULL field to recordNumber or tag to ensure that taxonoccurrences can be uniquely identified.)

4832 09/19/2012 04:07 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped tag to taxonoccurrence.authortaxoncode when the record is an organism, in case there is no other ID for the taxonoccurrence. This fixes a bug in FIA and TEAM data where all organisms in a plot used the same taxonoccurrence because taxonoccurrence was not properly constrained, causing the loss of individual taxondeterminations on each organism.

4786 09/18/2012 03:58 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Changed _frac units suffix to _fraction for clarity and for consistency with _percent (which is spelled out), as used by SALVIAS (http://salvias.net/Documents/salvias_data_dictionary.html) and elsewhere

4754 09/17/2012 02:29 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added units suffix to additional VegBIEN fields that have units

4679 09/14/2012 05:59 PM Aaron Marcuse-Kubitza

inputs/*/*/map.csv: Changed output column header from Veg+ to VegCore because the names will be VegCore names after automapping. This is possible now that we're using new automapping scripts that do not require a particular column header.

4663 09/12/2012 05:13 PM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: $(newTerms): Fixed bug where header needed to be removed before running filter_out_ci because filter_out_ci only removes the header if it matches the vocabulary's header. Removing the header afterward can cause the first row to be removed instead if the header was already removed.

4656 09/12/2012 03:37 PM Aaron Marcuse-Kubitza

inputs/*/*/map.csv: Added Filter column to contain any suffix added after the term, so that the automapping mechanism does not have to deal with the filter expressions

4651 09/12/2012 02:18 PM Aaron Marcuse-Kubitza

inputs/*/*/map.csv: Removed no longer needed [Veg+] suffix in root, because the input column is no longer used by old-style map utilities such as union that needed this

4648 09/12/2012 01:57 PM Aaron Marcuse-Kubitza

filter_out_ci: Filter header instead of passing it through, in order to properly support CSVs without a header, such as the unmapped_terms.csv and new_terms.csv files. For CSVs with a header, the header of the vocabulary should be removed before passing it to filter_out_ci.
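
The header handling described above can be pictured with a small Python sketch. It assumes that filter_out_ci drops rows whose first column matches a vocabulary entry case-insensitively; that reading of the name, and all sample data, are assumptions rather than the script's documented behavior.

```python
def filter_out_ci(rows, vocabulary):
    """Drop rows whose first column matches a vocabulary entry, ignoring case."""
    known = {term.casefold() for term in vocabulary}
    return [row for row in rows if row[0].casefold() not in known]


# Hypothetical vocabulary file contents, with its header as the first entry
vocabulary = ["VegCore", "locationID", "subplot"]

# Headerless terms list: every row is treated the same way, and only the
# term that is not in the vocabulary survives.
terms = [["LocationID"], ["newTerm"]]
new_terms = filter_out_ci(terms, vocabulary)

# Input whose header differs from the vocabulary's header: the header is not
# filtered out, so it would have to be removed before filtering.
with_header = [["Veg+"], ["subplot"], ["newTerm"]]
unfiltered_header = filter_out_ci(with_header, vocabulary)
```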

4645 09/12/2012 01:30 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Removed no longer used %/src.csv, because it is no longer needed to generate map.full.csv from map.csv

4642 09/12/2012 01:02 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Removed no longer used %/map.full.csv

4640 09/12/2012 12:56 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: %/map.full.csv: Generate by copying map.csv, because the content of these files now differs only in the sort order of the names

4638 09/12/2012 12:43 PM Aaron Marcuse-Kubitza

inputs/*/*/map.csv: Changed empty mappings to self mappings, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Change-empty-mappings-to-self-mappings>. Note that in map.full.csv and VegBIEN.csv, lines that have changed are always the result of the input field's case being changed to match the case of the datasource's actual column name.

4636 09/12/2012 12:14 PM Aaron Marcuse-Kubitza

inputs/*/*/map.csv: Added back automapped mappings to map.csv, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Add-back-automapped-mappings-to-mapcsv>

4621 09/12/2012 07:56 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Added /_simplifyPath:[next=parent_id]/path to root so the returned subplot location will be its parent location if there is no subplot name or ID (indicating that that particular plot did not have subplots). Note that this also causes the parent_id forwarding effect to occur for all other tables containing parent_id, which will help prevent similar issues with subplot events, etc. This will hopefully fix the SALVIAS.plotObservations bug where some organisms did not have a subplot #, causing the subplot location to become NULL and causing the corresponding locationevent rows not to match the locationevent_unique_within_location index filter condition (which requires a parent_id), which caused multiple output table pkeys to be returned for those rows, violating the locationevent_pkeys temp table's primary key.

4617 09/11/2012 11:01 AM Aaron Marcuse-Kubitza

Regenerated/modified inputs/*/*/src.csv to use the self-mapping format used by the new automapping mechanism

4596 09/11/2012 08:22 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: %/.map.csv.last_cleanup: $(newTerms): Remove the CSV header from the terms lists so that multiple terms lists can easily be appended together

4594 09/11/2012 08:09 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: %/.map.csv.last_cleanup: Generate reports on new and unmapped terms in map.csv

4563 09/11/2012 01:23 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: startDate, endDate mappings: Removed _dateRangeStart/_dateRangeEnd filters because these are assumed to already be start and end dates of a range. (eventDate should be used for concatenated date ranges.)

4517 09/07/2012 10:43 AM Aaron Marcuse-Kubitza

inputs/VegBank/: Added observation_/