/ - Changes - BIEN 3 - NCEAS Projects

root @ 4905

#	Date	Author	Comment
4905	09/20/2012 11:06 PM	Aaron Marcuse-Kubitza	inputs/*/*/map.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees
4904	09/20/2012 10:54 PM	Aaron Marcuse-Kubitza	inputs/SpeciesLink/Specimen/map.csv: Documented that dwc_geospatial_VerbatimLatitude/Longitude contain a mix of DMS and other verbatim coordinates
4903	09/20/2012 10:47 PM	Aaron Marcuse-Kubitza	inputs/QMOR/Specimen/map.csv: Remapped verbatimLatitude/verbatimLongitude to latitude_DMS/longitude_DMS since these fields contain DMS values
4902	09/20/2012 10:43 PM	Aaron Marcuse-Kubitza	inputs/Madidi/Plot/map.csv: Remapped Latitude/Longitude (DMS) to new latitude_DMS/longitude_DMS
4901	09/20/2012 10:41 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped latitude_DMS, longitude_DMS
4900	09/20/2012 10:38 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added latitude_DMS, longitude_DMS
4899	09/20/2012 10:34 PM	Aaron Marcuse-Kubitza	inputs/REMIB/Specimen/map.csv: Remapped lat_deg/long_deg to decimalLatitude/Longitude because these values are (integer) degrees suitable for decimalLatitude/Longitude. Note that the other DMS fields are not yet translated to decimal degrees.
4898	09/20/2012 10:28 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees
4897	09/20/2012 10:26 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".
4896	09/20/2012 10:23 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Don't automatically regenerate the aggregated unmapped_terms.csv, new_terms.csv because this almost doubles the remake time when a mappings/ prerequisite changes (41s -> 75s)
4895	09/20/2012 10:14 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".
4894	09/20/2012 10:09 PM	Aaron Marcuse-Kubitza	inputs/GBIF/Specimen/map.csv: Remapped VerbatimLatitude/Longitude to decimalLatitude/Longitude because DecimalLatitude/Longitude just contain VerbatimLatitude/Longitude cast to a low-resolution float, which created spurious repeating decimals
4893	09/20/2012 09:56 PM	Aaron Marcuse-Kubitza	mappings/Makefile: .VegCore-VegBIEN.csv.last_cleanup: Generate VegCore-VegBIEN.unsourced_terms.csv whenever VegCore-VegBIEN.csv changes, to track VegCore terms that are mapped to VegBIEN but not documented in VegCore.csv. Note that this file is not svn:ignored, so it will show up with a ? when the user runs `svn st` if there are any unsourced terms.
4892	09/20/2012 09:47 PM	Aaron Marcuse-Kubitza	mappings/Makefile: Changed catch-all `.%.last_cleanup: %` target to a specific target for VegCore-VegBIEN.csv, because it's the only file that uses this target
4891	09/20/2012 09:45 PM	Aaron Marcuse-Kubitza	mappings/: Don't generate a for_review version of Veg+-VegCore.csv, because it is identical to the machine-readable Veg+-VegCore.csv (there are no output XPaths to simplify)
4890	09/20/2012 09:41 PM	Aaron Marcuse-Kubitza	mappings/: Don't generate a for_review version of VegX-VegCore.csv, because it is identical to the machine-readable VegX-VegCore.csv (there are no output XPaths to simplify)
4889	09/20/2012 09:37 PM	Aaron Marcuse-Kubitza	mappings/: Removed Veg+.unmapped_terms.csv because its terms are already found in each datasource's new_terms.csv. Those files are updated regularly and exist for every datasource, whereas this file was not updated regularly and only contained terms from a few datasources.
4888	09/20/2012 09:29 PM	Aaron Marcuse-Kubitza	inputs/ARIZ/Specimen/map.csv: Remapped VerbatimLatitude, VerbatimLongitude to UNUSED
4887	09/20/2012 09:21 PM	Aaron Marcuse-Kubitza	Regenerated root unmapped_terms.csv, new_terms.csv
4886	09/20/2012 09:19 PM	Aaron Marcuse-Kubitza	lib/mappings.Makefile: unmapped_terms.csv, new_terms.csv: Only remake if newer than existing %/unmapped_terms.csv, %/new_terms.csv which haven't been autoremoved. This avoids always remaking every unmapped_terms.csv, new_terms.csv whenever `make missing_mappings` is run. Note that these files will automatically be remade whenever their corresponding map.csv changes, so it is not necessary to actually remake %/unmapped_terms.csv, %/new_terms.csv; they are prerequisites only so that their modification time may be checked to determine whether unmapped_terms.csv, new_terms.csv needs to be remade.
4885	09/20/2012 09:11 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Automatically regenerate aggregated unmapped_terms.csv, new_terms.csv when a subdir's corresponding file changes
4884	09/20/2012 09:10 PM	Aaron Marcuse-Kubitza	inputs/: Regenerated aggregated unmapped_terms.csv, new_terms.csv
4883	09/20/2012 08:58 PM	Aaron Marcuse-Kubitza	inputs/REMIB/: Moved nodes.make into Specimen.src/ so it's with the data it generates
4882	09/20/2012 08:55 PM	Aaron Marcuse-Kubitza	inputs/TEAM/: Regenerated */new_terms.csv
4881	09/20/2012 08:30 PM	Aaron Marcuse-Kubitza	inputs/TEAM/: Obtained new download of TEAM data. (Note that the new download has a slightly different schema.) Archived old data in _archive/. Added tables to import_order.txt. Renamed TeamPlotMetaData/ to TEAM_Sites/ to correspond with the section header in Vegetation-Tree-and-Liana-Metadata-1.5.pdf. Fixed TEAM_Sites mappings: Remapped CollectionDate to eventDate because it relates to the plot, not the organism. Mapped Name to plotName so TEAM_Sites data will match up with VL, VT data.
4880	09/20/2012 08:28 PM	Aaron Marcuse-Kubitza	inputs/TEAM/: Obtained new download of TEAM data. (Note that the new download has a slightly different schema.) Archived old data in _archive/. Added tables to import_order.txt. Renamed TeamPlotMetaData/ to TEAM_Sites/ to correspond with the section header in Vegetation-Tree-and-Liana-Metadata-1.5.pdf. Fixed TEAM_Sites mappings: Remapped CollectionDate to eventDate because it relates to the plot, not the organism. Mapped Name to plotName so TEAM_Sites data will match up with VL, VT data.
4879	09/20/2012 06:58 PM	Aaron Marcuse-Kubitza	inputs/TEAM/VL, VT: Split concatenated flat files apart into separate parts each time a header is duplicated, so that the header would be autoremoved by cat_csv. Changed modified BIEN2 flat file headers back to original headers (the duplicated headers) so the headers of all part files would match up. (This is required for cat_csv header autoremoval to work properly.) This results in changes to the input column names in */map.csv.
4878	09/20/2012 06:49 PM	Aaron Marcuse-Kubitza	sql_io.py: null_strs: Added 'nulo' (used by REMIB)
4877	09/20/2012 06:13 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: DBH: Removed diameterBreastHeight_m alternative because datasources that don't append units to DBH almost always have units of cm or in
4876	09/20/2012 06:11 PM	Aaron Marcuse-Kubitza	inputs/TEAM/*/map.csv: Remapped dbh from diameterBreastHeight_m to diameterBreastHeight_cm, using the units defined in Vegetation-Metadata-1.4.pdf
4875	09/20/2012 06:05 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times
4874	09/19/2012 11:16 PM	Aaron Marcuse-Kubitza	inputs/TEAM/: Added TeamPlotMetaData
4873	09/19/2012 11:09 PM	Aaron Marcuse-Kubitza	inputs/TEAM/_src/: Added ci-team_extract/Vegetation-Metadata-1.4.pdf and symlink to it in the _src subdir
4872	09/19/2012 10:51 PM	Aaron Marcuse-Kubitza	inputs/: Added aggregated unmapped_terms.csv, new_terms.csv which were not already under version control
4871	09/19/2012 10:41 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS-CSV/Organism/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension
4870	09/19/2012 10:36 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/stems/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for plotObservations.intercept_cm, which measures the same dimension
4869	09/19/2012 10:33 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/plotObservations/map.csv: Remapped temp_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension
4868	09/19/2012 10:25 PM	Aaron Marcuse-Kubitza	inputs/Madidi/Organism/map.csv: Remapped Diameter from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the range and precision of values
4867	09/19/2012 10:23 PM	Aaron Marcuse-Kubitza	inputs/FIA/Organism/map.csv: DBH: Changed units comment to include that assumption was also based on location inside the U.S., because some data outside the U.S. also uses fractional DBHs, but these are not likely to be inch measurements
4866	09/19/2012 10:19 PM	Aaron Marcuse-Kubitza	inputs/FIA/Organism/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_in, assuming units based on the range and precision of values
4865	09/19/2012 10:16 PM	Aaron Marcuse-Kubitza	inputs/CTFS/StemObservation/map.csv: DBH: Changed units comment to include that assumption was also based on the precision of values, because fractional DBHs sometimes indicate units of inches
4864	09/19/2012 10:13 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added diameterBreastHeight_in
4863	09/19/2012 10:09 PM	Aaron Marcuse-Kubitza	schemas/functions.sql: Added _in_to_m()
4862	09/19/2012 10:00 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Remapped DBH from no longer existing term diameterBreastHeight to diameterBreastHeight_cm, diameterBreastHeight_m (both terms will be listed in the map spreadsheet after automapping, and the user can then choose one)
4861	09/19/2012 09:57 PM	Aaron Marcuse-Kubitza	inputs/CTFS/StemObservation/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units are cm based on the range of values
4860	09/19/2012 09:56 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added diameterBreastHeight_cm
4859	09/19/2012 09:41 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added stemID, which was only in mappings/VegCore-VegBIEN.csv
4858	09/19/2012 09:35 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: Inline $(unmappedTerms) because it's only used once
4857	09/19/2012 09:31 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/new_terms.csv: Include the entire map spreadsheet row, so that each new term is listed together with its mapping. This facilitates adding new mappings to mappings/Veg+-VegCore.csv directly from any new_terms.csv. Note that the use of `sort -u` (in lib/mappings.Makefile) causes multiline comments to be separated, leading to spurious lines for each multiline comment line.
4856	09/19/2012 09:19 PM	Aaron Marcuse-Kubitza	inputs/: Added unmapped_terms.csv, new_terms.csv which were not already under version control
4855	09/19/2012 08:43 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plot_/: Automapped with new parentPlotID term, which now has a join mapping in mappings/VegCore-VegBIEN.csv
4854	09/19/2012 08:41 PM	Aaron Marcuse-Kubitza	Regenerated unmapped_terms.csv, new_terms.csv
4853	09/19/2012 08:24 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Added parentPlotID
4852	09/19/2012 08:22 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Added parentLocationID, parentPlotName, which always map directly to the parent location, regardless of whether any subplot ID is present
4851	09/19/2012 08:16 PM	Aaron Marcuse-Kubitza	mappings/Veg+.unmapped_terms.csv: Removed vague term volumeCanopy, which has no definition in VegX
4850	09/19/2012 08:14 PM	Aaron Marcuse-Kubitza	mappings/Makefile: .VegCore.csv.last_cleanup: Fixed bug where the sorting columns needed to be changed to match the new column order
4849	09/19/2012 08:11 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Reordered columns to put Comments first, which matches mappings/Veg+-VegCore.csv
4848	09/19/2012 08:08 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Removed redundant stem_id->stemID mapping
4847	09/19/2012 08:07 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Standardized the capitalization of names, by camel-casing each name except for acronyms and "ID", which are made all uppercase
4846	09/19/2012 07:59 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Renamed diameterBreastHeight to diameterBreastHeight_m to assert units matching the VegBIEN field
4845	09/19/2012 07:44 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Removed duplicates
4844	09/19/2012 07:22 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: Use new mappings/VegCore.csv as the VegCore vocabulary to canonicalize on, in order to also canonicalize VegCore terms which are not yet mapped to VegBIEN. This results in several DwC terms getting their case standardized according to http://rs.tdwg.org/dwc/terms/. Continue to determine unmapped terms using mappings/VegCore-VegBIEN.csv, because a term should not be considered mapped until it has been mapped all the way through to VegBIEN.
4843	09/19/2012 07:12 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Removed trailing spaces from terms
4842	09/19/2012 07:05 PM	Aaron Marcuse-Kubitza	mappings/Veg+.unmapped_terms.csv: Removed duplicates of VegCore terms
4841	09/19/2012 07:02 PM	Aaron Marcuse-Kubitza	mappings/: Split Veg+.terms.csv into VegCore.csv and Veg+.unmapped_terms.csv
4840	09/19/2012 06:36 PM	Aaron Marcuse-Kubitza	mappings/Veg+.terms.csv: Removed terms that are in mappings/Veg+-VegCore.csv
4839	09/19/2012 06:31 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Added sources where missing
4838	09/19/2012 06:20 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Added Source and Comments columns from mappings/Veg+.terms.csv. Reordered columns to put Comments first.
4837	09/19/2012 06:17 PM	Aaron Marcuse-Kubitza	mappings/Veg+.terms.csv: Removed duplicate entries for stem_id/stemID, collector
4836	09/19/2012 05:56 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times
4835	09/19/2012 05:24 PM	Aaron Marcuse-Kubitza	inputs/REMIB/Specimen/: Filter out invalid, frameshifted rows so they don't produce errors in the import or anomalies like thousands of taxondeterminations for one taxonoccurrence. This involves moving the CSVs to Specimen.src and using a create.sql to create the filtered table.
4834	09/19/2012 04:47 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Forward occurrenceID to taxonoccurrence.sourceaccessioncode when there is no other taxonoccurrence.sourceaccessioncode, to ensure that taxonoccurrence is uniquely identified so that there is one taxonoccurrence per organism
4833	09/19/2012 04:16 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode alternatives: Use _first instead of _alt because when one of these fields is present, it can be used directly even if it's sometimes NULL, without needing to spend a lot of time _alting together fields that won't be used. Datasources where the authortaxoncode is sometimes NULL usually have a separate sourceaccessioncode for the taxonoccurrence. (In the rare case that they don't, they should map a non-NULL field to recordNumber or tag to ensure that taxonoccurrences can be uniquely identified.)
4832	09/19/2012 04:07 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped tag to taxonoccurrence.authortaxoncode when the record is an organism, in case there is no other ID for the taxonoccurrence. This fixes a bug in FIA and TEAM data where all organisms in a plot used the same taxonoccurrence because taxonoccurrence was not properly constrained, causing the loss of individual taxondeterminations on each organism.
4831	09/19/2012 03:36 PM	Aaron Marcuse-Kubitza	input.Makefile: Testing: %/test.by_col.xml: Do abort tester if by-column test fails. There are no longer small rowcount differences between row-based and column-based import on some datasources, so this is now possible.
4830	09/18/2012 11:13 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: stemobservation: stemobservation_unique_within_plantobservation unique index: Added tag so that a stemobservation can be scoped by its tag when no other ID is specified
4829	09/18/2012 11:11 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: stemobservation: stemobservation_unique_within_plantobservation unique index: Fixed bug where filter condition underconstrained stemobservation when neither sourceaccessioncode nor authorstemcode was specified, by making sure that at least one *_unique index always applies
4828	09/18/2012 11:08 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Remapped tag to new stemobservation.tag
4827	09/18/2012 11:06 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: stemobservation: Added tag, tags
4826	09/18/2012 10:53 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: tag: Removed no longer applicable comment
4825	09/18/2012 10:49 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Removed no longer used previousTag and the complex mapping logic that attempts to place both tags in VegBIEN in the correct order but does not work for column-based import. tag: Removed iscurrent=true because there is now only one tag field.
4824	09/18/2012 10:41 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/*/map.csv: Remapped all versions of stem and tree tags to tag, with the second tag superseding the first, to avoid the complex VegCore-VegBIEN mapping logic that attempts to place both tags in VegBIEN in the correct order but does not work for column-based import. inputs/SALVIAS-CSV/Organism/map.csv: stem and tree tags: Made the stem tag supersede the tree tag instead of vice versa, to have as specific a tag as possible.
4823	09/18/2012 10:30 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/stems/map.csv: Copied Brad's comments on plotObservations.tag1, tag2 to stem_tag1, stem_tag2
4822	09/18/2012 10:18 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Removed _rangeStart and _rangeEnd filters from fields which should contain decimal values. These filters should be added on a per-datasource basis instead.
4821	09/18/2012 10:12 PM	Aaron Marcuse-Kubitza	inputs/ARIZ/Specimen/map.csv: Documented that MinimumElevationInMeters, MaximumElevationInMeters contain some verbatim values, including ranges and units
4820	09/18/2012 10:09 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Removed /_units:[default=m,to=m,to=]/value filter from fields. It should be added on a per-datasource basis instead.
4819	09/18/2012 10:05 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Removed /_replace:["\bca\.?"=]/value filter from fields. It should be added on a per-datasource basis instead.
4818	09/18/2012 09:36 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: verbatimElevation->elevation_m mapping: Translate units automatically (currently only works in row-based mode). Don't remove any "ca." prefix because this is a datasource-specific filter that does not apply to current datasources with verbatimElevation. Also map verbatimElevation to location.verbatimelevation.
4817	09/18/2012 09:21 PM	Aaron Marcuse-Kubitza	inputs/NCU-NCSC/Specimen/map.csv: Elevation: Removed comment that it includes units, because this is now part of the definition of verbatimElevation
4816	09/18/2012 09:20 PM	Aaron Marcuse-Kubitza	mappings/Veg+.terms.csv: Documented that verbatimElevation must include units
4815	09/18/2012 09:14 PM	Aaron Marcuse-Kubitza	inputs/ARIZ/Specimen/map.csv: Remapped VerbatimElevation to UNUSED
4814	09/18/2012 09:11 PM	Aaron Marcuse-Kubitza	inputs/*/*/map.csv: Remapped all unused terms to special value UNUSED. Remapped all private terms to special value PRIVATE. Remapped all deliberately unmapped terms to special value OMIT.
4813	09/18/2012 08:53 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Remapped realLatitude, realLongitude to new special value PRIVATE, which is more specific than OMIT
4812	09/18/2012 08:51 PM	Aaron Marcuse-Kubitza	mappings/Veg+.terms.csv: Added special value PRIVATE
4811	09/18/2012 08:44 PM	Aaron Marcuse-Kubitza	mappings/Veg+.terms.csv: Added special values OMIT, UNUSED
4810	09/18/2012 08:20 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plot_/map.csv: Remapped elevation from verbatimElevation to elevationInMeters, since the values are all decimals. The units come from the data dictionary.
4809	09/18/2012 08:14 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Remapped elev_m from verbatimElevation to elevationInMeters, since the values are all decimals. Note that the units of SALVIAS Elev were provided by a comment from Brad (and can also be assumed to be the same as SALVIAS-CSV elev_m).
4808	09/18/2012 08:02 PM	Aaron Marcuse-Kubitza	inputs/NCU-NCSC/Specimen/map.csv: Documented that Elevation includes units
4807	09/18/2012 07:50 PM	Aaron Marcuse-Kubitza	inputs/Madidi/Plot/map.csv: Remapped Minimum altitude from minimumElevationInMeters to verbatimElevation_m, since it is a range, not a minimum. Note that the units are assumed based on the range of values present and the region the data is from (Madidi National Park).
4806	09/18/2012 07:46 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Also mapped verbatimElevation_m to verbatimelevation
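
Several revisions above depend on unit conversions: r4863 adds _in_to_m() to schemas/functions.sql, and r4899-r4903 separate DMS coordinates from decimal degrees. As a rough illustration only, the arithmetic behind such conversions might look like the Python sketch below. The names in_to_m and dms_to_decimal and their signatures are hypothetical; they are not taken from the repository and do not reproduce the actual SQL function.

```python
def in_to_m(inches):
    """Convert inches to meters (1 in is defined as exactly 0.0254 m)."""
    return inches * 0.0254


def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees.

    Hypothetical helper: S and W hemispheres yield negative values.
    """
    hemi = hemisphere.upper()
    if hemi not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    # Each minute is 1/60 of a degree, each second 1/3600 of a degree.
    magnitude = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -magnitude if hemi in ("S", "W") else magnitude
```

For example, dms_to_decimal(14, 30, 0, "S") gives -14.5, a value that could then be mapped to decimalLatitude rather than latitude_DMS.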
