/ - Changes - BIEN 3 - NCEAS Projects

root @ 4929

#	Date	Author	Comment
4929	09/21/2012 04:17 PM	Aaron Marcuse-Kubitza	inputs/VegBank/: Added stemcount/
4928	09/21/2012 04:10 PM	Aaron Marcuse-Kubitza	sql_io.py: cleanup_table(): Fixed bug where no update statement could be run when no columns are text
4927	09/21/2012 03:57 PM	Aaron Marcuse-Kubitza	csv2db: COPY FROM mode: Removed no longer needed explicit column list, now that the initial table has the exact width of the CSV (the row_num is added later)
4926	09/21/2012 03:55 PM	Aaron Marcuse-Kubitza	csv2db: Add any row_num column after creating the table, so it does not interfere with row widths when using COPY FROM without explicit column names
4925	09/21/2012 03:48 PM	Aaron Marcuse-Kubitza	csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
4924	09/21/2012 03:44 PM	Aaron Marcuse-Kubitza	csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
4923	09/21/2012 03:28 PM	Aaron Marcuse-Kubitza	inputs/VegBank/: Added taxonimportance/
4922	09/21/2012 03:20 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added and mapped aggregateOccurrenceID
4921	09/21/2012 03:12 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: taxonOccurrenceID: Re-sourced to VegBank taxonobservation and DwC occurrenceID, because this is where the VegBIEN table name came from
4920	09/21/2012 02:57 PM	Aaron Marcuse-Kubitza	tnrs_client: Support parsing multiple taxa at once, by specifying each as a command-line argument. Increased the max_pause to 10 min to support large batches. Limited the batch size to 5000 names, using the limit at <http://tnrs.iplantcollaborative.org/TNRSapp.html>. Note that when using xargs to pass many names, xargs will by default split its arguments into chunks of 5000. You can change this using the -n option.
4919	09/21/2012 02:29 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times
4918	09/21/2012 01:20 PM	Aaron Marcuse-Kubitza	Added tnrs_client. Note that obtaining an actual CSV requires four (!) steps: submit, retrieve, prepare download, and download. The output of the retrieve step is unusable because the array has different lengths depending on the taxonomic ranks present in the provided taxon name. This initial version runs one name at a time, but could later be expanded to batch process because TNRS can run multiple names at once.
4917	09/21/2012 12:36 PM	Aaron Marcuse-Kubitza	streams.py: Line iteration: Added read_all()
4916	09/21/2012 08:24 AM	Aaron Marcuse-Kubitza	inputs/Madidi/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values
4915	09/21/2012 08:18 AM	Aaron Marcuse-Kubitza	sql_io.py: null_strs: Added '-'
4914	09/21/2012 08:18 AM	Aaron Marcuse-Kubitza	sql_io.py: cleanup_table(): Fixed bug where each column name needed to be converted to Unicode before being concatenated with other strings, to support non-ASCII characters
4913	09/21/2012 07:57 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values
4912	09/21/2012 07:52 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Removed no longer needed old-style _units filter, now that unit conversion is handled by mappings/VegCore-VegBIEN.csv using _percent_to_fraction
4911	09/21/2012 07:48 AM	Aaron Marcuse-Kubitza	inputs/VegBank/observation_/map.csv: soilObs fields: Cited data dictionary source of units
4910	09/21/2012 07:15 AM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Soil component measurements: Added unitless terms that automap to all alternatives of units
4909	09/21/2012 07:08 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added term with _fraction units for every _percent term
4908	09/21/2012 07:03 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Soil component measurements: Added default units of percent (cmol_kg for cationExchangeCapacity). This involves translating the names everywhere and adding a _percent_to_fraction conversion in mappings/VegCore-VegBIEN.csv.
4907	09/20/2012 11:15 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Remapped verbatimLatitude/Longitude to locationcoords.verbatimlatitude/longitude because these fields now contain only non-decimal coordinates. This involves removing the _alt suffix on decimalLatitude/Longitude, which causes the VegBIEN.csvs to change.
4906	09/20/2012 11:11 PM	Aaron Marcuse-Kubitza	inputs///map.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees
4905	09/20/2012 11:06 PM	Aaron Marcuse-Kubitza	inputs///map.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees
4904	09/20/2012 10:54 PM	Aaron Marcuse-Kubitza	inputs/SpeciesLink/Specimen/map.csv: Documented that dwc_geospatial_VerbatimLatitude/Longitude contain a mix of DMS and other verbatim coordinates
4903	09/20/2012 10:47 PM	Aaron Marcuse-Kubitza	inputs/QMOR/Specimen/map.csv: Remapped verbatimLatitude/verbatimLongitude to latitude_DMS/longitude_DMS since these fields contain DMS values
4902	09/20/2012 10:43 PM	Aaron Marcuse-Kubitza	inputs/Madidi/Plot/map.csv: Remapped Latitude/Longitude (DMS) to new latitude_DMS/longitude_DMS
4901	09/20/2012 10:41 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped latitude_DMS, longitude_DMS
4900	09/20/2012 10:38 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added latitude_DMS, longitude_DMS
4899	09/20/2012 10:34 PM	Aaron Marcuse-Kubitza	inputs/REMIB/Specimen/map.csv: Remapped lat_deg/long_deg to decimalLatitude/Longitude because these values are (integer) degrees suitable for decimalLatitude/Longitude. Note that the other DMS fields are not yet translated to decimal degrees.
4898	09/20/2012 10:28 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees
4897	09/20/2012 10:26 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".
4896	09/20/2012 10:23 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Don't automatically regenerate the aggregated unmapped_terms.csv, new_terms.csv because this almost doubles the remake time when a mappings/ prerequisite changes (41s -> 75s)
4895	09/20/2012 10:14 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".
4894	09/20/2012 10:09 PM	Aaron Marcuse-Kubitza	inputs/GBIF/Specimen/map.csv: Remapped VerbatimLatitude/Longitude to decimalLatitude/Longitude because DecimalLatitude/Longitude just contains VerbatimLatitude/Longitude cast to a low-resolution float, which created spurious repeating decimals
4893	09/20/2012 09:56 PM	Aaron Marcuse-Kubitza	mappings/Makefile: .VegCore-VegBIEN.csv.last_cleanup: Generate VegCore-VegBIEN.unsourced_terms.csv whenever VegCore-VegBIEN.csv changes, to track VegCore terms that are mapped to VegBIEN but not documented in VegCore.csv. Note that this file is not svn:ignored, so it will show up with a ? when the user runs `svn st` if there are any unsourced terms.
4892	09/20/2012 09:47 PM	Aaron Marcuse-Kubitza	mappings/Makefile: Changed catch-all `.%.last_cleanup: %` target to a specific target for VegCore-VegBIEN.csv, because it's the only file that uses this target
4891	09/20/2012 09:45 PM	Aaron Marcuse-Kubitza	mappings/: Don't generate a for_review version of Veg+-VegCore.csv, because it is identical to the machine-readable Veg+-VegCore.csv (there are no output XPaths to simplify)
4890	09/20/2012 09:41 PM	Aaron Marcuse-Kubitza	mappings/: Don't generate a for_review version of VegX-VegCore.csv, because it is identical to the machine-readable VegX-VegCore.csv (there are no output XPaths to simplify)
4889	09/20/2012 09:37 PM	Aaron Marcuse-Kubitza	mappings/: Removed Veg+.unmapped_terms.csv because these terms are found in each datasource's new_terms.csv, which are updated regularly, while this file isn't, and which exist for every datasource, while this file only contained terms from a few datasources
4888	09/20/2012 09:29 PM	Aaron Marcuse-Kubitza	inputs/ARIZ/Specimen/map.csv: Remapped VerbatimLatitude, VerbatimLongitude to UNUSED
4887	09/20/2012 09:21 PM	Aaron Marcuse-Kubitza	Regenerated root unmapped_terms.csv, new_terms.csv
4886	09/20/2012 09:19 PM	Aaron Marcuse-Kubitza	lib/mappings.Makefile: unmapped_terms.csv, new_terms.csv: Only remake if newer than existing %/unmapped_terms.csv, %/new_terms.csv which haven't been autoremoved. This avoids always remaking every unmapped_terms.csv, new_terms.csv whenever `make missing_mappings` is run. Note that these files will automatically be remade whenever their corresponding map.csv changes, so it is not necessary to actually remake %/unmapped_terms.csv, %/new_terms.csv; they are prerequisites only so that their modification time may be checked to determine whether unmapped_terms.csv, new_terms.csv needs to be remade.
4885	09/20/2012 09:11 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Automatically regenerate aggregated unmapped_terms.csv, new_terms.csv when a subdir's corresponding file changes
4884	09/20/2012 09:10 PM	Aaron Marcuse-Kubitza	inputs/: Regenerated aggregated unmapped_terms.csv, new_terms.csv
4883	09/20/2012 08:58 PM	Aaron Marcuse-Kubitza	inputs/REMIB/: Moved nodes.make into Specimen.src/ so it's with the data it generates
4882	09/20/2012 08:55 PM	Aaron Marcuse-Kubitza	inputs/TEAM/: Regenerated */new_terms.csv
4881	09/20/2012 08:30 PM	Aaron Marcuse-Kubitza	inputs/TEAM/: Obtained new download of TEAM data. (Note that the new download has a slightly different schema.) Archived old data in _archive/. Added tables to import_order.txt. Renamed TeamPlotMetaData/ to TEAM_Sites/ to correspond with the section header in Vegetation-Tree-and-Liana-Metadata-1.5.pdf. Fixed TEAM_Sites mappings: Remapped CollectionDate to eventDate because it relates to the plot, not the organism. Mapped Name to plotName so TEAM_Sites data will match up with VL, VT data.
4880	09/20/2012 08:28 PM	Aaron Marcuse-Kubitza	inputs/TEAM/: Obtained new download of TEAM data. (Note that the new download has a slightly different schema.) Archived old data in _archive/. Added tables to import_order.txt. Renamed TeamPlotMetaData/ to TEAM_Sites/ to correspond with the section header in Vegetation-Tree-and-Liana-Metadata-1.5.pdf. Fixed TEAM_Sites mappings: Remapped CollectionDate to eventDate because it relates to the plot, not the organism. Mapped Name to plotName so TEAM_Sites data will match up with VL, VT data.
4879	09/20/2012 06:58 PM	Aaron Marcuse-Kubitza	inputs/TEAM/VL, VT: Split concatenated flat files apart into separate parts each time a header is duplicated, so that the header would be autoremoved by cat_csv. Changed modified BIEN2 flat file headers back to original headers (the duplicated headers) so the headers of all part files would match up. (This is required for cat_csv header autoremoval to work properly.) This results in changes to the input column names in */map.csv.
4878	09/20/2012 06:49 PM	Aaron Marcuse-Kubitza	sql_io.py: null_strs: Added 'nulo' (used by REMIB)
4877	09/20/2012 06:13 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: DBH: Removed diameterBreastHeight_m alternative because datasources that don't append units to DBH almost always have units of cm or in
4876	09/20/2012 06:11 PM	Aaron Marcuse-Kubitza	inputs/TEAM/*/map.csv: Remapped dbh from diameterBreastHeight_m to diameterBreastHeight_cm, using the units defined in Vegetation-Metadata-1.4.pdf
4875	09/20/2012 06:05 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times
4874	09/19/2012 11:16 PM	Aaron Marcuse-Kubitza	inputs/TEAM/: Added TeamPlotMetaData
4873	09/19/2012 11:09 PM	Aaron Marcuse-Kubitza	inputs/TEAM/_src/: Added ci-team_extract/Vegetation-Metadata-1.4.pdf and symlink to it in the _src subdir
4872	09/19/2012 10:51 PM	Aaron Marcuse-Kubitza	inputs/: Added aggregated unmapped_terms.csv, new_terms.csv which were not already under version control
4871	09/19/2012 10:41 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS-CSV/Organism/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension
4870	09/19/2012 10:36 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/stems/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for plotObservations.intercept_cm, which measures the same dimension
4869	09/19/2012 10:33 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/plotObservations/map.csv: Remapped temp_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension
4868	09/19/2012 10:25 PM	Aaron Marcuse-Kubitza	inputs/Madidi/Organism/map.csv: Remapped Diameter from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the range and precision of values
4867	09/19/2012 10:23 PM	Aaron Marcuse-Kubitza	inputs/FIA/Organism/map.csv: DBH: Changed units comment to include that assumption was also based on location inside the U.S., because some data outside the U.S. also uses fractional DBHs, but these are not likely to be inch measurements
4866	09/19/2012 10:19 PM	Aaron Marcuse-Kubitza	inputs/FIA/Organism/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_in, assuming units based on the range and precision of values
4865	09/19/2012 10:16 PM	Aaron Marcuse-Kubitza	inputs/CTFS/StemObservation/map.csv: DBH: Changed units comment to include that assumption was also based on the precision of values, because fractional DBHs sometimes indicate units of inches
4864	09/19/2012 10:13 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added diameterBreastHeight_in
4863	09/19/2012 10:09 PM	Aaron Marcuse-Kubitza	schemas/functions.sql: Added _in_to_m()
4862	09/19/2012 10:00 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Remapped DBH from no longer existing term diameterBreastHeight to diameterBreastHeight_cm, diameterBreastHeight_m (both terms will be listed in the map spreadsheet after automapping, and the user can then choose one)
4861	09/19/2012 09:57 PM	Aaron Marcuse-Kubitza	inputs/CTFS/StemObservation/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units are cm based on the range of values
4860	09/19/2012 09:56 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added diameterBreastHeight_cm
4859	09/19/2012 09:41 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Added stemID, which was only in mappings/VegCore-VegBIEN.csv
4858	09/19/2012 09:35 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: Inline $(unmappedTerms) because it's only used once
4857	09/19/2012 09:31 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/new_terms.csv: Include the entire map spreadsheet row, so that each new term is listed together with its mapping. This facilitates adding new mappings to mappings/Veg+-VegCore.csv directly from any new_terms.csv. Note that the use of `sort -u` (in lib/mappings.Makefile) causes multiline comments to be separated, leading to spurious lines for each multiline comment line.
4856	09/19/2012 09:19 PM	Aaron Marcuse-Kubitza	inputs/: Added unmapped_terms.csv, new_terms.csv which were not already under version control
4855	09/19/2012 08:43 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plot_/: Automapped with new parentPlotID term, which now has a join mapping in mappings/VegCore-VegBIEN.csv
4854	09/19/2012 08:41 PM	Aaron Marcuse-Kubitza	Regenerated unmapped_terms.csv, new_terms.csv
4853	09/19/2012 08:24 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Added parentPlotID
4852	09/19/2012 08:22 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Added parentLocationID, parentPlotName, which always map directly to the parent location, regardless of whether any subplot ID is present
4851	09/19/2012 08:16 PM	Aaron Marcuse-Kubitza	mappings/Veg+.unmapped_terms.csv: Removed vague term volumeCanopy, which has no definition in VegX
4850	09/19/2012 08:14 PM	Aaron Marcuse-Kubitza	mappings/Makefile: .VegCore.csv.last_cleanup: Fixed bug where the sorting columns needed to be changed to match the new column order
4849	09/19/2012 08:11 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Reordered columns to put Comments first, which matches mappings/Veg+-VegCore.csv
4848	09/19/2012 08:08 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Removed redundant stem_id->stemID mapping
4847	09/19/2012 08:07 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Standardized the capitalization of names, by camel-casing each name except for acronyms and "ID", which are made all uppercase
4846	09/19/2012 07:59 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Renamed diameterBreastHeight to diameterBreastHeight_m to assert units matching the VegBIEN field
4845	09/19/2012 07:44 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Removed duplicates
4844	09/19/2012 07:22 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: Use new mappings/VegCore.csv as the VegCore vocabulary to canonicalize on, in order to also canonicalize VegCore terms which are not yet mapped to VegBIEN. This results in several DwC terms getting their case standardized according to http://rs.tdwg.org/dwc/terms/. Continue to determine unmapped terms using mappings/VegCore-VegBIEN.csv, because a term should not be considered mapped until it has been mapped all the way through to VegBIEN.
4843	09/19/2012 07:12 PM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Removed trailing spaces from terms
4842	09/19/2012 07:05 PM	Aaron Marcuse-Kubitza	mappings/Veg+.unmapped_terms.csv: Removed duplicates of VegCore terms
4841	09/19/2012 07:02 PM	Aaron Marcuse-Kubitza	mappings/: Split Veg+.terms.csv into VegCore.csv and Veg+.unmapped_terms.csv
4840	09/19/2012 06:36 PM	Aaron Marcuse-Kubitza	mappings/Veg+.terms.csv: Removed terms that are in mappings/Veg+-VegCore.csv
4839	09/19/2012 06:31 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Added sources where missing
4838	09/19/2012 06:20 PM	Aaron Marcuse-Kubitza	mappings/Veg+-VegCore.csv: Added Source and Comments columns from mappings/Veg+.terms.csv. Reordered columns to put Comments first.
4837	09/19/2012 06:17 PM	Aaron Marcuse-Kubitza	mappings/Veg+.terms.csv: Removed duplicate entries for stem_id/stemID, collector
4836	09/19/2012 05:56 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times
4835	09/19/2012 05:24 PM	Aaron Marcuse-Kubitza	inputs/REMIB/Specimen/: Filter out invalid, frameshifted rows so they don't produce errors in the import or anomalies like thousands of taxondeterminations for one taxonoccurrence. This involves moving the CSVs to Specimen.src and using a create.sql to create the filtered table.
4834	09/19/2012 04:47 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Forward occurrenceID to taxonoccurrence.sourceaccessioncode when there is no other taxonoccurrence.sourceaccessioncode, to ensure that taxonoccurrence is uniquely identified so that there is one taxonoccurrence per organism
4833	09/19/2012 04:16 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode alternatives: Use _first instead of _alt because when one of these fields is present, it can be used directly even if it's sometimes NULL, without needing to spend a lot of time _alting together fields that won't be used. Datasources where the authortaxoncode is sometimes NULL usually have a separate sourceaccessioncode for the taxonoccurrence. (In the rare case that they don't, they should map a non-NULL field to recordNumber or tag to ensure that taxonoccurrences can be uniquely identified.)
4832	09/19/2012 04:07 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped tag to taxonoccurrence.authortaxoncode when the record is an organism, in case there is no other ID for the taxonoccurrence. This fixes a bug in FIA and TEAM data where all organisms in a plot used the same taxonoccurrence because taxonoccurrence was not properly constrained, causing the loss of individual taxondeterminations on each organism.
4831	09/19/2012 03:36 PM	Aaron Marcuse-Kubitza	input.Makefile: Testing: %/test.by_col.xml: Do abort tester if by-column test fails. There are no longer small rowcount differences between row-based and column-based import on some datasources, so this is now possible.
4830	09/18/2012 11:13 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: stemobservation: stemobservation_unique_within_plantobservation unique index: Added tag so that a stemobservation can be scoped by its tag when no other ID is specified
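
Note on r4924-r4926: the ragged-row fix described in those comments can be illustrated with a minimal sketch. It is not the actual csv2db code. The function name `normalize_rows`, its parameters, and the padding of short rows with empty strings are all assumptions made for illustration; the log itself only states that rows were truncated to the number of column names and that row_num was moved to the end.

```python
def normalize_rows(rows, n_cols, add_row_num=True):
    """Hypothetical helper: give every row exactly n_cols data columns.

    Long rows are truncated to n_cols. Short rows are padded with empty
    strings (an assumption; the log does not say how short rows are handled).
    If add_row_num is true, a 1-based row number is appended as the LAST
    column, so the data column count is the same with or without it.
    """
    for row_num, row in enumerate(rows, start=1):
        fixed = list(row[:n_cols])             # truncate ragged extra cells
        fixed += [''] * (n_cols - len(fixed))  # pad short rows (assumption)
        if add_row_num:
            fixed.append(row_num)              # row_num goes at the end
        yield fixed


if __name__ == '__main__':
    ragged = [['a', 'b', 'c', 'extra'], ['d', 'e']]
    for row in normalize_rows(ragged, 3):
        print(row)
```

Because the row number is appended last, a table without a row_num (such as the *.src tables mentioned in r4924) goes through the same truncation step with `add_row_num=False`, and no "+1" adjustment to the column count is needed.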
