# Date Author Comment
4924 09/21/2012 03:44 PM Aaron Marcuse-Kubitza

csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
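
The fix above can be illustrated with a minimal sketch. This is not the actual csv2db code: `fit_row` and `load_rows` are hypothetical names, and padding short rows with None is an assumption. It shows only why appending row_num after truncation keeps the column count independent of whether row_num is present.

```python
def fit_row(row, n_cols):
    """Truncate or pad a ragged row to exactly n_cols values."""
    fitted = list(row[:n_cols])
    fitted += [None] * (n_cols - len(fitted))  # pad short rows (assumed behavior)
    return fitted


def load_rows(rows, col_names, with_row_num=True):
    """Fit every row to the named columns, then optionally append row_num last."""
    out = []
    for row_num, row in enumerate(rows, start=1):
        fitted = fit_row(row, len(col_names))
        if with_row_num:
            # row_num is appended after truncation, so the data columns
            # are cut to the same width whether or not it is present
            fitted.append(row_num)
        out.append(fitted)
    return out
```

With this layout, a table that has no row_num (with_row_num=False) is truncated to exactly the number of column names, rather than to that number plus one.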

4923 09/21/2012 03:28 PM Aaron Marcuse-Kubitza

inputs/VegBank/: Added taxonimportance/

4922 09/21/2012 03:20 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added and mapped aggregateOccurrenceID

4921 09/21/2012 03:12 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: taxonOccurrenceID: Re-sourced to VegBank taxonobservation and DwC occurrenceID, because this is where the VegBIEN table name came from

4920 09/21/2012 02:57 PM Aaron Marcuse-Kubitza

tnrs_client: Support parsing multiple taxons at once, by specifying each as a command-line argument. Increased the max_pause to 10 min to support large batches. Limited the batch size to 5000 names, using the limit at <http://tnrs.iplantcollaborative.org/TNRSapp.html>. Note that when using xargs to pass many names, xargs will by default split its arguments into chunks of 5000. You can change this using the -n option.
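
As a rough illustration of the 5000-name limit, a caller could split a long name list into batches before invoking the client. The helper name `batches` is hypothetical and is not part of tnrs_client.

```python
def batches(names, size=5000):
    """Yield consecutive slices of names, each holding at most `size` items."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


# Example: 12001 names are split into batches of 5000, 5000, and 2001
sizes = [len(batch) for batch in batches(["name"] * 12001)]
```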

4919 09/21/2012 02:29 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

4918 09/21/2012 01:20 PM Aaron Marcuse-Kubitza

Added tnrs_client. Note that obtaining an actual CSV requires four (!) steps: submit, retrieve, prepare download, and download. The output of the retrieve step is unusable because the array has different lengths depending on the taxonomic ranks present in the provided taxon name. This initial version runs one name at a time, but could later be expanded to batch process because TNRS can run multiple names at once.

4917 09/21/2012 12:36 PM Aaron Marcuse-Kubitza

streams.py: Line iteration: Added read_all()

4916 09/21/2012 08:24 AM Aaron Marcuse-Kubitza

inputs/Madidi/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values

4915 09/21/2012 08:18 AM Aaron Marcuse-Kubitza

sql_io.py: null_strs: Added '-'

4914 09/21/2012 08:18 AM Aaron Marcuse-Kubitza

sql_io.py: cleanup_table(): Fixed bug where each column name needed to be converted to Unicode before being concatenated with other strings, to support non-ASCII characters

4913 09/21/2012 07:57 AM Aaron Marcuse-Kubitza

inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values

4912 09/21/2012 07:52 AM Aaron Marcuse-Kubitza

inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Removed no longer needed old-style _units filter, now that unit conversion is handled by mappings/VegCore-VegBIEN.csv using _percent_to_fraction

4911 09/21/2012 07:48 AM Aaron Marcuse-Kubitza

inputs/VegBank/observation_/map.csv: soilObs fields: Cited data dictionary source of units

4910 09/21/2012 07:15 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Soil component measurements: Added unitless terms that automap to all alternatives of units

4909 09/21/2012 07:08 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added term with *_fraction units for every *_percent term

4908 09/21/2012 07:03 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Soil component measurements: Added default units of percent (cmol_kg for cationExchangeCapacity). This involves translating the names everywhere and adding a _percent_to_fraction conversion in mappings/VegCore-VegBIEN.csv.

4907 09/20/2012 11:15 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Remapped verbatimLatitude/Longitude to locationcoords.verbatimlatitude/longitude because these fields now contain only non-decimal coordinates. This involves removing the _alt suffix on decimalLatitude/Longitude, which causes the VegBIEN.csvs to change.

4906 09/20/2012 11:11 PM Aaron Marcuse-Kubitza

inputs/*/*/map.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees

4905 09/20/2012 11:06 PM Aaron Marcuse-Kubitza

inputs/*/*/map.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees

4904 09/20/2012 10:54 PM Aaron Marcuse-Kubitza

inputs/SpeciesLink/Specimen/map.csv: Documented that dwc_geospatial_VerbatimLatitude/Longitude contain a mix of DMS and other verbatim coordinates

4903 09/20/2012 10:47 PM Aaron Marcuse-Kubitza

inputs/QMOR/Specimen/map.csv: Remapped verbatimLatitude/verbatimLongitude to latitude_DMS/longitude_DMS since these fields contain DMS values

4902 09/20/2012 10:43 PM Aaron Marcuse-Kubitza

inputs/Madidi/Plot/map.csv: Remapped Latitude/Longitude (DMS) to new latitude_DMS/longitude_DMS

4901 09/20/2012 10:41 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped latitude_DMS, longitude_DMS

4900 09/20/2012 10:38 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added latitude_DMS, longitude_DMS

4899 09/20/2012 10:34 PM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/map.csv: Remapped lat_deg/long_deg to decimalLatitude/Longitude because these values are (integer) degrees suitable for decimalLatitude/Longitude. Note that the other DMS fields are not yet translated to decimal degrees.
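
For reference, translating DMS fields to decimal degrees uses the standard formula degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The sketch below is illustrative only; `dms_to_decimal` and its parameters are hypothetical and do not correspond to any function in this repository.

```python
def dms_to_decimal(degrees, minutes=0, seconds=0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # South and west are negative; a negative degrees value is also honored
    if hemisphere.upper() in ("S", "W") or degrees < 0:
        value = -value
    return value
```

Using only the integer degrees, as this revision does, drops the minutes and seconds terms, so the result can be off by up to one degree.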

4898 09/20/2012 10:28 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees

4897 09/20/2012 10:26 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".

4896 09/20/2012 10:23 PM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Don't automatically regenerate the aggregated unmapped_terms.csv, new_terms.csv because this almost doubles the remake time when a mappings/ prerequisite changes (41s -> 75s)

4895 09/20/2012 10:14 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".

4894 09/20/2012 10:09 PM Aaron Marcuse-Kubitza

inputs/GBIF/Specimen/map.csv: Remapped VerbatimLatitude/Longitude to decimalLatitude/Longitude because DecimalLatitude/Longitude just contains VerbatimLatitude/Longitude cast to a low-resolution float, which created spurious repeating decimals
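
The effect described above can be reproduced with a short sketch, assuming that "low-resolution float" means a single-precision (32-bit) float; the log does not say which type GBIF used. `as_float32` is a hypothetical helper.

```python
import struct


def as_float32(value):
    """Round-trip a Python float through a 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# A coordinate such as 10.1 has no exact 32-bit representation, so the
# round-tripped value carries extra trailing digits when printed in full.
noisy = as_float32(10.1)
exact = as_float32(0.5)  # 0.5 is exactly representable and survives unchanged
```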

4893 09/20/2012 09:56 PM Aaron Marcuse-Kubitza

mappings/Makefile: .VegCore-VegBIEN.csv.last_cleanup: Generate VegCore-VegBIEN.unsourced_terms.csv whenever VegCore-VegBIEN.csv changes, to track VegCore terms that are mapped to VegBIEN but not documented in VegCore.csv. Note that this file is not svn:ignored, so it will show up with a ? when the user runs `svn st` if there are any unsourced terms.

4892 09/20/2012 09:47 PM Aaron Marcuse-Kubitza

mappings/Makefile: Changed catch-all `.%.last_cleanup: %` target to a specific target for VegCore-VegBIEN.csv, because it's the only file that uses this target

4891 09/20/2012 09:45 PM Aaron Marcuse-Kubitza

mappings/: Don't generate a for_review version of Veg+-VegCore.csv, because it is identical to the machine-readable Veg+-VegCore.csv (there are no output XPaths to simplify)

4890 09/20/2012 09:41 PM Aaron Marcuse-Kubitza

mappings/: Don't generate a for_review version of VegX-VegCore.csv, because it is identical to the machine-readable VegX-VegCore.csv (there are no output XPaths to simplify)

4889 09/20/2012 09:37 PM Aaron Marcuse-Kubitza

mappings/: Removed Veg+.unmapped_terms.csv because these terms are found in each datasource's new_terms.csv, which are updated regularly, while this file isn't, and which exist for every datasource, while this file only contained terms from a few datasources

4888 09/20/2012 09:29 PM Aaron Marcuse-Kubitza

inputs/ARIZ/Specimen/map.csv: Remapped VerbatimLatitude, VerbatimLongitude to UNUSED

4887 09/20/2012 09:21 PM Aaron Marcuse-Kubitza

Regenerated root unmapped_terms.csv, new_terms.csv

4886 09/20/2012 09:19 PM Aaron Marcuse-Kubitza

lib/mappings.Makefile: unmapped_terms.csv, new_terms.csv: Only remake if newer than existing %/unmapped_terms.csv, %/new_terms.csv which haven't been autoremoved. This avoids always remaking every unmapped_terms.csv, new_terms.csv whenever `make missing_mappings` is run. Note that these files will automatically be remade whenever their corresponding map.csv changes, so it is not necessary to actually remake %/unmapped_terms.csv, %/new_terms.csv; they are prerequisites only so that their modification time may be checked to determine whether unmapped_terms.csv, new_terms.csv needs to be remade.

4885 09/20/2012 09:11 PM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Automatically regenerate aggregated unmapped_terms.csv, new_terms.csv when a subdir's corresponding file changes

4884 09/20/2012 09:10 PM Aaron Marcuse-Kubitza

inputs/: Regenerated aggregated unmapped_terms.csv, new_terms.csv

4883 09/20/2012 08:58 PM Aaron Marcuse-Kubitza

inputs/REMIB/: Moved nodes.make into Specimen.src/ so it's with the data it generates

4882 09/20/2012 08:55 PM Aaron Marcuse-Kubitza

inputs/TEAM/: Regenerated */new_terms.csv

4881 09/20/2012 08:30 PM Aaron Marcuse-Kubitza

inputs/TEAM/: Obtained new download of TEAM data. (Note that the new download has a slightly different schema.) Archived old data in _archive/. Added tables to import_order.txt. Renamed TeamPlotMetaData/ to TEAM_Sites/ to correspond with the section header in Vegetation-Tree-and-Liana-Metadata-1.5.pdf. Fixed TEAM_Sites mappings: Remapped CollectionDate to eventDate because it relates to the plot, not the organism. Mapped Name to plotName so TEAM_Sites data will match up with VL, VT data.

4880 09/20/2012 08:28 PM Aaron Marcuse-Kubitza

inputs/TEAM/: Obtained new download of TEAM data. (Note that the new download has a slightly different schema.) Archived old data in _archive/. Added tables to import_order.txt. Renamed TeamPlotMetaData/ to TEAM_Sites/ to correspond with the section header in Vegetation-Tree-and-Liana-Metadata-1.5.pdf. Fixed TEAM_Sites mappings: Remapped CollectionDate to eventDate because it relates to the plot, not the organism. Mapped Name to plotName so TEAM_Sites data will match up with VL, VT data.

4879 09/20/2012 06:58 PM Aaron Marcuse-Kubitza

inputs/TEAM/VL, VT: Split concatenated flat files apart into separate parts each time a header is duplicated, so that the header would be autoremoved by cat_csv. Changed modified BIEN2 flat file headers back to original headers (the duplicated headers) so the headers of all part files would match up. (This is required for cat_csv header autoremoval to work properly.) This results in changes to the input column names in */map.csv.
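
The splitting step can be sketched as follows. This is not the script that was used: `split_on_header` and `cat_parts` are hypothetical names, and `cat_parts` only approximates the header autoremoval attributed to cat_csv, on the assumption that it drops the first line of every part after the first.

```python
def split_on_header(lines):
    """Start a new part each time the first line (the header) recurs."""
    if not lines:
        return []
    header = lines[0]
    parts = []
    for line in lines:
        if line == header:
            parts.append([line])  # each part begins with its own header
        else:
            parts[-1].append(line)
    return parts


def cat_parts(parts):
    """Concatenate parts, keeping only the first part's header (assumed behavior)."""
    out = []
    for index, part in enumerate(parts):
        out.extend(part if index == 0 else part[1:])
    return out
```

Because the comparison is an exact match against the first header, every part must carry the identical header, which is why the modified headers had to be changed back to the originals.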

4878 09/20/2012 06:49 PM Aaron Marcuse-Kubitza

sql_io.py: null_strs: Added 'nulo' (used by REMIB)

4877 09/20/2012 06:13 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: DBH: Removed diameterBreastHeight_m alternative because datasources that don't append units to DBH almost always have units of cm or in

4876 09/20/2012 06:11 PM Aaron Marcuse-Kubitza

inputs/TEAM/*/map.csv: Remapped dbh from diameterBreastHeight_m to diameterBreastHeight_cm, using the units defined in Vegetation-Metadata-1.4.pdf

4875 09/20/2012 06:05 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

4874 09/19/2012 11:16 PM Aaron Marcuse-Kubitza

inputs/TEAM/: Added TeamPlotMetaData

4873 09/19/2012 11:09 PM Aaron Marcuse-Kubitza

inputs/TEAM/_src/: Added ci-team_extract/Vegetation-Metadata-1.4.pdf and symlink to it in the _src subdir

4872 09/19/2012 10:51 PM Aaron Marcuse-Kubitza

inputs/: Added aggregated unmapped_terms.csv, new_terms.csv which were not already under version control

4871 09/19/2012 10:41 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV/Organism/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension

4870 09/19/2012 10:36 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/stems/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for plotObservations.intercept_cm, which measures the same dimension

4869 09/19/2012 10:33 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/plotObservations/map.csv: Remapped temp_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension

4868 09/19/2012 10:25 PM Aaron Marcuse-Kubitza

inputs/Madidi/Organism/map.csv: Remapped Diameter from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the range and precision of values

4867 09/19/2012 10:23 PM Aaron Marcuse-Kubitza

inputs/FIA/Organism/map.csv: DBH: Changed units comment to include that assumption was also based on location inside the U.S., because some data outside the U.S. also uses fractional DBHs, but these are not likely to be inch measurements

4866 09/19/2012 10:19 PM Aaron Marcuse-Kubitza

inputs/FIA/Organism/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_in, assuming units based on the range and precision of values

4865 09/19/2012 10:16 PM Aaron Marcuse-Kubitza

inputs/CTFS/StemObservation/map.csv: DBH: Changed units comment to include that assumption was also based on the precision of values, because fractional DBHs sometimes indicate units of inches

4864 09/19/2012 10:13 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added diameterBreastHeight_in

4863 09/19/2012 10:09 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added _in_to_m()
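
The actual _in_to_m() is a SQL function whose body is not shown in this log. A Python equivalent of the conversion, assuming it simply multiplies by the exact factor of 0.0254 m per inch and passes NULL through, might look like this (`in_to_m` is a hypothetical name):

```python
METERS_PER_INCH = 0.0254  # exact by definition of the international inch


def in_to_m(inches):
    """Convert inches to meters; None (NULL) is passed through (assumed)."""
    if inches is None:
        return None
    return inches * METERS_PER_INCH
```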

4862 09/19/2012 10:00 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Remapped DBH from no longer existing term diameterBreastHeight to diameterBreastHeight_cm, diameterBreastHeight_m (both terms will be listed in the map spreadsheet after automapping, and the user can then choose one)

4861 09/19/2012 09:57 PM Aaron Marcuse-Kubitza

inputs/CTFS/StemObservation/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units are cm based on the range of values

4860 09/19/2012 09:56 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added diameterBreastHeight_cm

4859 09/19/2012 09:41 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added stemID, which was only in mappings/VegCore-VegBIEN.csv

4858 09/19/2012 09:35 PM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: Inline $(unmappedTerms) because it's only used once

4857 09/19/2012 09:31 PM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: %/new_terms.csv: Include the entire map spreadsheet row, so that each new term is listed together with its mapping. This facilitates adding new mappings to mappings/Veg+-VegCore.csv directly from any new_terms.csv. Note that the use of `sort -u` (in lib/mappings.Makefile) causes multiline comments to be separated, leading to spurious lines for each multiline comment line.
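
The `sort -u` side effect can be reproduced in a small sketch. `line_sort_unique` mimics line-based sorting; `record_sort_unique` is a hypothetical record-aware alternative shown only for contrast, and is not what lib/mappings.Makefile does.

```python
import csv
import io


def line_sort_unique(text):
    """Sort and deduplicate physical lines, as `sort -u` does."""
    return sorted(set(text.splitlines()))


def record_sort_unique(text):
    """Sort and deduplicate whole CSV records, keeping multiline fields intact."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return sorted({tuple(row) for row in reader})


# One record with a two-line quoted comment, plus one single-line record
sample = 'siteName,"first line\nsecond line"\nplotID,short\n'
```

On this sample, the line-based version returns three entries because the quoted comment is broken in two, while the record-aware version returns two.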

4856 09/19/2012 09:19 PM Aaron Marcuse-Kubitza

inputs/: Added unmapped_terms.csv, new_terms.csv which were not already under version control

4855 09/19/2012 08:43 PM Aaron Marcuse-Kubitza

inputs/VegBank/plot_/: Automapped with new parentPlotID term, which now has a join mapping in mappings/VegCore-VegBIEN.csv

4854 09/19/2012 08:41 PM Aaron Marcuse-Kubitza

Regenerated unmapped_terms.csv, new_terms.csv

4853 09/19/2012 08:24 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Added parentPlotID

4852 09/19/2012 08:22 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Added parentLocationID, parentPlotName, which always map directly to the parent location, regardless of whether any subplot ID is present

4851 09/19/2012 08:16 PM Aaron Marcuse-Kubitza

mappings/Veg+.unmapped_terms.csv: Removed vague term volumeCanopy, which has no definition in VegX

4850 09/19/2012 08:14 PM Aaron Marcuse-Kubitza

mappings/Makefile: .VegCore.csv.last_cleanup: Fixed bug where the sorting columns needed to be changed to match the new column order

4849 09/19/2012 08:11 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Reordered columns to put Comments first, which matches mappings/Veg+-VegCore.csv

4848 09/19/2012 08:08 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Removed redundant stem_id->stemID mapping

4847 09/19/2012 08:07 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Standardized the capitalization of names, by camel-casing each name except for acronyms and "ID", which are made all uppercase

4846 09/19/2012 07:59 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Renamed diameterBreastHeight to diameterBreastHeight_m to assert units matching the VegBIEN field

4845 09/19/2012 07:44 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Removed duplicates

4844 09/19/2012 07:22 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Use new mappings/VegCore.csv as the VegCore vocabulary to canonicalize on, in order to also canonicalize VegCore terms which are not yet mapped to VegBIEN. This results in several DwC terms getting their case standardized according to http://rs.tdwg.org/dwc/terms/. Continue to determine unmapped terms using mappings/VegCore-VegBIEN.csv, because a term should not be considered mapped until it has been mapped all the way through to VegBIEN.

4843 09/19/2012 07:12 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Removed trailing spaces from terms

4842 09/19/2012 07:05 PM Aaron Marcuse-Kubitza

mappings/Veg+.unmapped_terms.csv: Removed duplicates of VegCore terms

4841 09/19/2012 07:02 PM Aaron Marcuse-Kubitza

mappings/: Split Veg+.terms.csv into VegCore.csv and Veg+.unmapped_terms.csv

4840 09/19/2012 06:36 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Removed terms that are in mappings/Veg+-VegCore.csv

4839 09/19/2012 06:31 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Added sources where missing

4838 09/19/2012 06:20 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Added Source and Comments columns from mappings/Veg+.terms.csv. Reordered columns to put Comments first.

4837 09/19/2012 06:17 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Removed duplicate entries for stem_id/stemID, collector

4836 09/19/2012 05:56 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

4835 09/19/2012 05:24 PM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/: Filter out invalid, frameshifted rows so they don't produce errors in the import or anomalies like thousands of taxondeterminations for one taxonoccurrence. This involves moving the CSVs to Specimen.src and using a create.sql to create the filtered table.

4834 09/19/2012 04:47 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Forward occurrenceID to taxonoccurrence.sourceaccessioncode when there is no other taxonoccurrence.sourceaccessioncode, to ensure that taxonoccurrence is uniquely identified so that there is one taxonoccurrence per organism

4833 09/19/2012 04:16 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode alternatives: Use _first instead of _alt because when one of these fields is present, it can be used directly even if it's sometimes NULL, without needing to spend a lot of time _alting together fields that won't be used. Datasources where the authortaxoncode is sometimes NULL usually have a separate sourceaccessioncode for the taxonoccurrence. (In the rare case that they don't, they should map a non-NULL field to recordNumber or tag to ensure that taxonoccurrences can be uniquely identified.)

4832 09/19/2012 04:07 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped tag to taxonoccurrence.authortaxoncode when the record is an organism, in case there is no other ID for the taxonoccurrence. This fixes a bug in FIA and TEAM data where all organisms in a plot used the same taxonoccurrence because taxonoccurrence was not properly constrained, causing the loss of individual taxondeterminations on each organism.

4831 09/19/2012 03:36 PM Aaron Marcuse-Kubitza

input.Makefile: Testing: %/test.by_col.xml: Do abort tester if by-column test fails. There are no longer small rowcount differences between row-based and column-based import on some datasources, so this is now possible.

4830 09/18/2012 11:13 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: stemobservation: stemobservation_unique_within_plantobservation unique index: Added tag so that a stemobservation can be scoped by its tag when no other ID is specified

4829 09/18/2012 11:11 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: stemobservation: stemobservation_unique_within_plantobservation unique index: Fixed bug where filter condition underconstrained stemobservation when neither sourceaccessioncode nor authorstemcode was specified, by making sure that at least one *_unique index always applies

4828 09/18/2012 11:08 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Remapped tag to new stemobservation.tag

4827 09/18/2012 11:06 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: stemobservation: Added tag, tags

4826 09/18/2012 10:53 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: tag: Removed no longer applicable comment

4825 09/18/2012 10:49 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Removed no longer used previousTag and the complex mapping logic that attempts to place both tags in VegBIEN in the correct order but does not work for column-based import. tag: Removed iscurrent=true because there is now only one tag field.