Regenerated vegbien.ERD exports
mappings: Added autogen Veg+-VegCore.to_self.csv, which is Veg+-VegCore.csv joined to itself, and used it as an intermediate map to join to VegCore-VegBIEN.csv. This provides support for two-level chains of mappings in Veg+-VegCore.csv.
mappings/Veg+-VegCore.csv: Changed output root to Veg+, to allow mappings/Veg+-VegCore.csv to be joined with itself idempotently, for supporting multi-level chains of mappings
mappings/Veg+-VegCore.csv: Added pass-through /_alt mapping for all terms in this map that are merged with _alt, to allow datasources to define custom mappings that don't pass through the default mapping. This also allows mappings/Veg+-VegCore.csv to be joined with itself idempotently, to support multi-level chains of mappings.
mappings/Veg+-VegCore.csv: authorPlantCode: Added _alt suffix to create the correct priority
union: Exclude empty rows from the output, so that empty mappings from map_0 aren't included when map_1 contains a non-empty mapping for the same term. Note that this causes "No non-empty join mapping" warnings to turn into "No join mapping".
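A minimal sketch of the empty-row rule described in the union entry above, assuming each map is a plain dict from input term to output path and that map_0 takes priority when both maps have a non-empty mapping for a term. The function name union_maps and the dict representation are illustrative assumptions, not the actual union script.

```python
def union_maps(map_0, map_1):
    """Merge two term->path maps, skipping empty mappings.

    Assumption: map_0 wins when both maps have a non-empty mapping
    for the same term. An empty mapping never shadows a non-empty one.
    """
    merged = {}
    for map_ in (map_0, map_1):
        for term, path in map_.items():
            if not path:
                continue  # empty rows are excluded from the output
            merged.setdefault(term, path)  # keep the first non-empty mapping
    return merged


if __name__ == "__main__":
    print(union_maps({"a": "", "b": "x"}, {"a": "y", "b": "z", "c": ""}))
```

Here the empty mapping for "a" in map_0 does not hide the non-empty one in map_1, and "c", which is empty everywhere, is dropped.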
ci_map: Run join_union_sort in quiet mode so that it doesn't add lots of "No non-empty join mapping" warnings to the Comments column
mappings/Veg+-VegCore.csv: scientificNameAuthor: Added scientificNameAuthorship mapping with /_alt/1, to ensure that it has priority over scientificNameAuthor and to ensure that it has an _alt suffix when a datasource contains both scientificNameAuthor and scientificNameAuthorship (such as SpeciesLink)
inputs/SpeciesLink/src/specimens/map.csv: Added explicit _alt suffix when multiple terms map to the same place
inputs/ARIZ/src/specimens/map.csv: RelatedCatalogItem mappings: Added _alt suffixes
union: Multi-support: When an input appears in both maps, treat an empty mapping as if it didn't exist so that it doesn't overwrite a non-empty mapping in the other map
mappings/Makefile: Veg+.cs-VegBIEN.csv: Join Veg+-VegCore.csv to VegCore-VegBIEN.csv in quiet mode, to avoid adding "No non-empty join mapping" to the Comments column
join: quiet mode: Turn off all warnings, not just "No input mapping" warnings. This is useful when join-unioning a synonymy to a primary map, which may produce "No non-empty join mapping" warnings for some terms that should not be stored in the resulting map's Comments column.
mappings/Makefile: Rewrapped lines
mappings/Veg+-VegCore.csv: Added verbatimGrowthForm mapping
mappings/Veg+.terms.csv: verbatimGrowthForm: Added comment that additional values come from SALVIAS. As other datasources' custom growth form values are added, they can be added to this comment.
mappings/Veg+.terms.csv: Added verbatimGrowthForm
schemas/vegbien.sql: locationdetermination: Added verbatimlatitude, verbatimlongitude, verbatimcoordinates
schemas/functions.sql: Made aggregating functions polymorphic
xml_func.py: Removed no longer used _collapse()
xml_func.py: Removed no longer needed _if(), which has been translated to a SQL function
schemas/functions.sql: Added _if()
sql.py: function_exists(): Support overloaded functions
sql.py: run_query(): Parse "more than one" errors as DuplicateExceptions
xml_func.py: XML function specification documentation: Updated parameters
xml_func.py: Removed no longer needed _eq(), which has been translated to a SQL function
schemas/functions.sql: Added _eq()
sql.py: run_query(): Parse "could not determine polymorphic type because input has type "unknown"" errors as MissingCastExceptions to type text. This adds support for polymorphic SQL functions whose parameters are anyelement, etc.
sql_io.py: put_table(): sql.MissingCastException: Support unknown (None) columns, by casting all columns
sql.py: MissingCastException: Support unknown (None) columns
xml_dom.py: replace_with_text(): Support bool `new` values
input.Makefile: Determine import order from sorted order of all non-hidden subdirs, instead of from fixed constant. This allows datasources to specify arbitrary tables, rather than being limited to 0.plots, 1.organisms, 2.stems, specimens.
lib/common.Makefile: Added $(wildcard/) (needed because builtin $(wildcard) doesn't do / suffix correctly)
input.Makefile: src/%/map.full.csv: Fixed bug where having $(srcMap) in the prerequisites would, for some reason, cause src/%/map.full.csv to always be remade
input.Makefile: Src maps cleanup: Fixed bug where src.csv was using .map.csv.last_cleanup instead of .src.csv.last_cleanup as its .last_cleanup file
input.Makefile: Maps building: Moved src/%/map.full.csv after src/%/map.csv now that the filenames are fixed, so pattern matching order isn't an issue
input.Makefile: Maps building: $(makeFullCsv): Removed no longer needed test for whether the $(coreSelfMap) exists, because Veg+'s self map always exists
inputs/CTFS/src/1.organisms/: Added "_" prefix to prevent it from being treated as a data table subdir, before the DB export is mapped
inputs/CTFS/src/ERD.jpg: Made it a symlink to "STRI2011_DB v5.jpg" instead of a copy of it
Added inputs/CTFS/src/bci_01April2011.zip.url, which contains the original download URL for our copy of the CTFS database
inputs/CTFS/src/: Added "_" prefix to scripts_to_drop_extra_tables subdir to prevent it from being treated as a data table subdir
inputs/Makefile: Input data sync: Updated rsync filter for new subdirs layout
README.TXT: Datasource setup: Updated for new subdirs layout
input.Makefile: SVN: add: Updated svn:ignores for new subdirs layout
inputs/Makefile: Import logs: Fixed bug where excluded install logs needed to be renamed according to the new name format (from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-log-files-into-subfolders>)
inputs: Moved log files into subfolders, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-log-files-into-subfolders>
input.Makefile: Merged Installation and Staging tables sections into Staging tables installation, since no other installation is performed. Removed "import/" prefix from non-file import-related targets.
inputs: Moved test outputs into subfolders, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-test-outputs-into-subfolders>
input.Makefile: Import to VegBIEN: Removed extra test for $(inputFiles), because when there are no inputs, $(tables) will be empty and import will automatically do nothing. Removed no longer needed $(inputFiles).
inputs: Moved maps into subfolders, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-maps-into-subfolders>
inputs: Replaced Veg+ prefix with map on via maps, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Replace-Veg-prefix-with-map-on-via-maps>
strings.py: concat(): Apply length limits by shrinking max_len by new raw_extra_len() of the strings. This also fixes a bug where multi-byte characters in str0 were not properly taken into account, leading to overly long strings. Added doc comment.
strings.py: Added raw_extra_len()
sql_gen.py: NoUnderlyingTableException: Take a (required) parameter for the item that had no underlying table, and provide this wherever a NoUnderlyingTableException is created
strings.py: concat(): Perform substring operation on Unicode strings so that substring does not split Unicode characters. Still use to_raw_str() to calculate the str1 length because Unicode characters can be multi-byte, and length limits often apply to the byte length, not the character length.
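A hedged sketch of the byte-aware length limit described in the concat() and raw_extra_len() entries above, written for Python 3 and assuming UTF-8 as the raw encoding. The bodies below are an illustration of the idea (shrink the character budget by the extra bytes of multi-byte characters, then slice the Unicode strings so no character is split), not the actual strings.py code.

```python
def raw_extra_len(s):
    """Extra bytes the UTF-8 form of s needs beyond one byte per character."""
    return len(s.encode("utf-8")) - len(s)


def concat(str0, str1, max_len):
    """Concatenate str0 and str1 so the UTF-8 byte length is <= max_len.

    str0 is truncated first. Slicing is done on Unicode strings, so a
    multi-byte character is never split. The result may be shorter than
    max_len, because the budget is shrunk conservatively.
    """
    # Shrink the character budget by the extra bytes of both strings.
    budget = max_len - (raw_extra_len(str0) + raw_extra_len(str1))
    str0_new_len = budget - len(str1)
    if str0_new_len < 0:  # str1 alone is too long
        return str1[:max(budget, 0)]
    return str0[:str0_new_len] + str1


if __name__ == "__main__":
    print(concat("abc", "def", 5))
```

Because the budget subtracts the extra bytes of the untruncated strings, the result can undershoot max_len, so comparing the result's length to max_len does not reveal whether truncation happened.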
exc.py: add_msg(): Fixed bug where the Unicode string needed to be converted back into a raw string, because Python's top-level exception handler doesn't support Unicode strings as exception messages
inputs/import.stats.xls: Updated with stats from latest import
inputs: Renamed stems table to 2.stems so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>
inputs: Renamed organisms table to 1.organisms so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>
inputs: Renamed plots table to 0.plots so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>
input.Makefile: Mapping: If table subdir contains no input files, print warning instead of aborting. This situation occurs when renaming a version-controlled directory, whose previous version persists as an empty dir until committing.
input.Makefile: Mapping: Removed no longer used $(<in) and test for it in $(map)
input.Makefile: Mapping: $(map): Removed no longer used test for $(mapEnv)
sql.py: run_query(): Exception handling: Fixed bug where PostgreSQL 9.1 PL/Python errors have a different format than in PostgreSQL 9.0, which needs to be supported separately. This format was already supported in sql_gen.plpythonu_error_handler, but also needed to be supported for exceptions that propagate back to the client.
inputs/SALVIAS-CSV/src/: Removed source files because they shouldn't be under version control. (They are synchronized via `make inputs/download`.)
inputs: Moved src files into VegCSV subfolders (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#CSV-representation), with table suffixes removed, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders>
util.py: dict_subset(): Fall back to using dict when OrderedDict is not available, in order to support making the maps on nimoy
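The fallback in the dict_subset() entry above can be sketched as follows. The signature and body of dict_subset() are assumptions for illustration. Only the import fallback pattern is the point: collections.OrderedDict was added in Python 2.7, so older interpreters need a plain dict instead.

```python
try:
    from collections import OrderedDict
except ImportError:  # Python < 2.7 has no OrderedDict
    OrderedDict = dict  # ordering is lost, but the code still runs


def dict_subset(dict_, keys):
    """Return the entries of dict_ whose keys are in keys, in keys order.

    Hypothetical signature; missing keys are silently skipped.
    """
    subset = OrderedDict()
    for key in keys:
        try:
            subset[key] = dict_[key]
        except KeyError:
            pass
    return subset


if __name__ == "__main__":
    print(list(dict_subset({"a": 1, "b": 2, "c": 3}, ["c", "a", "x"]).items()))
```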
mappings/: Removed now-inaccurate ".stems" suffix from VegX-VegCore.stems.csv, which actually applied to all tables
mappings/: Removed no longer used ".specimens" suffix from maps, which is now the same for all maps
mappings/: Removed no longer used plots, organisms, and stems maps, which were copies of the specimens map
input.Makefile: Core maps: Always use the specimens "table", since there are now no longer separate mappings for different tables, and the other tables' maps in mappings/ are merely copies of the specimens table's map
input.Makefile: Removed no longer used custom via maps code, so that map files no longer need a prefix (which is always the same) specifying that they map through Veg+. Veg+ thus serves as the single gateway to VegBIEN, which avoids ever again having to maintain two copies of the mappings, as was the case when DwC and VegX XPaths were separate gateways. This will assist in untying the complex mapping logic in input.Makefile from file naming conventions in mappings/, and simplify the task of grouping each map with the CSV it maps.
input.Makefile: Removed no longer used DB inputs section, because all of our inputs are either CSV or (rarely) XML. This removes a significant amount of dead code that will make it easier to refactor input.Makefile to use custom CSV import orders.
mappings/Veg+-VegCore.specimens.csv: Added mappings for miscellaneous terms
mappings/Veg+.terms.csv: Added miscellaneous terms
to_do/: svn:ignore OpenOffice lock files
inputs/import.stats.xls: Updated with stats from latest import. The import time for SpeciesLink (the slowest datasource) went back down to 9 hours after replacing the slower _merge with _alt.
Added new autogen mappings/VegCore.self.specimens.csv (not currently used)
Merged DwC (including DwC1) and VegCSV mappings into new Veg+ schema. This involves replacing occurrences of DwC and VegCSV with Veg+ (or sometimes VegCore) everywhere, as described in <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV-DwC_merging>.
README.TXT: Schema changes: Updated filenames of PDF ERD exports
xpath.py: parse(): _value(): Support '+' as a word character that doesn't need to be quoted
intersect: Fixed bug where test for ignore option needed to be removed, because ignore is not supported by this program
util.py: list_subset(): Fixed bug where using '+' to append the rest of the list didn't work if '+' was the first index, because max() cannot be called on an empty list
mappings/DwC2-VegBIEN.specimens.csv: Added VegCSV mappings, to enable use of one VegCSV-VegBIEN mapping for specimens and plots data
inputs/XAL/maps/DwC.specimens.csv: Remapped FieldNumber to recordNumber because this historical DwC term (http://rs.tdwg.org/dwc/terms/history/index.htm#fieldNumber-2009-04-24) has close to the same meaning as recordNumber, but not the same meaning as the current fieldNumber term
inputs/SpeciesLink/maps/DwC.specimens.csv: Remapped fieldNumber to recordNumber because term usage was inconsistent with DwC definition. Datasources often confuse this term, because it seems like the collection number, but is actually the author code for the event (VegBank's authorObsCode).
mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Added additional VegCSV mappings for mergeability. taxonoccurrence.authortaxoncode: Added alternative mappings from VegCSV for mergeability.
xml_func.py: simplify(): Apply pass-through optimizations for _if statements with no condition (which means false). This facilitates automated testing after an _if statement has been added, because the put template provided as part of the automated test will only change for those datasources that actually have a condition entry for the _if statement, which greatly reduces the number of tests that need to be accepted. (Note that the path before the _if will still be included as an empty path if there are no other mappings to that table, because the _if statement does not surround it.)
mappings/VegCSV-VegBIEN.specimens.csv: Added DwC mappings, to enable use of one VegCSV-VegBIEN mapping for specimens and plots data
schemas/vegbien.sql: Moved collectionnumber from specimenreplicate to plantobservation to replace authorplantcode, since these terms are used analogously in plots and specimens data. This code is really the DwC recordNumber (VegBIEN collectionnumber), which "serves as a link between field notes and an Occurrence record, such as a specimen [or plots data] collector's number" (http://rs.tdwg.org/dwc/terms/#recordNumber). Also, this prevents a specimenreplicate from incorrectly being created when plots data provides an authorplantcode.
mappings/DwC2-VegBIEN.specimens.csv: Mapped individualID for mergeability with VegCSV
mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: Split occurrenceID into occurrenceID and individualID, where individualID refers to the plant in plots data and occurrenceID refers to the specimen in specimens data. This prevents plant sourceaccessioncodes from being mapped to the specimenreplicate, which was messing up stems mappings for the parent plantobservation. It also avoids mapping the specimenreplicate sourceaccessioncode to additional tables where it isn't needed. (Note that occurrenceID is needed for location to ensure that each specimen gets its own location to make locationdeterminations on. Everything else is directly or indirectly scoped by location when its own sourceaccessioncode isn't specified.)
mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence: Removed catalogNumber mapping because the catalogNumber applies only to the specimen, not to the occurrence, especially in plots data
mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence: Map everything except occurrenceID (which is globally unique) to new authortaxoncode, which only needs to be unique within the locationevent
schemas/vegbien.sql: taxonoccurrence: Renamed taxonoccurrence_locationevent_1_to_1 to taxonoccurrence_unique_within_locationevent and added new authortaxoncode to it
schemas/vegbien.sql: taxonoccurrence: Added authortaxoncode to store unique keys that are unique within the locationevent rather than within the datasource