
# Date Author Comment
4147 08/21/2012 06:13 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added _if()

4146 08/21/2012 06:12 AM Aaron Marcuse-Kubitza

sql.py: function_exists(): Support overloaded functions

4145 08/21/2012 06:09 AM Aaron Marcuse-Kubitza

sql.py: run_query(): Parse "more than one" errors as DuplicateExceptions

4144 08/21/2012 05:42 AM Aaron Marcuse-Kubitza

xml_func.py: XML function specification documentation: Updated parameters

4143 08/21/2012 05:39 AM Aaron Marcuse-Kubitza

xml_func.py: Removed no longer needed _eq(), which has been translated to a SQL function

4142 08/21/2012 05:38 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added _eq()

4141 08/21/2012 05:37 AM Aaron Marcuse-Kubitza

sql.py: run_query(): Parse "could not determine polymorphic type because input has type "unknown"" errors as MissingCastExceptions to type text. This adds support for polymorphic SQL functions whose parameters are anyelement, etc.
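
The error parsing described in this entry and in revision 4145 could look roughly like the sketch below. It is only an illustration: the exception classes and `classify_error()` are hypothetical stand-ins rather than the actual sql.py code, and the only message fragments matched are the two quoted in this log.

```python
import re


class DbException(Exception):
    """Hypothetical base class for parsed database errors."""


class DuplicateException(DbException):
    """Raised for "more than one" errors (assumed name, taken from the log)."""


class MissingCastException(DbException):
    """Raised when a value needs an explicit cast to the given type."""

    def __init__(self, msg, type_):
        super().__init__(msg)
        self.type = type_


# Message fragment quoted in the log entry for revision 4141
_UNKNOWN_TYPE_RE = re.compile(
    r'could not determine polymorphic type because input has type "unknown"')


def classify_error(msg):
    """Map a database error message to a more specific exception object."""
    if _UNKNOWN_TYPE_RE.search(msg):
        # The log says these are treated as missing casts to type text
        return MissingCastException(msg, 'text')
    if 'more than one' in msg:
        return DuplicateException(msg)
    return DbException(msg)
```

A caller could then catch `MissingCastException`, add the cast named in `exc.type`, and retry the query, which is presumably what lets polymorphic (`anyelement`) parameters work with untyped literals.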

4140 08/21/2012 05:35 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): sql.MissingCastException: Support unknown (None) columns, by casting all columns

4139 08/21/2012 05:30 AM Aaron Marcuse-Kubitza

sql.py: MissingCastException: Support unknown (None) columns

4138 08/21/2012 05:29 AM Aaron Marcuse-Kubitza

xml_dom.py: replace_with_text(): Support bool `new` values

4137 08/21/2012 04:22 AM Aaron Marcuse-Kubitza

input.Makefile: Determine import order from sorted order of all non-hidden subdirs, instead of from fixed constant. This allows datasources to specify arbitrary tables, rather than being limited to 0.plots, 1.organisms, 2.stems, specimens.
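
As a rough Python illustration of the same idea (the real logic lives in input.Makefile), the import order can be derived by sorting the visible subdirectory names. The function name `table_subdirs()` is hypothetical, and skipping `_`-prefixed directories is an assumption based on the "_" prefix entries in revisions 4127 and 4130.

```python
import os


def table_subdirs(src_dir):
    """Return data table subdirs of src_dir in import order (sorted by name).

    Hidden dirs (leading ".") are skipped, as are dirs with a "_" prefix,
    which this sketch assumes mark non-table subdirs.
    """
    return sorted(
        name for name in os.listdir(src_dir)
        if os.path.isdir(os.path.join(src_dir, name))
        and not name.startswith(('.', '_'))
    )
```

Note that a plain sort is lexicographic, so a numeric prefix such as `10.` would sort before `2.` unless the prefixes are zero-padded.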

4136 08/21/2012 04:14 AM Aaron Marcuse-Kubitza

lib/common.Makefile: Added $(wildcard/) (needed because builtin $(wildcard) doesn't do / suffix correctly)

4135 08/21/2012 04:11 AM Aaron Marcuse-Kubitza

input.Makefile: src/%/map.full.csv: Fixed bug where $(srcMap) could not be listed in the prerequisites, because doing so would for some reason cause src/%/map.full.csv to always be remade

4134 08/21/2012 03:47 AM Aaron Marcuse-Kubitza

input.Makefile: Src maps cleanup: Fixed bug where src.csv was using .map.csv.last_cleanup instead of .src.csv.last_cleanup as its .last_cleanup file

4133 08/21/2012 03:30 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Moved src/%/map.full.csv after src/%/map.csv now that the filenames are fixed, so pattern matching order isn't an issue

4132 08/21/2012 03:27 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: $(makeFullCsv): Removed no longer needed test for whether the $(coreSelfMap) exists, because Veg+'s self map always exists

4131 08/21/2012 03:12 AM Aaron Marcuse-Kubitza

input.Makefile: Src maps cleanup: Fixed bug where src.csv was using .map.csv.last_cleanup instead of .src.csv.last_cleanup as its .last_cleanup file

4130 08/21/2012 02:34 AM Aaron Marcuse-Kubitza

inputs/CTFS/src/1.organisms/: Added "_" prefix to prevent it from being treated as a data table subdir, before the DB export is mapped

4129 08/21/2012 02:20 AM Aaron Marcuse-Kubitza

inputs/CTFS/src/ERD.jpg: Made it a symlink to "STRI2011_DB v5.jpg" instead of a copy of it

4128 08/21/2012 02:11 AM Aaron Marcuse-Kubitza

Added inputs/CTFS/src/bci_01April2011.zip.url, which contains the original download URL for our copy of the CTFS database

4127 08/21/2012 01:31 AM Aaron Marcuse-Kubitza

inputs/CTFS/src/: Added "_" prefix to scripts_to_drop_extra_tables subdir to prevent it from being treated as a data table subdir

4126 08/21/2012 01:10 AM Aaron Marcuse-Kubitza

inputs/Makefile: Input data sync: Updated rsync filter for new subdirs layout

4125 08/21/2012 12:55 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Updated for new subdirs layout

4124 08/21/2012 12:17 AM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: Updated svn:ignores for new subdirs layout

4123 08/21/2012 12:08 AM Aaron Marcuse-Kubitza

inputs/Makefile: Import logs: Fixed bug where excluded install logs needed to be renamed according to the new name format (from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-log-files-into-subfolders>)

4122 08/20/2012 11:59 PM Aaron Marcuse-Kubitza

inputs: Moved log files into subfolders, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-log-files-into-subfolders>

4121 08/20/2012 11:01 PM Aaron Marcuse-Kubitza

input.Makefile: Merged Installation and Staging tables sections into Staging tables installation, since no other installation is performed. Removed "import/" prefix from non-file import-related targets.

4120 08/20/2012 10:20 PM Aaron Marcuse-Kubitza

inputs: Moved test outputs into subfolders, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-test-outputs-into-subfolders>

4119 08/20/2012 09:58 PM Aaron Marcuse-Kubitza

input.Makefile: Import to VegBIEN: Removed extra test for $(inputFiles), because when there are no inputs, $(tables) will be empty and import will automatically do nothing. Removed no longer needed $(inputFiles).

4118 08/20/2012 08:46 PM Aaron Marcuse-Kubitza

inputs: Moved maps into subfolders, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-maps-into-subfolders>

4117 08/20/2012 07:16 PM Aaron Marcuse-Kubitza

inputs: Replaced Veg+ prefix with map on via maps, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Replace-Veg-prefix-with-map-on-via-maps>

4116 08/20/2012 06:39 PM Aaron Marcuse-Kubitza

strings.py: concat(): Apply length limits by shrinking max_len by new raw_extra_len() of the strings. This also fixes a bug where multi-byte characters in str0 were not properly taken into account, leading to overly long strings. Added doc comment.
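
A minimal sketch of the byte-aware length limit described here, written for Python 3 with UTF-8 as the assumed encoding. The signatures and the choice to truncate `str0` (keeping `str1` whole) are assumptions, not the actual strings.py code.

```python
def raw_extra_len(s, encoding='utf-8'):
    """Number of bytes the encoded string needs beyond its character count."""
    return len(s.encode(encoding)) - len(s)


def concat(str0, str1, max_len, encoding='utf-8'):
    """Concatenate str0 and str1, truncating str0 to respect a byte limit.

    max_len is a limit in bytes. It is shrunk by the extra bytes that
    multi-byte characters need, and the truncation itself is done on
    characters so that no character is split.
    """
    max_len -= raw_extra_len(str0, encoding) + raw_extra_len(str1, encoding)
    keep = max(max_len - len(str1), 0)  # characters of str0 that still fit
    return str0[:keep] + str1
```

Shrinking by the extra length of all of `str0` is conservative: the result can end up shorter than strictly necessary, but it does not exceed `max_len` bytes as long as `str1` alone fits.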

4115 08/20/2012 06:29 PM Aaron Marcuse-Kubitza

strings.py: Added raw_extra_len()

4114 08/20/2012 06:17 PM Aaron Marcuse-Kubitza

sql_gen.py: NoUnderlyingTableException: Take a (required) parameter for the item that had no underlying table, and provide this wherever a NoUnderlyingTableException is created

4113 08/20/2012 06:16 PM Aaron Marcuse-Kubitza

strings.py: concat(): Perform substring operation on Unicode strings so that substring does not split Unicode characters. Still use to_raw_str() to calculate the str1 length because Unicode characters can be multi-byte, and length limits often apply to the byte length, not the character length.

4112 08/20/2012 06:13 PM Aaron Marcuse-Kubitza

exc.py: add_msg(): Fixed bug where the Unicode string needed to be converted back into a raw string, because Python's top-level exception handler doesn't support Unicode strings as exception messages

4111 08/20/2012 05:22 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

4110 08/17/2012 07:53 PM Aaron Marcuse-Kubitza

inputs: Renamed stems table to 2.stems so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>

4109 08/17/2012 07:49 PM Aaron Marcuse-Kubitza

inputs: Renamed organisms table to 1.organisms so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>

4108 08/17/2012 07:30 PM Aaron Marcuse-Kubitza

inputs: Renamed plots table to 0.plots so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>

4107 08/17/2012 07:30 PM Aaron Marcuse-Kubitza

inputs: Renamed plots table to 0.plots so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>

4106 08/17/2012 07:00 PM Aaron Marcuse-Kubitza

input.Makefile: Mapping: If table subdir contains no input files, print warning instead of aborting. This situation occurs when renaming a version-controlled directory, whose previous version persists as an empty dir until committing.

4105 08/17/2012 06:41 PM Aaron Marcuse-Kubitza

input.Makefile: Mapping: Removed no longer used $(<in) and test for it in $(map)

4104 08/17/2012 06:37 PM Aaron Marcuse-Kubitza

input.Makefile: Mapping: $(map): Removed no longer used test for $(mapEnv)

4103 08/17/2012 05:50 PM Aaron Marcuse-Kubitza

sql.py: run_query(): Exception handling: Fixed bug where PostgreSQL 9.1 PL/Python errors have a different format than PostgreSQL 9.0 which needs to be supported separately. This format was already supported in sql_gen.plpythonu_error_handler, but also needed to be supported for exceptions that propagate back to the client.

4102 08/17/2012 05:34 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV/src/: Removed source files because they shouldn't be under version control. (They are synchronized via `make inputs/download`.)

4101 08/17/2012 05:15 PM Aaron Marcuse-Kubitza

inputs: Moved src files into VegCSV subfolders (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#CSV-representation), with table suffixes removed, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders>

4100 08/17/2012 04:26 PM Aaron Marcuse-Kubitza

util.py: dict_subset(): Fall back to using dict when OrderedDict is not available, in order to support making the maps on nimoy
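
The fallback pattern described here might look like the following sketch. The behavior of `dict_subset()` shown (keep the requested keys that exist, in the requested order) is an assumption for illustration, as is the `_dict_type` name.

```python
try:
    # Preserves key order where available
    from collections import OrderedDict as _dict_type
except ImportError:
    # Older Python without OrderedDict: fall back to a plain dict
    _dict_type = dict


def dict_subset(dict_, keys):
    """Return a new mapping with only the given keys that exist in dict_."""
    subset = _dict_type()
    for key in keys:
        if key in dict_:
            subset[key] = dict_[key]
    return subset
```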

4099 08/17/2012 04:02 PM Aaron Marcuse-Kubitza

mappings/: Removed now-inaccurate ".stems" suffix from VegX-VegCore.stems.csv, which actually applied to all tables

4098 08/17/2012 03:59 PM Aaron Marcuse-Kubitza

mappings/: Removed no longer used ".specimens" suffix from maps, which is now the same for all maps

4097 08/17/2012 03:52 PM Aaron Marcuse-Kubitza

mappings/: Removed no longer used plots, organisms, and stems maps, which were copies of the specimens map

4096 08/17/2012 03:48 PM Aaron Marcuse-Kubitza

input.Makefile: Core maps: Always use the specimens "table", since there are now no longer separate mappings for different tables, and the other tables' maps in mappings/ are merely copies of the specimens table's map

4095 08/17/2012 03:30 PM Aaron Marcuse-Kubitza

input.Makefile: Removed no longer used custom via maps code, so that map files no longer need a prefix (which is always the same) specifying that they map through Veg+. Veg+ thus serves as the single gateway to VegBIEN, which avoids ever again having to maintain two copies of the mappings, as was the case when DwC and VegX XPaths were separate gateways. This will assist in untying the complex mapping logic in input.Makefile from file naming conventions in mappings/, and simplify the task of grouping each map with the CSV it maps.

4094 08/17/2012 03:14 PM Aaron Marcuse-Kubitza

input.Makefile: Removed no longer used DB inputs section, because all of our inputs are either CSV or (rarely) XML. This removes a significant amount of dead code that will make it easier to refactor input.Makefile to use custom CSV import orders.

4093 08/17/2012 02:51 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.specimens.csv: Added mappings for miscellaneous terms

4092 08/17/2012 02:45 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added miscellaneous terms

4091 08/17/2012 12:52 PM Aaron Marcuse-Kubitza

to_do/: svn:ignore OpenOffice lock files

4090 08/17/2012 12:50 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import. The import time for SpeciesLink (the slowest datasource) went back down to 9 hours after replacing the slower _merge with _alt.

4089 08/16/2012 08:34 PM Aaron Marcuse-Kubitza

Added new autogen mappings/VegCore.self.specimens.csv (not currently used)

4088 08/16/2012 08:30 PM Aaron Marcuse-Kubitza

Merged DwC (including DwC1) and VegCSV mappings into new Veg+ schema. This involves replacing occurrences of DwC and VegCSV with Veg+ (or sometimes VegCore) everywhere, as described in <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV-DwC_merging>.

4087 08/16/2012 08:18 PM Aaron Marcuse-Kubitza

README.TXT: Schema changes: Updated filenames of PDF ERD exports

4086 08/16/2012 08:15 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

4085 08/16/2012 08:12 PM Aaron Marcuse-Kubitza

xpath.py: parse(): _value(): Support '+' as a word character that doesn't need to be quoted

4084 08/16/2012 06:54 PM Aaron Marcuse-Kubitza

intersect: Fixed bug where test for ignore option needed to be removed, because ignore is not supported by this program

4083 08/16/2012 06:45 PM Aaron Marcuse-Kubitza

util.py: list_subset(): Fixed bug where using '+' to append the rest of the list didn't work if '+' was the first index, because max() cannot be called on an empty list
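
To illustrate the kind of bug described, here is a hypothetical reconstruction. The semantics of `list_subset()` are assumed: integer indexes select elements, and `'+'` appends everything after the highest index used so far. The guard on the empty `used` list is the point of the sketch, since `max([])` raises `ValueError`.

```python
def list_subset(list_, idxs):
    """Select elements by index; '+' appends the rest of the list."""
    subset = []
    used = []  # integer indexes seen so far
    for idx in idxs:
        if idx == '+':
            # max() fails on an empty list, so start from 0 if '+' comes first
            start = max(used) + 1 if used else 0
            subset.extend(list_[start:])
        else:
            subset.append(list_[idx])
            used.append(idx)
    return subset
```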

4082 08/16/2012 05:14 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Added VegCSV mappings, to enable use of one VegCSV-VegBIEN mapping for specimens and plots data

4081 08/16/2012 05:12 PM Aaron Marcuse-Kubitza

inputs/XAL/maps/DwC.specimens.csv: Remapped FieldNumber to recordNumber because this historical DwC term (http://rs.tdwg.org/dwc/terms/history/index.htm#fieldNumber-2009-04-24) has close to the same meaning as recordNumber, but not the same meaning as the current fieldNumber term

4080 08/16/2012 04:55 PM Aaron Marcuse-Kubitza

inputs/SpeciesLink/maps/DwC.specimens.csv: Remapped fieldNumber to recordNumber because term usage was inconsistent with DwC definition. Datasources often confuse this term, because it seems like the collection number, but is actually the author code for the event (VegBank's authorObsCode).

4079 08/16/2012 04:28 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Added additional VegCSV mappings for mergability. taxonoccurrence.authortaxoncode: Added alternative mappings from VegCSV for mergability.

4078 08/16/2012 04:21 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Apply pass-through optimizations for _if statements with no condition (which means false). This facilitates automated testing after an _if statement has been added, because the put template provided as part of the automated test will only change for those datasources that actually have a condition entry for the _if statement, which greatly reduces the number of tests that need to be accepted. (Note that the path before the _if will still be included as an empty path if there are no other mappings to that table, because the _if statement does not surround it.)

4077 08/16/2012 02:26 PM Aaron Marcuse-Kubitza

mappings/VegCSV-VegBIEN.specimens.csv: Added DwC mappings, to enable use of one VegCSV-VegBIEN mapping for specimens and plots data

4076 08/16/2012 02:22 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Moved collectionnumber from specimenreplicate to plantobservation to replace authorplantcode, since these terms are used analogously in plots and specimens data. This code is really the DwC recordNumber (VegBIEN collectionnumber), which "serves as a link between field notes and an Occurrence record, such as a specimen [or plots data] collector's number" (http://rs.tdwg.org/dwc/terms/#recordNumber). Also, this prevents a specimenreplicate from incorrectly being created when plots data provides an authorplantcode.

4075 08/16/2012 01:55 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Mapped individualID for mergability with VegCSV

4074 08/16/2012 01:49 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: Split occurrenceID into occurrenceID and individualID, where individualID refers to the plant in plots data and occurrenceID refers to the specimen in specimens data. This prevents plant sourceaccessioncodes from being mapped to the specimenreplicate, which was messing up stems mappings for the parent plantobservation. It also avoids mapping the specimenreplicate sourceaccessioncode to additional tables where it isn't needed. (Note that occurrenceID is needed for location to ensure that each specimen gets its own location to make locationdeterminations on. Everything else is directly or indirectly scoped by location when its own sourceaccessioncode isn't specified.)

4073 08/16/2012 01:33 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence: Removed catalogNumber mapping because the catalogNumber applies only to the specimen, not to the occurrence, especially in plots data

4072 08/16/2012 01:14 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence: Map everything except occurrenceID (which is globally unique) to new authortaxoncode, which only needs to be unique within the locationevent

4071 08/16/2012 12:59 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonoccurrence: Renamed taxonoccurrence_locationevent_1_to_1 to taxonoccurrence_unique_within_locationevent and added new authortaxoncode to it

4070 08/16/2012 12:57 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonoccurrence: Added authortaxoncode to store unique keys that are unique within the locationevent rather than within the datasource

4069 08/16/2012 12:43 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV/maps/VegCSV.organisms.csv: Added _alt to height_m, stem_height_m to choose between them when both are specified (rather than having bin/map choose their priority order based on their order in the map). Note that when both of the heights are specified, they are always either the same, or height_m is invalid (see <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Some-organisms-have-one-stem-but-different-heights-in-the-organisms-and-stems-tables>).

4068 08/16/2012 12:39 PM Aaron Marcuse-Kubitza

bin/map: collision_suffix: Setting back to _alt to test if _merge caused the SpeciesLink slowdown. SpeciesLink contains a huge number of equivalent columns due to each DwC term being present with namespaces for all versions of the DwC schema, and these columns can be combined either using _alt or _merge. _merge is only useful if the values in different versions of the same DwC field are different, which is not likely the case.
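
The trade-off between the two collision strategies can be sketched in Python as follows. This is an interpretation of the log, not the real implementation: `alt()` is assumed to pick the first non-null value, `merge()` is assumed to combine the distinct non-null values, and the `'; '` delimiter is made up for the example.

```python
def alt(*values):
    """Return the first non-None value, or None if there is none."""
    for value in values:
        if value is not None:
            return value
    return None


def merge(*values, delim='; '):
    """Join the distinct non-None values in their original order."""
    distinct = []
    for value in values:
        if value is not None and value not in distinct:
            distinct.append(value)
    return delim.join(distinct) if distinct else None
```

When all colliding columns hold the same value, as is likely for the same DwC term repeated under several namespaces, both functions return that value, so the extra work done by `merge()` gains nothing.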

4067 08/16/2012 12:29 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import. The import time for SpeciesLink (the slowest datasource) doubled, to 16 hours, most likely due to replacing _alt with the slower _merge, which preserves more input data.

4066 08/15/2012 11:30 AM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: occurrenceID: Mapped to location.authorlocationcode instead of sourceaccessioncode so that it would not override any location- or event-related IDs in location.authorlocationcode merely by being mapped to the sourceaccessioncode field (which takes precedence over the authorlocationcode when specified)

4065 08/15/2012 10:43 AM Aaron Marcuse-Kubitza

mappings/VegCSV-VegBIEN.specimens.csv: occurrenceID: Mapped to specimenreplicate.sourceaccessioncode for mergability with DwC

4064 08/15/2012 09:14 AM Aaron Marcuse-Kubitza

mappings/VegCSV-VegBIEN.specimens.csv: Mapped voucherType to indirect voucher _if statements' conditions

4063 08/15/2012 09:02 AM Aaron Marcuse-Kubitza

mappings/VegCSV-VegBIEN.specimens.csv: locationID: location.sourceaccessioncode mapping: Added /_alt suffix for mergability with DwC

4062 08/15/2012 08:53 AM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: collectionID: Mapped to location.authorlocationcode as merge with collectionCode, the same way as it is for specimenreplicate.collectioncode_dwc

4061 08/15/2012 08:23 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: location: location_unique_within_datasource_by_authorlocationcode unique index: Added `parent_id IS NULL` condition so that an authorlocationcode is not unintentionally treated as globally unique when a parent location is available (which implies that the authorlocationcode is a subplot code)
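
The effect of the added condition can be tried out with a partial unique index. The sketch below uses SQLite (via Python's `sqlite3`) instead of PostgreSQL purely so that it runs standalone. The table layout is a simplified assumption, and the `datasource_id` and `location_id` columns are hypothetical.

```python
import sqlite3


def demo_partial_unique_index():
    """Return (top-level duplicate rejected?, final row count)."""
    conn = sqlite3.connect(':memory:')
    conn.execute("""
        CREATE TABLE location (
            location_id INTEGER PRIMARY KEY,
            datasource_id INTEGER NOT NULL,
            parent_id INTEGER,
            authorlocationcode TEXT
        )""")
    # Uniqueness is enforced only for locations without a parent
    conn.execute("""
        CREATE UNIQUE INDEX location_unique_within_datasource_by_authorlocationcode
        ON location (datasource_id, authorlocationcode)
        WHERE parent_id IS NULL""")
    conn.execute("INSERT INTO location VALUES (1, 1, NULL, 'plotA')")
    conn.execute("INSERT INTO location VALUES (2, 1, NULL, 'plotB')")
    # Subplots may reuse the same code under different parents
    conn.execute("INSERT INTO location VALUES (3, 1, 1, '1')")
    conn.execute("INSERT INTO location VALUES (4, 1, 2, '1')")
    try:
        # A second top-level 'plotA' in the same datasource is a duplicate
        conn.execute("INSERT INTO location VALUES (5, 1, NULL, 'plotA')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    count = conn.execute("SELECT count(*) FROM location").fetchone()[0]
    conn.close()
    return rejected, count
```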

4060 08/15/2012 08:20 AM Aaron Marcuse-Kubitza

mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Added location.authorlocationcode mapping for mergability with DwC

4059 08/15/2012 08:13 AM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: location.authorlocationcode mappings: Added /_alt/3 for mergability with VegCSV mappings to same field

4058 08/15/2012 08:05 AM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Wrapped all mappings in direct voucher _if for mergability with VegCSV

4057 08/15/2012 07:57 AM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Moved direct/indirect voucher _if inwards to wrap just the value of catalognumber_dwc, not the catalognumber_dwc field node, to match the corresponding VegCSV mapping

4056 08/15/2012 07:48 AM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Replaced _alt with _merge where applicable to avoid losing source data on import when multiple fields collide

4055 08/15/2012 07:46 AM Aaron Marcuse-Kubitza

mappings/VegCSV-VegBIEN.specimens.csv: Cleaned up using `make mappings/`

4054 08/15/2012 07:18 AM Aaron Marcuse-Kubitza

schemas/functions.sql: join_strs_transform(): Use STRICT optimization to avoid needing to manually check if the state value or input value is NULL (http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html#AEN51596)

4053 08/15/2012 07:15 AM Aaron Marcuse-Kubitza

schemas/functions.sql: join_strs(), join_strs_transform(): Reversed order of params to enable strict optimization, which replaces the state value with the first parameter, which used to be the delimiter (http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html#AEN51596)
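
The reason for reversing the parameters can be seen by simulating how PostgreSQL runs an aggregate whose transition function is STRICT and whose initial state is NULL: rows with any NULL input are skipped, and the first argument of the first remaining row becomes the state without calling the function. The Python below is only a model of that behavior, with function bodies assumed from the names in the log.

```python
def join_strs_transform(state, value, delim):
    """Transition function: never sees None, because it is treated as strict."""
    return state + delim + value


def join_strs(rows):
    """Simulate a strict aggregate over (value, delim) rows."""
    state = None
    for value, delim in rows:
        if value is None or delim is None:
            continue  # strict: rows with a NULL input are ignored
        if state is None:
            # The first argument replaces the NULL state, which is why the
            # value (not the delimiter) has to be the first parameter
            state = value
        else:
            state = join_strs_transform(state, value, delim)
    return state
```

With the old parameter order, the delimiter would have been the first argument and would have ended up as the initial state.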

4052 08/15/2012 07:07 AM Aaron Marcuse-Kubitza

Renamed join_strs_transform_preserve_empty() to join_strs_transform() now that there are no other join_strs_transform_...() functions

4051 08/15/2012 07:06 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Removed no longer used join_strs_transform_fold_empty()

4050 08/15/2012 07:06 AM Aaron Marcuse-Kubitza

schemas/functions.sql: join_strs() aggregate: Use join_strs_transform_preserve_empty() as an optimization because all our data has already had '' replaced with NULL by sql_io.cleanup_table() in csv2db. This will help speed up _merges now that they are performed on a large scale in the slowest datasource, SpeciesLink.

4049 08/15/2012 07:02 AM Aaron Marcuse-Kubitza

bin/map: collision_suffix: Changed to use _merge instead of _alt to avoid losing source data on import when multiple fields collide

4048 08/15/2012 06:58 AM Aaron Marcuse-Kubitza

bin/map: Preventing collisions if multiple inputs mapping to same output: Made collision suffix configurable so it can easily be changed