Activity

From 07/25/2012 to 08/23/2012

08/23/2012

05:32 PM Revision 4205: mappings/VegCore-VegBIEN.csv: Primary taxondetermination: Removed [role=identifier] because the role of the entity making the determination is unknown. Added [!isoriginal] filter to those mappings to ensure that primary taxondetermination XPaths map to a different taxondetermination than the [isoriginal=true] determination when both are present.
Aaron Marcuse-Kubitza
05:24 PM Revision 4204: inputs/SALVIAS*/1.organisms/map.csv: Remapped cfaff to identificationQualifier, because it was previously mapped to the same taxondetermination as the Orig* terms but does not have a corresponding Orig prefix to indicate that it should apply to the original determination instead of the primary TNRS one
Aaron Marcuse-Kubitza
05:19 PM Revision 4203: mappings/Veg+.terms.csv: Removed no longer used computer.* taxonomic terms
Aaron Marcuse-Kubitza
05:19 PM Revision 4202: mappings/VegCore-VegBIEN.csv: Removed no longer used computer.* taxonomic terms
Aaron Marcuse-Kubitza
05:18 PM Revision 4201: inputs: Regenerated VegBIEN.csv for several datasources, which had apparently not gotten regenerated when make was run after the taxonRank mapping addition
Aaron Marcuse-Kubitza
05:00 PM Revision 4200: backups/: svn:ignore: Also ignore .*, which includes temp files generated by rsync
Aaron Marcuse-Kubitza
04:58 PM Revision 4199: xml_func.py: simplify(): Also consider _name() to be an aggregate function
Aaron Marcuse-Kubitza
04:57 PM Revision 4198: xml_func.py: simplify(): Also consider _name() to be an aggregate function
Aaron Marcuse-Kubitza
04:49 PM Revision 4197: inputs/SALVIAS*/1.organisms/map.csv: Removed computer.* prefix from primary (TNRS) taxondetermination, so it would map to the main taxondetermination in VegBIEN
Aaron Marcuse-Kubitza
04:46 PM Revision 4196: mappings/VegCore-VegBIEN.csv: Mapped taxonRank analogously to computer.taxonRank
Aaron Marcuse-Kubitza
04:34 PM Revision 4195: inputs/SALVIAS*/1.organisms/map.csv: Remapped OrigFamily/OrigGenus/OrigSpecies to new verbatim* taxonomic names. Also remapped cfaff to verbatimIdentificationQualifier, because it was previously mapped to the same taxondetermination as the Orig* terms, but this will later need to be remapped to identificationQualifier (not in this commit because that is a separate change). Note that the switch to the verbatim* taxonomic names removes a concatenated binomial that was part of the previous mappings, which put OrigGenus and OrigSpecies together into one scientificName.
Aaron Marcuse-Kubitza
03:34 PM Revision 4194: mappings/VegCore-VegBIEN.csv: Mapped verbatimScientificName to taxonoccurrence.authortaxoncode as an alternative to scientificName
Aaron Marcuse-Kubitza
03:12 PM Revision 4193: mappings/VegCore-VegBIEN.csv: Mapped verbatim* taxonomic terms
Aaron Marcuse-Kubitza
03:10 PM Revision 4192: mappings/Veg+.terms.csv: Added verbatimIdentificationQualifier
Aaron Marcuse-Kubitza
03:07 PM Revision 4191: mappings/Veg+.terms.csv: Added verbatimScientificName
Aaron Marcuse-Kubitza
03:06 PM Revision 4190: schemas/vegbien.sql: taxondetermination: taxondetermination_unique unique index: Added isoriginal so an "original" determination in the same row (as found in SALVIAS) will be seen as distinct from the scrubbed determination, even if they are to the same plant name
Aaron Marcuse-Kubitza
02:57 PM Revision 4189: mappings/VegCore-VegBIEN.csv: taxonomic terms: Removed ":[isoriginal=true]" because there may be multiple determinations for an organism (either in separate rows or, for SALVIAS, in separate columns), and not all will be the original determination
Aaron Marcuse-Kubitza
02:43 PM Revision 4188: schemas/vegbien.sql: taxondetermination.role: Default to 'unknown' so that the field is optional
Aaron Marcuse-Kubitza
02:41 PM Revision 4187: schemas/vegbien.sql: role enum: Added 'unknown' value
Aaron Marcuse-Kubitza
02:20 PM Revision 4186: mappings/Veg+.terms.csv: Added verbatim* taxonomic terms
Aaron Marcuse-Kubitza
02:12 PM Revision 4185: inputs/import.stats.xls: Updated with stats from latest import
Aaron Marcuse-Kubitza

08/22/2012

04:56 PM Revision 4184: inputs/import.stats.xls: Updated with stats from latest import
Aaron Marcuse-Kubitza
04:31 PM Revision 4183: inputs: Regenerated maps for changes to bin/union, which removes empty mappings. Added /_alt suffix where needed.
Aaron Marcuse-Kubitza
03:23 PM Revision 4182: inputs: Move src subdir into main dir, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-src-subdir-into-main-dir>
Aaron Marcuse-Kubitza
02:02 PM Revision 4181: input.Makefile: $(tables): Allow datasource to specify custom import order in src/import_order.txt
Aaron Marcuse-Kubitza
01:29 PM Revision 4180: mappings/Veg+.terms.csv: growthForm: Documented source of standard terms
Aaron Marcuse-Kubitza
10:21 AM Revision 4179: inputs/SALVIAS*/src/1.organisms/map.csv: Removed no longer applicable comments, which related to mappings that were in effect long ago
Aaron Marcuse-Kubitza
10:09 AM Revision 4178: inputs/SALVIAS/src/2.stems/map.csv: Added comments from corresponding SALVIAS-CSV organisms columns
Aaron Marcuse-Kubitza
09:54 AM Revision 4177: inputs/SALVIAS*/src/1.organisms/map.csv: Habit: Mapped to new Veg+ habit term
Aaron Marcuse-Kubitza
09:53 AM Revision 4176: inputs/SALVIAS*/src/1.organisms/map.csv: Habit: Don't filter out values not part of the provided terms list, because such values should be flagged as invalid in the error maps rather than silently discarded. This also ensures that any valid values which are not part of the provided terms list are kept.
Aaron Marcuse-Kubitza
09:45 AM Revision 4175: mappings/Veg+-VegCore.csv: habit: Map to new verbatimGrowthForm since this field is not necessarily standardized
Aaron Marcuse-Kubitza
09:42 AM Revision 4174: mappings/Makefile: Veg+.cs-VegBIEN.csv: Join new Veg+-VegCore.to_self.csv (self-join), instead of Veg+-VegCore.csv, to VegCore-VegBIEN.csv, to support two-level chains of mappings in Veg+-VegCore.csv
Aaron Marcuse-Kubitza
09:40 AM Revision 4173: mappings/Veg+-VegCore.csv: /_alt pass through mappings: Removed comment because the two-level mapping propagates it to all fields ending in /_alt, even though it doesn't apply to them, causing the main VegBIEN map and several datasources' maps to change unnecessarily. Also, the comment is not completely accurate because /_alt pass throughs are now used primarily to support idempotent self-joins of Veg+-VegCore.csv.
Aaron Marcuse-Kubitza
09:21 AM Revision 4172: union: Don't eliminate duplicate rows based on matches between map_0's *output* column and map_1's input column, because union is now being used for self-joins and it is legitimate for a term to appear as both an input and an output
Aaron Marcuse-Kubitza
09:10 AM Revision 4171: sql_io.py: put_table(): MissingCastException: Use strings.repr_no_u() instead of strings.urepr() in order to remove the u in u'...' for Unicode strings
Aaron Marcuse-Kubitza

08/21/2012

09:48 AM Revision 4170: README.TXT: After a new import: Updated commands for new subdirs layout
Aaron Marcuse-Kubitza
09:42 AM Revision 4169: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
09:34 AM Revision 4168: mappings: Added autogen Veg+-VegCore.to_self.csv, which is Veg+-VegCore.csv joined to itself, and use it as an intermediate map to join to VegCore-VegBIEN.csv. This provides support for two-level chains of mappings in Veg+-VegCore.csv.
Aaron Marcuse-Kubitza
09:31 AM Revision 4167: mappings/Veg+-VegCore.csv: Changed output root to Veg+, to allow mappings/Veg+-VegCore.csv to be joined with itself idempotently, for supporting multi-level chains of mappings
Aaron Marcuse-Kubitza
09:27 AM Revision 4166: mappings/Veg+-VegCore.csv: Add pass through /_alt mapping for all terms in this map that are merged with _alt, to allow datasource to define custom mappings that don't pass through the default mapping. This also allows mappings/Veg+-VegCore.csv to be joined with itself idempotently, to support multi-level chains of mappings.
Aaron Marcuse-Kubitza
09:19 AM Revision 4165: mappings/Veg+-VegCore.csv: authorPlantCode: Added _alt suffix to create the correct priority
Aaron Marcuse-Kubitza
09:13 AM Revision 4164: union: Exclude empty rows from the output, so that empty mappings from map_0 aren't included when map_1 contains a non-empty mapping for the same term. Note that this causes "No non-empty join mapping" warnings to turn into "No join mapping".
Aaron Marcuse-Kubitza
09:08 AM Revision 4163: ci_map: Run join_union_sort in quiet mode so that it doesn't add lots of "No non-empty join mapping" warnings to the Comments column
Aaron Marcuse-Kubitza
09:06 AM Revision 4162: mappings/Veg+-VegCore.csv: scientificNameAuthor: Added scientificNameAuthorship mapping with /_alt/1, to ensure that it has priority over scientificNameAuthor and to ensure that it has an _alt suffix when a datasource contains both scientificNameAuthor and scientificNameAuthorship (such as SpeciesLink)
Aaron Marcuse-Kubitza
09:00 AM Revision 4161: inputs/SpeciesLink/src/specimens/map.csv: Added explicit _alt suffix when multiple terms map to the same place
Aaron Marcuse-Kubitza
08:58 AM Revision 4160: mappings/Veg+-VegCore.csv: scientificNameAuthor: Added scientificNameAuthorship mapping with /_alt/1, to ensure that it has priority over scientificNameAuthor and to ensure that it has an _alt suffix when a datasource contains both scientificNameAuthor and scientificNameAuthorship (such as SpeciesLink)
Aaron Marcuse-Kubitza
08:31 AM Revision 4159: inputs/ARIZ/src/specimens/map.csv: RelatedCatalogItem mappings: Added _alt suffixes
Aaron Marcuse-Kubitza
08:09 AM Revision 4158: union: Multi-support: When an input appears in both maps, treat an empty mapping as if it didn't exist so that it doesn't overwrite a non-empty mapping in the other map
Aaron Marcuse-Kubitza
07:51 AM Revision 4157: mappings/Makefile: Veg+.cs-VegBIEN.csv: Join Veg+-VegCore.csv to VegCore-VegBIEN.csv in quiet mode, to avoid adding "No non-empty join mapping" to the Comments column
Aaron Marcuse-Kubitza
07:50 AM Revision 4156: join: quiet mode: Turn off all warnings, not just "No input mapping" warnings. This is useful when join-unioning a synonymy to a primary map, which may have "No non-empty join mapping" for some terms but this should not be stored in the resulting map's Comments column.
Aaron Marcuse-Kubitza
07:30 AM Revision 4155: mappings/Makefile: Rewrapped lines
Aaron Marcuse-Kubitza
07:28 AM Revision 4154: mappings/Veg+-VegCore.csv: Added verbatimGrowthForm mapping
Aaron Marcuse-Kubitza
07:09 AM Revision 4153: mappings/Veg+.terms.csv: verbatimGrowthForm: Added comment that additional values come from SALVIAS. As other datasources' custom growth form values are added, they can be added to this comment.
Aaron Marcuse-Kubitza
07:00 AM Revision 4152: mappings/Veg+.terms.csv: Added verbatimGrowthForm
Aaron Marcuse-Kubitza
06:44 AM Revision 4151: schemas/vegbien.sql: locationdetermination: Added verbatimlatitude, verbatimlongitude, verbatimcoordinates
Aaron Marcuse-Kubitza
06:22 AM Revision 4150: schemas/functions.sql: Made aggregating functions polymorphic
Aaron Marcuse-Kubitza
06:16 AM Revision 4149: xml_func.py: Removed no longer used _collapse()
Aaron Marcuse-Kubitza
06:13 AM Revision 4148: xml_func.py: Removed no longer needed _if(), which has been translated to a SQL function
Aaron Marcuse-Kubitza
06:13 AM Revision 4147: schemas/functions.sql: Added _if()
Aaron Marcuse-Kubitza
06:12 AM Revision 4146: sql.py: function_exists(): Support overloaded functions
Aaron Marcuse-Kubitza
06:09 AM Revision 4145: sql.py: run_query(): Parse "more than one" errors as DuplicateExceptions
Aaron Marcuse-Kubitza
05:42 AM Revision 4144: xml_func.py: XML function specification documentation: Updated parameters
Aaron Marcuse-Kubitza
05:39 AM Revision 4143: xml_func.py: Removed no longer needed _eq(), which has been translated to a SQL function
Aaron Marcuse-Kubitza
05:38 AM Revision 4142: schemas/functions.sql: Added _eq()
Aaron Marcuse-Kubitza
05:37 AM Revision 4141: sql.py: run_query(): Parse "could not determine polymorphic type because input has type "unknown"" errors as MissingCastExceptions to type text. This adds support for polymorphic SQL functions whose parameters are anyelement, etc.
Aaron Marcuse-Kubitza
05:35 AM Revision 4140: sql_io.py: put_table(): sql.MissingCastException: Support unknown (None) columns, by casting all columns
Aaron Marcuse-Kubitza
05:30 AM Revision 4139: sql.py: MissingCastException: Support unknown (None) columns
Aaron Marcuse-Kubitza
05:29 AM Revision 4138: xml_dom.py: replace_with_text(): Support bool `new` values
Aaron Marcuse-Kubitza
04:22 AM Revision 4137: input.Makefile: Determine import order from sorted order of all non-hidden subdirs, instead of from fixed constant. This allows datasources to specify arbitrary tables, rather than being limited to 0.plots, 1.organisms, 2.stems, specimens.
Aaron Marcuse-Kubitza
04:14 AM Revision 4136: lib/common.Makefile: Added $(wildcard/) (needed because builtin $(wildcard) doesn't do / suffix correctly)
Aaron Marcuse-Kubitza
04:11 AM Revision 4135: input.Makefile: src/%/map.full.csv: Fixed bug where couldn't have $(srcMap) in prerequisites because this would for some reason cause src/%/map.full.csv to always be remade
Aaron Marcuse-Kubitza
03:47 AM Revision 4134: input.Makefile: Src maps cleanup: Fixed bug where src.csv was using .map.csv.last_cleanup instead of .src.csv.last_cleanup as its .last_cleanup file
Aaron Marcuse-Kubitza
03:30 AM Revision 4133: input.Makefile: Maps building: Moved src/%/map.full.csv after src/%/map.csv now that the filenames are fixed, so pattern matching order isn't an issue
Aaron Marcuse-Kubitza
03:27 AM Revision 4132: input.Makefile: Maps building: $(makeFullCsv): Removed no longer needed test for whether the $(coreSelfMap) exists, because Veg+'s self map always exists
Aaron Marcuse-Kubitza
03:12 AM Revision 4131: input.Makefile: Src maps cleanup: Fixed bug where src.csv was using .map.csv.last_cleanup instead of .src.csv.last_cleanup as its .last_cleanup file
Aaron Marcuse-Kubitza
02:34 AM Revision 4130: inputs/CTFS/src/1.organisms/: Added "_" prefix to prevent it from being treated as a data table subdir, before the DB export is mapped
Aaron Marcuse-Kubitza
02:20 AM Revision 4129: inputs/CTFS/src/ERD.jpg: Made it a symlink to "STRI2011_DB v5.jpg" instead of a copy of it
Aaron Marcuse-Kubitza
02:11 AM Revision 4128: Added inputs/CTFS/src/bci_01April2011.zip.url, which contains the original download URL for our copy of the CTFS database
Aaron Marcuse-Kubitza
01:31 AM Revision 4127: inputs/CTFS/src/: Added "_" prefix to scripts_to_drop_extra_tables subdir to prevent it from being treated as a data table subdir
Aaron Marcuse-Kubitza
01:10 AM Revision 4126: inputs/Makefile: Input data sync: Updated rsync filter for new subdirs layout
Aaron Marcuse-Kubitza
12:55 AM Revision 4125: README.TXT: Datasource setup: Updated for new subdirs layout
Aaron Marcuse-Kubitza
12:17 AM Revision 4124: input.Makefile: SVN: add: Updated svn:ignores for new subdirs layout
Aaron Marcuse-Kubitza
12:08 AM Revision 4123: inputs/Makefile: Import logs: Fixed bug where excluded install logs needed to be renamed according to the new name format (from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-log-files-into-subfolders>)
Aaron Marcuse-Kubitza
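
Revisions 4158 and 4164 above describe union treating an empty mapping as if it did not exist, so that it cannot overwrite a non-empty mapping for the same term in the other map. The following is only a rough sketch of that rule, not the actual union program: it assumes each map is a plain dict from input term to output path, that an empty string stands for an empty mapping, and that map_0 wins when both maps have a non-empty mapping. The name union_maps is hypothetical.

```python
def union_maps(map_0, map_1):
    """Combine two term->output maps, skipping empty mappings.

    Assumptions (not taken from the real union program): maps are dicts,
    '' means "empty mapping", and map_0 has priority over map_1 when both
    provide a non-empty mapping for the same term.
    """
    result = {}
    for mapping in (map_0, map_1):
        for term, output in mapping.items():
            if not output:
                # An empty mapping is treated as absent, so it is neither
                # emitted nor allowed to shadow a non-empty mapping.
                continue
            # Keep the first non-empty mapping seen for each term.
            result.setdefault(term, output)
    return result


merged = union_maps(
    {"habit": "", "family": "/taxon/family"},
    {"habit": "/growthform", "family": "/other/family", "genus": ""},
)
```

Here the empty habit mapping in the first map does not hide the non-empty one in the second map, and the empty genus row is dropped from the output altogether.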

08/20/2012

11:59 PM Revision 4122: inputs: Moved log files into subfolders, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-log-files-into-subfolders>
Aaron Marcuse-Kubitza
11:01 PM Revision 4121: input.Makefile: Merged Installation and Staging tables sections into Staging tables installation, since no other installation is performed. Removed "import/" prefix from non-file import-related targets.
Aaron Marcuse-Kubitza
10:20 PM Revision 4120: inputs: Moved test outputs into subfolders, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-test-outputs-into-subfolders>
Aaron Marcuse-Kubitza
09:58 PM Revision 4119: input.Makefile: Import to VegBIEN: Removed extra test for $(inputFiles), because when there are no inputs, $(tables) will be empty and import will automatically do nothing. Removed no longer needed $(inputFiles).
Aaron Marcuse-Kubitza
08:46 PM Revision 4118: inputs: Moved maps into subfolders, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-maps-into-subfolders>
Aaron Marcuse-Kubitza
07:16 PM Revision 4117: inputs: Replaced Veg+ prefix with map on via maps, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Replace-Veg-prefix-with-map-on-via-maps>
Aaron Marcuse-Kubitza
06:39 PM Revision 4116: strings.py: concat(): Apply length limits by shrinking max_len by new raw_extra_len() of the strings. This also fixes a bug where multi-byte characters in str0 were not properly taken into account, leading to overly long strings. Added doc comment.
Aaron Marcuse-Kubitza
06:29 PM Revision 4115: strings.py: Added raw_extra_len()
Aaron Marcuse-Kubitza
06:17 PM Revision 4114: sql_gen.py: NoUnderlyingTableException: Take a (required) parameter for the item that had no underlying table, and provide this wherever a NoUnderlyingTableException is created
Aaron Marcuse-Kubitza
06:16 PM Revision 4113: strings.py: concat(): Perform substring operation on Unicode strings so that substring does not split Unicode characters. Still use to_raw_str() to calculate the str1 length because Unicode characters can be multi-byte, and length limits often apply to the byte length, not the character length.
Aaron Marcuse-Kubitza
06:13 PM Revision 4112: exc.py: add_msg(): Fixed bug where needed to convert the Unicode string back into a raw string because Python's top-level exception handler doesn't support Unicode strings as exception messages
Aaron Marcuse-Kubitza
05:22 PM Revision 4111: inputs/import.stats.xls: Updated with stats from latest import
Aaron Marcuse-Kubitza
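
Revisions 4113 and 4116 above describe concat() cutting strings on Unicode character boundaries while measuring the length limit in bytes. The following is only a rough sketch of that idea, not the code in strings.py: it assumes UTF-8 encoding and assumes that the first string is the one shortened to make room for the second. The names byte_len and concat_limited are hypothetical.

```python
def byte_len(s, encoding="utf-8"):
    """Length of s in bytes once encoded (UTF-8 is an assumption)."""
    return len(s.encode(encoding))


def concat_limited(str0, str1, max_len, encoding="utf-8"):
    """Append str1 to str0, shortening str0 so the result fits max_len bytes.

    Truncation removes whole characters, so a multi-byte character is
    never split in the middle.
    """
    budget = max_len - byte_len(str1, encoding)
    if budget < 0:
        raise ValueError("str1 alone exceeds max_len")
    # Drop trailing characters from str0 until it fits the byte budget.
    while byte_len(str0, encoding) > budget:
        str0 = str0[:-1]
    return str0 + str1


shortened = concat_limited("\u00e9\u00e9\u00e9", "_x", 6)
```

Each "\u00e9" takes two bytes in UTF-8, so a limit of 6 bytes leaves room for only two of them before the two-byte suffix. Counting characters instead of bytes would have let all three through and produced an 8-byte result.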

08/17/2012

07:53 PM Revision 4110: inputs: Renamed stems table to 2.stems so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>
Aaron Marcuse-Kubitza
07:49 PM Revision 4109: inputs: Renamed organisms table to 1.organisms so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>
Aaron Marcuse-Kubitza
07:30 PM Revision 4108: inputs: Renamed plots table to 0.plots so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>
Aaron Marcuse-Kubitza
07:30 PM Revision 4107: inputs: Renamed plots table to 0.plots so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>
Aaron Marcuse-Kubitza
07:00 PM Revision 4106: input.Makefile: Mapping: If table subdir contains no input files, print warning instead of aborting. This situation occurs when renaming a version-controlled directory, whose previous version persists as an empty dir until committing.
Aaron Marcuse-Kubitza
06:41 PM Revision 4105: input.Makefile: Mapping: Removed no longer used $(<in) and test for it in $(map)
Aaron Marcuse-Kubitza
06:37 PM Revision 4104: input.Makefile: Mapping: $(map): Removed no longer used test for $(mapEnv)
Aaron Marcuse-Kubitza
05:50 PM Revision 4103: sql.py: run_query(): Exception handling: Fixed bug where PostgreSQL 9.1 PL/Python errors have a different format than PostgreSQL 9.0 which needs to be supported separately. This format was already supported in sql_gen.plpythonu_error_handler, but also needed to be supported for exceptions that propagate back to the client.
Aaron Marcuse-Kubitza
05:34 PM Revision 4102: inputs/SALVIAS-CSV/src/: Removed source files because they shouldn't be under version control. (They are synchronized via `make inputs/download`.)
Aaron Marcuse-Kubitza
05:15 PM Revision 4101: inputs: Moved src files into VegCSV subfolders (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#CSV-representation), with table suffixes removed, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders>
Aaron Marcuse-Kubitza
04:26 PM Revision 4100: util.py: dict_subset(): Fall back to using dict when OrderedDict is not available, in order to support making the maps on nimoy
Aaron Marcuse-Kubitza
04:02 PM Revision 4099: mappings/: Removed now-inaccurate ".stems" suffix from VegX-VegCore.stems.csv, which actually applied to all tables
Aaron Marcuse-Kubitza
03:59 PM Revision 4098: mappings/: Removed no longer used ".specimens" suffix from maps, which is now the same for all maps
Aaron Marcuse-Kubitza
03:52 PM Revision 4097: mappings/: Removed no longer used plots, organisms, and stems maps, which were copies of the specimens map
Aaron Marcuse-Kubitza
03:48 PM Revision 4096: input.Makefile: Core maps: Always use the specimens "table", since there are now no longer separate mappings for different tables, and the other tables' maps in mappings/ are merely copies of the specimens table's map
Aaron Marcuse-Kubitza
03:30 PM Revision 4095: input.Makefile: Removed no longer used custom via maps code, so that map files no longer need a prefix (which is always the same) specifying that they map through Veg+. Veg+ thus serves as the single gateway to VegBIEN, which avoids ever again having to maintain two copies of the mappings, as was the case when DwC and VegX XPaths were separate gateways. This will assist in untying the complex mapping logic in input.Makefile from file naming conventions in mappings/, and simplify the task of grouping each map with the CSV it maps.
Aaron Marcuse-Kubitza
03:14 PM Revision 4094: input.Makefile: Removed no longer used DB inputs section, because all of our inputs are either CSV or (rarely) XML. This removes a significant amount of dead code that will make it easier to refactor input.Makefile to use custom CSV import orders.
Aaron Marcuse-Kubitza
02:51 PM Revision 4093: mappings/Veg+-VegCore.specimens.csv: Added mappings for miscellaneous terms
Aaron Marcuse-Kubitza
02:45 PM Revision 4092: mappings/Veg+.terms.csv: Added miscellaneous terms
Aaron Marcuse-Kubitza
12:52 PM Revision 4091: to_do/: svn:ignore OpenOffice lock files
Aaron Marcuse-Kubitza
12:50 PM Revision 4090: inputs/import.stats.xls: Updated with stats from latest import. The import time for SpeciesLink (the slowest datasource) went back down to 9 hours after replacing the slower _merge with _alt.
Aaron Marcuse-Kubitza

08/16/2012

08:34 PM Revision 4089: Added new autogen mappings/VegCore.self.specimens.csv (not currently used)
Aaron Marcuse-Kubitza
08:30 PM Revision 4088: Merged DwC (including DwC1) and VegCSV mappings into new Veg+ schema. This involves replacing occurrences of DwC and VegCSV with Veg+ (or sometimes VegCore) everywhere, as described in <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV-DwC_merging>.
Aaron Marcuse-Kubitza
08:18 PM Revision 4087: README.TXT: Schema changes: Updated filenames of PDF ERD exports
Aaron Marcuse-Kubitza
08:15 PM Revision 4086: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
08:12 PM Revision 4085: xpath.py: parse(): _value(): Support '+' as a word character that doesn't need to be quoted
Aaron Marcuse-Kubitza
06:54 PM Revision 4084: intersect: Fixed bug where test for ignore option needed to be removed, because ignore is not supported by this program
Aaron Marcuse-Kubitza
06:45 PM Revision 4083: util.py: list_subset(): Fixed bug where using '+' to append the rest of the list didn't work if '+' was the first index, because max() cannot be called on an empty list
Aaron Marcuse-Kubitza
05:14 PM Revision 4082: mappings/DwC2-VegBIEN.specimens.csv: Added VegCSV mappings, to enable use of one VegCSV-VegBIEN mapping for specimens and plots data
Aaron Marcuse-Kubitza
05:12 PM Revision 4081: inputs/XAL/maps/DwC.specimens.csv: Remapped FieldNumber to recordNumber because this historical DwC term (http://rs.tdwg.org/dwc/terms/history/index.htm#fieldNumber-2009-04-24) has close to the same meaning as recordNumber, but not the same meaning as the current fieldNumber term
Aaron Marcuse-Kubitza
04:55 PM Revision 4080: inputs/SpeciesLink/maps/DwC.specimens.csv: Remapped fieldNumber to recordNumber because term usage was inconsistent with DwC definition. Datasources often confuse this term, because it seems like the collection number, but is actually the author code for the *event* (VegBank's authorObsCode).
Aaron Marcuse-Kubitza
04:28 PM Revision 4079: mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Added additional VegCSV mappings for mergability. taxonoccurrence.authortaxoncode: Added alternative mappings from VegCSV for mergability.
Aaron Marcuse-Kubitza
04:21 PM Revision 4078: xml_func.py: simplify(): Apply pass-through optimizations for _if statements with no condition (which means false). This facilitates automated testing after an _if statement has been added, because the put template provided as part of the automated test will only change for those datasources that actually have a condition entry for the _if statement, which greatly reduces the number of tests that need to be accepted. (Note that the path before the _if will still be included as an empty path if there are no other mappings to that table, because the _if statement does not surround it.)
Aaron Marcuse-Kubitza
02:26 PM Revision 4077: mappings/VegCSV-VegBIEN.specimens.csv: Added DwC mappings, to enable use of one VegCSV-VegBIEN mapping for specimens and plots data
Aaron Marcuse-Kubitza
02:22 PM Revision 4076: schemas/vegbien.sql: Moved collectionnumber from specimenreplicate to plantobservation to replace authorplantcode, since these terms are used analogously in plots and specimens data. This code is really the DwC recordNumber (VegBIEN collectionnumber), which "serves as a link between field notes and an Occurrence record, such as a specimen [or plots data] collector's number" (http://rs.tdwg.org/dwc/terms/#recordNumber). Also, this prevents a specimenreplicate from incorrectly being created when plots data provides an authorplantcode.
Aaron Marcuse-Kubitza
01:55 PM Revision 4075: mappings/DwC2-VegBIEN.specimens.csv: Mapped individualID for mergability with VegCSV
Aaron Marcuse-Kubitza
01:49 PM Revision 4074: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: Split occurrenceID into occurrenceID and individualID, where individualID refers to the plant in plots data and occurrenceID refers to the specimen in specimens data. This prevents plant sourceaccessioncodes from being mapped to the specimenreplicate, which was messing up stems mappings for the parent plantobservation. It also avoids mapping the specimenreplicate sourceaccessioncode to additional tables where it isn't needed. (Note that occurrenceID is needed for location to ensure that each specimen gets its own location to make locationdeterminations on. Everything else is directly or indirectly scoped by location when its own sourceaccessioncode isn't specified.)
Aaron Marcuse-Kubitza
01:33 PM Revision 4073: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence: Removed catalogNumber mapping because the catalogNumber applies only to the specimen, not to the occurrence, especially in plots data
Aaron Marcuse-Kubitza
01:14 PM Revision 4072: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence: Map everything except occurrenceID (which is globally unique) to new authortaxoncode, which only needs to be unique within the locationevent
Aaron Marcuse-Kubitza
12:59 PM Revision 4071: schemas/vegbien.sql: taxonoccurrence: Renamed taxonoccurrence_locationevent_1_to_1 to taxonoccurrence_unique_within_locationevent and added new authortaxoncode to it
Aaron Marcuse-Kubitza
12:57 PM Revision 4070: schemas/vegbien.sql: taxonoccurrence: Added authortaxoncode to store unique keys that are unique within the locationevent rather than within the datasource
Aaron Marcuse-Kubitza
12:43 PM Revision 4069: inputs/SALVIAS-CSV/maps/VegCSV.organisms.csv: Added _alt to height_m, stem_height_m to choose between them when both are specified (rather than having bin/map choose their priority order based on their order in the map). Note that when both of the heights are specified, they are always either the same, or height_m is invalid (see <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Some-organisms-have-one-stem-but-different-heights-in-the-organisms-and-stems-tables>).
Aaron Marcuse-Kubitza
12:39 PM Revision 4068: bin/map: collision_suffix: Setting back to _alt to test if _merge caused the SpeciesLink slowdown. SpeciesLink contains a huge number of equivalent columns due to each DwC term being present with namespaces for all versions of the DwC schema, and these columns can be combined either using _alt or _merge. _merge is only useful if the values in different versions of the same DwC field are *different*, which is not likely the case.
Aaron Marcuse-Kubitza
12:29 PM Revision 4067: inputs/import.stats.xls: Updated with stats from latest import. The import time for SpeciesLink (the slowest datasource) doubled, to 16 hours, most likely due to replacing _alt with the slower _merge, which preserves more input data.
Aaron Marcuse-Kubitza

08/15/2012

11:30 AM Revision 4066: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: occurrenceID: Mapped to location.authorlocationcode instead of sourceaccessioncode so that it would not override any location- or event-related IDs in location.authorlocationcode merely by being mapped to the sourceaccessioncode field (which takes precedence over the authorlocationcode when specified)
Aaron Marcuse-Kubitza
10:43 AM Revision 4065: mappings/VegCSV-VegBIEN.specimens.csv: occurrenceID: Mapped to specimenreplicate.sourceaccessioncode for mergability with DwC
Aaron Marcuse-Kubitza
09:14 AM Revision 4064: mappings/VegCSV-VegBIEN.specimens.csv: Mapped voucherType to indirect voucher _if statements' conditions
Aaron Marcuse-Kubitza
09:02 AM Revision 4063: mappings/VegCSV-VegBIEN.specimens.csv: locationID: location.sourceaccessioncode mapping: Added /_alt suffix for mergability with DwC
Aaron Marcuse-Kubitza
08:53 AM Revision 4062: mappings/DwC2-VegBIEN.specimens.csv: collectionID: Mapped to location.authorlocationcode as merge with collectionCode, the same way as it is for specimenreplicate.collectioncode_dwc
Aaron Marcuse-Kubitza
08:23 AM Revision 4061: schemas/vegbien.sql: location: location_unique_within_datasource_by_authorlocationcode unique index: Added `parent_id IS NULL` condition so that an authorlocationcode is not unintentionally treated as globally unique when a parent location is available (which implies that the authorlocationcode is a subplot code)
Aaron Marcuse-Kubitza
08:20 AM Revision 4060: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Added location.authorlocationcode mapping for mergability with DwC
Aaron Marcuse-Kubitza
08:13 AM Revision 4059: mappings/DwC2-VegBIEN.specimens.csv: location.authorlocationcode mappings: Added /_alt/3 for mergability with VegCSV mappings to same field
Aaron Marcuse-Kubitza
08:05 AM Revision 4058: mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Wrapped all mappings in direct voucher _if for mergability with VegCSV
Aaron Marcuse-Kubitza
07:57 AM Revision 4057: mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Moved direct/indirect voucher _if inwards to wrap just the value of catalognumber_dwc, not the catalognumber_dwc field node, to match the corresponding VegCSV mapping
Aaron Marcuse-Kubitza
07:48 AM Revision 4056: mappings/DwC2-VegBIEN.specimens.csv: Replaced _alt with _merge where applicable to avoid losing source data on import when multiple fields collide
Aaron Marcuse-Kubitza
07:46 AM Revision 4055: mappings/VegCSV-VegBIEN.specimens.csv: Cleaned up using `make mappings/`
Aaron Marcuse-Kubitza
07:18 AM Revision 4054: schemas/functions.sql: join_strs_transform(): Use STRICT optimization to avoid needing to manually check if the state value or input value is NULL (http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html#AEN51596)
Aaron Marcuse-Kubitza
07:15 AM Revision 4053: schemas/functions.sql: join_strs(), join_strs_transform(): Reversed order of params to enable strict optimization, which replaces the state value with the *first* parameter, which used to be the delimiter (http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html#AEN51596)
Aaron Marcuse-Kubitza
07:07 AM Revision 4052: Renamed join_strs_transform_preserve_empty() to join_strs_transform() now that there are no other join_strs_transform_...() functions
Aaron Marcuse-Kubitza
07:06 AM Revision 4051: schemas/functions.sql: Removed no longer used join_strs_transform_fold_empty()
Aaron Marcuse-Kubitza
07:06 AM Revision 4050: schemas/functions.sql: join_strs() aggregate: Use join_strs_transform_preserve_empty() as an optimization because all our data has already had '' replaced with NULL by sql_io.cleanup_table() in csv2db. This will help speed up _merges now that they are performed on a large scale in the slowest datasource, SpeciesLink.
Aaron Marcuse-Kubitza
07:02 AM Revision 4049: bin/map: collision_suffix: Changed to use _merge instead of _alt to avoid losing source data on import when multiple fields collide
Aaron Marcuse-Kubitza
06:58 AM Revision 4048: bin/map: Preventing collisions if multiple inputs mapping to same output: Made collision suffix configurable so it can easily be changed
Aaron Marcuse-Kubitza
06:56 AM Revision 4047: bin/map: Preventing collisions if multiple inputs mapping to same output: Made collision suffix configurable so it can easily be changed
Aaron Marcuse-Kubitza
06:52 AM Revision 4046: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence.sourceaccessioncode mappings: Added catalogNumber mapping, which takes precedence over recordNumber and is applicable to specimens data and direct vouchers. recordNumber should only be used as a last resort (before the taxon name) because this is collector-assigned and often not unique within anything.
Aaron Marcuse-Kubitza
06:34 AM Revision 4045: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Moved direct/indirect voucher _ifs inwards to wrap just the value of catalognumber_dwc, not the catalognumber_dwc field node, so that a future SQL function implementation of _if only needs to concern itself with returning one value or another, not with handling XML subtrees. The previous moving of the _ifs in r3942 was intended to effect this, but the _ifs weren't moved in far enough to wrap just the *value*.
Aaron Marcuse-Kubitza
06:21 AM Revision 4044: mappings/VegCSV-VegBIEN.specimens.csv: eventDate mappings: Removed collectiondate mapping because the eventDate refers only to the plot event. Added /_alt suffixes for mergeability with DwC.
Aaron Marcuse-Kubitza
06:15 AM Revision 4043: mappings/DwC2-VegBIEN.specimens.csv, DwC1-DwC2.specimens.csv: Split eventDate into eventDate and dateCollected, where eventDate refers only to the date of the sampling event, but dateCollected also refers to the date the particular specimen was collected. (This distinction is important in merging with VegCSV, because in plots data, these two fields are distinct.) Remapped datasources with dateCollected-related fields to new dateCollected.
Aaron Marcuse-Kubitza
05:55 AM Revision 4042: bin/map: Run new xml_func.simplify() on the root before printing the put template, so that _alts and _merges with only one element for the current datasource will be printed in their simplified form (with the _alt/_merge removed). This facilitates automated testing after an _alt/_merge suffix has been added, because the put template provided as part of the automated test will only change for those datasources that actually have an entry for both mappings, which greatly reduces the number of tests that need to be accepted.
Aaron Marcuse-Kubitza
05:51 AM Revision 4041: xml_func.py: Added simplify()
Aaron Marcuse-Kubitza
05:45 AM Revision 4040: xpath.py: put_obj(): Use new get_values(), so that the returned nodes are not modified by XML tree transformations, such as those performed by xml_func.process()
Aaron Marcuse-Kubitza
05:43 AM Revision 4039: Added get_values()
Aaron Marcuse-Kubitza
05:41 AM Revision 4038: xml_dom.py: is_empty(): Treat whitespace-only text nodes (including text nodes containing empty strings) as empty. This will also support None equivalents in text nodes, because they are isspace_none_str, which is considered whitespace.
Aaron Marcuse-Kubitza
05:36 AM Revision 4037: xml_func.py: _map(): Don't remove None params, because they are valid values and must be supported. This will become an issue once empty strings in text nodes are considered equivalent to None.
Aaron Marcuse-Kubitza
05:33 AM Revision 4036: xml_func.py: _units(): Don't remove None params, because they are valid values and must be supported. This will become an issue once empty strings in text nodes are considered equivalent to None.
Aaron Marcuse-Kubitza
05:25 AM Revision 4035: xml_func.py: _name(): Fixed bug where None values needed to be passed through, and the case of no name parts handled, to properly support NULL propagation
Aaron Marcuse-Kubitza
05:08 AM Revision 4034: xml_dom.py: value(), set_value(): Use new strings.isspace_none_str as sentinel None equivalent, to support cloning text nodes containing a sentinel None
Aaron Marcuse-Kubitza
05:06 AM Revision 4033: xml_dom.py: value(), set_value(): Use new strings.isspace_none_str as sentinel None equivalent, to support cloning text nodes containing a sentinel None
Aaron Marcuse-Kubitza
05:04 AM Revision 4032: strings.py: Added isspace_none_str to support clone-safe sentinel str values that pass isspace()
Aaron Marcuse-Kubitza
04:51 AM Revision 4031: xml_dom.py: is_whitespace(): Also consider empty text nodes to be whitespace
Aaron Marcuse-Kubitza
04:47 AM Revision 4030: xml_dom.py: is_whitespace(): Support text nodes whose value() is None by using .nodeValue instead
Aaron Marcuse-Kubitza
04:44 AM Revision 4029: xml_dom.py: set_value(): Don't set the value of a text node to None by removing it, because this prevents the node from being reused. Instead use a sentinel string value to denote None, and map to and from it.
Aaron Marcuse-Kubitza
04:40 AM Revision 4028: strings.py: Added none_str and helper class NonInternedStr to support sentinel str values
Aaron Marcuse-Kubitza
04:19 AM Revision 4027: xml_dom.py: set_value(): Support setting the value of a text node to None, by removing it
Aaron Marcuse-Kubitza
03:44 AM Revision 4026: Removed trailing whitespace on non-empty lines
Aaron Marcuse-Kubitza
03:40 AM Revision 4025: sql_io.py: put_table(): DuplicateKeyException: is_literals: Fixed bug where sql.select() needed to select on just the join_cols, not the whole mapping
Aaron Marcuse-Kubitza
03:14 AM Revision 4024: xml_func.py: process(): Removed support for no longer used structural functions
Aaron Marcuse-Kubitza
03:13 AM Revision 4023: xml_func.py: Removed no longer used structural functions
Aaron Marcuse-Kubitza
03:05 AM Revision 4022: mappings/for_review/DwC2-VegBIEN.specimens.fields.csv: input root: Removed DwC XML path info since DwC is now a CSV schema
Aaron Marcuse-Kubitza
02:57 AM Revision 4021: mappings/DwC2-VegBIEN.specimens.csv: eventDate: Also map to obsstartdate/obsenddate, since the collectiondate is also the event date for specimens data, and for mergeability with VegCSV
Aaron Marcuse-Kubitza
02:24 AM Revision 4020: mappings/VegCSV-VegBIEN.specimens.csv: eventDate: Added mappings to obsstartdate/obsenddate, since users of this field (currently SALVIAS census_date) intend it as the plot event's date. Keep the mapping to collectiondate because a non-range plot event date is also the collectiondate of all organisms in that plot event.
Aaron Marcuse-Kubitza
02:05 AM Revision 4019: schemas/py_functions.sql: parse_date_range(): Always return a value for end date, even if string is not a date range. This enables using _dateRangeEnd() as a filter function on anything intended as an end date.
Aaron Marcuse-Kubitza
01:53 AM Revision 4018: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: eventDate: collectiondate mapping: Removed _dateRangeStart filter because the eventDate (obsstartdate) is only valid as the date the *specimen was collected* if it is a single date, not a date range. (It is still valid as the obsstartdate/obsenddate if it's a range.)
Aaron Marcuse-Kubitza
01:49 AM Revision 4017: mappings/Veg+.terms.csv: Added dateCollected
Aaron Marcuse-Kubitza
12:45 AM Revision 4016: input via maps: Removed _date/date filter from date fields because the main mappings now have _date around all dates, so this filter is redundant
Aaron Marcuse-Kubitza
12:39 AM Revision 4015: inputs/SALVIAS-CSV/maps/VegCSV.organisms.csv: census_date: Don't map directly to the year, as this field is allowed to be a full date even though our data sample contains only years. Note that _date/date will automatically detect plain years and treat them as years, and so will casts to timestamp.
Aaron Marcuse-Kubitza
12:33 AM Revision 4014: inputs/SALVIAS*/maps/VegCSV.organisms.csv: census_date: Documented that this is for the subplot, not the organism, as all organisms in a subplot have the same value for it
Aaron Marcuse-Kubitza
12:09 AM Revision 4013: mappings/DwC2-VegBIEN.specimens.csv: verbatimLatitude/verbatimLongitude: Fixed mappings to use _alt/2 instead of _alt/1 to avoid collisions with decimalLatitude/decimalLongitude
Aaron Marcuse-Kubitza

08/14/2012

11:54 PM Revision 4012: schemas/functions.sql: _merge(): Changed sort_orders to match the $-variable name instead of the function parameter name, so each line of the VALUES clause would use the same number for both
Aaron Marcuse-Kubitza
11:52 PM Revision 4011: schemas/functions.sql: _merge(): Filter out NULL values as optimization so DISTINCT ON only has to consider non-NULL values
Aaron Marcuse-Kubitza
11:48 PM Revision 4010: schemas/functions.sql: join_strs(): Return NULL if all strings were NULL or ''. This fixes unexpected behavior in _merge() where all elements are NULL but the return value is non-NULL.
Aaron Marcuse-Kubitza
11:32 PM Revision 4009: schemas/functions.sql: Added join_strs_transform_preserve_empty() and use it in join_strs_transform_fold_empty()
Aaron Marcuse-Kubitza
11:25 PM Revision 4008: schemas/functions.sql: Renamed join_strs_() to join_strs_transform_fold_empty() for clarity and to indicate that it's for use by the join_strs() aggregate
Aaron Marcuse-Kubitza
11:11 PM Revision 4007: mappings/DwC2-VegBIEN.specimens.csv: recordNumber: Added VegCSV mappings for it
Aaron Marcuse-Kubitza
10:51 PM Revision 4006: mappings/DwC2-VegBIEN.specimens.csv: occurrenceID: Added VegCSV mappings for it
Aaron Marcuse-Kubitza
10:44 PM Revision 4005: mappings/DwC2-VegBIEN.specimens.csv: mappings to /location/sourceaccessioncode: Added _alt to prioritize them properly
Aaron Marcuse-Kubitza
10:39 PM Revision 4004: inputs/UNCC/maps/DwC.specimens.csv: herbarium: Fixed mapping to go to institutionCode instead of collectionCode
Aaron Marcuse-Kubitza
10:36 PM Revision 4003: mappings/DwC2-VegBIEN.specimens.csv: Remapped institutionCode/collectionCode/catalogNumber location mappings to location.authorlocationcode
Aaron Marcuse-Kubitza
09:50 PM Revision 4002: schemas/vegbien.ERD.mwb: Reset methodtaxonclass lines so that only one needs to be repositioned after syncing with the schema
Aaron Marcuse-Kubitza
09:31 PM Revision 4001: mappings/VegCSV-VegBIEN.specimens.csv: locationID: Removed mapping to locationevent.sourceaccessioncode, because locationID relates to the plot, not the plot event. (The locationevent is scoped by the location when the sourceaccessioncode and authoreventcode are not specified, so duplicate elimination will still occur correctly.)
Aaron Marcuse-Kubitza
09:27 PM Revision 4000: mappings/DwC2-VegBIEN.specimens.csv: Mapped locationID, for mergeability with VegCSV
Aaron Marcuse-Kubitza
09:04 PM Revision 3999: mappings/VegCSV-VegBIEN.specimens.csv: plotName: Removed authoreventcode mapping because plotName relates to the plot, not the plot event. (The locationevent is scoped by the location when the authoreventcode is not specified, so duplicate elimination will still occur correctly.) Instead map only authoreventcode-related fields (currently CVS's authorObsCode) to authoreventcode, via DwC's (confusingly-named) fieldNumber ("An identifier given to the event in the field").
Aaron Marcuse-Kubitza
08:40 PM Revision 3998: schemas/vegbien.sql: locationevent: locationevent_unique_within_location: Added authoreventcode to index. It was already in the locationevent_unique_within_*parent*_by_authoreventcode index, but also needed to be in the no-parent (non-subplot) index. This fixes locationevent duplicate elimination when a locationevent sourceaccessioncode is not specified.
Aaron Marcuse-Kubitza
08:27 PM Revision 3997: schemas/vegbien.sql: location: location_unique_within_datasource unique index: Added COALESCE() and `WHERE sourceaccessioncode IS NOT NULL` now that sourceaccessioncode is nullable. Renamed location_unique_within_datasource and location_unique_authorlocationcode to location_unique_within_datasource_by_... to show that both are alternatives for globally unique keys. schemas/vegbien.ERD.mwb: Moved elements slightly to reduce the number of lines that need to be repositioned after syncing with the schema.
Aaron Marcuse-Kubitza
07:35 PM Revision 3996: mappings/DwC2-VegBIEN.specimens.csv: Mapped verbatimElevation and samplingProtocol, for mergeability with VegCSV
Aaron Marcuse-Kubitza
07:12 PM Revision 3995: inputs/import.stats.xls: Updated with stats from latest import
Aaron Marcuse-Kubitza

08/13/2012

06:12 PM Revision 3994: mappings/VegCSV-VegBIEN.specimens.csv: location unique keys: Map to a new parent location for the location, instead of a parent locationevent for the locationevent. This much simpler mapping (which does not require _alt or _merge) is possible now that the necessary unique indexes have been set up.
Aaron Marcuse-Kubitza
05:52 PM Revision 3993: Regenerated vegbien.ERD exports, now including both pages in vegbien.ERD.core.pdf. Renamed schemas/vegbien.ERD.core.pdf to vegbien.ERD.pdf because it now includes the full schema.
Aaron Marcuse-Kubitza
05:48 PM Revision 3992: schemas/filter_ERD.csv: Removed extraneous lines to improve readability. schemas/vegbien.ERD.mwb: Reconfigured elements to put only the most important ones in the core subset (the top page).
Aaron Marcuse-Kubitza
03:59 PM Revision 3991: schemas/vegbien.sql: location: Made sourceaccessioncode optional if authorlocationcode is specified, since either of these fields can now serve as the unique key
Aaron Marcuse-Kubitza
03:39 PM Revision 3990: mappings/VegCSV-VegBIEN.specimens.csv: Map to new location.authorlocationcode
Aaron Marcuse-Kubitza
03:23 PM Revision 3989: schemas/vegbien.sql: location: Support uniquely specifying a location by its authorlocationcode
Aaron Marcuse-Kubitza
03:13 PM Revision 3988: schemas/vegbien.sql: location: Added authorlocationcode to unique indexes
Aaron Marcuse-Kubitza
02:58 PM Revision 3987: schemas/vegbien.sql: location: Added authorlocationcode
Aaron Marcuse-Kubitza
02:45 PM Revision 3986: schemas/vegbien.sql: location: Added location_unique_within_parent_by_coords unique index that uses COALESCE(), replacing location_unique_subplot_coords unique constraint
Aaron Marcuse-Kubitza
02:07 PM Revision 3985: mappings/VegCSV-VegBIEN.specimens.csv: maximumElevationInMeters: Fixed bug where _rangeEnd filter needed to be removed because this only works on a field which can be either a range or the start of a range, such as minimumElevationInMeters (on an end-of-range field, a single value will be removed completely). Added _alt for mergeability with DwC. minimumElevationInMeters: Added elevationrange-to mapping using _rangeEnd for mergeability with DwC.
Aaron Marcuse-Kubitza
01:53 PM Revision 3984: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: minimum/maximumElevationInMeters, minimum/maximumDepthInMeters: Remove any "ca." prefix from value. Doing this on all elevation/depth fields will make the DwC and VegCSV mappings mergeable.
Aaron Marcuse-Kubitza
01:04 PM Revision 3983: mappings/VegCSV-VegBIEN.specimens.csv: locality: Mapped using same XPath as DwC, to enable merging
Aaron Marcuse-Kubitza
01:01 PM Revision 3982: mappings/DwC2-VegBIEN.specimens.csv: Mapped individualCount. This will enable merging with VegCSV.
Aaron Marcuse-Kubitza
12:51 PM Revision 3981: mappings/VegCSV-VegBIEN.specimens.csv: Cleaned up. This still needs to be run manually with `make mappings/` because the derived maps are symlinks rather than make targets, so make never touches the non-derived map and doesn't run its recipe in the automated tests
Aaron Marcuse-Kubitza
12:48 PM Revision 3980: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxondetermination mappings: Removed iscurrent=true because it is not the role of the mappings to specify which taxondetermination is the current one. Eventually, the order of the determinations will need to be specified using a sort # or similar, and the DB will select the current one for queries to use. Ensure all mappings have :[isoriginal=true] so that they match up between DwC and VegCSV.
Aaron Marcuse-Kubitza
12:35 PM Revision 3979: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxondetermination mappings: Ensure all mappings have :[iscurrent=true] or equivalent so that they sort together, and match up between DwC and VegCSV
Aaron Marcuse-Kubitza
12:19 PM Revision 3978: mappings/VegCSV-VegBIEN.specimens.csv: individualCount: Disambiguated alternate meaning as stem count by changing stem count fields to map to new stemCount term, which maps to plantobservation.stemcount
Aaron Marcuse-Kubitza
12:12 PM Revision 3977: mappings/Veg+.terms.csv: Added stemCount
Aaron Marcuse-Kubitza
12:10 PM Revision 3976: mappings/VegCSV-VegBIEN.specimens.csv: Cleaned up
Aaron Marcuse-Kubitza
12:01 PM Revision 3975: mappings/DwC2-VegBIEN.specimens.csv: Mapped identificationQualifier. This will enable merging with VegCSV.
Aaron Marcuse-Kubitza
11:59 AM Revision 3974: mappings/VegCSV-VegBIEN.specimens.csv: identificationQualifier (taxon fit): Removed mapping to prefix of binomial field, since that field should just contain what the datasource said was the binomial. It's TNRS's job to concatenate the taxon fit, etc. with the binomial and other name parts for name resolution.
Aaron Marcuse-Kubitza
11:27 AM Revision 3973: mappings/DwC2-VegBIEN.specimens.csv: fieldNumber: Remapped to authoreventcode because this is (confusingly) the author code for the *event*, according to the DwC definition
Aaron Marcuse-Kubitza
11:22 AM Revision 3972: inputs/NY, ARIZ: FieldNumber: Remapped to recordNumber because term usage was inconsistent with DwC definition. Datasources sometimes confuse this term, because it seems like the collection number, but is actually the author code for the *event* (VegBank's authorObsCode).
Aaron Marcuse-Kubitza
11:20 AM Revision 3971: schemas/vegbank.ERD.pdf: Restored to VegBank ERD, which had gotten overwritten when the vegbien.ERD exports were regenerated
Aaron Marcuse-Kubitza
10:58 AM Revision 3970: mappings/DwC1-DwC2.specimens.csv: Removed Source column and source-related comments because this information is now maintained in mappings/Veg+.terms.csv
Aaron Marcuse-Kubitza
10:55 AM Revision 3969: mappings/DwC2-VegBIEN.specimens.csv: Removed Source column because this information is now maintained in mappings/Veg+.terms.csv
Aaron Marcuse-Kubitza
10:49 AM Revision 3968: mappings/VegCSV-VegBIEN.specimens.csv: Removed Source column and source-related comments because this information is now maintained in mappings/Veg+.terms.csv
Aaron Marcuse-Kubitza
10:44 AM Revision 3967: Added mappings/Veg+.terms.csv, which will serve the purpose of listing all available terms with their source. This will remove the need to store the sources in the mappings, where they are out of place and difficult to maintain during refactoring.
Aaron Marcuse-Kubitza
10:37 AM Revision 3966: Added mappings/Veg+.terms.csv, which will serve the purpose of listing all available terms with their source. This will remove the need to store the sources in the mappings, where they are out of place and difficult to maintain during refactoring.
Aaron Marcuse-Kubitza
10:19 AM Revision 3965: mappings/VegX-VegCSV.stems.csv: Removed Comments and Source columns because this information is now maintained in mappings/VegCSV-VegBIEN.specimens.csv. This will simplify later VegCSV refactoring, because the Comments and Source columns will not need to be changed along with the VegCSV column.
Aaron Marcuse-Kubitza
10:02 AM Revision 3964: mappings/VegCSV-VegBIEN.specimens.csv: Removed Comments and Source columns because this information is now maintained in mappings/VegCSV-VegBIEN.specimens.csv. This will simplify later VegCSV refactoring, because the Comments and Source columns will not need to be changed along with the VegCSV column.
Aaron Marcuse-Kubitza
10:00 AM Revision 3963: mappings/VegCSV-VegBIEN.specimens.csv: Changed plotID to locationID and parentPlotID to parentLocationID to use DwC-related terms
Aaron Marcuse-Kubitza
09:31 AM Revision 3962: mappings/DwC2-VegBIEN.specimens.csv: collectionID: Fixed mapping to point to collectioncode_dwc instead of collectionnumber, as this is an ID *of* the collection rather than *within* it
Aaron Marcuse-Kubitza
09:15 AM Revision 3961: inputs/import.stats.xls: Updated with stats from latest import
Aaron Marcuse-Kubitza

08/10/2012

10:29 PM Revision 3960: schemas: Renamed vegbien.ERD.pdf to vegbien.ERD.1_pg.pdf since it's not the primary PDF that should be used, due to its slow load time
Aaron Marcuse-Kubitza
10:26 PM Revision 3959: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
10:23 PM Revision 3958: schemas/vegbien.sql: specimenreplicate: specimenreplicate_plantobservation_1_to_1: Only apply when sourceaccessioncode and catalognumber_dwc are NULL, in order to support multiple specimenreplicates for one plant in plots data. specimenreplicate_unique_catalognumber: Added plantobservation_id, so that catalognumber_dwc (a sort of authorSpecimenCode for plots data) only needs to be unique within a plant. Eventually, we will want to migrate the mappings so that collectionnumber is used for this purpose instead.
Aaron Marcuse-Kubitza
10:16 PM Revision 3957: schemas/vegbien.sql: specimenreplicate: Made plantobservation_id optional again, since indirect vouchers do create specimenreplicates without a parent plantobservation. schemas/vegbien.ERD.mwb: Fixed lines.
Aaron Marcuse-Kubitza
10:02 PM Revision 3956: schemas/vegbien.sql: specimenreplicate: Made plantobservation_id required, since that is now the parent table fkey
Aaron Marcuse-Kubitza
10:00 PM Revision 3955: schemas/vegbien.ERD.mwb: Fixed lines
Aaron Marcuse-Kubitza
09:51 PM Revision 3954: schemas/vegbien.ERD.mwb: Adjusted lines. Adjusted position of locationdetermination to put location directly next to locationevent. Expanded location to fill newly-available space.
Aaron Marcuse-Kubitza
09:37 PM Revision 3953: schemas/vegbien.sql: locationevent: Renamed authorlocationcode to authoreventcode to be consistent with the table name. Note that for our current datasources, the plot = the plot event, so the authoreventcode and authorlocationcode/authorPlotCode will be the same.
Aaron Marcuse-Kubitza
09:22 PM Revision 3952: mappings/VegCSV-VegBIEN.specimens.csv: Changed VegCSV term fieldNumber (from DwC) to recordNumber to be consistent with the TDWG meaning of fieldNumber, which defines it as the author code for the *event*, not the organism (what VegBIEN calls the authorlocationcode and VegBank calls the authorObsCode)
Aaron Marcuse-Kubitza
08:47 PM Revision 3951: mappings/VegCSV-VegBIEN.specimens.csv: Comments: Removed no longer applicable comments about XPath syntax added to affect sort order
Aaron Marcuse-Kubitza
08:35 PM Revision 3950: mappings/VegCSV-VegBIEN.specimens.csv: height: Removed mapping to plantobservation.overallheight, since the height is a stem field rather than a plant field. Note that a height in the *organisms* table will be mapped to the height in a single stemobservation for that plant, with NULL sourceaccessioncode and authorstemcode. Note also that this change is possible because no mapped datasource yet provides a valid overallheight with multiple stems or that differs from its single stem's height. (Although SALVIAS sometimes provides both a stem height and an organism height, that height is always either the same, or the organism height is invalid. See <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Some-organisms-have-one-stem-but-different-heights-in-the-organisms-and-stems-tables>.)
Aaron Marcuse-Kubitza
06:56 PM Revision 3949: mappings/DwC2-VegBIEN.specimens.csv: establishmentMeans: Removed obsolete mapping to growthform, since growthforms and cultivated/native information are no longer merged into one field in VegBIEN (which they were when this mapping was created)
Aaron Marcuse-Kubitza
06:18 PM Revision 3948: mappings/VegCSV-VegBIEN.specimens.csv: decimalLatitude/decimalLongitude: Added _nullIf suffix for mergeability with VegCSV-VegBIEN.specimens.csv
Aaron Marcuse-Kubitza
06:10 PM Revision 3947: mappings/VegCSV-VegBIEN.specimens.csv: coordinateUncertaintyInMeters: Added _noCV suffix for mergeability with VegCSV-VegBIEN.specimens.csv
Aaron Marcuse-Kubitza
06:00 PM Revision 3946: mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Added _if wrapper for mergeability with VegCSV-VegBIEN.specimens.csv
Aaron Marcuse-Kubitza
05:52 PM Revision 3945: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber direct voucher _if statement: Changed @name to "if *indirect* voucher", so that it's logically consistent with the else branch following it. It was previously "if *direct* voucher" because the _if statement only contained a case for direct vouchers, and the else branch was being used in place of a _not() function.
Aaron Marcuse-Kubitza
05:38 PM Revision 3944: mappings/roots: plots roots: Default to using VegCSV instead of VegX for new plots datasources
Aaron Marcuse-Kubitza
05:35 PM Revision 3943: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber _if statements: Changed @names to more descriptive comments. This also prevents the @name from looking confusingly like the condition of the _if statement, which is actually supplied through the cond param and is usually located in a separate mapping.
Aaron Marcuse-Kubitza
05:20 PM Revision 3942: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Split _if apart into separate _ifs for the indirect and direct voucher cases. Moved direct voucher _if inwards so it is just wrapping catalognumber_dwc itself. This will enable this mapping to be used for specimens data, which is always considered a direct voucher and will always have this _if return true. Also moved indirect voucher _if inwards in the same way, so that a future SQL function implementation of _if only needs to concern itself with returning one value or another, not with handling entire XML subtrees. Note that if the indirect voucher _if returns false, NOT NULL and CHECK constraint violations will cause the intervening voucher and specimenreplicate elements to be deleted, thus having the same effect. Use new @name syntax for distinguishing _if statements.
Aaron Marcuse-Kubitza
05:02 PM Revision 3941: mappings: Removed no longer used for_review/VegBIEN-DwC2.specimens.csv
Aaron Marcuse-Kubitza
04:49 PM Revision 3940: xml_func.py: _if(): Changed documentation about name param for distinguishing separate _if statements to use @name attribute instead, so that the XML/SQL function mechanism doesn't have to deal with code that's solely for XPath merging
Aaron Marcuse-Kubitza
04:09 PM Revision 3939: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
04:08 PM Revision 3938: schemas/vegbien.ERD.mwb: Fixed lines
Aaron Marcuse-Kubitza
03:57 PM Revision 3937: schemas/filter_ERD.csv: Removed no longer applicable specimenreplicate inheritance filters
Aaron Marcuse-Kubitza
03:50 PM Revision 3936: inputs/import.stats.xls: Updated with stats from latest import. Note that the import now includes additional date parsing on all date fields, which adds 1/2-1 hour to the import time. Eventually, we will want to translate _date() to PL/pgSQL and only use extra date processing if PostgreSQL's cast to timestamp doesn't work, which should greatly reduce this time.
Aaron Marcuse-Kubitza

08/09/2012

05:37 PM Revision 3935: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
05:35 PM Revision 3934: schemas/vegbien.sql: Removed inheritance link between specimenreplicate and taxonoccurrence, which is not needed now that specimenreplicate is mapped via plantobservation. mappings/DwC2-VegBIEN.specimens.csv: As part of this change, moved mappings to specimenreplicate fields inherited from taxonoccurrence to go directly to taxonoccurrence.
Aaron Marcuse-Kubitza
05:15 PM Revision 3933: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
05:14 PM Revision 3932: schemas/vegbien.ERD.mwb: Synced with schema
Aaron Marcuse-Kubitza
05:13 PM Revision 3931: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Default to mapping via plantobservation rather than via voucher when no voucherType is specified, in order to be consistent with the specimens data mapping for catalogNumber
Aaron Marcuse-Kubitza
03:31 PM Revision 3930: Regenerated mappings/for_review/VegX-VegCSV.stems.csv. Note that running `make mappings/` did not change mappings/VegX-VegCSV.stems.csv, because all changes were deletions of lines.
Aaron Marcuse-Kubitza
03:29 PM Revision 3929: mappings/VegX-VegCSV.stems.csv: Removed no longer used user-defined terms (simpleUserdefined). Note that CTFS does use user-defined terms, but these are all defined in its own map spreadsheet.
Aaron Marcuse-Kubitza
03:24 PM Revision 3928: mappings: Removed no longer needed VegX-VegBIEN mappings
Aaron Marcuse-Kubitza
03:23 PM Revision 3927: mappings/Makefile: Made VegCSV-VegBIEN.specimens.csv a non-derived map, since the VegX-VegCSV mapping is no longer used. This causes automatic creation of a for_review file.
Aaron Marcuse-Kubitza
03:21 PM Revision 3926: plots inputs: Removed maps/.VegX.*.csv.last_cleanup
Aaron Marcuse-Kubitza
03:13 PM Revision 3925: plots inputs: Remapped all VegX via maps to VegCSV. See steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegX-%3EVegCSV>.
Aaron Marcuse-Kubitza
02:45 PM Revision 3924: join: Added map_1_core_only option that uses only columns 0 and 1 of map_1. This is useful for one-time refactoring joins where the Source column, mappings comments, etc. shouldn't be part of the datasource's via map (although they will be part of the autogenerated VegBIEN map)
Aaron Marcuse-Kubitza
02:33 PM Revision 3923: join: Use opts.env_usage() for usage message
Aaron Marcuse-Kubitza
02:04 PM Revision 3922: mappings: Made VegCSV-VegBIEN.{plots,organisms,stems}.csv symlinks to VegCSV-VegBIEN.specimens.csv
Aaron Marcuse-Kubitza
01:46 PM Revision 3921: mappings/Makefile: VegCSV-VegBIEN.specimens.csv: Commented out combining with DwC2-VegBIEN mappings, because merging DwC and VegX/VegCSV into one map is a lower priority than replacing all datasource VegX mappings with VegCSV (which does not require the merging but does require XPaths that don't collide, which is not yet the case)
Aaron Marcuse-Kubitza
01:40 PM Revision 3920: lib/xml_func.py: _if(): Made then param optional, so that user can just map to the else branch as a shortcut for logically inverting the condition. (Note that a _not() XML function does not exist yet, so this is also a workaround.)
Aaron Marcuse-Kubitza
01:29 PM Revision 3919: VegBIEN mappings: Wrapped dates in _date() and _dateRangeStart()/_dateRangeEnd(), to assist in importing date and date range values that PostgreSQL cannot parse. This will increase the import time, but hopefully also decrease the # of invalid values in the errors tables. (These functions can later be optimized to reduce the impact on import time.)
Aaron Marcuse-Kubitza
01:25 PM Revision 3918: sql_io.py: put_table(): is_literals: is_function: Fixed bug where function call needed to be recreated in each iteration of the main loop, because the arguments to the function, which are based on mapping, may change as the result of error handling replacing invalid values with NULL
Aaron Marcuse-Kubitza
01:13 PM Revision 3917: sql_io.py: put_table(): is_literals: Fixed bug where sql.select() that calls the function needed to be run recoverably, to auto-rollback errors. Made sql.select() cacheable because SQL functions are immutable, so it should be idempotent.
Aaron Marcuse-Kubitza
01:03 PM Revision 3916: mappings/DwC2-VegBIEN.specimens.csv: Remapped taxonRemarks to taxondetermination.notes because http://rs.tdwg.org/dwc/terms/#taxonRemarks indicates that these notes are "about the taxon", not the specimen/plant in general
Aaron Marcuse-Kubitza
12:56 PM Revision 3915: mappings/DwC2-VegBIEN.specimens.csv: Remapped eventDate to new aggregateoccurrence.collectiondate, which is a more accurate place than locationevent.obsstartdate/obsenddate because the date refers to a specific specimen. This also makes eventDate compatible with plots data.
Aaron Marcuse-Kubitza
12:44 PM Revision 3914: mappings/DwC2-VegBIEN.specimens.csv: Moved sex user-defined mapping to plantobservation because it's a property of the plant rather than the specimen, and so that it can also apply to plots data
Aaron Marcuse-Kubitza
12:31 PM Revision 3913: mappings: Remapped specimenreplicate.description to new aggregateoccurrence.notes because the notes don't necessarily refer specifically to the specimen, especially for plots data
Aaron Marcuse-Kubitza
12:31 PM Revision 3912: mappings: Remapped specimenreplicate.description to new aggregateoccurrence.notes because the notes don't necessarily refer specifically to the specimen, especially for plots data
Aaron Marcuse-Kubitza
12:21 PM Revision 3911: schemas/vegbien.sql: aggregateoccurrence: Added notes, to serve the purpose that specimenreplicate.description previously did. specimenreplicate.description is not appropriate for plots data, and often not appropriate even for specimens data, which uses fieldNotes as a general notes field rather than a description of the specimen.
Aaron Marcuse-Kubitza
12:07 PM Revision 3910: schemas/vegbien.sql: aggregateoccurrence: Reordered linecover so it's near cover instead of at the end
Aaron Marcuse-Kubitza
12:02 PM Revision 3909: schemas/vegbien.sql: Moved collectiondate from specimenreplicate to aggregateoccurrence because it's actually the SALVIAS census_date, which is the date the plant was sampled, rather than the DwC eventDate, which is the date the specimen was collected
Aaron Marcuse-Kubitza
11:56 AM Revision 3908: mappings/DwC2-VegBIEN.specimens.csv: Mapped specimenreplicate via plantobservation for consistency with plots data. (This change is required for VegCSV table merging to work properly.) This is also a more accurate way of representing the data, because a specimen in fact comes from a plant, and it's natural to place the plant-related data (measurements, etc.) in the plantobservation table.
Aaron Marcuse-Kubitza
11:42 AM Revision 3907: mappings/DwC2-VegBIEN.specimens.csv: Mapped specimenreplicate via plantobservation for consistency with plots data. (This change is required for VegCSV table merging to work properly.) This is also a more accurate way of representing the data, because a specimen in fact comes from a plant, and it's natural to place the plant-related data (measurements, etc.) in the plantobservation table.
Aaron Marcuse-Kubitza
10:41 AM Revision 3906: mappings/VegX-VegCSV.stems.csv: Remapped stem notes to new stemNotes term, and mapped new organism notes VegX XPath to now-available DwC fieldNotes
Aaron Marcuse-Kubitza
10:30 AM Revision 3905: inputs/SALVIAS/maps/VegX.organisms.csv: Map organism notes to different place than stem notes, because these are separate fields
Aaron Marcuse-Kubitza
10:09 AM Revision 3904: mappings/Makefile: VegCSV-VegBIEN.specimens.csv: Temporarily sort by input column rather than output column, to assist in finding terms that map to different places in the DwC- and VegX-VegBIEN mappings
Aaron Marcuse-Kubitza
10:02 AM Revision 3903: mappings/Makefile: VegCSV-VegBIEN.specimens.csv: Use new all option to union, in order to manually review inputs which appear in both maps but map to different places
Aaron Marcuse-Kubitza
10:01 AM Revision 3902: union: Added full flag to turn off merging mappings that are in both maps, in order to review inputs which appear in both maps but map to different places
Aaron Marcuse-Kubitza
09:57 AM Revision 3901: mappings/Makefile: Merged .VegX-VegCSV.stems.csv.last_cleanup into .%.last_cleanup, since VegX-VegCSV.stems.csv now uses the same cleanup operations as the other non-derived maps. Note that this automatically creates a file in for_review for VegX-VegCSV.stems.csv, which is currently identical to it.
Aaron Marcuse-Kubitza
09:52 AM Revision 3900: mappings/Makefile: .%.last_cleanup: Removed simplify_xpath because non-derived maps will now have VegX XPaths in their Source column URLs, which should not be modified
Aaron Marcuse-Kubitza
09:50 AM Revision 3899: mappings/Makefile: VegX-VegCSV.stems.csv: Removed autogeneration command because once file has been generated, regeneration is no longer needed
Aaron Marcuse-Kubitza
09:42 AM Revision 3898: mappings/Makefile: Fixed bug where VegX-VegCSV.stems.csv needed to be removed from $(vegcsvMaps) so it wouldn't be deleted on `make clean`
Aaron Marcuse-Kubitza
08:53 AM Revision 3897: mappings/VegX-VegCSV.stems.csv: Source: Put URLs in the order their terms appear in the VegCSV term name
Aaron Marcuse-Kubitza
08:38 AM Revision 3896: mappings/VegX-VegCSV.stems.csv: Comments: Changed "Table name" to "Table" to be concise
Aaron Marcuse-Kubitza
08:37 AM Revision 3895: mappings/VegX-VegCSV.stems.csv: Mapped VegX community fields
Aaron Marcuse-Kubitza
08:28 AM Revision 3894: mappings/VegX-VegCSV.stems.csv: Mapped VegX cover-related fields
Aaron Marcuse-Kubitza
08:26 AM Revision 3893: mappings/VegX-VegCSV.stems.csv: Changed authorPlantCode to the associated DwC term fieldNumber
Aaron Marcuse-Kubitza
08:04 AM Revision 3892: mappings/VegX-VegCSV.stems.csv: Changed locationNarrative to the associated DwC term locality
Aaron Marcuse-Kubitza
08:00 AM Revision 3891: mappings/VegX-VegCSV.stems.csv: Changed collectedDate to the associated DwC term eventDate
Aaron Marcuse-Kubitza
07:54 AM Revision 3890: mappings/VegX-VegCSV.stems.csv: Added plot prefix to eventStartDate/eventEndDate to distinguish it from the DwC eventDate, which is the date the *specimen* was collected
Aaron Marcuse-Kubitza
07:40 AM Revision 3889: mappings/VegX-VegCSV.stems.csv: Order within table: Updated order #s for salvias_plots terms that got changed to SALVIAS data dictionary terms
Aaron Marcuse-Kubitza
07:33 AM Revision 3888: mappings/VegX-VegCSV.stems.csv: Changed collector name parts to the associated DwC term recordedBy
Aaron Marcuse-Kubitza
07:11 AM Revision 3887: mappings/VegX-VegCSV.stems.csv: Mapped SALVIAS voucher type
Aaron Marcuse-Kubitza

08/08/2012

11:09 PM Revision 3886: mappings/VegX-VegCSV.stems.csv: Mapped collector name parts
Aaron Marcuse-Kubitza
11:00 PM Revision 3885: mappings/VegX-VegCSV.stems.csv: Table names ("." prefixes) merged into name where possible, for consistency. computer taxonomic elements have not been merged because the field part should exactly match the corresponding DwC term.
Aaron Marcuse-Kubitza
10:53 PM Revision 3884: mappings/VegX-VegCSV.stems.csv: Order within table: If Source has multiple URLs, ensure each source has its own order
Aaron Marcuse-Kubitza
10:44 PM Revision 3883: mappings/VegX-VegCSV.stems.csv: Order within table: Separate orders of multiple elements with "," instead of ";", for consistency with the Source column
Aaron Marcuse-Kubitza
10:42 PM Revision 3882: mappings/VegX-VegCSV.stems.csv: Changed authorPlotCode terms to a variation of VegX's plotName, for standardization with VegX
Aaron Marcuse-Kubitza
10:37 PM Revision 3881: mappings/VegX-VegCSV.stems.csv: Changed uniqueIDs with table names to the table name + "ID", for standardization
Aaron Marcuse-Kubitza
10:26 PM Revision 3880: mappings/VegX-VegCSV.stems.csv: Changed terms with table names to DwC terms where possible
Aaron Marcuse-Kubitza
10:19 PM Revision 3879: mappings/VegX-VegCSV.stems.csv: Removed comments about alternate names, as these will be included in a separate "VegCSV-alt" mapping to "VegCSV-core" terms
Aaron Marcuse-Kubitza
10:17 PM Revision 3878: mappings/VegX-VegCSV.stems.csv: Clarified comments about the inclusion of the table name
Aaron Marcuse-Kubitza
10:12 PM Revision 3877: mappings/VegX-VegCSV.stems.csv: Mapped plotObservation user-defined terms
Aaron Marcuse-Kubitza
09:59 PM Revision 3876: mappings/VegX-VegCSV.stems.csv: Mapped VegX plotObservation fields
Aaron Marcuse-Kubitza
09:40 PM Revision 3875: mappings/VegX-VegCSV.stems.csv: Corrected sources of DwC terms to point to the actual DwC term, where needed. eventDate parts: Added source for VegBank field used as named suffix.
Aaron Marcuse-Kubitza
09:35 PM Revision 3874: mappings/VegX-VegCSV.stems.csv: Corrected sources of VegX names to point to the actual VegX field name, where needed
Aaron Marcuse-Kubitza
09:28 PM Revision 3873: mappings/VegX-VegCSV.stems.csv: Mapped SALVIAS stem tags
Aaron Marcuse-Kubitza
09:22 PM Revision 3872: mappings/VegX-VegCSV.stems.csv: Corrected parent plot-only mappings by prefixing "parentPlot."
Aaron Marcuse-Kubitza
09:18 PM Revision 3871: mappings/VegX-VegCSV.stems.csv: Mapped VegX //plot/plotName
Aaron Marcuse-Kubitza
09:14 PM Revision 3870: mappings/VegX-VegCSV.stems.csv: Mapped VegX //plot/plotUniqueIdentifier
Aaron Marcuse-Kubitza
09:00 PM Revision 3869: mappings/VegX-VegCSV.stems.csv: Source SALVIAS terms from the SALVIAS data dictionary when possible, to provide an automatic link to the description of the term. Having these direct links will also assist in creating a data dictionary for VegCSV and eventually VegBIEN (using mappings/VegCSV-VegBIEN.specimens.csv). Note that many SALVIAS terms exist only in the live database, as they are not part of the export format documented in the data dictionary.
Aaron Marcuse-Kubitza
08:31 PM Revision 3868: mappings/VegX-VegCSV.stems.csv: Source VegBank terms directly from the appropriate VegBank data dictionary page, to provide an automatic link to the description of the term. Having these direct links will also assist in creating a data dictionary for VegCSV and eventually VegBIEN (using mappings/VegCSV-VegBIEN.specimens.csv).
Aaron Marcuse-Kubitza
08:18 PM Revision 3867: mappings/VegX-VegCSV.stems.csv: Mapped VegX relativePlotPosition terms
Aaron Marcuse-Kubitza
08:02 PM Revision 3866: maps with Order column: Renamed Order column to Order within table for clarity
Aaron Marcuse-Kubitza
08:00 PM Revision 3865: maps with Order column: Renamed Order column to Order within table for clarity
Aaron Marcuse-Kubitza
07:57 PM Revision 3864: maps with Source column: Added original column name to source URLs, so that source name is completely specified. For official DwC terms, this also allows linking directly to the term. Fixed nimoy phpMyAdmin links so that going to the link in a browser would take you straight there after login.
Aaron Marcuse-Kubitza
06:53 PM Revision 3863: mappings/VegX-VegCSV.stems.csv: Corrected SALVIAS stem diameter terms to place original name (before expansion for clarity) in the Comments column instead of appending it to the source URL, because the source URL should point just to the table the term is in. The actual term is identified directly by its order # and indirectly by the name of the VegCSV term, which should be similar (if not, the original term should be listed in the comments).
Aaron Marcuse-Kubitza
06:46 PM Revision 3862: mappings/VegX-VegCSV.stems.csv: Mapped SALVIAS stem diameter terms
Aaron Marcuse-Kubitza
06:35 PM Revision 3861: mappings/VegX-VegCSV.stems.csv: Mapped VegX project terms
Aaron Marcuse-Kubitza
06:29 PM Revision 3860: mappings/VegX-VegCSV.stems.csv: VegX plot terms: Added order
Aaron Marcuse-Kubitza
06:25 PM Revision 3859: mappings/VegX-VegCSV.stems.csv: Mapped non-user-defined height XPath
Aaron Marcuse-Kubitza
06:23 PM Revision 3858: mappings/VegX-VegCSV.stems.csv: Changed source of height to VegX, because there is a VegX height field
Aaron Marcuse-Kubitza
06:20 PM Revision 3857: mappings/VegX-VegCSV.stems.csv: Mapped VegX plot terms except unique keys
Aaron Marcuse-Kubitza
06:11 PM Revision 3856: mappings/VegX-VegCSV.stems.csv: Mapped remaining sourceAccessionCode user-defined terms to <VegX-table>.uniqueID
Aaron Marcuse-Kubitza
06:06 PM Revision 3855: mappings/VegX-VegCSV.stems.csv: Corrected sources of VegX names to point to the appropriate element in veg.xsd, rather than the appropriate type, because the names we used actually came from veg.xsd's top-level elements rather than from the type names
Aaron Marcuse-Kubitza
05:57 PM Revision 3854: mappings/VegX-VegCSV.stems.csv: Changed plantObservation.sourceAccessionCode to individualOrganismObservation.uniqueID, to be consistent with VegX names. (*source*AccessionCode only applies to an aggregate DB that preserves info from its inputs. accessionCode made less sense, because this field is for the datasource's primary key, which it may or may not consider an accession code.)
Aaron Marcuse-Kubitza
05:39 PM Revision 3853: mappings/VegX-VegCSV.stems.csv: Mapped aggregateOrganismObservation terms
Aaron Marcuse-Kubitza
05:36 PM Revision 3852: mappings/VegX-VegCSV.stems.csv: Changed base back to baseSaturation to distinguish this pH-related concept from other meanings of base, and to match VegBank
Aaron Marcuse-Kubitza
05:26 PM Revision 3851: mappings/DwC2-VegBIEN.specimens.csv: Removed no longer applicable comments, which were from the very first NY/SALVIAS->VegX/VegBank mapping and had been preserved by the map spreadsheet transformation scripts. Note that many comments have been left, because they either provide explanatory information or because we never reached a decision on the questions posed (such as many of Brad's "OMIT" comments).
Aaron Marcuse-Kubitza
05:18 PM Revision 3850: mappings/VegX-VegCSV.stems.csv: Removed no longer applicable comments, which were from the very first NY/SALVIAS->VegX/VegBank mapping and had been preserved by the map spreadsheet transformation scripts
Aaron Marcuse-Kubitza
05:15 PM Revision 3849: mappings/VegX-VegCSV.stems.csv: Mapped individualOrganismObservation user-defined terms
Aaron Marcuse-Kubitza
04:09 PM Revision 3848: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
04:02 PM Revision 3847: schemas/vegbien.ERD.mwb: Added link to VegBIEN schema wiki page
Aaron Marcuse-Kubitza
03:46 PM Revision 3846: inputs/import.stats.xls: Updated with stats from latest import
Aaron Marcuse-Kubitza
03:40 PM Revision 3845: README.TXT: After a new import: Added steps to check inputs' error counts and only continue with deleting previous imports, etc. if there were few or no errors. Added step to record the import times.
Aaron Marcuse-Kubitza

08/07/2012

09:45 AM Revision 3844: mappings/VegX-VegCSV.stems.csv: Mapped VegBank and SALVIAS abioticObservation terms
Aaron Marcuse-Kubitza
09:08 AM Revision 3843: mappings/VegX-VegCSV.stems.csv: Resolved ambiguous terms that appeared twice on the output side
Aaron Marcuse-Kubitza
08:52 AM Revision 3842: mappings/VegX-VegCSV.stems.csv: Mapped VegX abioticObservation terms
Aaron Marcuse-Kubitza
08:36 AM Revision 3841: mappings/VegX-VegCSV.stems.csv: Mapped standard DwC terms
Aaron Marcuse-Kubitza
08:13 AM Revision 3840: mappings/DwC2-VegBIEN.specimens.csv, DwC1-DwC2.specimens.csv: Sources: Replaced DwC with http://rs.tdwg.org/dwc/terms/, because DwC terms can come from many places but the DwC source referred specifically to this web page
Aaron Marcuse-Kubitza
08:06 AM Revision 3839: mappings/DwC1-DwC2.specimens.csv: Corrected mapping for previousCatalogNumber
Aaron Marcuse-Kubitza
08:00 AM Revision 3838: mappings/DwC1-DwC2.specimens.csv: Added source of datasources' custom terms
Aaron Marcuse-Kubitza
07:51 AM Revision 3837: mappings/DwC1-DwC2.specimens.csv: Added source of DwC 1.2 (http://digir.net/schema/conceptual/darwin/2003/1.0/darwin2.xsd), aka DwC Classic, terms
Aaron Marcuse-Kubitza
07:43 AM Revision 3836: mappings/DwC1-DwC2.specimens.csv: Added source of custom NY staging table terms in nimoy.bien2_staging.nybg_raw
Aaron Marcuse-Kubitza
07:27 AM Revision 3835: mappings/DwC1-DwC2.specimens.csv: Added source of DwC 1.21 (http://digir.net/schema/conceptual/darwin/manis/1.21/darwin2.xsd) terms
Aaron Marcuse-Kubitza
07:02 AM Revision 3834: mappings/DwC2-VegBIEN.specimens.csv, DwC1-DwC2.specimens.csv: Sources: Replaced DwC with http://rs.tdwg.org/dwc/terms/, because DwC terms can come from many places but the DwC source referred specifically to this web page
Aaron Marcuse-Kubitza
06:51 AM Revision 3833: mappings/DwC1-DwC2.specimens.csv: Added source of remappings of DwC terms with /_alt added
Aaron Marcuse-Kubitza
06:46 AM Revision 3832: mappings/DwC1-DwC2.specimens.csv: Added source of DwC terms with namespace removed
Aaron Marcuse-Kubitza
06:32 AM Revision 3831: mappings/VegX-VegCSV.stems.csv: Added "computer." before taxonomic terms whose VegX mapping used the "computer" role. (This is useful for datasources that supply separate determinations in the same row, such as SALVIAS.)
Aaron Marcuse-Kubitza
06:23 AM Revision 3830: mappings/DwC2-VegBIEN.specimens.csv: Added Source column containing "DwC" for every field with an entry in the Order column, so that the source of the term can be tracked once we start combining DwC and VegCSV
Aaron Marcuse-Kubitza
06:07 AM Revision 3829: inputs/SALVIAS*/maps/VegX.organisms.csv: Fixed missing join mappings for stemobservation-related fields
Aaron Marcuse-Kubitza
05:56 AM Revision 3828: mappings/DwC2-VegBIEN.specimens.csv: Repopulated Order values for the few rows that had lost it in the process of copying and pasting mappings
Aaron Marcuse-Kubitza
05:49 AM Revision 3827: mappings/DwC2-VegBIEN.specimens.csv: Added Source column containing "DwC" for every field with an entry in the Order column, so that the source of the term can be tracked once we start combining DwC and VegCSV
Aaron Marcuse-Kubitza
05:38 AM Revision 3826: mappings/Makefile: VegX-VegCSV.stems.csv: Clean up when edited using sort_map
Aaron Marcuse-Kubitza
05:27 AM Revision 3825: Added mappings/VegCSV-VegBIEN.specimens.csv, which is generated from VegX-VegCSV.stems.csv
Aaron Marcuse-Kubitza
05:19 AM Revision 3824: mappings/for_review: svn:ignore OpenOffice.org lock files
Aaron Marcuse-Kubitza
05:14 AM Revision 3823: Added mappings/VegX-VegCSV.stems.csv. The initial version is autogenerated by joining the simplified VegBIEN XPaths of related maps.
Aaron Marcuse-Kubitza
05:05 AM Revision 3822: join: Support discarding multiple outputs if they should be considered ambiguous
Aaron Marcuse-Kubitza
04:40 AM Revision 3821: input.Makefile: Maps validation: $(missingMappingsCmd): Support non-DwC mappings by matching entire line containing mapping, not just word characters. Remove any XML function so that merging of non-empty join mappings still works properly.
Aaron Marcuse-Kubitza
03:35 AM Revision 3820: mappings/Makefile: Use new invert
Aaron Marcuse-Kubitza
03:35 AM Revision 3819: Added invert
Aaron Marcuse-Kubitza
03:31 AM Revision 3818: mappings/Makefile: for_review/VegBIEN-DwC2.specimens.csv: Include all comments column(s), not just the first
Aaron Marcuse-Kubitza
03:27 AM Revision 3817: cols: Removed special handling of '+' because list_subset() now handles this col_num value itself, by appending the rest of the columns. Support intermixing int and '+' columns, by using new format.str2int_passthru().
Aaron Marcuse-Kubitza
03:23 AM Revision 3816: util.py: list_subset(): Made an index of '+' append the rest of the list
Aaron Marcuse-Kubitza
03:21 AM Revision 3815: format.py: Added str2int_passthru()
Aaron Marcuse-Kubitza
03:16 AM Revision 3814: cols: Changed value for all columns to '+' so that it wouldn't need to be shell-escaped as '*' was
Aaron Marcuse-Kubitza
01:42 AM Revision 3813: review: Remove keys except last. This should increase the number of matches between human-readable VegBIEN XPaths of VegX and DwC2.
Aaron Marcuse-Kubitza
01:39 AM Revision 3812: mappings/DwC2-VegBIEN.specimens.csv: Use :[] instead of [] for all XML functions, so that the XML function args will get removed by review
Aaron Marcuse-Kubitza
01:18 AM Revision 3811: review: Remove XML functions. This should increase the number of matches between human-readable VegBIEN XPaths of VegX and DwC2.
Aaron Marcuse-Kubitza
12:34 AM Revision 3810: mappings/Makefile: human-readable maps in for_review: Simplify just the output column so that the input column can be programmatically linked back to the original input names/XPaths
Aaron Marcuse-Kubitza
12:26 AM Revision 3809: mappings/Makefile: Removed no longer used $(chRoot), $(cpReview)
Aaron Marcuse-Kubitza
12:23 AM Revision 3808: Removed the human-readable mappings mappings/for_review/VegX-VegBIEN.plots.csv, VegX-VegBIEN.organisms.csv because these are now duplicates of VegX-VegBIEN.stems.csv
Aaron Marcuse-Kubitza
12:20 AM Revision 3807: review: Support limiting the XPath simplifying to custom columns, rather than always the first two
Aaron Marcuse-Kubitza
12:12 AM Revision 3806: review: Usage message: Fixed typo
Aaron Marcuse-Kubitza
12:10 AM Revision 3805: Added mappings/for_review/VegBIEN-DwC2.specimens.csv, generated by inverting for_review/DwC2-VegBIEN.specimens.csv. This will be used to help translate VegX->VegCSV.
Aaron Marcuse-Kubitza

08/06/2012

11:44 PM Revision 3804: mappings: Made VegX-VegBIEN.organisms.csv, VegX-VegBIEN.plots.csv symlinks to VegX-VegBIEN.stems.csv instead of building them in the Makefile by copying VegX-VegBIEN.stems.csv, since these files are now always the same
Aaron Marcuse-Kubitza
09:29 PM Revision 3803: mappings/VegX-VegBIEN.stems.csv: _if that maps to specimenreplicate via plantobservation or voucher: Refactored to map right-hand side of _eq in the left-hand side mapping, rather than in all then/else mappings. Distinguish this _if statement from others using new name param.
Aaron Marcuse-Kubitza
09:16 PM Revision 3802: xml_func.py: _if(): Documented that can add `name` param to distinguish separate _if statements
Aaron Marcuse-Kubitza
09:08 PM Revision 3801: xml_func.py: _if(): Made cond optional. When it's not specified or None, it is treated as False. This supports cases where all elements of the condition are required but not mapped to.
Aaron Marcuse-Kubitza
08:50 PM Revision 3800: mappings/VegX-VegBIEN.stems.csv: _if that maps to specimenreplicate via plantobservation or voucher: Refactored to map voucherType directly into _if/cond/_eq/left rather than mapping it to a temporary _ignore location and retrieving it with _ref
Aaron Marcuse-Kubitza
08:47 PM Revision 3799: xml_func.py: Removed no longer used _simplifyPath(), which is now a built-in function of db_xml.put()
Aaron Marcuse-Kubitza
08:36 PM Revision 3798: xml_func.py: _eq(): Documented that '' (empty node) is returned if a value was not mapped to, not if a value was None, since None arguments are no longer removed by process() (now XML functions do this manually with conv_items())
Aaron Marcuse-Kubitza
08:19 PM Revision 3797: xml_func.py: _ref(): Only display "XPath reference target missing" warning if target node does not exist, not if it exists but is empty
Aaron Marcuse-Kubitza
08:17 PM Revision 3796: xpath.py: get(): reference expansion: Use get_1() and check for None result instead of using get(), which returns multiple nodes when we just want the first
Aaron Marcuse-Kubitza
07:39 PM Revision 3795: mappings/VegX-VegBIEN.stems.csv: Reversed XPaths so that they start with location instead of plantobservation
Aaron Marcuse-Kubitza
07:30 PM Revision 3794: lib/common.Makefile: Added $(cp)
Aaron Marcuse-Kubitza
05:58 PM Revision 3793: mappings/Makefile: Include lib/common.Makefile
Aaron Marcuse-Kubitza
05:57 PM Revision 3792: lib/common.Makefile: Added $(CP)
Aaron Marcuse-Kubitza
05:36 PM Revision 3791: inputs/import.stats.xls: Updated with stats from latest import
Aaron Marcuse-Kubitza

08/03/2012

09:59 PM Revision 3790: mappings/VegX-VegBIEN.stems.csv: Reversed input XPaths so that they start with plot instead of individualOrganismObservation as stem
Aaron Marcuse-Kubitza
09:57 PM Revision 3789: inputs/CTFS: Disabled maps because CTFS is not yet compatible with reversed XPaths, but the effort required to make it compatible is not worth including in the current commit. We lose only 2 test rows of test VegX data by doing this, since the full CTFS VegX files were never able to be imported.
Aaron Marcuse-Kubitza
08:31 PM Revision 3788: ch_root, ch_root_via: Documented that these are usually *not* idempotent operations
Aaron Marcuse-Kubitza
07:42 PM Revision 3787: mappings/VegX-VegBIEN.stems.csv: input (VegX) root: Removed tcs namespace URL to simplify the XPath reversing process. It isn't needed now that we don't generate intermediate XML documents in the automated tests (because intermediate formats are no longer required to be XML schemas).
Aaron Marcuse-Kubitza
07:16 PM Revision 3786: mappings/DwC2-VegBIEN.specimens.csv: Reversed XPaths so that they start with location instead of specimenreplicate
Aaron Marcuse-Kubitza
07:00 PM Revision 3785: README.TXT: WinMerge setup: Documented how to get to Compare Options page
Aaron Marcuse-Kubitza
06:59 PM Revision 3784: README.TXT: WinMerge setup: Added step to set Whitespace to Ignore change
Aaron Marcuse-Kubitza
06:55 PM Revision 3783: README.TXT: Moved WinMerge setup to separate section. Changed Moved block detection link to the Configuration page.
Aaron Marcuse-Kubitza
06:32 PM Revision 3782: mappings/VegX-VegBIEN.stems.csv: Expanded {} expressions using expand_braces, so that each distinct output for the same input is on its own line, improving readability. This will also help enable search-and-replace reversing of XPaths for the re-rooting to location.
Aaron Marcuse-Kubitza
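As a rough illustration of what expanding {} expressions means here, the sketch below turns one XPath containing {a,b} alternatives into one XPath per alternative. It is written in Python for illustration only; the log does not show how the actual expand_braces tool works, and this sketch ignores the [] nesting and escaping cases that revisions 3772-3774 mention.

```python
import re


def expand_braces(xpath):
    """Expand each {a,b,...} group into separate strings, one per alternative.

    Hypothetical sketch: handles simple comma-separated groups only.
    """
    match = re.search(r'\{([^{}]*)\}', xpath)
    if match is None:
        return [xpath]  # nothing left to expand
    expanded = []
    for alternative in match.group(1).split(','):
        # Substitute one alternative, then expand any remaining groups
        candidate = xpath[:match.start()] + alternative + xpath[match.end():]
        expanded.extend(expand_braces(candidate))
    return expanded


lines = expand_braces('/location/{locationevent,locationdetermination}/notes')
```

Each resulting string would then sit on its own line of the map spreadsheet, so that a search-and-replace applies to it independently.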
06:17 PM Revision 3781: mappings/VegX-VegBIEN.stems.csv: VegX XPaths: Expanded {} expressions using expand_braces, so that later use of expand_braces on the file would not affect the VegX output mappings of the inputs' via maps (VegX.organisms.csv, etc.)
Aaron Marcuse-Kubitza
05:54 PM Revision 3780: mappings/DwC2-VegBIEN.specimens.csv: Expanded {} expressions using expand_braces, so that each distinct output for the same input is on its own line, improving readability. This will also help enable search-and-replace reversing of XPaths for the re-rooting to location.
Aaron Marcuse-Kubitza
05:52 PM Revision 3779: README.TXT: Accepting test cases: Documented that when refactoring mappings, it's helpful to use WinMerge to detect moved lines
Aaron Marcuse-Kubitza
05:14 PM Revision 3778: expand_braces: Fixed bug where needed to get next line from stdin in raw mode, so that \ won't be parsed as escape chars
Aaron Marcuse-Kubitza
04:59 PM Revision 3777: join: Fixed bug where, when an input mapped to multiple outputs, the joined row for each output needed to be output separately using writer.writerow()
Aaron Marcuse-Kubitza
03:52 PM Revision 3776: sort_map: Remove duplicates resulting from multiple outputs for the same input. mappings/Makefile: $(mkSelfMap): Removed uniq now that sort_map does this.
Aaron Marcuse-Kubitza
03:24 PM Revision 3775: mappings/Makefile: $(mkSelfMap): Run uniq on the output to remove duplicates resulting from multiple outputs for the same input
Aaron Marcuse-Kubitza
03:10 PM Revision 3774: expand_braces: Also expand XPaths containing [], with up to one level of nesting (which is the most we currently use), because many {} XPaths do in fact contain []. Debug-print intermediate values when env var expand_braces_debug is true. Added usage message.
Aaron Marcuse-Kubitza

08/02/2012

11:13 PM Revision 3773: expand_braces: Fixed bug where ./{ and brackets with commas inside {} are unparseable, and should not be expanded
Aaron Marcuse-Kubitza
11:05 PM Revision 3772: expand_braces: Fixed bug where `head -1` seemed to read more lines than just the first, causing EOF to be returned after the first line, by using `read` instead. Support data containing \r (such as Excel-dialect CSVs) by removing it. Fixed bug where ./{...} was not being properly escaped.
Aaron Marcuse-Kubitza
10:08 PM Revision 3771: Added expand_braces
Aaron Marcuse-Kubitza
09:12 PM Revision 3770: mappings: location: Removed centerlatitude/centerlongitude mappings because the lat/long should be in only one place: the locationdetermination. It is up to the database querier to decide which locationdetermination(s) to use as the coordinates for a plot/specimen.
Aaron Marcuse-Kubitza
08:54 PM Revision 3769: bin/map: input is CSV: Removed unused map_ var
Aaron Marcuse-Kubitza
08:50 PM Revision 3768: bin/map: Documented that it's multi-safe (supports an input appearing multiple times)
Aaron Marcuse-Kubitza
08:39 PM Revision 3767: subtract: Documented that it's multi-safe (supports an input appearing multiple times)
Aaron Marcuse-Kubitza
08:32 PM Revision 3766: join: Made it multi-safe (supports an input appearing multiple times)
Aaron Marcuse-Kubitza
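"Multi-safe" is easiest to see in a small example. The sketch below joins two maps, each given as (input, output) pairs, and keeps every output when the same input appears more than once. The function name and the data layout are assumptions; the real join script reads and writes CSV map spreadsheets.

```python
from collections import defaultdict


def join_maps(map_0, map_1):
    """Join map_0 (A -> B) with map_1 (B -> C) into a list of (A, C) pairs.

    Hypothetical sketch: an input that appears several times in map_1
    yields one joined row per output rather than only the last one.
    """
    outputs_by_input = defaultdict(list)
    for in_, out in map_1:
        outputs_by_input[in_].append(out)

    joined = []
    for in_, mid in map_0:
        for out in outputs_by_input.get(mid, []):
            joined.append((in_, out))
    return joined


via_map = [('census_date', 'eventDate'), ('plot_code', 'plotName')]
core_map = [
    ('eventDate', '/location/locationevent/obsstartdate'),
    ('eventDate', '/location/locationevent/obsenddate'),
    ('plotName', '/location/authorlocationcode'),
]
joined_rows = join_maps(via_map, core_map)
```

A dict keyed on the input would silently keep only one of the two eventDate rows, which is the kind of loss that multi-safe handling avoids. The column names and XPaths above are illustrative, not taken from the actual maps.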
08:30 PM Revision 3765: lib/common.Makefile: Added empty clean target to make sure `make clean` always works
Aaron Marcuse-Kubitza
08:03 PM Revision 3764: root Makefile, input.Makefile: Maps validation: Treat missing join mappings differently from missing non-empty join mappings, because they indicate mapping to an invalid location, which is a bug. Factored maps validation code out into new lib/mappings.Makefile.
Aaron Marcuse-Kubitza
08:00 PM Revision 3763: lib/common.Makefile: Added vars for chars not allowed in make targets. Added functions/vars to replace "_" with " ".
Aaron Marcuse-Kubitza
07:38 PM Revision 3762: root Makefile: Include lib/common.Makefile
Aaron Marcuse-Kubitza
07:37 PM Revision 3761: input.Makefile: Include lib/common.Makefile
Aaron Marcuse-Kubitza
06:48 PM Revision 3760: intersect: Documented that it's multi-safe (supports an input appearing multiple times)
Aaron Marcuse-Kubitza
06:42 PM Revision 3759: union: Documented that it's multi-safe (supports an input appearing multiple times)
Aaron Marcuse-Kubitza
06:00 PM Revision 3758: mappings/DwC2-VegBIEN.specimens.csv: Moved shared /specimenreplicate root to mappings in preparation for reversing the XPaths so that parent table paths (such as location) don't contain a prefix for child tables (specimenreplicate, locationevent, etc.). This reversing will avoid the need to "ch_root" the child table map to obtain maps for parent tables with the prefixes removed, allowing all hierarchical levels to use the same map spreadsheet.
Aaron Marcuse-Kubitza
05:53 PM Revision 3757: ch_root: Support column headers without a root, for non-hierarchical formats such as DwC
Aaron Marcuse-Kubitza
05:45 PM Revision 3756: lib/common.Makefile: rsync: Time the rsync operation
Aaron Marcuse-Kubitza
05:29 PM Revision 3755: in_place: Wrap EXIT handler in shell function so that "-escaping can easily be used on the temp file path
Aaron Marcuse-Kubitza
05:26 PM Revision 3754: in_place: Documented that doesn't update file on error
Aaron Marcuse-Kubitza
05:23 PM Revision 3753: DwC mappings: Removed ':/list/' root (full version: '::[@xmlns:dcterms=http://purl.org/dc/terms/]/list/') from map spreadsheets to simplify the boilerplate in each file. Since intermediate DwC XML files no longer need to be produced for automated tests, these roots are not needed.
Aaron Marcuse-Kubitza
04:46 PM Revision 3752: inputs/import.stats.xls: Updated with stats from latest import
Aaron Marcuse-Kubitza
04:40 PM Revision 3751: inputs/import.stats.xls: Moved independent-import data to separate tab so that it wouldn't get moved to the side whenever a new column of simultaneous-import data is inserted. It is also no longer updated, because all column-based imports are now done simultaneously.
Aaron Marcuse-Kubitza
04:32 PM Revision 3750: Use strings.ustr() or strings.urepr() everywhere that columns are stringified, in order to support column names with non-ASCII characters (such as in the Madidi data)
Aaron Marcuse-Kubitza
04:16 PM Revision 3749: strings.py: concat(): Convert args to raw (non-Unicode) strings first, so that multi-byte Unicode sequences are considered by # of bytes instead of # of chars. This is necessary because PostgreSQL truncates identifiers by # of bytes instead of # of chars, so that identifiers will actually be less than 63 chars long when some chars were multi-byte.
Aaron Marcuse-Kubitza
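Note on revision 3749: the byte-versus-character distinction can be illustrated with a minimal Python 3 sketch. This is not the project's strings.concat(); the helper names truncate_to_bytes and concat_within_limit are hypothetical, and the 63-byte limit assumes PostgreSQL's default identifier length.

```python
# Hypothetical helpers for illustration only; not the project's strings.concat().
MAX_IDENTIFIER_BYTES = 63  # assumed: PostgreSQL's default identifier limit


def truncate_to_bytes(text, max_bytes=MAX_IDENTIFIER_BYTES):
    """Truncate text so that its UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")[:max_bytes]
    # Drop any partial multi-byte sequence left at the cut point.
    return encoded.decode("utf-8", errors="ignore")


def concat_within_limit(prefix, suffix, max_bytes=MAX_IDENTIFIER_BYTES):
    """Append suffix to prefix, shortening prefix so the result fits in max_bytes.

    Assumes the suffix alone is no longer than max_bytes.
    """
    suffix_len = len(suffix.encode("utf-8"))
    return truncate_to_bytes(prefix, max(max_bytes - suffix_len, 0)) + suffix


name = concat_within_limit("é" * 40, "_idx")
# len(name) counts characters; len(name.encode("utf-8")) counts bytes.
```

With multi-byte characters, the result has fewer than 63 characters even though it uses nearly the full byte budget, which matches the behavior the commit message describes.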
04:11 PM Revision 3748: strings.py: ustr(): Call __str__() method manually like urepr() to avoid Unicode errors when the returning string is non-ASCII
Aaron Marcuse-Kubitza
03:54 PM Revision 3747: strings.py: Added urepr() and use it in repr_no_u(), to better support repr() return values with non-ASCII characters. Avoiding repr() also provides a more complete stack trace in the case of such errors.
Aaron Marcuse-Kubitza

08/01/2012

11:37 AM Revision 3746: schemas/vegbien.sql: plantobservation: plantobservation_aggregateoccurrence_count_1() trigger: Don't raise an error if existing count was >1, because there are in fact datasets (notably SALVIAS) where input records for individual stems may themselves contain aggregate data (such as plant and stem counts). For this data, we have an anomalous condition where an aggregateoccurrence has count >1 but contains one plantobservation, due to the plant/stem count being included in the first stem's record. (See <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Data-interpretation-issues> for more info on this problem.) Note that our desired 1:1 relationship between aggregateoccurrence and plantobservation is still guaranteed by a constraint, but the anomalous data may still cause irregularities later on in the analysis.
Aaron Marcuse-Kubitza
10:55 AM Revision 3745: sql_io.py: put_table(): Ignoring all rows on unrecoverable errors: Also support the case where has_joins == True, by setting it to False so that the no-joins case is effectively used
Aaron Marcuse-Kubitza
10:32 AM Revision 3744: inputs/import.stats.xls: Moved Simultaneously above Independently because that is how we are now running the imports
Aaron Marcuse-Kubitza
10:21 AM Revision 3743: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
09:50 AM Revision 3742: schemas/vegbien.sql: *_1_to_1 and *_unique_within_* unique indexes with a `WHERE sourceaccessioncode IS NULL` filter: Added IS NULL filters for other unique keys, so that these fallback indexes would only be used if there was no (or no other) way to uniquely identify their tables. For *_1_to_1 unique indexes, this is the case for specimens data.
Aaron Marcuse-Kubitza
09:48 AM Revision 3741: schemas/vegbien.sql: *_1_to_1 and *_unique_within_* unique indexes with a `WHERE sourceaccessioncode IS NULL` filter: Added IS NULL filters for other unique keys, so that these fallback indexes would only be used if there was no (or no other) way to uniquely identify their tables. For *_1_to_1 unique indexes, this is the case for specimens data.
Aaron Marcuse-Kubitza
09:41 AM Revision 3740: schemas/vegbien.sql: stemobservation: Replaced stemobservation_unique_code unique constraint with stemobservation_unique_within_plantobservation unique index that uses COALESCE() and WHERE ... IS NOT NULL appropriately, to work with sql_gen's use of COALESCE() indexes and (for the renaming) to better reflect what it does
Aaron Marcuse-Kubitza
09:36 AM Revision 3739: schemas/vegbien.ERD.mwb: Synced with schema
Aaron Marcuse-Kubitza
09:30 AM Revision 3738: schemas/vegbien.sql: *_1_to_1 and *_unique_within_* unique indexes intended to operate only when sourceaccessioncode is NULL: Changed to use `sourceaccessioncode IS NULL` WHERE condition instead of COALESCE(sourceaccessioncode, ...) element, since the sourceaccessioncode is not actually needed for the uniquification (it is already globally unique within the datasource if it's not NULL; this just covers the case where it is NULL)
Aaron Marcuse-Kubitza
09:23 AM Revision 3737: schemas/vegbien.sql: *_unique_within_* unique indexes used for 1:1 relationships: Renamed to *_*_1_to_1 to better reflect what they do
Aaron Marcuse-Kubitza
09:21 AM Revision 3736: schemas/vegbien.sql: *_unique_within_* unique indexes used for 1:1 relationships: Renamed to *_*_1_to_1 to better reflect what they do
Aaron Marcuse-Kubitza
08:58 AM Revision 3735: schemas/vegbien.sql: plantobservation: Corrected plantobservation_aggregateoccurrence_id_1_to_1's name to plantobservation_aggregateoccurrence_1_to_1 because it's 1:1 with aggregateoccurrence, not aggregateoccurrence_id. Made it a unique index for consistency with our general method of expressing unique constraints on potentially nullable columns.
Aaron Marcuse-Kubitza
08:54 AM Revision 3734: schemas/vegbien.sql: specimenreplicate: Renamed specimenreplicate_unique_plantobservation to specimenreplicate_plantobservation_1_to_1 to better reflect what it does
Aaron Marcuse-Kubitza
08:50 AM Revision 3733: schemas/vegbien.sql: locationevent unique indexes: Renamed to *_unique_within_* to better reflect what they do
Aaron Marcuse-Kubitza
08:34 AM Revision 3732: schemas/vegbien.sql: location: Removed redundant location_unique_sourceaccessioncode unique constraint, which has been replaced by location_unique_within_datasource
Aaron Marcuse-Kubitza
08:31 AM Revision 3731: schemas/vegbien.sql: Reset foreign key constraint names to autogenerated defaults for consistency
Aaron Marcuse-Kubitza
08:27 AM Revision 3730: schemas/vegbien.sql: Renamed *_unique_datasource unique indexes to *_unique_within_datasource to better reflect what they do
Aaron Marcuse-Kubitza
08:25 AM Revision 3729: schemas/vegbien.sql: locationevent: Renamed locationevent_unique_accessioncode to locationevent_unique_within_location to better reflect what it does
Aaron Marcuse-Kubitza
08:22 AM Revision 3728: schemas/vegbien.sql: specimenreplicate: Renamed specimenreplicate_unique_accessioncode to specimenreplicate_unique_within_datasource to better reflect what it does
Aaron Marcuse-Kubitza
08:11 AM Revision 3727: schemas/vegbien.sql: stemobservation: Renamed stemobservation_unique_accessioncode to stemobservation_unique_within_plantobservation and also apply it to NULL sourceaccessioncodes, so that a plantobservation can have a single stemobservation for its single stem's traits without needing a separate sourceaccessioncode for it
Aaron Marcuse-Kubitza
08:02 AM Revision 3726: schemas/vegbien.sql: aggregateoccurrence: Removed redundant aggregateoccurrence_unique_accessioncode unique constraint, which has been replaced by aggregateoccurrence_unique_within_taxonoccurrence
Aaron Marcuse-Kubitza
07:43 AM Revision 3725: schemas/vegbien.sql: plantnamescope: Added CHECK constraint to ensure that at least one key column is specified (an empty plantnamescope doesn't make sense; use NULL instead)
Aaron Marcuse-Kubitza
07:32 AM Revision 3724: schemas/vegbien.ERD.mwb: Synced with schema
Aaron Marcuse-Kubitza
07:23 AM Revision 3723: ch_root: Don't require both the input and output mappings to contain their respective new roots, since sometimes only one or the other root is being subset. This will occur, for example, in mappings that are flat on the input but normalized on the output, such as VegCSV.
Aaron Marcuse-Kubitza
07:06 AM Revision 3722: VegBIEN: Reversed aggregateoccurrence<->plantobservation relationship to point from plantobservation->aggregateoccurrence, so plantobservation could be scoped by aggregateoccurrence in the same way as all other core tables are scoped by their parent tables. This reversed direction was an anomaly due to the need to have a trigger auto-set aggregateoccurrence.count to 1 when there was an associated plantobservation. This was most easily accomplished on the aggregateoccurrence table itself, but required the reversed relationship. The trigger has now been reimplemented on plantobservation, which externally updates aggregateoccurrence.count.
Aaron Marcuse-Kubitza
06:53 AM Revision 3721: input.Makefile: Testing: diffing test outputs: Ignore changes in whitespace, due to e.g. different indent levels. This facilitates accepting tests when an element has been nested inside another element (or unnested), by showing only the opening and closing tags of the new outer element.
Aaron Marcuse-Kubitza
06:42 AM Revision 3720: dicts.py: DictProxy: Fixed bug where default value for inner param needed to be created in the constructor, or else every default instance would use and modify the same dictionary
Aaron Marcuse-Kubitza
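Note on revision 3720: the bug described is the usual Python mutable-default-argument pitfall. The sketch below is illustrative only; the class names SharedDefaultProxy and DictProxySketch are hypothetical and are not the project's DictProxy.

```python
class SharedDefaultProxy:
    """Buggy pattern: the default dict is created once, when the def is evaluated."""

    def __init__(self, inner={}):
        self.inner = inner


class DictProxySketch:
    """Fixed pattern: a fresh dict is created inside the constructor."""

    def __init__(self, inner=None):
        if inner is None:
            inner = {}
        self.inner = inner


a, b = SharedDefaultProxy(), SharedDefaultProxy()
a.inner["x"] = 1
# b.inner now also contains "x", because both instances share one dict.

c, d = DictProxySketch(), DictProxySketch()
c.inner["x"] = 1
# d.inner is still empty, because each instance got its own dict.
```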
06:26 AM Revision 3719: db_xml.py: put(): wrap_e(): Call augment_error() to add the current node to the error message
Aaron Marcuse-Kubitza
06:14 AM Revision 3718: db_xml.py: put(): Raise an error if there are multiple fields with the same name, instead of silently overwriting the first with the second. This generally indicates the need to use `:[@merge=1]` on the fields in question.
Aaron Marcuse-Kubitza
06:11 AM Revision 3717: dicts.py: Added OnceOnlyDict and helper exception KeyExistsError
Aaron Marcuse-Kubitza
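Note on revision 3717: a dictionary that refuses to overwrite existing keys might look roughly like the following. Only the names OnceOnlyDict and KeyExistsError come from the commit message; the implementation is an assumption, not the code in dicts.py.

```python
class KeyExistsError(KeyError):
    """Raised when a key is assigned a second time."""


class OnceOnlyDict(dict):
    """A dict in which each key may be set only once via item assignment."""

    def __setitem__(self, key, value):
        if key in self:
            raise KeyExistsError(key)
        super().__setitem__(key, value)


fields = OnceOnlyDict()
fields["name"] = "first"
try:
    fields["name"] = "second"  # duplicate key
    duplicate_rejected = False
except KeyExistsError:
    duplicate_rejected = True
```

In this sketch, dict.update() and the constructor bypass __setitem__, so only plain item assignment is checked.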
06:10 AM Revision 3716: dicts.py: DictProxy: Added default value for inner param to facilitate creating empty wrapped dicts
Aaron Marcuse-Kubitza
05:48 AM Revision 3715: bin/map: out_is_db: row-based mode: Debug-log the processed XML tree produced by xml_func.process()
Aaron Marcuse-Kubitza
05:16 AM Revision 3714: sql_io.py: put_table(): Fixed bug where Missing mapping for NOT NULL column errors should actually be warnings because sometimes the mappings include extra tables which aren't used by the dataset
Aaron Marcuse-Kubitza
05:12 AM Revision 3713: sql_io.py: put_table(): Fixed bug where Missing mapping for NOT NULL column errors should actually be warnings because sometimes the mappings include extra tables which aren't used by the dataset
Aaron Marcuse-Kubitza
03:18 AM Revision 3712: schemas/vegbien.sql: aggregateoccurrence: Added UNIQUE INDEX that makes an aggregateoccurrence unique within a taxonoccurrence. When the sourceaccessioncode isn't specified (as for individual organisms data, where this goes in plantobservation and taxonoccurrence), this ensures a 1:1 relationship between aggregateoccurrence and taxonoccurrence.
Aaron Marcuse-Kubitza
03:08 AM Revision 3711: schemas/vegbien.sql: taxonoccurrence: Added UNIQUE INDEX that makes a taxonoccurrence unique within a locationevent. When the sourceaccessioncode isn't specified (as for specimens data), this ensures a 1:1 relationship between taxonoccurrence and locationevent.
Aaron Marcuse-Kubitza
03:05 AM Revision 3710: mappings/VegX-VegBIEN.stems.csv: binomial (full) plantname: Also mapped to an alternative for taxonoccurrence.sourceaccessioncode, for aggregate plots data that distinguishes taxonoccurrences only by plantname (such as CVS)
Aaron Marcuse-Kubitza
02:23 AM Revision 3709: exc.py: e_msg(): Fixed bug where exceptions with nothing in e.args (such as StopIteration) caused a failed assertion. Fixed bug where exceptions with multiple values in e.args (such as certain IOErrors) caused a failed assertion.
Aaron Marcuse-Kubitza
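Note on revision 3709: the two cases mentioned (no values in e.args, and several values in e.args) can be handled without assertions along these lines. The function e_msg_sketch is hypothetical and its output format is an assumption; it is not the project's exc.e_msg().

```python
def e_msg_sketch(e):
    """Return a message string for an exception, whatever the length of e.args."""
    args = e.args
    if not args:
        # e.g. StopIteration() carries no arguments
        return ""
    if len(args) == 1:
        return str(args[0])
    # e.g. OSError(errno, strerror) carries several arguments
    return ", ".join(str(arg) for arg in args)


empty_msg = e_msg_sketch(StopIteration())
single_msg = e_msg_sketch(ValueError("bad value"))
multi_msg = e_msg_sketch(OSError(2, "No such file"))
```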
01:27 AM Revision 3708: sql.py: flatten(): Documented that shouldn't cache query because the temp table will usually be truncated after use
Aaron Marcuse-Kubitza
01:05 AM Revision 3707: sql_gen.py: merge_not_null(): For clarity, use to_text() to represent NULL as the string 'NULL' instead of as the null sentinel for the column's type
Aaron Marcuse-Kubitza
01:02 AM Revision 3706: sql_gen.py: Added to_text() and helper value null_as_str
Aaron Marcuse-Kubitza
12:52 AM Revision 3705: mappings/VegX-VegBIEN.stems.csv: plantobservation: sourceaccessioncode, authorplantcode: Removed no longer needed mapping to specimenreplicate.sourceaccessioncode, since specimenreplicate for plots data is now identified by its plantobservation fkey, without needing its own sourceaccessioncode
Aaron Marcuse-Kubitza

07/31/2012

10:41 PM Revision 3704: sql_io.py: put_table(): ignore_cond(): Fixed bug where if is_literals, need to return NULL, instead of trying to filter invalid rows out of a nonexistent input table
Aaron Marcuse-Kubitza
09:57 PM Revision 3703: mappings/VegX-VegBIEN.stems.csv: Replaced "/}" (with unnecessary "/") with "}"
Aaron Marcuse-Kubitza
09:51 PM Revision 3702: mappings/VegX-VegBIEN.stems.csv: Replaced doubled "/"s with single "/"
Aaron Marcuse-Kubitza
09:05 PM Revision 3701: backups/Makefile: Added synchronization of backups with vegbiendev. Added downloading backups to the "After a new import" steps.
Aaron Marcuse-Kubitza
09:04 PM Revision 3700: lib/common.Makefile: rsync: $(remote): Fixed bug where the inputs/ dir was hardcoded, when the remote dir name needed to be determined dynamically based on the Makefile dir
Aaron Marcuse-Kubitza
08:54 PM Revision 3699: backups/Makefile: Refactored to include lib/common.Makefile
Aaron Marcuse-Kubitza
08:46 PM Revision 3698: inputs/Makefile: Added download-logs to download import logs onto local machine and added it to the "After a new import" steps
Aaron Marcuse-Kubitza
08:36 PM Revision 3697: Moved generally useful targets and vars from inputs/Makefile to lib/common.Makefile and lib/forwarding.Makefile
Aaron Marcuse-Kubitza
08:04 PM Revision 3696: bin/map: Don't create unneeded /_ignore/inLabel element containing the datasource name because sql_io.put_table() now autopopulates the datasource_id
Aaron Marcuse-Kubitza
07:57 PM Revision 3695: schemas/functions.sql, py_functions.sql: Removed no longer needed relational functions, since sql_io.put_table() supports regular SQL functions
Aaron Marcuse-Kubitza

07/30/2012

08:31 PM Revision 3694: inputs/Madidi/maps/VegX.plots.csv: Mapped all mappable columns
Aaron Marcuse-Kubitza
08:28 PM Revision 3693: mappings/VegX-VegBIEN.stems.csv: elevation, elevationrange: Added _rangeStart/_rangeEnd filter
Aaron Marcuse-Kubitza
08:19 PM Revision 3692: sql_io.py: Wrapping mapping in a sql_gen.ColDict: Documented that sql_gen.ColDict sanitizes both keys and values passed into it
Aaron Marcuse-Kubitza
08:18 PM Revision 3691: sql_gen.py: ColDict: Documented that anything that isn't a column is wrapped in a NamedCol
Aaron Marcuse-Kubitza
08:04 PM Revision 3690: README.TXT: Datasource setup: Accepting the test cases: Added instructions for what to do if you get errors
Aaron Marcuse-Kubitza
06:09 PM Revision 3689: bin/map: Fixed bug where needed to use sql.function_exists() to determine if something is a relational (now SQL) function, including in row-based mode, since that now uses sql_io.put_table(), which requires this. The bug fix relies on the new xml_func.process() feature that preserves unknown relational functions in case they are built-in functions rather than SQL functions.
Aaron Marcuse-Kubitza
06:04 PM Revision 3688: xml_func.py: process(): In row-based mode, when trying to evaluate function using DB, preserve unknown funcs because these might be built-in functions of db_xml.put(). The sql.DoesNotExistException should be raised again when db_xml.put() is run and it verifies whether the function is built-in or not (e.g. _simplifyPath is now built-in, for column-based support). See db_xml.put_special_funcs for built-in functions.
Aaron Marcuse-Kubitza
05:59 PM Revision 3687: db_xml.py: put(): Fixed bug where strings starting with "$" were interpreted as input columns in row-based mode (this should only apply to column-based mode). Explicitly store whether in row-based mode in is_literals var (similar to is_literals in sql_io.put_table()).
Aaron Marcuse-Kubitza
05:54 PM Revision 3686: sql_io.py: put_table(): unrecoverable errors: Returning default value: is_literals: Remove column rename from default value so it doesn't get treated as a column by db_xml.put() (which is handled differently from a literal value)
Aaron Marcuse-Kubitza
03:53 PM Revision 3685: db_xml.py: put(): put_(): Removed no longer needed in_row_ct_ref param, which is only used by put_table(). Rewrapped function body.
Aaron Marcuse-Kubitza
03:46 PM Revision 3684: sql_io.py: put_table(): ignore(): literals: Only replace invalid literal with NULL or remove row if that column actually contains the invalid value in question. This handles the case where all columns are being ignore()d because the specific column couldn't be identified, and this was not the invalid column.
Aaron Marcuse-Kubitza
03:02 PM Revision 3683: mappings/VegX-VegBIEN.stems.csv: plot: Mapped note
Aaron Marcuse-Kubitza
02:32 PM Revision 3682: mappings/VegX-VegBIEN.stems.csv: plot: Added landform mapping
Aaron Marcuse-Kubitza
02:24 PM Revision 3681: schemas/vegbank.ERD.pdf: Auto-repaired with Adobe Reader so that the repair message doesn't pop up whenever it's opened
Aaron Marcuse-Kubitza
02:22 PM Revision 3680: schemas: Added vegbank.ERD.pdf so the VegBank ERD is easily accessible when mapping
Aaron Marcuse-Kubitza
01:51 PM Revision 3679: mappings/VegX-VegBIEN.stems.csv: project: Mapped sourceaccessioncode. This entailed adding a distinguishing suffix to the projectname input mapping.
Aaron Marcuse-Kubitza
01:31 PM Revision 3678: mappings/DwC2-VegBIEN.specimens.csv, VegX-VegBIEN.stems.csv: Removed all manual mappings to datasource_id now that datasource_id is auto-populated, both on the VegBIEN output side and the DwC/VegX input side. This should greatly simplify many of the mappings!
Aaron Marcuse-Kubitza
12:11 PM Revision 3677: db_xml.py: put(): Don't suppress exceptions thrown by sql_io.put_table() by passing them to on_error(), because some exceptions indicate unrecoverable database connection problems such as a broken connection, which should abort the import
Aaron Marcuse-Kubitza
11:52 AM Revision 3676: db_xml.py: put(): Support datasets with no rows, where root.firstChild == None. Documented that to use an entire XML document, you need to pass root.firstChild rather than root.
Aaron Marcuse-Kubitza
11:31 AM Revision 3675: inputs/import.stats.xls: Updated with stats from latest import. Note that the import now includes CVS.
Aaron Marcuse-Kubitza
11:23 AM Revision 3674: README.TXT: Documented that the PostgreSQL server should be restarted after installing system updates that may affect it, to avoid spurious errors that crash the import but go away upon reimport
Aaron Marcuse-Kubitza

07/27/2012

11:12 PM Revision 3673: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
11:10 PM Revision 3672: schemas/vegbien.ERD.mwb: Fixed lines
Aaron Marcuse-Kubitza
11:08 PM Revision 3671: schemas/vegbien.ERD.mwb: Synced with schema
Aaron Marcuse-Kubitza
10:51 PM Revision 3670: bin/map: Call sys.stdout.flush() after every call to sys.stdout.write() to avoid interleaved stdout/stderr output due to stdout buffering
Aaron Marcuse-Kubitza
10:48 PM Revision 3669: bin/map: Call sys.stdout.flush() after every call to sys.stdout.write() to avoid interleaved stdout/stderr output due to stdout buffering
Aaron Marcuse-Kubitza
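Note on revisions 3669 and 3670: the flush-after-write pattern can be sketched as below. The helper name write_flushed is hypothetical; bin/map may structure this differently.

```python
import sys


def write_flushed(text, stream=None):
    """Write text and flush immediately, so buffered stdout output
    is not emitted later than subsequent stderr output."""
    if stream is None:
        stream = sys.stdout  # resolved at call time, not definition time
    stream.write(text)
    stream.flush()


write_flushed("processing row 1\n")
sys.stderr.write("warning for row 1\n")
```

Because stdout is typically block-buffered when redirected to a file while stderr is not, flushing after each write keeps the two streams in the order in which they were written.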
10:13 PM Revision 3668: schemas/vegbien.sql: *_unique_datasource UNIQUE INDEXes: Removed COALESCE() from datasource_id and datasource_id IS NOT NULL filter, because datasource_id is now always NOT NULL
Aaron Marcuse-Kubitza
10:07 PM Revision 3667: schemas/filter_ERD.csv: Removed AUTO_INCREMENT because that is not added to any other tables
Aaron Marcuse-Kubitza
10:05 PM Revision 3666: Regenerated schemas/vegbien.my.sql
Aaron Marcuse-Kubitza
10:04 PM Revision 3665: schemas/vegbien.sql: specimenreplicate: Inherit datasource_id from taxonoccurrence instead of defining it independently
Aaron Marcuse-Kubitza
09:56 PM Revision 3664: xml_func.py: Removed no longer needed local XML functions that have been translated to SQL functions
Aaron Marcuse-Kubitza
09:52 PM Revision 3663: input.Makefile: Testing: Removed VegBIEN.%.xml test because the import.%.xml test output includes the template tree that it's inserting, so there is no need to generate the XML tree in a separate test. This will also remove the need to maintain local XML functions that have already been translated to DB functions for the sole purpose of this automated test.
Aaron Marcuse-Kubitza
09:40 PM Revision 3662: schemas/vegbien.sql: Made datasource_id required on every table that has it, to trigger the automatic population of it by sql_io.put_table()'s col_defaults
Aaron Marcuse-Kubitza
09:38 PM Revision 3661: Moved importing of col_defaults from db_xml.put_table() to bin/map, so that it also happens in row-based mode. Note that this causes a DB entry for the datasource to always be created, even if the datasource has no mappings or no rows.
Aaron Marcuse-Kubitza
09:13 PM Revision 3660: Use new exc.reraise() where exc.raise_() was used, so that the stack trace is preserved when the exception is rethrown
Aaron Marcuse-Kubitza
09:11 PM Revision 3659: exc.py: reraise(): Take optional exception argument so it can be invoked in the same way as raise_(). Interestingly, this missing parameter does not produce the usual "...() takes no arguments (1 given)" error when the function is called inside an except block.
Aaron Marcuse-Kubitza
09:04 PM Revision 3658: exc.py: Added reraise()
Aaron Marcuse-Kubitza
09:02 PM Revision 3657: db_xml.py: put(): Inserting node: Wrap sql_io.put_table() call in catch-all exception handler that calls on_error_() (wrapper for error handler provided by caller) and returns None. This both adds additional debugging info to the exception (in on_error_()) and allows recovery from arbitrary exceptions that happen in sql_io.put_table(), so that an exception does not abort the import.
Aaron Marcuse-Kubitza
08:50 PM Revision 3656: exc.py: get_e_tracebacks_str(): Use the current system traceback if the exception doesn't contain its own traceback(s)
Aaron Marcuse-Kubitza
08:35 PM Revision 3655: schemas/vegbien.sql: specimenreplicate: Added locationevent fkey, since fkeys are not inherited from parent tables
Aaron Marcuse-Kubitza
08:30 PM Revision 3654: schemas/vegbien.sql: Added datasource_id fkey constraints to all tables that needed it
Aaron Marcuse-Kubitza
08:21 PM Revision 3653: bin/map: out_is_db: Use col_defaults in row-based mode as well
Aaron Marcuse-Kubitza
08:02 PM Revision 3652: db_xml.py: Renamed put_table_special_funcs to put_special_funcs because it is now used by put() as well
Aaron Marcuse-Kubitza
08:00 PM Revision 3651: db_xml.py: Moved put() before the functions that use it
Aaron Marcuse-Kubitza
07:58 PM Revision 3650: db_xml.py: Renamed _put_table_part() to put(), replacing the existing put() whose functionality it now performs
Aaron Marcuse-Kubitza
07:52 PM Revision 3649: db_xml.py: _put_table_part(): Reordered params to match put(), so that it can eventually be substituted for it
Aaron Marcuse-Kubitza
07:44 PM Revision 3648: db_xml.py: _put_table_part(): Allow being invoked directly by adding defaults for parameters
Aaron Marcuse-Kubitza
07:41 PM Revision 3647: db_xml.py: put(): Use _put_table_part(). This will ensure that all the put-related functionality is in one place, rather than duplicated.
Aaron Marcuse-Kubitza
07:30 PM Revision 3646: db_xml.py: _put_table_part(): Append the node to errors handled with on_error()
Aaron Marcuse-Kubitza
07:29 PM Revision 3645: sql_io.py: Added own SyntaxError class to replace built-in SyntaxError because it stringifies to only the first line
Aaron Marcuse-Kubitza
06:46 PM Revision 3644: input.Makefile: Testing: Removed $(via).%.xml tests because they require the via format (DwC/VegX) to be XML, but we want to flatten VegX into a DwC-like set of CSV column names
Aaron Marcuse-Kubitza
06:45 PM Revision 3643: Removed inputs/NY/test/VegX.specimens.xml.ref because NY is not mapped via VegX
Aaron Marcuse-Kubitza
06:31 PM Revision 3642: input.Makefile: Testing: Renamed import.*.out tests to end in .xml because they now contain XML import trees for validation, and this extension turns on XML syntax highlighting in a text editor
Aaron Marcuse-Kubitza
06:03 PM Revision 3641: bin/map: out_is_db: Output the put template to stdout so it will be validated in the automated testing
Aaron Marcuse-Kubitza
05:41 PM Revision 3640: xml_func.py: process(): If local XML function can't be found, just replace with last param instead of returning an error. This allows DB-only functions to be ignored in XML output mode.
Aaron Marcuse-Kubitza
05:32 PM Revision 3639: sql_gen.py: ColDict.__setitem__(): Fixed bug where None value should not be replaced with column default value if column has no underlying table
Aaron Marcuse-Kubitza
05:27 PM Revision 3638: sql.py: DbConn.col_info(): If column does not exist, raise sql_gen.NoUnderlyingTableException
Aaron Marcuse-Kubitza
04:58 PM Revision 3637: sql_io.py: put_table(): In log messages, use `.to_str(db)` instead of repr() where possible to use the SQL syntax of the DB driver
Aaron Marcuse-Kubitza
04:51 PM Revision 3636: sql_io.py: put_table(): ignore(): Replacing invalid value with NULL in nullable column: Corrected log message to "Replacing invalid value ... with NULL in column ..." because the rows with that value are not ignored in that case
Aaron Marcuse-Kubitza
04:47 PM Revision 3635: sql.py: run_query(): InvalidValueException: Parse any exception ending in "out of range", not just "field value out of range", in order to support errors that the timezone is out of range
Aaron Marcuse-Kubitza
04:35 PM Revision 3634: schemas/py_functions.sql: _dateRange*(): Made functions STRICT because they return NULL on NULL input
Aaron Marcuse-Kubitza

07/26/2012

09:53 PM Revision 3633: sql_io.py: put(): Use a simple case of put_table(), which now supports everything put() needs. This will enable all row-based and column-based processing to be maintained in the same function, put_table(), and avoids the need to reimplement any column-based functionality (like SQL functions) in put().
Aaron Marcuse-Kubitza
09:51 PM Revision 3632: xml_dom.py: NodeTextEntryIter: Allow empty values through as None, and instead filter them out in TextEntryOnlyIter using new helper function non_empty(). This allows XML functions to decide for themselves whether empty values should be filtered out, because process() will now no longer automatically remove them. This will enable process() to work with SQL functions, which *must not* have empty values filtered out because this will remove required, but nullable, arguments.
Aaron Marcuse-Kubitza
09:45 PM Revision 3631: xml_func.py: Use conv_items() in every XML function that needs empty (NULL) entries removed, so that they are not dependent on what process() does to the items
Aaron Marcuse-Kubitza
09:43 PM Revision 3630: sql_io.py: put_table(): ignore(): Support invalid literals in addition to invalid column values. This also allows put_table() to fully support being called by put().
Aaron Marcuse-Kubitza
08:55 PM Revision 3629: xml_func.py: process(): In row-based mode, if function is not explicitly a relational function but does not exist as a local XML function, treat it as a relational function. This will help in merging sql_io.put() and put_table(), since put() did not support SQL functions but put_table() does, and this ensures that a SQL function is always used if the local XML function has been removed in favor of it.
Aaron Marcuse-Kubitza
08:37 PM Revision 3628: sql_io.py: put_table(): Removed into param to set a custom into table name because put_table() now has all the info it needs to generate this name automatically, and callers are no longer providing it
Aaron Marcuse-Kubitza
07:56 PM Revision 3627: bin/map: by_col: db_xml.put_table() call: Use new col_defaults param to automatically set datasource_id to the in_label (datasource name)
Aaron Marcuse-Kubitza
07:46 PM Revision 3626: xpath.py: path2xml(): Skip to tree created inside root, since that is how callers want to use the returned node
Aaron Marcuse-Kubitza
07:45 PM Revision 3625: db_xml.py: put_table(): Import col_defaults to translate nodes to pkeys
Aaron Marcuse-Kubitza
07:44 PM Revision 3624: db_xml.py: _put_table_part(): Support no in_table, for iterations with only literal values
Aaron Marcuse-Kubitza
07:27 PM Revision 3623: sql_io.py: put_table(): is_literals: When ignoring all rows, return default value instead of always None
Aaron Marcuse-Kubitza
06:35 PM Revision 3622: db_xml.py: put_table(): Removed parent_ids_loc and next params since these are only used in the recursion
Aaron Marcuse-Kubitza
06:17 PM Revision 3621: db_xml.py: put_table(): Split into an outer function that sets up the database environment and subsets in_table, and a (recursive) inner function that imports the data
Aaron Marcuse-Kubitza
05:55 PM Revision 3620: db_xml.py: put_table(): Subsetting and partitioning in_table: Documented that it's OK to do this even if the table is already the right size because it takes <1 sec
Aaron Marcuse-Kubitza
05:43 PM Revision 3619: sql_io.py: put_table(): Use is_function where caller-provided is_func was used, since is_function determines whether something is a function based on whether it actually exists as a SQL function instead of just whether its name starts with "_". Removed now-unneeded is_func param.
Aaron Marcuse-Kubitza
05:36 PM Revision 3618: sql_io.py: put_table(): Added col_defaults param and use it if there's a missing mapping for a NOT NULL column. This requires callers passing arguments by position to add an empty value for this parameter.
Aaron Marcuse-Kubitza
04:48 PM Revision 3617: bin/map: by_col: Only clear errors table if doing full re-import starting from row 0, not if restarting import at a later row
Aaron Marcuse-Kubitza
04:47 PM Revision 3616: input.Makefile: Import to VegBIEN: Fixed bug where `&>>` was used to append stdout and stderr to the log file, but is not supported on Mac OS X. Replaced with `&>` (overwrite instead of append) because log file is unique by date/time the import runs, so there won't be an existing log file that would be overwritten.
Aaron Marcuse-Kubitza
04:34 PM Revision 3615: schemas/vegbien.sql: Added datasource_id to all tables with a sourceaccessioncode (and corresponding *_unique_datasource constraint on these columns) so they can be directly looked up using just the input table's own fkey to parent. This will enable loading hierarchical (plots) data without "breadcrumbs", a huge benefit! Also added sourceaccessioncode wherever there was a datasource_id, to standardize on these names as being the columns that link directly to the input table rows.
Aaron Marcuse-Kubitza
01:15 PM Revision 3614: README.TXT: Datasource setup: Installing the staging tables: View the logs: Fixed bug in tail syntax to also work on Linux
Aaron Marcuse-Kubitza

07/25/2012

11:04 PM Revision 3613: Added inputs/Madidi/ with empty mappings
Aaron Marcuse-Kubitza
11:01 PM Revision 3612: README.TXT: Datasource setup: Populating the src/ subdir with input data: Added step to make sure each header in multiple part files for a table is EXACTLY the same
Aaron Marcuse-Kubitza
10:56 PM Revision 3611: README.TXT: Datasource setup: Installing the staging tables: Added steps to deal with colliding column names in the flat file headers. Added command to view the logs.
Aaron Marcuse-Kubitza
10:53 PM Revision 3610: csv2db: log(): sys.stderr.write(): Run strings.to_raw_str() on message to handle Unicode chars
Aaron Marcuse-Kubitza
10:52 PM Revision 3609: csv2db: Run strings.to_unicode() on column names to handle Unicode chars
Aaron Marcuse-Kubitza
10:36 PM Revision 3608: csv2db: esc_name(): Use db.esc_name()
Aaron Marcuse-Kubitza
09:25 PM Revision 3607: Added inputs/BIEN2.datasources.xlsx (formerly bien_data_sources.xlsx in nimoy:/home/bien/raw_data/)
Aaron Marcuse-Kubitza
09:06 PM Revision 3606: exc.py: e_msg(): Added assertions to check that e.args is compatible with this function
Aaron Marcuse-Kubitza
08:59 PM Revision 3605: exc.py: Use new e_str() where its definition was used
Aaron Marcuse-Kubitza
08:54 PM Revision 3604: exc.py: Use new Unicode-safe e_msg() instead of strings.ustr() on exceptions
Aaron Marcuse-Kubitza
08:47 PM Revision 3603: exc.py: e_msg(): Run strings.ustr() on the returned string so it will be appendable to other Unicode strings
Aaron Marcuse-Kubitza
08:43 PM Revision 3602: exc.py: Added e_msg(), e_str() (from SQL py_functions._date())
Aaron Marcuse-Kubitza
02:06 PM Revision 3601: db_xml.py: put_table(): Adding fkey to parent: Fixed bug where should only add parent_ids_loc table to list of tables not to truncate if it's a column, because it is sometimes just a pkey value when that iteration contained only literals
Aaron Marcuse-Kubitza
01:56 PM Revision 3600: inputs/import.stats.xls: Updated with stats from latest import
Aaron Marcuse-Kubitza
01:42 PM Revision 3599: inputs/import.stats.xls: Corrected date of last import
Aaron Marcuse-Kubitza
 
