Activity
From 08/14/2012 to 09/12/2012
09/12/2012
- 05:36 PM Revision 4671: input.Makefile: $(viaMaps): Removed extra addition of */map.csv, which is already included because all $(tables) have or will get a map.csv
- 05:34 PM Revision 4670: mappings/: Removed no longer used derived file Veg+.vocab.csv
- 05:33 PM Revision 4669: input.Makefile: Removed no longer used $(vocab)
- 05:32 PM Revision 4668: input.Makefile: Maps validation: %/new_terms.csv: Filter out $(coreMap) and $(dict) successively instead of $(vocab), to avoid requiring intermediate mapping files not edited by the user
- 05:28 PM Revision 4667: input.Makefile: Maps validation: $(newTerms): Don't hardcode the caller's first filter_out_ci by prerequisite position; instead allow them to specify the command (including the var name) themselves
- 05:24 PM Revision 4666: input.Makefile: Maps validation: $(newTerms): For simplicity, subset the columns before running filter_out_ci
- 05:20 PM Revision 4665: mappings/: Removed no longer used Veg+-VegBIEN.csv and derived autogen Veg+.self.csv
- 05:16 PM Revision 4664: input.Makefile: Maps building: %/unmapped_terms.csv: Use $(coreMap) instead of $(vocab) because the terms should already be translated to VegCore terms, rather than still being Veg+
- 05:13 PM Revision 4663: input.Makefile: Maps validation: $(newTerms): Fixed bug where header needed to be removed *before* running filter_out_ci because filter_out_ci only removes the header if it matches the vocabulary's header. Removing the header afterward can cause the first row to be removed instead if the header was already removed.
- 05:11 PM Revision 4662: cols: Support CSVs without a header, such as intermediates that become unmapped_terms.csv, new_terms.csv
- 04:37 PM Revision 4661: inputs/: Regenerated unmapped_terms.csv, new_terms.csv
- 04:25 PM Revision 4660: input.Makefile: %/.map.csv.last_cleanup: Removed no longer used prerequisite $(vocab)
- 04:24 PM Revision 4659: input.Makefile: %/.map.csv.last_cleanup: Canonicalize separately on $(coreMap) and $(dict), instead of requiring them to be combined in $(vocab)
- 04:20 PM Revision 4658: input.Makefile: Use mappings/VegCore-VegBIEN.csv instead of mappings/Veg+-VegBIEN.csv as the core map, because the automapper now takes care of Veg+ -> VegCore translation
- 04:14 PM Revision 4657: inputs/*/*/map.csv: Moved filter suffixes to separate filter column to enable automapping to work on those mappings' terms, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Move-filter-suffixes-to-separate-filter-column>. Note that the only changes to VegBIEN.csvs are the (now automapped) names of terms in "No join mapping" comments.
- 03:37 PM Revision 4656: inputs/*/*/map.csv: Added Filter column to contain any suffix added after the term, so that the automapping mechanism does not have to deal with the filter expressions
- 03:35 PM Revision 4655: Added cat_cols
- 03:34 PM Revision 4654: Added ins_col
- 03:13 PM Revision 4653: input.Makefile: Maps building: %/.map.csv.last_cleanup: Reference fixed prerequisites by name instead of by position in the prerequisites list
- 02:28 PM Revision 4652: Removed no longer used intersect
- 02:18 PM Revision 4651: inputs/*/*/map.csv: Removed no longer needed [Veg+] suffix in root, because the input column is no longer used by old-style map utilities such as union that needed this
- 02:07 PM Revision 4650: translate: Translate the column header instead of passing it through, in order to properly support CSVs without a header and to support renaming the header when the column's contents change to a different schema or vocabulary
- 02:04 PM Revision 4649: canon: Canonicalize the column header instead of passing it through, in order to properly support CSVs without a header
- 01:57 PM Revision 4648: filter_out_ci: Filter header instead of passing it through, in order to properly support CSVs without a header, such as the unmapped_terms.csv and new_terms.csv files. For CSVs with a header, the header of the vocabulary should be removed before passing it to filter_out_ci.
- 01:48 PM Revision 4647: autoremove: `svn rm`: Fixed bug where --force needed to be added in case the file had already been modified before being autoremoved
- 01:32 PM Revision 4646: input.Makefile: Maps building: Removed no longer used $(createOnlyMaps)
- 01:30 PM Revision 4645: input.Makefile: Maps building: Removed no longer used %/src.csv, because it is no longer needed to generate map.full.csv from map.csv
- 01:21 PM Revision 4644: input.Makefile: Maps building: %/map.csv: If it doesn't exist, generate directly using $(mkSrcMap) instead of by copying %/src.csv, in order to eventually avoid the need to create a separate src.csv at all. Note that this avoids the need to run make twice when the table is first created to properly bootstrap all maps.
- 01:09 PM Revision 4643: autoremove: Try `svn rm` first in case the file is in svn
- 01:02 PM Revision 4642: input.Makefile: Maps building: Removed no longer used %/map.full.csv
- 12:59 PM Revision 4641: input.Makefile: Maps building: %/VegBIEN.csv: Use %/map.csv directly because %/map.full.csv is now a copy of it
- 12:56 PM Revision 4640: input.Makefile: Maps building: %/map.full.csv: Generate by copying map.csv, because the content of these files now differs only in the sort order of the names
- 12:53 PM Revision 4639: inputs/*/*/map.csv: Changed empty mappings to self mappings, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Change-empty-mappings-to-self-mappings>. Note that in map.full.csv and VegBIEN.csv, lines that have changed are always the result of the input field's case being changed to match the case of the datasource's actual column name.
- 12:43 PM Revision 4638: inputs/*/*/map.csv: Changed empty mappings to self mappings, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Change-empty-mappings-to-self-mappings>. Note that in map.full.csv and VegBIEN.csv, lines that have changed are always the result of the input field's case being changed to match the case of the datasource's actual column name.
- 12:31 PM Revision 4637: join: passthru mode: Fixed bug where empty join mappings needed to have the output field of the right-hand row manually set to the output field of the left-hand row for maps.merge_mappings() to work properly
- 12:14 PM Revision 4636: inputs/*/*/map.csv: Added back automapped mappings to map.csv, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Add-back-automapped-mappings-to-mapcsv>
- 12:07 PM Revision 4635: inputs/VegBank/taxonobservation_/map.csv: Updated with new renamings of colliding join columns
- 12:00 PM Revision 4634: join: When a join mapping exists but is empty, still include any additional columns from that mapping in the combined row
- 11:48 AM Revision 4633: inputs/SpeciesLink/Specimen/src.csv, inputs/XAL/Specimen/src.csv: Use input term as the initial Veg+ term, so the src.csv can be used with the Add back automapped mappings process at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Add-back-automapped-mappings-to-mapcsv>
- 11:31 AM Revision 4632: inputs/XAL/Specimen/src.csv, map.csv: Switched from using root prefixes to full column names, because the namespace mapping functionality can be handled much better by treating each namespace-qualified term as its own term rather than as a term and a prefix
- 11:22 AM Revision 4631: inputs/SpeciesLink/Specimen/src.csv, map.csv: Switched from using root prefixes to full column names, because the namespace mapping functionality can be handled much better by treating each namespace-qualified term as its own term rather than as a term and a prefix
- 11:02 AM Revision 4630: inputs/SpeciesLink/Specimen/map.csv: Removed no longer needed duplicate entries for each first letter case, which cause duplicate output mappings now that join is case- and punctuation-insensitive. Note that the `svn diff` hides _alt entry 0, which contains one of the removed duplicate columns that appears in the diff.
- 10:27 AM Revision 4629: inputs/SpeciesLink/Specimen/src.csv, inputs/XAL/Specimen/src.csv: Added Comments column for consistency with autogenerated src.csv format
- 10:14 AM Revision 4628: join: Added new passthru mode which passes through terms with no input mapping or no join mapping
- 09:25 AM Revision 4627: inputs/: Added [Veg+] to via map roots to indicate that the datasource and Veg+ vocabularies are combinable. This is possible now that automapped entries are no longer subtracted when this is in the map root, so there is no concern of losing comments on subtracted automapped rows. Note that this change turns on old-style automapping for these datasources, causing SALVIAS plotMetadata to acquire additional mappings.
- 08:59 AM Revision 4626: canon, translate, filter_out_ci: Support vocabularies/dictionaries with extra columns beyond the functional column(s) used by the program. These columns can contain comments, etc. This was not originally supported because Python 2's iterable unpacking only supports "an iterable with the same number of items as there are targets in the target list" (http://docs.python.org/reference/simple_stmts.html#assignment-statements). We now use numeric array indexes instead to get around this limitation, and for consistency with other map-manipulation scripts.
- 08:21 AM Revision 4625: Removed no longer used subtract (use filter_out_ci instead)
- 08:19 AM Revision 4624: input.Makefile: Maps building: %/.map.csv.last_cleanup: Removed no longer needed subtraction of automapped entries, because information about unmapped and new terms is now available in unmapped_terms.csv and new_terms.csv
- 08:13 AM Revision 4623: README.TXT: Data import: `make backups/download`: Removed '&' because running the command in the background prevents rsync from providing a continuously updating progress indication (because a backgrounded process's stdout is not a TTY)
- 08:04 AM Revision 4622: mappings/VegCore-VegBIEN.csv: Removed no longer needed /_simplifyPath:[next=parent_id]/path expressions in specific paths because parent_id forwarding is now set globally for all paths in the map root
- 07:56 AM Revision 4621: mappings/VegCore-VegBIEN.csv: Added /_simplifyPath:[next=parent_id]/path to root so the returned subplot location will be its parent location if there is no subplot name or ID (indicating that that particular plot did not have subplots). Note that this also causes the parent_id forwarding effect to occur for all other tables containing parent_id, which will help prevent similar issues with subplot events, etc. This will hopefully fix the SALVIAS.plotObservations bug where some organisms did not have a subplot #, causing the subplot location to become NULL and causing the corresponding locationevent rows not to match the locationevent_unique_within_location index filter condition (which requires a parent_id), which caused multiple output table pkeys to be returned for those rows, violating the locationevent_pkeys temp table's primary key.
- 07:25 AM Revision 4620: mappings/VegCore-VegBIEN.csv: namedplace elements: _simplifyPath() calls: Removed no longer needed `require` arg, and removed no longer needed table suffix from `next` arg
- 07:02 AM Revision 4619: inputs/import.stats.xls: Updated with stats from latest import
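The header handling from revision 4648 and the index-based column access from revision 4626 can be sketched together as follows. This is a minimal illustration, not the actual filter_out_ci script: the function signature, the `col`/`vocab_col` parameters, and the sample CSV data are all assumptions made for the example.

```python
import csv
import io

def filter_out_ci(rows, vocab_rows, col=0, vocab_col=0):
    """Drop rows whose `col` value matches a vocabulary term, ignoring case.

    Hypothetical signature; the real script's interface is not shown in the log.
    """
    # Access columns by numeric index so vocabulary rows may carry any number
    # of extra columns (comments, etc.) without breaking tuple unpacking.
    vocab = {row[vocab_col].lower() for row in vocab_rows if row}
    # The header is filtered like any other row, so it is removed only if it
    # matches a vocabulary entry. Headerless CSVs therefore work unchanged.
    return [row for row in rows if row and row[col].lower() not in vocab]

# Illustrative data only: a vocabulary with an extra comment column,
# and a headerless terms list.
vocab_csv = "locality,sample comment\nsubplotID,\n"
terms_csv = "Locality\nQuadratName\nsubplotid\n"

vocab_rows = list(csv.reader(io.StringIO(vocab_csv)))
term_rows = list(csv.reader(io.StringIO(terms_csv)))
remaining = filter_out_ci(term_rows, vocab_rows)
```

Here only the QuadratName row is left in `remaining`, because the other two terms match vocabulary entries once case is ignored.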
09/11/2012
- 11:04 AM Revision 4618: input.Makefile: Maps validation: $(newTerms): Fixed bug where tail with positive offset needs -n flag
- 11:01 AM Revision 4617: Regenerated/modified inputs/*/*/src.csv to use the self-mapping format used by the new automapping mechanism
- 10:50 AM Revision 4616: src_map: Map source columns to themselves so that src.csv can be used directly with the new automapping mechanism
- 10:48 AM Revision 4615: input.Makefile: Maps validation: %/new_terms.csv: Remove terms which are also in %/unmapped_terms.csv, because terms are not considered new (i.e. potential Veg+ terms) until they have been mapped to an existing Veg+ term. Being unmapped has a higher priority than being new, because it affects the current datasource itself rather than the easier mapping of future datasources.
- 10:22 AM Revision 4614: lib/mappings.Makefile: missing_mappings: Display unmapped_terms.csv, new_terms.csv after generating them, to preserve the behavior of the original missing_mappings
- 10:17 AM Revision 4613: root Makefile: Maps validation: Removed no longer used $(missingMappingsCmd)
- 10:17 AM Revision 4612: input.Makefile: Maps validation: Removed no longer used $(missingMappingsCmd)
- 10:16 AM Revision 4611: lib/mappings.Makefile: Removed no longer needed missing_%_mappings targets, since unmapped_terms.csv and new_terms.csv now serve the same purpose in a more efficient way
- 10:14 AM Revision 4610: lib/mappings.Makefile: `ifndef` for $(termsSubdirs): Fixed bug where the variable tested needed to be termsSubdirs instead of missingMappingsCmd
- 10:02 AM Revision 4609: lib/mappings.Makefile: Require $(termsSubdirs)
- 10:00 AM Revision 4608: Generated global unmapped_terms.csv, new_terms.csv
- 10:00 AM Revision 4607: root Makefile: Maps validation: Added $(termsSubdirs) to enable generation of global unmapped_terms.csv, new_terms.csv
- 09:59 AM Revision 4606: inputs/: Generated combined unmapped_terms.csv, new_terms.csv for all inputs
- 09:58 AM Revision 4605: lib/mappings.Makefile: $(catTerms): Fixed bug where only existing $+ files (using $(+w)) could be included in the list (both to check and to use), because otherwise cat would raise an error or try to read stdin
- 09:56 AM Revision 4604: Existing maps discovery: Fixed bug where new unmapped_terms.csv, new_terms.csv needed to be included in $(anyMap)
- 09:52 AM Revision 4603: lib/common.Makefile: Added $(+w)
- 09:22 AM Revision 4602: lib/common.Makefile: Added $(no/) to remove trailing /
- 09:18 AM Revision 4601: Extracted %/unmapped_terms.csv, %/new_terms.csv as separate targets in the Maps validation section so they can be invoked even when %/.map.csv.last_cleanup is not a top-level target (in $(MAKECMDGOALS)). Continue to invoke them in %/.map.csv.last_cleanup by using $(selfMake).
- 08:56 AM Revision 4600: input.Makefile: Maps validation: Set $(termsSubdirs) to enable unmapped_terms.csv, new_terms.csv generation
- 08:56 AM Revision 4599: lib/mappings.Makefile: Added unmapped_terms.csv, new_terms.csv which are generated by combining the correspondingly-named files in $(termsSubdirs)
- 08:42 AM Revision 4598: input.Makefile: Maps building: %/.map.csv.last_cleanup: $(newTerms): Autoremove empty terms lists to avoid clutter
- 08:40 AM Revision 4597: Added autoremove
- 08:22 AM Revision 4596: input.Makefile: Maps building: %/.map.csv.last_cleanup: $(newTerms): Remove the CSV header from the terms lists so that multiple terms lists can easily be appended together
- 08:16 AM Revision 4595: input.Makefile: Maps building: %/.map.csv.last_cleanup: unmapped_terms.csv, new_terms.csv: Factored out commands into $(newTerms)
- 08:09 AM Revision 4594: input.Makefile: Maps building: %/.map.csv.last_cleanup: Generate reports on new and unmapped terms in map.csv
- 08:07 AM Revision 4593: Added filter_out_ci
- 07:26 AM Revision 4592: input.Makefile: Maps building: %/.map.csv.last_cleanup: Translate map.csv using $(mappings)/$(via)-VegCore.csv
- 07:25 AM Revision 4591: Added translate
- 07:08 AM Revision 4590: mappings/Veg+-VegCore.csv: Removed no longer used Comments column. Use mappings/Veg+.terms.csv to cite term definitions instead.
- 07:06 AM Revision 4589: mappings/Veg+-VegCore.csv: previousCatalogNumber: Removed no longer needed "According to" comment, because this is now documented in the mappings/Veg+.terms.csv entry. Note that the citation for any mapping is the overlap of the terms' definitions, and thus only the definitions need to be cited, not the mapping itself. (The definitions are provided in the links in mappings/Veg+.terms.csv.)
- 07:01 AM Revision 4588: mappings/Veg+.terms.csv: previousCatalogNumber: Added Source link to DwC history entry, which documents the definition of this term
- 06:43 AM Revision 4587: input.Makefile: Maps building: %/.map.csv.last_cleanup: Canonicalize map.csv using $(mappings)/$(via).vocab.csv
- 06:40 AM Revision 4586: Added canon
- 06:29 AM Revision 4585: mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.
- 05:49 AM Revision 4584: Added mappings/Veg+.vocab.csv
- 04:41 AM Revision 4583: inputs/GBIF/Specimen/map.csv: Remapped *Original fields to new verbatim* taxonomic terms
- 04:31 AM Revision 4582: mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.
- 04:23 AM Revision 4581: mappings/Veg+.terms.csv: Added min/max SlopeAspect/SlopeGradient
- 04:13 AM Revision 4580: inputs/VegBank/plot_/map.csv: Omit reallatitude/reallongitude because private data should not be placed in a public database
- 04:10 AM Revision 4579: inputs/CVS/Organism/map.csv: Omit realLatitude/realLongitude because private data should not be placed in a public database. Keeping VegBIEN free of restricted-access data allows anyone to run arbitrary queries on the database, without needing an entire security mechanism/front end just to manage users' read-only access to the data (as VegBank has). Note that the private coordinates are still accessible in the staging tables, so they will need to be locked down in order to make VegBIEN secure to public access.
- 03:16 AM Revision 4578: mappings/Veg+-VegCore.csv: Remapped QuadratID to subplotID because the standard definition of an ID term is an ID that's unique within the datasource, and it's just CTFS's usage that makes it unique only within the plot
- 03:13 AM Revision 4577: inputs/CTFS/StemObservation/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID
- 03:09 AM Revision 4576: inputs/CTFS/SubplotObservation/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID
- 03:06 AM Revision 4575: inputs/CTFS/Subplot/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID. Omit QuadratName because QuadratID is used for the same purpose.
- 02:57 AM Revision 4574: mappings/Veg+-VegCore.csv: Removed recordNumber/_alt and recordNumber redirection mappings so that Veg+-VegCore.csv contains only renamings, not business logic. Note that removing the global ordering of these fields does not affect the datasources which contain multiple recordNumber synonyms because they either have a custom ordering or one field is duplicated or unused.
- 02:49 AM Revision 4573: inputs/NY/Specimen/map.csv: Omit CollectorNumber because it is not used, so it does not need to be mapped
- 02:45 AM Revision 4572: inputs/ARIZ/Specimen/map.csv: Omit FieldNumber because it is identical to CollectorNumber, so it does not need to be mapped
- 02:19 AM Revision 4571: inputs/SpeciesLink/Specimen/map.csv: Added manual CollectorNumber mapping which places it after recordNumber/fieldNumber, so that mappings/Veg+-VegCore.csv doesn't need to maintain a global ordering between these fields and just needs to indicate their equivalency
- 02:09 AM Revision 4570: mappings/: Removed no longer needed Veg+-VegCore.to_self.csv, because multiple levels of mappings are no longer needed to get to the VegCore term
- 02:07 AM Revision 4569: mappings/Veg+-VegCore.csv: DescriptionOfSite: Mapped directly to locality rather than to locationNarrative to avoid needing multiple levels of mappings to get to the VegCore term
- 01:56 AM Revision 4568: mappings/Veg+-VegCore.csv: Removed scientificNameAuthorship/_alt and scientificNameAuthorship redirection mappings, which were only used by SpeciesLink but it now has the necessary _alts in its own map.csv
- 01:48 AM Revision 4567: mappings/Veg+-VegCore.csv: Removed dateCollected/_alt and dateCollected redirection mappings, which were only needed when multiple dateCollected fields were being combined in Veg+-VegCore.csv
- 01:45 AM Revision 4566: mappings/: Moved year/month/dayCollected mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayCollected values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateCollected field is now higher than the year/month/dayCollected fields when both are specified, because the dateCollected field presumably contains verbatim text while the year/month/dayCollected fields contain parsed date parts.
- 01:32 AM Revision 4565: inputs/SALVIAS-CSV/Organism/map.csv: Remapped census_date to eventDate, since it is not the start of a range
- 01:31 AM Revision 4564: inputs/Madidi/Plot/map.csv: Remapped First evaluation to eventDate, since it is not necessarily the start of a range
- 01:23 AM Revision 4563: mappings/VegCore-VegBIEN.csv: startDate, endDate mappings: Removed _dateRangeStart/_dateRangeEnd filters because these are assumed to already be start and end dates of a range. (eventDate should be used for concatenated date ranges.)
- 01:09 AM Revision 4562: mappings/VegCore-VegBIEN.csv: Don't map dateCollected to locationevent.obsstartdate/obsenddate because this is the date the *specimen* was collected, not the date (range) of the entire collection *event*. This distinction may not be meaningful for specimens data, but VegBIEN should reflect what the data provider designated. This also reduces the number of dateCollected-related mappings needed for any dateCollected-related field, such as year/month/dayCollected.
- 12:55 AM Revision 4561: mappings/Veg+-VegCore.csv: Removed dateIdentified/_alt and dateIdentified redirection mappings, which were only needed when multiple dateIdentified fields were being combined in Veg+-VegCore.csv
- 12:50 AM Revision 4560: mappings/: Moved year/month/dayIdentified mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayIdentified values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateIdentified field is now higher than the year/month/dayIdentified fields when both are specified, because the dateIdentified field presumably contains verbatim text while the year/month/dayIdentified fields contain parsed date parts.
- 12:34 AM Revision 4559: mappings/: Moved verbatimGrowthForm filter mapping from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic
- 12:28 AM Revision 4558: inputs/UNCC/Specimen/map.csv, inputs/NCU-NCSC/Specimen/map.csv: Remapped cultivated fields directly via new cultivated term, rather than via establishmentMeans
- 12:06 AM Revision 4557: sql_io.py: mk_errors_table(): Don't cache the sql.table_exists() query, because the table will be created and its existence must be rechecked
- 12:02 AM Revision 4556: sql.py: table_exists(): Allow caller to set whether query will be cached. This is useful if the table will later be created and its existence should be checked again.
- 12:00 AM Revision 4555: sql.py: tables(): Allow caller to set whether query will be cached
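Revisions 4555 to 4557 let callers turn off query caching for table_exists() and tables(). A minimal sketch of why that matters, using sqlite3 as a stand-in for the project's PostgreSQL connection. The `DB` class, the `run_query()` signature, and the `cacheable` parameter name are assumptions for illustration and are not taken from sql.py.

```python
import sqlite3

class DB:
    """Hypothetical connection wrapper with an optional per-query result cache."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self._cache = {}

    def run_query(self, sql, params=(), cacheable=False):
        key = (sql, params)
        if cacheable and key in self._cache:
            return self._cache[key]  # may be stale if the schema has changed
        rows = self.conn.execute(sql, params).fetchall()
        if cacheable:
            self._cache[key] = rows
        return rows

def table_exists(db, name, cacheable=True):
    # sqlite_master is SQLite's catalog; PostgreSQL would use its own catalog
    rows = db.run_query(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,), cacheable=cacheable)
    return bool(rows)

db = DB()
before = table_exists(db, "errors")                        # cached as False
db.conn.execute("CREATE TABLE errors (msg TEXT)")
stale = table_exists(db, "errors")                         # still the cached answer
fresh = table_exists(db, "errors", cacheable=False)        # rechecks the catalog
```

A caller that is about to create the table, as mk_errors_table() does, would pass `cacheable=False` so the later existence check sees the new table.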
09/10/2012
- 11:51 PM Revision 4554: mappings/VegCore-VegBIEN.csv: Mapped cultivated
- 11:47 PM Revision 4553: inputs/TEAM/: Added _src/README.TXT with Brad's comments on which files to use
- 11:01 PM Revision 4552: mappings/Veg+.terms.csv: Added cultivated
- 10:35 PM Revision 4551: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Removed manual VACUUM run because this is done as part of $(exportHeader), which calls $(cleanup)
- 10:34 PM Revision 4550: input.Makefile: Staging tables installation: $(cleanup): Append output to log
- 10:21 PM Revision 4549: schemas/py_functions.sql: Added pass-through _date(timestamp) for datasource date columns that are already timestamps
- 10:12 PM Revision 4548: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Fixed bug where embedded \ in ADD COLUMN statement was not removed by the shell, because single quotes do not remove embedded \s
- 09:55 PM Revision 4547: inputs/VegBank/vegbank.~.clean_up.sql: Also rename taxonobservation.reference_id to taxonobservation_reference_id
- 09:51 PM Revision 4546: input.Makefile: Staging tables installation: $(logInstall*Add): Fixed bug where the -a flag for tee needed to be added only when tee was actually being used (in verbose mode), not when &> is used instead
- 09:49 PM Revision 4545: inputs/VegBank/taxonobservation_/header.csv: Updated for new renames in vegbank.~.clean_up.sql
- 09:34 PM Revision 4544: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Also log the output of commands run after create.sql
- 09:30 PM Revision 4543: input.Makefile: Staging tables installation: Factored $(call logInstall,$*/) out into $(logInstall*)
- 09:25 PM Revision 4542: schemas/py_functions.sql: Added pass-through _dateRangeStart(timestamp), _dateRangeEnd(timestamp) for datasource date columns that are already timestamps
- 09:23 PM Revision 4541: inputs/VegBank/plantconcept_/header.csv: Updated for new renames in vegbank.~.clean_up.sql
- 09:11 PM Revision 4540: inputs/VegBank/plantconcept_/create.sql: Use new plantconcept_plantnames()
- 09:09 PM Revision 4539: inputs/VegBank/vegbank.~.utils.sql: plantconcept_plantnames(): Use SQL SELECT query and WITH clause (http://www.postgresql.org/docs/8.4/static/queries-with.html) instead of temp table, because PostgreSQL does not support using temp tables inside functions that are called repeatedly (http://archives.postgresql.org/pgsql-general/2006-02/msg00516.php; it results in an "out of shared memory" error)
- 08:30 PM Revision 4538: inputs/VegBank/vegbank.~.utils.sql: Removed hardcoded schema name, which is set dynamically by input.Makefile using `SET search_path`
- 08:26 PM Revision 4537: inputs/VegBank/vegbank.~.utils.sql: Added plantconcept_plantnames()
- 07:28 PM Revision 4536: inputs/VegBank/vegbank.~.utils.sql: plantconcept_ancestors(): Made function STABLE instead of IMMUTABLE because it accesses DB tables
- 07:21 PM Revision 4535: inputs/VegBank/vegbank.~.clean_up.sql: Fixed bug where the original plantconcept table's columns needed to be renamed, rather than the derived table plantconcept_'s. Note that this script runs before any derived tables are created, so this would be the wrong place for these statements if the derived table's columns did need to be renamed.
- 07:05 PM Revision 4534: input.Makefile: Staging tables installation: $(dbExports): Sort each group of .sql files in lexical order, since $(wildcard) apparently does not sort them that way automatically on vegbiendev
- 06:55 PM Task #490 (New): change import.stats.xls to use field rather than row count
  This will be more accurate, because different data sources have different numbers of columns, and this affects the load ...
- 06:53 PM Revision 4533: inputs/import.stats.xls: Updated with stats from latest import. Corrected input row count of CTFS.TaxonOccurrence, which had been set to the inserted row count (which is right above it in the log file).
- 06:35 PM Revision 4532: schemas/vegbien.sql: taxonrank: Added comment documenting source of values
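The quoting behavior behind revision 4548 (single quotes do not remove embedded backslashes) can be checked with Python's shlex module, whose default mode approximates POSIX shell tokenization. The command strings and the column name below are hypothetical and are not the actual input.Makefile recipe.

```python
import shlex

# Hypothetical command fragments for illustration only
single_quoted = r"psql -c 'ALTER TABLE t ADD COLUMN \"row_num\" serial'"
double_quoted = r'psql -c "ALTER TABLE t ADD COLUMN \"row_num\" serial"'

# Third token is the SQL statement as the program would receive it
in_single = shlex.split(single_quoted)[2]
in_double = shlex.split(double_quoted)[2]

print(in_single)  # backslashes survive inside single quotes
print(in_double)  # backslashes before " are consumed inside double quotes
```

Inside single quotes the statement reaches the program with the backslashes still in it, which is the situation the fix had to account for.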
09/07/2012
- 04:57 PM Revision 4531: inputs/VegBank/taxonobservation_/map.csv: Mapped observation_id to eventID
- 04:49 PM Revision 4530: inputs/TEAM/: Added VL
- 04:43 PM Revision 4529: inputs/VegBank/: Added taxonobservation_/
- 04:43 PM Revision 4528: inputs/VegBank/: Added plantconcept_/
- 04:22 PM Revision 4527: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Ignore errors if create.sql already added a primary key
- 04:12 PM Revision 4526: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Provide the table name as a var (:table) to the query
- 03:56 PM Revision 4525: inputs/VegBank/vegbank.~.clean_up.sql: Prevent "column name specified more than once" errors when tables are joined
- 03:55 PM Revision 4524: to_do/timeline.doc: Updated to reflect additional time that validations will take, and analytical DB's dependency on it
- 02:54 PM Revision 4523: Added validation/
- 12:56 PM Revision 4522: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Time the install
- 12:54 PM Revision 4521: inputs/VegBank/: Added plantconcept_/
- 12:35 PM Revision 4520: inputs/VegBank/vegbank.~.utils.sql: plantconcept_ancestors(): Renamed ancestor_id output param to plantconcept_id for clarity and so it can be directly USING-joined with plantconcept on plantconcept_id
- 12:24 PM Revision 4519: inputs/VegBank/: Added vegbank.~.utils.sql (which runs after vegbank.sql), for use by tables' create.sql scripts
- 10:57 AM Revision 4518: inputs/import.stats.xls: Updated with stats from latest import
- 10:43 AM Revision 4517: inputs/VegBank/: Added observation_/
- 10:31 AM Revision 4516: inputs/VegBank/: Added vegbank.~.clean_up.sql (which runs after vegbank.sql), to prevent "cannot alter type of a column used by a view or rule" errors
- 10:14 AM Revision 4515: inputs/VegBank/: Added plot_/
- 10:13 AM Revision 4514: inputs/VegBank/: Added plot_/
- 10:13 AM Revision 4513: inputs/VegBank/: Added logs
- 10:12 AM Revision 4512: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Log the output to the install log, just like for other %/install targets
- 10:06 AM Revision 4511: vegbien_dest: schemas: Added public explicitly, even though it's already in the default search_path, in order to shadow any datasource's tables of the same name as a VegBIEN table (such as in VegBank). (VegBIEN tables are referenced without a schema, while datasource tables are referenced with a schema, so collisions are not a problem after this fix.)
- 09:55 AM Revision 4510: input.Makefile: Staging tables installation: sql/install: Fixed bug where a space was needed before \ at end of line, because one is not automatically added in a recipe command (although it's added elsewhere)
- 09:51 AM Revision 4509: sql.py: run_query(): DuplicateException: Also match "of relation" part of error message, so that parsed column name does not contain "of relation"
- 09:24 AM Revision 4508: subtract: Made it case- and punctuation-insensitive
- 09:18 AM Revision 4507: mappings/: Removed no longer needed Veg+.cs-VegBIEN.csv, which is now the same as Veg+-VegBIEN.csv which was derived from it
- 09:16 AM Revision 4506: join: Documented that it's case- and punctuation-insensitive.
- 09:16 AM Revision 4505: bin/map: map_table(): Refactored to map simplified to original column names first and then determine column index for each original name, in order to avoid trying to recover the original name from a simplified name where multiple original names might collide onto the same simplified name. Documented that it's case- and punctuation-insensitive.
- 09:11 AM Revision 4504: intersect, union: Made case- and punctuation-insensitive. mappings/Veg+-VegBIEN.csv: Removed no longer needed duplicate entries for each first letter case, which must now be removed for case- and punctuation-insensitive intersect/union to work. Note that the SpeciesLink `svn diff` hides _alt entry 0, which contains one of the removed duplicate columns that appears in the diff.
- 08:42 AM Revision 4503: bin/map: map_table(): Resolve all mappings and prefixes after applying maps.simplify()
- 08:37 AM Revision 4502: inputs/SpeciesLink/Specimen/map.csv: _alt all scientificNameAuthorship synonyms together in one _alt
- 08:27 AM Revision 4501: schemas/functions.sql: _alt(): Added extra numbered parameters. Eventually these will need to be converted to variadic args, but this will require special support from column-based import.
- 07:26 AM Revision 4500: join: Use new maps.simplify()
- 07:26 AM Revision 4499: maps.py: Added simplify()
- 07:23 AM Revision 4498: join: Match terms with non-alphanumeric chars removed
- 07:15 AM Revision 4497: join: Match terms case-insensitively
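Revisions 4497-4508 make term matching case- and punctuation-insensitive via maps.simplify(), and Revision 4505 notes that several original column names can collide onto one simplified name. The log does not show the implementation, so the following is only a minimal sketch of that idea: `simplify()` here is a hypothetical stand-in (not the real maps.simplify()), and `index_by_simplified()` is an illustrative helper that keeps every original name per simplified key.

```python
import re

def simplify(term):
    """Hypothetical stand-in for maps.simplify(): lowercase the term and
    drop every character that is not an ASCII letter or digit."""
    return re.sub(r'[^0-9a-z]', '', term.lower())

def index_by_simplified(names):
    """Map each simplified name to the list of original names producing it,
    so collisions stay visible instead of being silently overwritten."""
    index = {}
    for name in names:
        index.setdefault(simplify(name), []).append(name)
    return index

# Illustrative column names, not taken from any real map.csv
cols = ['Scientific_Name', 'scientificName', 'plot ID']
index = index_by_simplified(cols)
```

Going from simplified name to original names (rather than the reverse) means a lookup never has to guess which original a simplified key came from. This sketch drops non-ASCII letters, which may or may not match the real function.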
09/06/2012
- 11:17 PM Revision 4496: Added inputs/TEAM/
- 10:55 PM Revision 4495: sql_io.py: put_table(): Creating the into table: into_out_pkey: If is_function, just use "result" as the output column name, without prefixing the function name. This shortens the table names of function calls on function calls, which need a fixed column name to detect which columns are function results and use just the table names for those columns.
- 10:32 PM Revision 4494: input.Makefile: Documentation: $(steps): Fixed bug where import make target needed to be changed to new single-table import target
- 09:38 PM Revision 4493: schemas/vegbien.sql: analytical_db_view: Changed LEFT JOINs to JOINs where tables contain information that's required for the analytical DB. This should also enable the PostgreSQL query planner to make additional join optimizations, in the hopes of avoiding disk-space-intensive hash joins.
- 08:42 PM Revision 4492: Replaced repr() with strings.urepr() (or equivalent) everywhere needed, to avoid future UnicodeEncodeErrors
- 08:30 PM Revision 4491: Replaced str() with strings.ustr() (or equivalent) everywhere needed, to avoid future UnicodeEncodeErrors
- 08:03 PM Revision 4490: sql.py: map_expr(): Replacing without quotes: Don't match unquoted name where it's preceded or followed by '.', because this could be a '.' embedded in a punctuation-containing column name, such as those frequently used by column-based import. Note that because database-internal names currently do not contain punctuation, this situation only occurs when a database-internal expression (such as a check constraint condition) is replaced in two steps, and the first step introduces punctuation-containing column names into the expression.
- 07:19 PM Revision 4489: schemas/vegbien.sql: project: Don't require projectname to be specified when sourceaccessioncode is provided
- 07:14 PM Revision 4488: sql_gen.py: ensure_not_null(): If type_ is set, cast the column to it if needed
- 06:56 PM Revision 4487: README.TXT: Data import: Added testing steps to perform on local machine before running the import
- 06:49 PM Revision 4486: README.TXT: Documentation: Redmine-formatted list of steps for column-based import: Updated make command for new table subdir name
- 06:27 PM Revision 4485: sql.py: run_query(): Parse "types cannot be matched" error as MissingCastException to type text
- 06:10 PM Revision 4484: sql_io.py: put_table(): Creating the into table: Fixed bug where in_pkey and out_pkey names would collide if the output and input pkeys have the same name (as is the case for SALVIAS.projects). This entails changing out_pkey to new into_out_pkey wherever the into table's out_pkey is created or referenced.
- 05:06 PM Revision 4483: sql_io.py: put_table(): Combining output and input pkeys in inserted order: Changed sql_gen.Table to sql_gen.Col when creating the column references (they have a similar effect, so using the wrong type did not cause any tests to fail)
- 04:49 PM Revision 4482: README.TXT: Added steps before the import to `svn up` and update the schemas
- 04:47 PM Revision 4481: README.TXT: Merged Backups > After a new import and Data import sections into one Data import section that contains the steps to perform and back up an import. Note that many `svn diff` lines result from a change in indentation.
- 04:35 PM Revision 4480: sql_io.py: put_table(): Combining output and input pkeys in inserted order: Fixed bug where column references would be ambiguous if the output and input pkeys have the same name (as is the case for SALVIAS.projects)
- 04:21 PM Revision 4479: schemas/functions.sql: Added _nullIf() overload where the type param has type text, to handle cases where row-based import auto-casts all args to text in response to a 'could not determine polymorphic type because input has type "unknown"' error
- 04:18 PM Revision 4478: schemas/vegbien.sql: party: Removed party_datasource unique index because it was causing problems with column-based import (due to multiple unique indexes covering the same columns in different ways), and because it prevented creation of more than one party per organization
- 03:54 PM Revision 4477: xml_func.py: _if(): Documented that it must be run to remove conditions that functions._if() can't handle
- 03:42 PM Revision 4476: README.TXT: Datasource setup: Testing: Added step to test column-based import (by_col=1), because it is stricter about types than row-based import and sometimes fails when row-based import succeeds
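Revision 4490 describes replacing an unquoted name in an SQL expression only when it is not preceded or followed by '.', since the '.' may belong to a punctuation-containing column name. The actual sql.py code is not shown in this log; the sketch below illustrates the rule with regular-expression lookarounds. The function name `replace_unquoted()` and the sample expressions are assumptions made for illustration.

```python
import re

def replace_unquoted(expr, name, repl):
    """Replace occurrences of `name` in `expr`, skipping any occurrence
    that is directly preceded or followed by '.' or a word character."""
    pattern = r'(?<![\w.])' + re.escape(name) + r'(?![\w.])'
    # A function replacement keeps backslashes in `repl` literal
    return re.sub(pattern, lambda m: repl, expr)

# Illustrative check-constraint-like expression
before = 'height > 0 AND plot.height IS NOT NULL'
after = replace_unquoted(before, 'height', '"Plot.height"')
```

Only the standalone `height` is rewritten; the `height` inside `plot.height` is left alone because a '.' precedes it.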
09/05/2012
- 09:18 AM Revision 4475: schemas/functions.sql: _nullIf(): Polymorphically support other datatypes besides text
- 09:09 AM Revision 4474: bin/map: Clearing errors table: Fixed bug where needed to check if sql_io.errors_table() returned None (indicating that the errors table didn't exist) before calling sql.drop_table()
- 09:04 AM Revision 4473: bin/map: Clearing errors table: Fixed bug where needed to use sql.drop_table() instead of sql.truncate() now that errors tables are not created until column-based import runs
- 08:54 AM Revision 4472: input.Makefile: Maps validation: $(missingMappingsCmd): Fixed bug where needed to use system's sort, not bin/sort, now that bin/ is added to the PATH by this makefile
- 08:34 AM Revision 4471: inputs/SALVIAS/verify/plots.ref: Regenerated on PostgreSQL staging tables. The orders have changed slightly because this is derived from a PostgreSQL translation of the queries, with corresponding changes in collations and NULL sort orders. The counts have also changed slightly, possibly due to the changes Brad made to the salvias_plots database on nimoy after the initial version was downloaded. (The current counts are correct according to the current salvias_plots database.)
- 08:31 AM Revision 4470: inputs/SALVIAS/verify/plots.ref.sql: # locations: Fixed bug where a NULL value in LatDec or LongDec would propagate to the concatenated value, reducing its uniqueness
- 08:26 AM Task #484 (Resolved): support installing staging tables directly from a MySQL export
- 08:14 AM Revision 4469: inputs/SALVIAS/verify/plots.ref.sql: Retrofitted to work with PostgreSQL staging tables
- 07:51 AM Revision 4468: schemas/vegbien.sql: project: Added project_unique_name_date unique index for projects that don't have a sourceaccessioncode
- 07:46 AM Revision 4467: inputs/SALVIAS/plotMetadata/map.csv: Remapped project_id to project.sourceaccessioncode
- 07:37 AM Revision 4466: inputs/SALVIAS/: Added projects/
- 07:32 AM Revision 4465: input.Makefile: Sources: $(catSrcs): Fixed bug where needed to use cat_csv even if subdir was not actually a CSV table, because this also cats the header.csv file created for a subdir that references an already-installed staging table
- 07:26 AM Revision 4464: input.Makefile: Existing maps discovery: Fixed bug where top-level logs dir needed to be excluded from list of subdirs that are treated as tables
- 07:00 AM Revision 4463: my2pg: Prepend 'SET standard_conforming_strings = off;' because this defaults to on starting with PostgreSQL 9.1
- 06:41 AM Revision 4462: schemas/vegbien.sql: locationevent: Made location_id optional when sourceaccessioncode is provided, since a sourceaccessioncode is globally unique and does not require a location to scope it
- 06:36 AM Revision 4461: input.Makefile: Staging tables installation: Store install logs for full-DB exports in new logs subdir of main dir. This also fixes a bug where the install log itself was considered a DB export, because its extension was .log.sql.
- 06:33 AM Revision 4460: Added inputs/SALVIAS/logs/
- 06:33 AM Revision 4459: input.Makefile: SVN: add: Also add logs subdir of main dir, to store install logs for full-DB exports
- 06:23 AM Revision 4458: mappings/VegCore-VegBIEN.csv: if subplot: Also forward locationID and plotName to the location of the parent locationevent (in addition to the parent location of the location), in order to "complete the diamond" connecting subplot locationevent -> (parent plot locationevent, subplot location) -> parent plot location
- 06:09 AM Revision 4457: sql_io.py: cleanup_table(): NullValueException: Log the caught exception so it's clear that the update is being retried
- 06:05 AM Revision 4456: input.Makefile: Staging tables installation: %/install: Fixed bug where $(if $(isRef)) needed to be checked before $(if $(nonXml)) because a subdir referencing an already-installed staging table must be treated specially by ignoring its autogenerated header.csv file, and not trying to install that file as if it were itself CSV data
- 05:49 AM Revision 4455: my2pg, my2pg.data: Fixed bug where replacement for '0000-00-00' date needed to be wrapped in single quotes
- 05:45 AM Revision 4454: input.Makefile: sql/install: Log the installation of a full-DB export to a log file in the main dir
- 05:38 AM Revision 4453: input.Makefile: Staging tables installation: %/install: Factored out stderr logging into $(logInstall)
- 05:35 AM Revision 4452: input.Makefile: Support empty subdirs referencing an already-installed staging table everywhere, by replacing $(isCsv) with new $(nonXml) where needed
- 05:22 AM Revision 4451: inputs/SALVIAS/: Switched to using the DB export's staging tables instead of the exported CSVs
- 05:08 AM Revision 4450: input.Makefile: Staging tables installation: Treat empty subdirs as referencing an already-installed staging table, and run cleanup and header export operations on them
- 04:48 AM Revision 4449: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Factored out cleanup and header export operations for reuse in other types of table subdirs
- 04:23 AM Revision 4448: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Removed deprecated (but benign) errors_table_only option to csv2db. Run csv2db without a command in order to clean up the created staging table.
- 03:57 AM Revision 4447: sql_io.py: cleanup_table(): Removed no longer used cols param
- 03:56 AM Revision 4446: csv2db: When no command is specified, just clean up the specified table
- 03:55 AM Revision 4445: sql_io.py: cleanup_table(): Always clean up all columns in the table
- 03:43 AM Revision 4444: sql_io.py: cleanup_table(): Handle NullValueExceptions (due to setting values to NULL in a NOT NULL column) by dropping the NOT NULL constraint
- 03:32 AM Revision 4443: sql.py: Added drop_not_null()
- 03:29 AM Revision 4442: sql_gen.py: is_text_col(): Also consider character varying to be a text type
- 03:07 AM Revision 4441: csv2db: Removed no longer used errors_table_only option
- 03:00 AM Revision 4440: README.TXT: Schema changes: Removed step to reinstall errors tables, because they are now created automatically by column-based import
- 02:59 AM Revision 4439: csv2db: Removed no longer needed creation of errors table, because it is now created automatically by column-based import
- 02:58 AM Revision 4438: input.Makefile: Staging tables installation: $(dbExports): Fixed bug where it would be non-empty even when the input contains no DB exports, because += adds extra whitespace. This caused sql/install to be incorrectly included as part of $(allInstalls).
- 02:49 AM Revision 4437: db_xml.py: put_table(): Create errors table if it doesn't exist
- 02:48 AM Revision 4436: sql_io.py: Added mk_errors_table()
- 02:05 AM Revision 4435: inputs/Makefile: Input data: $(rsyncSrcs): Also exclude logs subdirs located at more than one level below the root, which occurs for example when a table subdir is moved into _archive/
- 01:56 AM Revision 4434: input.Makefile: Staging tables installation: sql/install: Fixed bug where _always was part of $+, causing cat to try to cat this nonexistent file
- 01:51 AM Revision 4433: Added inputs/SALVIAS/salvias_plots.schema.sql
- 01:50 AM Revision 4432: Added inputs/SALVIAS/_MySQL/
- 01:47 AM Revision 4431: input.Makefile: Staging tables installation: MySQL exports: Run all non-data-only exports through my2pg, not just schema-only exports. This supports transforming a combined schema+data export.
- 01:42 AM Revision 4430: my2pg: Also perform data-only replacements, since default values can contain data-specific replacements. This also allows my2pg to transform a combined schema+data export.
- 01:39 AM Revision 4429: input.Makefile: Staging tables installation: Also translate MySQL data to PostgreSQL
- 01:38 AM Revision 4428: Added my2pg.data
- 01:28 AM Revision 4427: input.Makefile: Staging tables installation: Place MySQL exports in separate _MySQL/ subdir so they don't clutter up the main dir, which will contain PostgreSQL translations
- 01:03 AM Revision 4426: Added my2pg
- 01:02 AM Revision 4425: input.Makefile: Staging tables installation: DB exports: Concatenate all exports together, with schemas first, so that any config options which were applied only in the schema export will remain active when the data is imported. Changed `%.pg.sql: %.my.sql` to `%.schema.sql: %.schema.my.sql` so there doesn't need to be a .pg suffix for PostgreSQL schemas and only the schema gets translated.
- 12:15 AM Revision 4424: input.Makefile: Staging tables installation: $(dbExports): Don't consider MySQL DB exports as part of the DB exports that get installed, because they are not directly installable
- 12:13 AM Revision 4423: input.Makefile: Staging tables installation: Added `%.pg.sql: %.my.sql` to translate MySQL DB schemas to PostgreSQL
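Revision 4470 fixes a query in which a NULL in LatDec or LongDec propagated to the concatenated value, so distinct locations collapsed into one NULL. The log does not show the corrected SQL. The sketch below models the behavior in Python, with `None` standing in for NULL; the COALESCE-style substitution in `concat_null_safe()` is an assumed fix, and the coordinate values are made up.

```python
def sql_concat(*parts):
    """Model of SQL's || operator: the result is NULL (None) as soon as
    any operand is NULL."""
    if any(p is None for p in parts):
        return None
    return ''.join(str(p) for p in parts)

def concat_null_safe(*parts, null_marker=''):
    """Substitute a marker for each NULL before concatenating, similar to
    wrapping each operand in COALESCE(operand, marker)."""
    return sql_concat(*(null_marker if p is None else p for p in parts))

# Three distinct (lat, long) pairs, two of them with a missing coordinate
rows = [(10.5, None), (10.5, -66.1), (None, -66.1)]
naive = {sql_concat(lat, ',', lon) for lat, lon in rows}
safe = {concat_null_safe(lat, ',', lon) for lat, lon in rows}
```

With plain concatenation the two incomplete rows both become NULL and count as one location; with the substitution all three stay distinct.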
09/04/2012
- 09:20 PM Revision 4422: inputs/SALVIAS/_src/: Added salvias_plots.sql.url to provide a link to where salvias_plots.sql was exported from (it was not a raw file given to us by the data provider)
- 08:57 PM Revision 4421: Added cc_tty
- 08:57 PM Revision 4420: inputs/input.Makefile: `%: %.make`: Don't automatically redirect stderr to a log file, because some .make scripts need to display password prompts, etc. on the TTY and output them to stderr instead of /dev/tty
- 08:49 PM Revision 4419: inputs/REMIB/nodes.make: Fixed bin dir path for new subdir layout
- 08:48 PM Revision 4418: inputs/SpeciesLink/tapir.make: Write log messages to a log file ($0.log) instead of to stderr, because the verbose log messages should not fill up stderr. To view the progress, you should instead tail the created log file.
- 08:41 PM Revision 4417: inputs/REMIB/nodes.make: Updated path to node exports to use new subdir layout (in Specimen subdir, and without .specimens suffix)
- 08:38 PM Revision 4416: inputs/REMIB/nodes.make: Fixed lib dir path in sys.path.append() for new subdir layout
- 08:37 PM Revision 4415: inputs/REMIB/nodes.make: Write log messages to a log file ($0.log) instead of to sys.stderr, because the verbose log messages should not fill up stderr. To view the progress, you should instead tail the created log file.
- 08:23 PM Revision 4414: input.Makefile: Add the bin folder to the PATH so .make scripts can easily use programs in it
- 08:06 PM Revision 4413: input.Makefile: Staging tables installation: Support installing a DB export directly into the staging schema, without needing to first export it as CSVs
- 07:52 PM Revision 4412: inputs/SALVIAS/: Added _src/ subdir to store original DB export (before re-export in a PostgreSQL-compatible form)
- 07:31 PM Revision 4411: input.Makefile: `%: %.make`: Only remake if doesn't exist. This prevents unintentional remaking when the make script is newly checked out from svn (which sets the mod time to now) but the output is synced externally.
- 07:23 PM Revision 4410: input.Makefile: `%: %.make`: Removed no longer applicable comment, which applied when there were two separate `%: %.make`-related rules
- 06:55 PM Revision 4409: input.Makefile: Use $(inDatasrc) wherever its value was used
- 06:54 PM Revision 4408: input.Makefile: Added $(inDatasrc)
- 06:40 PM Revision 4407: sql_io.py: cleanup_table(): Only clean up text columns, to support staging tables with other column types
- 06:40 PM Revision 4406: sql_gen.py: Added is_text_col()
- 06:29 PM Revision 4405: sql_io.py: cleanup_table(): Add table to each column so its type can later be determined from the DB
- 06:13 PM Revision 4404: inputs/NY/verify/specimens.ref: Regenerated from specimens.ref.sql. The counts have changed slightly because this is derived directly from the NY CSV file, rather than from the nybg_raw BIEN2 staging table.
- 06:11 PM Revision 4403: inputs/NY/verify/specimens.ref.sql: Retrofitted to use PostgreSQL instead of MySQL syntax, since this now runs on the PostgreSQL staging tables
- 06:09 PM Revision 4402: input.Makefile: Verification of import: Added `%.ref: %.ref.sql` rule to make datasource's summary statistics from its staging tables. (This was previously run on a MySQL installation of the datasource, and thus limited to MySQL inputs, but we are now able to use the staging tables for this.)
- 06:04 PM Revision 4401: input.Makefile: Verification of import: $(verify): Factored psql command with output format settings into separate $(psqlExport) var
- 05:57 PM Revision 4400: schemas/vegbien.sql: analytical_db_view: Switched join order of location and party (datasource) tables, to facilitate using a nested loop join to fill in the datasource names
- 05:55 PM Revision 4399: schemas/vegbien.sql: party: Added party_datasource index on just the organizationname to facilitate querying just the datasources
- 04:25 PM Revision 4398: schemas/vegbien.sql: make_analytical_db(): Removed explicit schema reference so that the function can be redirected to use the current (rotated) schema using the search_path
08/31/2012
- 08:32 PM Revision 4397: schemas/Makefile: Removed no longer needed analytical_db, which has been replaced by bin/make_analytical_db
- 08:31 PM Revision 4396: README.TXT: After a new import: Use bin/make_analytical_db instead of `make schemas/analytical_db`, and run it asynchronously because it takes a long time
- 08:29 PM Revision 4395: Added make_analytical_db
- 08:22 PM Revision 4394: schemas/Makefile: Analytical DB: analytical_db: Time the creation of the analytical DB
- 08:18 PM Revision 4393: README.TXT: After a new import: Added command to make the analytical DB
- 08:15 PM Revision 4392: schemas/Makefile: Added analytical_db target
- 08:09 PM Revision 4391: schemas/vegbien.sql: Added make_analytical_db() and helper view analytical_db_view. Note that adding a view which depends on other tables will cause those tables to be reordered in dependency order to appear before the view, causing the svn diff to change completely even though the DB structure has only been added to.
- 08:05 PM Revision 4390: schemas/vegbien.sql: Removed OIDs from tables because we don't use them (tables have primary keys instead)
- 06:55 PM Task #486 (New): add unit-conversion mechanism
- * This is primarily needed for DBH, plot area, and elevation/depth
* Make quantities with units be a tuple type cont...
- 02:34 PM Task #485 (New): track data provider's citation requirements in VegBIEN
- * Some providers require them to be cited on any analysis that's conducted with their data:
** "Forest Plots Databas...
- 02:23 PM Revision 4389: inputs/import.stats.xls: Updated with stats from latest import. This now includes CTFS.TaxonOccurrence (presence-only observations), FIA (11 million rows!), and Madidi.Organism. The addition of FIA almost doubles the # of rows to 26 million and increases the import time from 9.5 to 11.5 hours.
- 02:08 PM Task #483 (Rejected): rename staging table columns according to map.csv
- Reinstalling staging tables whenever a mapping changes is not a good idea.
Renaming will instead continue to occur d...
08/30/2012
- 04:54 PM Revision 4388: sql_io.py: null_strs: Added 'UNKNOWN'
- 04:02 PM Revision 4387: Added inputs/FIA/
- 12:45 PM Revision 4386: inputs/: Renamed subfolders to VegCSV names, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-to-VegCSV-names>
- 12:37 PM Revision 4385: inputs/Madidi/1.organisms/map.csv: Mapped columns
- 11:46 AM Revision 4384: inputs/Madidi/0.plots/map.csv: Remapped DMS Latitude/Longitude to verbatimLatitude/verbatimLongitude, since this is not the decimalLatitude/decimalLongitude
- 11:40 AM Revision 4383: input.Makefile: Testing: %-ok: Rename the test output to the accepted test output instead of copying it, because outputs of successful (including newly accepted) tests should be removed to reduce clutter (as $(runTest) does)
- 11:35 AM Revision 4382: mappings/Veg+-VegCore.csv: Remapped CTFS QuadratID to subplot rather than subplotID, because it's only unique within the parent plot, not globally unique, in CTFS
- 11:23 AM Revision 4381: inputs/import.stats.xls: Updated with stats from latest import. This now includes the core CTFS tables.
- 11:10 AM Revision 4380: Added inputs/VegBank/ with DB export
- 11:04 AM Revision 4379: input.Makefile: General targets: `%: %.make`: Don't always remake the target whenever it's visited, as other targets may depend on this file and it should not be remade whenever they are visited
- 11:00 AM Revision 4378: input.Makefile: General targets: `%: %.make`: Changed log file suffix to .log, because this log does not necessarily contain SQL statements
- 10:57 AM Revision 4377: input.Makefile: General targets: `%: %.make`: Time the creating command
- 10:55 AM Revision 4376: input.Makefile: General targets: Removed duplicate `%: %.make` rule
- 10:43 AM Revision 4375: inputs/CTFS/TaxonOccurrence/map.csv: Documented that InfraSpecificLevel is unused
- 10:42 AM Revision 4374: inputs/CTFS/TaxonOccurrence/map.csv: Documented that InfraSpecificLevel is unused
- 10:32 AM Revision 4373: mappings/Veg+-VegCore.csv: Mapped speciesInvID
- 10:27 AM Revision 4372: mappings/Veg+.terms.csv: Added speciesInvID
- 10:25 AM Revision 4371: mappings/VegCore-VegBIEN.csv: Mapped taxonOccurrenceID
- 10:22 AM Revision 4370: mappings/Veg+.terms.csv: Added taxonOccurrenceID
- 10:14 AM Revision 4369: inputs/CTFS/: Added TaxonOccurrence/ and its joined tables
- 10:13 AM Revision 4368: inputs/CTFS/: Added TaxonOccurrence/ and its joined tables
- 10:06 AM Revision 4367: inputs/CTFS/_archive/Organism.VegX/README.TXT: Added calculation of StemObservation rows distribution for each plot, which indicates that the bci plot actually contains 90% of the StemObservation rows. This brings the size inflation of VegX down to ~6x.
- 09:42 AM Revision 4366: inputs/CTFS/_archive/Organism.VegX/: Added README.TXT describing that this VegX export includes only *one* of 157 CTFS plots. This is important, because it indicates that VegX creates a ~1000x (!) increase in storage size (613.6 MB for bci.sql with 157 plots vs. 3.78 GB for VegX_CTFS_row_*.xml with 1 plot, assuming roughly equal #s of stems per plot).
- 09:08 AM Revision 4365: inputs/CTFS/StemObservation/map.csv: Remapped StemID to authorStemCode since it's only unique within the parent organism (Tree), not a globally unique ID as is required for stemID
- 09:05 AM Revision 4364: mappings/VegCore-VegBIEN.csv: Mapped authorStemCode
- 08:58 AM Revision 4363: mappings/Veg+.terms.csv: Added authorStemCode
- 08:58 AM Revision 4362: mappings/VegCore-VegBIEN.csv: Mapped stemID
- 08:52 AM Revision 4361: inputs/SALVIAS/2.stems/map.csv: Mapped stem_id
- 08:49 AM Task #484 (Resolved): support installing staging tables directly from a MySQL export
- * requires (re-)exporting MySQL DB with "@--compatible=postgresql@":http://dev.mysql.com/doc/refman/5.6/en/mysqldump....
- 08:46 AM Revision 4360: README.TXT: Datasource setup: Added steps to install any MySQL export
- 08:13 AM Revision 4359: mappings/VegCore-VegBIEN.csv: Mapped stemID
- 08:10 AM Revision 4358: mappings/Veg+-VegCore.csv: Mapped stem_id
- 08:05 AM Revision 4357: repl: Support treating all patterns as plain text (non-regexp)
- 07:52 AM Revision 4356: mappings/Veg+.terms.csv: Added stem_id
- 07:51 AM Revision 4355: mappings/Veg+.terms.csv: Added stemID
- 07:44 AM Revision 4354: mappings/Veg+-VegCore.csv: Mapped speciesName, subSpeciesName
- 07:43 AM Revision 4353: mappings/Veg+.terms.csv: Added CTFS taxonomic name columns
- 07:28 AM Revision 4352: mappings/Veg+.terms.csv: Removed comments not applicable to the term itself
- 07:25 AM Revision 4351: Inputs with multiple tables: Added explicit import_order.txt files, so that sort orders can later be removed from the subdir names
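Revisions 4366 and 4367 estimate how much larger the VegX export is than the SQL dump, first assuming stems are spread evenly over 157 plots and then using the finding that the exported bci plot holds about 90% of the StemObservation rows. The arithmetic can be retraced as follows. The decimal GB-to-MB conversion is an assumption, since the log does not say which unit convention was used.

```python
MB_PER_GB = 1000.0  # assumption: decimal units

sql_mb = 613.6               # bci.sql, covering all 157 plots
vegx_mb = 3.78 * MB_PER_GB   # VegX_CTFS_row_*.xml, covering 1 plot
n_plots = 157

# Revision 4366: stems assumed evenly spread, so one plot is 1/157 of bci.sql
even_ratio = vegx_mb / (sql_mb / n_plots)

# Revision 4367: the exported plot holds ~90% of the StemObservation rows
skewed_ratio = vegx_mb / (sql_mb * 0.90)

print(round(even_ratio), round(skewed_ratio, 1))
```

The first ratio comes out near the ~1000x quoted in Revision 4366, and the second falls to a single-digit factor, in line with the much smaller inflation reported in Revision 4367.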
08/29/2012
- 11:17 PM Revision 4350: inputs/CTFS/: Added StemObservation/ and tables it is joined from
- 11:09 PM Revision 4349: mappings/Veg+-VegCore.csv: Mapped stemTag
- 11:08 PM Revision 4348: mappings/Veg+.terms.csv: Added stemTag
- 11:04 PM Revision 4347: mappings/Veg+-VegCore.csv: Mapped DBH
- 11:02 PM Revision 4346: mappings/Veg+.terms.csv: Added DBH
- 10:58 PM Revision 4345: input.Makefile: Maps building: Added comment that you cannot make a subdir separately from the entire datasource dir
- 10:17 PM Revision 4344: inputs/CTFS/Plot/create.sql: Added newline at end of file
- 10:04 PM Revision 4343: inputs/CTFS/: Renamed Site.src to Plot.src to use a VegCSV name for the table
- 10:01 PM Revision 4342: README.TXT: Datasource setup: Adding input data for each table: `make inputs/<datasrc>/<table>/add`: Added note explaining why you need to use this command instead of just creating an empty directory of the desired name
- 08:44 PM Revision 4341: inputs/CTFS/: Added SubplotObservation/
- 08:38 PM Revision 4340: mappings/VegCore-VegBIEN.csv: Redirect eventID, fieldNumber (authoreventcode) to parent locationevent when subplot columns exist
- 08:23 PM Revision 4339: inputs/CTFS/import_order.txt: Added PlotObservation
- 08:23 PM Revision 4338: inputs/CTFS/PlotObservation/: Remade (hadn't been automatically remade because it wasn't part of import_order.txt)
- 08:13 PM Revision 4337: mappings/VegCore-VegBIEN.csv: Also redirect locationID/plotName to parent location if subplotID column was provided
- 08:08 PM Revision 4336: mappings/VegCore-VegBIEN.csv: location.authorlocationcode mappings: Use _first to remove specimens-related alternatives for this field from consideration when plots-related alternatives exist. This avoids unintentionally using specimens-related columns for this field in plots data.
- 08:06 PM Revision 4335: xml_func.py: Added _first() simplifying function
- 08:05 PM Revision 4334: xml_func.py: Added helper functions variadic_args() and map_names()
- 07:38 PM Revision 4333: mappings/VegCore-VegBIEN.csv: location.authorlocationcode mappings: Placed inside "if subplot" _if statement along with sourceaccessioncode to reduce the number of separate _if statements needing a condition mapping
- 07:32 PM Revision 4332: xml_dom.py: NodeEntryIter: Support entries with multiple children
- 07:20 PM Revision 4331: xml_dom.py: replace(): Support a list of new nodes to replace the old node with
- 07:01 PM Revision 4330: xml_dom.py: Moved only_child() near related method has_one_child()
- 07:00 PM Revision 4329: xml_dom.py: only_child(): Raise exception instead of failing assertion. Include invalid node in exception message for easier debugging.
- 06:57 PM Revision 4328: xml_dom.py: Added only_child() and use it where its definition was used
- 06:33 PM Revision 4327: mappings/VegCore-VegBIEN.csv: Changed _merge to _join wherever the duplicate-eliminating functionality of _merge is not needed and a simple concatenation of non-NULL values is sufficient
- 06:24 PM Revision 4326: xml_func.py: Added _join() simplifying function
- 06:22 PM Revision 4325: schemas/functions.sql: Added _join()
- 06:18 PM Revision 4324: mappings/VegCore-VegBIEN.csv: Moved "if subplot" _if statement around /location/parent_id and /location/sourceaccessioncode themselves, so that only one _if cond mapping for subplot is needed. Note that this is only possible because this _if statement uses _exists, allowing it to be fully evaluated by the XML template simplifying mechanism, which supports subtrees as arguments to _if.
- 06:06 PM Revision 4323: mappings/VegCore-VegBIEN.csv: Removed no longer used parentLocationID, parentPlotName (locationID and plotName now automatically map to the correct location). mappings/Veg+-VegCore.csv: Removed no longer used parentPlotID.
- 05:57 PM Revision 4322: xml_func.py: passthru(): Use xml_dom.prune() so that after empty children are removed, the node itself is also removed if it's empty. This enables further pruning of any node that contains the pruned node.
- 05:55 PM Revision 4321: xml_dom.py: Added prune()
- 05:52 PM Revision 4320: xml_func.py: Removed no longer used prune() (use xml_dom.prune_children() instead)
- 05:51 PM Revision 4319: xml_func.py: Use new xml_dom.prune_children()
- 05:51 PM Revision 4318: xml_dom.py: Added prune_empty() and prune_children()
- 05:29 PM Revision 4317: inputs/CTFS/: Moved VegX export subdir to _archive and renamed it to remove ".disabled" suffix and have a VegCSV-like name
- 05:24 PM Revision 4316: inputs/CTFS/: Renamed README.TXT to DFtemp.analysis_query.txt because it relates only to a particular query from Shash, and moved it to the _archive/ subdir
- 05:21 PM Revision 4315: inputs/CTFS/: Moved source files into new _src/ subdir to avoid cluttering up the main dir
- 05:16 PM Revision 4314: Added inputs/CTFS/_src/
- 05:02 PM Revision 4313: inputs/CTFS/: Added non-data files that weren't under version control
- 04:59 PM Revision 4312: inputs/CTFS/: Moved _scripts_to_drop_extra_tables to _archive because they are for a different version of the CTFS database than the extract we received (bci.sql)
- 04:57 PM Revision 4311: inputs/CTFS/: Moved DBv5.txt to _archive because it's for a different version of the CTFS database than the extract we received (bci.sql)
- 04:49 PM Revision 4310: inputs/CTFS/: Moved CTFS_conversion_bci.php to _archive since it's just for the DFtemp (aggregated) mapping
- 04:48 PM Revision 4309: Added inputs/CTFS/_archive
- 04:39 PM Revision 4308: inputs/import.stats.xls: Updated with stats from latest import
- 04:16 PM Task #483 (Resolved): rename staging table columns according to map.csv
- * This will allow us to have just one VegCore-VegBIEN mapping, with each staging table already using VegCore column n...
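Revision 4327 distinguishes _join (a simple concatenation of non-NULL values) from _merge (which also eliminates duplicates). The sketch below shows that difference in Python, with `None` standing in for NULL. The function names, the `sep` parameter, and the order-preserving duplicate removal are assumptions for illustration; the log does not show the real _join() or _merge() definitions.

```python
def join_values(values, sep=''):
    """Concatenate the non-NULL values in order, keeping repeats."""
    return sep.join(v for v in values if v is not None)

def merge_values(values, sep=''):
    """Like join_values(), but keep only the first occurrence of each
    distinct value."""
    seen = set()
    unique = []
    for v in values:
        if v is not None and v not in seen:
            seen.add(v)
            unique.append(v)
    return sep.join(unique)

# Illustrative input with a NULL and a repeated value
vals = ['a', None, 'a', 'b']
joined = join_values(vals, sep='; ')
merged = merge_values(vals, sep='; ')
```

Where duplicates cannot occur or do not matter, the join form does less work, which matches the motivation given in Revision 4327. An all-NULL input yields an empty string here; what the real functions return in that case is not stated in the log.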
08/28/2012
- 07:56 PM Revision 4307: Added inputs/CTFS/PlotObservation/
- 07:54 PM Revision 4306: mappings/VegCore-VegBIEN.csv: fieldNumber (authoreventcode): Don't copy to location.authorlocationcode if an actual locationID was specified
- 07:51 PM Revision 4305: xml_func.py: simplify(): Removed no longer needed pass-through optimizations for XML functions, which are now handled by each function's own simplifying function
- 07:50 PM Revision 4304: xml_func.py: Added _name simplifying function
- 07:48 PM Revision 4303: xml_func.py: Added _alt, _merge simplifying functions
- 07:45 PM Revision 4302: xml_func.py: passthru(): First prune the node
- 07:43 PM Revision 4301: xml_func.py: simplify(): Use new passthru()
- 07:43 PM Revision 4300: xml_func.py: Added passthru()
- 07:36 PM Revision 4299: xml_func.py: simplify(): Use new prune()
- 07:36 PM Revision 4298: xml_func.py: Added prune()
- 07:26 PM Revision 4297: mappings/VegCore-VegBIEN.csv: Mapped eventID
- 07:24 PM Revision 4296: mappings/Veg+-VegCore.csv: Mapped CTFS Census terms
- 07:20 PM Revision 4295: mappings/Veg+.terms.csv: Added CTFS Census terms
- 07:17 PM Revision 4294: mappings/VegCore-VegBIEN.csv: Changed plotEventStartDate, plotEventEndDate to startDate, endDate because a date range always applies to the event
- 07:13 PM Revision 4293: mappings/Veg+.terms.csv: Added startDate, endDate
- 06:59 PM Revision 4292: README.TXT: Testing: Mapping process: Added command to include column-based import tests
- 06:57 PM Task #482 (New): translate README.TXT to wiki page
- * This will provide easy-to-read formatting to what is currently a plain text file
- 06:49 PM Revision 4291: README.TXT: Datasource setup: Update vegbiendev: Added step to run the tests, to make sure the staging tables were installed properly
- 06:45 PM Revision 4290: inputs/CTFS/Plot/: Added create.sql
- 06:44 PM Revision 4289: inputs/CTFS/: Added import_order.txt
- 06:40 PM Revision 4288: Added inputs/CTFS/Subplot/
- 06:36 PM Revision 4287: mappings/Veg+-VegCore.csv: Mapped CTFS QuadratID
- 06:31 PM Task #464 (Resolved): reverse XPaths so that they start with location instead of plantobservation or specimenreplicate
- > CTFS's two test rows needed to be disabled because they require a transformation of the new mappings
CTFS is bei...
- 06:30 PM Task #471 (Resolved): add make actions so new dependent maps are rebuilt automatically when their source map changes
- This appears to be fixed now that the primary map refactoring is done and the symlinks are gone
- 06:29 PM Task #458 (Resolved): map all VegX sources to stems table
- 06:28 PM Task #452 (Resolved): add column-based import to automated testing
- @make test by_col=1@
- 06:26 PM Revision 4286: mappings/VegCore-VegBIEN.csv: Mapped subplotID
- 06:24 PM Revision 4285: mappings/Veg+.terms.csv: Added subplotID
- 06:22 PM Revision 4284: mappings/Veg+-VegCore.csv: Mapped CTFS Quadrat columns
- 06:18 PM Revision 4283: mappings/VegCore-VegBIEN.csv: Mapped subplotX, subplotY
- 06:14 PM Revision 4282: mappings/VegCore-VegBIEN.csv: Removed empty mappings for unmapped DwC terms because these terms are now listed and maintained in mappings/Veg+.terms.csv
- 06:12 PM Revision 4281: mappings/Veg+.terms.csv: Added Brad's descriptive comments for several VegCore terms
- 06:07 PM Revision 4280: mappings/Veg+.terms.csv: Added subplotX, subplotY
- 06:03 PM Revision 4279: mappings/VegCore-VegBIEN.csv: Made organismX, organismY the official VegCore terms and map relativePlotX, relativePlotY to them in mappings/Veg+-VegCore.csv
- 06:00 PM Revision 4278: mappings/Veg+.terms.csv: Added organismX, organismY as clearer alternatives to relativePlotX, relativePlotY
- 05:48 PM Revision 4277: mappings/Veg+.terms.csv: Added CTFS Quadrat columns
- 05:38 PM Revision 4276: Added inputs/CTFS/
- 05:36 PM Revision 4275: input.Makefile: Testing: Only run column-based tests if column-based mode enabled, because these tests are much slower than the row-based tests for small numbers of rows. Note that this involves explicitly turning off column-based mode in the row-based test, to prevent propagation of the by_col env var which both enables these extra tests and sets bin/map to run in column-based mode.
- 05:28 PM Revision 4274: input.Makefile: Testing: Added by-column test, which is compared to the row-based test's accepted output
- 05:20 PM Revision 4273: input.Makefile: Testing: Merged $(runTest) and $(test2Db) because all tests go to the database
- 05:19 PM Revision 4272: input.Makefile: Testing: Moved `$(foreach use_staged,1,...)` from $(test2Db) to $(runTest) because all tests now use the staging tables
- 05:15 PM Revision 4271: input.Makefile: Testing: Merged $(test2Db) and $(testStaged2Db) because all tests now use the staging tables
- 05:14 PM Revision 4270: input.Makefile: Testing: $(runTest): Always use $(map2db) because there are no tests that use other programs (and haven't been in a while)
- 05:09 PM Revision 4269: input.Makefile: Testing: Run the core test from the staging table, because derived tables only have a staging table and the flat-file test would produce inconsistent results
- 05:00 PM Revision 4268: mappings/Makefile: Fixed bug where rules needed to generate Veg+.self.csv ($(viaSelfMap)) were still using a pattern match that required a table (`.%.`, `.*.`), even though we are no longer using separate maps for separate tables
- 04:44 PM Revision 4267: mappings/Veg+-VegCore.csv: Mapped CTFS Country and Site columns
- 04:25 PM Revision 4266: mappings/Veg+.terms.csv: Added CTFS Country and Site columns
- 04:14 PM Revision 4265: README.TXT: Datasource setup: Adding input data: svn adding the generated map spreadsheets and related files: Added header.csv to the list of files added (for derived tables)
- 04:07 PM Revision 4264: README.TXT: Datasource setup: Adding input data: Documented how to create tables that will be joined together with another table, and how to create tables that are joins of other tables
- 04:01 PM Revision 4263: input.Makefile: Staging tables installation: %/install: Also create header.csv so that there is a CSV header that the map spreadsheets can be autogenerated from
- 02:22 PM Revision 4262: input.Makefile: Staging tables installation: %/install: Add row_num column to derived staging tables so they will have a pkey
- 02:21 PM Revision 4261: sql.py: pkey(): Use pkey_col constant if this column exists, to allow using a row_num column as the pkey even when it is placed at the end of the table (due to being added after the table was created)
- 01:59 PM Revision 4260: input.Makefile: Staging tables installation: %/install: Support alternative generation of a staging table by joining together other staging tables in a create.sql file
- 01:57 PM Revision 4259: input.Makefile: Staging tables installation: %/install: Don't create a row_num column when the table is a joined table because it collides during joins
- 01:49 PM Revision 4258: csv2db: Made input_cmd optional when errors_table_only is on, because the CSV header is not needed to create the errors table
- 01:47 PM Revision 4257: csv2db: Added has_row_num param to disable creating a row_num column
- 12:44 PM Revision 4256: input.Makefile: Existing maps discovery: $(allTables): When prepending unsorted (joined) tables, save them in $(joinedTables) for later use in determining which tables should have a row_num column
- 12:27 PM Revision 4255: README.TXT: Fixed indent
- 12:04 PM Revision 4254: input.Makefile: Staging tables installation: Install *all* tables, not just those present in import_order.txt. This will later allow staging tables to be derived by joining together other staging tables, which themselves are not imported but still need to be installed.
- 11:53 AM Revision 4253: input.Makefile: Existing maps discovery: $(tables): Prepend unsorted tables (those that are not present in import_order.txt)
- 11:04 AM Revision 4252: input.Makefile: Renamed "...-%" targets to "%/..." so they are more logically associated with a specific subdir
- 10:54 AM Revision 4251: mappings/Veg+.terms.csv: Added Madidi terms that don't exist in other datasources
- 10:47 AM Revision 4250: inputs/Madidi/0.plots/map.csv: Added [Veg+] to root to enable auto-mapping
- 10:35 AM Revision 4249: inputs/import.stats.xls: Updated with stats from latest import
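Note on the table ordering described in revisions 4253 and 4254: the sketch below is an illustration only, not the project's Makefile logic. It assumes that tables named in import_order.txt keep that order and that the remaining "unsorted" tables are prepended; the function name and the alphabetical ordering of the unsorted tables are assumptions made for the example.

```python
def ordered_tables(all_tables, import_order):
    """Return the tables with unsorted ones first, then the listed import order.

    all_tables: names of all table subdirs (hypothetical input).
    import_order: names read from import_order.txt, in file order.
    """
    listed = [t for t in import_order if t in all_tables]
    # Assumption: unsorted tables are ordered alphabetically among themselves.
    unsorted_tables = sorted(t for t in all_tables if t not in import_order)
    return unsorted_tables + listed


tables = ordered_tables(
    {"Plot", "Subplot", "PlotObservation", "Country"},
    ["Plot", "Subplot", "PlotObservation"],
)
print(tables)
```

Here Country stands in for a joined table that is installed but not imported, so it is absent from the import order list.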
08/27/2012
- 10:47 PM Revision 4248: inputs/SALVIAS*/1.organisms/map.csv: Map directly to locationID, plotName instead of parentLocationID, parentPlotName because these terms now map correctly to the parent location when a subplot column exists
- 10:43 PM Revision 4247: mappings/VegCore-VegBIEN.csv: plotName -> /location/authorlocationcode mapping: When subplot is provided, remove this mapping using _if ... _exists instead of _alt so that a NULL subplot value will not cause the parent plot's name to be used for the subplot name
- 10:34 PM Revision 4246: input.Makefile: Testing: $(runTest): Remove outputs of successful tests to reduce clutter
- 10:32 PM Revision 4245: input.Makefile: Testing: %/test.staging.xml: Don't create test.staging.xml at all for non-flat-file inputs, because it is not needed (diff does not run in this case)
- 10:23 PM Revision 4244: mappings/VegCore-VegBIEN.csv: Fixed bug where "if subplot" conditions would evaluate to true only if the subplot was NOT NULL, when they should actually evaluate to true if the datasource specified any subplot column, nullable or not
- 10:14 PM Revision 4243: xml_func.py: simplify(): Removed no longer needed hardcoded _if simplifying code now that there is an _if() simplifying function
- 10:10 PM Revision 4242: db_xml.py: input_col_prefix: Use value of xml_func.var_name_prefix, which is now the place where this value is configured
- 10:09 PM Revision 4241: db_xml.py: Moved input_col_prefix above the put() function that uses it
- 10:09 PM Revision 4240: xml_func.py: Added _if() simplifying function
- 10:07 PM Revision 4239: xml_func.py: Added is_var_name() and is_var()
- 10:06 PM Revision 4238: xml_dom.py: Added NodeEntryIter
- 09:33 PM Revision 4237: xml_func.py: Added _exists()
- 09:30 PM Revision 4236: xml_func.py: simplify(): Added support for custom simplifying functions, which are not hard-coded in simplify()
- 09:19 PM Revision 4235: xml_dom.py: replace_with_text(): Use new bool2str() so that False causes the node to be removed instead of replaced with the empty string
- 09:18 PM Revision 4234: xml_dom.py: Added bool2str()
- 08:56 PM Revision 4233: inputs/SALVIAS*/1.organisms/map.csv: Mapped subplot, Line to new subplot VegCore term
- 08:54 PM Revision 4232: mappings/VegCore-VegBIEN.csv: Mapped subplot, which involved replacing an _if with _alt to both remove plotName as the authorlocationcode and use subplot instead when subplot is specified
- 08:47 PM Revision 4231: mappings/VegCore-VegBIEN.csv: locationID, plotName: Redirect to /location/parent_id/location/* if subplot field is specified
- 08:42 PM Revision 4230: xml_func.py: simplify(): Also remove _if statements with only a condition. This is a required transformation, because such _if statements can't be handled by functions._if() due to there being no argument to provide the anyelement type.
- 08:06 PM Revision 4229: xml_func.py: simplify(): Added pruning optimization that removes empty children. Empty children are created when some mappings don't apply to the current datasource.
- 07:58 PM Revision 4228: xml_func.py: simplify(): Only generate children list if node is a function
- 07:33 PM Revision 4227: xml_func.py: simplify(): Refactored to support processing nodes that are not functions. Changed var names for clarity.
- 06:55 PM Revision 4226: mappings/VegCore-VegBIEN.csv: _simplifyPath() calls: Removed no longer needed `require` arg, and removed no longer needed table suffix from `next` arg
- 06:51 PM Revision 4225: db_xml.py: put(): _simplifyPath() built-in function: Removed `require` param, which is not used by this _simplifyPath() implementation because the database constraints handle this
- 05:56 PM Revision 4224: mappings/Veg+.terms.csv: Added subplot
- 05:51 PM Task #480 (Resolved): automate adding a new table to an existing datasource
- @make inputs/<datasrc>/<table>/add@
- 05:30 PM Revision 4223: input.Makefile: SVN: add: Also add empty import_order.txt
- 05:30 PM Revision 4222: lib/common.Makefile: SVN: Added $(addFile)
- 05:26 PM Revision 4221: input.Makefile: SVN: add: Don't automatically add a Specimen subdir, because some plots datasources don't have that table
- 05:23 PM Revision 4220: README.TXT: Datasource setup: Adding input data: Added step to add <table> to inputs/<datasrc>/import_order.txt
- 04:48 PM Revision 4219: README.TXT: Datasource setup: Changed "<name>" to "<datasrc>" to distinguish it more clearly from "<table>", which is also a name
- 04:45 PM Revision 4218: README.TXT: Datasource setup: Adding input data: Changed steps to use new %/add command to add table's subdir
- 04:36 PM Revision 4217: input.Makefile: SVN: Added %/add to add a new table subdir. add: Changed default subdir name to Specimen to match suggested table names at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>. Use new %/add to add it.
- 04:18 PM Revision 4216: inputs/import.stats.xls: Updated with stats from latest import
08/24/2012
- 07:56 PM Revision 4215: README.TXT: Datasource setup: Replaced fixed table names with link to VegCSV suggested table names
- 07:51 PM Task #480 (Resolved): automate adding a new table to an existing datasource
- 07:43 PM Revision 4214: input.Makefile: $(srcsOnly): Include only files ending in one of the data extensions: csv tsv txt xml. This allows the data provider to include other documentation files, such as SQL export queries, in the table subdirs.
- 07:24 PM Revision 4213: bin/map: Documented that it is duplicate-column safe (supports multiple columns of the same name)
- 07:10 PM Revision 4212: README.TXT: Datasource setup: Obtaining CSVs: Documented that when exporting relational databases to CSVs, you MUST ensure that embedded quotes are escaped by doubling them, *not* by preceding them with a "\" as is the default in phpMyAdmin
- 07:00 PM Revision 4211: csvs.py: delims: Added ";", which is phpMyAdmin's default CSV delimiter
- 06:50 PM Revision 4210: sql_io.py: null_strs: Added 'NULL', which is used by phpMyAdmin as the default "Replace NULL with" value for CSV exports
- 06:48 PM Revision 4209: sql_io.py: cleanup_table(): Refactored to use for loop with array constant, so that additional NULL-equivalent strings can easily be added
- 06:30 PM Revision 4208: mappings/roots/: Merged roots for different tables into one mappings/root.sh for Veg+, which handles all tables' mappings to VegBIEN
- 04:31 PM Revision 4207: sql_io.py: put_table(): When ignoring all rows for an iteration, return literal NULL value instead of column of NULLs as an optimization for callers using that iteration's pkeys
- 12:20 PM Revision 4206: inputs/import.stats.xls: Updated with stats from latest import
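Note on the phpMyAdmin export settings covered by revisions 4210 to 4212: a minimal sketch, using Python's standard csv module rather than the project's csvs.py or sql_io.py, of reading a ";"-delimited export whose embedded quotes are escaped by doubling. The function name and the one-entry NULL list are assumptions; the log confirms only that 'NULL' was added to null_strs.

```python
import csv
import io

# Assumption: only 'NULL' is confirmed by the log; the project's full list is not shown.
NULL_STRS = {"NULL"}


def read_export(text, delimiter=";"):
    """Parse CSV text whose embedded quotes are escaped by doubling them."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, doublequote=True)
    # Replace NULL-equivalent strings with None, leaving other cells unchanged.
    return [[None if cell in NULL_STRS else cell for cell in row] for row in reader]


sample = 'id;note\n1;"say ""hi"""\n2;NULL\n'
rows = read_export(sample)
print(rows)
```

A file exported with backslash-escaped quotes would not parse correctly under these settings, which is why the README step asks for doubled quotes.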
08/23/2012
- 05:32 PM Revision 4205: mappings/VegCore-VegBIEN.csv: Primary taxondetermination: Removed [role=identifier] because the role of the entity making the determination is unknown. Added [!isoriginal] filter to those mappings to ensure that primary taxondetermination XPaths map to a different taxondetermination than the [isoriginal=true] determination when both are present.
- 05:24 PM Revision 4204: inputs/SALVIAS*/1.organisms/map.csv: Remapped cfaff to identificationQualifier, because it was previously mapped to the same taxondetermination as the Orig* terms but does not have a corresponding Orig prefix to indicate that it should apply to the original determination instead of the primary TNRS one
- 05:19 PM Revision 4203: mappings/Veg+.terms.csv: Removed no longer used computer.* taxonomic terms
- 05:19 PM Revision 4202: mappings/VegCore-VegBIEN.csv: Removed no longer used computer.* taxonomic terms
- 05:18 PM Revision 4201: inputs: Regenerated VegBIEN.csv for several datasources, which had apparently not gotten regenerated when make was run after the taxonRank mapping addition
- 05:00 PM Revision 4200: backups/: svn:ignore: Also ignore .*, which includes temp files generated by rsync
- 04:58 PM Revision 4199: xml_func.py: simplify(): Also consider _name() to be an aggregate function
- 04:57 PM Revision 4198: xml_func.py: simplify(): Also consider _name() to be an aggregate function
- 04:49 PM Revision 4197: inputs/SALVIAS*/1.organisms/map.csv: Removed computer.* prefix from primary (TNRS) taxondetermination, so it would map to the main taxondetermination in VegBIEN
- 04:46 PM Revision 4196: mappings/VegCore-VegBIEN.csv: Mapped taxonRank analogously to computer.taxonRank
- 04:34 PM Revision 4195: inputs/SALVIAS*/1.organisms/map.csv: Remapped OrigFamily/OrigGenus/OrigSpecies to new verbatim* taxonomic names. Also remapped cfaff to verbatimIdentificationQualifier, because it was previously mapped to the same taxondetermination as the Orig* terms, but this will later need to be remapped to identificationQualifier (not in this commit because that is a separate change). Note that the switch to the verbatim* taxonomic names removes a concatenated binomial that was part of the previous mappings, which put OrigGenus and OrigSpecies together into one scientificName.
- 03:34 PM Revision 4194: mappings/VegCore-VegBIEN.csv: Mapped verbatimScientificName to taxonoccurrence.authortaxoncode as an alternative to scientificName
- 03:12 PM Revision 4193: mappings/VegCore-VegBIEN.csv: Mapped verbatim* taxonomic terms
- 03:10 PM Revision 4192: mappings/Veg+.terms.csv: Added verbatimIdentificationQualifier
- 03:07 PM Revision 4191: mappings/Veg+.terms.csv: Added verbatimScientificName
- 03:06 PM Revision 4190: schemas/vegbien.sql: taxondetermination: taxondetermination_unique unique index: Added isoriginal so an "original" determination in the same row (as found in SALVIAS) will be seen as distinct from the scrubbed determination, even if they are to the same plant name
- 02:57 PM Revision 4189: mappings/VegCore-VegBIEN.csv: taxonomic terms: Removed ":[isoriginal=true]" because there may be multiple determinations for an organism (either in separate rows or, for SALVIAS, in separate columns), and not all will be the original determination
- 02:43 PM Revision 4188: schemas/vegbien.sql: taxondetermination.role: Default to 'unknown' so that the field is optional
- 02:41 PM Revision 4187: schemas/vegbien.sql: role enum: Added 'unknown' value
- 02:20 PM Revision 4186: mappings/Veg+.terms.csv: Added verbatim* taxonomic terms
- 02:12 PM Revision 4185: inputs/import.stats.xls: Updated with stats from latest import
- 01:44 PM Task #477 (Rejected): allow putting specimens data directly in the top level of the datasource directory
- * this avoids needing to create a single @specimens@ subdir for the DwC CSV
* need to work around make's autoremoval...
- 01:39 PM Task #476 (New): develop map spreadsheet -> header override file translation utility
- * this will avoid the need to create a @map.full.csv@ and @VegBIEN.csv@ file for each table, because @mappings/Veg+-V...
08/22/2012
- 04:56 PM Revision 4184: inputs/import.stats.xls: Updated with stats from latest import
- 04:31 PM Revision 4183: inputs: Regenerated maps for changes to bin/union, which removes empty mappings. Added /_alt suffix where needed.
- 03:23 PM Revision 4182: inputs: Move src subdir into main dir, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-src-subdir-into-main-dir>
- 02:02 PM Revision 4181: input.Makefile: $(tables): Allow datasource to specify custom import order in src/import_order.txt
- 01:29 PM Revision 4180: mappings/Veg+.terms.csv: growthForm: Documented source of standard terms
- 10:21 AM Revision 4179: inputs/SALVIAS*/src/1.organisms/map.csv: Removed no longer applicable comments, which related to mappings that were in effect long ago
- 10:09 AM Revision 4178: inputs/SALVIAS/src/2.stems/map.csv: Added comments from corresponding SALVIAS-CSV organisms columns
- 09:54 AM Revision 4177: inputs/SALVIAS*/src/1.organisms/map.csv: Habit: Mapped to new Veg+ habit term
- 09:53 AM Revision 4176: inputs/SALVIAS*/src/1.organisms/map.csv: Habit: Don't filter out values not part of the provided terms list, because such values should be flagged as invalid in the error maps rather than silently discarded. This also ensures that any valid values which are not part of the provided terms list are kept.
- 09:45 AM Revision 4175: mappings/Veg+-VegCore.csv: habit: Map to new verbatimGrowthForm since this field is not necessarily standardized
- 09:42 AM Revision 4174: mappings/Makefile: Veg+.cs-VegBIEN.csv: Join new Veg+-VegCore.to_self.csv (self-join), instead of Veg+-VegCore.csv, to VegCore-VegBIEN.csv, to support two-level chains of mappings in Veg+-VegCore.csv
- 09:40 AM Revision 4173: mappings/Veg+-VegCore.csv: /_alt pass through mappings: Removed comment because the two-level mapping propagates it to all fields ending in /_alt, even though it doesn't apply to them, causing the main VegBIEN map and several datasources' maps to change unnecessarily. Also, the comment is not completely accurate because /_alt pass throughs are now used primarily to support idempotent self-joins of Veg+-VegCore.csv.
- 09:21 AM Revision 4172: union: Don't eliminate duplicate rows based on matches between map_0's *output* column and map_1's input column, because union is now being used for self-joins and it is legitimate for a term to appear as both an input and an output
- 09:10 AM Revision 4171: sql_io.py: put_table(): MissingCastException: Use strings.repr_no_u() instead of strings.urepr() in order to remove the u in u'...' for Unicode strings
08/21/2012
- 09:48 AM Revision 4170: README.TXT: After a new import: Updated commands for new subdirs layout
- 09:42 AM Revision 4169: Regenerated vegbien.ERD exports
- 09:34 AM Revision 4168: mappings: Added autogen Veg+-VegCore.to_self.csv, which is Veg+-VegCore.csv joined to itself, and use it as an intermediate map to join to VegCore-VegBIEN.csv. This provides support for two-level chains of mappings in Veg+-VegCore.csv.
- 09:31 AM Revision 4167: mappings/Veg+-VegCore.csv: Changed output root to Veg+, to allow mappings/Veg+-VegCore.csv to be joined with itself idempotently, for supporting multi-level chains of mappings
- 09:27 AM Revision 4166: mappings/Veg+-VegCore.csv: Add pass through /_alt mapping for all terms in this map that are merged with _alt, to allow datasource to define custom mappings that don't pass through the default mapping. This also allows mappings/Veg+-VegCore.csv to be joined with itself idempotently, to support multi-level chains of mappings.
- 09:19 AM Revision 4165: mappings/Veg+-VegCore.csv: authorPlantCode: Added _alt suffix to create the correct priority
- 09:13 AM Revision 4164: union: Exclude empty rows from the output, so that empty mappings from map_0 aren't included when map_1 contains a non-empty mapping for the same term. Note that this causes "No non-empty join mapping" warnings to turn into "No join mapping".
- 09:08 AM Revision 4163: ci_map: Run join_union_sort in quiet mode so that it doesn't add lots of "No non-empty join mapping" warnings to the Comments column
- 09:06 AM Revision 4162: mappings/Veg+-VegCore.csv: scientificNameAuthor: Added scientificNameAuthorship mapping with /_alt/1, to ensure that it has priority over scientificNameAuthor and to ensure that it has an _alt suffix when a datasource contains both scientificNameAuthor and scientificNameAuthorship (such as SpeciesLink)
- 09:00 AM Revision 4161: inputs/SpeciesLink/src/specimens/map.csv: Added explicit _alt suffix when multiple terms map to the same place
- 08:58 AM Revision 4160: mappings/Veg+-VegCore.csv: scientificNameAuthor: Added scientificNameAuthorship mapping with /_alt/1, to ensure that it has priority over scientificNameAuthor and to ensure that it has an _alt suffix when a datasource contains both scientificNameAuthor and scientificNameAuthorship (such as SpeciesLink)
- 08:31 AM Revision 4159: inputs/ARIZ/src/specimens/map.csv: RelatedCatalogItem mappings: Added _alt suffixes
- 08:09 AM Revision 4158: union: Multi-support: When an input appears in both maps, treat an empty mapping as if it didn't exist so that it doesn't overwrite a non-empty mapping in the other map
- 07:51 AM Revision 4157: mappings/Makefile: Veg+.cs-VegBIEN.csv: Join Veg+-VegCore.csv to VegCore-VegBIEN.csv in quiet mode, to avoid adding "No non-empty join mapping" to the Comments column
- 07:50 AM Revision 4156: join: quiet mode: Turn off all warnings, not just "No input mapping" warnings. This is useful when join-unioning a synonymy to a primary map, which may have "No non-empty join mapping" for some terms but this should not be stored in the resulting map's Comments column.
- 07:30 AM Revision 4155: mappings/Makefile: Rewrapped lines
- 07:28 AM Revision 4154: mappings/Veg+-VegCore.csv: Added verbatimGrowthForm mapping
- 07:09 AM Revision 4153: mappings/Veg+.terms.csv: verbatimGrowthForm: Added comment that additional values come from SALVIAS. As other datasources' custom growth form values are added, they can be added to this comment.
- 07:00 AM Revision 4152: mappings/Veg+.terms.csv: Added verbatimGrowthForm
- 06:44 AM Revision 4151: schemas/vegbien.sql: locationdetermination: Added verbatimlatitude, verbatimlongitude, verbatimcoordinates
- 06:22 AM Revision 4150: schemas/functions.sql: Made aggregating functions polymorphic
- 06:16 AM Revision 4149: xml_func.py: Removed no longer used _collapse()
- 06:13 AM Revision 4148: xml_func.py: Removed no longer needed _if(), which has been translated to a SQL function
- 06:13 AM Revision 4147: schemas/functions.sql: Added _if()
- 06:12 AM Revision 4146: sql.py: function_exists(): Support overloaded functions
- 06:09 AM Revision 4145: sql.py: run_query(): Parse "more than one" errors as DuplicateExceptions
- 05:42 AM Revision 4144: xml_func.py: XML function specification documentation: Updated parameters
- 05:39 AM Revision 4143: xml_func.py: Removed no longer needed _eq(), which has been translated to a SQL function
- 05:38 AM Revision 4142: schemas/functions.sql: Added _eq()
- 05:37 AM Revision 4141: sql.py: run_query(): Parse "could not determine polymorphic type because input has type "unknown"" errors as MissingCastExceptions to type text. This adds support for polymorphic SQL functions whose parameters are anyelement, etc.
- 05:35 AM Revision 4140: sql_io.py: put_table(): sql.MissingCastException: Support unknown (None) columns, by casting all columns
- 05:30 AM Revision 4139: sql.py: MissingCastException: Support unknown (None) columns
- 05:29 AM Revision 4138: xml_dom.py: replace_with_text(): Support bool `new` values
- 04:22 AM Revision 4137: input.Makefile: Determine import order from sorted order of all non-hidden subdirs, instead of from fixed constant. This allows datasources to specify arbitrary tables, rather than being limited to 0.plots, 1.organisms, 2.stems, specimens.
- 04:14 AM Revision 4136: lib/common.Makefile: Added $(wildcard/) (needed because builtin $(wildcard) doesn't do / suffix correctly)
- 04:11 AM Revision 4135: input.Makefile: src/%/map.full.csv: Fixed bug where couldn't have $(srcMap) in prerequisites because this would for some reason cause src/%/map.full.csv to always be remade
- 03:47 AM Revision 4134: input.Makefile: Src maps cleanup: Fixed bug where src.csv was using .map.csv.last_cleanup instead of .src.csv.last_cleanup as its .last_cleanup file
- 03:30 AM Revision 4133: input.Makefile: Maps building: Moved src/%/map.full.csv after src/%/map.csv now that the filenames are fixed, so pattern matching order isn't an issue
- 03:27 AM Revision 4132: input.Makefile: Maps building: $(makeFullCsv): Removed no longer needed test for whether the $(coreSelfMap) exists, because Veg+'s self map always exists
- 03:12 AM Revision 4131: input.Makefile: Src maps cleanup: Fixed bug where src.csv was using .map.csv.last_cleanup instead of .src.csv.last_cleanup as its .last_cleanup file
- 02:34 AM Revision 4130: inputs/CTFS/src/1.organisms/: Added "_" prefix to prevent it from being treated as a data table subdir, before the DB export is mapped
- 02:20 AM Revision 4129: inputs/CTFS/src/ERD.jpg: Made it a symlink to "STRI2011_DB v5.jpg" instead of a copy of it
- 02:11 AM Revision 4128: Added inputs/CTFS/src/bci_01April2011.zip.url, which contains the original download URL for our copy of the CTFS database
- 01:31 AM Revision 4127: inputs/CTFS/src/: Added "_" prefix to scripts_to_drop_extra_tables subdir to prevent it from being treated as a data table subdir
- 01:10 AM Revision 4126: inputs/Makefile: Input data sync: Updated rsync filter for new subdirs layout
- 12:55 AM Revision 4125: README.TXT: Datasource setup: Updated for new subdirs layout
- 12:17 AM Revision 4124: input.Makefile: SVN: add: Updated svn:ignores for new subdirs layout
- 12:08 AM Revision 4123: inputs/Makefile: Import logs: Fixed bug where excluded install logs needed to be renamed according to the new name format (from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-log-files-into-subfolders>)
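Note on the empty-mapping handling in revisions 4158 and 4164: the following is a rough sketch of the idea, not the union script itself. It models each map as a dict from input term to output path, drops empty mappings, and assumes that map_0 wins when both maps hold a non-empty mapping for the same term; that priority, the function name, and the sample paths are all assumptions.

```python
def union_maps(map_0, map_1):
    """Merge two term -> output mappings, ignoring empty outputs."""
    merged = {}
    # map_1 is applied first so that map_0 overrides it (assumed priority).
    for mapping in (map_1, map_0):
        for term, output in mapping.items():
            if output:  # an empty mapping is treated as if it did not exist
                merged[term] = output
    return merged


map_0 = {"plotName": "", "subplot": "out/subplot"}
map_1 = {"plotName": "out/plotName", "habit": ""}
print(union_maps(map_0, map_1))
```

The empty plotName entry in map_0 does not overwrite the non-empty one in map_1, and habit, which is empty everywhere, is left out of the result.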
08/20/2012
- 11:59 PM Revision 4122: inputs: Moved log files into subfolders, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-log-files-into-subfolders>
- 11:01 PM Revision 4121: input.Makefile: Merged Installation and Staging tables sections into Staging tables installation, since no other installation is performed. Removed "import/" prefix from non-file import-related targets.
- 10:20 PM Revision 4120: inputs: Moved test outputs into subfolders, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-test-outputs-into-subfolders>
- 09:58 PM Revision 4119: input.Makefile: Import to VegBIEN: Removed extra test for $(inputFiles), because when there are no inputs, $(tables) will be empty and import will automatically do nothing. Removed no longer needed $(inputFiles).
- 08:46 PM Revision 4118: inputs: Moved maps into subfolders, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-maps-into-subfolders>
- 07:16 PM Revision 4117: inputs: Replaced Veg+ prefix with map on via maps, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Replace-Veg-prefix-with-map-on-via-maps>
- 06:39 PM Revision 4116: strings.py: concat(): Apply length limits by shrinking max_len by new raw_extra_len() of the strings. This also fixes a bug where multi-byte characters in str0 were not properly taken into account, leading to overly long strings. Added doc comment.
- 06:29 PM Revision 4115: strings.py: Added raw_extra_len()
- 06:17 PM Revision 4114: sql_gen.py: NoUnderlyingTableException: Take a (required) parameter for the item that had no underlying table, and provide this wherever a NoUnderlyingTableException is created
- 06:16 PM Revision 4113: strings.py: concat(): Perform substring operation on Unicode strings so that substring does not split Unicode characters. Still use to_raw_str() to calculate the str1 length because Unicode characters can be multi-byte, and length limits often apply to the byte length, not the character length.
- 06:13 PM Revision 4112: exc.py: add_msg(): Fixed bug where needed to convert the Unicode string back into a raw string because Python's top-level exception handler doesn't support Unicode strings as exception messages
- 05:22 PM Revision 4111: inputs/import.stats.xls: Updated with stats from latest import
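Note on the length-limited concatenation described in revisions 4113 and 4116: a minimal sketch, written for Python 3 and not taken from strings.py, of appending a suffix while keeping the UTF-8 byte length within a limit and never splitting a character. The function name and the trim-one-character loop are assumptions; the project's own version works through raw_extra_len() instead.

```python
def concat_limited(str0, str1, max_len):
    """Append str1 to str0 so the UTF-8 encoded result fits in max_len bytes.

    str0 is shortened from the end, one whole character at a time.
    """
    budget = max_len - len(str1.encode("utf-8"))
    if budget < 0:
        raise ValueError("str1 alone exceeds max_len")
    # Trimming by character (not by byte) avoids cutting a multi-byte character.
    while len(str0.encode("utf-8")) > budget:
        str0 = str0[:-1]
    return str0 + str1


result = concat_limited("ñandú", "#1", 6)
print(result, len(result.encode("utf-8")))
```

Measuring the limit in bytes matters because length limits such as database identifier limits often apply to the encoded form, as revision 4113 notes.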
08/17/2012
- 07:53 PM Revision 4110: inputs: Renamed stems table to 2.stems so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>
- 07:49 PM Revision 4109: inputs: Renamed organisms table to 1.organisms so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>
- 07:30 PM Revision 4108: inputs: Renamed plots table to 0.plots so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>
- 07:30 PM Revision 4107: inputs: Renamed plots table to 0.plots so import order would be inherent in the dir name, using steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-with-import-order>
- 07:20 PM Task #474 (Rejected): use svn to figure out when a map file has changed and needs to be cleaned up
- * Currently, a separate @.last_cleanup@ file is used as a timestamp.
This is problematic, because whenever a map fil...
- 07:00 PM Revision 4106: input.Makefile: Mapping: If table subdir contains no input files, print warning instead of aborting. This situation occurs when renaming a version-controlled directory, whose previous version persists as an empty dir until committing.
- 06:41 PM Revision 4105: input.Makefile: Mapping: Removed no longer used $(<in) and test for it in $(map)
- 06:37 PM Revision 4104: input.Makefile: Mapping: $(map): Removed no longer used test for $(mapEnv)
- 05:50 PM Revision 4103: sql.py: run_query(): Exception handling: Fixed bug where PostgreSQL 9.1 PL/Python errors have a different format than PostgreSQL 9.0 which needs to be supported separately. This format was already supported in sql_gen.plpythonu_error_handler, but also needed to be supported for exceptions that propagate back to the client.
- 05:34 PM Revision 4102: inputs/SALVIAS-CSV/src/: Removed source files because they shouldn't be under version control. (They are synchronized via `make inputs/download`.)
- 05:15 PM Revision 4101: inputs: Moved src files into VegCSV subfolders (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#CSV-representation), with table suffixes removed, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders>
- 04:26 PM Revision 4100: util.py: dict_subset(): Fall back to using dict when OrderedDict is not available, in order to support making the maps on nimoy
- 04:02 PM Revision 4099: mappings/: Removed now-inaccurate ".stems" suffix from VegX-VegCore.stems.csv, which actually applied to all tables
- 03:59 PM Revision 4098: mappings/: Removed no longer used ".specimens" suffix from maps, which is now the same for all maps
- 03:52 PM Revision 4097: mappings/: Removed no longer used plots, organisms, and stems maps, which were copies of the specimens map
- 03:48 PM Revision 4096: input.Makefile: Core maps: Always use the specimens "table", since there are now no longer separate mappings for different tables, and the other tables' maps in mappings/ are merely copies of the specimens table's map
- 03:30 PM Revision 4095: input.Makefile: Removed no longer used custom via maps code, so that map files no longer need a prefix (which is always the same) specifying that they map through Veg+. Veg+ thus serves as the single gateway to VegBIEN, which avoids ever again having to maintain two copies of the mappings, as was the case when DwC and VegX XPaths were separate gateways. This will assist in untying the complex mapping logic in input.Makefile from file naming conventions in mappings/, and simplify the task of grouping each map with the CSV it maps.
- 03:14 PM Revision 4094: input.Makefile: Removed no longer used DB inputs section, because all of our inputs are either CSV or (rarely) XML. This removes a significant amount of dead code that will make it easier to refactor input.Makefile to use custom CSV import orders.
- 02:51 PM Revision 4093: mappings/Veg+-VegCore.specimens.csv: Added mappings for miscellaneous terms
- 02:45 PM Revision 4092: mappings/Veg+.terms.csv: Added miscellaneous terms
- 12:52 PM Revision 4091: to_do/: svn:ignore OpenOffice lock files
- 12:50 PM Revision 4090: inputs/import.stats.xls: Updated with stats from latest import. The import time for SpeciesLink (the slowest datasource) went back down to 9 hours after replacing the slower _merge with _alt.
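Revision 4100 above describes dict_subset() falling back to a plain dict when OrderedDict is not available. A minimal sketch of that import-fallback pattern follows. The signature of dict_subset(), the helper name _SubsetDict, and the choice to skip missing keys are assumptions made for illustration; the actual util.py code is not shown in this log.

```python
try:
    # Available in Python 2.7+ and 3.1+
    from collections import OrderedDict as _SubsetDict
except ImportError:
    # Older interpreters: fall back to dict and lose the ordering guarantee
    _SubsetDict = dict


def dict_subset(dict_, keys):
    """Return a new mapping holding only `keys`, in the order requested.

    Hypothetical signature; missing keys are skipped (an assumption).
    """
    subset = _SubsetDict()
    for key in keys:
        if key in dict_:
            subset[key] = dict_[key]
    return subset
```

With the fallback in place, callers get the same keys either way; only the guarantee about iteration order differs on interpreters that lack OrderedDict.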
08/16/2012
- 08:34 PM Revision 4089: Added new autogen mappings/VegCore.self.specimens.csv (not currently used)
- 08:30 PM Revision 4088: Merged DwC (including DwC1) and VegCSV mappings into new Veg+ schema. This involves replacing occurrences of DwC and VegCSV with Veg+ (or sometimes VegCore) everywhere, as described in <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV-DwC_merging>.
- 08:18 PM Revision 4087: README.TXT: Schema changes: Updated filenames of PDF ERD exports
- 08:15 PM Revision 4086: Regenerated vegbien.ERD exports
- 08:12 PM Revision 4085: xpath.py: parse(): _value(): Support '+' as a word character that doesn't need to be quoted
- 06:54 PM Revision 4084: intersect: Fixed bug where test for ignore option needed to be removed, because ignore is not supported by this program
- 06:45 PM Revision 4083: util.py: list_subset(): Fixed bug where using '+' to append the rest of the list didn't work if '+' was the first index, because max() cannot be called on an empty list
- 05:14 PM Revision 4082: mappings/DwC2-VegBIEN.specimens.csv: Added VegCSV mappings, to enable use of one VegCSV-VegBIEN mapping for specimens and plots data
- 05:12 PM Revision 4081: inputs/XAL/maps/DwC.specimens.csv: Remapped FieldNumber to recordNumber because this historical DwC term (http://rs.tdwg.org/dwc/terms/history/index.htm#fieldNumber-2009-04-24) has close to the same meaning as recordNumber, but not the same meaning as the current fieldNumber term
- 04:55 PM Revision 4080: inputs/SpeciesLink/maps/DwC.specimens.csv: Remapped fieldNumber to recordNumber because term usage was inconsistent with DwC definition. Datasources often confuse this term, because it seems like the collection number, but is actually the author code for the *event* (VegBank's authorObsCode).
- 04:28 PM Revision 4079: mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Added additional VegCSV mappings for mergability. taxonoccurrence.authortaxoncode: Added alternative mappings from VegCSV for mergability.
- 04:21 PM Revision 4078: xml_func.py: simplify(): Apply pass-through optimizations for _if statements with no condition (which means false). This facilitates automated testing after an _if statement has been added, because the put template provided as part of the automated test will only change for those datasources that actually have a condition entry for the _if statement, which greatly reduces the number of tests that need to be accepted. (Note that the path before the _if will still be included as an empty path if there are no other mappings to that table, because the _if statement does not surround it.)
- 02:26 PM Revision 4077: mappings/VegCSV-VegBIEN.specimens.csv: Added DwC mappings, to enable use of one VegCSV-VegBIEN mapping for specimens and plots data
- 02:22 PM Revision 4076: schemas/vegbien.sql: Moved collectionnumber from specimenreplicate to plantobservation to replace authorplantcode, since these terms are used analogously in plots and specimens data. This code is really the DwC recordNumber (VegBIEN collectionnumber), which "serves as a link between field notes and an Occurrence record, such as a specimen [or plots data] collector's number" (http://rs.tdwg.org/dwc/terms/#recordNumber). Also, this prevents a specimenreplicate from incorrectly being created when plots data provides an authorplantcode.
- 01:55 PM Revision 4075: mappings/DwC2-VegBIEN.specimens.csv: Mapped individualID for mergability with VegCSV
- 01:49 PM Revision 4074: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: Split occurrenceID into occurrenceID and individualID, where individualID refers to the plant in plots data and occurrenceID refers to the specimen in specimens data. This prevents plant sourceaccessioncodes from being mapped to the specimenreplicate, which was messing up stems mappings for the parent plantobservation. It also avoids mapping the specimenreplicate sourceaccessioncode to additional tables where it isn't needed. (Note that occurrenceID is needed for location to ensure that each specimen gets its own location to make locationdeterminations on. Everything else is directly or indirectly scoped by location when its own sourceaccessioncode isn't specified.)
- 01:33 PM Revision 4073: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence: Removed catalogNumber mapping because the catalogNumber applies only to the specimen, not to the occurrence, especially in plots data
- 01:14 PM Revision 4072: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence: Map everything except occurrenceID (which is globally unique) to new authortaxoncode, which only needs to be unique within the locationevent
- 12:59 PM Revision 4071: schemas/vegbien.sql: taxonoccurrence: Renamed taxonoccurrence_locationevent_1_to_1 to taxonoccurrence_unique_within_locationevent and added new authortaxoncode to it
- 12:57 PM Revision 4070: schemas/vegbien.sql: taxonoccurrence: Added authortaxoncode to store unique keys that are unique within the locationevent rather than within the datasource
- 12:43 PM Revision 4069: inputs/SALVIAS-CSV/maps/VegCSV.organisms.csv: Added _alt to height_m, stem_height_m to choose between them when both are specified (rather than having bin/map choose their priority order based on their order in the map). Note that when both of the heights are specified, they are always either the same, or height_m is invalid (see <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Some-organisms-have-one-stem-but-different-heights-in-the-organisms-and-stems-tables>).
- 12:39 PM Revision 4068: bin/map: collision_suffix: Setting back to _alt to test if _merge caused the SpeciesLink slowdown. SpeciesLink contains a huge number of equivalent columns due to each DwC term being present with namespaces for all versions of the DwC schema, and these columns can be combined either using _alt or _merge. _merge is only useful if the values in different versions of the same DwC field are *different*, which is not likely the case.
- 12:29 PM Revision 4067: inputs/import.stats.xls: Updated with stats from latest import. The import time for SpeciesLink (the slowest datasource) doubled, to 16 hours, most likely due to replacing _alt with the slower _merge, which preserves more input data.
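Revision 4083 above fixes a case where max() was called on an empty list. In Python, max([]) raises ValueError, so any "highest index seen so far" computation needs a guard. The sketch below illustrates that guard only. The meaning given to '+' here (append everything after the highest explicit index seen so far) and the function signature are assumptions; the real list_subset() in util.py may differ.

```python
def list_subset(list_, idxs):
    """Select items by index; '+' appends the rest of the list.

    Hypothetical semantics: '+' appends every item after the highest
    explicit index seen so far, or the whole list if none was seen.
    """
    subset = []
    seen = []
    for idx in idxs:
        if idx == '+':
            # Guard: max() raises ValueError on an empty sequence
            start = max(seen) + 1 if seen else 0
            subset.extend(list_[start:])
        else:
            subset.append(list_[idx])
            seen.append(idx)
    return subset
```

Without the `if seen` guard, passing '+' as the first index would reach max() with nothing to compare, which is the failure the revision message describes.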
08/15/2012
- 11:30 AM Revision 4066: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: occurrenceID: Mapped to location.authorlocationcode instead of sourceaccessioncode so that it would not override any location- or event-related IDs in location.authorlocationcode merely by being mapped to the sourceaccessioncode field (which takes precedence over the authorlocationcode when specified)
- 10:43 AM Revision 4065: mappings/VegCSV-VegBIEN.specimens.csv: occurrenceID: Mapped to specimenreplicate.sourceaccessioncode for mergability with DwC
- 09:14 AM Revision 4064: mappings/VegCSV-VegBIEN.specimens.csv: Mapped voucherType to indirect voucher _if statements' conditions
- 09:02 AM Revision 4063: mappings/VegCSV-VegBIEN.specimens.csv: locationID: location.sourceaccessioncode mapping: Added /_alt suffix for mergability with DwC
- 08:53 AM Revision 4062: mappings/DwC2-VegBIEN.specimens.csv: collectionID: Mapped to location.authorlocationcode as merge with collectionCode, the same way as it is for specimenreplicate.collectioncode_dwc
- 08:23 AM Revision 4061: schemas/vegbien.sql: location: location_unique_within_datasource_by_authorlocationcode unique index: Added `parent_id IS NULL` condition so that an authorlocationcode is not unintentionally treated as globally unique when a parent location is available (which implies that the authorlocationcode is a subplot code)
- 08:20 AM Revision 4060: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Added location.authorlocationcode mapping for mergability with DwC
- 08:13 AM Revision 4059: mappings/DwC2-VegBIEN.specimens.csv: location.authorlocationcode mappings: Added /_alt/3 for mergability with VegCSV mappings to same field
- 08:05 AM Revision 4058: mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Wrapped all mappings in direct voucher _if for mergability with VegCSV
- 07:57 AM Revision 4057: mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Moved direct/indirect voucher _if inwards to wrap just the value of catalognumber_dwc, not the catalognumber_dwc field node, to match the corresponding VegCSV mapping
- 07:48 AM Task #473 (Resolved): use _merge instead of _alt to avoid losing source data on import
- 07:48 AM Revision 4056: mappings/DwC2-VegBIEN.specimens.csv: Replaced _alt with _merge where applicable to avoid losing source data on import when multiple fields collide
- 07:46 AM Revision 4055: mappings/VegCSV-VegBIEN.specimens.csv: Cleaned up using `make mappings/`
- 07:18 AM Revision 4054: schemas/functions.sql: join_strs_transform(): Use STRICT optimization to avoid needing to manually check if the state value or input value is NULL (http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html#AEN51596)
- 07:15 AM Revision 4053: schemas/functions.sql: join_strs(), join_strs_transform(): Reversed order of params to enable strict optimization, which replaces the state value with the *first* parameter, which used to be the delimiter (http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html#AEN51596)
- 07:07 AM Revision 4052: Renamed join_strs_transform_preserve_empty() to join_strs_transform() now that there are no other join_strs_transform_...() functions
- 07:06 AM Revision 4051: schemas/functions.sql: Removed no longer used join_strs_transform_fold_empty()
- 07:06 AM Revision 4050: schemas/functions.sql: join_strs() aggregate: Use join_strs_transform_preserve_empty() as an optimization because all our data has already had '' replaced with NULL by sql_io.cleanup_table() in csv2db. This will help speed up _merges now that they are performed on a large scale in the slowest datasource, SpeciesLink.
- 07:02 AM Revision 4049: bin/map: collision_suffix: Changed to use _merge instead of _alt to avoid losing source data on import when multiple fields collide
- 06:58 AM Revision 4048: bin/map: Preventing collisions if multiple inputs mapping to same output: Made collision suffix configurable so it can easily be changed
- 06:56 AM Revision 4047: bin/map: Preventing collisions if multiple inputs mapping to same output: Made collision suffix configurable so it can easily be changed
- 06:52 AM Revision 4046: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxonoccurrence.sourceaccessioncode mappings: Added catalogNumber mapping, which takes precedence over recordNumber and is applicable to specimens data and direct vouchers. recordNumber should only be used as a last resort (before the taxon name) because this is collector-assigned and often not unique within anything.
- 06:34 AM Revision 4045: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Moved direct/indirect voucher _ifs inwards to wrap just the value of catalognumber_dwc, not the catalognumber_dwc field node, so that a future SQL function implementation of _if only needs to concern itself with returning one value or another, not with handling XML subtrees. The previous moving of the _ifs in r3942 was intended to effect this, but the _ifs weren't moved in far enough to wrap just the *value*.
- 06:21 AM Revision 4044: mappings/VegCSV-VegBIEN.specimens.csv: eventDate mappings: Removed collectiondate mapping because the eventDate refers only to the plot event. Added /_alt suffixes for mergability with DwC.
- 06:15 AM Revision 4043: mappings/DwC2-VegBIEN.specimens.csv, DwC1-DwC2.specimens.csv: Split eventDate into eventDate and dateCollected, where eventDate refers only to the date of the sampling event, but dateCollected also refers to the date the particular specimen was collected. (This distinction is important in merging with VegCSV, because in plots data, these two fields are distinct.) Remapped datasources with dateCollected-related fields to new dateCollected.
- 05:55 AM Revision 4042: bin/map: Run new xml_func.simplify() on the root before printing the put template, so that _alts and _merges with only one element for the current datasource will be printed in their simplified form (with the _alt/_merge removed). This facilitates automated testing after an _alt/_merge suffix has been added, because the put template provided as part of the automated test will only change for those datasources that actually have an entry for both mappings, which greatly reduces the number of tests that need to be accepted.
- 05:51 AM Revision 4041: xml_func.py: Added simplify()
- 05:45 AM Revision 4040: xpath.py: put_obj(): Use new get_values(), so that the returned nodes are not modified by XML tree transformations, such as those performed by xml_func.process()
- 05:43 AM Revision 4039: Added get_values()
- 05:41 AM Revision 4038: xml_dom.py: is_empty(): Treat whitespace-only text nodes (including text nodes containing empty strings) as empty. This will also support None equivalents in text nodes, because they are isspace_none_str, which is considered whitespace.
- 05:36 AM Revision 4037: xml_func.py: _map(): Don't remove None params, because they are valid values and must be supported. This will become an issue once empty strings in text nodes are considered equivalent to None.
- 05:33 AM Revision 4036: xml_func.py: _units(): Don't remove None params, because they are valid values and must be supported. This will become an issue once empty strings in text nodes are considered equivalent to None.
- 05:25 AM Revision 4035: xml_func.py: _name(): Fixed bug where None values needed to be passed through, and the case of no name parts handled, to properly support NULL propagation
- 05:08 AM Revision 4034: xml_dom.py: value(), set_value(): Use new strings.isspace_none_str as sentinel None equivalent, to support cloning text nodes containing a sentinel None
- 05:06 AM Revision 4033: xml_dom.py: value(), set_value(): Use new strings.isspace_none_str as sentinel None equivalent, to support cloning text nodes containing a sentinel None
- 05:04 AM Revision 4032: strings.py: Added isspace_none_str to support clone-safe sentinel str values that pass isspace()
- 04:51 AM Revision 4031: xml_dom.py: is_whitespace(): Also consider empty text nodes to be whitespace
- 04:47 AM Revision 4030: xml_dom.py: is_whitespace(): Support text nodes whose value() is None by using .nodeValue instead
- 04:44 AM Revision 4029: xml_dom.py: set_value(): Don't set the value of a text node to None by removing it, because this prevents the node from being reused. Instead use a sentinel string value to denote None, and map to and from it.
- 04:40 AM Revision 4028: strings.py: Added none_str and helper class NonInternedStr to support sentinel str values
- 04:19 AM Revision 4027: xml_dom.py: set_value(): Support setting the value of a text node to None, by removing it
- 03:44 AM Revision 4026: Removed trailing whitespace on non-empty lines
- 03:40 AM Revision 4025: sql_io.py: put_table(): DuplicateKeyException: is_literals: Fixed bug where sql.select() needed to select on just the join_cols, not the whole mapping
- 03:14 AM Revision 4024: xml_func.py: process(): Removed support for no longer used structural functions
- 03:13 AM Revision 4023: xml_func.py: Removed no longer used structural functions
- 03:05 AM Revision 4022: mappings/for_review/DwC2-VegBIEN.specimens.fields.csv: input root: Removed DwC XML path info since DwC is now a CSV schema
- 02:57 AM Revision 4021: mappings/DwC2-VegBIEN.specimens.csv: eventDate: Also map to obsstartdate/obsenddate, since the collectiondate is also the event date for specimens data, and for mergability with VegCSV
- 02:24 AM Revision 4020: mappings/VegCSV-VegBIEN.specimens.csv: eventDate: Added mappings to obsstartdate/obsenddate, since users of this field (currently SALVIAS census_date) intend it as the plot event's date. Keep the mapping to collectiondate because a non-range plot event date is also the collectiondate of all organisms in that plot event.
- 02:05 AM Revision 4019: schemas/py_functions.sql: parse_date_range(): Always return a value for end date, even if string is not a date range. This enables using _dateRangeEnd() as a filter function on anything intended as an end date.
- 01:53 AM Revision 4018: mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: eventDate: collectiondate mapping: Removed _dateRangeStart filter because the eventDate (obsstartdate) is only valid as the date the *specimen was collected* if it is a single date, not a date range. (It is still valid as the obsstartdate/obsenddate if it's a range.)
- 01:49 AM Revision 4017: mappings/Veg+.terms.csv: Added dateCollected
- 12:45 AM Revision 4016: input via maps: Removed _date/date filter from date fields because the main mappings now have _date around all dates, so this filter is redundant
- 12:39 AM Revision 4015: inputs/SALVIAS-CSV/maps/VegCSV.organisms.csv: census_date: Don't map directly to the year, as this field is allowed to be a full date even though our data sample contains only years. Note that _date/date will automatically detect plain years and treat them as years, and so will casts to timestamp.
- 12:33 AM Revision 4014: inputs/SALVIAS*/maps/VegCSV.organisms.csv: census_date: Documented that this is for the subplot, not the organism, as all organisms in a subplot have the same value for it
- 12:09 AM Revision 4013: mappings/DwC2-VegBIEN.specimens.csv: verbatimLatitude/verbatimLongitude: Fixed mappings to use _alt/2 instead of _alt/1 to avoid collisions with decimalLatitude/decimalLongitude
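Revisions 4053 and 4054 above rely on how PostgreSQL runs an aggregate whose state transition function is declared STRICT: rows with any NULL input are skipped, and when the initial state is NULL, the first argument of the first all-non-NULL row becomes the state without the function being called. The Python simulation below is only an illustration of those semantics, not the project's SQL. The function run_strict_aggregate() and the '; ' delimiter are assumptions.

```python
def join_strs_transform(state, value, delimiter):
    """Transition step; under STRICT it is never called with None."""
    return state + delimiter + value


def run_strict_aggregate(values, delimiter):
    """Simulate a STRICT transition function with a NULL initial state."""
    state = None
    for value in values:
        if value is None or delimiter is None:
            continue  # STRICT: rows with any NULL input are ignored
        if state is None:
            state = value  # the first argument replaces the NULL state
        else:
            state = join_strs_transform(state, value, delimiter)
    return state
```

This is why the parameter order matters: if the delimiter were the first argument, the state would be seeded with the delimiter rather than with the first string. It also shows that the transition function no longer needs its own NULL checks.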
08/14/2012
- 11:54 PM Revision 4012: schemas/functions.sql: _merge(): Changed sort_orders to match the $-variable name instead of the function parameter name, so each line of the VALUES clause would use the same number for both
- 11:52 PM Revision 4011: schemas/functions.sql: _merge(): Filter out NULL values as optimization so DISTINCT ON only has to consider non-NULL values
- 11:48 PM Revision 4010: schemas/functions.sql: join_strs(): Return NULL if all strings were NULL or ''. This fixes unexpected behavior in _merge() where all elements are NULL but the return value is non-NULL.
- 11:32 PM Revision 4009: schemas/functions.sql: Added join_strs_transform_preserve_empty() and use it in join_strs_transform_fold_empty()
- 11:25 PM Revision 4008: schemas/functions.sql: Renamed join_strs_() to join_strs_transform_fold_empty() for clarity and to indicate that it's for use by the join_strs() aggregate
- 11:11 PM Revision 4007: mappings/DwC2-VegBIEN.specimens.csv: recordNumber: Added VegCSV mappings for it
- 10:57 PM Task #473 (Resolved): use _merge instead of _alt to avoid losing source data on import
  * _alt only preserves one of several alternative fields, while _merge concatenates them
  * Important: _alt is still n...
- 10:51 PM Revision 4006: mappings/DwC2-VegBIEN.specimens.csv: occurrenceID: Added VegCSV mappings for it
- 10:44 PM Revision 4005: mappings/DwC2-VegBIEN.specimens.csv: mappings to /location/sourceaccessioncode: Added _alt to prioritize them properly
- 10:39 PM Revision 4004: inputs/UNCC/maps/DwC.specimens.csv: herbarium: Fixed mapping to go to institutionCode instead of collectionCode
- 10:36 PM Revision 4003: mappings/DwC2-VegBIEN.specimens.csv: Remapped institutionCode/collectionCode/catalogNumber location mappings to location.authorlocationcode
- 09:50 PM Revision 4002: schemas/vegbien.ERD.mwb: Reset methodtaxonclass lines so that only one needs to be repositioned after syncing with the schema
- 09:31 PM Revision 4001: mappings/VegCSV-VegBIEN.specimens.csv: locationID: Removed mapping to locationevent.sourceaccessioncode, because locationID relates to the plot, not the plot event. (The locationevent is scoped by the location when the sourceaccessioncode and authoreventcode are not specified, so duplicate elimination will still occur correctly.)
- 09:27 PM Revision 4000: mappings/DwC2-VegBIEN.specimens.csv: Mapped locationID, for mergability with VegCSV
- 09:04 PM Revision 3999: mappings/VegCSV-VegBIEN.specimens.csv: plotName: Removed authoreventcode mapping because plotName relates to the plot, not the plot event. (The locationevent is scoped by the location when the authoreventcode is not specified, so duplicate elimination will still occur correctly.) Instead map only authoreventcode-related fields (currently CVS's authorObsCode) to authoreventcode, via DwC's (confusingly-named) fieldNumber ("An identifier given to the event in the field").
- 08:40 PM Revision 3998: schemas/vegbien.sql: locationevent: locationevent_unique_within_location: Added authoreventcode to index. It was already in the locationevent_unique_within_*parent*_by_authoreventcode index, but also needed to be in the no-parent (non-subplot) index. This fixes locationevent duplicate elimination when a locationevent sourceaccessioncode is not specified.
- 08:27 PM Revision 3997: schemas/vegbien.sql: location: location_unique_within_datasource unique index: Added COALESCE() and `WHERE sourceaccessioncode IS NOT NULL` now that sourceaccessioncode is nullable. Renamed location_unique_within_datasource and location_unique_authorlocationcode to location_unique_within_datasource_by_... to show that both are alternatives for globally unique keys. schemas/vegbien.ERD.mwb: Moved elements slightly to reduce the number of lines that need to be repositioned after syncing with the schema.
- 07:35 PM Revision 3996: mappings/DwC2-VegBIEN.specimens.csv: Mapped verbatimElevation and samplingProtocol, for mergability with VegCSV
- 07:12 PM Revision 3995: inputs/import.stats.xls: Updated with stats from latest import
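Task #473 and revisions 4010 through 4012 above contrast _alt, which keeps one of several alternative fields, with _merge, which concatenates them. The sketch below illustrates that difference in Python. The function names alt() and merge(), the '; ' delimiter, and first-seen ordering are assumptions; the real implementations are SQL functions in schemas/functions.sql and are not reproduced here.

```python
def alt(values):
    """Keep only the first non-empty alternative; the rest are dropped."""
    for value in values:
        if value not in (None, ''):
            return value
    return None


def merge(values, delimiter='; '):
    """Concatenate the distinct non-empty values in first-seen order.

    Returns None when every input is None or '', mirroring the
    join_strs() behavior described in revision 4010.
    """
    distinct = []
    for value in values:
        if value not in (None, '') and value not in distinct:
            distinct.append(value)
    return delimiter.join(distinct) if distinct else None
```

When the colliding fields hold the same value, both functions return the same result, so merging only preserves extra data when the values actually differ.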