# Date Author Comment
4632 09/12/2012 11:31 AM Aaron Marcuse-Kubitza

inputs/XAL/Specimen/src.csv, map.csv: Switched from using root prefixes to full column names, because the namespace mapping functionality can be handled much better by treating each namespace-qualified term as its own term rather than as a term and a prefix

4631 09/12/2012 11:22 AM Aaron Marcuse-Kubitza

inputs/SpeciesLink/Specimen/src.csv, map.csv: Switched from using root prefixes to full column names, because the namespace mapping functionality can be handled much better by treating each namespace-qualified term as its own term rather than as a term and a prefix

4630 09/12/2012 11:02 AM Aaron Marcuse-Kubitza

inputs/SpeciesLink/Specimen/map.csv: Removed no longer needed duplicate entries for each first letter case, which cause duplicate output mappings now that join is case- and punctuation-insensitive. Note that the `svn diff` hides _alt entry 0, which contains one of the removed duplicate columns that appears in the diff.
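
Why the per-case duplicates became redundant can be sketched as follows, assuming the join compares terms by a normalized key. The `norm_key` helper and its exact normalization rule are hypothetical, not the join script's actual code.

```python
import re

def norm_key(term):
    """Hypothetical join key: lowercase, then drop non-alphanumeric characters."""
    return re.sub(r'[^0-9a-z]', '', term.lower())

def dedup_by_key(terms):
    """Keep the first term seen for each normalized key."""
    seen = {}
    for term in terms:
        seen.setdefault(norm_key(term), term)
    return list(seen.values())

# Entries that differ only in case or punctuation collapse to one key,
# so keeping all of them would produce duplicate output mappings.
entries = ['collectorNumber', 'CollectorNumber', 'Collector_Number']
print(dedup_by_key(entries))
```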

4629 09/12/2012 10:27 AM Aaron Marcuse-Kubitza

inputs/SpeciesLink/Specimen/src.csv, inputs/XAL/Specimen/src.csv: Added Comments column for consistency with autogenerated src.csv format

4628 09/12/2012 10:14 AM Aaron Marcuse-Kubitza

join: Added new passthru mode which passes through terms with no input mapping or no join mapping

4627 09/12/2012 09:25 AM Aaron Marcuse-Kubitza

inputs/: Added [Veg+] to via map roots to indicate that the datasource and Veg+ vocabularies are combinable. This is possible now that automapped entries are no longer subtracted when this is in the map root, so there is no concern of losing comments on subtracted automapped rows. Note that this change turns on old-style automapping for these datasources, causing SALVIAS plotMetadata to acquire additional mappings.

4626 09/12/2012 08:59 AM Aaron Marcuse-Kubitza

canon, translate, filter_out_ci: Support vocabularies/dictionaries that have extra columns beyond the functional column(s) used by the program. These columns can contain comments, etc. This was not originally supported because Python 2's iterable unpacking only supports "an iterable with the same number of items as there are targets in the target list" (http://docs.python.org/reference/simple_stmts.html#assignment-statements). We now use numeric array indexes instead to get around this limitation, and for consistency with other map-manipulation scripts.
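
The difference between the two access styles can be illustrated with a minimal sketch. The file contents, column names, and the `load_mapping` helper are hypothetical, and the sketch is written for Python 3, whereas the scripts in this commit targeted Python 2.

```python
import csv
import io

# Hypothetical vocabulary file: two functional columns plus a Comments column
vocab_csv = "from,to,Comments\nCollNum,collectorNumber,legacy name\n"

def load_mapping(stream, from_idx=0, to_idx=1):
    """Read a two-column mapping by index, ignoring any additional columns."""
    reader = csv.reader(stream)
    next(reader)  # skip the header row
    return {row[from_idx]: row[to_idx] for row in reader}

mapping = load_mapping(io.StringIO(vocab_csv))

# By contrast, fixed-length unpacking fails on a three-column row
try:
    from_, to = ['CollNum', 'collectorNumber', 'legacy name']
    unpack_failed = False
except ValueError:
    unpack_failed = True
```

Indexing by position means a row can grow extra columns without breaking the reader, which is the property the commit relies on.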

4625 09/12/2012 08:21 AM Aaron Marcuse-Kubitza

Removed no longer used subtract (use filter_out_ci instead)

4624 09/12/2012 08:19 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: %/.map.csv.last_cleanup: Removed no longer needed subtraction of automapped entries, because information about unmapped and new terms is now available in unmapped_terms.csv and new_terms.csv

4623 09/12/2012 08:13 AM Aaron Marcuse-Kubitza

README.TXT: Data import: `make backups/download`: Removed '&' because running the command in the background prevents rsync from providing a continuously updating progress indication (because a backgrounded process's stdout is not a TTY)

4622 09/12/2012 08:04 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Removed no longer needed /_simplifyPath:[next=parent_id]/path expressions in specific paths because parent_id forwarding is now set globally for all paths in the map root

4621 09/12/2012 07:56 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Added /_simplifyPath:[next=parent_id]/path to root so the returned subplot location will be its parent location if there is no subplot name or ID (indicating that that particular plot did not have subplots). Note that this also causes the parent_id forwarding effect to occur for all other tables containing parent_id, which will help prevent similar issues with subplot events, etc. This will hopefully fix the SALVIAS.plotObservations bug where some organisms did not have a subplot #, causing the subplot location to become NULL and causing the corresponding locationevent rows not to match the locationevent_unique_within_location index filter condition (which requires a parent_id), which caused multiple output table pkeys to be returned for those rows, violating the locationevent_pkeys temp table's primary key.

4620 09/12/2012 07:25 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: namedplace elements: _simplifyPath() calls: Removed no longer needed `require` arg, and removed no longer needed table suffix from `next` arg

4619 09/12/2012 07:02 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

4618 09/11/2012 11:04 AM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: $(newTerms): Fixed bug where tail with positive offset needs -n flag

4617 09/11/2012 11:01 AM Aaron Marcuse-Kubitza

Regenerated/modified inputs/*/*/src.csv to use the self-mapping format used by the new automapping mechanism

4616 09/11/2012 10:50 AM Aaron Marcuse-Kubitza

src_map: Map source columns to themselves so that src.csv can be used directly with the new automapping mechanism

4615 09/11/2012 10:48 AM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: %/new_terms.csv: Remove terms which are also in %/unmapped_terms.csv, because terms are not considered new (i.e. potential Veg+ terms) until they have been mapped to an existing Veg+ term. Being unmapped has a higher priority than being new, because it affects the current datasource itself rather than the easier mapping of future datasources.
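
The priority rule amounts to a set difference, sketched below under the assumption of exact string matching. The `remove_unmapped` helper is hypothetical; the Makefile's actual filtering command may compare terms differently.

```python
def remove_unmapped(new_terms, unmapped_terms):
    """Return new_terms without any term also in unmapped_terms, preserving order."""
    unmapped = set(unmapped_terms)
    return [t for t in new_terms if t not in unmapped]

# A term listed as both new and unmapped is reported only as unmapped
print(remove_unmapped(['plotName', 'siteCode', 'stemTag'], ['siteCode']))
```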

4614 09/11/2012 10:22 AM Aaron Marcuse-Kubitza

lib/mappings.Makefile: missing_mappings: Display unmapped_terms.csv, new_terms.csv after generating them, to preserve the behavior of the original missing_mappings

4613 09/11/2012 10:17 AM Aaron Marcuse-Kubitza

root Makefile: Maps validation: Removed no longer used $(missingMappingsCmd)

4612 09/11/2012 10:17 AM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: Removed no longer used $(missingMappingsCmd)

4611 09/11/2012 10:16 AM Aaron Marcuse-Kubitza

lib/mappings.Makefile: Removed no longer needed missing_%_mappings targets, since unmapped_terms.csv and new_terms.csv now serve the same purpose in a more efficient way

4610 09/11/2012 10:14 AM Aaron Marcuse-Kubitza

lib/mappings.Makefile: `ifndef` for $(termsSubdirs): Fixed bug where the variable being checked needed to be termsSubdirs instead of missingMappingsCmd

4609 09/11/2012 10:02 AM Aaron Marcuse-Kubitza

lib/mappings.Makefile: Require $(termsSubdirs)

4608 09/11/2012 10:00 AM Aaron Marcuse-Kubitza

Generated global unmapped_terms.csv, new_terms.csv

4607 09/11/2012 10:00 AM Aaron Marcuse-Kubitza

root Makefile: Maps validation: Added $(termsSubdirs) to enable generation of global unmapped_terms.csv, new_terms.csv

4606 09/11/2012 09:59 AM Aaron Marcuse-Kubitza

inputs/: Generated combined unmapped_terms.csv, new_terms.csv for all inputs

4605 09/11/2012 09:58 AM Aaron Marcuse-Kubitza

lib/mappings.Makefile: $(catTerms): Fixed bug where only existing $+ files (using $(+w)) could be included in the list (both to check and to use), because otherwise cat would raise an error or try to read stdin
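
The underlying issue is that `cat` fails on a nonexistent file and reads stdin when given no files at all. A rough Python analogue of filtering the list down to existing files first, with a hypothetical `cat_existing` helper and temporary paths, could look like this:

```python
import os
import tempfile

def cat_existing(paths):
    """Concatenate the contents of the paths that exist; skip the rest."""
    existing = [p for p in paths if os.path.exists(p)]
    chunks = []
    for path in existing:
        with open(path) as f:
            chunks.append(f.read())
    return ''.join(chunks)

with tempfile.TemporaryDirectory() as d:
    present = os.path.join(d, 'unmapped_terms.csv')
    with open(present, 'w') as f:
        f.write('termA\n')
    missing = os.path.join(d, 'missing', 'unmapped_terms.csv')
    combined = cat_existing([present, missing])  # missing file is skipped
    empty = cat_existing([missing])              # nothing to read, no error
```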

4604 09/11/2012 09:56 AM Aaron Marcuse-Kubitza

Existing maps discovery: Fixed bug where new unmapped_terms.csv, new_terms.csv needed to be included in $(anyMap)

4603 09/11/2012 09:52 AM Aaron Marcuse-Kubitza

lib/common.Makefile: Added $(+w)

4602 09/11/2012 09:22 AM Aaron Marcuse-Kubitza

lib/common.Makefile: Added $(no/) to remove trailing /

4601 09/11/2012 09:18 AM Aaron Marcuse-Kubitza

Extracted %/unmapped_terms.csv, %/new_terms.csv as separate targets in the Maps validation section so they can be invoked even when %/.map.csv.last_cleanup is not a top-level target (in $(MAKECMDGOALS)). Continue to invoke them in %/.map.csv.last_cleanup by using $(selfMake).

4600 09/11/2012 08:56 AM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: Set $(termsSubdirs) to enable unmapped_terms.csv, new_terms.csv generation

4599 09/11/2012 08:56 AM Aaron Marcuse-Kubitza

lib/mappings.Makefile: Added unmapped_terms.csv, new_terms.csv which are generated by combining the correspondingly-named files in $(termsSubdirs)

4598 09/11/2012 08:42 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: %/.map.csv.last_cleanup: $(newTerms): Autoremove empty terms lists to avoid clutter

4597 09/11/2012 08:40 AM Aaron Marcuse-Kubitza

Added autoremove

4596 09/11/2012 08:22 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: %/.map.csv.last_cleanup: $(newTerms): Remove the CSV header from the terms lists so that multiple terms lists can easily be appended together
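
A small sketch of why headerless lists are easier to combine: once the header row is gone, plain concatenation gives a valid combined list. The `term` header name and the `terms_without_header` helper are assumptions for illustration.

```python
import csv
import io

def terms_without_header(csv_text):
    """Return the data rows of a CSV as text, dropping the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    csv.writer(out, lineterminator='\n').writerows(rows[1:])
    return out.getvalue()

list_a = terms_without_header("term\nplotName\n")
list_b = terms_without_header("term\nsubplotID\n")
combined = list_a + list_b  # no repeated header lines to strip out afterward
```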

4595 09/11/2012 08:16 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: %/.map.csv.last_cleanup: unmapped_terms.csv, new_terms.csv: Factored out commands into $(newTerms)

4594 09/11/2012 08:09 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: %/.map.csv.last_cleanup: Generate reports on new and unmapped terms in map.csv

4593 09/11/2012 08:07 AM Aaron Marcuse-Kubitza

Added filter_out_ci

4592 09/11/2012 07:26 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: %/.map.csv.last_cleanup: Translate map.csv using $(mappings)/$(via)-VegCore.csv

4591 09/11/2012 07:25 AM Aaron Marcuse-Kubitza

Added translate

4590 09/11/2012 07:08 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Removed no longer used Comments column. Use mappings/Veg+.terms.csv to cite term definitions instead.

4589 09/11/2012 07:06 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: previousCatalogNumber: Removed no longer needed "According to" comment, because this is now documented in the mappings/Veg+.terms.csv entry. Note that the citation for any mapping is the overlap of the terms' definitions, and thus only the definitions need to be cited, not the mapping itself. (The definitions are provided in the links in mappings/Veg+.terms.csv.)

4588 09/11/2012 07:01 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: previousCatalogNumber: Added Source link to DwC history entry, which documents the definition of this term

4587 09/11/2012 06:43 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: %/.map.csv.last_cleanup: Canonicalize map.csv using $(mappings)/$(via).vocab.csv

4586 09/11/2012 06:40 AM Aaron Marcuse-Kubitza

Added canon

4585 09/11/2012 06:29 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.

4584 09/11/2012 05:49 AM Aaron Marcuse-Kubitza

Added mappings/Veg+.vocab.csv

4583 09/11/2012 04:41 AM Aaron Marcuse-Kubitza

inputs/GBIF/Specimen/map.csv: Remapped Original fields to new verbatim taxonomic terms

4582 09/11/2012 04:31 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.

4581 09/11/2012 04:23 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added min/max SlopeAspect/SlopeGradient

4580 09/11/2012 04:13 AM Aaron Marcuse-Kubitza

inputs/VegBank/plot_/map.csv: Omit reallatitude/reallongitude because private data should not be placed in a public database

4579 09/11/2012 04:10 AM Aaron Marcuse-Kubitza

inputs/CVS/Organism/map.csv: Omit realLatitude/realLongitude because private data should not be placed in a public database. Keeping VegBIEN free of restricted-access data allows anyone to run arbitrary queries on the database, without needing an entire security mechanism/front end just to manage users' read-only access to the data (as VegBank has). Note that the private coordinates are still accessible in the staging tables, so they will need to be locked down in order to make VegBIEN secure to public access.

4578 09/11/2012 03:16 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Remapped QuadratID to subplotID because the standard definition of an ID term is an ID that's unique within the datasource, and it's just CTFS's usage that makes it unique only within the plot

4577 09/11/2012 03:13 AM Aaron Marcuse-Kubitza

inputs/CTFS/StemObservation/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID

4576 09/11/2012 03:09 AM Aaron Marcuse-Kubitza

inputs/CTFS/SubplotObservation/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID

4575 09/11/2012 03:06 AM Aaron Marcuse-Kubitza

inputs/CTFS/Subplot/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID. Omit QuadratName because QuadratID is used for the same purpose.

4574 09/11/2012 02:57 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Removed recordNumber/_alt and recordNumber redirection mappings so that Veg+-VegCore.csv contains only renamings, not business logic. Note that removing the global ordering of these fields does not affect the datasources which contain multiple recordNumber synonyms because they either have a custom ordering or one field is duplicated or unused.

4573 09/11/2012 02:49 AM Aaron Marcuse-Kubitza

inputs/NY/Specimen/map.csv: Omit CollectorNumber because it is not used, so it does not need to be mapped

4572 09/11/2012 02:45 AM Aaron Marcuse-Kubitza

inputs/ARIZ/Specimen/map.csv: Omit FieldNumber because it is identical to CollectorNumber, so it does not need to be mapped

4571 09/11/2012 02:19 AM Aaron Marcuse-Kubitza

inputs/SpeciesLink/Specimen/map.csv: Added manual CollectorNumber mapping which places it after recordNumber/fieldNumber, so that mappings/Veg+-VegCore.csv doesn't need to maintain a global ordering between these fields and just needs to indicate their equivalency

4570 09/11/2012 02:09 AM Aaron Marcuse-Kubitza

mappings/: Removed no longer needed Veg+-VegCore.to_self.csv, because multiple levels of mappings are no longer needed to get to the VegCore term

4569 09/11/2012 02:07 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: DescriptionOfSite: Mapped directly to locality rather than to locationNarrative to avoid needing multiple levels of mappings to get to the VegCore term

4568 09/11/2012 01:56 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Removed scientificNameAuthorship/_alt and scientificNameAuthorship redirection mappings, which were only used by SpeciesLink, which now has the necessary _alts in its own map.csv

4567 09/11/2012 01:48 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Removed dateCollected/_alt and dateCollected redirection mappings, which were only needed when multiple dateCollected fields were being combined in Veg+-VegCore.csv

4566 09/11/2012 01:45 AM Aaron Marcuse-Kubitza

mappings/: Moved year/month/dayCollected mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayCollected values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateCollected field is now higher than the year/month/dayCollected fields when both are specified, because the dateCollected field presumably contains verbatim text while the year/month/dayCollected fields contain parsed date parts.

4565 09/11/2012 01:32 AM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV/Organism/map.csv: Remapped census_date to eventDate, since it is not the start of a range

4564 09/11/2012 01:31 AM Aaron Marcuse-Kubitza

inputs/Madidi/Plot/map.csv: Remapped First evaluation to eventDate, since it is not necessarily the start of a range

4563 09/11/2012 01:23 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: startDate, endDate mappings: Removed _dateRangeStart/_dateRangeEnd filters because these are assumed to already be start and end dates of a range. (eventDate should be used for concatenated date ranges.)

4562 09/11/2012 01:09 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Don't map dateCollected to locationevent.obsstartdate/obsenddate because this is the date the specimen was collected, not the date (range) of the entire collection event. This distinction may not be meaningful for specimens data, but VegBIEN should reflect what the data provider designated. This also reduces the number of dateCollected-related mappings needed for any dateCollected-related field, such as year/month/dayCollected.

4561 09/11/2012 12:55 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Removed dateIdentified/_alt and dateIdentified redirection mappings, which were only needed when multiple dateIdentified fields were being combined in Veg+-VegCore.csv

4560 09/11/2012 12:50 AM Aaron Marcuse-Kubitza

mappings/: Moved year/month/dayIdentified mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayIdentified values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateIdentified field is now higher than the year/month/dayIdentified fields when both are specified, because the dateIdentified field presumably contains verbatim text while the year/month/dayIdentified fields contain parsed date parts.

4559 09/11/2012 12:34 AM Aaron Marcuse-Kubitza

mappings/: Moved verbatimGrowthForm filter mapping from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic

4558 09/11/2012 12:28 AM Aaron Marcuse-Kubitza

inputs/UNCC/Specimen/map.csv, inputs/NCU-NCSC/Specimen/map.csv: Remapped cultivated fields directly via new cultivated term, rather than via establishmentMeans

4557 09/11/2012 12:06 AM Aaron Marcuse-Kubitza

sql_io.py: mk_errors_table(): Don't cache the sql.table_exists() query, because the table will be created and its existence must be rechecked

4556 09/11/2012 12:02 AM Aaron Marcuse-Kubitza

sql.py: table_exists(): Allow caller to set whether query will be cached. This is useful if the table will later be created and its existence should be checked again.
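
The stale-cache problem this avoids can be sketched with a toy query cache. The `QueryCache` class, its `cacheable` parameter name, and the simulated database are hypothetical and do not reflect the actual sql.py API.

```python
class QueryCache:
    """Hypothetical stand-in for a DB wrapper that caches query results."""
    def __init__(self, run_query):
        self.run_query = run_query  # callable that actually hits the database
        self.cache = {}

    def query(self, sql, cacheable=True):
        if cacheable and sql in self.cache:
            return self.cache[sql]
        result = self.run_query(sql)
        if cacheable:
            self.cache[sql] = result
        return result

# Simulated database: a set of table names; the "query" is an existence check
tables = set()
db = QueryCache(lambda name: name in tables)

stale_before = db.query('errors', cacheable=True)   # result is cached
fresh_before = db.query('errors', cacheable=False)  # result is not cached
tables.add('errors')                                 # the table is created
stale_after = db.query('errors', cacheable=True)    # returns the cached answer
fresh_after = db.query('errors', cacheable=False)   # rechecks the database
```

With caching left on, the existence check keeps reporting the pre-creation state, which is why a caller that is about to create the table needs to opt out.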

4555 09/11/2012 12:00 AM Aaron Marcuse-Kubitza

sql.py: tables(): Allow caller to set whether query will be cached

4554 09/10/2012 11:51 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped cultivated

4553 09/10/2012 11:47 PM Aaron Marcuse-Kubitza

inputs/TEAM/: Added _src/README.TXT with Brad's comments on which files to use

4552 09/10/2012 11:01 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added cultivated

4551 09/10/2012 10:35 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Removed manual VACUUM run because this is done as part of $(exportHeader), which calls $(cleanup)

4550 09/10/2012 10:34 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(cleanup): Append output to log

4549 09/10/2012 10:21 PM Aaron Marcuse-Kubitza

schemas/py_functions.sql: Added pass-through _date(timestamp) for datasource date columns that are already timestamps

4548 09/10/2012 10:12 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Fixed bug where embedded \ in ADD COLUMN statement was not removed by the shell, because single quotes do not remove embedded \s
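
The quoting rule itself can be illustrated with Python's `shlex` module, which follows POSIX shell quoting in its default mode. This is only a stand-in for the shell, and the sample string is hypothetical rather than the actual statement from the Makefile.

```python
import shlex

# Unquoted: the backslash escapes the next character and is removed
unquoted = shlex.split(r'ADD\ COLUMN')

# Single-quoted: the backslash is kept literally
single_quoted = shlex.split(r"'ADD\ COLUMN'")

print(unquoted, single_quoted)
```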

4547 09/10/2012 09:55 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: Also rename taxonobservation.reference_id to taxonobservation_reference_id

4546 09/10/2012 09:51 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(logInstall*Add): Fixed bug where the -a flag for tee needed to be added only when tee was actually being used (in verbose mode), not when &> is used instead

4545 09/10/2012 09:49 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/header.csv: Updated for new renames in vegbank.~.clean_up.sql

4544 09/10/2012 09:34 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: `%/install: %/create.sql`: Also log the output of commands run after create.sql

4543 09/10/2012 09:30 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Factored $(call logInstall,$*/) out into $(logInstall*)

4542 09/10/2012 09:25 PM Aaron Marcuse-Kubitza

schemas/py_functions.sql: Added pass-through _dateRangeStart(timestamp), _dateRangeEnd(timestamp) for datasource date columns that are already timestamps

4541 09/10/2012 09:23 PM Aaron Marcuse-Kubitza

inputs/VegBank/plantconcept_/header.csv: Updated for new renames in vegbank.~.clean_up.sql

4540 09/10/2012 09:11 PM Aaron Marcuse-Kubitza

inputs/VegBank/plantconcept_/create.sql: Use new plantconcept_plantnames()

4539 09/10/2012 09:09 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.utils.sql: plantconcept_plantnames(): Use SQL SELECT query and WITH clause (http://www.postgresql.org/docs/8.4/static/queries-with.html) instead of temp table, because PostgreSQL does not support using temp tables inside functions that are called repeatedly (http://archives.postgresql.org/pgsql-general/2006-02/msg00516.php; it results in an "out of shared memory" error)

4538 09/10/2012 08:30 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.utils.sql: Removed hardcoded schema name, which is set dynamically by input.Makefile using `SET search_path`

4537 09/10/2012 08:26 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.utils.sql: Added plantconcept_plantnames()

4536 09/10/2012 07:28 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.utils.sql: plantconcept_ancestors(): Made function STABLE instead of IMMUTABLE because it accesses DB tables

4535 09/10/2012 07:21 PM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: Fixed bug where the original plantconcept table's columns needed to be renamed, rather than the derived table plantconcept_'s. Note that this script runs before any derived tables are created, so this would be the wrong place for these statements if the derived table's columns did need to be renamed.

4534 09/10/2012 07:05 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(dbExports): Sort each group of .sql files in lexical order, since $(wildcard) apparently does not sort them that way automatically on vegbiendev
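
The same precaution applies in other tools whose file globbing does not guarantee an order. As a rough Python analogue (Python's `glob` documents its results as arbitrarily ordered), with a hypothetical `sorted_sql_files` helper and temporary files:

```python
import glob
import os
import tempfile

def sorted_sql_files(directory):
    """List *.sql files in lexical order regardless of the order glob returns."""
    return sorted(glob.glob(os.path.join(directory, '*.sql')))

with tempfile.TemporaryDirectory() as d:
    for name in ['b.sql', 'a.sql', 'c.txt']:
        open(os.path.join(d, name), 'w').close()
    names = [os.path.basename(p) for p in sorted_sql_files(d)]
```

Sorting explicitly makes the run order independent of the platform or tool version.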

4533 09/10/2012 06:53 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import. Corrected input row count of CTFS.TaxonOccurrence, which had been set to the inserted row count (which is right above it in the log file).