Activity
From 08/29/2012 to 09/27/2012
09/27/2012
- 11:28 AM Revision 5081: import_all: Start the tnrs daemon using `make inputs/.TNRS/tnrs/tnrs-remake &`
- 11:25 AM Revision 5080: Added inputs/.TNRS/tnrs/tnrs.make to run tnrs_db on VegBIEN
- 11:25 AM Revision 5079: Added tnrs_db to scrub the taxonpaths in VegBIEN using TNRS
- 11:19 AM Revision 5078: Regenerated vegbien.ERD exports
- 11:17 AM Revision 5077: schemas/vegbien.sql: taxonpath: Made it datasource-general and uniquely identified only by its taxonomicnamewithauthor so that the taxonpaths imported by the TNRS datasource will be matched and used directly when the other datasources are imported
- 11:10 AM Revision 5076: schemas/vegbien.sql: taxonpath: taxonpath_unique_within_datasource_by_name unique index: Just do duplicate elimination on the taxonomicnamewithauthor, since that is now a required field and is generated by concatenating all the other fields. Note that the inserted row counts change slightly because the concatenation makes some names equal that are split among the fields differently, such as when the genus is included in the species field.
- 10:51 AM Revision 5075: db_xml.py: put(): Added _alt optimization that just returns the first arg if it's non-NULL
- 10:49 AM Revision 5074: sql_gen.py: Added is_nullable()
- 10:49 AM Revision 5073: schemas/vegbien.sql: taxonpath.taxonomicnamewithauthor: Made it NOT NULL, so that all taxonpaths would have a concatenated name to feed to TNRS
- 10:37 AM Revision 5072: mappings/VegCore-VegBIEN.csv: taxonomic terms: Changed _first to _alt because some datasources have NULL values in scientificNameWithAuthorship or scientificName, so it can't just be used in place of the joined-together taxonomic ranks
- 10:19 AM Revision 5071: db_xml.py: put(): Parse input columns and process values in separate loops, so that structural XML function optimization code can be inserted between them
- 10:12 AM Revision 5070: sql_io.py: put_table(): Removed comment claiming support for in_tables of any fixed-size iterable type, because the iterable must be ordered so that the first table can be treated specially
- 10:09 AM Revision 5069: sql_io.py: put_table(): Support in_tables of any fixed-size iterable type
- 09:13 AM Revision 5068: mappings/Veg+-VegCore.csv: cationExchangeCapacity->cationExchangeCapacity_cmol_kg mapping: Removed ? prefix because a mapping to only one set of units is unambiguous (if additional units for cationExchangeCapacity are found, this will become an ambiguous mapping). Note that canon automatically removes punctuation from VegCore terms, so this mapping would previously have had the ? prefix autoremoved anyway (both in inputs/*/*/map.csv and recently also in Veg+-VegCore.csv).
- 09:06 AM Revision 5067: mappings/Makefile: .Veg+-VegCore.csv.last_cleanup: Translate VegCore terms using itself so that any mapping to another Veg+ term automatically becomes a mapping to a VegCore term. .VegX-VegCore.csv.last_cleanup: Translate VegCore terms using Veg+-VegCore.csv to keep the terms up to date.
- 09:04 AM Revision 5066: mappings/VegX-VegCore.csv: Translated VegCore terms using Veg+-VegCore.csv
- 09:00 AM Revision 5065: mappings/Makefile: .VegCore.csv.last_cleanup, .VegCore-VegBIEN.csv.last_cleanup: Apply Veg+-VegCore.csv so that terms can easily be renamed just by adding a mapping in Veg+-VegCore.csv, which will auto-translate all places that use the term. .VegCore-VegBIEN.csv.last_cleanup: Canonicalize to VegCore.csv so case changes in VegCore terms will automatically propagate to VegCore-VegBIEN.csv.
- 08:46 AM Revision 5064: mappings/VegCore-VegBIEN.csv: Mapped verbatimScientificNameWithAuthorship, so that it links a verbatim taxonpath to the scrubbed taxonpath created from the primary taxonomic terms
- 08:36 AM Revision 5063: mappings/VegCore.csv: Renamed unscrubbedScientificNameWithAuthorship to the more standard verbatimScientificNameWithAuthorship, which is available now that the original taxondetermination terms use the original* prefix
- 08:31 AM Revision 5062: mappings/VegCore.csv: Renamed verbatim* taxonomic terms to original* because in most datasources, they are in fact for the *original* taxon determination of the organism (which can be a completely different name than the primary determination), rather than merely unscrubbed versions of the primary taxonomic name elements. Note that SALVIAS's orig_* terms do appear to be merely unscrubbed versions, but it's not a problem to add an additional taxon determination for them.
- 08:14 AM Revision 5061: sql.py: pkey(): Get the table's actual primary key column, rather than just using the first column in the table. Continue to return the first column in the table if the table has no primary key.
- 07:31 AM Revision 5060: inputs/.TNRS/tnrs/postprocess.sql: Use :table var instead of hardcoding the table name
- 07:30 AM Revision 5059: inputs/.TNRS/tnrs/postprocess.sql: Also add a primary key on Name_submitted, to prevent duplicate entries
- 07:27 AM Revision 5058: inputs/.TNRS/tnrs/: Added postprocess.sql which makes Name_submitted NOT NULL
- 07:25 AM Revision 5057: sql.py: insert(): ignore mode: Also ignore NullValueException
- 07:24 AM Revision 5056: input.Makefile: Staging tables installation: %/install: Support custom postprocess.sql which specifies commands to run after the table is imported
- 07:10 AM Revision 5055: import_all: Added import of .TNRS datasource, which happens synchronously before other datasources are imported
- 07:08 AM Revision 5054: Moved tnrs table from public (schemas/vegbien.sql) to its own TNRS schema, which is created by a new .TNRS datasource. Note that .TNRS is included in the automated testing, but not yet in the import.
- 06:57 AM Revision 5053: mappings/VegCore-VegBIEN.csv: Restored subplotID -> if subplot cond mapping, which had been overwritten
- 06:46 AM Revision 5052: inputs/ACAD/Specimen/map.csv: Remapped scientificName to scientificNameWithAuthorship
- 06:06 AM Revision 5051: sql_io.py: append_csv(): Using INSERT: Use ignore mode to support inserting rows into a table with a unique constraint
- 06:05 AM Revision 5050: sql.py: insert(): Added ignore optimization that just suppresses any DuplicateKeyException on the client side, to avoid needing to create a wrapper function just to insert-ignore one row
- 05:23 AM Revision 5049: mappings/VegCore-VegBIEN.csv: Synchronized verbatim* and non-verbatim taxonomic terms' mappings
- 05:08 AM Revision 5048: mappings/VegCore.csv: Added special term unscrubbedScientificNameWithAuthorship
- 05:05 AM Revision 5047: mappings/VegCore.csv: Added verbatimSubspecies, verbatimVariety, verbatimForma, verbatimCultivar (already mapped in VegCore-VegBIEN.csv)
- 05:04 AM Revision 5046: mappings/Makefile: .VegCore.csv.last_cleanup: Also remake VegCore-VegBIEN.unsourced_terms.csv here, not just in .VegCore-VegBIEN.csv.last_cleanup, so that the unsourced_terms.csv will be remade if the user adds the missing sources to VegCore.csv
- 05:03 AM Revision 5045: mappings/Makefile: VegCore-VegBIEN.unsourced_terms.csv: Factored remake code into its own make target
- 04:51 AM Revision 5044: mappings/VegCore-VegBIEN.csv: verbatim* taxonomic terms: Added taxonomicnamewithauthor mappings analogous to those for the non-verbatim taxonomic terms
- 04:29 AM Revision 5043: mappings/VegCore.csv: Added verbatimScientificNameWithAuthorship
- 03:50 AM Revision 5042: Added inputs/.public/, which stores mappings that manipulate VegBIEN itself
- 03:49 AM Revision 5041: forwarding.Makefile: Differentiate between subdirs which can be sent a command and subdirs which will receive a command broadcast to "all" subdirs
- 03:39 AM Revision 5040: README.TXT: Data import: Starting column-based import: Use import_all, which now supports passing custom vars like by_col=1
- 03:37 AM Revision 5039: import_all: Pass any args, such as vars, through to with_all
- 03:35 AM Revision 5038: with_all: Support additional command-line args for the make target, such as vars
- 03:11 AM Revision 5037: sql_io.py: append_csv(): Check that the CSV's header matches the table's columns
- 03:08 AM Revision 5036: schemas/vegbien.sql: Added tnrs table to hold contents of TNRS response
- 02:20 AM Revision 5035: input.Makefile: Existing maps discovery: $(anyMap): Inlined patterns used because they are only used here
- 01:27 AM Revision 5034: schemas/vegbien.sql: taxonpath_canon_taxonpath_id_self_ref(), placepath_canon_placepath_id_self_ref(): Fixed bug where the pkey could only be prepopulated if it was not already set, in order to support UPDATE as well as INSERT statements
- 01:15 AM Revision 5033: schemas/vegbien.sql: taxonpath.canon_taxonpath_id, placepath.canon_placepath_id: Fixed comment describing that the special value 0 creates an automatic self-reference
- 01:09 AM Revision 5032: schemas/vegbien.sql: taxonpath.canon_taxonpath_id, placepath.canon_placepath_id: Added trigger to automatically create a self-reference (indicating a scrubbed name) when set to the special value 0
- 12:33 AM Revision 5031: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Don't add a row number column to the created table because it is now added automatically to the temp table by column-based import (row-based import now also does not require a pkey for DB inputs)
- 12:28 AM Revision 5030: bin/map, db_xml.put_table() (row-based and column-based import): Don't sort the input table by its pkey, in order to support input tables with no pkey. Note that reading the input table in table order and having this match the input flat file's order is only possible with sql_io.import_csv()'s truncation of the table on a failed import, which ensures that the rows will be stored in inserted order.
- 12:19 AM Revision 5029: input.Makefile: Staging tables installation: Removed no longer used $(isJoinedTable). Note that it is no longer necessary for joined tables to be suffixed with ".src" to prevent the creation of a row_num column, which collided during joins.
- 12:17 AM Revision 5028: csv2db: Removed no longer used has_row_num param
- 12:14 AM Revision 5027: sql_io.py: import_csv(): Don't add a row number column to the created table because it is now added automatically to the temp table by column-based import (row-based import now also does not require a pkey for DB inputs)
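Note on the insert() ignore mode (Revisions 5050 and 5057): a minimal sketch of client-side insert-ignore, not the actual sql.py code. It assumes SQLite via Python's sqlite3 module in place of the project's database layer, where sqlite3.IntegrityError stands in for both DuplicateKeyException and NullValueException. The insert_ignore() helper and the one-column tnrs table are hypothetical.

```python
import sqlite3

def insert_ignore(conn, table, row):
    """Insert row (a dict of column -> value) into table.

    Returns True if the row was inserted, False if a UNIQUE/PRIMARY KEY or
    NOT NULL constraint rejected it. Identifiers are interpolated unescaped,
    so this is only suitable for trusted table and column names.
    """
    cols = ', '.join('"%s"' % col for col in row)
    params = ', '.join('?' for _ in row)
    query = 'INSERT INTO "%s" (%s) VALUES (%s)' % (table, cols, params)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(query, list(row.values()))
        return True
    except sqlite3.IntegrityError:  # duplicate key, or NULL in a NOT NULL column
        return False

# Hypothetical stand-in for the TNRS staging table after postprocess.sql
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE tnrs ("Name_submitted" TEXT NOT NULL PRIMARY KEY)')
for name in ['Quercus alba L.', 'Quercus alba L.', None]:
    insert_ignore(conn, 'tnrs', {'Name_submitted': name})
```

Only the first of the three rows is stored; the duplicate and the NULL are dropped without a wrapper function on the database side.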
09/26/2012
- 11:49 PM Revision 5026: bin/map, db_xml.put_table() (row-based and column-based import): Don't sort the input table by its pkey, in order to support input tables with no pkey. Note that reading the input table in table order and having this match the input flat file's order is only possible with sql_io.import_csv()'s truncation of the table on a failed import, which ensures that the rows will be stored in inserted order.
- 11:34 PM Revision 5025: sql_io.py: import_csv(): Only do the import in a savepoint if using COPY FROM, to allow autocommits after each insert and thus make rows visible immediately after they are inserted
- 10:53 PM Revision 5024: db_xml.py: put_table(): Subsetting in_table: Add a row number column if in_table does not already have a pkey
- 10:48 PM Revision 5023: db_xml.py: put_table(): Subsetting in_table: Copy all of in_table's structure, rather than just the column types, by using sql.copy_table_struct() and sql.insert_select(). This preserves pkeys and NOT NULL constraints, which are useful for column-based import.
- 10:47 PM Revision 5022: db_xml.py: put_table(): Subsetting in_table: Create in_table as a completely new sql_gen.Table instead of copying full_in_table and relying on sql.run_query_into() to set is_temp and remove the schema
- 10:40 PM Revision 5021: sql.py: add_row_num(): Use if_not_exists in order to abort if the column already exists rather than adding a version #
- 10:36 PM Revision 5020: sql.py: add_col(): Added if_not_exists param to abort if the column already exists rather than adding a version #
- 10:14 PM Revision 5019: db_xml.py: put_table(): Removed no longer accurate comment that full_in_table will be shadowed (hidden) by the created temp table. (The temp table is now named differently, so the shadowing does not occur.)
- 10:02 PM Revision 5018: db_xml.py: put_table(): Replaced no longer accurate Recurse comment with Import data. Rewrapped lines.
- 09:12 PM Revision 5017: sql_io.py: import_csv(): Factored insertion code out into new append_csv()
- 08:47 PM Revision 5016: README.TXT: Data import: `make test by_col=1`: Replaced errors explanation with pointer to updated explanation in the Testing section
- 08:31 PM Revision 5015: xml_func.py: Removed no longer used _name(). Use _join_words() instead.
- 08:30 PM Revision 5014: mappings/VegCore-VegBIEN.csv: Use new, more general _join_words() instead of _name()
- 08:22 PM Revision 5013: mappings/Veg+-VegCore.csv: Prefix ambiguous terms' VegCore replacement with "?" so it's visually flagged in map.csv, in the same way that unmatched terms are flagged with a "*" prefix
- 08:19 PM Revision 5012: mappings/VegCore-VegBIEN.csv: Taxonomic terms: Also join terms together in taxonomicnamewithauthor if scientificNameWithAuthorship is not provided, for use by TNRS
- 08:15 PM Revision 5011: xml_func.py: Simplifying functions: Merging: Added _join_words()
- 07:57 PM Revision 5010: inputs/ARIZ/Specimen/map.csv: Remapped ScientificNameAuthor to scientificNameWithAuthorship because it contains the binomial in addition to the authority
- 07:39 PM Revision 5009: schemas/functions.sql: Added _join_words()
- 07:33 PM Revision 5008: input.Makefile: Paths: $(datasrc): Remove any "." prefix from the subdir name. The "." prefix allows a subdir to be hidden from the normal import process.
- 06:56 PM Revision 5007: db_xml.py: put_table(): Allow caller to specify custom partition_size
- 06:45 PM Revision 5006: tnrs.py: tnrs_request(): Return the CSV stream directly instead of reading it into a string
- 06:42 PM Revision 5005: tnrs.py: tnrs_request(): Moved CSV-download-specific functionality from do_request() to the Download section
- 06:34 PM Revision 5004: inputs/import.stats.xls: Updated import times
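Note on _join_words() (Revisions 5009, 5011, 5012): a minimal sketch of how joining taxonomic terms into a single name might behave. The real implementations in xml_func.py and schemas/functions.sql are not shown in this log, so the whitespace handling and the None-for-empty result (chosen to match the rule that functions return NULL in place of the empty string) are assumptions.

```python
def join_words(*words):
    """Join the non-empty words with single spaces.

    Returns None (standing in for SQL NULL) when no non-empty word remains,
    so that an empty string is never produced.
    """
    kept = [w.strip() for w in words if w is not None and w.strip() != '']
    return ' '.join(kept) if kept else None

# Ranks that a datasource leaves NULL are simply skipped
name = join_words('Quercus', None, 'alba', 'L.')
```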
09/25/2012
- 11:13 PM Revision 5003: tnrs.py: tnrs_request(): Return the response instead of printing it to stdout
- 10:59 PM Revision 5002: schemas/py_functions.sql: _namePart(): Fixed bug where it was returning the empty string instead of NULL
- 10:46 PM Revision 5001: sql_io.py: import_csv(): Documented that sql.truncate() MUST be run so that the rows will be stored in inserted order, and the row_num added after import will match up with the CSV's row order
- 10:35 PM Revision 5000: sql.py: add_row_num(): Add distinguishing comment to ADD COLUMN statement so that it will be cached. The distinguishing comment is required because sometimes column names are truncated, leading to unwanted collisions with previously-cached ADD COLUMN statements. It provides a way of distinguishing the full column name behind a particular ADD COLUMN statement.
- 10:24 PM Revision 4999: sql_io.py: import_csv(): Free memory used by deleted rows from any failed import. Documented that sql.create_table() is not rolled back if the import fails, but instead is cached, and will not be re-run if the import is retried.
- 09:37 PM Revision 4998: sql_io.py: import_csv(): Fixed bug where the added row number column needed to be named row_num instead of _row_num to be autodetected as the pkey column (sql.pkey_col) by sql.pkey() and to avoid name collisions with the row number column added in column-based import
- 09:34 PM Revision 4997: sql.py: add_row_num(): Support custom row number column name
- 09:12 PM Revision 4996: csv2db: Use new sql_io.import_csv()
- 09:10 PM Revision 4995: sql_io.py: Added import_csv()
- 09:05 PM Revision 4994: csv2db: Don't truncate the table before loading rows because it has just been created, and is therefore empty. This statement may be left over from a time when the table was created only once, and its creation was not rolled back if the import fails.
- 08:44 PM Revision 4993: sql_io.py: cleanup_table(): Print 'Cleaning up table' log message
- 08:41 PM Revision 4992: sql_io.py: cleanup_table(): Also vacuum and reanalyze table
- 07:43 PM Revision 4991: tnrs_client: Use new tnrs.tnrs_request()
- 07:43 PM Revision 4990: Added tnrs.py
- 07:34 PM Revision 4989: tnrs_client: Factored TNRS request code into separate function tnrs_request()
- 07:23 PM Revision 4988: inputs/VegBank/taxonimportance/map.csv: Documented that taxonimportance is not 1:1 with taxonobservation
- 07:22 PM Revision 4987: mappings/VegCore-VegBIEN.csv: Removed unnecessary /_first/# suffix for multiple terms in the same _exists expression, because _exists() only checks whether its node is non-empty, and it does not matter how many child nodes it contains
- 06:57 PM Revision 4986: schemas/vegbien.sql: taxonoccurrence: taxonoccurrence_unique_within_locationevent unique index: Fixed bug where locationevent_id needed to be enclosed in COALESCE(..., 2147483647) so that the unique constraint also applies to rows with NULL locationevent_ids (there is no other unique constraint handling these rows)
- 06:52 PM Revision 4985: README.TXT: Documented that if the row-based and column-based imports produce different inserted row counts, this usually means that a table is underconstrained (the unique indexes don't cover all possible rows). The inserted row count difference occurs because column-based import collapses empty table rows into one insert, while row-based import performs an insert of the empty row for each input row. Without a unique index to combine multiple row-based inserts, extra rows will be added.
- 06:48 PM Revision 4984: sql_io.py: put_table(): Warn if inserting empty table rows
- 06:13 PM Revision 4983: schemas/py_functions.sql: _namePart(): Fixed bug where it was returning the empty string instead of NULL
- 05:57 PM Revision 4982: schemas/functions.sql, py_functions.sql: Added schema comment that functions must always return NULL in place of the empty string, to ensure that empty strings do not find their way into VegBIEN. Note that row-based import automatically removes empty strings because the intermediate values are stored in XML and our XML DOM traversing code auto-replaces the empty string with NULL. Column-based import, on the other hand, does not, because the intermediate data is stored in database temp tables instead of a DOM tree.
- 05:31 PM Revision 4981: root map: Fixed custom public schema override to work with schemas lists that include public, by replacing public with the new public schema instead of just appending it
- 04:53 PM Revision 4980: inputs/*/*/map.csv: Prefix a * to every term that's not in Veg+ for easy identification of unmapped terms when editing map.csv. Note that canon will remove the * when it finds a matching Veg+ term.
- 04:52 PM Revision 4979: inputs/*/*/map.csv: Prefix a * to every term that's not in Veg+ for easy identification of unmapped terms when editing map.csv. Note that canon will remove the * when it finds a matching Veg+ term.
- 04:36 PM Revision 4978: ins_col: Added column fill value param
- 04:16 PM Revision 4977: inputs/VegBank/stemcount/map.csv: Fixed bug where taxonimportance_id needed to point to aggregateOccurrenceID instead of taxonOccurrenceID
- 04:15 PM Revision 4976: mappings/VegCore-VegBIEN.csv: Don't forward individualID to taxonoccurrence.sourceaccessioncode when aggregateOccurrenceID is present
- 03:52 PM Revision 4975: inputs/import.stats.xls: Updated import times
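Note on the COALESCE(..., 2147483647) fix (Revision 4986): a minimal sketch of why a plain unique index does not constrain rows with a NULL key, and how wrapping the column in COALESCE does. It assumes SQLite (3.9 or later, for expression indexes) via Python's sqlite3 module, which treats NULLs as distinct in unique indexes just as PostgreSQL does by default. The authortaxoncode column and the index name are hypothetical; the real index's columns are not listed in this log.

```python
import sqlite3

def count_after_duplicate_insert(use_coalesce):
    """Insert the same NULL-keyed row twice and return how many rows were stored."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE taxonoccurrence '
                 '(locationevent_id INTEGER, authortaxoncode TEXT)')
    key = ('COALESCE(locationevent_id, 2147483647)' if use_coalesce
           else 'locationevent_id')
    conn.execute('CREATE UNIQUE INDEX taxonoccurrence_unique '
                 'ON taxonoccurrence (%s, authortaxoncode)' % key)
    for _ in range(2):
        try:
            conn.execute("INSERT INTO taxonoccurrence VALUES (NULL, 'A1')")
        except sqlite3.IntegrityError:
            pass  # the second insert is rejected only when COALESCE is used
    return conn.execute('SELECT count(*) FROM taxonoccurrence').fetchone()[0]

plain = count_after_duplicate_insert(False)     # NULLs compare as distinct
coalesced = count_after_duplicate_insert(True)  # NULLs collapse to one sentinel
```

Without COALESCE, both rows are stored; with it, the second is rejected. This is the same underconstraint that shows up as differing row counts between row-based and column-based import.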
09/24/2012
- 06:45 PM Revision 4974: Regenerated vegbien.ERD exports
- 06:33 PM Revision 4973: schemas/vegbien.sql: placepath.otherranks comment: Added analogous text from taxonpath.otherranks
- 06:31 PM Revision 4972: schemas/vegbien.sql: taxonpath.author comment: Added equivalent Darwin Core term
- 06:27 PM Revision 4971: schemas/vegbien.sql: taxon columns: Added descriptive comments for data dictionary
- 06:15 PM Revision 4970: schemas/vegbien.sql: placepath: Added canon_placepath_id, analogous to taxonpath.canon_taxonpath_id
- 06:09 PM Revision 4969: schemas/vegbien.sql: place, placepath descriptive comments: Added analogous text from taxon/taxonpath
- 06:05 PM Revision 4968: schemas/vegbien.sql: taxonpath: descriptive comment: Changed "applicable taxon" to "identified taxon"
- 05:58 PM Revision 4967: schemas/vegbien.sql: taxon: descriptive comment: Reworded to emphasize that this stores only one rank (e.g. family) of the full taxonomic name, in contrast to taxonpath, which stores all of them
- 05:54 PM Revision 4966: schemas/vegbien.sql: taxonpath: descriptive comment: Clarified that this is the full path to a taxon, including all components of the taxonomic name
- 05:48 PM Revision 4965: schemas/vegbien.sql: Replaced "scientific name" with "taxonomic name" for schema-wide consistency and for consistency with the taxon/taxonomic name vocabulary
- 05:38 PM Revision 4964: schemas/vegbien.sql: taxonpath named ranks: Added descriptive comments for data dictionary
- 05:34 PM Revision 4963: schemas/vegbien.sql: taxonpath columns other than named ranks: Added descriptive comments for data dictionary
- 05:14 PM Revision 4962: schemas/vegbien.sql: taxonscope: descriptive comment: Reworded to make the first sentence a noun, for consistency with other descriptive table comments
- 05:13 PM Revision 4961: schemas/vegbien.sql: taxon: descriptive comment: Added note that the taxonname stores only one rank (e.g. family) of the full identifying name
- 05:07 PM Revision 4960: schemas/vegbien.sql: taxonpath: descriptive comment: Reworded to make the first sentence a noun, for consistency with other descriptive table comments. The convention is for the first "sentence" to be a noun which describes the entity that the table models.
- 05:00 PM Revision 4959: schemas/vegbien.sql: comments: Removed units from comments on fields which already have a units suffix, to avoid having to keep the units in sync between the suffix and the comment. Note that the units were abbreviated equally in the suffixes and comments, so this did not result in a loss of information other than the ^ for a quantity squared (but it's obvious enough that m2 is m^2).
- 04:54 PM Revision 4958: schemas/vegbien.sql: taxonscope: descriptive comment: Added period for consistency with other descriptive table comments
- 04:50 PM Revision 4957: schemas/vegbien.sql: taxon: Added descriptive comment for data dictionary
- 04:48 PM Revision 4956: schemas/vegbien.sql: VegBank-equivalent tables comments: Prepended "Equivalent to" before VegBank, so the equivalent tables statement can fit grammatically after a description of the table instead of having to be the first phrase in the descriptive table comment
- 04:41 PM Revision 4955: schemas/vegbien.sql: taxon: VegBank-equivalent tables comment: Added plantName and applicable columns from plantStatus, which are also part of the taxon table
- 04:37 PM Revision 4954: schemas/vegbien.sql: placepath: Added otherranks field, analogous to taxonpath.otherranks
- 04:26 PM Revision 4953: schemas/vegbien.sql: taxonpath: Added descriptive comment for data dictionary
- 03:36 PM Revision 4952: inputs/import.stats.xls: Updated import times
- 02:58 PM Revision 4951: inputs/UNCC/Specimen/map.csv: accession: Documented that it's globally unique, although occasionally duplicated
- 02:54 PM Revision 4950: inputs/REMIB/Specimen/map.csv: Remapped accession_number to catalogNumber because it is not globally unique, only (usually) unique within the institution providing the data ("acronym"). Note that there are nevertheless 11,869 rows where an accession_number appears multiple times within the same institution.
- 02:45 PM Revision 4949: mappings/VegCore-VegBIEN.csv: Only use institutionCode+collectionCode+catalogNumber as the authorlocationcode (location-scoping ID) if there is actually a catalogNumber. Otherwise, the mapping process would attempt to create one location for each collection in the datasource, when there should be one location for each specimen.
- 02:36 PM Revision 4948: schemas/py_functions.sql: _namePart(): Slice the first name from the beginning of the string to one word before the end, instead of one after the beginning, in order to avoid overlap with the last name, which starts one before the end, when there is only one word. Note that only one word means the name is assumed to be a last name. This assumption may not always be true, but when a datasource provides the name concatenated, an assumption must be made when not all name components are present.
- 02:30 PM Revision 4947: schemas/vegbien.sql: party: Added check constraint to require at least an organizationname or surname. Previously, NULL entries for the collector or identifier incorrectly caused the creation of an empty party entry, hence the lower inserted row counts now that this is no longer created.
- 02:17 PM Revision 4946: inputs/REMIB/Specimen/map.csv: Remapped acronym to institutionCode because this is an aggregator, and the field lists the datasource each record was aggregated from. Note that the inserted row count changes because of different duplicate elimination strategies in specimenreplicate and party (which institutionCode is placed in).
- 02:11 PM Revision 4945: inputs/REMIB/Specimen/create.sql: Also filter out rows where acronym (collectionCode) is NULL because this is a required field for valid records
- 01:28 PM Revision 4944: schemas/vegbien.sql: taxonpath: Renamed scientificnameauthor to author so the column name doesn't have "scientificname" in it, which made the term look confusingly like scientificname itself. Added descriptive comment that this is the author of the scientific name.
- 01:19 PM Revision 4943: schemas/vegbien.sql: taxonpath: Renamed canon_id to canon_taxonpath_id to clarify that this is a recursive fkey. The convention is that a recursive fkey includes the table name plus a descriptive prefix.
- 01:14 PM Revision 4942: schemas/filter_ERD.csv: Don't filter out fkeys from taxonpath to itself
- 01:04 PM Task #501 (Resolved): find out which datasources won't allow their data to be publicly accessible
  * needed before we can make VegBIEN public
  These datasources are:
  * "REMIB":http://www.conabio.gob.mx/remib/cgi...
- 01:02 PM Task #500 (New): when lower rank has name concatenated together, use lowest rank as the scientific name
- 12:57 PM Task #499 (Resolved): map example terms into the taxonomic schema
- 12:57 PM Task #498 (Resolved): add definitions to columns in "green tables"
- 12:57 PM Task #497 (Resolved): create examples of taxonomic names to test the limits of the new taxonomic schema
  * need types of morphospecies indicators
- 11:32 AM Revision 4941: schemas/vegbien.sql: taxonpath: Added canon_id for the canonical (scrubbed) taxonpath determined by TNRS
- 11:24 AM Revision 4940: schemas/vegbien.sql: taxonpath: taxonpath_unique_within_datasource_by_name unique index: Added otherranks, so that ranks without a named column will be used in uniquely identifying the taxonpath
- 11:22 AM Revision 4939: sql.py: DbConn.col_info(): Parse array types as sql_gen.ArrayType
- 11:22 AM Revision 4938: sql_gen.py: EnsureNotNull: Support ArrayType types
- 11:21 AM Revision 4937: strings.py: remove_prefix(), remove_suffix(): Added require param to raise an exception if the string does not have the given prefix/suffix
- 11:06 AM Revision 4936: sql.py: DbConn.col_info(): Moved parsing of user-defined datatypes to Python code, so that parsing for other composite types which also requires both data_type and udt_name can easily be added
- 11:03 AM Revision 4935: sql_gen.py: Added ArrayType
- 10:29 AM Revision 4934: schemas/vegbien.sql: Scope taxonpath instead of taxon with taxonscope, because a morphospecies name is specific to a datasource entity, so it should go in the datasource-specific taxonpath table instead of the datasource-general taxon table
- 10:14 AM Revision 4933: schemas/vegbien.sql: taxonpath: Added otherranks array column to store ranked names without a named column. Documented that ranks with no named column should be stored in this new field instead of in a chain of taxons pointed to by taxon_id. This ensures that only the tree of life uses the taxon table.
- 09:47 AM Revision 4932: schemas/vegbien.sql: Removed no longer used table stemtag, which has been replaced by stemobservation.tag, stemobservation.tags
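Note on the _namePart() slicing (Revision 4948): a minimal sketch of the described split, in which the first name runs from the beginning of the string to one word before the end and the last name is the final word. The name_parts() helper is hypothetical and is not the code in schemas/py_functions.sql. Returning None rather than the empty string follows the later _namePart() bug fixes.

```python
def name_parts(name):
    """Split a concatenated personal name into (first, last).

    A single word is assumed to be a last name. Missing parts are None
    (standing in for SQL NULL), never the empty string.
    """
    words = name.split() if name else []
    first = ' '.join(words[:-1]) or None  # everything before the final word
    last = ' '.join(words[-1:]) or None   # the final word only
    return first, last

full = name_parts('Asa Gray')
single = name_parts('Gray')
```

Because the two slices meet at the same index, they cannot overlap, which is the single-word case the revision fixed.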
09/21/2012
- 04:28 PM Revision 4931: inputs/ARIZ/Specimen/test.xml.ref: Updated after reinstalling staging table with new sql_io.null_strs
- 04:22 PM Revision 4930: inputs/VegBank/: Added stemlocation/
- 04:17 PM Revision 4929: inputs/VegBank/: Added stemcount/
- 04:10 PM Revision 4928: sql_io.py: cleanup_table(): Fixed bug where couldn't run any update statement when no columns are text
- 03:57 PM Revision 4927: csv2db: COPY FROM mode: Removed no longer needed explicit column list, now that the initial table has the exact width of the CSV (the row_num is added later)
- 03:55 PM Revision 4926: csv2db: Add any row_num column after creating the table, so it does not interfere with row widths when using COPY FROM without explicit column names
- 03:48 PM Revision 4925: csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
- 03:44 PM Revision 4924: csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
- 03:28 PM Revision 4923: inputs/VegBank/: Added taxonimportance/
- 03:20 PM Revision 4922: mappings/VegCore.csv: Added and mapped aggregateOccurrenceID
- 03:12 PM Revision 4921: mappings/VegCore.csv: taxonOccurrenceID: Re-sourced to VegBank taxonobservation and DwC occurrenceID, because this is where the VegBIEN table name came from
- 02:57 PM Revision 4920: tnrs_client: Support parsing multiple taxons at once, by specifying each as a command-line argument. Increased the max_pause to 10 min to support large batches. Limited the batch size to 5000 names, using the limit at <http://tnrs.iplantcollaborative.org/TNRSapp.html>. Note that when using xargs to pass many names, xargs will by default split its arguments into chunks of 5000. You can change this using the -n option.
- 02:29 PM Revision 4919: inputs/import.stats.xls: Updated import times
- 01:20 PM Revision 4918: Added tnrs_client. Note that obtaining an actual CSV requires four (!) steps: submit, retrieve, prepare download, and download. The output of the retrieve step is unusable because the array has different lengths depending on the taxonomic ranks present in the provided taxon name. This initial version runs one name at a time, but could later be expanded to batch process because TNRS can run multiple names at once.
- 12:36 PM Revision 4917: streams.py: Line iteration: Added read_all()
- 08:24 AM Revision 4916: inputs/Madidi/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values
- 08:18 AM Revision 4915: sql_io.py: null_strs: Added '-'
- 08:18 AM Revision 4914: sql_io.py: cleanup_table(): Fixed bug where each column name needed to be converted to Unicode before being concatenated with other strings, to support non-ASCII characters
- 07:57 AM Revision 4913: inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values
- 07:52 AM Revision 4912: inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Removed no longer needed old-style _units filter, now that unit conversion is handled by mappings/VegCore-VegBIEN.csv using _percent_to_fraction
- 07:48 AM Revision 4911: inputs/VegBank/observation_/map.csv: soilObs fields: Cited data dictionary source of units
- 07:15 AM Revision 4910: mappings/Veg+-VegCore.csv: Soil component measurements: Added unitless terms that automap to all alternatives of units
- 07:08 AM Revision 4909: mappings/VegCore.csv: Added term with *_fraction units for every *_percent term
- 07:03 AM Revision 4908: mappings/VegCore.csv: Soil component measurements: Added default units of percent (cmol_kg for cationExchangeCapacity). This involves translating the names everywhere and adding a _percent_to_fraction conversion in mappings/VegCore-VegBIEN.csv.
09/20/2012
- 11:15 PM Revision 4907: mappings/VegCore-VegBIEN.csv: Remapped verbatimLatitude/Longitude to locationcoords.verbatimlatitude/longitude because these fields now contain only non-decimal coordinates. This involves removing the _alt suffix on decimalLatitude/Longitude, which causes the VegBIEN.csvs to change.
- 11:11 PM Revision 4906: inputs/*/*/map.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees
- 11:06 PM Revision 4905: inputs/*/*/map.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees
- 10:54 PM Revision 4904: inputs/SpeciesLink/Specimen/map.csv: Documented that dwc_geospatial_VerbatimLatitude/Longitude contain a mix of DMS and other verbatim coordinates
- 10:47 PM Revision 4903: inputs/QMOR/Specimen/map.csv: Remapped verbatimLatitude/verbatimLongitude to latitude_DMS/longitude_DMS since these fields contain DMS values
- 10:43 PM Revision 4902: inputs/Madidi/Plot/map.csv: Remapped Latitude/Longitude (DMS) to new latitude_DMS/longitude_DMS
- 10:41 PM Revision 4901: mappings/VegCore-VegBIEN.csv: Mapped latitude_DMS, longitude_DMS
- 10:38 PM Revision 4900: mappings/VegCore.csv: Added latitude_DMS, longitude_DMS
- 10:34 PM Revision 4899: inputs/REMIB/Specimen/map.csv: Remapped lat_deg/long_deg to decimalLatitude/Longitude because these values are (integer) degrees suitable for decimalLatitude/Longitude. Note that the other DMS fields are not yet translated to decimal degrees.
- 10:28 PM Revision 4898: mappings/Veg+-VegCore.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees
- 10:26 PM Revision 4897: mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".
- 10:23 PM Revision 4896: input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Don't automatically regenerate the aggregated unmapped_terms.csv, new_terms.csv because this almost doubles the remake time when a mappings/ prerequisite changes (41s -> 75s)
- 10:14 PM Revision 4895: mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".
- 10:09 PM Revision 4894: inputs/GBIF/Specimen/map.csv: Remapped VerbatimLatitude/Longitude to decimalLatitude/Longitude because DecimalLatitude/Longitude just contains VerbatimLatitude/Longitude cast to a low-resolution float, which created spurious repeating decimals
- 09:56 PM Revision 4893: mappings/Makefile: .VegCore-VegBIEN.csv.last_cleanup: Generate VegCore-VegBIEN.unsourced_terms.csv whenever VegCore-VegBIEN.csv changes, to track VegCore terms that are mapped to VegBIEN but not documented in VegCore.csv. Note that this file is *not* svn:ignored, so it will show up with a ? when the user runs `svn st` if there are any unsourced terms.
- 09:47 PM Revision 4892: mappings/Makefile: Changed catch-all `.%.last_cleanup: %` target to a specific target for VegCore-VegBIEN.csv, because it's the only file that uses this target
- 09:45 PM Revision 4891: mappings/: Don't generate a for_review version of Veg+-VegCore.csv, because it is identical to the machine-readable Veg+-VegCore.csv (there are no output XPaths to simplify)
- 09:41 PM Revision 4890: mappings/: Don't generate a for_review version of VegX-VegCore.csv, because it is identical to the machine-readable VegX-VegCore.csv (there are no output XPaths to simplify)
- 09:37 PM Revision 4889: mappings/: Removed Veg+.unmapped_terms.csv because these terms are found in each datasource's new_terms.csv, which are updated regularly, while this file isn't, and which exist for every datasource, while this file only contained terms from a few datasources
- 09:29 PM Revision 4888: inputs/ARIZ/Specimen/map.csv: Remapped VerbatimLatitude, VerbatimLongitude to UNUSED
- 09:21 PM Revision 4887: Regenerated root unmapped_terms.csv, new_terms.csv
- 09:19 PM Revision 4886: lib/mappings.Makefile: unmapped_terms.csv, new_terms.csv: Only remake if newer than existing %/unmapped_terms.csv, %/new_terms.csv which haven't been autoremoved. This avoids always remaking every unmapped_terms.csv, new_terms.csv whenever `make missing_mappings` is run. Note that these files will automatically be remade whenever their corresponding map.csv changes, so it is not necessary to actually remake %/unmapped_terms.csv, %/new_terms.csv; they are prerequisites only so that their modification time may be checked to determine whether unmapped_terms.csv, new_terms.csv needs to be remade.
- 09:11 PM Revision 4885: input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Automatically regenerate aggregated unmapped_terms.csv, new_terms.csv when a subdir's corresponding file changes
- 09:10 PM Revision 4884: inputs/: Regenerated aggregated unmapped_terms.csv, new_terms.csv
- 08:58 PM Revision 4883: inputs/REMIB/: Moved nodes.make into Specimen.src/ so it's with the data it generates
- 08:55 PM Revision 4882: inputs/TEAM/: Regenerated */new_terms.csv
- 08:30 PM Revision 4881: inputs/TEAM/: Obtained new download of TEAM data. (Note that the new download has a slightly different schema.) Archived old data in _archive/. Added tables to import_order.txt. Renamed TeamPlotMetaData/ to TEAM_Sites/ to correspond with the section header in Vegetation-Tree-and-Liana-Metadata-1.5.pdf. Fixed TEAM_Sites mappings: Remapped CollectionDate to eventDate because it relates to the plot, not the organism. Mapped Name to plotName so TEAM_Sites data will match up with VL, VT data.
- 08:28 PM Revision 4880: inputs/TEAM/: Obtained new download of TEAM data. (Note that the new download has a slightly different schema.) Archived old data in _archive/. Added tables to import_order.txt. Renamed TeamPlotMetaData/ to TEAM_Sites/ to correspond with the section header in Vegetation-Tree-and-Liana-Metadata-1.5.pdf. Fixed TEAM_Sites mappings: Remapped CollectionDate to eventDate because it relates to the plot, not the organism. Mapped Name to plotName so TEAM_Sites data will match up with VL, VT data.
- 06:58 PM Revision 4879: inputs/TEAM/VL, VT: Split concatenated flat files apart into separate parts each time a header is duplicated, so that the header would be autoremoved by cat_csv. Changed modified BIEN2 flat file headers back to original headers (the duplicated headers) so the headers of all part files would match up. (This is required for cat_csv header autoremoval to work properly.) This results in changes to the input column names in */map.csv.
- 06:49 PM Revision 4878: sql_io.py: null_strs: Added 'nulo' (used by REMIB)
- 06:13 PM Revision 4877: mappings/Veg+-VegCore.csv: DBH: Removed diameterBreastHeight_m alternative because datasources that don't append units to DBH almost always have units of cm or in
- 06:11 PM Revision 4876: inputs/TEAM/*/map.csv: Remapped dbh from diameterBreastHeight_m to diameterBreastHeight_cm, using the units defined in Vegetation-Metadata-1.4.pdf
- 06:05 PM Revision 4875: inputs/import.stats.xls: Updated import times
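
Note on the latitude_DMS/longitude_DMS entries above (Revisions 4899 to 4903): the log says the DMS fields are not yet translated to decimal degrees. The standard arithmetic for that translation can be sketched as below. `dms_to_decimal` is a hypothetical helper, not a function from this repository, and it assumes the degrees, minutes, seconds and hemisphere have already been parsed out of the verbatim string.

```python
def dms_to_decimal(degrees: float, minutes: float = 0.0, seconds: float = 0.0,
                   hemisphere: str = "N") -> float:
    """Convert degrees/minutes/seconds and a hemisphere letter to signed decimal degrees."""
    hemi = hemisphere.strip().upper()
    if hemi not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    # One degree is 60 minutes and 3600 seconds
    magnitude = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # South and west are negative by convention
    return -magnitude if hemi in ("S", "W") else magnitude
```

Parsing the verbatim strings themselves is datasource-specific (Revision 4904 notes a mix of DMS and other formats), so that step is deliberately left out.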
09/19/2012
- 11:16 PM Revision 4874: inputs/TEAM/: Added TeamPlotMetaData
- 11:09 PM Revision 4873: inputs/TEAM/_src/: Added ci-team_extract/Vegetation-Metadata-1.4.pdf and symlink to it in the _src subdir
- 10:51 PM Revision 4872: inputs/: Added aggregated unmapped_terms.csv, new_terms.csv which were not already under version control
- 10:41 PM Revision 4871: inputs/SALVIAS-CSV/Organism/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension
- 10:36 PM Revision 4870: inputs/SALVIAS/stems/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for plotObservations.intercept_cm, which measures the same dimension
- 10:33 PM Revision 4869: inputs/SALVIAS/plotObservations/map.csv: Remapped temp_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension
- 10:25 PM Revision 4868: inputs/Madidi/Organism/map.csv: Remapped Diameter from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the range and precision of values
- 10:23 PM Revision 4867: inputs/FIA/Organism/map.csv: DBH: Changed units comment to include that assumption was also based on location inside the U.S., because some data outside the U.S. also uses fractional DBHs, but these are not likely to be inch measurements
- 10:19 PM Revision 4866: inputs/FIA/Organism/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_in, assuming units based on the range and precision of values
- 10:16 PM Revision 4865: inputs/CTFS/StemObservation/map.csv: DBH: Changed units comment to include that assumption was also based on the precision of values, because fractional DBHs sometimes indicate units of inches
- 10:13 PM Revision 4864: mappings/VegCore.csv: Added diameterBreastHeight_in
- 10:09 PM Revision 4863: schemas/functions.sql: Added _in_to_m()
- 10:00 PM Revision 4862: mappings/Veg+-VegCore.csv: Remapped DBH from no longer existing term diameterBreastHeight to diameterBreastHeight_cm, diameterBreastHeight_m (both terms will be listed in the map spreadsheet after automapping, and the user can then choose one)
- 09:57 PM Revision 4861: inputs/CTFS/StemObservation/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units are cm based on the range of values
- 09:56 PM Revision 4860: mappings/VegCore.csv: Added diameterBreastHeight_cm
- 09:41 PM Revision 4859: mappings/VegCore.csv: Added stemID, which was only in mappings/VegCore-VegBIEN.csv
- 09:35 PM Revision 4858: input.Makefile: Maps validation: Inline $(unmappedTerms) because it's only used once
- 09:31 PM Revision 4857: input.Makefile: Maps validation: %/new_terms.csv: Include the entire map spreadsheet row, so that each new term is listed together with its mapping. This facilitates adding new mappings to mappings/Veg+-VegCore.csv directly from any new_terms.csv. Note that the use of `sort -u` (in lib/mappings.Makefile) causes multiline comments to be separated, leading to spurious lines for each multiline comment line.
- 09:19 PM Revision 4856: inputs/: Added unmapped_terms.csv, new_terms.csv which were not already under version control
- 08:43 PM Revision 4855: inputs/VegBank/plot_/: Automapped with new parentPlotID term, which now has a join mapping in mappings/VegCore-VegBIEN.csv
- 08:41 PM Revision 4854: Regenerated unmapped_terms.csv, new_terms.csv
- 08:24 PM Revision 4853: mappings/Veg+-VegCore.csv: Added parentPlotID
- 08:22 PM Revision 4852: mappings/VegCore-VegBIEN.csv: Added parentLocationID, parentPlotName, which always map directly to the parent location, regardless of whether any subplot ID is present
- 08:16 PM Revision 4851: mappings/Veg+.unmapped_terms.csv: Removed vague term volumeCanopy, which has no definition in VegX
- 08:14 PM Revision 4850: mappings/Makefile: .VegCore.csv.last_cleanup: Fixed bug where the sorting columns needed to be changed to match the new column order
- 08:11 PM Revision 4849: mappings/VegCore.csv: Reordered columns to put Comments first, which matches mappings/Veg+-VegCore.csv
- 08:08 PM Revision 4848: mappings/Veg+-VegCore.csv: Removed redundant stem_id->stemID mapping
- 08:07 PM Revision 4847: mappings/Veg+-VegCore.csv: Standardized the capitalization of names, by camel-casing each name except for acronyms and "ID", which are made all uppercase
- 07:59 PM Revision 4846: mappings/VegCore.csv: Renamed diameterBreastHeight to diameterBreastHeight_m to assert units matching the VegBIEN field
- 07:44 PM Revision 4845: mappings/VegCore.csv: Removed duplicates
- 07:22 PM Revision 4844: input.Makefile: Maps building: Use new mappings/VegCore.csv as the VegCore vocabulary to canonicalize on, in order to also canonicalize VegCore terms which are not yet mapped to VegBIEN. This results in several DwC terms getting their case standardized according to http://rs.tdwg.org/dwc/terms/. Continue to determine unmapped terms using mappings/VegCore-VegBIEN.csv, because a term should not be considered mapped until it has been mapped all the way through to VegBIEN.
- 07:12 PM Revision 4843: mappings/VegCore.csv: Removed trailing spaces from terms
- 07:05 PM Revision 4842: mappings/Veg+.unmapped_terms.csv: Removed duplicates of VegCore terms
- 07:02 PM Revision 4841: mappings/: Split Veg+.terms.csv into VegCore.csv and Veg+.unmapped_terms.csv
- 06:36 PM Revision 4840: mappings/Veg+.terms.csv: Removed terms that are in mappings/Veg+-VegCore.csv
- 06:31 PM Revision 4839: mappings/Veg+-VegCore.csv: Added sources where missing
- 06:20 PM Revision 4838: mappings/Veg+-VegCore.csv: Added Source and Comments columns from mappings/Veg+.terms.csv. Reordered columns to put Comments first.
- 06:17 PM Revision 4837: mappings/Veg+.terms.csv: Removed duplicate entries for stem_id/stemID, collector
- 05:56 PM Revision 4836: inputs/import.stats.xls: Updated import times
- 05:24 PM Revision 4835: inputs/REMIB/Specimen/: Filter out invalid, frameshifted rows so they don't produce errors in the import or anomalies like thousands of taxondeterminations for one taxonoccurrence. This involves moving the CSVs to Specimen.src and using a create.sql to create the filtered table.
- 04:47 PM Revision 4834: mappings/VegCore-VegBIEN.csv: Forward occurrenceID to taxonoccurrence.sourceaccessioncode when there is no other taxonoccurrence.sourceaccessioncode, to ensure that taxonoccurrence is uniquely identified so that there is one taxonoccurrence per organism
- 04:16 PM Revision 4833: mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode alternatives: Use _first instead of _alt because when one of these fields is present, it can be used directly even if it's sometimes NULL, without needing to spend a lot of time _alting together fields that won't be used. Datasources where the authortaxoncode is sometimes NULL usually have a separate sourceaccessioncode for the taxonoccurrence. (In the rare case that they don't, they should map a non-NULL field to recordNumber or tag to ensure that taxonoccurrences can be uniquely identified.)
- 04:07 PM Revision 4832: mappings/VegCore-VegBIEN.csv: Mapped tag to taxonoccurrence.authortaxoncode when the record is an organism, in case there is no other ID for the taxonoccurrence. This fixes a bug in FIA and TEAM data where all organisms in a plot used the same taxonoccurrence because taxonoccurrence was not properly constrained, causing the loss of individual taxondeterminations on each organism.
- 03:36 PM Revision 4831: input.Makefile: Testing: %/test.by_col.xml: Do abort tester if by-column test fails. There are no longer small rowcount differences between row-based and column-based import on some datasources, so this is now possible.
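
Note on the DBH remappings above (Revisions 4860 to 4871): the unit suffixes _m, _cm and _in each imply a conversion to the meters stored in VegBIEN. The arithmetic is sketched below in Python. The actual `_in_to_m()` function lives in schemas/functions.sql and its body is not shown in this log, so `dbh_to_m` and `METERS_PER_UNIT` are hypothetical names used only for illustration.

```python
# Conversion factors to meters (1 inch is exactly 0.0254 m)
METERS_PER_UNIT = {"m": 1.0, "cm": 0.01, "in": 0.0254}


def dbh_to_m(value: float, units: str) -> float:
    """Convert a diameter at breast height in the given units to meters."""
    try:
        factor = METERS_PER_UNIT[units]
    except KeyError:
        raise ValueError(f"unsupported DBH units: {units!r}") from None
    return value * factor
```

Keeping the units in the term name (diameterBreastHeight_cm rather than diameterBreastHeight) means the choice of factor is made once per datasource in map.csv rather than guessed per value.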
09/18/2012
- 11:13 PM Revision 4830: schemas/vegbien.sql: stemobservation: stemobservation_unique_within_plantobservation unique index: Added tag so that a stemobservation can be scoped by its tag when no other ID is specified
- 11:11 PM Revision 4829: schemas/vegbien.sql: stemobservation: stemobservation_unique_within_plantobservation unique index: Fixed bug where filter condition underconstrained stemobservation when neither sourceaccessioncode nor authorstemcode was specified, by making sure that at least one *_unique index always applies
- 11:08 PM Revision 4828: mappings/VegCore-VegBIEN.csv: Remapped tag to new stemobservation.tag
- 11:06 PM Revision 4827: schemas/vegbien.sql: stemobservation: Added tag, tags
- 10:53 PM Revision 4826: mappings/VegCore-VegBIEN.csv: tag: Removed no longer applicable comment
- 10:49 PM Revision 4825: mappings/VegCore-VegBIEN.csv: Removed no longer used previousTag and the complex mapping logic that attempts to place both tags in VegBIEN in the correct order but does not work for column-based import. tag: Removed iscurrent=true because there is now only one tag field.
- 10:41 PM Revision 4824: inputs/SALVIAS/*/map.csv: Remapped all versions of stem and tree tags to tag, with the second tag superseding the first, to avoid the complex VegCore-VegBIEN mapping logic that attempts to place both tags in VegBIEN in the correct order but does not work for column-based import. inputs/SALVIAS-CSV/Organism/map.csv: stem and tree tags: Made the stem tag supersede the tree tag instead of vice versa, to have as specific a tag as possible.
- 10:30 PM Revision 4823: inputs/SALVIAS/stems/map.csv: Copied Brad's comments on plotObservations.tag1, tag2 to stem_tag1, stem_tag2
- 10:18 PM Revision 4822: mappings/VegCore-VegBIEN.csv: Removed _rangeStart and _rangeEnd filters from fields which should contain decimal values. These filters should be added on a per-datasource basis instead.
- 10:12 PM Revision 4821: inputs/ARIZ/Specimen/map.csv: Documented that MinimumElevationInMeters, MinimumElevationInMeters contain some verbatim values, including ranges and units
- 10:09 PM Revision 4820: mappings/VegCore-VegBIEN.csv: Removed /_units:[default=m,to=m,to=]/value filter from fields. It should be added on a per-datasource basis instead.
- 10:05 PM Revision 4819: mappings/VegCore-VegBIEN.csv: Removed /_replace:["\bca\.?"=]/value filter from fields. It should be added on a per-datasource basis instead.
- 09:36 PM Revision 4818: mappings/VegCore-VegBIEN.csv: verbatimElevation->elevation_m mapping: Translate units automatically (currently only works in row-based mode). Don't remove any "ca." prefix because this is a datasource-specific filter that does not apply to current datasources with verbatimElevation. Also map verbatimElevation to location.verbatimelevation.
- 09:21 PM Revision 4817: inputs/NCU-NCSC/Specimen/map.csv: Elevation: Removed comment that it includes units, because this is now part of the definition of verbatimElevation
- 09:20 PM Revision 4816: mappings/Veg+.terms.csv: Documented that verbatimElevation must include units
- 09:14 PM Revision 4815: inputs/ARIZ/Specimen/map.csv: Remapped VerbatimElevation to UNUSED
- 09:11 PM Revision 4814: inputs/*/*/map.csv: Remapped all unused terms to special value UNUSED. Remapped all private terms to special value PRIVATE. Remapped all deliberately unmapped terms to special value OMIT.
- 08:53 PM Revision 4813: mappings/Veg+-VegCore.csv: Remapped realLatitude, realLongitude to new special value PRIVATE, which is more specific than OMIT
- 08:51 PM Revision 4812: mappings/Veg+.terms.csv: Added special value PRIVATE
- 08:44 PM Revision 4811: mappings/Veg+.terms.csv: Added special values OMIT, UNUSED
- 08:20 PM Revision 4810: inputs/VegBank/plot_/map.csv: Remapped elevation from verbatimElevation to elevationInMeters, since the values are all decimals. The units come from the data dictionary.
- 08:14 PM Revision 4809: inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Remapped elev_m from verbatimElevation to elevationInMeters, since the values are all decimals. Note that the units of SALVIAS Elev were provided by a comment from Brad (and can also be assumed to be the same as SALVIAS-CSV elev_m).
- 08:02 PM Revision 4808: inputs/NCU-NCSC/Specimen/map.csv: Documented that Elevation includes units
- 07:50 PM Revision 4807: inputs/Madidi/Plot/map.csv: Remapped Minimum altitude from minimumElevationInMeters to verbatimElevation_m, since it is a range, not a minimum. Note that the units are assumed based on the range of values present and the region the data is from (Madidi National Park).
- 07:46 PM Revision 4806: mappings/VegCore-VegBIEN.csv: Also mapped verbatimElevation_m to verbatimelevation
- 07:44 PM Revision 4805: mappings/VegCore-VegBIEN.csv: Also mapped verbatimElevation_m to elevationrange_m
- 07:38 PM Revision 4804: mappings/VegCore-VegBIEN.csv: Mapped verbatimElevation_m
- 07:31 PM Revision 4803: mappings/Veg+.terms.csv: Added verbatimElevation_m
- 07:28 PM Revision 4802: mappings/Veg+-VegCore.csv: Mapped realLatitude, realLongitude to OMIT because private data should not be placed in a public database
- 07:26 PM Revision 4801: mappings/Veg+.terms.csv: Added realLatitude, realLongitude
- 07:23 PM Revision 4800: inputs/VegBank/plot_/map.csv: Documented that elevationrange is unused
- 07:13 PM Revision 4799: inputs/Madidi/Plot/map.csv: Fixed comments on Direction and Orientación/exposicion so each comment refers to the other field that is equivalent
- 07:10 PM Revision 4798: inputs/Madidi/Plot/map.csv: Remapped Altitude from verbatimElevation to elevationInMeters, since the values are all decimals. Note that the units are assumed based on the range of values present and the region the data is from (Madidi National Park).
- 06:50 PM Revision 4797: inputs/CTFS/Plot/map.csv: Remapped Elevation from verbatimElevation to elevationInMeters, since it is a float in the original bci.sql database. Note that the units are assumed based on the range of values present and the country the data is from (Panama).
- 06:33 PM Revision 4796: mappings/VegCore-VegBIEN.csv: Mapped elevationInMeters
- 06:30 PM Revision 4795: mappings/Veg+.terms.csv: Added elevationInMeters
- 05:43 PM Revision 4794: schemas/vegbien.sql: location: Added verbatimelevation
- 05:21 PM Revision 4793: README.TXT: Data import: Added note that `make schemas/reinstall` must be done *after* running make_analytical_db on a previous import
- 05:18 PM Task #495 (Resolved): add separate datasource table rather than using party for this
- 05:16 PM Revision 4792: schemas/vegbien.sql: Added indexes for additional analytical_db_view joins, as described at <https://projects.nceas.ucsb.edu/nceas/issues/494>
- 05:14 PM Task #494 (Resolved): add indexes for the analytical_db_view joins
- Index added on specimenreplicate
- 05:01 PM Task #494: add indexes for the analytical_db_view joins
- Indexes added on locationevent, taxonoccurrence, aggregateoccurrence
- 04:38 PM Task #494 (Resolved): add indexes for the analytical_db_view joins
- * *_unique indexes are often used in joins, but some (such as locationevent_unique_within_location) have filter condi...
- 04:51 PM Revision 4791: schemas/vegbien.sql: Added indexes for the analytical_db_view joins, as described at <https://projects.nceas.ucsb.edu/nceas/issues/494>
- 04:28 PM Revision 4790: README.TXT: Data import: Added note that `make schemas/rotate` must be done *after* running make_analytical_db
- 04:17 PM Revision 4789: schemas/functions.sql: Renamed _pct_to_frac() to _percent_to_fraction() and _frac_to_pct() to _fraction_to_percent(), for clarity and for consistency with _percent (which is spelled out), as used by SALVIAS (http://salvias.net/Documents/salvias_data_dictionary.html) and elsewhere
- 04:06 PM Revision 4788: review: Don't remove XML functions that are unit conversions
- 04:00 PM Revision 4787: schemas/vegbien.sql: Changed _frac units suffix to _fraction for clarity and for consistency with _percent (which is spelled out), as used by SALVIAS (http://salvias.net/Documents/salvias_data_dictionary.html) and elsewhere
- 03:58 PM Revision 4786: schemas/vegbien.sql: Changed _frac units suffix to _fraction for clarity and for consistency with _percent (which is spelled out), as used by SALVIAS (http://salvias.net/Documents/salvias_data_dictionary.html) and elsewhere
- 03:47 PM Revision 4785: inputs/*/*/map.csv: Remapped intercept_cm to new intercept_cm so that units match
- 03:45 PM Revision 4784: mappings/VegCore-VegBIEN.csv: Mapped intercept_cm
- 03:41 PM Revision 4783: schemas/functions.sql: Added _cm_to_m()
- 03:39 PM Revision 4782: mappings/Veg+.terms.csv: Added intercept_cm
- 03:35 PM Revision 4781: mappings/VegCore-VegBIEN.csv: Changed volumeCanopy to the more accurate intercept_m. volumeCanopy was the closest equivalent VegX term, but did not really fit line-intercept information, nor did it include units.
- 03:28 PM Revision 4780: mappings/Veg+.terms.csv: Added intercept_m
- 02:46 PM Revision 4779: schemas/vegbien.sql: taxonscope: Added comment that it stores the scope of a morphospecies name
- 02:32 PM Revision 4778: inputs/import.stats.xls: Updated import times
- 02:31 PM Revision 4777: README.TXT: Data import: Commit: Shortened import message to fit on one line in the README, to avoid issues when copying and pasting
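
Note on Revision 4789 above: the renamed `_percent_to_fraction()` and `_fraction_to_percent()` are SQL functions whose definitions are not included in this log. The Python sketch below only illustrates the intended arithmetic; passing NULL (None) through unchanged is an assumption, not documented behavior.

```python
from typing import Optional


def percent_to_fraction(value: Optional[float]) -> Optional[float]:
    """Convert a percentage (0 to 100) to a fraction (0 to 1); None stays None (assumed)."""
    return None if value is None else value / 100.0


def fraction_to_percent(value: Optional[float]) -> Optional[float]:
    """Convert a fraction (0 to 1) to a percentage (0 to 100); None stays None (assumed)."""
    return None if value is None else value * 100.0
```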
09/17/2012
- 05:02 PM Revision 4776: schemas/functions.sql: Added _ha_to_m2(text), _pct_to_frac(text)
- 04:55 PM Revision 4775: schemas/vegbien.sql: analytical_db_view: Use _m2_to_ha() on location.area_m2 to get plotAreaHa
- 04:50 PM Revision 4774: schemas/vegbien.sql: analytical_db_view: Use _m2_to_ha() on location.area_m2 to get plotAreaHa
- 04:49 PM Revision 4773: schemas/functions.sql: Added _m2_to_ha()
- 04:46 PM Revision 4772: mappings/VegCore-VegBIEN.csv, Veg+.terms.csv: Removed imprecise and no longer used plotArea and area. Use plotArea_<units> instead.
- 04:44 PM Revision 4771: inputs/*/*/map.csv: Remapped applicable plotArea fields to plotArea_m2
- 04:41 PM Revision 4770: mappings/VegCore-VegBIEN.csv: Mapped plotArea_m2
- 04:40 PM Revision 4769: mappings/Veg+.terms.csv: Added plotArea_m2
- 04:39 PM Revision 4768: mappings/VegCore-VegBIEN.csv: Renamed plotAreaHa to plotArea_ha for consistency with VegBIEN units suffixing convention, which includes an "_"
- 04:35 PM Revision 4767: inputs/*/*/map.csv: Remapped applicable plotArea fields to plotAreaHa
- 04:19 PM Revision 4766: mappings/Veg+-VegCore.csv: Removed inaccurate SizeOfSite->plotArea mapping, which does not match units
- 04:16 PM Revision 4765: mappings/VegCore-VegBIEN.csv: Mapped plotAreaHa
- 04:16 PM Revision 4764: schemas/functions.sql: Added _ha_to_m2()
- 04:11 PM Revision 4763: mappings/Veg+.terms.csv: Added plotAreaHa
- 04:08 PM Revision 4762: mappings/Veg+.terms.csv: Standardize area using VegX /plots/plot/area instead of Madidi Inventory+description.Area
- 04:01 PM Revision 4761: schemas/vegbien.sql: analytical_db_view: Use _frac_to_pct() on aggregateoccurrence.cover_frac to get pctCover
- 03:43 PM Revision 4760: schemas/functions.sql: Added _pct_to_frac()
- 03:37 PM Revision 4759: mappings/VegCore-VegBIEN.csv: coverPercent: Convert to fraction using _pct_to_frac()
- 03:37 PM Revision 4758: xml_dom.py: replace_with_text(): Support ints and floats
- 03:36 PM Revision 4757: xml_dom.py: replace_with_text(): Support ints and floats
- 03:31 PM Revision 4756: xml_func.py: simplify(): Run xml_dom.prune_empty() on function nodes that don't have an explicit simplifying function. This allows single-arg functions with no arg to be pruned rather than called with no args (causing errors if the single param does not have a default value).
- 02:31 PM Revision 4755: Regenerated vegbien.ERD exports
- 02:29 PM Revision 4754: schemas/vegbien.sql: Added units suffix to additional VegBIEN fields that have units
- 02:01 PM Revision 4753: schemas/vegbien.sql: Added units suffix to all core VegBIEN fields that have units. It is the responsibility of the mappings to ensure that all units are properly translated.
- 12:18 PM Revision 4752: root Makefile: PostgreSQL: postgres-Linux: Added postgresql-postgis apt-get
- 11:58 AM Revision 4751: backups/Makefile: Backups: Full DB: Specify the date suffix of the backup when it's created rather than adding it afterwards. This allows the user to specify a suffix that matches the corresponding public-schema backup.
- 11:41 AM Revision 4750: inputs/*/*/map.csv: Mapped variants of subspecies directly to new subspecies term
- 11:31 AM Revision 4749: mappings/VegCore-VegBIEN.csv: subspecies, infraspecificEpithet: Added _alts for datasources that specify both
- 11:27 AM Revision 4748: input.Makefile: Mapping: $(map2db): Inline $(map) because this is the only place it's used
- 11:26 AM Revision 4747: input.Makefile: Mapping: $(map): Don't require flat files because they don't need to be used directly anymore (staging tables are used instead)
- 11:24 AM Revision 4746: input.Makefile: Mapping: $(map2db): Always use staging tables, because the flat files don't need to be used directly anymore
- 11:02 AM Revision 4745: mappings/Veg+-VegCore.csv: Remapped subspecies, subSpeciesName to new subspecies term
- 10:52 AM Revision 4744: mappings/VegCore-VegBIEN.csv: Mapped subspecies, variety, forma, cultivar
- 10:47 AM Revision 4743: mappings/Veg+.terms.csv: Added subspecies, variety, forma, cultivar
- 10:33 AM Revision 4742: Regenerated vegbien.ERD exports
- 10:30 AM Revision 4741: schemas/vegbien.sql: taxon.authority_id: Added descriptive comment that this is the authority which defines the taxon name (as opposed to the author of the taxon name)
- 10:29 AM Revision 4740: schemas/vegbien.sql: taxon: Added author_id for the author of the taxon name. This is distinct from authority_id, which is the authority used to determine which taxon name to apply.
- 10:14 AM Revision 4739: schemas/vegbien.sql: analytical_db_view: Use new denormalized placepath table instead of place, which significantly reduces the number of joins
- 10:11 AM Revision 4738: schemas/vegbien.sql: location: Removed stateprovince, country because these are now in placepath (as well as in place.rank)
- 10:06 AM Task #383: convert VegBank data dictionary to database comments
- Bob wants a VegBIEN data dictionary
- 10:01 AM Revision 4737: schemas/vegbien.sql: analytical_db_view: LEFT JOIN locationcoords and locationplace so that locations will be included even if they don't have one of these two determinations
- 10:00 AM Revision 4736: schemas/vegbien.sql: analytical_db_view: Fixed bug where method was being joined instead of left-joined, causing only rows with a method to be included
- 09:44 AM Revision 4735: Regenerated vegbien.ERD exports
- 09:41 AM Revision 4734: schemas/vegbien.sql: locationplace: Added identifier_id, so that different identifiers (e.g. the data provider and GNRS) can provide separate locationplaces even if the standardized name happens to be the same as the original name
- 09:31 AM Revision 4733: mappings/VegBank-VegBIEN.csv: Added place->locationplace renaming
- 09:30 AM Revision 4732: mappings/VegBIEN-VegBank.csv: Reversed the order of the columns so it's a more natural forward renaming, and renamed the file to VegBank-VegBIEN.csv to reflect the new column order
- 09:27 AM Revision 4731: mappings/VegBIEN-VegBank.csv: Fixed order of plantconcept->taxon renaming because the VegBIEN column is on the right
- 09:26 AM Revision 4730: schemas/vegbien.sql: Renamed namedplace to place for simplicity and consistency with placepath and locationplace
- 09:09 AM Revision 4729: schemas/vegbien.sql: taxon: Made authority an fkey to reference instead of a text field
- 09:03 AM Revision 4728: schemas/vegbien.sql: Moved steps to include a taxon name at a rank with no explicit column from taxon's comment to taxonpath's comment, because that is the table the steps apply to
- 09:00 AM Revision 4727: schemas/vegbien.sql: Added placepath (analogous to taxonpath), and point locationplace to it instead of directly to namedplace
- 08:11 AM Revision 4726: schemas/vegbien.sql: Split locationdetermination into locationcoords and locationplace, so that coordinate determinations can be made separately from place determinations
- 07:22 AM Revision 4725: schemas/vegbien.sql: location: Removed authore, authorn because this information is now in locationdetermination as verbatimlongitude, verbatimlatitude
- 07:20 AM Revision 4724: schemas/vegbien.sql: location: Removed centerlatitude/longitude, publiclatitude/longitude because this information is now in locationdetermination
- 07:09 AM Task #327 (Resolved): look into Clio
- *[[Column-based import]]* does effectively what "*Clio*":http://www.almaden.ibm.com/cs/projects/criollo/ does
- 07:07 AM Task #427 (Resolved): Load all plots data
- All *[[Databanks#BIEN 2 datasources|BIEN2 plots data]]* has been loaded, including the core fields of VegBank
- 07:05 AM Task #288 (Resolved): VegX-VegBank mapping
- We now map "VegX->VegCore":https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/mappings/VegX-VegCore.c...
- 07:03 AM Task #314 (Resolved): Import CTFS data
- 07:02 AM Task #368 (Rejected): get TEAM VegX data
- Not needed because we have the raw TEAM data, which is easier to work with than XML
- 07:01 AM Task #455 (Resolved): change summarizing queries to use vegbien staging tables
- 06:59 AM Task #441 (Resolved): import CTFS data using JOINs from DB export, not VegX
- 06:58 AM Task #317 (Rejected): Direct mapping from VegX to VegBIEN
- We instead have a mapping from "VegX to VegCore":https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/m...
- 06:49 AM Revision 4723: schemas/vegbien.ERD.mwb: Fixed lines
- 06:48 AM Revision 4722: mappings/VegBIEN-VegBank.csv: Added table rename plantconcept->taxon
- 06:47 AM Revision 4721: schemas/vegbien.sql: taxonpath.scientificnamewithauthor: Added comment that it's equivalent to "Name sec. x"
- 06:43 AM Revision 4720: schemas/vegbien.sql: taxon: Added comment that it's VegBank's plantConcept table
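
Revisions 4736 and 4737 above both replace a plain join with a LEFT JOIN so that rows without a matching determination or method stay in analytical_db_view. A minimal sketch of that difference, using Python's built-in sqlite3 module and two hypothetical, heavily simplified tables (`observation` and `method` here are illustrative and are not the actual VegBIEN definitions):

```python
import sqlite3

def count_rows(join_kind):
    """Count observation rows returned when joined to method with join_kind."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified tables: one observation has a method, one does not.
    conn.executescript("""
        CREATE TABLE method (method_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE observation (observation_id INTEGER PRIMARY KEY, method_id INTEGER);
        INSERT INTO method VALUES (1, 'plot');
        INSERT INTO observation VALUES (10, 1);
        INSERT INTO observation VALUES (11, NULL);
    """)
    # join_kind is interpolated only because it is a fixed keyword in this sketch.
    query = "SELECT count(*) FROM observation %s method USING (method_id)" % join_kind
    count = conn.execute(query).fetchone()[0]
    conn.close()
    return count

inner_count = count_rows("JOIN")       # drops the observation with no method
left_count = count_rows("LEFT JOIN")   # keeps it, with NULLs for method columns
```

With an inner join the observation that has no method disappears from the result, which is the symptom described in revision 4736.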
09/14/2012
- 11:21 PM Revision 4719: Regenerated vegbien.ERD exports
- 11:18 PM Revision 4718: schemas/vegbien.sql: Renamed plantconcept to taxonpath for consistency with DwC's Taxon category and to emphasize that the table stores taxonomic paths
- 11:11 PM Revision 4717: schemas/vegbien.sql: Renamed plantname to taxon for consistency with DwC's Taxon category
- 11:02 PM Revision 4716: schemas/vegbien.sql: plantname: Renamed plantname field to taxonname for consistency with DwC's Taxon category
- 10:55 PM Revision 4715: Regenerated vegbien.ERD exports
- 10:49 PM Revision 4714: Updated aggregated unmapped_terms.csv, new_terms.csv. This removes terms that contained a filter (which is now in a separate column) and moves new terms that are unmapped from new_terms.csv to unmapped_terms.csv. Note that the majority of unmapped terms are from VegBank's huge tables, and are not part of the core fields needed for the analytical DB.
- 10:41 PM Revision 4713: schemas/vegbien.sql: taxonrank: Switched to using extended taxonomic ranks list derived from VegX at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegBIEN_taxonomic_schema#Extended>. This renames *division to *phylum and splits up 'cultivar/forma'.
- 10:39 PM Revision 4712: schemas/vegbien.sql: taxonrank: Removed 'authority', which doesn't belong as a taxonomic rank
- 10:38 PM Revision 4711: schemas/vegbien.sql: plantname: Added authority so each taxonomic level can have its own authority (author). Include it in the plantname_unique unique index because plantname is a globally scoped table.
- 10:25 PM Revision 4710: schemas/vegbien.sql: taxonrank: Removed 'binomial', which doesn't belong as a taxonomic rank
- 10:24 PM Revision 4709: schemas/vegbien.sql: Changed analytical_db_view to use new denormalized taxonomic names in plantconcept, which significantly reduces the number of joins. Note that changing the tables used by a view which depends on other tables will cause those tables to be reordered in dependency order to appear before the view, causing things to be moved around in the svn diff.
- 10:01 PM Revision 4708: inputs/Madidi/Organism/map.csv: Remapped Specie+autor to new scientificNameWithAuthorship. Mapped Species and morphotypes to now-available scientificName.
- 09:59 PM Revision 4707: mappings/VegCore-VegBIEN.csv: Moved scientificNameWithAuthorship before scientificName in taxonoccurrence.authortaxoncode's _alts
- 09:55 PM Revision 4706: mappings/VegCore-VegBIEN.csv: Mapped scientificNameWithAuthorship as an _alt of taxonoccurrence.authortaxoncode
- 09:53 PM Revision 4705: mappings/VegCore-VegBIEN.csv: Mapped scientificNameWithAuthorship
- 09:51 PM Revision 4704: mappings/Veg+.terms.csv: Added scientificNameWithAuthorship
- 09:47 PM Revision 4703: mappings/VegCore-VegBIEN.csv: Taxonomic names: Remapped to new denormalized fields in plantconcept
- 09:08 PM Revision 4702: schemas/vegbien.sql: plantname: Added comment documenting how to include a taxon name at a rank with no explicit column, by using the plantname table as an ordered linked list linked together using parent_id. (This method of using a linked list is one way of storing an ordered list of user-defined data. It is similar to using locationevent.previous_id to link successive reobservations of the same location together.) Note that plantname can store both the official tree of life and the data provider's own custom tree of life (or a subset thereof), with the two being distinguished by whether the data provider's or TNRS's taxondeterminations point to them.
- 08:53 PM Revision 4701: schemas/vegbien.sql: plantname: Added verbatimrank to store ranks of custom taxonomic levels, such as rosids. Note that even if you specify a custom verbatimrank, you must also specify a closest-match rank from the taxonrank closed list. This ensures that every taxonomic name is placed in the correct relative order in the taxonomic hierarchy.
- 08:38 PM Revision 4700: schemas/vegbien.sql: plantconcept: Made plantname_id optional because the datasource's plantconcepts do not need to be placed in the recursive plantname hierarchy
- 08:35 PM Revision 4699: schemas/vegbien.sql: plantconcept: Added datasource_id and appropriate unique indexes to enable scoping by datasource. Moved plantcode right after datasource_id because it will be used for the sourceaccessioncode (if any).
- 08:21 PM Revision 4698: schemas/vegbien.sql: Moved plantconcept.plantdescription to plantname and renamed it to description, so that a taxon of any rank can have a description
- 08:02 PM Revision 4697: schemas/vegbien.sql: plantconcept: Added denormalized taxonomic ranks from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegBIEN_taxonomic_schema#Primary> and concatenated scientific name fields
- 07:25 PM Revision 4696: Removed no longer used ucase_first
- 07:23 PM Revision 4695: Removed no longer used bin/union
- 07:22 PM Revision 4694: Removed no longer used join_union_sort
- 07:21 PM Revision 4693: Removed no longer used ci_map, because all relevant mapping scripts are now case-insensitive
- 07:19 PM Revision 4692: mappings/Makefile: Inline $(review_) because it's only used once
- 07:18 PM Revision 4691: mappings/Makefile: Removed no longer used $(review)
- 07:17 PM Revision 4690: mappings/Makefile: Don't set $(SHELL) to /bin/bash because this is no longer needed
- 07:16 PM Revision 4689: mappings/Makefile: Removed empty VegCSV section. mappings/Makefile's only functionality is now to clean up (sort) the core maps whenever they change and create human-readable maps from them.
- 07:13 PM Revision 4688: mappings/Makefile: Removed no longer used self maps, because the new automapping mechanism does not use them
- 07:09 PM Revision 4687: input.Makefile: Existing maps discovery: Substituted Veg+ for $(via) because it's now only used once
- 07:05 PM Revision 4686: mappings/VegCore-VegBIEN.csv: Changed input column header from VegCore[Veg+] to VegCore because this is more accurate. This is possible now that we're using new automapping scripts that do not require a particular column header.
- 06:39 PM Revision 4685: inputs/*/*/map.csv: Changed _merge to _join everywhere because _merge's (slower) duplicate elimination functionality is not needed (the combined columns do not both contain the same value, so they can simply be concatenated)
- 06:38 PM Revision 4684: inputs/*/*/map.csv: Changed _merge to _join everywhere because _merge's (slower) duplicate elimination functionality is not needed (the combined columns do not both contain the same value, so they can simply be concatenated)
- 06:21 PM Revision 4683: schemas/functions.sql: _label(): Accept params of any type, in order to support types other than text (which come from staging tables that are imported directly from a SQL export). This fixes a bug in SALVIAS.plotMetadata's column-based import.
- 06:17 PM Revision 4682: schemas/functions.sql: _label(): Support NULL labels by not prepending a label
- 06:04 PM Revision 4681: mappings/Veg+-VegCore.csv: Changed output column header from Veg+ to VegCore because this is more accurate. This is possible now that we're using new automapping scripts that do not require a particular column header. Note that this change now requires the map.csvs to use VegCore as their output column header, because otherwise the Veg+ header will get automapped to VegCore. (Header replacement is a feature to support changing the header when the schema of the column's terms changes.)
- 06:03 PM Revision 4680: mappings/root.sh: Changed output column header from Veg+ to VegCore because this is more accurate following the initial automapping
- 05:59 PM Revision 4679: inputs/*/*/map.csv: Changed output column header from Veg+ to VegCore because the names will be VegCore names after automapping. This is possible now that we're using new automapping scripts that do not require a particular column header.
- 05:53 PM Revision 4678: inputs/import.stats.xls: Copied the Change factor formula to all rows (it displays an empty string for rows that don't have both a row-based and a column-based import)
- 05:49 PM Revision 4677: README.TXT: Data import: Added steps to record the import times in inputs/import.stats.xls
- 05:42 PM Revision 4676: inputs/import.stats.xls: Updated with stats from latest import
- 05:40 PM Revision 4675: Added import_times
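
Revisions 4701 and 4702 above describe using plantname (later renamed taxon) as an ordered linked list joined by parent_id, with an optional verbatimrank for custom levels such as rosids. A minimal sketch of walking such a chain, assuming rows are available as plain dictionaries; the key names, the `path_to_root` helper, and the sample ranks are illustrative assumptions, not the actual schema or import code:

```python
def path_to_root(rows, start_id):
    """Return taxon names from start_id up to the root by following parent_id."""
    by_id = dict((row["id"], row) for row in rows)
    names = []
    seen = set()
    current = start_id
    while current is not None:
        # Guard against a malformed chain that loops back on itself.
        if current in seen:
            raise ValueError("cycle in parent_id chain at id %r" % (current,))
        seen.add(current)
        row = by_id[current]
        names.append(row["name"])
        current = row["parent_id"]
    return names

# Hypothetical sample data: a custom level (verbatimrank 'rosids') still carries
# a closest-match rank, here arbitrarily chosen for illustration.
sample_rows = [
    {"id": 1, "parent_id": None, "rank": "subclass", "verbatimrank": "rosids", "name": "rosids"},
    {"id": 2, "parent_id": 1, "rank": "order", "verbatimrank": None, "name": "Rosales"},
    {"id": 3, "parent_id": 2, "rank": "family", "verbatimrank": None, "name": "Rosaceae"},
]
sample_path = path_to_root(sample_rows, 3)
```

The same traversal would apply to any table that links rows through a single parent or previous pointer, as the comparison to locationevent.previous_id in revision 4702 suggests.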
09/13/2012
- 02:40 PM Revision 4674: mappings/root.sh: Removed no longer needed $in_root_suffix
- 02:39 PM Revision 4673: src_map: Upgraded to match new map format by adding Filter column
- 02:38 PM Revision 4672: input.Makefile: $(viaMaps): Fixed bug where it could not be wrapped in $(wildcard), because that would prevent map.csv from being created when a new datasource or new subdir is added
09/12/2012
- 05:36 PM Revision 4671: input.Makefile: $(viaMaps): Removed extra addition of */map.csv, which is already included because all $(tables) have or will get a map.csv
- 05:34 PM Revision 4670: mappings/: Removed no longer used derived file Veg+.vocab.csv
- 05:33 PM Revision 4669: input.Makefile: Removed no longer used $(vocab)
- 05:32 PM Revision 4668: input.Makefile: Maps validation: %/new_terms.csv: Filter out $(coreMap) and $(dict) successively instead of $(vocab), to avoid requiring intermediate mapping files not edited by the user
- 05:28 PM Revision 4667: input.Makefile: Maps validation: $(newTerms): Don't hardcode the caller's first filter_out_ci by prerequisite position; instead allow them to specify the command (including the var name) themselves
- 05:24 PM Revision 4666: input.Makefile: Maps validation: $(newTerms): For simplicity, subset the columns before running filter_out_ci
- 05:20 PM Revision 4665: mappings/: Removed no longer used Veg+-VegBIEN.csv and derived autogen Veg+.self.csv
- 05:16 PM Revision 4664: input.Makefile: Maps building: %/unmapped_terms.csv: Use $(coreMap) instead of $(vocab) because the terms should already be translated to VegCore terms, rather than still being Veg+
- 05:13 PM Revision 4663: input.Makefile: Maps validation: $(newTerms): Fixed bug where header needed to be removed *before* running filter_out_ci because filter_out_ci only removes the header if it matches the vocabulary's header. Removing the header afterward can cause the first row to be removed instead if the header was already removed.
- 05:11 PM Revision 4662: cols: Support CSVs without a header, such as intermediates that become unmapped_terms.csv, new_terms.csv
- 04:37 PM Revision 4661: inputs/: Regenerated unmapped_terms.csv, new_terms.csv
- 04:25 PM Revision 4660: input.Makefile: %/.map.csv.last_cleanup: Removed no longer used prerequisite $(vocab)
- 04:24 PM Revision 4659: input.Makefile: %/.map.csv.last_cleanup: Canonicalize separately on $(coreMap) and $(dict), instead of requiring them to be combined in $(vocab)
- 04:20 PM Revision 4658: input.Makefile: Use mappings/VegCore-VegBIEN.csv instead of mappings/Veg+-VegBIEN.csv as the core map, because the automapper now takes care of Veg+ -> VegCore translation
- 04:14 PM Revision 4657: inputs/*/*/map.csv: Moved filter suffixes to separate filter column to enable automapping to work on those mappings' terms, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Move-filter-suffixes-to-separate-filter-column>. Note that the only changes to VegBIEN.csvs are the (now automapped) names of terms in "No join mapping" comments.
- 03:37 PM Revision 4656: inputs/*/*/map.csv: Added Filter column to contain any suffix added after the term, so that the automapping mechanism does not have to deal with the filter expressions
- 03:35 PM Revision 4655: Added cat_cols
- 03:34 PM Revision 4654: Added ins_col
- 03:13 PM Revision 4653: input.Makefile: Maps building: %/.map.csv.last_cleanup: Reference fixed prerequisites by name instead of by position in the prerequisites list
- 02:28 PM Revision 4652: Removed no longer used intersect
- 02:18 PM Revision 4651: inputs/*/*/map.csv: Removed no longer needed [Veg+] suffix in root, because the input column is no longer used by old-style map utilities such as union that needed this
- 02:07 PM Revision 4650: translate: Translate the column header instead of passing it through, in order to properly support CSVs without a header and to support renaming the header when the column's contents change to a different schema or vocabulary
- 02:04 PM Revision 4649: canon: Canonicalize the column header instead of passing it through, in order to properly support CSVs without a header
- 01:57 PM Revision 4648: filter_out_ci: Filter header instead of passing it through, in order to properly support CSVs without a header, such as the unmapped_terms.csv and new_terms.csv files. For CSVs with a header, the header of the vocabulary should be removed before passing it to filter_out_ci.
- 01:48 PM Revision 4647: autoremove: `svn rm`: Fixed bug where --force needed to be added in case the file had already been modified before being autoremoved
- 01:32 PM Revision 4646: input.Makefile: Maps building: Removed no longer used $(createOnlyMaps)
- 01:30 PM Revision 4645: input.Makefile: Maps building: Removed no longer used %/src.csv, because it is no longer needed to generate map.full.csv from map.csv
- 01:21 PM Revision 4644: input.Makefile: Maps building: %/map.csv: If it doesn't exist, generate directly using $(mkSrcMap) instead of by copying %/src.csv, in order to eventually avoid the need to create a separate src.csv at all. Note that this avoids the need to run make twice when the table is first created to properly bootstrap all maps.
- 01:09 PM Revision 4643: autoremove: Try `svn rm` first in case the file is in svn
- 01:02 PM Revision 4642: input.Makefile: Maps building: Removed no longer used %/map.full.csv
- 12:59 PM Revision 4641: input.Makefile: Maps building: %/VegBIEN.csv: Use %/map.csv directly because %/map.full.csv is now a copy of it
- 12:56 PM Revision 4640: input.Makefile: Maps building: %/map.full.csv: Generate by copying map.csv, because the content of these files now differs only in the sort order of the names
- 12:53 PM Revision 4639: inputs/*/*/map.csv: Changed empty mappings to self mappings, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Change-empty-mappings-to-self-mappings>. Note that in map.full.csv and VegBIEN.csv, lines that have changed are always the result of the input field's case being changed to match the case of the datasource's actual column name.
- 12:43 PM Revision 4638: inputs/*/*/map.csv: Changed empty mappings to self mappings, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Change-empty-mappings-to-self-mappings>. Note that in map.full.csv and VegBIEN.csv, lines that have changed are always the result of the input field's case being changed to match the case of the datasource's actual column name.
- 12:31 PM Revision 4637: join: passthru mode: Fixed bug where empty join mappings needed to have the output field of the right-hand row manually set to the output field of the left-hand row for maps.merge_mappings() to work properly
- 12:14 PM Revision 4636: inputs/*/*/map.csv: Added back automapped mappings to map.csv, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Add-back-automapped-mappings-to-mapcsv>
- 12:07 PM Revision 4635: inputs/VegBank/taxonobservation_/map.csv: Updated with new renamings of colliding join columns
- 12:00 PM Revision 4634: join: When a join mapping exists but is empty, still include any additional columns from that mapping in the combined row
- 11:48 AM Revision 4633: inputs/SpeciesLink/Specimen/src.csv, inputs/XAL/Specimen/src.csv: Use input term as the initial Veg+ term, so the src.csv can be used with the Add back automapped mappings process at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Add-back-automapped-mappings-to-mapcsv>
- 11:31 AM Revision 4632: inputs/XAL/Specimen/src.csv, map.csv: Switched from using root prefixes to full column names, because the namespace mapping functionality can be handled much better by treating each namespace-qualified term as its own term rather than as a term and a prefix
- 11:22 AM Revision 4631: inputs/SpeciesLink/Specimen/src.csv, map.csv: Switched from using root prefixes to full column names, because the namespace mapping functionality can be handled much better by treating each namespace-qualified term as its own term rather than as a term and a prefix
- 11:02 AM Revision 4630: inputs/SpeciesLink/Specimen/map.csv: Removed no longer needed duplicate entries for each first letter case, which cause duplicate output mappings now that join is case- and punctuation-insensitive. Note that the `svn diff` hides _alt entry 0, which contains one of the removed duplicate columns that appears in the diff.
- 10:27 AM Revision 4629: inputs/SpeciesLink/Specimen/src.csv, inputs/XAL/Specimen/src.csv: Added Comments column for consistency with autogenerated src.csv format
- 10:14 AM Revision 4628: join: Added new passthru mode which passes through terms with no input mapping or no join mapping
- 09:25 AM Revision 4627: inputs/: Added [Veg+] to via map roots to indicate that the datasource and Veg+ vocabularies are combinable. This is possible now that automapped entries are no longer subtracted when this is in the map root, so there is no concern of losing comments on subtracted automapped rows. Note that this change turns on old-style automapping for these datasources, causing SALVIAS plotMetadata to acquire additional mappings.
- 08:59 AM Revision 4626: canon, translate, filter_out_ci: Support vocabularies/dictionaries with extra columns beyond the functional column(s) used by the program. These columns can contain comments, etc. This was not originally supported because Python 2's iterable unpacking only supports "an iterable with the same number of items as there are targets in the target list" (http://docs.python.org/reference/simple_stmts.html#assignment-statements). We now use numeric array indexes instead to get around this limitation, and for consistency with other map-manipulation scripts.
- 08:21 AM Revision 4625: Removed no longer used subtract (use filter_out_ci instead)
- 08:19 AM Revision 4624: input.Makefile: Maps building: %/.map.csv.last_cleanup: Removed no longer needed subtraction of automapped entries, because information about unmapped and new terms is now available in unmapped_terms.csv and new_terms.csv
- 08:13 AM Revision 4623: README.TXT: Data import: `make backups/download`: Removed '&' because running the command in the background prevents rsync from providing a continuously updating progress indication (because a backgrounded process's stdout is not a TTY)
- 08:04 AM Revision 4622: mappings/VegCore-VegBIEN.csv: Removed no longer needed /_simplifyPath:[next=parent_id]/path expressions in specific paths because parent_id forwarding is now set globally for all paths in the map root
- 07:56 AM Revision 4621: mappings/VegCore-VegBIEN.csv: Added /_simplifyPath:[next=parent_id]/path to root so the returned subplot location will be its parent location if there is no subplot name or ID (indicating that that particular plot did not have subplots). Note that this also causes the parent_id forwarding effect to occur for all other tables containing parent_id, which will help prevent similar issues with subplot events, etc. This will hopefully fix the SALVIAS.plotObservations bug where some organisms did not have a subplot #. That made the subplot location NULL, so the corresponding locationevent rows did not match the locationevent_unique_within_location index filter condition (which requires a parent_id). As a result, multiple output table pkeys were returned for those rows, violating the locationevent_pkeys temp table's primary key.
- 07:25 AM Revision 4620: mappings/VegCore-VegBIEN.csv: namedplace elements: _simplifyPath() calls: Removed no longer needed `require` arg, and removed no longer needed table suffix from `next` arg
- 07:02 AM Revision 4619: inputs/import.stats.xls: Updated with stats from latest import
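
Revisions 4626 and 4648 above describe two properties of filter_out_ci: it reads only the functional column by numeric index, so vocabularies may carry extra columns, and it treats a header like any other row. A minimal sketch of that behavior follows. The matching rule (lowercase, with everything except letters and digits dropped) and both function bodies are assumptions for illustration and are not taken from the actual scripts:

```python
import re

def canon(term):
    """Assumed canonical form: lowercase with all non-alphanumerics removed."""
    return re.sub(r"[^a-z0-9]", "", term.lower())

def filter_out_ci(rows, vocab_rows):
    """Keep rows whose first column is absent from the vocabulary's first column.

    Rows are indexed numerically (row[0]) rather than unpacked into a fixed
    number of names, so extra trailing columns such as comments are tolerated.
    """
    known = set(canon(vocab_row[0]) for vocab_row in vocab_rows if vocab_row)
    return [row for row in rows if row and canon(row[0]) not in known]

# Hypothetical data: the vocabulary row has extra columns, the input rows vary in width.
vocab = [["plotID", "VegCore", "free-text comment"]]
terms = [["Plot_ID", "some comment"], ["newTerm"]]
remaining = filter_out_ci(terms, vocab)
```

Because no row is special-cased, a header survives or is filtered purely on whether it matches a vocabulary entry, which is consistent with the note in revision 4663 that the header should be removed before filtering.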
09/11/2012
- 11:04 AM Revision 4618: input.Makefile: Maps validation: $(newTerms): Fixed bug where tail with positive offset needs -n flag
- 11:01 AM Revision 4617: Regenerated/modified inputs/*/*/src.csv to use the self-mapping format used by the new automapping mechanism
- 10:50 AM Revision 4616: src_map: Map source columns to themselves so that src.csv can be used directly with the new automapping mechanism
- 10:48 AM Revision 4615: input.Makefile: Maps validation: %/new_terms.csv: Remove terms which are also in %/unmapped_terms.csv, because terms are not considered new (i.e. potential Veg+ terms) until they have been mapped to an existing Veg+ term. Being unmapped has a higher priority than being new, because it affects the current datasource itself rather than the easier mapping of future datasources.
- 10:22 AM Revision 4614: lib/mappings.Makefile: missing_mappings: Display unmapped_terms.csv, new_terms.csv after generating them, to preserve the behavior of the original missing_mappings
- 10:17 AM Revision 4613: root Makefile: Maps validation: Removed no longer used $(missingMappingsCmd)
- 10:17 AM Revision 4612: input.Makefile: Maps validation: Removed no longer used $(missingMappingsCmd)
- 10:16 AM Revision 4611: lib/mappings.Makefile: Removed no longer needed missing_%_mappings targets, since unmapped_terms.csv and new_terms.csv now serve the same purpose in a more efficient way
- 10:14 AM Revision 4610: lib/mappings.Makefile: `ifndef` for $(termsSubdirs): Fixed bug where the variable checked needed to be termsSubdirs instead of missingMappingsCmd
- 10:02 AM Revision 4609: lib/mappings.Makefile: Require $(termsSubdirs)
- 10:00 AM Revision 4608: Generated global unmapped_terms.csv, new_terms.csv
- 10:00 AM Revision 4607: root Makefile: Maps validation: Added $(termsSubdirs) to enable generation of global unmapped_terms.csv, new_terms.csv
- 09:59 AM Revision 4606: inputs/: Generated combined unmapped_terms.csv, new_terms.csv for all inputs
- 09:58 AM Revision 4605: lib/mappings.Makefile: $(catTerms): Fixed bug where only existing $+ files (using $(+w)) could be included in the list (both to check and to use), because otherwise cat would raise an error or try to read stdin
- 09:56 AM Revision 4604: Existing maps discovery: Fixed bug where new unmapped_terms.csv, new_terms.csv needed to be included in $(anyMap)
- 09:52 AM Revision 4603: lib/common.Makefile: Added $(+w)
- 09:22 AM Revision 4602: lib/common.Makefile: Added $(no/) to remove trailing /
- 09:18 AM Revision 4601: Extracted %/unmapped_terms.csv, %/new_terms.csv as separate targets in the Maps validation section so they can be invoked even when %/.map.csv.last_cleanup is not a top-level target (in $(MAKECMDGOALS)). Continue to invoke them in %/.map.csv.last_cleanup by using $(selfMake).
- 08:56 AM Revision 4600: input.Makefile: Maps validation: Set $(termsSubdirs) to enable unmapped_terms.csv, new_terms.csv generation
- 08:56 AM Revision 4599: lib/mappings.Makefile: Added unmapped_terms.csv, new_terms.csv which are generated by combining the correspondingly-named files in $(termsSubdirs)
- 08:42 AM Revision 4598: input.Makefile: Maps building: %/.map.csv.last_cleanup: $(newTerms): Autoremove empty terms lists to avoid clutter
- 08:40 AM Revision 4597: Added autoremove
- 08:22 AM Revision 4596: input.Makefile: Maps building: %/.map.csv.last_cleanup: $(newTerms): Remove the CSV header from the terms lists so that multiple terms lists can easily be appended together
- 08:16 AM Revision 4595: input.Makefile: Maps building: %/.map.csv.last_cleanup: unmapped_terms.csv, new_terms.csv: Factored out commands into $(newTerms)
- 08:09 AM Revision 4594: input.Makefile: Maps building: %/.map.csv.last_cleanup: Generate reports on new and unmapped terms in map.csv
- 08:07 AM Revision 4593: Added filter_out_ci
- 07:26 AM Revision 4592: input.Makefile: Maps building: %/.map.csv.last_cleanup: Translate map.csv using $(mappings)/$(via)-VegCore.csv
- 07:25 AM Revision 4591: Added translate
- 07:08 AM Revision 4590: mappings/Veg+-VegCore.csv: Removed no longer used Comments column. Use mappings/Veg+.terms.csv to cite term definitions instead.
- 07:06 AM Revision 4589: mappings/Veg+-VegCore.csv: previousCatalogNumber: Removed no longer needed "According to" comment, because this is now documented in the mappings/Veg+.terms.csv entry. Note that the citation for any mapping is the overlap of the terms' definitions, and thus only the definitions need to be cited, not the mapping itself. (The definitions are provided in the links in mappings/Veg+.terms.csv.)
- 07:01 AM Revision 4588: mappings/Veg+.terms.csv: previousCatalogNumber: Added Source link to DwC history entry, which documents the definition of this term
- 06:43 AM Revision 4587: input.Makefile: Maps building: %/.map.csv.last_cleanup: Canonicalize map.csv using $(mappings)/$(via).vocab.csv
- 06:40 AM Revision 4586: Added canon
- 06:29 AM Revision 4585: mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.
- 05:49 AM Revision 4584: Added mappings/Veg+.vocab.csv
- 04:41 AM Revision 4583: inputs/GBIF/Specimen/map.csv: Remapped *Original fields to new verbatim* taxonomic terms
- 04:31 AM Revision 4582: mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.
- 04:23 AM Revision 4581: mappings/Veg+.terms.csv: Added min/max SlopeAspect/SlopeGradient
- 04:13 AM Revision 4580: inputs/VegBank/plot_/map.csv: Omit reallatitude/reallongitude because private data should not be placed in a public database
- 04:10 AM Revision 4579: inputs/CVS/Organism/map.csv: Omit realLatitude/realLongitude because private data should not be placed in a public database. Keeping VegBIEN free of restricted-access data allows anyone to run arbitrary queries on the database, without needing an entire security mechanism/front end just to manage users' read-only access to the data (as VegBank has). Note that the private coordinates are still accessible in the staging tables, so they will need to be locked down in order to make VegBIEN secure to public access.
- 03:16 AM Revision 4578: mappings/Veg+-VegCore.csv: Remapped QuadratID to subplotID because the standard definition of an ID term is an ID that's unique within the datasource, and it's just CTFS's usage that makes it unique only within the plot
- 03:13 AM Revision 4577: inputs/CTFS/StemObservation/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID
- 03:09 AM Revision 4576: inputs/CTFS/SubplotObservation/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID
- 03:06 AM Revision 4575: inputs/CTFS/Subplot/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID. Omit QuadratName because QuadratID is used for the same purpose.
- 02:57 AM Revision 4574: mappings/Veg+-VegCore.csv: Removed recordNumber/_alt and recordNumber redirection mappings so that Veg+-VegCore.csv contains only renamings, not business logic. Note that removing the global ordering of these fields does not affect the datasources which contain multiple recordNumber synonyms because they either have a custom ordering or one field is duplicated or unused.
- 02:49 AM Revision 4573: inputs/NY/Specimen/map.csv: Omit CollectorNumber because it is not used, so it does not need to be mapped
- 02:45 AM Revision 4572: inputs/ARIZ/Specimen/map.csv: Omit FieldNumber because it is identical to CollectorNumber, so it does not need to be mapped
- 02:19 AM Revision 4571: inputs/SpeciesLink/Specimen/map.csv: Added manual CollectorNumber mapping which places it after recordNumber/fieldNumber, so that mappings/Veg+-VegCore.csv doesn't need to maintain a global ordering between these fields and just needs to indicate their equivalency
- 02:09 AM Revision 4570: mappings/: Removed no longer needed Veg+-VegCore.to_self.csv, because multiple levels of mappings are no longer needed to get to the VegCore term
- 02:07 AM Revision 4569: mappings/Veg+-VegCore.csv: DescriptionOfSite: Mapped directly to locality rather than to locationNarrative to avoid needing multiple levels of mappings to get to the VegCore term
- 01:56 AM Revision 4568: mappings/Veg+-VegCore.csv: Removed scientificNameAuthorship/_alt and scientificNameAuthorship redirection mappings, which were only used by SpeciesLink but it now has the necessary _alts in its own map.csv
- 01:48 AM Revision 4567: mappings/Veg+-VegCore.csv: Removed dateCollected/_alt and dateCollected redirection mappings, which were only needed when multiple dateCollected fields were being combined in Veg+-VegCore.csv
- 01:45 AM Revision 4566: mappings/: Moved year/month/dayCollected mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayCollected values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateCollected field is now higher than the year/month/dayCollected fields when both are specified, because the dateCollected field presumably contains verbatim text while the year/month/dayCollected fields contain parsed date parts.
- 01:32 AM Revision 4565: inputs/SALVIAS-CSV/Organism/map.csv: Remapped census_date to eventDate, since it is not the start of a range
- 01:31 AM Revision 4564: inputs/Madidi/Plot/map.csv: Remapped First evaluation to eventDate, since it is not necessarily the start of a range
- 01:23 AM Revision 4563: mappings/VegCore-VegBIEN.csv: startDate, endDate mappings: Removed _dateRangeStart/_dateRangeEnd filters because these are assumed to already be start and end dates of a range. (eventDate should be used for concatenated date ranges.)
- 01:09 AM Revision 4562: mappings/VegCore-VegBIEN.csv: Don't map dateCollected to locationevent.obsstartdate/obsenddate because this is the date the *specimen* was collected, not the date (range) of the entire collection *event*. This distinction may not be meaningful for specimens data, but VegBIEN should reflect what the data provider designated. This also reduces the number of dateCollected-related mappings needed for any dateCollected-related field, such as year/month/dayCollected.
- 12:55 AM Revision 4561: mappings/Veg+-VegCore.csv: Removed dateIdentified/_alt and dateIdentified redirection mappings, which were only needed when multiple dateIdentified fields were being combined in Veg+-VegCore.csv
- 12:50 AM Revision 4560: mappings/: Moved year/month/dayIdentified mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayIdentified values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateIdentified field is now higher than the year/month/dayIdentified fields when both are specified, because the dateIdentified field presumably contains verbatim text while the year/month/dayIdentified fields contain parsed date parts.
- 12:34 AM Revision 4559: mappings/: Moved verbatimGrowthForm filter mapping from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic
- 12:28 AM Revision 4558: inputs/UNCC/Specimen/map.csv, inputs/NCU-NCSC/Specimen/map.csv: Remapped cultivated fields directly via new cultivated term, rather than via establishmentMeans
- 12:06 AM Revision 4557: sql_io.py: mk_errors_table(): Don't cache the sql.table_exists() query, because the table will be created and its existence must be rechecked
- 12:02 AM Revision 4556: sql.py: table_exists(): Allow caller to set whether query will be cached. This is useful if the table will later be created and its existence should be checked again.
- 12:00 AM Revision 4555: sql.py: tables(): Allow caller to set whether query will be cached
09/10/2012
- 11:51 PM Revision 4554: mappings/VegCore-VegBIEN.csv: Mapped cultivated
- 11:47 PM Revision 4553: inputs/TEAM/: Added _src/README.TXT with Brad's comments on which files to use
- 11:01 PM Revision 4552: mappings/Veg+.terms.csv: Added cultivated
- 10:35 PM Revision 4551: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Removed manual VACUUM run because this is done as part of $(exportHeader), which calls $(cleanup)
- 10:34 PM Revision 4550: input.Makefile: Staging tables installation: $(cleanup): Append output to log
- 10:21 PM Revision 4549: schemas/py_functions.sql: Added pass-through _date(timestamp) for datasource date columns that are already timestamps
- 10:12 PM Revision 4548: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Fixed bug where embedded \ in ADD COLUMN statement was not removed by the shell, because single quotes do not remove embedded \s
- 09:55 PM Revision 4547: inputs/VegBank/vegbank.~.clean_up.sql: Also rename taxonobservation.reference_id to taxonobservation_reference_id
- 09:51 PM Revision 4546: input.Makefile: Staging tables installation: $(logInstall*Add): Fixed bug where needed to only add -a flag for tee when tee was actually being used (in verbose mode), not when &> is used instead
- 09:49 PM Revision 4545: inputs/VegBank/taxonobservation_/header.csv: Updated for new renames in vegbank.~.clean_up.sql
- 09:34 PM Revision 4544: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Also log the output of commands run after create.sql
- 09:30 PM Revision 4543: input.Makefile: Staging tables installation: Factored $(call logInstall,$*/) out into $(logInstall*)
- 09:25 PM Revision 4542: schemas/py_functions.sql: Added pass-through _dateRangeStart(timestamp), _dateRangeEnd(timestamp) for datasource date columns that are already timestamps
- 09:23 PM Revision 4541: inputs/VegBank/plantconcept_/header.csv: Updated for new renames in vegbank.~.clean_up.sql
- 09:11 PM Revision 4540: inputs/VegBank/plantconcept_/create.sql: Use new plantconcept_plantnames()
- 09:09 PM Revision 4539: inputs/VegBank/vegbank.~.utils.sql: plantconcept_plantnames(): Use SQL SELECT query and WITH clause (http://www.postgresql.org/docs/8.4/static/queries-with.html) instead of temp table, because PostgreSQL does not support using temp tables inside functions that are called repeatedly (http://archives.postgresql.org/pgsql-general/2006-02/msg00516.php; it results in an "out of shared memory" error)
- 08:30 PM Revision 4538: inputs/VegBank/vegbank.~.utils.sql: Removed hardcoded schema name, which is set dynamically by input.Makefile using `SET search_path`
- 08:26 PM Revision 4537: inputs/VegBank/vegbank.~.utils.sql: Added plantconcept_plantnames()
- 07:28 PM Revision 4536: inputs/VegBank/vegbank.~.utils.sql: plantconcept_ancestors(): Made function STABLE instead of IMMUTABLE because it accesses DB tables
- 07:21 PM Revision 4535: inputs/VegBank/vegbank.~.clean_up.sql: Fixed bug where the original plantconcept table's columns needed to be renamed, rather than the derived table plantconcept_'s. Note that this script runs before any derived tables are created, so this would be the wrong place for these statements if the derived table's columns did need to be renamed.
- 07:05 PM Revision 4534: input.Makefile: Staging tables installation: $(dbExports): Sort each group of .sql files in lexical order, since $(wildcard) apparently does not sort them that way automatically on vegbiendev
- 06:55 PM Task #490 (New): change import.stats.xls to use field rather than row count
- * This will be more accurate, because different data sources have different #s of columns, and this affects the load ...
- 06:53 PM Revision 4533: inputs/import.stats.xls: Updated with stats from latest import. Corrected input row count of CTFS.TaxonOccurrence, which had been set to the inserted row count (which is right above it in the log file).
- 06:35 PM Revision 4532: schemas/vegbien.sql: taxonrank: Added comment documenting source of values
09/07/2012
- 04:57 PM Revision 4531: inputs/VegBank/taxonobservation_/map.csv: Mapped observation_id to eventID
- 04:49 PM Revision 4530: inputs/TEAM/: Added VL
- 04:43 PM Revision 4529: inputs/VegBank/: Added taxonobservation_/
- 04:43 PM Revision 4528: inputs/VegBank/: Added plantconcept_/
- 04:22 PM Revision 4527: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Ignore errors if create.sql already added a primary key
- 04:12 PM Revision 4526: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Provide the table name as a var (:table) to the query
- 03:56 PM Revision 4525: inputs/VegBank/vegbank.~.clean_up.sql: Prevent "column name specified more than once" errors when tables are joined
- 03:55 PM Revision 4524: to_do/timeline.doc: Updated to reflect additional time that validations will take, and analytical DB's dependency on it
- 02:54 PM Revision 4523: Added validation/
- 12:56 PM Revision 4522: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Time the install
- 12:54 PM Revision 4521: inputs/VegBank/: Added plantconcept_/
- 12:35 PM Revision 4520: inputs/VegBank/vegbank.~.utils.sql: plantconcept_ancestors(): Renamed ancestor_id output param to plantconcept_id for clarity and so it can be directly USING-joined with plantconcept on plantconcept_id
- 12:24 PM Revision 4519: inputs/VegBank/: Added vegbank.~.utils.sql (which runs after vegbank.sql), for use by tables' create.sql scripts
- 10:57 AM Revision 4518: inputs/import.stats.xls: Updated with stats from latest import
- 10:43 AM Revision 4517: inputs/VegBank/: Added observation_/
- 10:31 AM Revision 4516: inputs/VegBank/: Added vegbank.~.clean_up.sql (which runs after vegbank.sql), to prevent "cannot alter type of a column used by a view or rule" errors
- 10:14 AM Revision 4515: inputs/VegBank/: Added plot_/
- 10:13 AM Revision 4514: inputs/VegBank/: Added plot_/
- 10:13 AM Revision 4513: inputs/VegBank/: Added logs
- 10:12 AM Revision 4512: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Log the output to the install log, just like for other %/install targets
- 10:06 AM Revision 4511: vegbien_dest: schemas: Added public explicitly, even though it's already in the default search_path, in order to shadow any datasource's tables of the same name as a VegBIEN table (such as in VegBank). (VegBIEN tables are referenced without a schema, while datasource tables are referenced with a schema, so collisions are not a problem after this fix.)
- 09:55 AM Revision 4510: input.Makefile: Staging tables installation: sql/install: Fixed bug where needed space before \ at end of line, because one is not automatically added in a recipe command (although it's added elsewhere)
- 09:51 AM Revision 4509: sql.py: run_query(): DuplicateException: Also match "of relation" part of error message, so that parsed column name does not contain "of relation"
- 09:24 AM Revision 4508: subtract: Made it case- and punctuation-insensitive
- 09:18 AM Revision 4507: mappings/: Removed no longer needed Veg+.cs-VegBIEN.csv, which is now the same as Veg+-VegBIEN.csv which was derived from it
- 09:16 AM Revision 4506: join: Documented that it's case- and punctuation-insensitive.
- 09:16 AM Revision 4505: bin/map: map_table(): Refactored to map simplified to original column names first and then determine column index for each original name, in order to avoid trying to recover the original name from a simplified name where multiple original names might collide onto the same simplified name. Documented that it's case- and punctuation-insensitive.
- 09:11 AM Revision 4504: intersect, union: Made case- and punctuation-insensitive. mappings/Veg+-VegBIEN.csv: Removed no longer needed duplicate entries for each first letter case, which must now be removed for case- and punctuation-insensitive intersect/union to work. Note that the SpeciesLink `svn diff` hides _alt entry 0, which contains one of the removed duplicate columns that appears in the diff.
- 08:42 AM Revision 4503: bin/map: map_table(): Resolve all mappings and prefixes after applying maps.simplify()
- 08:37 AM Revision 4502: inputs/SpeciesLink/Specimen/map.csv: _alt all scientificNameAuthorship synonyms together in one _alt
- 08:27 AM Revision 4501: schemas/functions.sql: _alt(): Added extra numbered parameters. Eventually these will need to be converted to variadic args, but this will require special support from column-based import.
- 07:26 AM Revision 4500: join: Use new maps.simplify()
- 07:26 AM Revision 4499: maps.py: Added simplify()
- 07:23 AM Revision 4498: join: Match terms with non-alphanumeric chars removed
- 07:15 AM Revision 4497: join: Match terms case-insensitively
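
Note on Revisions 4497-4499: the case- and punctuation-insensitive matching described above can be sketched as follows. This is an assumption about what maps.simplify() does, based only on the commit messages (lowercase the term, then remove non-alphanumeric characters); `join_terms` is a hypothetical helper, not the actual join script.

```python
import re


def simplify(term):
    """Lowercase a term and drop every character that is not a-z or 0-9."""
    return re.sub(r'[^a-z0-9]', '', term.lower())


def join_terms(left, right):
    """Pair terms from two lists whose simplified forms are equal.

    If several terms in `right` simplify to the same key, the last one wins;
    Revision 4505 describes how bin/map avoids relying on such collisions.
    """
    by_key = {simplify(t): t for t in right}
    return [(t, by_key[simplify(t)]) for t in left if simplify(t) in by_key]
```

Under this sketch, `CollectorNumber`, `collector_number`, and `Collector Number` all simplify to the same key, so they are treated as the same term.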
09/06/2012
- 11:17 PM Revision 4496: Added inputs/TEAM/
- 10:55 PM Revision 4495: sql_io.py: put_table(): Creating the into table: into_out_pkey: If is_function, just use "result" as the output column name, without prefixing the function name. This shortens the table names of function calls on function calls, which need a fixed column name to detect which columns are function results and use just the table names for those columns.
- 10:32 PM Revision 4494: input.Makefile: Documentation: $(steps): Fixed bug where import make target needed to be changed to new single-table import target
- 09:38 PM Revision 4493: schemas/vegbien.sql: analytical_db_view: Changed LEFT JOINs to JOINs where tables contain information that's required for the analytical DB. This should also enable the PostgreSQL query planner to make additional join optimizations, in the hopes of avoiding disk-space-intensive hash joins.
- 08:42 PM Revision 4492: Replaced repr() with strings.urepr() (or equivalent) everywhere needed, to avoid future UnicodeEncodeErrors
- 08:30 PM Revision 4491: Replaced str() with strings.ustr() (or equivalent) everywhere needed, to avoid future UnicodeEncodeErrors
- 08:03 PM Revision 4490: sql.py: map_expr(): Replacing without quotes: Don't match unquoted name where it's preceded or followed by '.', because this could be a '.' embedded in a punctuation-containing column name, such as those frequently used by column-based import. Note that because database-internal names currently do not contain punctuation, this situation only occurs when a database-internal expression (such as a check constraint condition) is replaced in two steps, and the first step introduces punctuation-containing column names into the expression.
- 07:19 PM Revision 4489: schemas/vegbien.sql: project: Don't require projectname to be specified when sourceaccessioncode is provided
- 07:14 PM Revision 4488: sql_gen.py: ensure_not_null(): If type_ is set, cast the column to it if needed
- 06:56 PM Revision 4487: README.TXT: Data import: Added testing steps to perform on local machine before running the import
- 06:49 PM Revision 4486: README.TXT: Documentation: Redmine-formatted list of steps for column-based import: Updated make command for new table subdir name
- 06:27 PM Revision 4485: sql.py: run_query(): Parse "types cannot be matched" error as MissingCastException to type text
- 06:10 PM Revision 4484: sql_io.py: put_table(): Creating the into table: Fixed bug where in_pkey and out_pkey names would collide if the output and input pkeys have the same name (as is the case for SALVIAS.projects). This entails changing out_pkey to new into_out_pkey wherever the into table's out_pkey is created or referenced.
- 05:06 PM Revision 4483: sql_io.py: put_table(): Combining output and input pkeys in inserted order: Changed sql_gen.Table to sql_gen.Col when creating the column references (they have a similar effect, so using the wrong type did not cause any tests to fail)
- 04:49 PM Revision 4482: README.TXT: Added steps before the import to `svn up` and update the schemas
- 04:47 PM Revision 4481: README.TXT: Merged Backups > After a new import and Data import sections into one Data import section that contains the steps to perform and back up an import. Note that many `svn diff` lines result from a change in indentation.
- 04:35 PM Revision 4480: sql_io.py: put_table(): Combining output and input pkeys in inserted order: Fixed bug where column references would be ambiguous if the output and input pkeys have the same name (as is the case for SALVIAS.projects)
- 04:21 PM Revision 4479: schemas/functions.sql: Added _nullIf() overload where the type param has type text, to handle cases where row-based import auto-casts all args to text in response to a 'could not determine polymorphic type because input has type "unknown"' error
- 04:18 PM Revision 4478: schemas/vegbien.sql: party: Removed party_datasource unique index because it was causing problems with column-based import (due to multiple unique indexes covering the same columns in different ways), and because it prevented creation of more than one party per organization
- 03:54 PM Revision 4477: xml_func.py: _if(): Documented that it must be run to remove conditions that functions._if() can't handle
- 03:42 PM Revision 4476: README.TXT: Datasource setup: Testing: Added step to test column-based import (by_col=1), because it is stricter about types than row-based import and sometimes fails when row-based import succeeds
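
Note on Revision 4490: the rule "don't match an unquoted name where it's preceded or followed by '.'" can be sketched with regular-expression lookarounds. This is an illustrative assumption, not the actual map_expr() implementation; `replace_unquoted` is a hypothetical name.

```python
import re


def replace_unquoted(expr, old, new):
    """Replace whole-word occurrences of `old` that do not touch a '.'.

    The lookarounds reject matches adjacent to a word character or a dot,
    so a name embedded in a punctuation-containing column name is skipped.
    """
    pattern = r'(?<![\w.])' + re.escape(old) + r'(?![\w.])'
    # A function replacement keeps backslashes in `new` from being interpreted.
    return re.sub(pattern, lambda m: new, expr)
```

For example, replacing `plot` in `plot IS NOT NULL` rewrites the name, while `a.plot > 0` and `plot.id > 0` are left unchanged.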
09/05/2012
- 09:18 AM Revision 4475: schemas/functions.sql: _nullIf(): Polymorphically support other datatypes besides text
- 09:09 AM Revision 4474: bin/map: Clearing errors table: Fixed bug where needed to check if sql_io.errors_table() returned None (indicating that the errors table didn't exist) before calling sql.drop_table()
- 09:04 AM Revision 4473: bin/map: Clearing errors table: Fixed bug where needed to use sql.drop_table() instead of sql.truncate() now that errors tables are not created until column-based import runs
- 08:54 AM Revision 4472: input.Makefile: Maps validation: $(missingMappingsCmd): Fixed bug where needed to use system's sort, not bin/sort, now that bin/ is added to the PATH by this makefile
- 08:34 AM Revision 4471: inputs/SALVIAS/verify/plots.ref: Regenerated on PostgreSQL staging tables. The orders have changed slightly because this is derived from a PostgreSQL translation of the queries, with corresponding changes in collations and NULL sort orders. The counts have also changed slightly, possibly due to the changes Brad made to the salvias_plots database on nimoy after the initial version was downloaded. (The current counts are correct according to the current salvias_plots database.)
- 08:31 AM Revision 4470: inputs/SALVIAS/verify/plots.ref.sql: # locations: Fixed bug where a NULL value in LatDec or LongDec would propagate to the concatenated value, reducing its uniqueness
- 08:26 AM Task #484 (Resolved): support installing staging tables directly from a MySQL export
- 08:14 AM Revision 4469: inputs/SALVIAS/verify/plots.ref.sql: Retrofitted to work with PostgreSQL staging tables
- 07:51 AM Revision 4468: schemas/vegbien.sql: project: Added project_unique_name_date unique index for projects that don't have a sourceaccessioncode
- 07:46 AM Revision 4467: inputs/SALVIAS/plotMetadata/map.csv: Remapped project_id to project.sourceaccessioncode
- 07:37 AM Revision 4466: inputs/SALVIAS/: Added projects/
- 07:32 AM Revision 4465: input.Makefile: Sources: $(catSrcs): Fixed bug where needed to use cat_csv even if subdir was not actually a CSV table, because this also cats the header.csv file created for a subdir that references an already-installed staging table
- 07:26 AM Revision 4464: input.Makefile: Existing maps discovery: Fixed bug where top-level logs dir needed to be excluded from list of subdirs that are treated as tables
- 07:00 AM Revision 4463: my2pg: Prepend 'SET standard_conforming_strings = off;' because this defaults to on starting with PostgreSQL 9.1
- 06:41 AM Revision 4462: schemas/vegbien.sql: locationevent: Made location_id optional when sourceaccessioncode is provided, since a sourceaccessioncode is globally unique and does not require a location to scope it
- 06:36 AM Revision 4461: input.Makefile: Staging tables installation: Store install logs for full-DB exports in new logs subdir of main dir. This also fixes a bug where the install log itself was considered a DB export, because its extension was .log.sql.
- 06:33 AM Revision 4460: Added inputs/SALVIAS/logs/
- 06:33 AM Revision 4459: input.Makefile: SVN: add: Also add logs subdir of main dir, to store install logs for full-DB exports
- 06:23 AM Revision 4458: mappings/VegCore-VegBIEN.csv: if subplot: Also forward locationID and plotName to the location of the parent locationevent (in addition to the parent location of the location), in order to "complete the diamond" connecting subplot locationevent -> (parent plot locationevent, subplot location) -> parent plot location
- 06:09 AM Revision 4457: sql_io.py: cleanup_table(): NullValueException: Log the caught exception so it's clear that the update is being retried
- 06:05 AM Revision 4456: input.Makefile: Staging tables installation: %/install: Fixed bug where $(if $(isRef)) needed to be checked before $(if $(nonXml)) because a subdir referencing an already-installed staging table must be treated specially by ignoring its autogenerated header.csv file, and not trying to install that file as if it were itself CSV data
- 05:49 AM Revision 4455: my2pg, my2pg.data: Fixed bug where replacement for '0000-00-00' date needed to be wrapped in single quotes
- 05:45 AM Revision 4454: input.Makefile: sql/install: Log the installation of a full-DB export to a log file in the main dir
- 05:38 AM Revision 4453: input.Makefile: Staging tables installation: %/install: Factored out stderr logging into $(logInstall)
- 05:35 AM Revision 4452: input.Makefile: Support empty subdirs referencing an already-installed staging table everywhere, by replacing $(isCsv) with new $(nonXml) where needed
- 05:22 AM Revision 4451: inputs/SALVIAS/: Switched to using the DB export's staging tables instead of the exported CSVs
- 05:08 AM Revision 4450: input.Makefile: Staging tables installation: Treat empty subdirs as referencing an already-installed staging table, and run cleanup and header export operations on them
- 04:48 AM Revision 4449: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Factored out cleanup and header export operations for reuse in other types of table subdirs
- 04:23 AM Revision 4448: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Removed deprecated (but benign) errors_table_only option to csv2db. Run csv2db without a command in order to clean up the created staging table.
- 03:57 AM Revision 4447: sql_io.py: cleanup_table(): Removed no longer used cols param
- 03:56 AM Revision 4446: csv2db: When no command is specified, just clean up the specified table
- 03:55 AM Revision 4445: sql_io.py: cleanup_table(): Always clean up all columns in the table
- 03:43 AM Revision 4444: sql_io.py: cleanup_table(): Handle NullValueExceptions (due to setting values to NULL in a NOT NULL column) by dropping the NOT NULL constraint
- 03:32 AM Revision 4443: sql.py: Added drop_not_null()
- 03:29 AM Revision 4442: sql_gen.py: is_text_col(): Also consider character varying to be a text type
- 03:07 AM Revision 4441: csv2db: Removed no longer used errors_table_only option
- 03:00 AM Revision 4440: README.TXT: Schema changes: Removed step to reinstall errors tables, because they are now created automatically by column-based import
- 02:59 AM Revision 4439: csv2db: Removed no longer needed creation of errors table, because it is now created automatically by column-based import
- 02:58 AM Revision 4438: input.Makefile: Staging tables installation: $(dbExports): Fixed bug where it would be non-empty even when the input contains no DB exports, because += adds extra whitespace. This caused sql/install to be incorrectly included as part of $(allInstalls).
- 02:49 AM Revision 4437: db_xml.py: put_table(): Create errors table if it doesn't exist
- 02:48 AM Revision 4436: sql_io.py: Added mk_errors_table()
- 02:05 AM Revision 4435: inputs/Makefile: Input data: $(rsyncSrcs): Also exclude logs subdirs located at more than one level below the root, which occurs for example when a table subdir is moved into _archive/
- 01:56 AM Revision 4434: input.Makefile: Staging tables installation: sql/install: Fixed bug where _always was part of $+, causing cat to try to cat this nonexistent file
- 01:51 AM Revision 4433: Added inputs/SALVIAS/salvias_plots.schema.sql
- 01:50 AM Revision 4432: Added inputs/SALVIAS/_MySQL/
- 01:47 AM Revision 4431: input.Makefile: Staging tables installation: MySQL exports: Run all non-data-only exports through my2pg, not just schema-only exports. This supports transforming a combined schema+data export.
- 01:42 AM Revision 4430: my2pg: Also perform data-only replacements, since default values can contain data-specific replacements. This also allows my2pg to transform a combined schema+data export.
- 01:39 AM Revision 4429: input.Makefile: Staging tables installation: Also translate MySQL data to PostgreSQL
- 01:38 AM Revision 4428: Added my2pg.data
- 01:28 AM Revision 4427: input.Makefile: Staging tables installation: Place MySQL exports in separate _MySQL/ subdir so they don't clutter up the main dir, which will contain PostgreSQL translations
- 01:03 AM Revision 4426: Added my2pg
- 01:02 AM Revision 4425: input.Makefile: Staging tables installation: DB exports: Concatenate all exports together, with schemas first, so that any config options which were applied only in the schema export will remain active when the data is imported. Changed `%.pg.sql: %.my.sql` to `%.schema.sql: %.schema.my.sql` so there doesn't need to be a .pg suffix for PostgreSQL schemas and only the schema gets translated.
- 12:15 AM Revision 4424: input.Makefile: Staging tables installation: $(dbExports): Don't consider MySQL DB exports as part of the DB exports that get installed, because they are not directly installable
- 12:13 AM Revision 4423: input.Makefile: Staging tables installation: Added `%.pg.sql: %.my.sql` to translate MySQL DB schemas to PostgreSQL
09/04/2012
- 09:20 PM Revision 4422: inputs/SALVIAS/_src/: Added salvias_plots.sql.url to provide a link to where salvias_plots.sql was exported from (it was not a raw file given to us by the data provider)
- 08:57 PM Revision 4421: Added cc_tty
- 08:57 PM Revision 4420: inputs/input.Makefile: `%: %.make`: Don't automatically redirect stderr to a log file, because some .make scripts need to display password prompts, etc. on the TTY and output them to stderr instead of /dev/tty
- 08:49 PM Revision 4419: inputs/REMIB/nodes.make: Fixed bin dir path for new subdir layout
- 08:48 PM Revision 4418: inputs/SpeciesLink/tapir.make: Write log messages to a log file ($0.log) instead of to stderr, because the verbose log messages should not fill up stderr. To view the progress, you should instead tail the created log file.
- 08:41 PM Revision 4417: inputs/REMIB/nodes.make: Updated path to node exports to use new subdir layout (in Specimen subdir, and without .specimens suffix)
- 08:38 PM Revision 4416: inputs/REMIB/nodes.make: Fixed lib dir path in sys.path.append() for new subdir layout
- 08:37 PM Revision 4415: inputs/REMIB/nodes.make: Write log messages to a log file ($0.log) instead of to sys.stderr, because the verbose log messages should not fill up stderr. To view the progress, you should instead tail the created log file.
- 08:23 PM Revision 4414: input.Makefile: Add the bin folder to the PATH so .make scripts can easily use programs in it
- 08:06 PM Revision 4413: input.Makefile: Staging tables installation: Support installing a DB export directly into the staging schema, without needing to first export it as CSVs
- 07:52 PM Revision 4412: inputs/SALVIAS/: Added _src/ subdir to store original DB export (before re-export in a PostgreSQL-compatible form)
- 07:31 PM Revision 4411: input.Makefile: `%: %.make`: Only remake if the target doesn't exist. This prevents unintentional remaking when the make script is newly checked out from svn (which sets the mod time to now) but the output is synced externally.
- 07:23 PM Revision 4410: input.Makefile: `%: %.make`: Removed no longer applicable comment, which applied when there were two separate `%: %.make`-related rules
- 06:55 PM Revision 4409: input.Makefile: Use $(inDatasrc) wherever its value was used
- 06:54 PM Revision 4408: input.Makefile: Added $(inDatasrc)
- 06:40 PM Revision 4407: sql_io.py: cleanup_table(): Only clean up text columns, to support staging tables with other column types
- 06:40 PM Revision 4406: sql_gen.py: Added is_text_col()
- 06:29 PM Revision 4405: sql_io.py: cleanup_table(): Add table to each column so its type can later be determined from the DB
- 06:13 PM Revision 4404: inputs/NY/verify/specimens.ref: Regenerated from specimens.ref.sql. The counts have changed slightly because this is derived directly from the NY CSV file, rather than from the nybg_raw BIEN2 staging table.
- 06:11 PM Revision 4403: inputs/NY/verify/specimens.ref.sql: Retrofitted to use PostgreSQL instead of MySQL syntax, since this now runs on the PostgreSQL staging tables
- 06:09 PM Revision 4402: input.Makefile: Verification of import: Added `%.ref: %.ref.sql` rule to make datasource's summary statistics from its staging tables. (This was previously run on a MySQL installation of the datasource, and thus limited to MySQL inputs, but we are now able to use the staging tables for this.)
- 06:04 PM Revision 4401: input.Makefile: Verification of import: $(verify): Factored psql command with output format settings into separate $(psqlExport) var
- 05:57 PM Revision 4400: schemas/vegbien.sql: analytical_db_view: Switched join order of location and party (datasource) tables, to facilitate using a nested loop join to fill in the datasource names
- 05:55 PM Revision 4399: schemas/vegbien.sql: party: Added party_datasource index on just the organizationname to facilitate querying just the datasources
- 04:25 PM Revision 4398: schemas/vegbien.sql: make_analytical_db(): Removed explicit schema reference so that the function can be redirected to use the current (rotated) schema using the search_path
08/31/2012
- 08:32 PM Revision 4397: schemas/Makefile: Removed no longer needed analytical_db, which has been replaced by bin/make_analytical_db
- 08:31 PM Revision 4396: README.TXT: After a new import: Use bin/make_analytical_db instead of `make schemas/analytical_db`, and run it asynchronously because it takes a long time
- 08:29 PM Revision 4395: Added make_analytical_db
- 08:22 PM Revision 4394: schemas/Makefile: Analytical DB: analytical_db: Time the creation of the analytical DB
- 08:18 PM Revision 4393: README.TXT: After a new import: Added command to make the analytical DB
- 08:15 PM Revision 4392: schemas/Makefile: Added analytical_db target
- 08:09 PM Revision 4391: schemas/vegbien.sql: Added make_analytical_db() and helper view analytical_db_view. Note that adding a view which depends on other tables will cause those tables to be reordered in dependency order to appear before the view, causing the svn diff to change completely even though the DB structure has only been added to.
- 08:05 PM Revision 4390: schemas/vegbien.sql: Removed OIDs from tables because we don't use them (tables have primary keys instead)
- 06:55 PM Task #486 (New): add unit-conversion mechanism
- * This is primarily needed for DBH, plot area, and elevation/depth
* Make quantities with units be a tuple type cont...
- 02:34 PM Task #485 (New): track data provider's citation requirements in VegBIEN
- * Some providers require them to be cited on any analysis that's conducted with their data:
** "Forest Plots Databas...
- 02:23 PM Revision 4389: inputs/import.stats.xls: Updated with stats from latest import. This now includes CTFS.TaxonOccurrence (presence-only observations), FIA (11 million rows!), and Madidi.Organism. The addition of FIA almost doubles the # of rows to 26 million and increases the import time from 9.5 to 11.5 hours.
- 02:08 PM Task #483 (Rejected): rename staging table columns according to map.csv
- Reinstalling staging tables whenever a mapping changes is not a good idea.
Renaming will instead continue to occur d...
08/30/2012
- 04:54 PM Revision 4388: sql_io.py: null_strs: Added 'UNKNOWN'
- 04:02 PM Revision 4387: Added inputs/FIA/
- 12:45 PM Revision 4386: inputs/: Renamed subfolders to VegCSV names, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-to-VegCSV-names>
- 12:37 PM Revision 4385: inputs/Madidi/1.organisms/map.csv: Mapped columns
- 11:46 AM Revision 4384: inputs/Madidi/0.plots/map.csv: Remapped DMS Latitude/Longitude to verbatimLatitude/verbatimLongitude, since this is not the decimalLatitude/decimalLongitude
- 11:40 AM Revision 4383: input.Makefile: Testing: %-ok: Rename the test output to the accepted test output instead of copying it, because outputs of successful (including newly accepted) tests should be removed to reduce clutter (as $(runTest) does)
- 11:35 AM Revision 4382: mappings/Veg+-VegCore.csv: Remapped CTFS QuadratID to subplot rather than subplotID, because it's only unique within the parent plot, not globally unique, in CTFS
- 11:23 AM Revision 4381: inputs/import.stats.xls: Updated with stats from latest import. This now includes the core CTFS tables.
- 11:10 AM Revision 4380: Added inputs/VegBank/ with DB export
- 11:04 AM Revision 4379: input.Makefile: General targets: `%: %.make`: Don't always remake the target whenever it's visited, as other targets may depend on this file and it should not be remade whenever they are visited
- 11:00 AM Revision 4378: input.Makefile: General targets: `%: %.make`: Changed log file suffix to .log, because this log does not necessarily contain SQL statements
- 10:57 AM Revision 4377: input.Makefile: General targets: `%: %.make`: Time the creating command
- 10:55 AM Revision 4376: input.Makefile: General targets: Removed duplicate `%: %.make` rule
- 10:43 AM Revision 4375: inputs/CTFS/TaxonOccurrence/map.csv: Documented that InfraSpecificLevel is unused
- 10:42 AM Revision 4374: inputs/CTFS/TaxonOccurrence/map.csv: Documented that InfraSpecificLevel is unused
- 10:32 AM Revision 4373: mappings/Veg+-VegCore.csv: Mapped speciesInvID
- 10:27 AM Revision 4372: mappings/Veg+.terms.csv: Added speciesInvID
- 10:25 AM Revision 4371: mappings/VegCore-VegBIEN.csv: Mapped taxonOccurrenceID
- 10:22 AM Revision 4370: mappings/Veg+.terms.csv: Added taxonOccurrenceID
- 10:14 AM Revision 4369: inputs/CTFS/: Added TaxonOccurrence/ and its joined tables
- 10:13 AM Revision 4368: inputs/CTFS/: Added TaxonOccurrence/ and its joined tables
- 10:06 AM Revision 4367: inputs/CTFS/_archive/Organism.VegX/README.TXT: Added calculation of StemObservation rows distribution for each plot, which indicates that the bci plot actually contains 90% of the StemObservation rows. This brings the size inflation of VegX down to ~6x.
- 09:42 AM Revision 4366: inputs/CTFS/_archive/Organism.VegX/: Added README.TXT describing that this VegX export includes only *one* of 157 CTFS plots. This is important, because it indicates that VegX creates a ~1000x (!) increase in storage size (613.6 MB for bci.sql with 157 plots vs. 3.78 GB for VegX_CTFS_row_*.xml with 1 plot, assuming roughly equal #s of stems per plot).
- 09:08 AM Revision 4365: inputs/CTFS/StemObservation/map.csv: Remapped StemID to authorStemCode since it's only unique within the parent organism (Tree), not a globally unique ID as is required for stemID
- 09:05 AM Revision 4364: mappings/VegCore-VegBIEN.csv: Mapped authorStemCode
- 08:58 AM Revision 4363: mappings/Veg+.terms.csv: Added authorStemCode
- 08:58 AM Revision 4362: mappings/VegCore-VegBIEN.csv: Mapped stemID
- 08:52 AM Revision 4361: inputs/SALVIAS/2.stems/map.csv: Mapped stem_id
- 08:49 AM Task #484 (Resolved): support installing staging tables directly from a MySQL export
- * requires (re-)exporting MySQL DB with `--compatible=postgresql`: http://dev.mysql.com/doc/refman/5.6/en/mysqldump....
- 08:46 AM Revision 4360: README.TXT: Datasource setup: Added steps to install any MySQL export
- 08:13 AM Revision 4359: mappings/VegCore-VegBIEN.csv: Mapped stemID
- 08:10 AM Revision 4358: mappings/Veg+-VegCore.csv: Mapped stem_id
- 08:05 AM Revision 4357: repl: Support treating all patterns as plain text (non-regexp)
- 07:52 AM Revision 4356: mappings/Veg+.terms.csv: Added stem_id
- 07:51 AM Revision 4355: mappings/Veg+.terms.csv: Added stemID
- 07:44 AM Revision 4354: mappings/Veg+-VegCore.csv: Mapped speciesName, subSpeciesName
- 07:43 AM Revision 4353: mappings/Veg+.terms.csv: Added CTFS taxonomic name columns
- 07:28 AM Revision 4352: mappings/Veg+.terms.csv: Removed comments not applicable to the term itself
- 07:25 AM Revision 4351: Inputs with multiple tables: Added explicit import_order.txt files, so that sort orders can later be removed from the subdir names
08/29/2012
- 11:17 PM Revision 4350: inputs/CTFS/: Added StemObservation/ and tables it is joined from
- 11:09 PM Revision 4349: mappings/Veg+-VegCore.csv: Mapped stemTag
- 11:08 PM Revision 4348: mappings/Veg+.terms.csv: Added stemTag
- 11:04 PM Revision 4347: mappings/Veg+-VegCore.csv: Mapped DBH
- 11:02 PM Revision 4346: mappings/Veg+.terms.csv: Added DBH
- 10:58 PM Revision 4345: input.Makefile: Maps building: Added comment that you cannot make a subdir separately from the entire datasource dir
- 10:17 PM Revision 4344: inputs/CTFS/Plot/create.sql: Added newline at end of file
- 10:04 PM Revision 4343: inputs/CTFS/: Renamed Site.src to Plot.src to use a VegCSV name for the table
- 10:01 PM Revision 4342: README.TXT: Datasource setup: Adding input data for each table: `make inputs/<datasrc>/<table>/add`: Added note explaining why you need to use this command instead of just creating an empty directory of the desired name
- 08:44 PM Revision 4341: inputs/CTFS/: Added SubplotObservation/
- 08:38 PM Revision 4340: mappings/VegCore-VegBIEN.csv: Redirect eventID, fieldNumber (authoreventcode) to parent locationevent when subplot columns exist
- 08:23 PM Revision 4339: inputs/CTFS/import_order.txt: Added PlotObservation
- 08:23 PM Revision 4338: inputs/CTFS/PlotObservation/: Remade (hadn't been automatically remade because it wasn't part of import_order.txt)
- 08:13 PM Revision 4337: mappings/VegCore-VegBIEN.csv: Also redirect locationID/plotName to parent location if subplotID column was provided
- 08:08 PM Revision 4336: mappings/VegCore-VegBIEN.csv: location.authorlocationcode mappings: Use _first to remove specimens-related alternatives for this field from consideration when plots-related alternatives exist. This avoids unintentionally using specimens-related columns for this field in plots data.
- 08:06 PM Revision 4335: xml_func.py: Added _first() simplifying function
- 08:05 PM Revision 4334: xml_func.py: Added helper functions variadic_args() and map_names()
- 07:38 PM Revision 4333: mappings/VegCore-VegBIEN.csv: location.authorlocationcode mappings: Placed inside "if subplot" _if statement along with sourceaccessioncode to reduce the number of separate _if statements needing a condition mapping
- 07:32 PM Revision 4332: xml_dom.py: NodeEntryIter: Support entries with multiple children
- 07:20 PM Revision 4331: xml_dom.py: replace(): Support a list of new nodes to replace the old node with
- 07:01 PM Revision 4330: xml_dom.py: Moved only_child() near related method has_one_child()
- 07:00 PM Revision 4329: xml_dom.py: only_child(): Raise exception instead of failing assertion. Include invalid node in exception message for easier debugging.
- 06:57 PM Revision 4328: xml_dom.py: Added only_child() and use it where its definition was used
- 06:33 PM Revision 4327: mappings/VegCore-VegBIEN.csv: Changed _merge to _join wherever the duplicate-eliminating functionality of _merge is not needed and a simple concatenation of non-NULL values is sufficient
- 06:24 PM Revision 4326: xml_func.py: Added _join() simplifying function
- 06:22 PM Revision 4325: schemas/functions.sql: Added _join()
- 06:18 PM Revision 4324: mappings/VegCore-VegBIEN.csv: Moved "if subplot" _if statement around /location/parent_id and /location/sourceaccessioncode themselves, so that only one _if cond mapping for subplot is needed. Note that this is only possible because this _if statement uses _exists, allowing it to be fully evaluated by the XML template simplifying mechanism, which supports subtrees as arguments to _if.
- 06:06 PM Revision 4323: mappings/VegCore-VegBIEN.csv: Removed no longer used parentLocationID, parentPlotName (locationID and plotName now automatically map to the correct location). mappings/Veg+-VegCore.csv: Removed no longer used parentPlotID.
- 05:57 PM Revision 4322: xml_func.py: passthru(): Use xml_dom.prune() so that after empty children are removed, the node itself is also removed if it's empty. This enables further pruning of any node that contains the pruned node.
- 05:55 PM Revision 4321: xml_dom.py: Added prune()
- 05:52 PM Revision 4320: xml_func.py: Removed no longer used prune() (use xml_dom.prune_children() instead)
- 05:51 PM Revision 4319: xml_func.py: Use new xml_dom.prune_children()
- 05:51 PM Revision 4318: xml_dom.py: Added prune_empty() and prune_children()
- 05:29 PM Revision 4317: inputs/CTFS/: Moved VegX export subdir to _archive and renamed it to remove ".disabled" suffix and have a VegCSV-like name
- 05:24 PM Revision 4316: inputs/CTFS/: Renamed README.TXT to DFtemp.analysis_query.txt because it relates only to a particular query from Shash, and moved it to the _archive/ subdir
- 05:21 PM Revision 4315: inputs/CTFS/: Moved source files into new _src/ subdir to avoid cluttering up the main dir
- 05:16 PM Revision 4314: Added inputs/CTFS/_src/
- 05:02 PM Revision 4313: inputs/CTFS/: Added non-data files that weren't under version control
- 04:59 PM Revision 4312: inputs/CTFS/: Moved _scripts_to_drop_extra_tables to _archive because they are for a different version of the CTFS database than the extract we received (bci.sql)
- 04:57 PM Revision 4311: inputs/CTFS/: Moved DBv5.txt to _archive because it's for a different version of the CTFS database than the extract we received (bci.sql)
- 04:49 PM Revision 4310: inputs/CTFS/: Moved CTFS_conversion_bci.php to _archive since it's just for the DFtemp (aggregated) mapping
- 04:48 PM Revision 4309: Added inputs/CTFS/_archive
- 04:39 PM Revision 4308: inputs/import.stats.xls: Updated with stats from latest import
- 04:16 PM Task #483 (Resolved): rename staging table columns according to map.csv
- * This will allow us to have just one VegCore-VegBIEN mapping, with each staging table already using VegCore column n...