Activity - BIEN 3 - NCEAS Projects

Activity

From 09/11/2012 to 10/10/2012

10/10/2012

11:43 AM Revision 5427: input.Makefile: $(exts): Added .dmp: Aaron Marcuse-Kubitza
11:43 AM Revision 5426: csvs.py: delims: Added |: Aaron Marcuse-Kubitza
11:28 AM Revision 5425: Removed no longer used inputs/.public/. Use inputs/.TNRS/ and inputs/.TNRS/tnrs/tnrs.make instead.: Aaron Marcuse-Kubitza
11:23 AM Revision 5424: README.TXT: Documentation: To import and scrub just the test taxonomic names: Added steps to restore the original DB when the test scrub is complete: Aaron Marcuse-Kubitza
11:22 AM Revision 5423: inputs/test_taxonomic_names/test_scrub: Also export the results to inputs/test_taxonomic_names/_scrub/: Aaron Marcuse-Kubitza
11:06 AM Revision 5422: inputs/test_taxonomic_names/test_scrub: Use regular for .. in loop with a list of what's being processed in each iteration (match_input_names, parse_accepted_names): Aaron Marcuse-Kubitza
10:58 AM Revision 5421: inputs/.TNRS/tnrs/map.csv: Mapped Genus_score, Specific_epithet_score: Aaron Marcuse-Kubitza
10:56 AM Revision 5420: mappings/VegCore-VegBIEN.csv: Mapped matchedGenusFit_fraction, matchedSpeciesFit_fraction. Reordered canon_concept_fit_fraction _maxs in the order they would be used if _alt were being used instead.: Aaron Marcuse-Kubitza
10:52 AM Revision 5419: mappings/VegCore.csv: Added matchedSpeciesFit_fraction: Aaron Marcuse-Kubitza
10:47 AM Revision 5418: mappings/VegCore.csv: matchedFamilyFit_fraction: Source the "matched" to Family_matched, which is a closer fit than Name_matched. matchedGenusFit_fraction: Fixed Genus_matched source to use #detailed_download instead of #simple_download.: Aaron Marcuse-Kubitza
10:42 AM Revision 5417: mappings/VegCore.csv: Added matchedGenusFit_fraction: Aaron Marcuse-Kubitza
10:18 AM Revision 5416: README.TXT: Removed extra trailing whitespace: Aaron Marcuse-Kubitza
10:18 AM Revision 5415: README.TXT: Documentation: To import and scrub just the test taxonomic names: Use new inputs/test_taxonomic_names/test_scrub: Aaron Marcuse-Kubitza
10:17 AM Revision 5414: Added inputs/test_taxonomic_names/test_scrub: Aaron Marcuse-Kubitza
10:01 AM Revision 5413: schemas/vegbien.sql: taxonconcept: Renamed canon_taxonconcept_id to canon_concept_id to shorten the name, which is used often: Aaron Marcuse-Kubitza
09:45 AM Revision 5412: schemas/vegbien.sql: taxonconcept: Added taxonconcept_canon_concept_min_fit() trigger to remove the canon_concept_id link from insufficient matches. These occur when, e.g., a name in another language is approximated to a Latin name, or when the input name is not a proper taxon but TNRS provides a best-guess match anyway.: Aaron Marcuse-Kubitza
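A rough Python sketch of the rule the trigger enforces, for readers without the SQL at hand. The cutoff MIN_FIT is a hypothetical value (the log does not state the real threshold), the row is modeled as a plain dict, and leaving NULL fits alone is also an assumption:

```python
from typing import Any, Dict

# Hypothetical cutoff; the real threshold used by the trigger is not given in this log.
MIN_FIT = 0.8


def apply_canon_concept_min_fit(row: Dict[str, Any]) -> Dict[str, Any]:
    """Drop the canon_concept_id link when the match is too weak.

    Mirrors a BEFORE INSERT/UPDATE trigger: the row is returned, possibly
    modified, rather than rejected. A NULL fit is left untouched (assumption).
    """
    fit = row.get("canon_concept_fit_fraction")
    if fit is not None and fit < MIN_FIT:
        row["canon_concept_id"] = None  # insufficient match: unlink
    return row
```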
09:42 AM Revision 5411: inputs/.TNRS/tnrs/map.csv: Mapped Family_score to new matchedFamilyFit_fraction: Aaron Marcuse-Kubitza
09:39 AM Revision 5410: mappings/VegCore-VegBIEN.csv: Use matchedFamilyFit_fraction as canon_concept_fit_fraction when greater than matchedTaxonFit_fraction, because if there is at least a matched family, there is a valid taxonconcept to attach to: Aaron Marcuse-Kubitza
09:39 AM Revision 5409: xml_func.py: Simplifying functions: Added _min, _max as passthroughs: Aaron Marcuse-Kubitza
09:34 AM Revision 5408: schemas/functions.sql: Added _max(), _min(): Aaron Marcuse-Kubitza
09:21 AM Revision 5407: mappings/VegCore.csv: Added matchedFamilyFit_fraction: Aaron Marcuse-Kubitza
09:04 AM Revision 5406: mappings/VegCore-VegBIEN.csv: Remapped matchedTaxonFit_fraction to the verbatim* taxonconcept, because this is actually for the verbatim* concept's fit to the matched concept, not the matched concept's fit to the accepted concept: Aaron Marcuse-Kubitza
08:59 AM Revision 5405: inputs/.TNRS/tnrs/map.csv: Restored *-prefixed output terms for unmapped terms that had initially been mapped to OMIT but could reasonably match to something in the future. Continue mapping Name_number to OMIT because it isn't globally unique (it identifies the name only within one TNRS batch).: Aaron Marcuse-Kubitza
08:45 AM Revision 5404: inputs/.TNRS/tnrs/map.csv: Mapped Overall_score to new matchedTaxonFit_fraction: Aaron Marcuse-Kubitza
08:44 AM Revision 5403: mappings/VegCore-VegBIEN.csv: Mapped matchedTaxonFit_fraction to _set_canon_taxonconcept(canon_concept_fit_fraction): Aaron Marcuse-Kubitza
08:37 AM Revision 5402: mappings/VegCore.csv: Added matchedTaxonFit_fraction: Aaron Marcuse-Kubitza
08:20 AM Revision 5401: schemas/vegbien.sql: _set_canon_taxonconcept(): Also set the canon_concept_fit_fraction: Aaron Marcuse-Kubitza
08:10 AM Revision 5400: schemas/vegbien.sql: taxonconcept: Added canon_concept_fit_fraction to store the closeness of fit of the canon_concept: Aaron Marcuse-Kubitza
07:55 AM Revision 5399: schemas/vegbien.sql: taxonconcept: Renamed canon_taxonconcept_id to canon_concept_id to shorten the name, which is used often: Aaron Marcuse-Kubitza
07:10 AM Revision 5398: sql.py: mk_update(): in_place: Convert columns of type character varying to text so that they can be merge-joined with text columns. Note that these two types are equivalent but not aliases of one another, so the explicit type change is needed.: Aaron Marcuse-Kubitza
07:07 AM Revision 5397: sql_gen.py: Added canon_type(): Aaron Marcuse-Kubitza
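A minimal sketch of how a canon_type() helper could drive the in-place conversion described in Revision 5398. The mapping table and the mk_canon_type_alter() helper are hypothetical; the real sql_gen.canon_type() may cover more types, and identifier quoting here does not escape embedded quotes:

```python
from typing import Optional

# Hypothetical mapping of equivalent-but-distinct types to one canonical name.
_CANON_TYPES = {
    "character varying": "text",
}


def canon_type(type_: str) -> str:
    """Canonical name for a column type, so equivalent types compare equal."""
    return _CANON_TYPES.get(type_, type_)


def mk_canon_type_alter(table: str, col: str, type_: str) -> Optional[str]:
    """ALTER statement converting col to its canonical type, or None if already canonical."""
    canon = canon_type(type_)
    if canon == type_:
        return None
    return f'ALTER TABLE "{table}" ALTER COLUMN "{col}" TYPE {canon}'
```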
06:52 AM Revision 5396: sql.py: mk_update(): in_place: Factored retrieval of column type out into separate statement for clarity: Aaron Marcuse-Kubitza
06:27 AM Revision 5395: schemas/functions.sql: _join*(): Fixed bug where it returned '' instead of NULL when only NULL inputs were provided, because array_to_string() always returns a non-NULL string. Functions must always return NULL in place of '' to ensure that empty strings do not find their way into VegBIEN, and to prevent inconsistencies between row-based and column-based import (row-based import folds empty strings to NULL while column-based import relies on having a clean input table).: Aaron Marcuse-Kubitza
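The NULL-instead-of-'' rule can be sketched in Python. join_nonnull() is a hypothetical name; like array_to_string(), it skips NULL elements, but it then folds the empty result to None instead of returning '':

```python
from typing import Iterable, Optional


def join_nonnull(sep: str, values: Iterable[Optional[str]]) -> Optional[str]:
    """Join the non-NULL values; return None (NULL), never '', when nothing remains."""
    joined = sep.join(v for v in values if v is not None)
    return joined or None  # fold '' to NULL so empty strings never reach the DB
```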
06:10 AM Revision 5394: sql_io.py: cleanup_table(): Use sql.table_pkey_col() instead of sql.pkey_col() so that only an actual pkey column is removed from the list of columns to clean. This fixes a bug where the first column in the table was not cleaned up if there was no pkey. Note that this bug only affected newly re-created staging tables, because staging tables previously had a special row_num pkey column added if they did not already have a pkey. The row_num column is now added by column-based import instead.: Aaron Marcuse-Kubitza
05:51 AM Revision 5393: sql.py: table_pkey_col(): Raise a DoesNotExistException if the table has no pkey: Aaron Marcuse-Kubitza
05:23 AM Revision 5392: sql.py: pkey_col(): Call table_pkey_col() directly rather than via pkey_name(). pkey_name(): Call pkey_col() instead of table_pkey_col() now that pkey_col() calls table_pkey_col().: Aaron Marcuse-Kubitza
05:14 AM Revision 5391: sql.py: pkey_col(): Documented that if there is no pkey, returns the first column in the table: Aaron Marcuse-Kubitza
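The difference between table_pkey_col() and pkey_col(), and why cleanup_table() needed the stricter one (Revision 5394), in a Python sketch. The Table dataclass is a hypothetical stand-in for a catalog lookup, and cols_to_clean() is an illustrative name, not the real sql.py/sql_io.py API:

```python
from dataclasses import dataclass
from typing import List, Optional


class DoesNotExistException(Exception):
    """Raised when a requested database object (here, a pkey) is missing."""


@dataclass
class Table:  # hypothetical stand-in for what the database catalog reports
    name: str
    cols: List[str]
    pkey: Optional[str] = None


def table_pkey_col(table: Table) -> str:
    """The table's actual pkey column; raise if it has none."""
    if table.pkey is None:
        raise DoesNotExistException(f"{table.name} has no pkey")
    return table.pkey


def pkey_col(table: Table) -> str:
    """Like table_pkey_col(), but fall back to the first column if there is no pkey."""
    try:
        return table_pkey_col(table)
    except DoesNotExistException:
        return table.cols[0]


def cols_to_clean(table: Table) -> List[str]:
    """Every column except a real pkey; using pkey_col() here would skip cols[0]."""
    try:
        pkey = table_pkey_col(table)
    except DoesNotExistException:
        return list(table.cols)
    return [c for c in table.cols if c != pkey]
```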
05:13 AM Revision 5390: sql.py: pkey_col(): Specify recover directly as a kw_arg because it's the only kw_arg passed to pkey_name(): Aaron Marcuse-Kubitza
05:10 AM Revision 5389: sql.py: Added table_pkey_col() and use it in pkey_name(): Aaron Marcuse-Kubitza
05:01 AM Revision 5388: sql.py: Renamed pkey() to pkey_name(): Aaron Marcuse-Kubitza
04:45 AM Revision 5387: sql.py: Renamed pkey_col_() to pkey_col(): Aaron Marcuse-Kubitza
04:43 AM Revision 5386: sql.py: Removed no longer used pkey_col: Aaron Marcuse-Kubitza
04:43 AM Revision 5385: db_xml.py: cleanup_table(): Inline sql.pkey_col ('row_num') because this is the only place it's used: Aaron Marcuse-Kubitza
04:37 AM Revision 5384: cleanup_table(): Use new sql.table_cols() instead of sql.table_col_names(): Aaron Marcuse-Kubitza
04:36 AM Revision 5383: sql.py: Added table_cols(): Aaron Marcuse-Kubitza
04:16 AM Revision 5382: db_xml.py: put(): Fixed bug: avoid truncating the pkeys_loc table, in case it's the same as one of the in_tables. This occurs now that sql_io.put_table() passes through the actual input column instead of the joined-together input table's column when ignoring all rows.: Aaron Marcuse-Kubitza
03:33 AM Revision 5381: sql_io.py: put_table(): Resolving default value column: If ignoring all rows, use input cols directly instead of cols from joined-together input table. In addition to being simpler, this prevents the returned column's name from growing longer and longer as each iteration prepends its input table's name to the default value column name.: Aaron Marcuse-Kubitza
03:07 AM Revision 5380: sql_io.py: put_table(): Moved changing the table of the default value column from Resolving the default value column to Setting pkeys of missing rows, because the table change is only needed in this section: Aaron Marcuse-Kubitza
03:04 AM Revision 5379: sql_io.py: put_table(): Resolving default value column: Always call sql_gen.remove_col_rename() because it will just pass the value through if it's not a column: Aaron Marcuse-Kubitza
02:41 AM Revision 5378: sql_gen.py: simplify_parens(): Removed extra simplify_parens() at end because it is done in the final iteration that performs no other replacements, so it is not necessary to also do it explicitly: Aaron Marcuse-Kubitza
02:30 AM Revision 5377: sql_io.py: put_table(): Replaced limit_ref integer with ignore_all_ref boolean, because it is no longer used as a select statement limit: Aaron Marcuse-Kubitza
02:29 AM Revision 5376: sql_io.py: put_table(): remove_all_rows(): Corrected "just create an empty pkeys table" comment to "just return the default value column": Aaron Marcuse-Kubitza
02:27 AM Revision 5375: sql_io.py: put_table(): mk_main_select(): Removed setting limit to limit_ref[0], because an empty pkeys table is no longer created when ignoring all rows: Aaron Marcuse-Kubitza
02:19 AM Revision 5374: sql_io.py: put_table(): Setting pkeys of missing rows: Removed "limit_ref[0] == 0" check because this code is never reached in that case: Aaron Marcuse-Kubitza
02:16 AM Revision 5373: sql_io.py: put_table(): Ignoring all rows for unrecoverable errors: Even in multi-row mode, just return whatever the default value or column was, instead of creating an output table containing the default value filled in for every row. This also assists the optimization to skip empty levels of taxonconcepts, because it folds the empty level to that level's parent level rather than creating a whole new temp table with ultimately the same contents.: Aaron Marcuse-Kubitza
01:57 AM Revision 5372: sql_gen.py: not_false_re, not_true_re: Appended \b to ensure that true/false is only matched as a single word: Aaron Marcuse-Kubitza
01:56 AM Revision 5371: sql_gen.py: simplify_expr(): Also simplify "NOT false" to true: Aaron Marcuse-Kubitza
01:53 AM Revision 5370: sql_gen.py: simplify_expr(): Also simplify "NOT true" to false: Aaron Marcuse-Kubitza
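A small Python sketch of the NOT folding and of why the trailing \b matters. The patterns are hypothetical stand-ins modeled on not_true_re/not_false_re; the real ones in sql_gen.py may differ (e.g. in case handling):

```python
import re

# Hypothetical patterns; word boundaries on both sides so "NOT trueish" is not touched.
not_true_re = re.compile(r"\bNOT true\b")
not_false_re = re.compile(r"\bNOT false\b")


def simplify_not(expr: str) -> str:
    """Fold NOT true -> false and NOT false -> true (case-sensitive)."""
    expr = not_true_re.sub("false", expr)
    expr = not_false_re.sub("true", expr)
    return expr
```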
01:24 AM Revision 5369: sql_io.py: put_table(): ignore_cond(): Changed "Ignoring rows where" message with the negated (filter-out) condition to "Ignoring rows that don't satisfy" with the filter condition for clarity: Aaron Marcuse-Kubitza
01:22 AM Revision 5368: sql_io.py: put_table(): ignore_cond(): If cond simplifies to false, remove all rows instead of filtering out individual rows which will all be filtered out. This optimization should improve import times of tables, such as taxonconcept, which use a check constraint instead of NOT NULL constraints to prevent empty rows. The taxonomic schema refactoring caused the creation of many more levels of taxonconcepts, many of which (such as variety, forma, cultivar) are empty for most datasources, so this optimization should also reduce overall import times for datasources that have any empty levels of taxonconcept. Note that this optimization is only possible now that sql_gen.simplify_expr() is able to simplify all the way to a single boolean value for the taxonconcept_required_key constraint.: Aaron Marcuse-Kubitza
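The shortcut can be illustrated on an in-memory list, assuming the condition's SQL text has already been simplified. ignore_cond() here is a hypothetical in-memory analogue; the real put_table() works on database tables:

```python
from typing import Callable, Iterable, List, TypeVar

Row = TypeVar("Row")


def ignore_cond(rows: Iterable[Row], cond_sql: str,
                cond: Callable[[Row], bool]) -> List[Row]:
    """Keep only rows satisfying cond.

    cond_sql is the already-simplified SQL text of cond. If it is the literal
    false, no row can pass, so all rows are dropped without evaluating cond.
    """
    if cond_sql.strip() == "false":
        return []  # remove all rows at once
    return [r for r in rows if cond(r)]
```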
12:55 AM Revision 5367: Moved expression transforming functions from sql.py to sql_gen.py because they do not manipulate an actual database and merely generate SQL: Aaron Marcuse-Kubitza
12:38 AM Revision 5366: sql.py: Added true_expr, false_expr and use them where their values are used: Aaron Marcuse-Kubitza
12:34 AM Revision 5365: sql.py: simplify_expr(): Also simplify "AND true" expressions: Aaron Marcuse-Kubitza
12:30 AM Revision 5364: sql.py: simplify_expr(): Also simplify "AND false" expressions: Aaron Marcuse-Kubitza
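The AND true / AND false rules, sketched on a flat conjunction. This list-based version is a swapped-in technique, not the regex approach sql.py uses, and it assumes no OR and no parentheses so that splitting on " AND " finds the real operands:

```python
def simplify_and(expr: str) -> str:
    """Simplify a flat conjunction such as "a AND true AND b".

    Assumes no OR and no parentheses; the keyword match is case-sensitive.
    """
    terms = [t.strip() for t in expr.split(" AND ")]
    if "false" in terms:
        return "false"  # x AND false -> false
    terms = [t for t in terms if t != "true"]  # x AND true -> x
    return " AND ".join(terms) if terms else "true"
```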
12:19 AM Revision 5363: sql.py: Added atom_re and use it in simplify_parens(): Aaron Marcuse-Kubitza
12:19 AM Revision 5362: sql.py: Added or_re and use it in simplify_expr(): Aaron Marcuse-Kubitza
12:18 AM Revision 5361: sql.py: logic_op_re(): Added expr_re param for an expr on the other side of the operator: Aaron Marcuse-Kubitza

10/09/2012

11:54 PM Revision 5360: sql.py: simplify_parens(): Use bool_re: Aaron Marcuse-Kubitza
11:54 PM Revision 5359: sql.py: Removed no longer needed paren_re(): Aaron Marcuse-Kubitza
11:53 PM Revision 5358: sql.py: true_re, false_re: Removed no longer needed paren_re() because simplify_parens() now handles this: Aaron Marcuse-Kubitza
11:50 PM Revision 5357: sql.py: simplify_expr(): Removed final simplify_parens() because this is now done by simplify_recursive(): Aaron Marcuse-Kubitza
11:49 PM Revision 5356: sql.py: simplify_expr(): Use new simplify_recursive(). This also fixes a bug where some logic expressions are not simplified because of extra parens.: Aaron Marcuse-Kubitza
11:48 PM Revision 5355: sql.py: Added simplify_recursive(): Aaron Marcuse-Kubitza
11:31 PM Revision 5354: sql.py: simplify_parens(): Also remove parens around true and false: Aaron Marcuse-Kubitza
11:26 PM Revision 5353: regexp.py: sub_nested(): Use new sub_recursive(): Aaron Marcuse-Kubitza
11:25 PM Revision 5352: regexp.py: Added sub_recursive(): Aaron Marcuse-Kubitza
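A sketch of the repeat-until-fixed-point idea behind sub_recursive(), which lets simplify_parens() peel nested parens one layer per pass. The signature and paren_bool_re are hypothetical; the lookbehind skips function-call parens like f(true), but IN lists are not handled. The loop only terminates if each replacement eventually stops producing new matches:

```python
import re
from typing import Union


def sub_recursive(pattern: Union[str, re.Pattern], repl: str, string: str) -> str:
    """Apply re.sub() repeatedly until the string stops changing."""
    while True:
        new = re.sub(pattern, repl, string)
        if new == string:
            return new
        string = new


# Hypothetical: strip grouping parens around true/false.
paren_bool_re = re.compile(r"(?<!\w)\((true|false)\)")
```

For example, `sub_recursive(paren_bool_re, r"\1", "((true))")` removes one layer per pass and returns `true`.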
11:21 PM Revision 5351: sql.py: simplify_expr(): Use new simplify_parens(): Aaron Marcuse-Kubitza
11:20 PM Revision 5350: sql.py: Added simplify_parens(): Aaron Marcuse-Kubitza
11:14 PM Revision 5349: sql.py: simplify_expr(): Use new regexp.sub_nested(): Aaron Marcuse-Kubitza
11:14 PM Revision 5348: Added regexp.py: Aaron Marcuse-Kubitza
10:46 PM Revision 5347: sql.py: simplify_expr(): Use new logic_op_re(): Aaron Marcuse-Kubitza
10:46 PM Revision 5346: sql.py: Added logic_op_re(): Aaron Marcuse-Kubitza
10:40 PM Revision 5345: sql.py: bool_re: Use new true_re, false_re: Aaron Marcuse-Kubitza
10:40 PM Revision 5344: sql.py: Added true_re, false_re: Aaron Marcuse-Kubitza
10:37 PM Revision 5343: sql.py: bool_re: Use new paren_re(): Aaron Marcuse-Kubitza
10:36 PM Revision 5342: sql.py: bool_re: Use new paren_re(): Aaron Marcuse-Kubitza
10:36 PM Revision 5341: sql.py: Added paren_re(): Aaron Marcuse-Kubitza
10:31 PM Revision 5340: sql.py: simplify_expr(): Combined replacements of bool_re+' OR ' with the value in either order into one replacement: Aaron Marcuse-Kubitza
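One way a single alternation can cover both operand orders, sketched in Python. The patterns are hypothetical and only handle a two-operand OR of single words; matching the whole expression with fullmatch() keeps AND/OR precedence from being broken:

```python
import re

# Hypothetical patterns: one alternation per rule covers both operand orders.
_or_true_re = re.compile(r"true OR \w+|\w+ OR true")
_or_false_re = re.compile(r"false OR (\w+)|(\w+) OR false")


def simplify_or(expr: str) -> str:
    """Simplify a two-operand OR whose operands are single words."""
    if _or_true_re.fullmatch(expr):
        return "true"  # true OR x, x OR true -> true
    m = _or_false_re.fullmatch(expr)
    if m:
        return m.group(1) or m.group(2)  # false OR x, x OR false -> x
    return expr
```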
10:27 PM Revision 5339: mappings/VegCore-VegBIEN.csv: verbatim* taxonconcept: Don't store Name_submitted in taxonomicnamewithauthor in addition to identifyingtaxonomicname, because the fields other than identifyingtaxonomicname are meant to store parsed values rather than raw, unscrubbed values and TNRS does not directly provide a concatenated taxonomic name with author: Aaron Marcuse-Kubitza
10:23 PM Revision 5338: mappings/VegCore-VegBIEN.csv: verbatim* taxonconcept: Don't create hierarchy of parent taxonconcepts, because the parsed names (rather than the names for the matched taxonconcept) are from the input taxonomic name, rather than from the official tree of life used by TNRS. Otherwise, if a taxonomic name provides e.g. no family (common), a separate genus taxonconcept would have been created with no parent_id, which would not compare equal to the matched taxonconcept's genus *with* a parent_id. Continue to store the parsed family, genus, species in the family, genus, species cached fields, because the parsed family is often different from the matched taxonconcept's family when e.g. no family is provided in the taxonomic name.: Aaron Marcuse-Kubitza
10:16 PM Revision 5337: sql.py: Renamed table_cols() to table_col_names() for clarity, because it does not return sql_gen.Col objects: Aaron Marcuse-Kubitza
10:12 PM Revision 5336: inputs/.TNRS/tnrs/test.xml.ref: Accepted new inserted row count. The change is most likely from several revisions back, but the cause of the change is unknown (it is not due to the updated TNRS.tnrs table, which is still sorted with the same rows first).: Aaron Marcuse-Kubitza
09:09 PM Revision 5335: sql_gen.py: is_text_col(): Use new is_text_type(): Aaron Marcuse-Kubitza
09:09 PM Revision 5334: sql_gen.py: Added is_text_type(): Aaron Marcuse-Kubitza
09:05 PM Revision 5333: sql_gen.py: ensure_not_null(): Documented that NULL has no type, hence the NoUnderlyingTableException being re-raised: Aaron Marcuse-Kubitza
09:04 PM Revision 5332: sql_gen.py: ensure_not_null(): Just store the column type in col_type, instead of storing typed_col and using typed_col.type, now that other info in typed_col is no longer needed: Aaron Marcuse-Kubitza
09:02 PM Revision 5331: sql_gen.py: ensure_not_null(): Use is_nullable() instead of determining nullability itself, for clarity: Aaron Marcuse-Kubitza
08:59 PM Revision 5330: sql_gen.py: is_nullable(): Fixed bug where non-columns could not be sent to db.col_info(): Aaron Marcuse-Kubitza
08:53 PM Revision 5329: sql_gen.py: ensure_not_null(): Always remove_col_rename() the column to ensure that it is acceptable by helper functions like is_nullable(): Aaron Marcuse-Kubitza
08:11 PM Revision 5328: lib/PostgreSQL-MySQL.csv: COMMENT statement: Fixed bug where the ending ; could match a ; embedded in the comment, by requiring it to be preceded by ' and followed by a newline: Aaron Marcuse-Kubitza
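The effect of anchoring the statement end to `';` plus a newline, shown with Python's re. The pattern and strip_comments() are hypothetical, and removing the statement is only for illustration; what the CSV actually does with a matched COMMENT statement is not stated here:

```python
import re

# Hypothetical pattern: the statement ends only at a ; directly after the
# closing quote and before a newline, so a ; inside the comment text is skipped.
comment_stmt_re = re.compile(r"COMMENT ON .*? IS '.*?';\n", re.DOTALL)


def strip_comments(sql: str) -> str:
    """Remove COMMENT ON ... IS '...'; statements from a SQL dump."""
    return comment_stmt_re.sub("", sql)
```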
08:07 PM Revision 5327: schemas/vegbien.sql: taxonconcept: family, genus, species comments: Changed "scoping" to "identifying" for clarity: Aaron Marcuse-Kubitza
08:06 PM Revision 5326: schemas/vegbien.sql: taxonconcept: family, genus, species: Added comment that each is a cached field for easy querying and the scoping version of it is stored in the chain of parent_id ancestors: Aaron Marcuse-Kubitza
08:03 PM Revision 5325: schemas/vegbien.sql: taxonconcept: taxonconcept_unique: Removed family, genus, species because these are now just cached fields for analytical_db_view rather than scoping fields. The scoping versions of these fields are stored in the chain of parent_id ancestors.: Aaron Marcuse-Kubitza
07:42 PM Revision 5324: tnrs_db: Moved "Processing # taxonconcepts" log message to before waiting or exiting if no taxonconcepts left, so that it would be printed right after the query is run and say that no taxonconcepts were found: Aaron Marcuse-Kubitza
07:39 PM Revision 5323: tnrs_db: Updated comments and log messages for schema changes: Aaron Marcuse-Kubitza
07:33 PM Revision 5322: tnrs_db: Updated query for schema changes: Aaron Marcuse-Kubitza
07:33 PM Revision 5321: README.TXT: Schema changes: files to update with renamings: Added bin/tnrs_db: Aaron Marcuse-Kubitza
07:25 PM Revision 5320: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
07:04 PM Revision 5319: README.TXT: Data import: Changed `inputs/*/*/logs` to `inputs/{.,}*/*/logs` to also include the TNRS names import log: Aaron Marcuse-Kubitza

10/08/2012

09:58 PM Revision 5318: import_all: Added commands to import TNRS names so the user doesn't have to do this manually: Aaron Marcuse-Kubitza
09:55 PM Revision 5317: sql.py: map_expr(): Fixed bug where names were being matched inside punctuated names replaced in previous calls of map_expr(): Aaron Marcuse-Kubitza
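The log doesn't say how the fix works. A sketch of one approach, assuming replaced names are emitted as double-quoted identifiers: quoted identifiers are matched first and copied through, so a later call can't match a name inside them. map_name() is hypothetical, not the real map_expr() signature, and it does no word-boundary check, for brevity:

```python
import re


def map_name(expr: str, name: str, repl: str) -> str:
    """Replace bare occurrences of name in expr with repl, skipping "quoted" identifiers."""
    pattern = re.compile(r'("[^"]*")|' + re.escape(name))

    def sub(m) -> str:
        if m.group(1) is not None:
            return m.group(0)  # inside a quoted identifier: leave it alone
        return repl

    return pattern.sub(sub, expr)
```

Chaining two calls, `map_name(map_name("a + b", "a", '"a.b"'), "b", '"c"')` yields `"a.b" + "c"`, whereas a plain str.replace() chain would also rewrite the b inside `"a.b"`.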
09:45 PM Revision 5316: schemas/vegbien.sql: party: party_required_key: Only allow NULL organizationname if party is not a root party (i.e. creator_id != party_id): Aaron Marcuse-Kubitza
09:39 PM Revision 5315: mappings/VegCore-VegBIEN.csv: Mapped to new taxonconcept.creationdate: Aaron Marcuse-Kubitza
09:37 PM Revision 5314: schemas/vegbien.sql: taxonconcept: taxonconcept_required_key: Added creationdate as an allowable minimum field when parent_id (containing the associated hierarchical concept) is specified: Aaron Marcuse-Kubitza
09:30 PM Revision 5313: schemas/vegbien.sql: taxonconcept: taxonconcept_required_key: Removed family and genus because these are now cached fields only, and are not used for scoping a taxonconcept. Instead, *taxonomicname and taxonname+parent_id are used for this purpose. This removes several leaf taxonconcepts with insufficient scoping information to create a taxonconcept separate from the main tree. With the upcoming population of creationdate, some of these taxonconcepts will reappear due to the date's additional distinguishing information.: Aaron Marcuse-Kubitza
09:16 PM Revision 5312: schemas/vegbien.sql: taxonconcept: Added creationdate (the date the taxonconcept was created or defined), and include it in the taxonconcept_unique unique index: Aaron Marcuse-Kubitza
09:05 PM Revision 5311: schemas/vegbien.sql: taxonconcept: Added comment with the definition of a taxon: "a group of one (or more) populations of organism(s), which a taxonomist adjudges to be a unit" (http://en.wikipedia.org/wiki/Taxon). This is useful in clarifying that our taxon concepts are intended to serve a similar purpose, by storing one person's defined taxon.: Aaron Marcuse-Kubitza
08:58 PM Revision 5310: schemas/vegbien.sql: taxonconcept: taxonconcept_required_key: Removed family and genus because these are now cached fields only, and are not used for scoping a taxonconcept. Instead, *taxonomicname and taxonname+parent_id are used for this purpose.: Aaron Marcuse-Kubitza
08:54 PM Revision 5309: schemas/vegbien.sql: taxonconcept: Moved identifyingtaxonomicname near other full-taxonomic-name-related fields, after the fields that contain just the current level's component of the full name: Aaron Marcuse-Kubitza
08:48 PM Revision 5308: schemas/vegbien.sql: taxonconcept.canon_taxonconcept_id: Changed four-level hierarchy to use "parsed concept" and "matched concept" instead of concatenated and parsed, because the directly-parsed name components actually go in level 2 of the hierarchy (the TNRS input name), while the name components based on the matched taxon concept go in level 3: Aaron Marcuse-Kubitza
08:44 PM Revision 5307: schemas/vegbien.sql: taxonconcept.parent_id: Documented that while a taxon *name* may have multiple parents, a taxon *concept* has only one, based on the creator's opinion of where that taxonconcept goes in the taxonomic hierarchy: Aaron Marcuse-Kubitza
08:38 PM Revision 5306: mappings/VegCore-VegBIEN.csv: taxonconcept: Moved infraspecific taxonconcept to its own level, rather than combining it with the level that contains the full taxonomic name and author (as well as any morphospecies), for consistency with the storage of other ranked taxonomic name components, which each get their own taxonconcept. The infraspecific taxon concept is general to all parties making identifications (within a datasource), while the concatenated name and author and any morphospecies are specific to the person who defined the taxonconcept used by a taxondetermination.: Aaron Marcuse-Kubitza
08:05 PM Revision 5305: schemas/vegbien.sql: taxonconcept: Removed no longer used higher- and infraspecific taxonomic rank fields because these terms are now stored in their own taxonconcepts. family, genus, and species have not been removed because these are used to cache names of parent taxa for fast access by analytical_db_view.: Aaron Marcuse-Kubitza
07:57 PM Revision 5304: schemas/vegbien.sql: analytical_db_view: Changed taxonMorphospecies to use taxonconcept.taxonname, where any morphospecies is now stored: Aaron Marcuse-Kubitza
07:53 PM Revision 5303: mappings/VegCore-VegBIEN.csv: infraspecific taxonomic terms: Removed mappings to first-class taxonconcept fields because these terms are now stored in their own taxonconcepts, or in the lowest-level taxonconcept as the taxonname and rank: Aaron Marcuse-Kubitza
07:43 PM Revision 5302: mappings/VegCore-VegBIEN.csv: higher-level taxonomic terms: Removed mappings to first-class taxonconcept fields because these terms are now stored in their own taxonconcepts: Aaron Marcuse-Kubitza
07:41 PM Revision 5301: schemas/vegbien.sql: taxonconcept: Merged taxonconcept_unique_within_creator_by_name unique index into taxonconcept_unique_within_parent, placed parent_id first, and removed index condition, so that this index can be used as a lookup index by taxonconcept_update_ancestors() (which requires no index condition in order to apply to *all* taxonconcepts) in addition to serving as a unique index. Note that an index condition should not be necessary for the index's uniquifying task, because if a set of taxonconcepts provides only the identifyingtaxonomicname, that should collide in the taxonconcept_unique_within_creator_by_identifying_name unique index before this index collides. This assumes that the collision order when multiple indexes collide is alphabetical by the index name.: Aaron Marcuse-Kubitza
07:16 PM Task #486: add unit-conversion mechanism: All applicable VegBIEN fields have unit suffixes. Most corresponding VegCore terms also have unit suffixes. Aaron Marcuse-Kubitza
07:15 PM Task #499 (Resolved): map example terms into the taxonomic schema: See "README.TXT":https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/entry/README.TXT section "To import a... Aaron Marcuse-Kubitza
06:38 PM Revision 5300: schemas/vegbien.sql: taxonconcept: taxonconcept_required_key check constraint: Also allow a taxonconcept to have just an author when it has a parent_id, so that an author can uniquely identify a taxon within a more general taxon, such as a species name, that has no author: Aaron Marcuse-Kubitza
06:22 PM Revision 5299: strings.py: concat(): Fixed bug where end index of returned str0 portion would wrap around to a negative number if str1 itself was too long, causing incorrect truncation: Aaron Marcuse-Kubitza
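A sketch of the truncation and the clamp, assuming concat() appends str1 to str0 and shortens str0 so the total fits in a maximum length (the real parameter names may differ). Without the clamp, a negative slice end counts back from the end of str0, so `"abc"[:5 - 7]` keeps `"a"` instead of nothing. Whether the real function also truncates an over-long str1 isn't stated; this sketch returns it unchanged:

```python
def concat(str0: str, str1: str, max_len: int) -> str:
    """Append str1 to str0, truncating str0 so the result fits in max_len."""
    end = max(0, max_len - len(str1))  # clamp: never let the end index go negative
    return str0[:end] + str1
```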
05:44 PM Revision 5298: schemas/vegbien.sql: taxonconcept: Renamed taxonconcept_unique_within_parent to taxonconcept_unique because the index does not apply only to taxonconcepts with a parent, and because it's the primary unique index for taxonconcept: Aaron Marcuse-Kubitza
05:42 PM Revision 5297: schemas/vegbien.sql: taxonconcept: Renamed taxonconcept_unique_within_creator_by_identifying_name to taxonconcept_0_unique_identifying_name to ensure that it is always applied before taxonconcept_unique_within_parent if both collide: Aaron Marcuse-Kubitza
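The ordering assumption behind this rename and the next one (Revision 5298), checked with Python's sorted(). Python compares strings by code point, like PostgreSQL's C collation; a locale-aware collation could order digits and underscores differently:

```python
from typing import List


def first_index(names: List[str]) -> str:
    """Index assumed to apply first when several collide: first by name."""
    return sorted(names)[0]


# After the rename to taxonconcept_unique, the old identifying-name index would
# sort second (a prefix sorts first); "0" (U+0030) sorts before "u" (U+0075).
old_pair = ["taxonconcept_unique", "taxonconcept_unique_within_creator_by_identifying_name"]
new_pair = ["taxonconcept_unique", "taxonconcept_0_unique_identifying_name"]
```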
05:36 PM Revision 5296: schemas/vegbien.sql: taxonconcept: Merged taxonconcept_unique_within_creator_by_name unique index into taxonconcept_unique_within_parent, placed parent_id first, and removed index condition, so that this index can be used as a lookup index by taxonconcept_update_ancestors() (which requires no index condition in order to apply to *all* taxonconcepts) in addition to serving as a unique index. Note that an index condition should not be necessary for the index's uniquifying task, because if a set of taxonconcepts provides only the identifyingtaxonomicname, that should collide in the taxonconcept_unique_within_creator_by_identifying_name unique index before this index collides. This assumes that the collision order when multiple indexes collide is alphabetical by the index name.: Aaron Marcuse-Kubitza
04:47 PM Revision 5295: mappings/VegCore-VegBIEN.csv: taxonconcepts: Also create the taxonconcept tree for taxonconcepts created from original*, verbatim*, and accepted* taxonomic terms: Aaron Marcuse-Kubitza
04:35 PM Revision 5294: mappings/VegCore-VegBIEN.csv: taxonconcepts: Also create the taxonconcept tree if datasource provided separated components of the taxonomic name and/or its own tree of life with higher classifications. This enables storing the datasource's own tree of life to supplement any official tree (TROPICOS, USDA, etc.).: Aaron Marcuse-Kubitza
04:25 PM Revision 5293: mappings/VegCore-VegBIEN.csv: taxonconcept tree: Don't map infraspecificEpithet+taxonRank to a taxonconcept in the tree of parent concepts because it has already been mapped to the primary, lowest-level taxonconcept: Aaron Marcuse-Kubitza
04:00 PM Revision 5292: schemas/vegbien.sql: taxonconcept: taxonconcept_unique_within_creator_by_name unique index: Fixed bug where index filter overlapped with taxonconcept_unique_within_parent's index filter, causing these unique indexes to sometimes both apply at the same time and prevent column-based import from correctly choosing which index to use for each taxonconcept import: Aaron Marcuse-Kubitza
01:15 PM Revision 5291: schemas/vegbien.ERD.mwb: Fixed lines: Aaron Marcuse-Kubitza
01:02 PM Revision 5290: schemas/vegbien.sql: taxonconcept.canon_taxonconcept_id comment: Changed comment to use "concept" rather than "name" where applicable. Documented that a synonym between taxonconcepts of different sources is indicated by choosing one taxonconcept to be authoritative and pointing the other taxonconcept to it using this field.: Aaron Marcuse-Kubitza

10/05/2012

10:52 PM Revision 5289: sql_io.py: put_table(): Resolving default value column: Fixed bug where the default value col needed to have its table changed from in_table to full_in_table if it's a table column, and needed to have any column rename removed if it's a literal value: Aaron Marcuse-Kubitza
10:29 PM Revision 5288: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
10:28 PM Revision 5287: schemas/vegbien.ERD.mwb: Fixed lines: Aaron Marcuse-Kubitza
10:23 PM Revision 5286: schemas/vegbien.sql: Renamed plant* taxonomic tables -> taxon*, as part of the taxonomic schema refactoring at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/2012-10-03_conference_call#Taxonomic-schema-refactoring>: Aaron Marcuse-Kubitza
10:15 PM Revision 5285: schemas/vegbien.ERD.mwb: Rearranged to fit more of location table on the diagram, using the newly available space from taxon: Aaron Marcuse-Kubitza
10:00 PM Revision 5284: schemas/vegbien.ERD.mwb: Fixed lines: Aaron Marcuse-Kubitza
09:59 PM Revision 5283: schemas/tree_cross-links.sql: Synced with schema, updating with new table names: Aaron Marcuse-Kubitza
09:54 PM Revision 5282: schemas/vegbien.sql: Removed no longer used taxon table. Use taxonconcept instead.: Aaron Marcuse-Kubitza
09:51 PM Revision 5281: schemas/vegbien.sql: taxonconcept.taxonname: comment: Stated that this is the name of the taxon within its parent taxon: Aaron Marcuse-Kubitza
09:48 PM Revision 5280: schemas/vegbien.sql: taxonconcept: comment: Removed no longer accurate comment that an accepted taxonconcept points to the identified taxon in the tree of life, because it *is* the identified taxon in the tree of life: Aaron Marcuse-Kubitza
09:39 PM Revision 5279: schemas/filter_ERD.csv: Changed the table with the visible fkey from plant* to taxon* to be plantstatus rather than plantusage, since it contains more core fields: Aaron Marcuse-Kubitza
09:25 PM Revision 5278: schemas/vegbien.sql: taxonconcept: Removed taxon_id, since taxonconcept now contains all the information needed to represent a taxonomic hierarchy, including both conceptual and nomenclature information: Aaron Marcuse-Kubitza
09:20 PM Revision 5277: schemas/vegbien.sql: plantusage: Point just to taxonconcept instead of both to taxonconcept and taxon: Aaron Marcuse-Kubitza
09:16 PM Revision 5276: schemas/vegbien.sql: taxonconcept: rank, verbatimrank comments: Added info from corresponding fields in taxon that also applies to taxonconcept: Aaron Marcuse-Kubitza
09:14 PM Revision 5275: schemas/vegbien.sql: taxonconcept: comment: Added info from taxon that also applies to taxonconcept: Aaron Marcuse-Kubitza
09:06 PM Revision 5274: schemas/vegbien.sql: Added taxonconcept_ancestor cross-link table: Aaron Marcuse-Kubitza
08:40 PM Revision 5273: schemas/vegbien.sql: taxonconcept: Added description field: Aaron Marcuse-Kubitza
08:38 PM Revision 5272: mappings/VegCore-VegBIEN.csv: Remapped taxon hierarchy for accepted taxonconcepts to taxonconcept parent_id hierarchy: Aaron Marcuse-Kubitza
08:12 PM Revision 5271: schemas/vegbien.sql: Fixed bug where taxonconcept.parent_id was missing a foreign key constraint: Aaron Marcuse-Kubitza
08:10 PM Revision 5270: schemas/vegbien.sql: taxonconcept: Changed instructions for including a taxon name at a rank with no explicit column to create a parent taxonconcept for it and point to it using parent_id instead of using otherranks. Removed no longer used otherranks field.: Aaron Marcuse-Kubitza
08:05 PM Revision 5269: schemas/vegbien.sql: taxonconcept: taxonconcept_required_key check constraint: Added taxonname: Aaron Marcuse-Kubitza
07:58 PM Revision 5268: schemas/vegbien.sql: taxonconcept: taxonconcept_unique_within_creator_by_name unique index: Removed duplicate entry for creator_id: Aaron Marcuse-Kubitza
07:57 PM Revision 5267: schemas/vegbien.sql: taxonconcept: Added parent_id to point to the parent taxonconcept: Aaron Marcuse-Kubitza
07:56 PM Revision 5266: sql_gen.py: null_sentinels: Added 'unknown' for taxonrank: Aaron Marcuse-Kubitza
07:44 PM Revision 5265: schemas/vegbien.sql: taxonrank: Added 'unknown': Aaron Marcuse-Kubitza
07:30 PM Revision 5264: mappings/VegCore-VegBIEN.csv: Also map *taxonRank to taxonconcept.rank, so that if it's in the taxonrank enum, it will automatically populate this field: Aaron Marcuse-Kubitza
07:14 PM Revision 5263: mappings/VegCore-VegBIEN.csv: Remapped *infraspecificEpithet to new taxonconcept.taxonname rather than placing it in subspecies prefixed with the taxonRank, because it isn't necessarily the subspecies and because taxonname is defined to contain the lowest-rank portion of the taxonomic name. Note that when both morphospecies and infraspecificEpithet are provided, infraspecificEpithet takes priority for the taxonname field, because if TNRS leaves unmatched terms (which are tentatively mapped to morphospecies) but also matches an infraspecificEpithet, then the unmatched terms can't be for a morphospecies (because an infraspecificEpithet and therefore also a specificEpithet was matched, so the species is definite and formally named).: Aaron Marcuse-Kubitza
06:45 PM Revision 5262: schemas/vegbien.sql: taxonconcept: Renamed morphospecies to taxonname since it's used in the same way as taxon.taxonname: to store the lowest-rank portion of the taxonomic name, such as the morphospecies suffix: Aaron Marcuse-Kubitza
06:21 PM Revision 5261: inputs/.TNRS/tnrs/map.csv: Mapped *_matched terms that are both matched in the input name and which correspond to the matched taxonconcept (Genus_matched, Specific_epithet_matched, etc.) to both the input and matched taxonconcepts: Aaron Marcuse-Kubitza
06:09 PM Revision 5260: inputs/.TNRS/tnrs/map.csv: Mapped terms matched in the original string (rather than deduced from the matched taxonconcept) to new verbatim* taxonomic terms: Aaron Marcuse-Kubitza
06:03 PM Revision 5259: mappings/VegCore-VegBIEN.csv: Mapped verbatim* taxonomic terms to the TNRS input taxonconcept: Aaron Marcuse-Kubitza
05:48 PM Revision 5258: mappings/VegCore-VegBIEN.csv: TNRS input taxonconcept: Split single _if statement controlling where morphospecies goes into two _if statements for each case, so that other verbatim* terms don't need to have an _if statement in their mapping to the input taxonconcept: Aaron Marcuse-Kubitza
05:29 PM Revision 5257: mappings/VegCore.csv: Added back verbatim* taxonomic terms, which will now be used for the TNRS input taxonconcept. Note that they will have a different meaning than the original* taxonomic terms that they were renamed to in r5062.: Aaron Marcuse-Kubitza
05:22 PM Revision 5256: mappings/VegCore-VegBIEN.csv: In TNRS mode, remapped morphospecies (Unmatched_terms) to the input name's taxonconcept, because this does not relate to the matched taxon concept: Aaron Marcuse-Kubitza
05:12 PM Revision 5255: mappings/VegCore-VegBIEN.csv: TNRS-only mappings: Switch them on when verbatimScientificNameWithAuthorship is provided rather than when acceptedScientificNameWithAuthorship is provided, because it's the presence of a separate TNRS input name that really determines when TNRS is being mapped: Aaron Marcuse-Kubitza
05:07 PM Revision 5254: Makefiles: .last_cleanup targets: Also make the file that's being cleaned up .PRECIOUS so it doesn't get deleted if the .last_cleanup target has an error: Aaron Marcuse-Kubitza
05:04 PM Revision 5253: Makefiles: .last_cleanup targets: Make each individual target .PRECIOUS (don't delete on error) because just making %.last_cleanup precious doesn't seem to prevent deletion: Aaron Marcuse-Kubitza

10/04/2012

11:19 PM Revision 5252: mappings/VegCore-VegBIEN.csv: Mapped *taxonRank to new taxonconcept.verbatimrank: Aaron Marcuse-Kubitza
11:15 PM Revision 5251: schemas/vegbien.sql: taxonconcept: Added rank, verbatimrank analogous to those fields in taxon: Aaron Marcuse-Kubitza
09:59 PM Revision 5250: Makefiles: Don't delete %.last_cleanup on error because it's a mod time record rather than a generated file, and so that it's left at the last successful cleanup time when a cleanup operation is cancelled: Aaron Marcuse-Kubitza
09:52 PM Revision 5249: input.Makefile: Maps building: %/.map.csv.last_cleanup: Removed no longer accurate comment about mappings being autoremoved: Aaron Marcuse-Kubitza
09:34 PM Revision 5248: inputs/.TNRS/tnrs/map.csv: Remapped Name_submitted to new verbatimScientificNameWithAuthorship to create an additional level of taxonconcept for the concatenated (TNRS input) name separate from the parsed (TNRS output) name: Aaron Marcuse-Kubitza
09:33 PM Revision 5247: mappings/VegCore-VegBIEN.csv: Mapped verbatimScientificNameWithAuthorship as an additional level of taxonconcept for the concatenated (TNRS input) name separate from the parsed (TNRS output) name: Aaron Marcuse-Kubitza
09:26 PM Revision 5246: schemas/vegbien.sql: taxonconcept.canon_taxonconcept_id: comment: Changed three-level hierarchy to four-level hierarchy which separates the concatenated (TNRS input) name from the parsed (TNRS output) name: Aaron Marcuse-Kubitza
09:22 PM Revision 5245: mappings/VegCore.csv: Added back verbatimScientificNameWithAuthorship, which will now be used to store the TNRS input name: Aaron Marcuse-Kubitza
08:45 PM Revision 5244: schemas/filter_ERD.csv: Removed no longer used table taxonscope: Aaron Marcuse-Kubitza
08:32 PM Revision 5243: schemas/vegbien.sql: voucher: Removed accessioncode because this table has no sourceaccessioncode which it would be generated from (it just links a taxonoccurrence to a vouchering specimenreplicate): Aaron Marcuse-Kubitza
08:26 PM Revision 5242: schemas/vegbien.sql: Renamed datasource_id to creator_id so it can apply generally to any entity (such as a person), not just an aggregated datasource. This also enables taxonconcept.datasource_id to merge with creator_id, which now serves the same purpose.: Aaron Marcuse-Kubitza
08:05 PM Revision 5241: schemas/vegbien.sql: taxonconcept: Renamed definer_id to creator_id to allow merging with datasource_id when datasource_id is renamed to creator_id: Aaron Marcuse-Kubitza
07:50 PM Revision 5240: mappings/VegCore-VegBIEN.csv: Populated new taxonconcept.definer_id from identifiedBy, or when no identifiedBy is specified, from the datasource itself (using _simplifyPath:[next=datasource_id]): Aaron Marcuse-Kubitza
07:43 PM Revision 5239: sql_io.py: put_table(): Resolve default value column *after* the main loop (inserts and selects), so that the default value column can refer to an output column that is not in the original mapping but is added to the mapping from a col_defaults entry. This requires deferring the "Missing mapping for NOT NULL column" warning until the default value column is resolved, and including all columns in the full_in_table since the default value input column is not yet known.: Aaron Marcuse-Kubitza
06:59 PM Revision 5238: schemas/vegbien.sql: taxonconcept: comment: Changed definition to "A taxon concept defined by an entity" to correspond with the table's new name and usage: Aaron Marcuse-Kubitza
06:51 PM Revision 5237: mappings/VegCore-VegBIEN.csv: Fixed bug where datasource_id needed to be set to 0 on the TNRS party (which owns the concatenated names/TNRS inputs) in order to make it a datasource (a root party): Aaron Marcuse-Kubitza
06:44 PM Revision 5236: schemas/vegbien.sql: party: Fixed bug where a separate unique index was needed for roots (datasources), whose organizationnames must be globally unique rather than unique within a datasource: Aaron Marcuse-Kubitza
06:28 PM Revision 5235: schemas/vegbien.sql: taxonconcept: Renamed concept_reference_id to definer_id because this is a clearer name and because this will allow merging with datasource_id, which serves the same purpose: Aaron Marcuse-Kubitza
06:15 PM Revision 5234: schemas/vegbien.sql: party: Made it datasource-scoped. Since this creates a recursive fkey, a datasource (a root party) should point to itself in this field, which will happen automatically by setting it to the special value 0.: Aaron Marcuse-Kubitza
05:51 PM Revision 5233: lib/PostgreSQL-MySQL.csv: Changed translation of fulltext to quote the identifier instead of appending characters to make it not a reserved word: Aaron Marcuse-Kubitza
05:36 PM Revision 5232: schemas/vegbien.sql: taxonconcept: Moved concept_reference_id to the top of the table because it is now a key scoping field: Aaron Marcuse-Kubitza
05:30 PM Revision 5231: schemas/vegbien.sql: concept_reference_id: Made it an fkey to party instead of taxonscope, because this is now the entity that defined the taxon concept, and is no longer specific to morphospecies. Removed no longer used table taxonscope.: Aaron Marcuse-Kubitza
05:13 PM Revision 5230: schemas/vegbien.sql: taxonconcept: Documented that it's equivalent to VegBank's plantConcept table: Aaron Marcuse-Kubitza
04:56 PM Revision 5229: schemas/filter_ERD.csv: taxonconcept inward fkeys: Removed not applicable taxon filtered table, since the fkey points in the opposite direction and thus is not part of this filter: Aaron Marcuse-Kubitza
04:52 PM Revision 5228: schemas/vegbien.sql: taxonconcept: Renamed scope_id -> concept_reference_id as part of taxonomic schema refactoring at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/2012-10-03_conference_call#Taxonomic-schema-refactoring>: Aaron Marcuse-Kubitza
04:47 PM Revision 5227: README.TXT: Schema changes: Moved "update the following files with any renamings" out of "Sync ERD with vegbien.sql schema" because this is needed for any schema changes, not just as part of syncing the ERD: Aaron Marcuse-Kubitza
04:42 PM Revision 5226: README.TXT: Schema changes: Added Refactoring tips section with steps to rename a table and a column: Aaron Marcuse-Kubitza
04:23 PM Revision 5225: schemas/vegbien.sql: Renamed taxonpath -> taxonconcept as part of taxonomic schema refactoring at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/2012-10-03_conference_call#Taxonomic-schema-refactoring>: Aaron Marcuse-Kubitza
04:17 PM Revision 5224: README.TXT: Schema changes: Syncing ERD with vegbien.sql schema: Added step to update mappings/VegCore-VegBIEN.csv with any renamings: Aaron Marcuse-Kubitza
04:10 PM Revision 5223: README.TXT: Schema changes: Syncing ERD with vegbien.sql schema: Added step to update schemas/filter_ERD.csv with any table renamings: Aaron Marcuse-Kubitza
03:58 PM Revision 5222: inputs/import.stats.xls: Updated import times. This now includes the half-hour-long pre-import of the TNRS taxonomic names (which the datasources then match up with), as well as the concatenation of the datasource's taxonomic name components to create or match up with the TNRS input name.: Aaron Marcuse-Kubitza
03:54 PM Revision 5221: README.TXT: Data import: make backups/TNRS.backup/restore: Run it in the background because it takes a while: Aaron Marcuse-Kubitza
03:53 PM Revision 5220: README.TXT: Data import: Added steps to sync the TNRS schema to the latest version on vegbiendev: Aaron Marcuse-Kubitza
03:38 PM Revision 5219: README.TXT: Data import: make inputs/download-logs: Added tnrs_log=1 so the TNRS daemon log is downloaded as well: Aaron Marcuse-Kubitza

10/03/2012

01:55 PM Revision 5218: Added inputs/test_taxonomic_names/Taxon/testNames.txt since this is test data, and thus can be under version control: Aaron Marcuse-Kubitza
01:55 PM Revision 5217: Added inputs/test_taxonomic_names/README.TXT with Bob's comments: Aaron Marcuse-Kubitza
01:41 PM Revision 5216: schemas/vegbien.sql: taxonpath.taxon_id: Changed comment to indicate that this is used for parsed names, not just accepted names. Parsed names have been standardized by TNRS but may be synonyms.: Aaron Marcuse-Kubitza
01:27 PM Revision 5215: README.TXT: Documentation: To import and scrub just the test taxonomic names: Added `yes|` before make schemas/public/reinstall so the user isn't prompted to confirm the reinstallation a second time, and can just copy and paste the set of 5 commands directly into the terminal: Aaron Marcuse-Kubitza
01:11 PM Revision 5214: tnrs_db: Made wait option default to off to facilitate running tnrs_db by itself, rather than as part of an import: Aaron Marcuse-Kubitza
01:08 PM Revision 5213: tnrs_db: Added wait option to have tnrs_db exit as soon as no more names are available. This is useful for running tnrs_db when there is no concurrent import running, and therefore no need to wait for new data.: Aaron Marcuse-Kubitza
01:00 PM Revision 5212: tnrs_db: Fixed the timing of the "Waited" message so that the total_pause (containing the next wait) is incremented *after* the message is displayed. Split the "Waited" and "Waiting" messages into two separate messages.: Aaron Marcuse-Kubitza
12:51 PM Revision 5211: README.TXT: Data import: Added steps to back up the TNRS cache, since it takes a long time to recreate. This also enables syncing it with a local machine when `make backups/download` is run.: Aaron Marcuse-Kubitza
12:47 PM Revision 5210: README.TXT: Documentation: Added instructions to import and scrub just the test taxonomic names: Aaron Marcuse-Kubitza
12:41 PM Revision 5209: input.Makefile: Staging tables installation: uninstall: For the TNRS datasource, prompt the user before deleting the schema, since the data in it is not easily reconstructible from a flat file: Aaron Marcuse-Kubitza
11:41 AM Revision 5208: sql.py: map_expr(): When matching without quotes, support names containing spaces by not matching words when preceded or followed by quotes: Aaron Marcuse-Kubitza
11:24 AM Revision 5207: sql.py: Expressions: bool_re: Also match parentheses surrounding the boolean value: Aaron Marcuse-Kubitza
08:57 AM Revision 5206: README.TXT: Data import: import_all: Don't run with & because this prevents the created jobs from being owned by the calling shell. Instead, import the TNRS names as a separate backgrounded step and wait for it to finish before starting import_all. Removed TNRS import steps from import_all since these are now invoked separately.: Aaron Marcuse-Kubitza
08:35 AM Revision 5205: README.TXT: Data import: Run import_all in the background, because it needs to import all the taxonomic names synchronously before it can start the datasource import in the background: Aaron Marcuse-Kubitza
08:19 AM Revision 5204: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
08:14 AM Revision 5203: inputs/.TNRS/tnrs/map.csv: Mapped Unmatched_terms to morphospecies because the morphospecies is what's left once named ranks are matched: Aaron Marcuse-Kubitza
08:11 AM Revision 5202: mappings/VegCore-VegBIEN.csv: Mapped morphospecies: Aaron Marcuse-Kubitza
08:08 AM Revision 5201: mappings/VegCore.csv: Added morphospecies: Aaron Marcuse-Kubitza
08:04 AM Revision 5200: schemas/vegbien.sql: taxonpath: Added morphospecies: Aaron Marcuse-Kubitza
07:43 AM Revision 5199: inputs/.TNRS/tnrs/test.xml.ref: Updated for latest TNRS output: Aaron Marcuse-Kubitza
06:40 AM Revision 5198: inputs/.TNRS/tnrs/map.csv: Infraspecific_rank_2, Infraspecific_epithet_2_*: Mapped to UNUSED because they do not appear to be provided by TNRS (it just puts additional infraspecific names in Unmatched_terms): Aaron Marcuse-Kubitza
06:34 AM Revision 5197: inputs/.TNRS/tnrs/map.csv: Omit Infraspecific_rank because Name_matched_rank contains the unabbreviated rank and is provided more often: Aaron Marcuse-Kubitza
06:29 AM Revision 5196: mappings/VegCore-VegBIEN.csv: Also map TNRS-parsed infraspecificEpithet (Infraspecific_epithet_matched) to taxon at the infraspecies rank: Aaron Marcuse-Kubitza
06:07 AM Revision 5195: mappings/VegCore-VegBIEN.csv: Also map TNRS-parsed taxonomic ranks to the tree of life in the taxon table: Aaron Marcuse-Kubitza
05:18 AM Revision 5194: schemas/vegbien.sql: taxon: Added comment that this table stores the tree of life: Aaron Marcuse-Kubitza
05:00 AM Revision 5193: mappings/VegCore-VegBIEN.csv: accepted taxonomic terms: Use new _set_canon_taxonpath() to set the canon_taxonpath_id *after* the taxonpath has been inserted, so that if the taxonpath is an accepted name (scrubs to itself), it will link up to the just-inserted taxonpath with the taxonomic ranks parsed out, rather than to a new taxonpath containing only the few taxonomic ranks of the accepted name that TNRS provides. In particular, this (together with the tnrs_accepted_names sorting index on TNRS.tnrs) ensures that an accepted name is imported with its genus and species parsed out by TNRS instead of concatenated together in the Accepted_name_species field (genus+species). This enables the individual taxonomic ranks to be used in constructing the leaves of the tree of life (the taxon table).: Aaron Marcuse-Kubitza
04:50 AM Revision 5192: sql_io.py: put_table(): Fixed bug where row_ct_ref was incorrectly being incremented when the iteration is a function call. This bug only occurred in row-based mode, because the DB cursor for a function call is not stored in column-based mode.: Aaron Marcuse-Kubitza
04:30 AM Revision 5191: inputs/.TNRS/tnrs/map.csv: Use Name_matched_author/Name_matched_accepted_family instead of Author_matched/Family_matched because these fields are provided more often, due to being determined from the matched name itself rather than from the original string. This helps to fill in as many fields as possible. For accepted names (which scrub to themselves), this is especially important, because it adds the accepted name's family, which is not present in the input taxonomic name.: Aaron Marcuse-Kubitza
03:58 AM Revision 5190: xml_func.py: process(): Fixed bug where complex functions that have unevaluated XML nodes as arguments needed to be preserved, because XML nodes are not accepted by sql_io.put() (they are handled by db_xml.put()): Aaron Marcuse-Kubitza
03:08 AM Revision 5189: schemas/vegbien.sql: Renamed set_canon_taxonpath() to _set_canon_taxonpath() (adding _ prefix) so that db_xml.put() treats its arguments as arguments rather than as children with fkeys to the parent: Aaron Marcuse-Kubitza
03:02 AM Revision 5188: schemas/vegbien.sql: Added set_canon_taxonpath() to set a taxonpath's canon_taxonpath_id after it has been created: Aaron Marcuse-Kubitza
02:48 AM Revision 5187: Added inputs/.TNRS/tnrs/cleanup.sql to cluster TNRS.tnrs on tnrs_accepted_names. This keeps TNRS.tnrs sorted with the accepted names first.: Aaron Marcuse-Kubitza
02:46 AM Revision 5186: input.Makefile: Staging tables installation: %/cleanup: Also run any custom cleanup.sql provided in the subdir. %/install: Removed processing of postprocess.sql because no datasources are using it and because cleanup.sql can now be used for this purpose.: Aaron Marcuse-Kubitza
02:39 AM Revision 5185: inputs/.TNRS/schema.sql: tnrs: Added tnrs_accepted_names index, which sorts accepted names first, and cluster the table on this index. This ensures that the component-parsed entries for accepted names are created before any verbatim names that point to them.: Aaron Marcuse-Kubitza
02:37 AM Revision 5184: input.Makefile: Staging tables installation: %/cleanup: Documented that this removes any index comments, due to a PostgreSQL bug. (This occurs because ALTER TABLE recreates the index but not its comment.): Aaron Marcuse-Kubitza
01:55 AM Revision 5183: inputs/.TNRS/schema.sql: Removed hardcoded schema name: Aaron Marcuse-Kubitza
01:18 AM Revision 5182: inputs/.TNRS/tnrs/map.csv: Changed Name_matched_accepted_family comment to match analogous Name_matched_author comment: Aaron Marcuse-Kubitza
01:17 AM Revision 5181: inputs/.TNRS/tnrs/map.csv: Remapped Author_matched as the scientificNameAuthorship instead of Name_matched_author, because Name_matched_author contains the author based on the matched name, not the author in the original string, so it's not strictly from the original name: Aaron Marcuse-Kubitza
12:33 AM Revision 5180: mappings/VegCore.csv: Added acceptedBinomial, originalBinomial: Aaron Marcuse-Kubitza
12:29 AM Revision 5179: mappings/VegCore.csv: Added binomial: Aaron Marcuse-Kubitza
12:03 AM Revision 5178: inputs/.TNRS/tnrs/map.csv: Mapped Specific_epithet_matched: Aaron Marcuse-Kubitza

10/02/2012

11:53 PM Revision 5177: Added inputs/test_taxonomic_names/: Aaron Marcuse-Kubitza
11:37 PM Revision 5176: mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode: Only populate if needed to distinguish the taxonoccurrence within a plot: Aaron Marcuse-Kubitza
11:24 PM Revision 5175: schemas/vegbien.sql: placepath: Removed no longer used placepath_unique constraint on place_id. Removed place_id from placepath_unique_within_datasource_by_name unique index because otherranks is now used to store custom ranks.: Aaron Marcuse-Kubitza
11:23 PM Revision 5174: schemas/vegbien.sql: placepath: Removed no longer used placepath_unique constraint on place_id. Removed place_id from placepath_unique_within_datasource_by_name unique index because otherranks is now used to store custom ranks.: Aaron Marcuse-Kubitza
11:14 PM Revision 5173: schemas/vegbien.sql: taxonpath, placepath: Added *_required_key check constraints to ensure that empty entries are not created when a row does not have taxonpath/placepath data: Aaron Marcuse-Kubitza
10:35 PM Revision 5172: import_all: Use new dedicated cleanup make target to clean up TNRS.tnrs: Aaron Marcuse-Kubitza
09:54 PM Revision 5171: tnrs.py: encode_map: Added hidden minus sign, which TNRS removes: Aaron Marcuse-Kubitza
09:44 PM Revision 5170: csvs.py: tsv_encode_map: Escape a newline as the two-character sequence \n (instead of as a \ followed by a literal newline) for clarity. Added escape for \r by using strings.json_encode_map. TsvReader: Decode all escapes in tsv_encode_map.: Aaron Marcuse-Kubitza
09:25 PM Revision 5169: tnrs.py: encode_map: Added × (times), which TNRS replaces with x: Aaron Marcuse-Kubitza
09:18 PM Revision 5168: tnrs.py: encode_map: Added " and ', which TNRS removes when at the beginning or end: Aaron Marcuse-Kubitza
09:12 PM Revision 5167: tnrs.py: encode_map: Documented why each character needs to be encoded: Aaron Marcuse-Kubitza
09:04 PM Revision 5166: tnrs.py: encode_map: Removed '&', which is actually not a special character for TNRS (although ';' is): Aaron Marcuse-Kubitza
09:02 PM Revision 5165: tnrs.py: encode_map: Added '_', which TNRS replaces with space: Aaron Marcuse-Kubitza
08:56 PM Revision 5164: sql_io.py: append_csv(): In INSERT mode, print # rows read (different from # lines read if some fields contained embedded newlines) and # rows inserted (different from # rows read if some violated a constraint): Aaron Marcuse-Kubitza
08:42 PM Revision 5163: sql.py: insert(): Explicitly return None if the insert failed and a DuplicateKeyException or NullValueException was suppressed: Aaron Marcuse-Kubitza
07:13 PM Revision 5162: input.Makefile: Staging tables installation: $(logInstall*Add): Fixed bug where the existing install log would be overwritten in quiet mode, even though this function should append its output to the log. Note that plain $(logInstall*) always overwrites the existing install log because it is used by the first install command.: Aaron Marcuse-Kubitza
06:53 PM Revision 5161: strings.py: json_encode(): Fixed bug where '\n' and '\r' also needed to be encoded: Aaron Marcuse-Kubitza
06:50 PM Revision 5160: tnrs.py: repeated_tnrs_request(): Also retry request in debug mode if an HTTPError is thrown, so that debugging info can also be obtained if there is a bug in the TNRS client: Aaron Marcuse-Kubitza
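The TSV escaping described in r5145, r5146 and r5170 (embedded tabs, newlines and carriage returns written as two-character backslash escapes, then decoded by TsvReader) can be illustrated with a minimal round-trip sketch. The names ENCODE_MAP, tsv_encode() and tsv_decode() are hypothetical stand-ins, not the actual csvs.py code, and escaping the backslash itself is an added assumption so that decoding stays unambiguous.

```python
import re

# Hypothetical escape table modeled on the r5145/r5170 descriptions:
# each special character maps to a two-character backslash escape.
# Escaping '\\' itself is an assumption added so decoding is unambiguous.
ENCODE_MAP = {
    '\\': r'\\',
    '\t': r'\t',
    '\n': r'\n',
    '\r': r'\r',
}
DECODE_MAP = {v: k for k, v in ENCODE_MAP.items()}  # flipped map

_encode_re = re.compile('|'.join(re.escape(k) for k in ENCODE_MAP))
_decode_re = re.compile('|'.join(re.escape(k) for k in DECODE_MAP))


def tsv_encode(field):
    """Escape a field so it fits in a single TSV cell on a single line."""
    return _encode_re.sub(lambda m: ENCODE_MAP[m.group(0)], field)


def tsv_decode(field):
    """Reverse tsv_encode() in one left-to-right pass, so an escaped
    backslash followed by 'n' is not misread as a newline escape."""
    return _decode_re.sub(lambda m: DECODE_MAP[m.group(0)], field)
```

Decoding in a single regex pass, rather than with a chain of str.replace() calls, is what keeps a literal backslash followed by "n" distinct from an encoded newline.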

10/01/2012

10:44 PM Revision 5159: tnrs_db: Updated query for new three-level taxonpath hierarchy, where the concatenated name is now stored in identifyingtaxonomicname instead of taxonomicnamewithauthor: Aaron Marcuse-Kubitza
10:41 PM Revision 5158: root map: Removed no longer needed public schema override, which is now handled by vegbien_dest: Aaron Marcuse-Kubitza
10:40 PM Revision 5157: vegbien_dest: Allow user to specify a custom public schema in the $public env var. This makes custom public schema functionality available to all VegBIEN-accessing scripts, not just map.: Aaron Marcuse-Kubitza
10:12 PM Revision 5156: tnrs_db: Adjusted pause, max_pause so the daemon waits longer before exiting, because after the initial TNRS run, most names have already been scrubbed and new names may not be added until the end of the import (in the case of a very large new datasource): Aaron Marcuse-Kubitza
09:44 PM Revision 5155: input.Makefile: Staging tables installation: Added cleanup, %/cleanup to clean up already-installed tables: Aaron Marcuse-Kubitza
09:36 PM Revision 5154: tnrs.py: encode(): Also prepend special padding string to empty and whitespace-only strings because these names are otherwise ignored by TNRS (no response row): Aaron Marcuse-Kubitza
09:15 PM Revision 5153: tnrs_db: pause: Increased to 30 min because if no new names are available in TNRS.tnrs, there is no need to check every minute for new names (which clutters up the log file output). The pause feature is designed to allow tnrs_db to run in parallel with the import process, and process new names as they are made available, which only happens once for each partition of each datasource.: Aaron Marcuse-Kubitza
09:11 PM Revision 5152: tnrs_db: Fixed bug where the new filtering out of already-scrubbed names caused names to be skipped, because the loop would both advance by the number of rows found *and* those rows would no longer be returned by the query, causing only every other set of rows to be processed: Aaron Marcuse-Kubitza
08:58 PM Revision 5151: tnrs.py: tnrs_request(): Rewrapped lines (became >80 chars after adding profiling): Aaron Marcuse-Kubitza
08:52 PM Revision 5150: tnrs.py: tnrs_request(): Use new encode() and TnrsOutputStream to escape TNRS-invalid characters: Aaron Marcuse-Kubitza
08:51 PM Revision 5149: tnrs.py: Added encode(), decode(), decode_for_tsv(), and TnrsOutputStream to handle escaping TNRS-invalid characters: Aaron Marcuse-Kubitza
08:48 PM Revision 5148: strings.py: Added regexp_repl_esc(): Aaron Marcuse-Kubitza
08:47 PM Revision 5147: strings.py: Added replace_all() and replace_all_re(), as well as flip_map() for use with maps for these functions: Aaron Marcuse-Kubitza
08:46 PM Revision 5146: csvs.py: Added tsv_encode_map for use in creating TSVs parsed by TsvReader: Aaron Marcuse-Kubitza
06:42 PM Revision 5145: csvs.py: TsvReader: Also interpret '\t' as a tab, to provide a mechanism for encoding embedded tabs: Aaron Marcuse-Kubitza
05:47 PM Revision 5144: tnrs.py: gwt_encode(): Escape special characters in the string instead of removing them, so that TNRS receives the original name rather than a modified version. This will help make the submitted names match up with the returned Name_submitted.: Aaron Marcuse-Kubitza
05:45 PM Revision 5143: strings.py: Added json_encode(): Aaron Marcuse-Kubitza
05:44 PM Revision 5142: strings.py: Added esc_quotes(): Aaron Marcuse-Kubitza
04:52 PM Revision 5141: schemas/vegbien.sql: placepath.canon_placepath_id: Changed hierarchy comment to match the taxonpath.canon_taxonpath_id comment, but with a two-level hierarchy of datasource name -> accepted name. This may later be changed to a three-level hierarchy like taxonpath.canon_taxonpath_id depending on how GNRS works.: Aaron Marcuse-Kubitza
04:49 PM Revision 5140: schemas/vegbien.sql: taxonpath.canon_taxonpath_id: Changed comment to specify that taxonpaths should now be linked in a three-level hierarchy of datasource name -> concatenated name -> accepted name: Aaron Marcuse-Kubitza
04:45 PM Revision 5139: schemas/vegbien.sql: taxonpath, placepath: Changed "scrubbed" to "accepted" to emphasize that the name is the accepted name returned by TNRS or GNRS, rather than merely the matched name: Aaron Marcuse-Kubitza
04:38 PM Revision 5138: mappings/VegCore-VegBIEN.csv: non-TNRS taxonpaths: Store the concatenated identifyingtaxonomicname in a separate taxonpath owned by the TNRS datasource, so that it will match up with (and create a link to) the corresponding submitted TNRS name's taxonpath. This in turn is linked to the TNRS-determined accepted name, thus creating a three-level hierarchy of datasource name -> concatenated name -> accepted name.: Aaron Marcuse-Kubitza
03:59 PM Revision 5137: mappings/VegCore-VegBIEN.csv: taxonomic terms: Remapped the concatenated taxonomic name to new identifyingtaxonomicname to use it directly to match up with the TNRS submitted name. Continue to map scientificNameWithAuthorship to taxonomicnamewithauthor.: Aaron Marcuse-Kubitza
03:56 PM Revision 5136: schemas/vegbien.sql: taxonpath: Renamed plantcode to identifyingtaxonomicname so that it can be used to store the concatenated taxonomicname that gets scrubbed. This enables ignoring the name components when the full name is specified, so that when a TNRS submitted name's matched components are included in its taxonpath, this will not prevent a datasource's concatenated name (without the matched components) from matching up with the corresponding TNRS submitted name.: Aaron Marcuse-Kubitza
03:25 PM Revision 5135: schemas/vegbien.sql: taxonpath: Made taxonomicnamewithauthor optional again and include all columns in the taxonpath_unique_within_datasource_by_name unique index so that the original name components can be stored in a separate taxonpath from the taxonpath with the concatenated taxonomic name. (The datasource's taxonpath would not always contain an entry for taxonomicnamewithauthor, so the other columns also need to be used in the unique index.): Aaron Marcuse-Kubitza
02:57 PM Revision 5134: schemas/vegbien.sql: taxonpath: Added back datasource_id, plantcode to make taxonpath datasource-specific again. This way, the original name components can still be stored in taxonpath, in addition to storing the concatenated name in a datasource-general taxonpath for use by TNRS.: Aaron Marcuse-Kubitza
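The skipped-names bug fixed in r5152 (after r5126 started filtering out already-scrubbed names) follows a general pattern: if a query excludes rows that have already been processed, also advancing an offset skips rows. The following is a minimal in-memory sketch of that pattern. fetch_unscrubbed() and scrub_all() are hypothetical stand-ins for the tnrs_db query and loop, and keeping the offset at 0 is only one plausible fix; the log does not show the actual change.

```python
def fetch_unscrubbed(names, scrubbed, offset, limit):
    """Stand-in for the query: names not yet scrubbed (the filter-out
    LEFT JOIN of r5126), returned one page at a time."""
    pending = [n for n in names if n not in scrubbed]
    return pending[offset:offset + limit]


def scrub_all(names, batch_size, advance_offset):
    """Process every unscrubbed name in batches.

    advance_offset=True reproduces the bug: the filter already removes
    processed rows, so also moving the offset forward skips a batch
    each time. advance_offset=False always reads from the start.
    """
    scrubbed = set()
    offset = 0
    while True:
        batch = fetch_unscrubbed(names, scrubbed, offset, batch_size)
        if not batch:
            break
        scrubbed.update(batch)  # stand-in for the TNRS request + insert
        if advance_offset:
            offset += len(batch)  # buggy: double-counts progress
    return scrubbed
```

With 10 names and a batch size of 2, the buggy variant processes only name0, name1, name4, name5, name8 and name9, which matches the "only every other set of rows" symptom described in r5152.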

09/28/2012

03:46 PM Revision 5133: inputs/.TNRS/tnrs/map.csv: Mapped columns for components of original, submitted name: Aaron Marcuse-Kubitza
03:33 PM Revision 5132: mappings/VegCore-VegBIEN.csv, VegCore.csv: Removed no longer used verbatimScientificNameWithAuthorship. Use scientificNameWithAuthorship instead, and map accepted (scrubbed) names to acceptedScientificNameWithAuthorship to create the canon_taxonpath_id link.: Aaron Marcuse-Kubitza
03:28 PM Revision 5131: inputs/.TNRS/tnrs/map.csv: Remapped to new accepted* taxonomic terms: Aaron Marcuse-Kubitza
03:23 PM Revision 5130: mappings/VegCore-VegBIEN.csv: Mapped accepted* taxonomic terms: Aaron Marcuse-Kubitza
03:00 PM Revision 5129: sql_io.py: cleanup_table(): Don't clean up the pkey, because the canonicalization involved may produce collisions (as it does for TNRS.tnrs): Aaron Marcuse-Kubitza
02:58 PM Revision 5128: sql.py: Added pkey_col_(): Aaron Marcuse-Kubitza
02:31 PM Revision 5127: tnrs.py: tnrs_request(): Added comment that names containing only whitespace characters are ignored by TNRS and do not receive a response row. Our tnrs_db and reimport pipeline handles the necessary re-matching-up by just creating taxonpaths for each Name_submitted, and then letting the data import process on the following import attach to the prepopulated taxonpaths.: Aaron Marcuse-Kubitza
02:17 PM Revision 5126: tnrs_db: Exclude taxonomic names which have already been scrubbed, by using a filter-out LEFT JOIN on TNRS.tnrs: Aaron Marcuse-Kubitza
02:02 PM Revision 5125: tnrs.py: max_pause: Changed to 30 min because TNRS sometimes freezes for ~10 min. The freezing usually happens while the data is being uploaded rather than when it's being retrieved, so that the max_pause would not apply, but to be on the safe side, requests should not time out unnecessarily.: Aaron Marcuse-Kubitza
01:27 PM Revision 5124: tnrs_db: tnrs_profiler: Use iter_text='name' for consistency with tnrs.tnrs_request()'s own profiler's iter_text: Aaron Marcuse-Kubitza
01:25 PM Revision 5123: tnrs_db: Print cumulative profiling information after every TNRS request, rather than just at the end: Aaron Marcuse-Kubitza
01:22 PM Revision 5122: inputs/.TNRS/tnrs/tnrs.make: Append to the log file instead of overwriting it, so that the TNRS scrubbing of each import's new taxonomic names can be included in one log file. Echo the command to the log file to identify separate runs.: Aaron Marcuse-Kubitza
01:15 PM Revision 5121: TNRS-related programs: Use "names" instead of "taxons" for variable names because what's being submitted are actually verbatim taxonomic names, not official references to specific taxa: Aaron Marcuse-Kubitza
01:08 PM Revision 5120: tnrs.py: tnrs_request(): Profile the TNRS request: Aaron Marcuse-Kubitza
12:58 PM Revision 5119: tnrs.py: tnrs_request(): Fixed bug where initial_headers needed to be copied instead of just assigned to headers, because initial_headers is a global constant and should not be changed when the Cookie header is added: Aaron Marcuse-Kubitza
12:17 PM Revision 5118: mappings/VegCore.csv: originalTaxonRank, acceptedTaxonRank: Fixed sources to use verbatimTaxonRank, not taxonRank: Aaron Marcuse-Kubitza
12:15 PM Revision 5117: mappings/VegCore.csv: originalTaxonRank: Added source of the original* prefix: Aaron Marcuse-Kubitza
12:14 PM Revision 5116: mappings/VegCore.csv: acceptedTaxonRank: Added source of the accepted prefix: Aaron Marcuse-Kubitza
12:12 PM Revision 5115: mappings/VegCore.csv: accepted* taxonomic terms: Fixed sources of the accepted prefix to use acceptedNameUsage, not acceptedNameUsageID: Aaron Marcuse-Kubitza
12:09 PM Revision 5114: mappings/VegCore.csv: original* taxonomic terms: Source the original prefix to DwC originalNameUsage, which is a more official source than SALVIAS orig_species: Aaron Marcuse-Kubitza
12:09 PM Revision 5113: mappings/VegCore.csv: original* taxonomic terms: Source the original prefix to DwC originalNameUsage, which is a more official source than SALVIAS orig_species: Aaron Marcuse-Kubitza
11:56 AM Revision 5112: mappings/VegCore.csv: Added accepted* taxonomic terms to store the scrubbed name: Aaron Marcuse-Kubitza
11:42 AM Revision 5111: import_all: Clean up any new TNRS.tnrs entries before importing the TNRS data: Aaron Marcuse-Kubitza
11:36 AM Revision 5110: inputs/.TNRS/tnrs/: Create using datasource schema.sql file instead of text header and postprocess.sql, for clarity and to enable using `make inputs/.TNRS/tnrs/install` to clean up the tnrs entries populated by tnrs_db: Aaron Marcuse-Kubitza
11:21 AM Revision 5109: mappings/VegCore-VegBIEN.csv: Don't combine taxonRank with infraspecificEpithet if there is no infraspecificEpithet, because the taxonRank is only the infraspecificEpithet's prefix when there is an actual infraspecificEpithet. Often, taxonRank contains values like "genus" or "species" which cannot be used for this purpose.: Aaron Marcuse-Kubitza
10:54 AM Revision 5108: tnrs.py: repeated_tnrs_request(): Just retry the request once with debug turned on, to avoid cluttering the log output with the verbose debug info of multiple failed requests if the error is not resolved on retry: Aaron Marcuse-Kubitza
10:47 AM Revision 5107: tnrs.py: tnrs_request(): repeated_tnrs_request(): Print all suppressed exceptions to stderr: Aaron Marcuse-Kubitza
10:41 AM Revision 5106: tnrs.py: tnrs_request(): parse_response(): Include both the response headers and the response body in the InvalidResponse message: Aaron Marcuse-Kubitza
10:23 AM Revision 5105: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
10:15 AM Revision 5104: profiling.py: Profiler: Fixed bug where instance variable start had the same name as method start(): Aaron Marcuse-Kubitza
10:08 AM Revision 5103: mappings/VegCore-VegBIEN.csv: verbatimScientificNameWithAuthorship: Set canon_taxonpath_id to 0 on the first, scrubbed taxonpath to auto-create the self reference that indicates a scrubbed taxonpath: Aaron Marcuse-Kubitza
10:03 AM Revision 5102: mappings/VegCore-VegBIEN.csv: Don't forward scientificName to taxonoccurrence.authortaxoncode when importing just taxonpaths, as for TNRS: Aaron Marcuse-Kubitza
09:51 AM Revision 5101: tnrs_db: Moved lower max_taxons limit to tnrs.py because it's really required to avoid crashing the TNRS server and should apply to all callers: Aaron Marcuse-Kubitza
09:35 AM Revision 5100: tnrs_db: Print log message with # of taxonpaths being sent to TNRS: Aaron Marcuse-Kubitza
09:30 AM Revision 5099: tnrs_db: Fixed bug where InvalidResponse was missing module name: Aaron Marcuse-Kubitza
09:29 AM Revision 5098: tnrs_db: Profile the TNRS requests. This involves using a finally block to ensure that the profiling stats are printed even if the program exits with an error.: Aaron Marcuse-Kubitza
09:13 AM Revision 5097: tnrs_db: Reduced the chunk size to avoid slowing down the TNRS server: Aaron Marcuse-Kubitza
09:07 AM Revision 5096: inputs/.TNRS/tnrs/tnrs.make: Added log option which outputs to the terminal instead when set to "": Aaron Marcuse-Kubitza
09:01 AM Revision 5095: tnrs_db: Added log messages for Making TNRS request and Storing TNRS response data so that if the TNRS daemon pauses, it's obvious which step it's waiting on: Aaron Marcuse-Kubitza
08:58 AM Revision 5094: sql.py: insert(): ignore optimization: Fixed bug where it needed to run insert_select() recoverably so that the aborted transaction is rolled back after a DuplicateKeyException or NullValueException: Aaron Marcuse-Kubitza
08:43 AM Revision 5093: tnrs_db: If tnrs.repeated_tnrs_request() still throws InvalidResponse, skip the current set in case its data caused the error. Note that it will still be tried again the next time tnrs_db is run.: Aaron Marcuse-Kubitza
08:34 AM Revision 5092: mappings/VegCore-VegBIEN.csv: Don't forward scientificName to taxonoccurrence.authortaxoncode when importing just taxonpaths, as for TNRS: Aaron Marcuse-Kubitza
08:30 AM Revision 5091: repeated_tnrs_request(): When retrying after an invalid response, output protocol info for debugging: Aaron Marcuse-Kubitza
08:29 AM Revision 5090: inputs/Makefile: Import logs: Don't download .TNRS/tnrs/tnrs.make.log by default because it changes each time `make inputs/.TNRS/tnrs/tnrs-remake` is run, and any version downloaded for debugging should be preserved. It can still be downloaded by setting the tnrs_log env var.: Aaron Marcuse-Kubitza
08:17 AM Revision 5089: tnrs_client, tnrs_db: Use new tnrs.repeated_tnrs_request(): Aaron Marcuse-Kubitza
08:16 AM Revision 5088: tnrs.py: Added repeated_tnrs_request() to retry a TNRS request which returned an invalid response: Aaron Marcuse-Kubitza
08:05 AM Revision 5087: db_xml.py: put_table(): Fixed bug where pkeys_loc needed to be initialized. Note that this bug was only triggered when importing a table with zero rows (in this case, the initial empty TNRS.tnrs table), because otherwise it would be set in the loop.: Aaron Marcuse-Kubitza
07:57 AM Revision 5086: inputs/Makefile: Import logs: Also download inputs/.TNRS/tnrs/tnrs.make.log: Aaron Marcuse-Kubitza
07:56 AM Revision 5085: inputs/Makefile: Import logs: Use new $(rsync*) to also sync datasources starting with ., such as .TNRS: Aaron Marcuse-Kubitza
07:55 AM Revision 5084: lib/common.Makefile: rsync: Added $(rsync*) to rsync all files, including those starting with ".": Aaron Marcuse-Kubitza
07:43 AM Revision 5083: tnrs.py: parse_response(): Raise custom InvalidResponse exception instead of SystemExit, so callers can catch the exception and respond to it: Aaron Marcuse-Kubitza
07:38 AM Revision 5082: mappings/VegCore-VegBIEN.csv: taxonpath.taxonomicnamewithauthor _join_words mappings: Added space after taxon rank prefix (var., etc.) for infraspecific ranks: Aaron Marcuse-Kubitza
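
The retry behavior built up in Revisions 5083, 5088, 5093, 5107 and 5108 (raise a catchable InvalidResponse, retry once with debug output on, report suppressed exceptions to stderr, and have the caller skip a set that still fails) can be sketched roughly as follows. This is a minimal illustration, not the actual tnrs.py code: the names `repeated_request` and `scrub_chunks` and the `request(names, debug)` signature are hypothetical.

```python
import sys


class InvalidResponse(Exception):
    """Raised when a response cannot be parsed, so callers can catch it."""


def repeated_request(request, names):
    """Call request(names); on InvalidResponse, retry exactly once with debug on.

    `request` is a hypothetical callable taking (names, debug). A second
    failure is not caught here and propagates to the caller.
    """
    try:
        return request(names, debug=False)
    except InvalidResponse as e:
        # Report the suppressed exception instead of hiding it
        print(f"Retrying after invalid response: {e}", file=sys.stderr)
        return request(names, debug=True)


def scrub_chunks(request, chunks):
    """Scrub each chunk of names, skipping any chunk that still fails.

    Skipped names are not recorded as scrubbed, so a later run retries them.
    """
    results = []
    for chunk in chunks:
        try:
            results.append(repeated_request(request, chunk))
        except InvalidResponse as e:
            print(f"Skipping chunk after repeated failure: {e}", file=sys.stderr)
    return results
```

Retrying only once keeps a persistent failure from filling the log with repeated debug dumps, which is the trade-off described in Revision 5108.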

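Two of the 09/28 bug fixes are common Python pitfalls. Revision 5119: assigning a global dict to a local name only aliases it, so adding a Cookie header changed the shared constant. Revision 5104: an instance attribute named `start` hides a method named `start()`, so calling it fails. The sketch below assumes a header value and the names `make_headers`, `start_time` and `stop()`, which are hypothetical; only `initial_headers` and `start` come from the log.

```python
import time

# Hypothetical module-level constant standing in for tnrs.py's initial_headers
initial_headers = {"Content-Type": "application/x-www-form-urlencoded"}


def make_headers(cookie):
    """Return request headers with a Cookie added, leaving the constant intact."""
    # `headers = initial_headers` would only alias the global dict, so adding
    # the Cookie would change it for every later request. Copy it instead.
    headers = initial_headers.copy()
    headers["Cookie"] = cookie
    return headers


class BuggyProfiler:
    def __init__(self):
        self.start = None  # instance attribute hides the start() method

    def start(self):
        self.start = time.time()


class Profiler:
    def __init__(self):
        self.start_time = None  # renamed so it no longer collides with start()

    def start(self):
        self.start_time = time.time()

    def stop(self):
        """Return seconds elapsed since start()."""
        return time.time() - self.start_time
```

Calling `BuggyProfiler().start()` raises `TypeError: 'NoneType' object is not callable`, because attribute lookup finds the instance's `start` before the class's method. A shallow `.copy()` is enough for the headers because the values are immutable strings.
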
09/27/2012

11:28 AM Revision 5081: import_all: Start the tnrs daemon using `make inputs/.TNRS/tnrs/tnrs-remake &`: Aaron Marcuse-Kubitza
11:25 AM Revision 5080: Added inputs/.TNRS/tnrs/tnrs.make to run tnrs_db on VegBIEN: Aaron Marcuse-Kubitza
11:25 AM Revision 5079: Added tnrs_db to scrub the taxonpaths in VegBIEN using TNRS: Aaron Marcuse-Kubitza
11:19 AM Revision 5078: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
11:17 AM Revision 5077: schemas/vegbien.sql: taxonpath: Made it datasource-general and uniquely identified only by its taxonomicnamewithauthor so that the taxonpaths imported by the TNRS datasource will be matched and used directly when the other datasources are imported: Aaron Marcuse-Kubitza
11:10 AM Revision 5076: schemas/vegbien.sql: taxonpath: taxonpath_unique_within_datasource_by_name unique index: Just do duplicate elimination on the taxonomicnamewithauthor, since that is now a required field and is generated by concatenating all the other fields. Note that the inserted row counts change slightly because the concatenation makes some names equal that are split among the fields differently, such as when the genus is included in the species field.: Aaron Marcuse-Kubitza
10:51 AM Revision 5075: db_xml.py: put(): Added _alt optimization that just returns the first arg if it's non-NULL: Aaron Marcuse-Kubitza
10:49 AM Revision 5074: sql_gen.py: Added is_nullable(): Aaron Marcuse-Kubitza
10:49 AM Revision 5073: schemas/vegbien.sql: taxonpath.taxonomicnamewithauthor: Made it NOT NULL, so that all taxonpaths would have a concatenated name to feed to TNRS: Aaron Marcuse-Kubitza
10:37 AM Revision 5072: mappings/VegCore-VegBIEN.csv: taxonomic terms: Changed _first to _alt because some datasources have NULL values in scientificNameWithAuthorship or scientificName, so it can't just be used in place of the joined-together taxonomic ranks: Aaron Marcuse-Kubitza
10:19 AM Revision 5071: db_xml.py: put(): Parse input columns and process values in separate loops, so that structural XML function optimization code can be inserted between them: Aaron Marcuse-Kubitza
10:12 AM Revision 5070: sql_io.py: put_table(): Removed comment that can support in_tables of any fixed-size iterable type, because the iterable must be ordered so that the first table can be treated specially: Aaron Marcuse-Kubitza
10:09 AM Revision 5069: sql_io.py: put_table(): Support in_tables of any fixed-size iterable type: Aaron Marcuse-Kubitza
09:13 AM Revision 5068: mappings/Veg+-VegCore.csv: cationExchangeCapacity->cationExchangeCapacity_cmol_kg mapping: Removed ? prefix because a mapping to only one set of units is unambiguous (if additional units for cationExchangeCapacity are found, this will become an ambiguous mapping). Note that canon automatically removes punctuation from VegCore terms, so this mapping would previously have had the ? prefix autoremoved anyway (both in inputs/*/*/map.csv and recently also in Veg+-VegCore.csv).: Aaron Marcuse-Kubitza
09:06 AM Revision 5067: mappings/Makefile: .Veg+-VegCore.csv.last_cleanup: Translate VegCore terms using itself so that any mapping to another Veg+ term automatically becomes a mapping to a VegCore term. .VegX-VegCore.csv.last_cleanup: Translate VegCore terms using Veg+-VegCore.csv to keep the terms up to date.: Aaron Marcuse-Kubitza
09:04 AM Revision 5066: mappings/VegX-VegCore.csv: Translated VegCore terms using Veg+-VegCore.csv: Aaron Marcuse-Kubitza
09:00 AM Revision 5065: mappings/Makefile: .VegCore.csv.last_cleanup, .VegCore-VegBIEN.csv.last_cleanup: Apply Veg+-VegCore.csv so that terms can easily be renamed just by adding a mapping in Veg+-VegCore.csv, which will auto-translate all places that use the term. .VegCore-VegBIEN.csv.last_cleanup: Canonicalize to VegCore.csv so case changes in VegCore terms will automatically propagate to VegCore-VegBIEN.csv.: Aaron Marcuse-Kubitza
08:46 AM Revision 5064: mappings/VegCore-VegBIEN.csv: Mapped verbatimScientificNameWithAuthorship, so that it links a verbatim taxonpath to the scrubbed taxonpath created from the primary taxonomic terms: Aaron Marcuse-Kubitza
08:36 AM Revision 5063: mappings/VegCore.csv: Renamed unscrubbedScientificNameWithAuthorship to the more standard verbatimScientificNameWithAuthorship, which is available now that the original taxondetermination terms use the original* prefix: Aaron Marcuse-Kubitza
08:31 AM Revision 5062: mappings/VegCore.csv: Renamed verbatim* taxonomic terms to original* because in most datasources, they are in fact for the *original* taxon determination of the organism (which can be a completely different name than the primary determination), rather than merely unscrubbed versions of the primary taxonomic name elements. Note that SALVIAS's orig_* terms do appear to be merely unscrubbed versions, but it's not a problem to add an additional taxon determination for them.: Aaron Marcuse-Kubitza
08:14 AM Revision 5061: sql.py: pkey(): Get the table's actual primary key column, rather than just using the first column in the table. Continue to return the first column in the table if the table has no primary key.: Aaron Marcuse-Kubitza
07:31 AM Revision 5060: inputs/.TNRS/tnrs/postprocess.sql: Use :table var instead of hardcoding the table name: Aaron Marcuse-Kubitza
07:30 AM Revision 5059: inputs/.TNRS/tnrs/postprocess.sql: Also add a primary key on Name_submitted, to prevent duplicate entries: Aaron Marcuse-Kubitza
07:27 AM Revision 5058: inputs/.TNRS/tnrs/: Added postprocess.sql which makes Name_submitted NOT NULL: Aaron Marcuse-Kubitza
07:25 AM Revision 5057: sql.py: insert(): ignore mode: Also ignore NullValueException: Aaron Marcuse-Kubitza
07:24 AM Revision 5056: input.Makefile: Staging tables installation: %/install: Support custom postprocess.sql which specifies commands to run after the table is imported: Aaron Marcuse-Kubitza
07:10 AM Revision 5055: import_all: Added import of .TNRS datasource, which happens synchronously before other datasources are imported: Aaron Marcuse-Kubitza
07:08 AM Revision 5054: Moved tnrs table from public (schemas/vegbien.sql) to its own TNRS schema, which is created by a new .TNRS datasource. Note that .TNRS is included in the automated testing, but not yet in the import.: Aaron Marcuse-Kubitza
06:57 AM Revision 5053: mappings/VegCore-VegBIEN.csv: Restored subplotID -> if subplot cond mapping, which had been overwritten: Aaron Marcuse-Kubitza
06:46 AM Revision 5052: inputs/ACAD/Specimen/map.csv: Remapped scientificName to scientificNameWithAuthorship: Aaron Marcuse-Kubitza
06:06 AM Revision 5051: sql_io.py: append_csv(): Using INSERT: Use ignore mode to support inserting rows into a table with a unique constraint: Aaron Marcuse-Kubitza
06:05 AM Revision 5050: sql.py: insert(): Added ignore optimization that just suppresses any DuplicateKeyException on the client side, to avoid needing to create a wrapper function just to insert-ignore one row: Aaron Marcuse-Kubitza
05:23 AM Revision 5049: mappings/VegCore-VegBIEN.csv: Synchronized verbatim* and non-verbatim taxonomic terms' mappings: Aaron Marcuse-Kubitza
05:08 AM Revision 5048: mappings/VegCore.csv: Added special term unscrubbedScientificNameWithAuthorship: Aaron Marcuse-Kubitza
05:05 AM Revision 5047: mappings/VegCore.csv: Added verbatimSubspecies, verbatimVariety, verbatimForma, verbatimCultivar (already mapped in VegCore-VegBIEN.csv): Aaron Marcuse-Kubitza
05:04 AM Revision 5046: mappings/Makefile: .VegCore.csv.last_cleanup: Also remake VegCore-VegBIEN.unsourced_terms.csv here, not just in .VegCore-VegBIEN.csv.last_cleanup, so that the unsourced_terms.csv will be remade if the user adds the missing sources to VegCore.csv: Aaron Marcuse-Kubitza
05:03 AM Revision 5045: mappings/Makefile: VegCore-VegBIEN.unsourced_terms.csv: Factored remake code into its own make target: Aaron Marcuse-Kubitza
04:51 AM Revision 5044: mappings/VegCore-VegBIEN.csv: verbatim* taxonomic terms: Added taxonomicnamewithauthor mappings analogous to those for the non-verbatim taxonomic terms: Aaron Marcuse-Kubitza
04:29 AM Revision 5043: mappings/VegCore.csv: Added verbatimScientificNameWithAuthorship: Aaron Marcuse-Kubitza
03:50 AM Revision 5042: Added inputs/.public/, which stores mappings that manipulate VegBIEN itself: Aaron Marcuse-Kubitza
03:49 AM Revision 5041: forwarding.Makefile: Differentiate between subdirs which can be sent a command and subdirs which will receive a command broadcast to "all" subdirs: Aaron Marcuse-Kubitza
03:39 AM Revision 5040: README.TXT: Data import: Starting column-based import: Use import_all, which now supports passing custom vars like by_col=1: Aaron Marcuse-Kubitza
03:37 AM Revision 5039: import_all: Pass any args, such as vars, through to with_all: Aaron Marcuse-Kubitza
03:35 AM Revision 5038: with_all: Support additional command-line args for the make target, such as vars: Aaron Marcuse-Kubitza
03:11 AM Revision 5037: sql_io.py: append_csv(): Check that the CSV's header matches the table's columns: Aaron Marcuse-Kubitza
03:08 AM Revision 5036: schemas/vegbien.sql: Added tnrs table to hold contents of TNRS response: Aaron Marcuse-Kubitza
02:20 AM Revision 5035: input.Makefile: Existing maps discovery: $(anyMap): Inlined patterns used because they are only used here: Aaron Marcuse-Kubitza
01:27 AM Revision 5034: schemas/vegbien.sql: taxonpath_canon_taxonpath_id_self_ref(), placepath_canon_placepath_id_self_ref(): Fixed bug where the pkey could only be prepopulated if it was not already set, in order to support UPDATE as well as INSERT statements: Aaron Marcuse-Kubitza
01:15 AM Revision 5033: schemas/vegbien.sql: taxonpath.canon_taxonpath_id, placepath.canon_placepath_id: Fixed comment describing that the special value 0 creates an automatic self-reference: Aaron Marcuse-Kubitza
01:09 AM Revision 5032: schemas/vegbien.sql: taxonpath.canon_taxonpath_id, placepath.canon_placepath_id: Added trigger to automatically create a self-reference (indicating a scrubbed name) when set to the special value 0: Aaron Marcuse-Kubitza
12:33 AM Revision 5031: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Don't add a row number column to the created table because it is now added automatically to the temp table by column-based import (row-based import now also does not require a pkey for DB inputs): Aaron Marcuse-Kubitza
12:28 AM Revision 5030: bin/map, db_xml.put_table() (row-based and column-based import): Don't sort the input table by its pkey, in order to support input tables with no pkey. Note that reading the input table in table order and having this match the input flat file's order is only possible with sql_io.import_csv()'s truncation of the table on a failed import, which ensures that the rows will be stored in inserted order.: Aaron Marcuse-Kubitza
12:19 AM Revision 5029: input.Makefile: Staging tables installation: Removed no longer used $(isJoinedTable). Note that it is no longer necessary for joined tables to be suffixed with ".src" to prevent the creation of a row_num column, which collided during joins.: Aaron Marcuse-Kubitza
12:17 AM Revision 5028: csv2db: Removed no longer used has_row_num param: Aaron Marcuse-Kubitza
12:14 AM Revision 5027: sql_io.py: import_csv(): Don't add a row number column to the created table because it is now added automatically to the temp table by column-based import (row-based import now also does not require a pkey for DB inputs): Aaron Marcuse-Kubitza
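
The insert() ignore mode from Revisions 5050 and 5057 (plus the savepoint fix in Revision 5094) suppresses duplicate-key and NOT NULL errors on the client side. In PostgreSQL, any error aborts the enclosing transaction, so the INSERT has to run inside a savepoint that can be rolled back on its own. A minimal sketch of that pattern follows, using SQLite only so it is self-contained; `sqlite3.IntegrityError` stands in for the project's DuplicateKeyException and NullValueException, and `insert_ignore` is a hypothetical name.

```python
import sqlite3


def insert_ignore(conn, table, row):
    """Insert `row` (a dict); return False instead of raising if a constraint rejects it.

    The INSERT runs inside a savepoint so a failed statement is rolled back
    on its own, without discarding the rest of the transaction.
    `table` and the column names are interpolated directly, so they must be trusted.
    """
    cols = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute("SAVEPOINT insert_ignore")
    try:
        conn.execute(
            f"INSERT INTO {table} ({cols}) VALUES ({placeholders})",
            tuple(row.values()),
        )
    except sqlite3.IntegrityError:  # covers duplicate keys and NOT NULL violations
        conn.execute("ROLLBACK TO SAVEPOINT insert_ignore")
        conn.execute("RELEASE SAVEPOINT insert_ignore")
        return False
    conn.execute("RELEASE SAVEPOINT insert_ignore")
    return True


# isolation_level=None: manage transactions explicitly with SAVEPOINT/RELEASE
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE tnrs (Name_submitted TEXT NOT NULL PRIMARY KEY, Name_matched TEXT)"
)
```

Inserting the same Name_submitted twice, or a NULL one, then leaves exactly one row in the table.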

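Revision 5061 changed sql.py's pkey() to look up the table's real primary key and fall back to the first column only when there is none. The project does this against PostgreSQL; the sketch below shows the same lookup-with-fallback logic using SQLite's `PRAGMA table_info` as a self-contained stand-in, and is not the project's implementation.

```python
import sqlite3


def pkey(conn, table):
    """Return the table's primary key column, or its first column if it has none.

    For a composite key, this returns the first key column.
    """
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk is the
    # 1-based position within the primary key, or 0 if not part of it.
    for _cid, name, _type, _notnull, _default, pk in cols:
        if pk == 1:
            return name
    return cols[0][1]
```

For a table declared as `(x TEXT, id INTEGER PRIMARY KEY)` this returns `id` rather than `x`, which is exactly the case the old first-column behavior got wrong.
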
09/26/2012

11:49 PM Revision 5026: bin/map, db_xml.put_table() (row-based and column-based import): Don't sort the input table by its pkey, in order to support input tables with no pkey. Note that reading the input table in table order and having this match the input flat file's order is only possible with sql_io.import_csv()'s truncation of the table on a failed import, which ensures that the rows will be stored in inserted order.: Aaron Marcuse-Kubitza
11:34 PM Revision 5025: sql_io.py: import_csv(): Only do the import in a savepoint if using COPY FROM, to allow autocommits after each insert and thus make rows visible immediately after they are inserted: Aaron Marcuse-Kubitza
10:53 PM Revision 5024: db_xml.py: put_table(): Subsetting in_table: Add a row number column if in_table does not already have a pkey: Aaron Marcuse-Kubitza
10:48 PM Revision 5023: db_xml.py: put_table(): Subsetting in_table: Copy all of in_table's structure, rather than just the column types, by using sql.copy_table_struct() and sql.insert_select(). This preserves pkeys and NOT NULL constraints, which are useful for column-based import.: Aaron Marcuse-Kubitza
10:47 PM Revision 5022: db_xml.py: put_table(): Subsetting in_table: Create in_table as a completely new sql_gen.Table instead of copying full_in_table and relying on sql.run_query_into() to set is_temp and remove the schema: Aaron Marcuse-Kubitza
10:40 PM Revision 5021: sql.py: add_row_num(): Use if_not_exists in order to abort if the column already exists rather than adding a version #: Aaron Marcuse-Kubitza
10:36 PM Revision 5020: sql.py: add_col(): Added if_not_exists param to abort if the column already exists rather than adding a version #: Aaron Marcuse-Kubitza
10:14 PM Revision 5019: db_xml.py: put_table(): Removed no longer accurate comment that full_in_table will be shadowed (hidden) by the created temp table. (The temp table is now named differently, so the shadowing does not occur.): Aaron Marcuse-Kubitza
10:02 PM Revision 5018: db_xml.py: put_table(): Replaced no longer accurate Recurse comment with Import data. Rewrapped lines.: Aaron Marcuse-Kubitza
09:12 PM Revision 5017: sql_io.py: import_csv(): Factored insertion code out into new append_csv(): Aaron Marcuse-Kubitza
08:47 PM Revision 5016: README.TXT: Data import: `make test by_col=1`: Replaced errors explanation with pointer to updated explanation in the Testing section: Aaron Marcuse-Kubitza
08:31 PM Revision 5015: xml_func.py: Removed no longer used _name(). Use _join_words() instead.: Aaron Marcuse-Kubitza
08:30 PM Revision 5014: mappings/VegCore-VegBIEN.csv: Use new, more general _join_words() instead of _name(): Aaron Marcuse-Kubitza
08:22 PM Revision 5013: mappings/Veg+-VegCore.csv: Prefix ambiguous terms' VegCore replacement with "?" so it's visually flagged in map.csv, in the same way that unmatched terms are flagged with a "*" prefix: Aaron Marcuse-Kubitza
08:19 PM Revision 5012: mappings/VegCore-VegBIEN.csv: Taxonomic terms: Also join terms together in taxonomicnamewithauthor if scientificNameWithAuthorship is not provided, for use by TNRS: Aaron Marcuse-Kubitza
08:15 PM Revision 5011: xml_func.py: Simplifying functions: Merging: Added _join_words(): Aaron Marcuse-Kubitza
07:57 PM Revision 5010: inputs/ARIZ/Specimen/map.csv: Remapped ScientificNameAuthor to scientificNameWithAuthorship because it contains the binomial in addition to the authority: Aaron Marcuse-Kubitza
07:39 PM Revision 5009: schemas/functions.sql: Added _join_words(): Aaron Marcuse-Kubitza
07:33 PM Revision 5008: input.Makefile: Paths: $(datasrc): Remove any "." prefix from the subdir name. The "." prefix allows a subdir to be hidden from the normal import process.: Aaron Marcuse-Kubitza
06:56 PM Revision 5007: db_xml.py: put_table(): Allow caller to specify custom partition_size: Aaron Marcuse-Kubitza
06:45 PM Revision 5006: tnrs.py: tnrs_request(): Return the CSV stream directly instead of reading it into a string: Aaron Marcuse-Kubitza
06:42 PM Revision 5005: tnrs.py: tnrs_request(): Moved CSV-download-specific functionality from do_request() to the Download section: Aaron Marcuse-Kubitza
06:34 PM Revision 5004: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
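
Revisions 5009, 5011, 5012 and 5014 added _join_words() (as a SQL function and an XML function) to concatenate taxonomic name components for TNRS. The log does not spell out its exact semantics, so the Python sketch below assumes it skips NULL and empty arguments. It also applies two related rules from the log: return NULL instead of the empty string (Revision 4982), and use the rank (e.g. "var.") as a prefix only when there is an infraspecific epithet (Revisions 5082 and 5109). `taxonomic_name` is a hypothetical helper.

```python
def join_words(*words):
    """Join the non-empty words with single spaces.

    Returns None rather than "" when nothing is left, following the rule
    that functions must return NULL in place of the empty string.
    """
    parts = [w.strip() for w in words if w is not None and w.strip()]
    return " ".join(parts) if parts else None


def taxonomic_name(genus, species, rank=None, infraspecific_epithet=None, author=None):
    """Concatenate name components into one name for submission to TNRS.

    The rank is only used as a prefix when there is an infraspecific
    epithet; otherwise values like "species" would leak into the name.
    """
    infra = join_words(rank, infraspecific_epithet) if infraspecific_epithet else None
    return join_words(genus, species, infra, author)
```

For example, `taxonomic_name("Poa", "annua", "var.", "reptans", "L.")` gives `"Poa annua var. reptans L."`, while `taxonomic_name("Quercus", "alba", "species")` gives `"Quercus alba"`.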

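Revisions 5020 and 5021 added an if_not_exists parameter to add_col() so that add_row_num() leaves an existing column alone instead of adding a versioned copy. A minimal sketch of that choice follows, using SQLite to stay self-contained. The `_<n>` suffix used for the versioned name is an assumption; the log only says "a version #".

```python
import sqlite3


def add_col(conn, table, col, type_, if_not_exists=False):
    """Add column `col` to `table` and return the name actually used.

    If the column already exists: with if_not_exists=True, do nothing and
    return the existing name; otherwise add a versioned copy instead
    (the "_<n>" suffix format is an assumption).
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    name = col
    if name in existing:
        if if_not_exists:
            return name
        version = 1
        while f"{col}_{version}" in existing:
            version += 1
        name = f"{col}_{version}"
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {type_}")
    return name
```

With `if_not_exists=True`, calling it twice for `row_num` leaves a single `row_num` column. Without it, the second call adds `row_num_1`.
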
09/25/2012

11:13 PM Revision 5003: tnrs.py: tnrs_request(): Return the response instead of printing it to stdout: Aaron Marcuse-Kubitza
10:59 PM Revision 5002: schemas/py_functions.sql: _namePart(): Fixed bug where it was returning the empty string instead of NULL: Aaron Marcuse-Kubitza
10:46 PM Revision 5001: sql_io.py: import_csv(): Documented that sql.truncate() MUST be run so that the rows will be stored in inserted order, and the row_num added after import will match up with the CSV's row order: Aaron Marcuse-Kubitza
10:35 PM Revision 5000: sql.py: add_row_num(): Add distinguishing comment to ADD COLUMN statement so that it will be cached. The distinguishing comment is required because sometimes column names are truncated, leading to unwanted collisions with previously-cached ADD COLUMN statements. It provides a way of distinguishing the full column name behind a particular ADD COLUMN statement.: Aaron Marcuse-Kubitza
10:24 PM Revision 4999: sql_io.py: import_csv(): Free memory used by deleted rows from any failed import. Documented that sql.create_table() is not rolled back if the import fails, but instead is cached, and will not be re-run if the import is retried.: Aaron Marcuse-Kubitza
09:37 PM Revision 4998: sql_io.py: import_csv(): Fixed bug where the added row number column needed to be named row_num instead of _row_num to be autodetected as the pkey column (sql.pkey_col) by sql.pkey() and to avoid name collisions with the row number column added in column-based import: Aaron Marcuse-Kubitza
09:34 PM Revision 4997: sql.py: add_row_num(): Support custom row number column name: Aaron Marcuse-Kubitza
09:12 PM Revision 4996: csv2db: Use new sql_io.import_csv(): Aaron Marcuse-Kubitza
09:10 PM Revision 4995: sql_io.py: Added import_csv(): Aaron Marcuse-Kubitza
09:05 PM Revision 4994: csv2db: Don't truncate the table before loading rows because it has just been created, and is therefore empty. This statement may be left over from a time when the table was created only once, and its creation was not rolled back if the import fails.: Aaron Marcuse-Kubitza
08:44 PM Revision 4993: sql_io.py: cleanup_table(): Print 'Cleaning up table' log message: Aaron Marcuse-Kubitza
08:41 PM Revision 4992: sql_io.py: cleanup_table(): Also vacuum and reanalyze table: Aaron Marcuse-Kubitza
07:43 PM Revision 4991: tnrs_client: Use new tnrs.tnrs_request(): Aaron Marcuse-Kubitza
07:43 PM Revision 4990: Added tnrs.py: Aaron Marcuse-Kubitza
07:34 PM Revision 4989: tnrs_client: Factored TNRS request code into separate function tnrs_request(): Aaron Marcuse-Kubitza
07:23 PM Revision 4988: inputs/VegBank/taxonimportance/map.csv: Documented that taxonimportance is not 1:1 with taxonobservation: Aaron Marcuse-Kubitza
07:22 PM Revision 4987: mappings/VegCore-VegBIEN.csv: Removed unnecessary /_first/# suffix for multiple terms in the same _exists expression, because _exists() only checks whether its node is non-empty, and it does not matter how many child nodes it contains: Aaron Marcuse-Kubitza
06:57 PM Revision 4986: schemas/vegbien.sql: taxonoccurrence: taxonoccurrence_unique_within_locationevent unique index: Fixed bug where locationevent_id needed to be enclosed in COALESCE(..., 2147483647) so that the unique constraint also applies to rows with NULL locationevent_ids (there is no other unique constraint handling these rows): Aaron Marcuse-Kubitza
06:52 PM Revision 4985: README.TXT: Documented that if the row-based and column-based imports produce different inserted row counts, this usually means that a table is underconstrained (the unique indexes don't cover all possible rows). The inserted row count difference occurs because column-based import collapses empty table rows into one insert, while row-based import performs an insert of the empty row for each input row. Without a unique index to combine multiple row-based inserts, extra rows will be added.: Aaron Marcuse-Kubitza
06:48 PM Revision 4984: sql_io.py: put_table(): Warn if inserting empty table rows: Aaron Marcuse-Kubitza
06:13 PM Revision 4983: schemas/py_functions.sql: _namePart(): Fixed bug where it was returning the empty string instead of NULL: Aaron Marcuse-Kubitza
05:57 PM Revision 4982: schemas/functions.sql, py_functions.sql: Added schema comment that functions must always return NULL in place of the empty string, to ensure that empty strings do not find their way into VegBIEN. Note that row-based import automatically removes empty strings because the intermediate values are stored in XML and our XML DOM traversing code auto-replaces the empty string with NULL. Column-based import, on the other hand, does not, because the intermediate data is stored in database temp tables instead of a DOM tree.: Aaron Marcuse-Kubitza
05:31 PM Revision 4981: root map: Fixed custom public schema override to work with schemas lists that include public, by replacing public with the new public schema instead of just appending it: Aaron Marcuse-Kubitza
04:53 PM Revision 4980: inputs/*/*/map.csv: Prefix a * to every term that's not in Veg+ for easy identification of unmapped terms when editing map.csv. Note that canon will remove the * when it finds a matching Veg+ term.: Aaron Marcuse-Kubitza
04:52 PM Revision 4979: inputs/*/*/map.csv: Prefix a * to every term that's not in Veg+ for easy identification of unmapped terms when editing map.csv. Note that canon will remove the * when it finds a matching Veg+ term.: Aaron Marcuse-Kubitza
04:36 PM Revision 4978: ins_col: Added column fill value param: Aaron Marcuse-Kubitza
04:16 PM Revision 4977: inputs/VegBank/stemcount/map.csv: Fixed bug where taxonimportance_id needed to point to aggregateOccurrenceID instead of taxonOccurrenceID: Aaron Marcuse-Kubitza
04:15 PM Revision 4976: mappings/VegCore-VegBIEN.csv: Don't forward individualID to taxonoccurrence.sourceaccessioncode when aggregateOccurrenceID is present: Aaron Marcuse-Kubitza
03:52 PM Revision 4975: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
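
Revision 4986 wrapped locationevent_id in COALESCE(..., 2147483647) because a unique index treats NULLs as distinct, so rows with a NULL locationevent_id were never deduplicated. The sketch below reproduces the effect in SQLite, which (like PostgreSQL) treats NULLs as distinct in unique indexes and supports indexes on expressions. The `authortaxoncode` column is a hypothetical stand-in for the index's other columns.

```python
import sqlite3


def rows_after_two_inserts(index_sql):
    """Insert the same row twice under the given unique index; return the row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxonoccurrence (authortaxoncode TEXT, locationevent_id INTEGER)"
    )
    conn.execute(index_sql)
    for _ in range(2):
        try:
            conn.execute("INSERT INTO taxonoccurrence VALUES ('T1', NULL)")
        except sqlite3.IntegrityError:
            pass  # rejected as a duplicate
    return conn.execute("SELECT count(*) FROM taxonoccurrence").fetchone()[0]


# NULLs are distinct in a plain unique index, so both rows get in
plain = "CREATE UNIQUE INDEX u ON taxonoccurrence (authortaxoncode, locationevent_id)"
# Replacing NULL with a sentinel makes the two rows compare equal
coalesced = (
    "CREATE UNIQUE INDEX u ON taxonoccurrence "
    "(authortaxoncode, COALESCE(locationevent_id, 2147483647))"
)
```

`rows_after_two_inserts(plain)` returns 2 and `rows_after_two_inserts(coalesced)` returns 1. The sentinel, the largest 32-bit integer, must never occur as a real ID, or genuine rows would collide with the NULL rows.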

09/24/2012

06:45 PM Revision 4974: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
06:33 PM Revision 4973: schemas/vegbien.sql: placepath.otherranks comment: Added analogous text from taxonpath.otherranks: Aaron Marcuse-Kubitza
06:31 PM Revision 4972: schemas/vegbien.sql: taxonpath.author comment: Added equivalent Darwin Core term: Aaron Marcuse-Kubitza
06:27 PM Revision 4971: schemas/vegbien.sql: taxon columns: Added descriptive comments for data dictionary: Aaron Marcuse-Kubitza
06:15 PM Revision 4970: schemas/vegbien.sql: placepath: Added canon_placepath_id, analogous to taxonpath.canon_taxonpath_id: Aaron Marcuse-Kubitza
06:09 PM Revision 4969: schemas/vegbien.sql: place, placepath descriptive comments: Added analogous text from taxon/taxonpath: Aaron Marcuse-Kubitza
06:05 PM Revision 4968: schemas/vegbien.sql: taxonpath: descriptive comment: Changed "applicable taxon" to "identified taxon": Aaron Marcuse-Kubitza
05:58 PM Revision 4967: schemas/vegbien.sql: taxon: descriptive comment: Reworded to emphasize that this stores only one rank (e.g. family) of the full taxonomic name, in contrast to taxonpath, which stores all of them: Aaron Marcuse-Kubitza
05:54 PM Revision 4966: schemas/vegbien.sql: taxonpath: descriptive comment: Clarified that this is the full path to a taxon, including all components of the taxonomic name: Aaron Marcuse-Kubitza
05:48 PM Revision 4965: schemas/vegbien.sql: Replaced "scientific name" with "taxonomic name" for schema-wide consistency and for consistency with the taxon/taxonomic name vocabulary: Aaron Marcuse-Kubitza
05:38 PM Revision 4964: schemas/vegbien.sql: taxonpath named ranks: Added descriptive comments for data dictionary: Aaron Marcuse-Kubitza
05:34 PM Revision 4963: schemas/vegbien.sql: taxonpath columns other than named ranks: Added descriptive comments for data dictionary: Aaron Marcuse-Kubitza
05:14 PM Revision 4962: schemas/vegbien.sql: taxonscope: descriptive comment: Reworded to make the first sentence a noun, for consistency with other descriptive table comments: Aaron Marcuse-Kubitza
05:13 PM Revision 4961: schemas/vegbien.sql: taxon: descriptive comment: Added note that the taxonname stores only one rank (e.g. family) of the full identifying name: Aaron Marcuse-Kubitza
05:07 PM Revision 4960: schemas/vegbien.sql: taxonpath: descriptive comment: Reworded to make the first sentence a noun, for consistency with other descriptive table comments. The convention is for the first "sentence" to be a noun which describes the entity that the table models.: Aaron Marcuse-Kubitza
05:00 PM Revision 4959: schemas/vegbien.sql: comments: Removed units from comments on fields which already have a units suffix, to avoid having to keep the units in sync between the suffix and the comment. Note that the units were abbreviated equally in the suffixes and comments, so this did not result in a loss of information other than the ^ for a quantity squared (but it's obvious enough that m2 is m^2).: Aaron Marcuse-Kubitza
04:54 PM Revision 4958: schemas/vegbien.sql: taxonscope: descriptive comment: Added period for consistency with other descriptive table comments: Aaron Marcuse-Kubitza
04:50 PM Revision 4957: schemas/vegbien.sql: taxon: Added descriptive comment for data dictionary: Aaron Marcuse-Kubitza
04:48 PM Revision 4956: schemas/vegbien.sql: VegBank-equivalent tables comments: Prepended "Equivalent to" before VegBank, so the equivalent tables statement can fit grammatically after a description of the table instead of having to be the first phrase in the descriptive table comment: Aaron Marcuse-Kubitza
04:41 PM Revision 4955: schemas/vegbien.sql: taxon: VegBank-equivalent tables comment: Added plantName and applicable columns from plantStatus, which are also part of the taxon table: Aaron Marcuse-Kubitza
04:37 PM Revision 4954: schemas/vegbien.sql: placepath: Added otherranks field, analogous to taxonpath.otherranks: Aaron Marcuse-Kubitza
04:26 PM Revision 4953: schemas/vegbien.sql: taxonpath: Added descriptive comment for data dictionary: Aaron Marcuse-Kubitza
03:36 PM Revision 4952: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
02:58 PM Revision 4951: inputs/UNCC/Specimen/map.csv: accession: Documented that it's globally unique, although occasionally duplicated: Aaron Marcuse-Kubitza
02:54 PM Revision 4950: inputs/REMIB/Specimen/map.csv: Remapped accession_number to catalogNumber because it is not globally unique, only (usually) unique within the institution providing the data ("acronym"). Note that there are nevertheless 11,869 rows where an accession_number appears multiple times within the same institution.: Aaron Marcuse-Kubitza
02:45 PM Revision 4949: mappings/VegCore-VegBIEN.csv: Only use institutionCode+collectionCode+catalogNumber as the authorlocationcode (location-scoping ID) if there is actually a catalogNumber. Otherwise, the mapping process would attempt to create one location for each collection in the datasource, when there should be one location for each specimen.: Aaron Marcuse-Kubitza
02:36 PM Revision 4948: schemas/py_functions.sql: _namePart(): Slice the first name from the beginning of the string to one word before the end, instead of one after the beginning, in order to avoid overlap with the last name, which starts one before the end, when there is only one word. Note that only one word means the name is assumed to be a last name. This assumption may not always be true, but when a datasource provides the name concatenated, an assumption must be made when not all name components are present.: Aaron Marcuse-Kubitza
02:30 PM Revision 4947: schemas/vegbien.sql: party: Added check constraint to require at least an organizationname or surname. Previously, NULL entries for the collector or identifier incorrectly caused the creation of an empty party entry, hence the lower inserted row counts now that this is no longer created.: Aaron Marcuse-Kubitza
02:17 PM Revision 4946: inputs/REMIB/Specimen/map.csv: Remapped acronym to institutionCode because this is an aggregator, and the field lists the datasource each record was aggregated from. Note that the inserted row count changes because of different duplicate elimination strategies in specimenreplicate and party (which institutionCode is placed in).: Aaron Marcuse-Kubitza
02:11 PM Revision 4945: inputs/REMIB/Specimen/create.sql: Also filter out rows where acronym (collectionCode) is NULL because this is a required field for valid records: Aaron Marcuse-Kubitza
01:28 PM Revision 4944: schemas/vegbien.sql: taxonpath: Renamed scientificnameauthor to author so the column name doesn't have "scientificname" in it, which made the term look confusingly like scientificname itself. Added descriptive comment that this is the author of the scientific name.: Aaron Marcuse-Kubitza
01:19 PM Revision 4943: schemas/vegbien.sql: taxonpath: Renamed canon_id to canon_taxonpath_id to clarify that this is a recursive fkey. The convention is that a recursive fkey includes the table name plus a descriptive prefix.: Aaron Marcuse-Kubitza
01:14 PM Revision 4942: schemas/filter_ERD.csv: Don't filter out fkeys from taxonpath to itself: Aaron Marcuse-Kubitza
01:04 PM Task #501 (Resolved): find out which datasources won't allow their data to be publicly accessible: * needed before we can make VegBIEN public
These datasources are:
* "REMIB":http://www.conabio.gob.mx/remib/cgi... Aaron Marcuse-Kubitza
01:02 PM Task #500 (New): when lower rank has name concatenated together, use lowest rank as the scientific name: Aaron Marcuse-Kubitza
12:57 PM Task #499 (Resolved): map example terms into the taxonomic schema: Aaron Marcuse-Kubitza
12:57 PM Task #498 (Resolved): add definitions to columns in "green tables": Aaron Marcuse-Kubitza
12:57 PM Task #497 (Resolved): create examples of taxonomic names to test the limits of the new taxonomic schema: * need types of morphospecies indicators Aaron Marcuse-Kubitza
11:32 AM Revision 4941: schemas/vegbien.sql: taxonpath: Added canon_id for the canonical (scrubbed) taxonpath determined by TNRS: Aaron Marcuse-Kubitza
11:24 AM Revision 4940: schemas/vegbien.sql: taxonpath: taxonpath_unique_within_datasource_by_name unique index: Added otherranks, so that ranks without a named column will be used in uniquely identifying the taxonpath: Aaron Marcuse-Kubitza
11:22 AM Revision 4939: sql.py: DbConn.col_info(): Parse array types as sql_gen.ArrayType: Aaron Marcuse-Kubitza
11:22 AM Revision 4938: sql_gen.py: EnsureNotNull: Support ArrayType types: Aaron Marcuse-Kubitza
11:21 AM Revision 4937: strings.py: remove_prefix(), remove_suffix(): Added require param to raise an exception if the string does not have the given prefix/suffix: Aaron Marcuse-Kubitza
11:06 AM Revision 4936: sql.py: DbConn.col_info(): Moved parsing of user-defined datatypes to Python code, so that parsing for other composite types which also requires both data_type and udt_name can easily be added: Aaron Marcuse-Kubitza
11:03 AM Revision 4935: sql_gen.py: Added ArrayType: Aaron Marcuse-Kubitza
10:29 AM Revision 4934: schemas/vegbien.sql: Scope taxonpath instead of taxon with taxonscope, because a morphospecies name is specific to a datasource entity, so it should go in the datasource-specific taxonpath table instead of the datasource-general taxon table: Aaron Marcuse-Kubitza
10:14 AM Revision 4933: schemas/vegbien.sql: taxonpath: Added otherranks array column to store ranked names without a named column. Documented that ranks with no named column should be stored in this new field instead of in a chain of taxons pointed to by taxon_id. This ensures that only the tree of life uses the taxon table.: Aaron Marcuse-Kubitza
09:47 AM Revision 4932: schemas/vegbien.sql: Removed no longer used table stemtag, which has been replaced by stemobservation.tag, stemobservation.tags: Aaron Marcuse-Kubitza
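Revision 4948 describes how _namePart() slices a concatenated personal name. A minimal Python sketch of that slicing, assuming whitespace-separated words and ignoring suffixes and the PL/Python wrapper. split_name is a hypothetical name, not the function in schemas/py_functions.sql:

```python
from typing import Optional, Tuple


def split_name(full_name: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a concatenated personal name into (first, last).

    Hypothetical sketch of the slicing described in Revision 4948:
    the first name runs from the start of the string to one word
    before the end, and the last name is the final word, so the two
    slices never overlap. A single word is assumed to be a last name.
    """
    words = full_name.split()
    if not words:
        return None, None
    first = " ".join(words[:-1]) or None  # empty slice when only one word
    last = words[-1]  # the last name is always the final word
    return first, last


# A first-name slice ending one word after the beginning (words[:1])
# would instead return "Doe" as both names for the single-word input.
print(split_name("Jane Doe"))  # ('Jane', 'Doe')
print(split_name("Doe"))       # (None, 'Doe')
```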

09/21/2012

04:28 PM Revision 4931: inputs/ARIZ/Specimen/test.xml.ref: Updated after reinstalling staging table with new sql_io.null_strs: Aaron Marcuse-Kubitza
04:22 PM Revision 4930: inputs/VegBank/: Added stemlocation/: Aaron Marcuse-Kubitza
04:17 PM Revision 4929: inputs/VegBank/: Added stemcount/: Aaron Marcuse-Kubitza
04:10 PM Revision 4928: sql_io.py: cleanup_table(): Fixed bug where no update statement could be run when no columns are text: Aaron Marcuse-Kubitza
03:57 PM Revision 4927: csv2db: COPY FROM mode: Removed no longer needed explicit column list, now that the initial table has the exact width of the CSV (the row_num is added later): Aaron Marcuse-Kubitza
03:55 PM Revision 4926: csv2db: Add any row_num column after creating the table, so it does not interfere with row widths when using COPY FROM without explicit column names: Aaron Marcuse-Kubitza
03:48 PM Revision 4925: csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.: Aaron Marcuse-Kubitza
03:44 PM Revision 4924: csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.: Aaron Marcuse-Kubitza
03:28 PM Revision 4923: inputs/VegBank/: Added taxonimportance/: Aaron Marcuse-Kubitza
03:20 PM Revision 4922: mappings/VegCore.csv: Added and mapped aggregateOccurrenceID: Aaron Marcuse-Kubitza
03:12 PM Revision 4921: mappings/VegCore.csv: taxonOccurrenceID: Re-sourced to VegBank taxonobservation and DwC occurrenceID, because this is where the VegBIEN table name came from: Aaron Marcuse-Kubitza
02:57 PM Revision 4920: tnrs_client: Support parsing multiple taxons at once, by specifying each as a command-line argument. Increased the max_pause to 10 min to support large batches. Limited the batch size to 5000 names, using the limit at <http://tnrs.iplantcollaborative.org/TNRSapp.html>. Note that when using xargs to pass many names, BSD xargs (e.g. on macOS) will by default split its arguments into chunks of 5000, while GNU xargs splits by command-line length instead. In either case, you can set the chunk size using the -n option.: Aaron Marcuse-Kubitza
02:29 PM Revision 4919: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
01:20 PM Revision 4918: Added tnrs_client. Note that obtaining an actual CSV requires four (!) steps: submit, retrieve, prepare download, and download. The output of the retrieve step is unusable because the array has different lengths depending on the taxonomic ranks present in the provided taxon name. This initial version runs one name at a time, but could later be expanded to batch process because TNRS can run multiple names at once.: Aaron Marcuse-Kubitza
12:36 PM Revision 4917: streams.py: Line iteration: Added read_all(): Aaron Marcuse-Kubitza
08:24 AM Revision 4916: inputs/Madidi/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values: Aaron Marcuse-Kubitza
08:18 AM Revision 4915: sql_io.py: null_strs: Added '-': Aaron Marcuse-Kubitza
08:18 AM Revision 4914: sql_io.py: cleanup_table(): Fixed bug where each column name needed to be converted to Unicode before being concatenated with other strings, to support non-ASCII characters: Aaron Marcuse-Kubitza
07:57 AM Revision 4913: inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values: Aaron Marcuse-Kubitza
07:52 AM Revision 4912: inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Removed no longer needed old-style _units filter, now that unit conversion is handled by mappings/VegCore-VegBIEN.csv using _percent_to_fraction: Aaron Marcuse-Kubitza
07:48 AM Revision 4911: inputs/VegBank/observation_/map.csv: soilObs fields: Cited data dictionary source of units: Aaron Marcuse-Kubitza
07:15 AM Revision 4910: mappings/Veg+-VegCore.csv: Soil component measurements: Added unitless terms that automap to all alternatives of units: Aaron Marcuse-Kubitza
07:08 AM Revision 4909: mappings/VegCore.csv: Added term with *_fraction units for every *_percent term: Aaron Marcuse-Kubitza
07:03 AM Revision 4908: mappings/VegCore.csv: Soil component measurements: Added default units of percent (cmol_kg for cationExchangeCapacity). This involves translating the names everywhere and adding a _percent_to_fraction conversion in mappings/VegCore-VegBIEN.csv.: Aaron Marcuse-Kubitza
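Revision 4920 limits tnrs_client batches to 5000 names. A minimal Python sketch of that chunking, assuming the names are already in a list; TNRS_BATCH_LIMIT and batches are hypothetical names, and the submit/retrieve/prepare/download steps from Revision 4918 are not shown:

```python
from typing import Iterator, List, Sequence

# Per-request name limit cited in Revision 4920 (hypothetical constant name).
TNRS_BATCH_LIMIT = 5000


def batches(names: Sequence[str], size: int = TNRS_BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of `names`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(names), size):
        yield list(names[start:start + size])


names = [f"Taxon {i}" for i in range(12001)]
print([len(b) for b in batches(names)])  # [5000, 5000, 2001]
```

Each yielded batch would then go through one full submit-to-download cycle, so a 12,001-name input costs three round trips rather than 12,001.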

09/20/2012

11:15 PM Revision 4907: mappings/VegCore-VegBIEN.csv: Remapped verbatimLatitude/Longitude to locationcoords.verbatimlatitude/longitude because these fields now contain only non-decimal coordinates. This involves removing the _alt suffix on decimalLatitude/Longitude, which causes the VegBIEN.csvs to change.: Aaron Marcuse-Kubitza
11:11 PM Revision 4906: inputs/*/*/map.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees: Aaron Marcuse-Kubitza
11:06 PM Revision 4905: inputs/*/*/map.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees: Aaron Marcuse-Kubitza
10:54 PM Revision 4904: inputs/SpeciesLink/Specimen/map.csv: Documented that dwc_geospatial_VerbatimLatitude/Longitude contain a mix of DMS and other verbatim coordinates: Aaron Marcuse-Kubitza
10:47 PM Revision 4903: inputs/QMOR/Specimen/map.csv: Remapped verbatimLatitude/verbatimLongitude to latitude_DMS/longitude_DMS since these fields contain DMS values: Aaron Marcuse-Kubitza
10:43 PM Revision 4902: inputs/Madidi/Plot/map.csv: Remapped Latitude/Longitude (DMS) to new latitude_DMS/longitude_DMS: Aaron Marcuse-Kubitza
10:41 PM Revision 4901: mappings/VegCore-VegBIEN.csv: Mapped latitude_DMS, longitude_DMS: Aaron Marcuse-Kubitza
10:38 PM Revision 4900: mappings/VegCore.csv: Added latitude_DMS, longitude_DMS: Aaron Marcuse-Kubitza
10:34 PM Revision 4899: inputs/REMIB/Specimen/map.csv: Remapped lat_deg/long_deg to decimalLatitude/Longitude because these values are (integer) degrees suitable for decimalLatitude/Longitude. Note that the other DMS fields are not yet translated to decimal degrees.: Aaron Marcuse-Kubitza
10:28 PM Revision 4898: mappings/Veg+-VegCore.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees: Aaron Marcuse-Kubitza
10:26 PM Revision 4897: mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".: Aaron Marcuse-Kubitza
10:23 PM Revision 4896: input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Don't automatically regenerate the aggregated unmapped_terms.csv, new_terms.csv because this almost doubles the remake time when a mappings/ prerequisite changes (41s -> 75s): Aaron Marcuse-Kubitza
10:14 PM Revision 4895: mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".: Aaron Marcuse-Kubitza
10:09 PM Revision 4894: inputs/GBIF/Specimen/map.csv: Remapped VerbatimLatitude/Longitude to decimalLatitude/Longitude because DecimalLatitude/Longitude just contains VerbatimLatitude/Longitude cast to a low-resolution float, which created spurious repeating decimals: Aaron Marcuse-Kubitza
09:56 PM Revision 4893: mappings/Makefile: .VegCore-VegBIEN.csv.last_cleanup: Generate VegCore-VegBIEN.unsourced_terms.csv whenever VegCore-VegBIEN.csv changes, to track VegCore terms that are mapped to VegBIEN but not documented in VegCore.csv. Note that this file is *not* svn:ignored, so it will show up with a ? when the user runs `svn st` if there are any unsourced terms.: Aaron Marcuse-Kubitza
09:47 PM Revision 4892: mappings/Makefile: Changed catch-all `.%.last_cleanup: %` target to a specific target for VegCore-VegBIEN.csv, because it's the only file that uses this target: Aaron Marcuse-Kubitza
09:45 PM Revision 4891: mappings/: Don't generate a for_review version of Veg+-VegCore.csv, because it is identical to the machine-readable Veg+-VegCore.csv (there are no output XPaths to simplify): Aaron Marcuse-Kubitza
09:41 PM Revision 4890: mappings/: Don't generate a for_review version of VegX-VegCore.csv, because it is identical to the machine-readable VegX-VegCore.csv (there are no output XPaths to simplify): Aaron Marcuse-Kubitza
09:37 PM Revision 4889: mappings/: Removed Veg+.unmapped_terms.csv because these terms are found in each datasource's new_terms.csv, which are updated regularly, while this file isn't, and which exist for every datasource, while this file only contained terms from a few datasources: Aaron Marcuse-Kubitza
09:29 PM Revision 4888: inputs/ARIZ/Specimen/map.csv: Remapped VerbatimLatitude, VerbatimLongitude to UNUSED: Aaron Marcuse-Kubitza
09:21 PM Revision 4887: Regenerated root unmapped_terms.csv, new_terms.csv: Aaron Marcuse-Kubitza
09:19 PM Revision 4886: lib/mappings.Makefile: unmapped_terms.csv, new_terms.csv: Only remake if newer than existing %/unmapped_terms.csv, %/new_terms.csv which haven't been autoremoved. This avoids always remaking every unmapped_terms.csv, new_terms.csv whenever `make missing_mappings` is run. Note that these files will automatically be remade whenever their corresponding map.csv changes, so it is not necessary to actually remake %/unmapped_terms.csv, %/new_terms.csv; they are prerequisites only so that their modification time may be checked to determine whether unmapped_terms.csv, new_terms.csv needs to be remade.: Aaron Marcuse-Kubitza
09:11 PM Revision 4885: input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Automatically regenerate aggregated unmapped_terms.csv, new_terms.csv when a subdir's corresponding file changes: Aaron Marcuse-Kubitza
09:10 PM Revision 4884: inputs/: Regenerated aggregated unmapped_terms.csv, new_terms.csv: Aaron Marcuse-Kubitza
08:58 PM Revision 4883: inputs/REMIB/: Moved nodes.make into Specimen.src/ so it's with the data it generates: Aaron Marcuse-Kubitza
08:55 PM Revision 4882: inputs/TEAM/: Regenerated */new_terms.csv: Aaron Marcuse-Kubitza
08:30 PM Revision 4881: inputs/TEAM/: Obtained new download of TEAM data. (Note that the new download has a slightly different schema.) Archived old data in _archive/. Added tables to import_order.txt. Renamed TeamPlotMetaData/ to TEAM_Sites/ to correspond with the section header in Vegetation-Tree-and-Liana-Metadata-1.5.pdf. Fixed TEAM_Sites mappings: Remapped CollectionDate to eventDate because it relates to the plot, not the organism. Mapped Name to plotName so TEAM_Sites data will match up with VL, VT data.: Aaron Marcuse-Kubitza
08:28 PM Revision 4880: inputs/TEAM/: Obtained new download of TEAM data. (Note that the new download has a slightly different schema.) Archived old data in _archive/. Added tables to import_order.txt. Renamed TeamPlotMetaData/ to TEAM_Sites/ to correspond with the section header in Vegetation-Tree-and-Liana-Metadata-1.5.pdf. Fixed TEAM_Sites mappings: Remapped CollectionDate to eventDate because it relates to the plot, not the organism. Mapped Name to plotName so TEAM_Sites data will match up with VL, VT data.: Aaron Marcuse-Kubitza
06:58 PM Revision 4879: inputs/TEAM/VL, VT: Split concatenated flat files apart into separate parts each time a header is duplicated, so that the header would be autoremoved by cat_csv. Changed modified BIEN2 flat file headers back to original headers (the duplicated headers) so the headers of all part files would match up. (This is required for cat_csv header autoremoval to work properly.) This results in changes to the input column names in */map.csv.: Aaron Marcuse-Kubitza
06:49 PM Revision 4878: sql_io.py: null_strs: Added 'nulo' (used by REMIB): Aaron Marcuse-Kubitza
06:13 PM Revision 4877: mappings/Veg+-VegCore.csv: DBH: Removed diameterBreastHeight_m alternative because datasources that don't append units to DBH almost always have units of cm or in: Aaron Marcuse-Kubitza
06:11 PM Revision 4876: inputs/TEAM/*/map.csv: Remapped dbh from diameterBreastHeight_m to diameterBreastHeight_cm, using the units defined in Vegetation-Metadata-1.4.pdf: Aaron Marcuse-Kubitza
06:05 PM Revision 4875: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
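Revisions 4899 to 4903 introduce latitude_DMS and longitude_DMS, and Revision 4899 notes that the remaining DMS fields are not yet translated to decimal degrees. The standard conversion is decimal = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. A minimal Python sketch, assuming the components have already been parsed into numbers plus a hemisphere letter; dms_to_decimal is hypothetical and not part of the VegBIEN mappings:

```python
def dms_to_decimal(degrees: float, minutes: float = 0.0, seconds: float = 0.0,
                   hemisphere: str = "N") -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.

    Assumes non-negative components; the sign is taken only from the
    hemisphere letter (S and W are negative).
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


print(round(dms_to_decimal(14, 30, 0, "S"), 4))   # -14.5
print(round(dms_to_decimal(67, 45, 36, "W"), 4))  # -67.76
```

With no minutes or seconds, the result is just the (signed) degrees, which is why integer-degree fields such as REMIB's lat_deg/long_deg can be mapped straight to decimalLatitude/Longitude.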

09/19/2012

11:16 PM Revision 4874: inputs/TEAM/: Added TeamPlotMetaData: Aaron Marcuse-Kubitza
11:09 PM Revision 4873: inputs/TEAM/_src/: Added ci-team_extract/Vegetation-Metadata-1.4.pdf and symlink to it in the _src subdir: Aaron Marcuse-Kubitza
10:51 PM Revision 4872: inputs/: Added aggregated unmapped_terms.csv, new_terms.csv which were not already under version control: Aaron Marcuse-Kubitza
10:41 PM Revision 4871: inputs/SALVIAS-CSV/Organism/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension: Aaron Marcuse-Kubitza
10:36 PM Revision 4870: inputs/SALVIAS/stems/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for plotObservations.intercept_cm, which measures the same dimension: Aaron Marcuse-Kubitza
10:33 PM Revision 4869: inputs/SALVIAS/plotObservations/map.csv: Remapped temp_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension: Aaron Marcuse-Kubitza
10:25 PM Revision 4868: inputs/Madidi/Organism/map.csv: Remapped Diameter from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the range and precision of values: Aaron Marcuse-Kubitza
10:23 PM Revision 4867: inputs/FIA/Organism/map.csv: DBH: Changed units comment to include that assumption was also based on location inside the U.S., because some data outside the U.S. also uses fractional DBHs, but these are not likely to be inch measurements: Aaron Marcuse-Kubitza
10:19 PM Revision 4866: inputs/FIA/Organism/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_in, assuming units based on the range and precision of values: Aaron Marcuse-Kubitza
10:16 PM Revision 4865: inputs/CTFS/StemObservation/map.csv: DBH: Changed units comment to include that assumption was also based on the precision of values, because fractional DBHs sometimes indicate units of inches: Aaron Marcuse-Kubitza
10:13 PM Revision 4864: mappings/VegCore.csv: Added diameterBreastHeight_in: Aaron Marcuse-Kubitza
10:09 PM Revision 4863: schemas/functions.sql: Added _in_to_m(): Aaron Marcuse-Kubitza
10:00 PM Revision 4862: mappings/Veg+-VegCore.csv: Remapped DBH from no longer existing term diameterBreastHeight to diameterBreastHeight_cm, diameterBreastHeight_m (both terms will be listed in the map spreadsheet after automapping, and the user can then choose one): Aaron Marcuse-Kubitza
09:57 PM Revision 4861: inputs/CTFS/StemObservation/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units are cm based on the range of values: Aaron Marcuse-Kubitza
09:56 PM Revision 4860: mappings/VegCore.csv: Added diameterBreastHeight_cm: Aaron Marcuse-Kubitza
09:41 PM Revision 4859: mappings/VegCore.csv: Added stemID, which was only in mappings/VegCore-VegBIEN.csv: Aaron Marcuse-Kubitza
09:35 PM Revision 4858: input.Makefile: Maps validation: Inline $(unmappedTerms) because it's only used once: Aaron Marcuse-Kubitza
09:31 PM Revision 4857: input.Makefile: Maps validation: %/new_terms.csv: Include the entire map spreadsheet row, so that each new term is listed together with its mapping. This facilitates adding new mappings to mappings/Veg+-VegCore.csv directly from any new_terms.csv. Note that the use of `sort -u` (in lib/mappings.Makefile) causes multiline comments to be separated, leading to spurious lines for each multiline comment line.: Aaron Marcuse-Kubitza
09:19 PM Revision 4856: inputs/: Added unmapped_terms.csv, new_terms.csv which were not already under version control: Aaron Marcuse-Kubitza
08:43 PM Revision 4855: inputs/VegBank/plot_/: Automapped with new parentPlotID term, which now has a join mapping in mappings/VegCore-VegBIEN.csv: Aaron Marcuse-Kubitza
08:41 PM Revision 4854: Regenerated unmapped_terms.csv, new_terms.csv: Aaron Marcuse-Kubitza
08:24 PM Revision 4853: mappings/Veg+-VegCore.csv: Added parentPlotID: Aaron Marcuse-Kubitza
08:22 PM Revision 4852: mappings/VegCore-VegBIEN.csv: Added parentLocationID, parentPlotName, which always map directly to the parent location, regardless of whether any subplot ID is present: Aaron Marcuse-Kubitza
08:16 PM Revision 4851: mappings/Veg+.unmapped_terms.csv: Removed vague term volumeCanopy, which has no definition in VegX: Aaron Marcuse-Kubitza
08:14 PM Revision 4850: mappings/Makefile: .VegCore.csv.last_cleanup: Fixed bug where the sorting columns needed to be changed to match the new column order: Aaron Marcuse-Kubitza
08:11 PM Revision 4849: mappings/VegCore.csv: Reordered columns to put Comments first, which matches mappings/Veg+-VegCore.csv: Aaron Marcuse-Kubitza
08:08 PM Revision 4848: mappings/Veg+-VegCore.csv: Removed redundant stem_id->stemID mapping: Aaron Marcuse-Kubitza
08:07 PM Revision 4847: mappings/Veg+-VegCore.csv: Standardized the capitalization of names, by camel-casing each name except for acronyms and "ID", which are made all uppercase: Aaron Marcuse-Kubitza
07:59 PM Revision 4846: mappings/VegCore.csv: Renamed diameterBreastHeight to diameterBreastHeight_m to assert units matching the VegBIEN field: Aaron Marcuse-Kubitza
07:44 PM Revision 4845: mappings/VegCore.csv: Removed duplicates: Aaron Marcuse-Kubitza
07:22 PM Revision 4844: input.Makefile: Maps building: Use new mappings/VegCore.csv as the VegCore vocabulary to canonicalize on, in order to also canonicalize VegCore terms which are not yet mapped to VegBIEN. This results in several DwC terms getting their case standardized according to http://rs.tdwg.org/dwc/terms/. Continue to determine unmapped terms using mappings/VegCore-VegBIEN.csv, because a term should not be considered mapped until it has been mapped all the way through to VegBIEN.: Aaron Marcuse-Kubitza
07:12 PM Revision 4843: mappings/VegCore.csv: Removed trailing spaces from terms: Aaron Marcuse-Kubitza
07:05 PM Revision 4842: mappings/Veg+.unmapped_terms.csv: Removed duplicates of VegCore terms: Aaron Marcuse-Kubitza
07:02 PM Revision 4841: mappings/: Split Veg+.terms.csv into VegCore.csv and Veg+.unmapped_terms.csv: Aaron Marcuse-Kubitza
06:36 PM Revision 4840: mappings/Veg+.terms.csv: Removed terms that are in mappings/Veg+-VegCore.csv: Aaron Marcuse-Kubitza
06:31 PM Revision 4839: mappings/Veg+-VegCore.csv: Added sources where missing: Aaron Marcuse-Kubitza
06:20 PM Revision 4838: mappings/Veg+-VegCore.csv: Added Source and Comments columns from mappings/Veg+.terms.csv. Reordered columns to put Comments first.: Aaron Marcuse-Kubitza
06:17 PM Revision 4837: mappings/Veg+.terms.csv: Removed duplicate entries for stem_id/stemID, collector: Aaron Marcuse-Kubitza
05:56 PM Revision 4836: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
05:24 PM Revision 4835: inputs/REMIB/Specimen/: Filter out invalid, frameshifted rows so they don't produce errors in the import or anomalies like thousands of taxondeterminations for one taxonoccurrence. This involves moving the CSVs to Specimen.src and using a create.sql to create the filtered table.: Aaron Marcuse-Kubitza
04:47 PM Revision 4834: mappings/VegCore-VegBIEN.csv: Forward occurrenceID to taxonoccurrence.sourceaccessioncode when there is no other taxonoccurrence.sourceaccessioncode, to ensure that taxonoccurrence is uniquely identified so that there is one taxonoccurrence per organism: Aaron Marcuse-Kubitza
04:16 PM Revision 4833: mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode alternatives: Use _first instead of _alt because when one of these fields is present, it can be used directly even if it's sometimes NULL, without needing to spend a lot of time _alting together fields that won't be used. Datasources where the authortaxoncode is sometimes NULL usually have a separate sourceaccessioncode for the taxonoccurrence. (In the rare case that they don't, they should map a non-NULL field to recordNumber or tag to ensure that taxonoccurrences can be uniquely identified.): Aaron Marcuse-Kubitza
04:07 PM Revision 4832: mappings/VegCore-VegBIEN.csv: Mapped tag to taxonoccurrence.authortaxoncode when the record is an organism, in case there is no other ID for the taxonoccurrence. This fixes a bug in FIA and TEAM data where all organisms in a plot used the same taxonoccurrence because taxonoccurrence was not properly constrained, causing the loss of individual taxondeterminations on each organism.: Aaron Marcuse-Kubitza
03:36 PM Revision 4831: input.Makefile: Testing: %/test.by_col.xml: Do abort tester if by-column test fails. There are no longer small rowcount differences between row-based and column-based import on some datasources, so this is now possible.: Aaron Marcuse-Kubitza
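Revisions 4846 to 4868 tag each DBH column with its units (diameterBreastHeight_m, diameterBreastHeight_cm, diameterBreastHeight_in), and Revision 4863 adds _in_to_m() for converting to the VegBIEN field, which is in meters. A minimal Python sketch of that normalization using the exact definitions 1 in = 0.0254 m and 1 cm = 0.01 m; the TO_METERS table and dbh_to_m are hypothetical and do not reproduce schemas/functions.sql:

```python
# Conversion factors to meters, keyed by the VegCore units suffix
# (hypothetical table; the inch and centimeter factors are exact by definition).
TO_METERS = {
    "m": 1.0,
    "cm": 0.01,
    "in": 0.0254,
}


def dbh_to_m(term: str, value: float) -> float:
    """Convert a DBH value to meters based on the units suffix of its term name.

    `term` is expected to look like 'diameterBreastHeight_cm'.
    """
    _, sep, units = term.rpartition("_")
    if not sep or units not in TO_METERS:
        raise ValueError(f"no known units suffix on {term!r}")
    return value * TO_METERS[units]


print(round(dbh_to_m("diameterBreastHeight_cm", 35.2), 6))  # 0.352
print(round(dbh_to_m("diameterBreastHeight_in", 10), 6))    # 0.254
```

Putting the units in the term name means the conversion is chosen by the mapping rather than guessed from the data, which is why each datasource's DBH column is remapped to the suffix matching its documented or inferred units.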

09/18/2012

11:13 PM Revision 4830: schemas/vegbien.sql: stemobservation: stemobservation_unique_within_plantobservation unique index: Added tag so that a stemobservation can be scoped by its tag when no other ID is specified: Aaron Marcuse-Kubitza
11:11 PM Revision 4829: schemas/vegbien.sql: stemobservation: stemobservation_unique_within_plantobservation unique index: Fixed bug where filter condition underconstrained stemobservation when neither sourceaccessioncode nor authorstemcode was specified, by making sure that at least one *_unique index always applies: Aaron Marcuse-Kubitza
11:08 PM Revision 4828: mappings/VegCore-VegBIEN.csv: Remapped tag to new stemobservation.tag: Aaron Marcuse-Kubitza
11:06 PM Revision 4827: schemas/vegbien.sql: stemobservation: Added tag, tags: Aaron Marcuse-Kubitza
10:53 PM Revision 4826: mappings/VegCore-VegBIEN.csv: tag: Removed no longer applicable comment: Aaron Marcuse-Kubitza
10:49 PM Revision 4825: mappings/VegCore-VegBIEN.csv: Removed no longer used previousTag and the complex mapping logic that attempts to place both tags in VegBIEN in the correct order but does not work for column-based import. tag: Removed iscurrent=true because there is now only one tag field.: Aaron Marcuse-Kubitza
10:41 PM Revision 4824: inputs/SALVIAS/*/map.csv: Remapped all versions of stem and tree tags to tag, with the second tag superseding the first, to avoid the complex VegCore-VegBIEN mapping logic that attempts to place both tags in VegBIEN in the correct order but does not work for column-based import. inputs/SALVIAS-CSV/Organism/map.csv: stem and tree tags: Made the stem tag supersede the tree tag instead of vice versa, to have as specific a tag as possible.: Aaron Marcuse-Kubitza
10:30 PM Revision 4823: inputs/SALVIAS/stems/map.csv: Copied Brad's comments on plotObservations.tag1, tag2 to stem_tag1, stem_tag2: Aaron Marcuse-Kubitza
10:18 PM Revision 4822: mappings/VegCore-VegBIEN.csv: Removed _rangeStart and _rangeEnd filters from fields which should contain decimal values. These filters should be added on a per-datasource basis instead.: Aaron Marcuse-Kubitza
10:12 PM Revision 4821: inputs/ARIZ/Specimen/map.csv: Documented that MinimumElevationInMeters, MaximumElevationInMeters contain some verbatim values, including ranges and units: Aaron Marcuse-Kubitza
10:09 PM Revision 4820: mappings/VegCore-VegBIEN.csv: Removed /_units:[default=m,to=m,to=]/value filter from fields. It should be added on a per-datasource basis instead.: Aaron Marcuse-Kubitza
10:05 PM Revision 4819: mappings/VegCore-VegBIEN.csv: Removed /_replace:["\bca\.?"=]/value filter from fields. It should be added on a per-datasource basis instead.: Aaron Marcuse-Kubitza
09:36 PM Revision 4818: mappings/VegCore-VegBIEN.csv: verbatimElevation->elevation_m mapping: Translate units automatically (currently only works in row-based mode). Don't remove any "ca." prefix because this is a datasource-specific filter that does not apply to current datasources with verbatimElevation. Also map verbatimElevation to location.verbatimelevation.: Aaron Marcuse-Kubitza
09:21 PM Revision 4817: inputs/NCU-NCSC/Specimen/map.csv: Elevation: Removed comment that it includes units, because this is now part of the definition of verbatimElevation: Aaron Marcuse-Kubitza
09:20 PM Revision 4816: mappings/Veg+.terms.csv: Documented that verbatimElevation must include units: Aaron Marcuse-Kubitza
09:14 PM Revision 4815: inputs/ARIZ/Specimen/map.csv: Remapped VerbatimElevation to UNUSED: Aaron Marcuse-Kubitza
09:11 PM Revision 4814: inputs/*/*/map.csv: Remapped all unused terms to special value UNUSED. Remapped all private terms to special value PRIVATE. Remapped all deliberately unmapped terms to special value OMIT.: Aaron Marcuse-Kubitza
08:53 PM Revision 4813: mappings/Veg+-VegCore.csv: Remapped realLatitude, realLongitude to new special value PRIVATE, which is more specific than OMIT: Aaron Marcuse-Kubitza
08:51 PM Revision 4812: mappings/Veg+.terms.csv: Added special value PRIVATE: Aaron Marcuse-Kubitza
08:44 PM Revision 4811: mappings/Veg+.terms.csv: Added special values OMIT, UNUSED: Aaron Marcuse-Kubitza
08:20 PM Revision 4810: inputs/VegBank/plot_/map.csv: Remapped elevation from verbatimElevation to elevationInMeters, since the values are all decimals. The units come from the data dictionary.: Aaron Marcuse-Kubitza
08:14 PM Revision 4809: inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Remapped elev_m from verbatimElevation to elevationInMeters, since the values are all decimals. Note that the units of SALVIAS Elev were provided by a comment from Brad (and can also be assumed to be the same as SALVIAS-CSV elev_m).: Aaron Marcuse-Kubitza
08:02 PM Revision 4808: inputs/NCU-NCSC/Specimen/map.csv: Documented that Elevation includes units: Aaron Marcuse-Kubitza
07:50 PM Revision 4807: inputs/Madidi/Plot/map.csv: Remapped Minimum altitude from minimumElevationInMeters to verbatimElevation_m, since it is a range, not a minimum. Note that the units are assumed based on the range of values present and the region the data is from (Madidi National Park).: Aaron Marcuse-Kubitza
07:46 PM Revision 4806: mappings/VegCore-VegBIEN.csv: Also mapped verbatimElevation_m to verbatimelevation: Aaron Marcuse-Kubitza
07:44 PM Revision 4805: mappings/VegCore-VegBIEN.csv: Also mapped verbatimElevation_m to elevationrange_m: Aaron Marcuse-Kubitza
07:38 PM Revision 4804: mappings/VegCore-VegBIEN.csv: Mapped verbatimElevation_m: Aaron Marcuse-Kubitza
07:31 PM Revision 4803: mappings/Veg+.terms.csv: Added verbatimElevation_m: Aaron Marcuse-Kubitza
07:28 PM Revision 4802: mappings/Veg+-VegCore.csv: Mapped realLatitude, realLongitude to OMIT because private data should not be placed in a public database: Aaron Marcuse-Kubitza
07:26 PM Revision 4801: mappings/Veg+.terms.csv: Added realLatitude, realLongitude: Aaron Marcuse-Kubitza
07:23 PM Revision 4800: inputs/VegBank/plot_/map.csv: Documented that elevationrange is unused: Aaron Marcuse-Kubitza
07:13 PM Revision 4799: inputs/Madidi/Plot/map.csv: Fixed comments on Direction and Orientación/exposicion so each comment refers to the other field that is equivalent: Aaron Marcuse-Kubitza
07:10 PM Revision 4798: inputs/Madidi/Plot/map.csv: Remapped Altitude from verbatimElevation to elevationInMeters, since the values are all decimals. Note that the units are assumed based on the range of values present and the region the data is from (Madidi National Park).: Aaron Marcuse-Kubitza
06:50 PM Revision 4797: inputs/CTFS/Plot/map.csv: Remapped Elevation from verbatimElevation to elevationInMeters, since it is a float in the original bci.sql database. Note that the units are assumed based on the range of values present and the country the data is from (Panama).: Aaron Marcuse-Kubitza
06:33 PM Revision 4796: mappings/VegCore-VegBIEN.csv: Mapped elevationInMeters: Aaron Marcuse-Kubitza
06:30 PM Revision 4795: mappings/Veg+.terms.csv: Added elevationInMeters: Aaron Marcuse-Kubitza
05:43 PM Revision 4794: schemas/vegbien.sql: location: Added verbatimelevation: Aaron Marcuse-Kubitza
05:21 PM Revision 4793: README.TXT: Data import: Added note that `make schemas/reinstall` must be done *after* running make_analytical_db on a previous import: Aaron Marcuse-Kubitza
05:18 PM Task #495 (Resolved): add separate datasource table rather than using party for this: Aaron Marcuse-Kubitza
05:16 PM Revision 4792: schemas/vegbien.sql: Added indexes for additional analytical_db_view joins, as described at <https://projects.nceas.ucsb.edu/nceas/issues/494>: Aaron Marcuse-Kubitza
05:14 PM Task #494 (Resolved): add indexes for the analytical_db_view joins: Index added on specimenreplicate Aaron Marcuse-Kubitza
05:01 PM Task #494: add indexes for the analytical_db_view joins: Indexes added on locationevent, taxonoccurrence, aggregateoccurrence Aaron Marcuse-Kubitza
04:38 PM Task #494 (Resolved): add indexes for the analytical_db_view joins: * *_unique indexes are often used in joins, but some (such as locationevent_unique_within_location) have filter condi... Aaron Marcuse-Kubitza
04:51 PM Revision 4791: schemas/vegbien.sql: Added indexes for the analytical_db_view joins, as described at <https://projects.nceas.ucsb.edu/nceas/issues/494>: Aaron Marcuse-Kubitza
04:28 PM Revision 4790: README.TXT: Data import: Added note that `make schemas/rotate` must be done *after* running make_analytical_db: Aaron Marcuse-Kubitza
04:17 PM Revision 4789: schemas/functions.sql: Renamed _pct_to_frac() to _percent_to_fraction() and _frac_to_pct() to _fraction_to_percent(), for clarity and for consistency with _percent (which is spelled out), as used by SALVIAS (http://salvias.net/Documents/salvias_data_dictionary.html) and elsewhere: Aaron Marcuse-Kubitza
04:06 PM Revision 4788: review: Don't remove XML functions that are unit conversions: Aaron Marcuse-Kubitza
04:00 PM Revision 4787: schemas/vegbien.sql: Changed _frac units suffix to _fraction for clarity and for consistency with _percent (which is spelled out), as used by SALVIAS (http://salvias.net/Documents/salvias_data_dictionary.html) and elsewhere: Aaron Marcuse-Kubitza
03:58 PM Revision 4786: schemas/vegbien.sql: Changed _frac units suffix to _fraction for clarity and for consistency with _percent (which is spelled out), as used by SALVIAS (http://salvias.net/Documents/salvias_data_dictionary.html) and elsewhere: Aaron Marcuse-Kubitza
03:47 PM Revision 4785: inputs/*/*/map.csv: Remapped intercept_cm to new intercept_cm so that units match: Aaron Marcuse-Kubitza
03:45 PM Revision 4784: mappings/VegCore-VegBIEN.csv: Mapped intercept_cm: Aaron Marcuse-Kubitza
03:41 PM Revision 4783: schemas/functions.sql: Added _cm_to_m(): Aaron Marcuse-Kubitza
03:39 PM Revision 4782: mappings/Veg+.terms.csv: Added intercept_cm: Aaron Marcuse-Kubitza
03:35 PM Revision 4781: mappings/VegCore-VegBIEN.csv: Changed volumeCanopy to the more accurate intercept_m. volumeCanopy was the closest equivalent VegX term, but did not really fit line-intercept information, nor did it include units.: Aaron Marcuse-Kubitza
03:28 PM Revision 4780: mappings/Veg+.terms.csv: Added intercept_m: Aaron Marcuse-Kubitza
02:46 PM Revision 4779: schemas/vegbien.sql: taxonscope: Added comment that it stores the scope of a morphospecies name: Aaron Marcuse-Kubitza
02:32 PM Revision 4778: inputs/import.stats.xls: Updated import times: Aaron Marcuse-Kubitza
02:31 PM Revision 4777: README.TXT: Data import: Commit: Shortened import message to fit on one line in the README, to avoid issues when copying and pasting: Aaron Marcuse-Kubitza

09/17/2012

05:02 PM Revision 4776: schemas/functions.sql: Added _ha_to_m2(text), _pct_to_frac(text): Aaron Marcuse-Kubitza
04:55 PM Revision 4775: schemas/vegbien.sql: analytical_db_view: Use _m2_to_ha() on location.area_m2 to get plotAreaHa: Aaron Marcuse-Kubitza
04:50 PM Revision 4774: schemas/vegbien.sql: analytical_db_view: Use _m2_to_ha() on location.area_m2 to get plotAreaHa: Aaron Marcuse-Kubitza
04:49 PM Revision 4773: schemas/functions.sql: Added _m2_to_ha(): Aaron Marcuse-Kubitza
04:46 PM Revision 4772: mappings/VegCore-VegBIEN.csv, Veg+.terms.csv: Removed imprecise and no longer used plotArea and area. Use plotArea_<units> instead.: Aaron Marcuse-Kubitza
04:44 PM Revision 4771: inputs/*/*/map.csv: Remapped applicable plotArea fields to plotArea_m2: Aaron Marcuse-Kubitza
04:41 PM Revision 4770: mappings/VegCore-VegBIEN.csv: Mapped plotArea_m2: Aaron Marcuse-Kubitza
04:40 PM Revision 4769: mappings/Veg+.terms.csv: Added plotArea_m2: Aaron Marcuse-Kubitza
04:39 PM Revision 4768: mappings/VegCore-VegBIEN.csv: Renamed plotAreaHa to plotArea_ha for consistency with VegBIEN units suffixing convention, which includes an "_": Aaron Marcuse-Kubitza
04:35 PM Revision 4767: inputs/*/*/map.csv: Remapped applicable plotArea fields to plotAreaHa: Aaron Marcuse-Kubitza
04:19 PM Revision 4766: mappings/Veg+-VegCore.csv: Removed inaccurate SizeOfSite->plotArea mapping, which does not match units: Aaron Marcuse-Kubitza
04:16 PM Revision 4765: mappings/VegCore-VegBIEN.csv: Mapped plotAreaHa: Aaron Marcuse-Kubitza
04:16 PM Revision 4764: schemas/functions.sql: Added _ha_to_m2(): Aaron Marcuse-Kubitza
04:11 PM Revision 4763: mappings/Veg+.terms.csv: Added plotAreaHa: Aaron Marcuse-Kubitza
04:08 PM Revision 4762: mappings/Veg+.terms.csv: Standardize area using VegX /plots/plot/area instead of Madidi Inventory+description.Area: Aaron Marcuse-Kubitza
04:01 PM Revision 4761: schemas/vegbien.sql: analytical_db_view: Use _frac_to_pct() on aggregateoccurrence.cover_frac to get pctCover: Aaron Marcuse-Kubitza
03:43 PM Revision 4760: schemas/functions.sql: Added _pct_to_frac(): Aaron Marcuse-Kubitza
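The unit-conversion helpers added around this point (_cm_to_m(), _m2_to_ha(), _ha_to_m2(), _pct_to_frac(), later renamed _percent_to_fraction(), and its inverse) are PostgreSQL functions in schemas/functions.sql whose bodies do not appear in this log. The following is a hypothetical Python mirror that only illustrates the standard conversion factors and NULL pass-through; the function names are illustrative, and treating blank text as NULL is an assumption.

```python
from typing import Optional, Union

# Hypothetical Python mirror of the SQL unit-conversion helpers.
# Only the conversion factors are standard; blank text -> None is an assumption.
Value = Union[int, float, str, None]


def _to_float(value: Value) -> Optional[float]:
    """Coerce a number or numeric text (like the text overloads) to float; None stays None."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.strip()
        if not value:
            return None  # assumption: blank text is treated like SQL NULL
    return float(value)


def cm_to_m(value: Value) -> Optional[float]:
    number = _to_float(value)
    return None if number is None else number / 100  # 1 m = 100 cm


def m2_to_ha(value: Value) -> Optional[float]:
    number = _to_float(value)
    return None if number is None else number / 10_000  # 1 ha = 10,000 m2


def ha_to_m2(value: Value) -> Optional[float]:
    number = _to_float(value)
    return None if number is None else number * 10_000


def percent_to_fraction(value: Value) -> Optional[float]:
    number = _to_float(value)
    return None if number is None else number / 100


def fraction_to_percent(value: Value) -> Optional[float]:
    number = _to_float(value)
    return None if number is None else number * 100
```

Division is used for the shrinking conversions rather than multiplying by 0.01, so exact inputs such as 2500 m2 give exactly 0.25 ha.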
03:37 PM Revision 4759: mappings/VegCore-VegBIEN.csv: coverPercent: Convert to fraction using _pct_to_frac(): Aaron Marcuse-Kubitza
03:37 PM Revision 4758: xml_dom.py: replace_with_text(): Support ints and floats: Aaron Marcuse-Kubitza
03:36 PM Revision 4757: xml_dom.py: replace_with_text(): Support ints and floats: Aaron Marcuse-Kubitza
03:31 PM Revision 4756: xml_func.py: simplify(): Run xml_dom.prune_empty() on function nodes that don't have an explicit simplifying function. This allows single-arg functions with no arg to be pruned rather than called with no args (causing errors if the single param does not have a default value).: Aaron Marcuse-Kubitza
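Revision 4756 changes how simplify() treats a function node that has no explicit simplifying function: if it ends up with no args, it is pruned instead of being called with no args. The sketch below shows that idea on a hypothetical tree type. It is not xml_func.py or xml_dom.prune_empty(). The rule that function nodes are named with a leading "_" is an assumption based on the SQL helper names in this log.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an XML element; function nodes start with '_'."""
    name: str
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None


Simplifier = Callable[[Node], Optional[Node]]


def is_empty(node: Node) -> bool:
    return not node.children and not node.text


def simplify(node: Node, simplifiers: Dict[str, Simplifier]) -> Optional[Node]:
    """Simplify bottom-up; return None when the node should be pruned."""
    simplified = (simplify(child, simplifiers) for child in node.children)
    node.children = [child for child in simplified if child is not None]
    if node.name in simplifiers:
        return simplifiers[node.name](node)  # explicit simplifying function
    if node.name.startswith("_") and is_empty(node):
        return None  # function with no args: prune it rather than call it
    return node
```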
02:31 PM Revision 4755: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
02:29 PM Revision 4754: schemas/vegbien.sql: Added units suffix to additional VegBIEN fields that have units: Aaron Marcuse-Kubitza
02:01 PM Revision 4753: schemas/vegbien.sql: Added units suffix to all core VegBIEN fields that have units. It is the responsibility of the mappings to ensure that all units are properly translated.: Aaron Marcuse-Kubitza
12:18 PM Revision 4752: root Makefile: PostgreSQL: postgres-Linux: Added postgresql-postgis apt-get: Aaron Marcuse-Kubitza
11:58 AM Revision 4751: backups/Makefile: Backups: Full DB: Specify the date suffix of the backup when it's created rather than adding it afterwards. This allows the user to specify a suffix that matches the corresponding public-schema backup.: Aaron Marcuse-Kubitza
11:41 AM Revision 4750: inputs/*/*/map.csv: Mapped variants of subspecies directly to new subspecies term: Aaron Marcuse-Kubitza
11:31 AM Revision 4749: mappings/VegCore-VegBIEN.csv: subspecies, infraspecificEpithet: Added _alts for datasources that specify both: Aaron Marcuse-Kubitza
11:27 AM Revision 4748: input.Makefile: Mapping: $(map2db): Inline $(map) because this is the only place it's used: Aaron Marcuse-Kubitza
11:26 AM Revision 4747: input.Makefile: Mapping: $(map): Don't require flat files because they don't need to be used directly anymore (staging tables are used instead): Aaron Marcuse-Kubitza
11:24 AM Revision 4746: input.Makefile: Mapping: $(map2db): Always use staging tables, because the flat files don't need to be used directly anymore: Aaron Marcuse-Kubitza
11:02 AM Revision 4745: mappings/Veg+-VegCore.csv: Remapped subspecies, subSpeciesName to new subspecies term: Aaron Marcuse-Kubitza
10:52 AM Revision 4744: mappings/VegCore-VegBIEN.csv: Mapped subspecies, variety, forma, cultivar: Aaron Marcuse-Kubitza
10:47 AM Revision 4743: mappings/Veg+.terms.csv: Added subspecies, variety, forma, cultivar: Aaron Marcuse-Kubitza
10:33 AM Revision 4742: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
10:30 AM Revision 4741: schemas/vegbien.sql: taxon.authority_id: Added descriptive comment that this is the authority which defines the taxon name (as opposed to the author of the taxon name): Aaron Marcuse-Kubitza
10:29 AM Revision 4740: schemas/vegbien.sql: taxon: Added author_id for the author of the taxon name. This is distinct from authority_id, which is the authority used to determine which taxon name to apply.: Aaron Marcuse-Kubitza
10:14 AM Revision 4739: schemas/vegbien.sql: analytical_db_view: Use new denormalized placepath table instead of place, which significantly reduces the number of joins: Aaron Marcuse-Kubitza
10:11 AM Revision 4738: schemas/vegbien.sql: location: Removed stateprovince, country because these are now in placepath (as well as in place.rank): Aaron Marcuse-Kubitza
10:06 AM Task #383: convert VegBank data dictionary to database comments: Bob wants a VegBIEN data dictionary Aaron Marcuse-Kubitza
10:01 AM Revision 4737: schemas/vegbien.sql: analytical_db_view: LEFT JOIN locationcoords and locationplace so that locations will be included even if they don't have one of these two determinations: Aaron Marcuse-Kubitza
10:00 AM Revision 4736: schemas/vegbien.sql: analytical_db_view: Fixed bug where method was being joined instead of left-joined, causing only rows with a method to be included: Aaron Marcuse-Kubitza
09:44 AM Revision 4735: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
09:41 AM Revision 4734: schemas/vegbien.sql: locationplace: Added identifier_id, so that different identifiers (e.g. the data provider and GNRS) can provide separate locationplaces even if the standardized name happens to be the same as the original name: Aaron Marcuse-Kubitza
09:31 AM Revision 4733: mappings/VegBank-VegBIEN.csv: Added place->locationplace renaming: Aaron Marcuse-Kubitza
09:30 AM Revision 4732: mappings/VegBIEN-VegBank.csv: Reversed the order of the columns so it's a more natural forward renaming, and renamed the file to VegBank-VegBIEN.csv to reflect the new column order: Aaron Marcuse-Kubitza
09:27 AM Revision 4731: mappings/VegBIEN-VegBank.csv: Fixed order of plantconcept->taxon renaming because the VegBIEN column is on the right: Aaron Marcuse-Kubitza
09:26 AM Revision 4730: schemas/vegbien.sql: Renamed namedplace to place for simplicity and consistency with placepath and locationplace: Aaron Marcuse-Kubitza
09:09 AM Revision 4729: schemas/vegbien.sql: taxon: Made authority an fkey to reference instead of a text field: Aaron Marcuse-Kubitza
09:03 AM Revision 4728: schemas/vegbien.sql: Moved steps to include a taxon name at a rank with no explicit column from taxon's comment to taxonpath's comment, because that is the table the steps apply to: Aaron Marcuse-Kubitza
09:00 AM Revision 4727: schemas/vegbien.sql: Added placepath (analogous to taxonpath), and point locationplace to it instead of directly to namedplace: Aaron Marcuse-Kubitza
08:11 AM Revision 4726: schemas/vegbien.sql: Split locationdetermination into locationcoords and locationplace, so that coordinate determinations can be made separately from place determinations: Aaron Marcuse-Kubitza
07:22 AM Revision 4725: schemas/vegbien.sql: location: Removed authore, authorn because this information is now in locationdetermination as verbatimlongitude, verbatimlatitude: Aaron Marcuse-Kubitza
07:20 AM Revision 4724: schemas/vegbien.sql: location: Removed centerlatitude/longitude, publiclatitude/longitude because this information is now in locationdetermination: Aaron Marcuse-Kubitza
07:09 AM Task #327 (Resolved): look into Clio: Column-based import does effectively what Clio (http://www.almaden.ibm.com/cs/projects/criollo/) does Aaron Marcuse-Kubitza
07:07 AM Task #427 (Resolved): Load all plots data: All BIEN2 plots data has been loaded, including the core fields of VegBank Aaron Marcuse-Kubitza
07:05 AM Task #288 (Resolved): VegX-VegBank mapping: We now map "VegX->VegCore":https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/mappings/VegX-VegCore.c... Aaron Marcuse-Kubitza
07:03 AM Task #314 (Resolved): Import CTFS data: Aaron Marcuse-Kubitza
07:02 AM Task #368 (Rejected): get TEAM VegX data: Not needed because we have the raw TEAM data, which is easier to work with than XML Aaron Marcuse-Kubitza
07:01 AM Task #455 (Resolved): change summarizing queries to use vegbien staging tables: Aaron Marcuse-Kubitza
06:59 AM Task #441 (Resolved): import CTFS data using JOINs from DB export, not VegX: Aaron Marcuse-Kubitza
06:58 AM Task #317 (Rejected): Direct mapping from VegX to VegBIEN: We instead have a mapping from "VegX to VegCore":https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/m... Aaron Marcuse-Kubitza
06:49 AM Revision 4723: schemas/vegbien.ERD.mwb: Fixed lines: Aaron Marcuse-Kubitza
06:48 AM Revision 4722: mappings/VegBIEN-VegBank.csv: Added table rename plantconcept->taxon: Aaron Marcuse-Kubitza
06:47 AM Revision 4721: schemas/vegbien.sql: taxonpath.scientificnamewithauthor: Added comment that it's equivalent to "Name sec. x": Aaron Marcuse-Kubitza
06:43 AM Revision 4720: schemas/vegbien.sql: taxon: Added comment that it's VegBank's plantConcept table: Aaron Marcuse-Kubitza

09/14/2012

11:21 PM Revision 4719: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
11:18 PM Revision 4718: schemas/vegbien.sql: Renamed plantconcept to taxonpath for consistency with DwC's Taxon category and to emphasize that the table stores taxonomic paths: Aaron Marcuse-Kubitza
11:11 PM Revision 4717: schemas/vegbien.sql: Renamed plantname to taxon for consistency with DwC's Taxon category: Aaron Marcuse-Kubitza
11:02 PM Revision 4716: schemas/vegbien.sql: plantname: Renamed plantname field to taxonname for consistency with DwC's Taxon category: Aaron Marcuse-Kubitza
10:55 PM Revision 4715: Regenerated vegbien.ERD exports: Aaron Marcuse-Kubitza
10:49 PM Revision 4714: Updated aggregated unmapped_terms.csv, new_terms.csv. This removes terms that contained a filter (which is now in a separate column) and moves new terms that are unmapped from new_terms.csv to unmapped_terms.csv. Note that the majority of unmapped terms are from VegBank's huge tables, and are not part of the core fields needed for the analytical DB.: Aaron Marcuse-Kubitza
10:41 PM Revision 4713: schemas/vegbien.sql: taxonrank: Switched to using extended taxonomic ranks list derived from VegX at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegBIEN_taxonomic_schema#Extended>. This renames *division to *phylum and splits up 'cultivar/forma'.: Aaron Marcuse-Kubitza
10:39 PM Revision 4712: schemas/vegbien.sql: taxonrank: Removed 'authority', which doesn't belong as a taxonomic rank: Aaron Marcuse-Kubitza
10:38 PM Revision 4711: schemas/vegbien.sql: plantname: Added authority so each taxonomic level can have its own authority (author). Include it in the plantname_unique unique index because plantname is a globally scoped table.: Aaron Marcuse-Kubitza
10:25 PM Revision 4710: schemas/vegbien.sql: taxonrank: Removed 'binomial', which doesn't belong as a taxonomic rank: Aaron Marcuse-Kubitza
10:24 PM Revision 4709: schemas/vegbien.sql: Changed analytical_db_view to use new denormalized taxonomic names in plantconcept, which significantly reduces the number of joins. Note that changing the tables used by a view which depends on other tables will cause those tables to be reordered in dependency order to appear before the view, causing things to be moved around in the svn diff.: Aaron Marcuse-Kubitza
10:01 PM Revision 4708: inputs/Madidi/Organism/map.csv: Remapped Specie+autor to new scientificNameWithAuthorship. Mapped Species and morphotypes to now-available scientificName.: Aaron Marcuse-Kubitza
09:59 PM Revision 4707: mappings/VegCore-VegBIEN.csv: Moved scientificNameWithAuthorship before scientificName in taxonoccurrence.authortaxoncode's _alts: Aaron Marcuse-Kubitza
09:55 PM Revision 4706: mappings/VegCore-VegBIEN.csv: Mapped scientificNameWithAuthorship as an _alt of taxonoccurrence.authortaxoncode: Aaron Marcuse-Kubitza
09:53 PM Revision 4705: mappings/VegCore-VegBIEN.csv: Mapped scientificNameWithAuthorship: Aaron Marcuse-Kubitza
09:51 PM Revision 4704: mappings/Veg+.terms.csv: Added scientificNameWithAuthorship: Aaron Marcuse-Kubitza
09:47 PM Revision 4703: mappings/VegCore-VegBIEN.csv: Taxonomic names: Remapped to new denormalized fields in plantconcept: Aaron Marcuse-Kubitza
09:08 PM Revision 4702: schemas/vegbien.sql: plantname: Added comment documenting how to include a taxon name at a rank with no explicit column, by using the plantname table as an ordered linked list linked together using parent_id. (This method of using a linked list is one way of storing an ordered list of user-defined data. It is similar to using locationevent.previous_id to link successive reobservations of the same location together.) Note that plantname can store both the official tree of life and the data provider's own custom tree of life (or a subset thereof), with the two being distinguished by whether the data provider's or TNRS's taxondeterminations point to them.: Aaron Marcuse-Kubitza
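Revision 4702 describes using plantname as an ordered linked list joined through parent_id. Here is a hypothetical sketch of recovering the ordered taxonomic path for one name by following parent_id upward and then reversing. Only parent_id, rank, and verbatimrank come from the log. The row class, the name field, and the example taxa are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PlantName:
    """Hypothetical row shape; only parent_id, rank and verbatimrank are named in the log."""
    id: int
    name: str
    rank: str                            # closest-match rank from the closed taxonrank list
    parent_id: Optional[int]             # None marks the top of the chain
    verbatimrank: Optional[str] = None   # custom level name, if any


def taxon_path(rows: Dict[int, PlantName], leaf_id: int) -> List[PlantName]:
    """Follow parent_id links from leaf_id upward, then return the path root first."""
    path: List[PlantName] = []
    seen = set()
    current: Optional[int] = leaf_id
    while current is not None:
        if current in seen:
            raise ValueError(f"parent_id cycle at id {current}")
        seen.add(current)
        row = rows[current]
        path.append(row)
        current = row.parent_id
    path.reverse()
    return path


rows = {
    1: PlantName(1, "Fabaceae", "family", None),
    2: PlantName(2, "Inga", "genus", 1),
    3: PlantName(3, "Inga edulis", "species", 2),
}
print([r.name for r in taxon_path(rows, 3)])  # ['Fabaceae', 'Inga', 'Inga edulis']
```

The cycle check matters because parent_id chains are maintained by hand, and a bad link would otherwise loop forever.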
08:53 PM Revision 4701: schemas/vegbien.sql: plantname: Added verbatimrank to store ranks of custom taxonomic levels, such as rosids. Note that even if you specify a custom verbatimrank, you must also specify a closest-match rank from the taxonrank closed list. This ensures that every taxonomic name is placed in the correct relative order in the taxonomic hierarchy.: Aaron Marcuse-Kubitza
08:38 PM Revision 4700: schemas/vegbien.sql: plantconcept: Made plantname_id optional because the datasource's plantconcepts do not need to be placed in the recursive plantname hierarchy: Aaron Marcuse-Kubitza
08:35 PM Revision 4699: schemas/vegbien.sql: plantconcept: Added datasource_id and appropriate unique indexes to enable scoping by datasource. Moved plantcode right after datasource_id because it will be used for the sourceaccessioncode (if any).: Aaron Marcuse-Kubitza
08:21 PM Revision 4698: schemas/vegbien.sql: Moved plantconcept.plantdescription to plantname and renamed it to description, so that a taxon of any rank can have a description: Aaron Marcuse-Kubitza
08:02 PM Revision 4697: schemas/vegbien.sql: plantconcept: Added denormalized taxonomic ranks from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegBIEN_taxonomic_schema#Primary> and concatenated scientific name fields: Aaron Marcuse-Kubitza
07:25 PM Revision 4696: Removed no longer used ucase_first: Aaron Marcuse-Kubitza
07:23 PM Revision 4695: Removed no longer used bin/union: Aaron Marcuse-Kubitza
07:22 PM Revision 4694: Removed no longer used join_union_sort: Aaron Marcuse-Kubitza
07:21 PM Revision 4693: Removed no longer used ci_map, because all relevant mapping scripts are now case-insensitive: Aaron Marcuse-Kubitza
07:19 PM Revision 4692: mappings/Makefile: Inline $(review_) because it's only used once: Aaron Marcuse-Kubitza
07:18 PM Revision 4691: mappings/Makefile: Removed no longer used $(review): Aaron Marcuse-Kubitza
07:17 PM Revision 4690: mappings/Makefile: Don't set $(SHELL) to /bin/bash because this is no longer needed: Aaron Marcuse-Kubitza
07:16 PM Revision 4689: mappings/Makefile: Removed empty VegCSV section. mappings/Makefile's only functionality is now to clean up (sort) the core maps whenever they change and create human-readable maps from them.: Aaron Marcuse-Kubitza
07:13 PM Revision 4688: mappings/Makefile: Removed no longer used self maps, because the new automapping mechanism does not use them: Aaron Marcuse-Kubitza
07:09 PM Revision 4687: input.Makefile: Existing maps discovery: Substituted Veg+ for $(via) because it's now only used once: Aaron Marcuse-Kubitza
07:05 PM Revision 4686: mappings/VegCore-VegBIEN.csv: Changed input column header from VegCore[Veg+] to VegCore because this is more accurate. This is possible now that we're using new automapping scripts that do not require a particular column header.: Aaron Marcuse-Kubitza
06:39 PM Revision 4685: inputs/*/*/map.csv: Changed _merge to _join everywhere because _merge's (slower) duplicate elimination functionality is not needed (the combined columns do not both contain the same value, so they can simply be concatenated): Aaron Marcuse-Kubitza
06:38 PM Revision 4684: inputs/*/*/map.csv: Changed _merge to _join everywhere because _merge's (slower) duplicate elimination functionality is not needed (the combined columns do not both contain the same value, so they can simply be concatenated): Aaron Marcuse-Kubitza
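Revisions 4684 and 4685 replace _merge with _join because the combined columns never repeat a value, so the duplicate elimination is wasted work. The sketch below contrasts the two behaviors. It is hypothetical: the separator and the skipping of empty values are assumptions, not the project's actual XML function semantics.

```python
from typing import Iterable, List, Optional

SEPARATOR = "; "  # assumption: the real separator is not stated in the log


def join_values(values: Iterable[Optional[str]]) -> Optional[str]:
    """_join-style: concatenate the non-empty values in order."""
    parts = [v for v in values if v]
    return SEPARATOR.join(parts) if parts else None


def merge_values(values: Iterable[Optional[str]]) -> Optional[str]:
    """_merge-style: like join_values, but also drop repeated values."""
    parts: List[str] = []
    seen = set()
    for v in values:
        if v and v not in seen:  # duplicate check: the extra work _join skips
            seen.add(v)
            parts.append(v)
    return SEPARATOR.join(parts) if parts else None
```

When the inputs never share a value, both functions return the same string, so the cheaper join is sufficient.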
06:21 PM Revision 4683: schemas/functions.sql: _label(): Accept params of any type, in order to support types other than text (which come from staging tables that are imported directly from a SQL export). This fixes a bug in SALVIAS.plotMetadata's column-based import.: Aaron Marcuse-Kubitza
06:17 PM Revision 4682: schemas/functions.sql: _label(): Support NULL labels by not prepending a label: Aaron Marcuse-Kubitza
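Revisions 4682 and 4683 make _label() accept values of any type and skip the label when it is NULL. Below is a hypothetical Python equivalent. The "label: value" output format and the NULL-value behavior are assumptions, because the log does not show the SQL function body.

```python
from typing import Any, Optional


def add_label(label: Optional[str], value: Any) -> Optional[str]:
    """Hypothetical analogue of _label(); output format is an assumption."""
    if value is None:
        return None  # assumption: a NULL value stays NULL
    text = str(value)  # accept any type, e.g. numbers from SQL-export staging tables
    if label is None:
        return text  # NULL label: don't prepend anything
    return f"{label}: {text}"
```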
06:04 PM Revision 4681: mappings/Veg+-VegCore.csv: Changed output column header from Veg+ to VegCore because this is more accurate. This is possible now that we're using new automapping scripts that do not require a particular column header. Note that this change now requires the map.csvs to use VegCore as their output column header, because otherwise the Veg+ header will get automapped to VegCore. (The header replacing is a feature to support changing the header when the schema of the column's terms changes.): Aaron Marcuse-Kubitza
06:03 PM Revision 4680: mappings/root.sh: Changed output column header from Veg+ to VegCore because this is more accurate following the initial automapping: Aaron Marcuse-Kubitza
05:59 PM Revision 4679: inputs/*/*/map.csv: Changed output column header from Veg+ to VegCore because the names will be VegCore names after automapping. This is possible now that we're using new automapping scripts that do not require a particular column header.: Aaron Marcuse-Kubitza
05:53 PM Revision 4678: inputs/import.stats.xls: Copied the Change factor formula to all rows (it displays an empty string for rows that don't have both a row-based and a column-based import): Aaron Marcuse-Kubitza
05:49 PM Revision 4677: README.TXT: Data import: Added steps to record the import times in inputs/import.stats.xls: Aaron Marcuse-Kubitza
05:42 PM Revision 4676: inputs/import.stats.xls: Updated with stats from latest import: Aaron Marcuse-Kubitza
05:40 PM Revision 4675: Added import_times: Aaron Marcuse-Kubitza

09/13/2012

02:40 PM Revision 4674: mappings/root.sh: Removed no longer needed $in_root_suffix: Aaron Marcuse-Kubitza
02:39 PM Revision 4673: src_map: Upgraded to match new map format by adding Filter column: Aaron Marcuse-Kubitza
02:38 PM Revision 4672: input.Makefile: $(viaMaps): Fixed bug: it cannot be wrapped in $(wildcard), because that would prevent map.csv from being created when a new datasource or new subdir is added: Aaron Marcuse-Kubitza

09/12/2012

05:36 PM Revision 4671: input.Makefile: $(viaMaps): Removed extra addition of */map.csv, which is already included because all $(tables) have or will get a map.csv: Aaron Marcuse-Kubitza
05:34 PM Revision 4670: mappings/: Removed no longer used derived file Veg+.vocab.csv: Aaron Marcuse-Kubitza
05:33 PM Revision 4669: input.Makefile: Removed no longer used $(vocab): Aaron Marcuse-Kubitza
05:32 PM Revision 4668: input.Makefile: Maps validation: %/new_terms.csv: Filter out $(coreMap) and $(dict) successively instead of $(vocab), to avoid requiring intermediate mapping files not edited by the user: Aaron Marcuse-Kubitza
05:28 PM Revision 4667: input.Makefile: Maps validation: $(newTerms): Don't hardcode the caller's first filter_out_ci by prerequisite position; instead allow them to specify the command (including the var name) themselves: Aaron Marcuse-Kubitza
05:24 PM Revision 4666: input.Makefile: Maps validation: $(newTerms): For simplicity, subset the columns before running filter_out_ci: Aaron Marcuse-Kubitza
05:20 PM Revision 4665: mappings/: Removed no longer used Veg+-VegBIEN.csv and derived autogen Veg+.self.csv: Aaron Marcuse-Kubitza
05:16 PM Revision 4664: input.Makefile: Maps building: %/unmapped_terms.csv: Use $(coreMap) instead of $(vocab) because the terms should already be translated to VegCore terms, rather than still being Veg+: Aaron Marcuse-Kubitza
05:13 PM Revision 4663: input.Makefile: Maps validation: $(newTerms): Fixed bug where header needed to be removed *before* running filter_out_ci because filter_out_ci only removes the header if it matches the vocabulary's header. Removing the header afterward can cause the first row to be removed instead if the header was already removed.: Aaron Marcuse-Kubitza
05:11 PM Revision 4662: cols: Support CSVs without a header, such as intermediates that become unmapped_terms.csv, new_terms.csv: Aaron Marcuse-Kubitza
04:37 PM Revision 4661: inputs/: Regenerated unmapped_terms.csv, new_terms.csv: Aaron Marcuse-Kubitza
04:25 PM Revision 4660: input.Makefile: %/.map.csv.last_cleanup: Removed no longer used prerequisite $(vocab): Aaron Marcuse-Kubitza
04:24 PM Revision 4659: input.Makefile: %/.map.csv.last_cleanup: Canonicalize separately on $(coreMap) and $(dict), instead of requiring them to be combined in $(vocab): Aaron Marcuse-Kubitza
04:20 PM Revision 4658: input.Makefile: Use mappings/VegCore-VegBIEN.csv instead of mappings/Veg+-VegBIEN.csv as the core map, because the automapper now takes care of Veg+ -> VegCore translation: Aaron Marcuse-Kubitza
04:14 PM Revision 4657: inputs/*/*/map.csv: Moved filter suffixes to separate filter column to enable automapping to work on those mappings' terms, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Move-filter-suffixes-to-separate-filter-column>. Note that the only changes to VegBIEN.csvs are the (now automapped) names of terms in "No join mapping" comments.: Aaron Marcuse-Kubitza
03:37 PM Revision 4656: inputs/*/*/map.csv: Added Filter column to contain any suffix added after the term, so that the automapping mechanism does not have to deal with the filter expressions: Aaron Marcuse-Kubitza
03:35 PM Revision 4655: Added cat_cols: Aaron Marcuse-Kubitza
03:34 PM Revision 4654: Added ins_col: Aaron Marcuse-Kubitza
03:13 PM Revision 4653: input.Makefile: Maps building: %/.map.csv.last_cleanup: Reference fixed prerequisites by name instead of by position in the prerequisites list: Aaron Marcuse-Kubitza
02:28 PM Revision 4652: Removed no longer used intersect: Aaron Marcuse-Kubitza
02:18 PM Revision 4651: inputs/*/*/map.csv: Removed no longer needed [Veg+] suffix in root, because the input column is no longer used by old-style map utilities such as union that needed this: Aaron Marcuse-Kubitza
02:07 PM Revision 4650: translate: Translate the column header instead of passing it through, in order to properly support CSVs without a header and to support renaming the header when the column's contents change to a different schema or vocabulary: Aaron Marcuse-Kubitza
02:04 PM Revision 4649: canon: Canonicalize the column header instead of passing it through, in order to properly support CSVs without a header: Aaron Marcuse-Kubitza
01:57 PM Revision 4648: filter_out_ci: Filter header instead of passing it through, in order to properly support CSVs without a header, such as the unmapped_terms.csv and new_terms.csv files. For CSVs with a header, the header of the vocabulary should be removed before passing it to filter_out_ci.: Aaron Marcuse-Kubitza
01:48 PM Revision 4647: autoremove: `svn rm`: Fixed bug where --force needed to be added in case the file had already been modified before being autoremoved: Aaron Marcuse-Kubitza
01:32 PM Revision 4646: input.Makefile: Maps building: Removed no longer used $(createOnlyMaps): Aaron Marcuse-Kubitza
01:30 PM Revision 4645: input.Makefile: Maps building: Removed no longer used %/src.csv, because it is no longer needed to generate map.full.csv from map.csv: Aaron Marcuse-Kubitza
01:21 PM Revision 4644: input.Makefile: Maps building: %/map.csv: If it doesn't exist, generate directly using $(mkSrcMap) instead of by copying %/src.csv, in order to eventually avoid the need to create a separate src.csv at all. Note that this avoids the need to run make twice when the table is first created to properly bootstrap all maps.: Aaron Marcuse-Kubitza
01:09 PM Revision 4643: autoremove: Try `svn rm` first in case the file is in svn: Aaron Marcuse-Kubitza
01:02 PM Revision 4642: input.Makefile: Maps building: Removed no longer used %/map.full.csv: Aaron Marcuse-Kubitza
12:59 PM Revision 4641: input.Makefile: Maps building: %/VegBIEN.csv: Use %/map.csv directly because %/map.full.csv is now a copy of it: Aaron Marcuse-Kubitza
12:56 PM Revision 4640: input.Makefile: Maps building: %/map.full.csv: Generate by copying map.csv, because the content of these files now differs only in the sort order of the names: Aaron Marcuse-Kubitza
12:53 PM Revision 4639: inputs/*/*/map.csv: Changed empty mappings to self mappings, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Change-empty-mappings-to-self-mappings>. Note that in map.full.csv and VegBIEN.csv, lines that have changed are always the result of the input field's case being changed to match the case of the datasource's actual column name.: Aaron Marcuse-Kubitza
12:43 PM Revision 4638: inputs/*/*/map.csv: Changed empty mappings to self mappings, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Change-empty-mappings-to-self-mappings>. Note that in map.full.csv and VegBIEN.csv, lines that have changed are always the result of the input field's case being changed to match the case of the datasource's actual column name.: Aaron Marcuse-Kubitza
12:31 PM Revision 4637: join: passthru mode: Fixed bug where empty join mappings needed to have the output field of the right-hand row manually set to the output field of the left-hand row for maps.merge_mappings() to work properly: Aaron Marcuse-Kubitza
12:14 PM Revision 4636: inputs/*/*/map.csv: Added back automapped mappings to map.csv, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Add-back-automapped-mappings-to-mapcsv>: Aaron Marcuse-Kubitza
12:07 PM Revision 4635: inputs/VegBank/taxonobservation_/map.csv: Updated with new renamings of colliding join columns: Aaron Marcuse-Kubitza
12:00 PM Revision 4634: join: When a join mapping exists but is empty, still include any additional columns from that mapping in the combined row: Aaron Marcuse-Kubitza
11:48 AM Revision 4633: inputs/SpeciesLink/Specimen/src.csv, inputs/XAL/Specimen/src.csv: Use input term as the initial Veg+ term, so the src.csv can be used with the Add back automapped mappings process at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Add-back-automapped-mappings-to-mapcsv>: Aaron Marcuse-Kubitza
11:31 AM Revision 4632: inputs/XAL/Specimen/src.csv, map.csv: Switched from using root prefixes to full column names, because the namespace mapping functionality can be handled much better by treating each namespace-qualified term as its own term rather than as a term and a prefix: Aaron Marcuse-Kubitza
11:22 AM Revision 4631: inputs/SpeciesLink/Specimen/src.csv, map.csv: Switched from using root prefixes to full column names, because the namespace mapping functionality can be handled much better by treating each namespace-qualified term as its own term rather than as a term and a prefix: Aaron Marcuse-Kubitza
11:02 AM Revision 4630: inputs/SpeciesLink/Specimen/map.csv: Removed no longer needed duplicate entries for each first letter case, which cause duplicate output mappings now that join is case- and punctuation-insensitive. Note that the `svn diff` hides _alt entry 0, which contains one of the removed duplicate columns that appears in the diff.: Aaron Marcuse-Kubitza
10:27 AM Revision 4629: inputs/SpeciesLink/Specimen/src.csv, inputs/XAL/Specimen/src.csv: Added Comments column for consistency with autogenerated src.csv format: Aaron Marcuse-Kubitza
10:14 AM Revision 4628: join: Added new passthru mode which passes through terms with no input mapping or no join mapping: Aaron Marcuse-Kubitza
09:25 AM Revision 4627: inputs/: Added [Veg+] to via map roots to indicate that the datasource and Veg+ vocabularies are combinable. This is possible now that automapped entries are no longer subtracted when this is in the map root, so there is no concern of losing comments on subtracted automapped rows. Note that this change turns on old-style automapping for these datasources, causing SALVIAS plotMetadata to acquire additional mappings.: Aaron Marcuse-Kubitza
08:59 AM Revision 4626: canon, translate, filter_out_ci: Support vocabularies/dictionaries that have other columns in addition to the functional column(s) used by the program. These columns can contain comments, etc. This was not originally supported because Python 2's iterable unpacking only supports "an iterable with the same number of items as there are targets in the target list" (http://docs.python.org/reference/simple_stmts.html#assignment-statements). We now use numeric array indexes instead, both to get around this limitation and for consistency with other map-manipulation scripts.: Aaron Marcuse-Kubitza
08:21 AM Revision 4625: Removed no longer used subtract (use filter_out_ci instead): Aaron Marcuse-Kubitza
08:19 AM Revision 4624: input.Makefile: Maps building: %/.map.csv.last_cleanup: Removed no longer needed subtraction of automapped entries, because information about unmapped and new terms is now available in unmapped_terms.csv and new_terms.csv: Aaron Marcuse-Kubitza
08:13 AM Revision 4623: README.TXT: Data import: `make backups/download`: Removed '&' because running the command in the background prevents rsync from providing a continuously updating progress indication (because a backgrounded process's stdout is not a TTY): Aaron Marcuse-Kubitza
08:04 AM Revision 4622: mappings/VegCore-VegBIEN.csv: Removed no longer needed /_simplifyPath:[next=parent_id]/path expressions in specific paths because parent_id forwarding is now set globally for all paths in the map root: Aaron Marcuse-Kubitza
07:56 AM Revision 4621: mappings/VegCore-VegBIEN.csv: Added /_simplifyPath:[next=parent_id]/path to root so the returned subplot location will be its parent location if there is no subplot name or ID (indicating that that particular plot did not have subplots). Note that this also causes the parent_id forwarding effect to occur for all other tables containing parent_id, which will help prevent similar issues with subplot events, etc. This will hopefully fix the SALVIAS.plotObservations bug where some organisms did not have a subplot #. In that bug, the subplot location became NULL, so the corresponding locationevent rows did not match the filter condition of the locationevent_unique_within_location index (which requires a parent_id). As a result, multiple output table pkeys were returned for those rows, violating the primary key of the locationevent_pkeys temp table.: Aaron Marcuse-Kubitza
07:25 AM Revision 4620: mappings/VegCore-VegBIEN.csv: namedplace elements: _simplifyPath() calls: Removed no longer needed `require` arg, and removed no longer needed table suffix from `next` arg: Aaron Marcuse-Kubitza
07:02 AM Revision 4619: inputs/import.stats.xls: Updated with stats from latest import: Aaron Marcuse-Kubitza
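Revision 4626 explains why canon, translate and filter_out_ci switched from iterable unpacking to numeric indexes, and Revision 4648 says filter_out_ci now filters the header instead of passing it through. The following is a minimal sketch of both points. It is written in Python 3 and uses a hypothetical filter_out_ci-style function that takes the vocabulary and input as lists of CSV rows. The real script's interface is not shown in this log.

```python
import csv
import io


def load_vocab(rows):
    """Return the lowercased first column of each non-empty vocabulary row.

    Indexing with row[0] (rather than unpacking with `term, = row`) lets a
    row carry extra columns, such as comments, without raising ValueError.
    """
    return {row[0].lower() for row in rows if row}


def filter_out_ci(vocab_rows, input_rows):
    """Hypothetical sketch: keep input rows whose first column is NOT in the
    vocabulary, comparing case-insensitively. No header is special-cased, so
    a header row is filtered like any other row."""
    vocab = load_vocab(vocab_rows)
    return [row for row in input_rows if row and row[0].lower() not in vocab]


# Vocabulary rows have a second "comments" column; unpacking would fail here:
#   term, = ["locality", "Standard term"]  -> ValueError (too many values)
vocab_rows = list(csv.reader(io.StringIO("locality,Standard term\nrecordNumber,DwC term\n")))
input_rows = list(csv.reader(io.StringIO("Locality\nfieldNotes\nRECORDNUMBER\n")))
print(filter_out_ci(vocab_rows, input_rows))  # [['fieldNotes']]
```

The comment column is simply ignored because only `row[0]` is read. This is the behavior Revision 4626 describes for vocabularies with extra columns.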

09/11/2012

11:04 AM Revision 4618: input.Makefile: Maps validation: $(newTerms): Fixed bug where tail needed the -n flag to use a positive offset: Aaron Marcuse-Kubitza
11:01 AM Revision 4617: Regenerated/modified inputs/*/*/src.csv to use the self-mapping format used by the new automapping mechanism: Aaron Marcuse-Kubitza
10:50 AM Revision 4616: src_map: Map source columns to themselves so that src.csv can be used directly with the new automapping mechanism: Aaron Marcuse-Kubitza
10:48 AM Revision 4615: input.Makefile: Maps validation: %/new_terms.csv: Remove terms which are also in %/unmapped_terms.csv, because terms are not considered new (i.e. potential Veg+ terms) until they have been mapped to an existing Veg+ term. Being unmapped has a higher priority than being new, because it affects the current datasource itself rather than the easier mapping of future datasources.: Aaron Marcuse-Kubitza
10:22 AM Revision 4614: lib/mappings.Makefile: missing_mappings: Display unmapped_terms.csv, new_terms.csv after generating them, to preserve the behavior of the original missing_mappings: Aaron Marcuse-Kubitza
10:17 AM Revision 4613: root Makefile: Maps validation: Removed no longer used $(missingMappingsCmd): Aaron Marcuse-Kubitza
10:17 AM Revision 4612: input.Makefile: Maps validation: Removed no longer used $(missingMappingsCmd): Aaron Marcuse-Kubitza
10:16 AM Revision 4611: lib/mappings.Makefile: Removed no longer needed missing_%_mappings targets, since unmapped_terms.csv and new_terms.csv now serve the same purpose in a more efficient way: Aaron Marcuse-Kubitza
10:14 AM Revision 4610: lib/mappings.Makefile: `ifndef` for $(termsSubdirs): Fixed bug where the variable being checked needed to be termsSubdirs instead of missingMappingsCmd: Aaron Marcuse-Kubitza
10:02 AM Revision 4609: lib/mappings.Makefile: Require $(termsSubdirs): Aaron Marcuse-Kubitza
10:00 AM Revision 4608: Generated global unmapped_terms.csv, new_terms.csv: Aaron Marcuse-Kubitza
10:00 AM Revision 4607: root Makefile: Maps validation: Added $(termsSubdirs) to enable generation of global unmapped_terms.csv, new_terms.csv: Aaron Marcuse-Kubitza
09:59 AM Revision 4606: inputs/: Generated combined unmapped_terms.csv, new_terms.csv for all inputs: Aaron Marcuse-Kubitza
09:58 AM Revision 4605: lib/mappings.Makefile: $(catTerms): Fixed bug where only existing $+ files (selected using $(+w)) should be included in the list (both when checking it and when using it), because otherwise cat would raise an error or try to read stdin: Aaron Marcuse-Kubitza
09:56 AM Revision 4604: Existing maps discovery: Fixed bug where the new unmapped_terms.csv and new_terms.csv files needed to be included in $(anyMap): Aaron Marcuse-Kubitza
09:52 AM Revision 4603: lib/common.Makefile: Added $(+w): Aaron Marcuse-Kubitza
09:22 AM Revision 4602: lib/common.Makefile: Added $(no/) to remove trailing /: Aaron Marcuse-Kubitza
09:18 AM Revision 4601: Extracted %/unmapped_terms.csv, %/new_terms.csv as separate targets in the Maps validation section so they can be invoked even when %/.map.csv.last_cleanup is not a top-level target (in $(MAKECMDGOALS)). Continue to invoke them in %/.map.csv.last_cleanup by using $(selfMake).: Aaron Marcuse-Kubitza
08:56 AM Revision 4600: input.Makefile: Maps validation: Set $(termsSubdirs) to enable unmapped_terms.csv, new_terms.csv generation: Aaron Marcuse-Kubitza
08:56 AM Revision 4599: lib/mappings.Makefile: Added unmapped_terms.csv, new_terms.csv which are generated by combining the correspondingly-named files in $(termsSubdirs): Aaron Marcuse-Kubitza
08:42 AM Revision 4598: input.Makefile: Maps building: %/.map.csv.last_cleanup: $(newTerms): Autoremove empty terms lists to avoid clutter: Aaron Marcuse-Kubitza
08:40 AM Revision 4597: Added autoremove: Aaron Marcuse-Kubitza
08:22 AM Revision 4596: input.Makefile: Maps building: %/.map.csv.last_cleanup: $(newTerms): Remove the CSV header from the terms lists so that multiple terms lists can easily be appended together: Aaron Marcuse-Kubitza
08:16 AM Revision 4595: input.Makefile: Maps building: %/.map.csv.last_cleanup: unmapped_terms.csv, new_terms.csv: Factored out commands into $(newTerms): Aaron Marcuse-Kubitza
08:09 AM Revision 4594: input.Makefile: Maps building: %/.map.csv.last_cleanup: Generate reports on new and unmapped terms in map.csv: Aaron Marcuse-Kubitza
08:07 AM Revision 4593: Added filter_out_ci: Aaron Marcuse-Kubitza
07:26 AM Revision 4592: input.Makefile: Maps building: %/.map.csv.last_cleanup: Translate map.csv using $(mappings)/$(via)-VegCore.csv: Aaron Marcuse-Kubitza
07:25 AM Revision 4591: Added translate: Aaron Marcuse-Kubitza
07:08 AM Revision 4590: mappings/Veg+-VegCore.csv: Removed no longer used Comments column. Use mappings/Veg+.terms.csv to cite term definitions instead.: Aaron Marcuse-Kubitza
07:06 AM Revision 4589: mappings/Veg+-VegCore.csv: previousCatalogNumber: Removed no longer needed "According to" comment, because this is now documented in the mappings/Veg+.terms.csv entry. Note that the citation for any mapping is the overlap of the terms' definitions, and thus only the definitions need to be cited, not the mapping itself. (The definitions are provided in the links in mappings/Veg+.terms.csv.): Aaron Marcuse-Kubitza
07:01 AM Revision 4588: mappings/Veg+.terms.csv: previousCatalogNumber: Added Source link to DwC history entry, which documents the definition of this term: Aaron Marcuse-Kubitza
06:43 AM Revision 4587: input.Makefile: Maps building: %/.map.csv.last_cleanup: Canonicalize map.csv using $(mappings)/$(via).vocab.csv: Aaron Marcuse-Kubitza
06:40 AM Revision 4586: Added canon: Aaron Marcuse-Kubitza
06:29 AM Revision 4585: mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.: Aaron Marcuse-Kubitza
05:49 AM Revision 4584: Added mappings/Veg+.vocab.csv: Aaron Marcuse-Kubitza
04:41 AM Revision 4583: inputs/GBIF/Specimen/map.csv: Remapped *Original fields to new verbatim* taxonomic terms: Aaron Marcuse-Kubitza
04:31 AM Revision 4582: mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.: Aaron Marcuse-Kubitza
04:23 AM Revision 4581: mappings/Veg+.terms.csv: Added min/max SlopeAspect/SlopeGradient: Aaron Marcuse-Kubitza
04:13 AM Revision 4580: inputs/VegBank/plot_/map.csv: Omit reallatitude/reallongitude because private data should not be placed in a public database: Aaron Marcuse-Kubitza
04:10 AM Revision 4579: inputs/CVS/Organism/map.csv: Omit realLatitude/realLongitude because private data should not be placed in a public database. Keeping VegBIEN free of restricted-access data allows anyone to run arbitrary queries on the database, without needing an entire security mechanism/front end just to manage users' read-only access to the data (as VegBank has). Note that the private coordinates are still accessible in the staging tables, so they will need to be locked down in order to make VegBIEN secure to public access.: Aaron Marcuse-Kubitza
03:16 AM Revision 4578: mappings/Veg+-VegCore.csv: Remapped QuadratID to subplotID because the standard definition of an ID term is an ID that's unique within the datasource, and it's just CTFS's usage that makes it unique only within the plot: Aaron Marcuse-Kubitza
03:13 AM Revision 4577: inputs/CTFS/StemObservation/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID: Aaron Marcuse-Kubitza
03:09 AM Revision 4576: inputs/CTFS/SubplotObservation/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID: Aaron Marcuse-Kubitza
03:06 AM Revision 4575: inputs/CTFS/Subplot/map.csv: Manually mapped QuadratID to subplot since it is unique only within Site, and thus can't be the subplotID. Omit QuadratName because QuadratID is used for the same purpose.: Aaron Marcuse-Kubitza
02:57 AM Revision 4574: mappings/Veg+-VegCore.csv: Removed recordNumber/_alt and recordNumber redirection mappings so that Veg+-VegCore.csv contains only renamings, not business logic. Note that removing the global ordering of these fields does not affect the datasources which contain multiple recordNumber synonyms because they either have a custom ordering or one field is duplicated or unused.: Aaron Marcuse-Kubitza
02:49 AM Revision 4573: inputs/NY/Specimen/map.csv: Omit CollectorNumber because it is not used, so it does not need to be mapped: Aaron Marcuse-Kubitza
02:45 AM Revision 4572: inputs/ARIZ/Specimen/map.csv: Omit FieldNumber because it is identical to CollectorNumber, so it does not need to be mapped: Aaron Marcuse-Kubitza
02:19 AM Revision 4571: inputs/SpeciesLink/Specimen/map.csv: Added manual CollectorNumber mapping which places it after recordNumber/fieldNumber, so that mappings/Veg+-VegCore.csv doesn't need to maintain a global ordering between these fields and just needs to indicate their equivalency: Aaron Marcuse-Kubitza
02:09 AM Revision 4570: mappings/: Removed no longer needed Veg+-VegCore.to_self.csv, because multiple levels of mappings are no longer needed to get to the VegCore term: Aaron Marcuse-Kubitza
02:07 AM Revision 4569: mappings/Veg+-VegCore.csv: DescriptionOfSite: Mapped directly to locality rather than to locationNarrative to avoid needing multiple levels of mappings to get to the VegCore term: Aaron Marcuse-Kubitza
01:56 AM Revision 4568: mappings/Veg+-VegCore.csv: Removed scientificNameAuthorship/_alt and scientificNameAuthorship redirection mappings, which were only used by SpeciesLink but it now has the necessary _alts in its own map.csv: Aaron Marcuse-Kubitza
01:48 AM Revision 4567: mappings/Veg+-VegCore.csv: Removed dateCollected/_alt and dateCollected redirection mappings, which were only needed when multiple dateCollected fields were being combined in Veg+-VegCore.csv: Aaron Marcuse-Kubitza
01:45 AM Revision 4566: mappings/: Moved year/month/dayCollected mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayCollected values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateCollected field is now higher than the year/month/dayCollected fields when both are specified, because the dateCollected field presumably contains verbatim text while the year/month/dayCollected fields contain parsed date parts.: Aaron Marcuse-Kubitza
01:32 AM Revision 4565: inputs/SALVIAS-CSV/Organism/map.csv: Remapped census_date to eventDate, since it is not the start of a range: Aaron Marcuse-Kubitza
01:31 AM Revision 4564: inputs/Madidi/Plot/map.csv: Remapped First evaluation to eventDate, since it is not necessarily the start of a range: Aaron Marcuse-Kubitza
01:23 AM Revision 4563: mappings/VegCore-VegBIEN.csv: startDate, endDate mappings: Removed _dateRangeStart/_dateRangeEnd filters because these are assumed to already be start and end dates of a range. (eventDate should be used for concatenated date ranges.): Aaron Marcuse-Kubitza
01:09 AM Revision 4562: mappings/VegCore-VegBIEN.csv: Don't map dateCollected to locationevent.obsstartdate/obsenddate because this is the date the *specimen* was collected, not the date (range) of the entire collection *event*. This distinction may not be meaningful for specimens data, but VegBIEN should reflect what the data provider designated. This also reduces the number of dateCollected-related mappings needed for any dateCollected-related field, such as year/month/dayCollected.: Aaron Marcuse-Kubitza
12:55 AM Revision 4561: mappings/Veg+-VegCore.csv: Removed dateIdentified/_alt and dateIdentified redirection mappings, which were only needed when multiple dateIdentified fields were being combined in Veg+-VegCore.csv: Aaron Marcuse-Kubitza
12:50 AM Revision 4560: mappings/: Moved year/month/dayIdentified mappings from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic. Note that this allows the year/month/dayIdentified values to bypass the additional _dateRangeStart filter that is applied to text dates. The priority of the plain dateIdentified field is now higher than the year/month/dayIdentified fields when both are specified, because the dateIdentified field presumably contains verbatim text while the year/month/dayIdentified fields contain parsed date parts.: Aaron Marcuse-Kubitza
12:34 AM Revision 4559: mappings/: Moved verbatimGrowthForm filter mapping from Veg+-VegCore.csv to VegCore-VegBIEN.csv so that Veg+-VegCore.csv contains only renamings, not business logic: Aaron Marcuse-Kubitza
12:28 AM Revision 4558: inputs/UNCC/Specimen/map.csv, inputs/NCU-NCSC/Specimen/map.csv: Remapped cultivated fields directly via new cultivated term, rather than via establishmentMeans: Aaron Marcuse-Kubitza
12:06 AM Revision 4557: sql_io.py: mk_errors_table(): Don't cache the sql.table_exists() query, because the table will be created and its existence must be rechecked: Aaron Marcuse-Kubitza
12:02 AM Revision 4556: sql.py: table_exists(): Allow caller to set whether query will be cached. This is useful if the table will later be created and its existence should be checked again.: Aaron Marcuse-Kubitza
12:00 AM Revision 4555: sql.py: tables(): Allow caller to set whether query will be cached: Aaron Marcuse-Kubitza
