Activity
From 09/21/2012 to 10/20/2012
10/19/2012
- 06:20 PM Revision 5690: sql.py: parse_exception(): function MissingCastException: Support functions with named parameters
- 06:18 PM Revision 5689: sql.py: parse_exception(): function MissingCastException: Support function names enclosed in quotes on the context line
- 06:15 PM Revision 5688: mappings/VegCore-VegBIEN.csv: taxonName: Place it in taxonomicname instead of taxonepithet for lower taxa, because the only datasource that currently provides this field (NCBI) actually provides the full taxonomicname instead of the epithet at the current rank for lower taxa. (taxonomicname is not applicable to higher taxa because their names are not guaranteed to be globally unique.) taxonName may need to be renamed and/or redefined to account for this ambiguity in NCBI's usage.
- 06:14 PM Revision 5687: mappings/VegCore-VegBIEN.csv: Do not include the taxonName in the concatenated taxonomicname because it is NOT globally unique. The same name may be used at different taxonomic ranks and mean different things, and lower taxa may have the name appear in multiple genuses or species, meaning different things.
- 06:04 PM Revision 5686: mappings/VegCore-VegBIEN.csv: Do not include the taxonName in the concatenated taxonomicname because it is NOT globally unique. The same name may be used at different taxonomic ranks and mean different things, and lower taxa may have the name appear in multiple genuses or species, meaning different things.
- 05:57 PM Revision 5685: schemas/vegbien.sql: Added _is_higher_taxon()
- 05:52 PM Revision 5684: README.TXT: Documentation: To import and scrub just the test taxonomic names: Added `make inputs/.TNRS/cleanup` after `make backups/TNRS.backup/restore` because the PostgreSQL collation may differ between vegbiendev's and the user's DB
- 05:50 PM Revision 5683: sql.py: parse_exception(): DoesNotExistException: If item not found was a function and not found only because of a missing cast, raise MissingCastException instead. This should allow automatic casts to be added on function parameters as well as table columns.
- 05:28 PM Revision 5682: schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to join to taxonverbatim on taxonverbatim_id (the pkey) instead of taxonlabel_id, which used to be the pkey but is now an fkey
- 05:22 PM Revision 5681: inputs/test_taxonomic_names/test_scrub: Remove any previous version of public.test_taxonomic_names before renaming public to it
- 05:19 PM Revision 5680: inputs/test_taxonomic_names/test_scrub: Fixed bug where public.sql export did not include the "CREATE SCHEMA public" statement, because pg_dump doesn't add it to backups, by using new schemas/rename/% make target to first rename the public schema and then exporting it
- 05:12 PM Revision 5679: root Makefile: VegBIEN DB: Schemas: schemas/rotate: Use new schemas/rename/%
- 05:12 PM Revision 5678: root Makefile: VegBIEN DB: Schemas: Added schemas/rename/% to rename the public schema
- 04:54 PM Revision 5677: mappings/VegCore-VegBIEN.csv: Removed filter preventing taxonomicStatus from being placed in taxonlabel if a morphospecies was provided, because the morphospecies actually never goes in the *matched* taxonlabel, only the *verbatim* taxonlabel
- 04:50 PM Revision 5676: mappings/VegCore-VegBIEN.csv: morphospecies: Also place it in the verbatim (input name's) taxonlabel. Note that it does not go in the matched name's taxonlabel, because that contains only fields from the matched name. The verbatim taxonlabel is thus a synonym of the matched taxonlabel where there is no morphospecies, or a child of it if there is a morphospecies.
- 04:36 PM Revision 5675: mappings/VegCore-VegBIEN.csv: Do not place taxonomicStatus in taxonlabel if a morphospecies was provided, to prevent it from being incorrectly marked as accepted
- 04:25 PM Revision 5674: mappings/VegCore-VegBIEN.csv: morphospecies -> taxonverbatim.morphospecies: Fixed bug where needed suffix with _if statement then clause
- 04:23 PM Revision 5673: inputs/test_taxonomic_names/_scrub/public.sql, TNRS.sql: Regenerated with schema changes
- 03:45 PM Revision 5672: pg_dump_vegbien: Added opts env var to allow specifying options to a Makefile command, which does not take positional arguments
- 03:37 PM Revision 5671: README.TXT: Schema changes: files to update with any renamings: Removed tnrs_db because that is now abstracted from the schema through the tnrs_input_name view. Note that PostgreSQL will automatically update tnrs_input_name with any table or column renames, which is the significant advantage of using a view rather than a hardcoded query.
- 03:35 PM Revision 5670: schemas/vegbien.sql: tnrs_input_name: Use DISTINCT instead of DISTINCT ON because there is only one column
- 03:34 PM Revision 5669: tnrs_db: Use new tnrs_input_name view to avoid hardcoding changing schema information
- 03:25 PM Revision 5668: inputs/test_taxonomic_names/test_scrub, README.TXT: Documented that `make schemas/public/reinstall` must come after TNRS restore to recreate the tnrs_input_name view, which has a dependency on the TNRS schema
- 03:23 PM Revision 5667: schemas/vegbien.sql: Added tnrs_input_name view for use by tnrs_db
- 12:53 PM Revision 5666: schemas/vegbien.sql: taxonlabel, taxonverbatim: Updated comments for new taxonlabel/taxonverbatim split
- 12:42 PM Revision 5665: schemas/vegbien.sql: taxonlabel_update_ancestors(): Use aliased types (http://www.postgresql.org/docs/8.3/static/plpgsql-declarations.html#PLPGSQL-DECLARATION-TYPE) where possible
- 12:37 PM Revision 5664: schemas/vegbien.sql: taxonlabel_update_ancestors(): Adding new parent's ancestors: Change unique_violations to warnings so they don't abort the import. unique_violations should never happen unless there are cycles of two or more nodes, but they seem to be happening nevertheless, so this will provide a workaround to that problem.
- 12:18 PM Revision 5663: inputs/import.stats.xls: Updated import times
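
The workaround in Revision 5664 (downgrading unique violations to warnings so a bulk load keeps going) can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module as a stand-in for PostgreSQL; the table name `ancestor` and the helpers `make_table()` and `add_ancestors()` are hypothetical and are not taken from schemas/vegbien.sql.

```python
import sqlite3
import warnings


def make_table(conn):
    # Hypothetical stand-in for an ancestor-relationship table.
    conn.execute(
        'CREATE TABLE ancestor ('
        ' descendant_id integer NOT NULL,'
        ' ancestor_id integer NOT NULL,'
        ' PRIMARY KEY (descendant_id, ancestor_id))'
    )


def add_ancestors(conn, descendant_id, ancestor_ids):
    """Insert one row per ancestor, warning on duplicates instead of aborting.

    Returns the number of rows actually inserted.
    """
    added = 0
    for ancestor_id in ancestor_ids:
        try:
            conn.execute(
                'INSERT INTO ancestor (descendant_id, ancestor_id)'
                ' VALUES (?, ?)',
                (descendant_id, ancestor_id),
            )
            added += 1
        except sqlite3.IntegrityError as e:
            # The pkey violation is reported but does not stop the load.
            warnings.warn(
                'skipping duplicate ancestor (%s, %s): %s'
                % (descendant_id, ancestor_id, e)
            )
    return added
```

Note that in PostgreSQL an error inside a transaction aborts it unless the statement runs inside a savepoint or a PL/pgSQL block with an EXCEPTION clause; SQLite only rolls back the failing statement, which is why this sketch needs no extra handling.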
10/18/2012
- 04:58 PM Revision 5662: Regenerated vegbien.ERD exports
- 04:55 PM Revision 5661: tnrs_db: Updated with schema changes
- 04:54 PM Revision 5660: schemas/vegbien.sql: taxonverbatim: Removed subclass relationship to taxonlabel in order to allow multiple taxonverbatims to point to the same taxonlabel. This involves adding a taxonverbatim_id serial column and pointing all fkeys to taxonverbatim to that column.
- 04:43 PM Revision 5659: schemas/vegbien.sql: analytical_db_view: Fixed bug where needed to join on taxonverbatim before joining on taxonlabel, now that taxondetermination is linked directly to taxonverbatim. Interestingly, PostgreSQL did not flag this error when the schema was changed, but only when the schema was reloaded from the DDL.
- 04:30 PM Revision 5658: schemas/vegbien.ERD.mwb: Moved taxonlabel to the right of taxonverbatim to make room for taxonverbatim to expand
- 04:21 PM Revision 5657: schemas/vegbien.sql: Link taxondetermination to taxonverbatim (which is a subclass of taxonlabel) instead of directly to taxonlabel. This will enable later having multiple taxonverbatims for one taxonlabel.
- 04:04 PM Revision 5656: schemas/vegbien.sql: taxonlabel: Renamed identifyingtaxonomicname to taxonomicname because the taxonomicname provided by the datasource is now in taxonverbatim, so there is no name collision. Note that both of these fields store the same type of information, but taxonlabel's is autogenerated while taxonverbatim's is verbatim (and is only set if provided by the datasource).
- 03:57 PM Revision 5655: schemas/vegbien.sql: taxonlabel: Moved non-scoping fields to new taxonverbatim subclass table, which contains the component parts of the taxonlabel
- 03:06 PM Revision 5654: schemas/vegbien.sql: taxonlabel: Renamed taxonlabel_2_propagate_canon_label_id() to taxonlabel_2_set_canon_label_id() for clarity
- 03:04 PM Revision 5653: schemas/vegbien.sql: taxonlabel_2_propagate_canon_label_id(): If no matched taxonlabel, make self-reference. This fixes a bug in analytical_db_view where rows without a canon_label_id were excluded because they did not have a corresponding canonical taxonlabel.
- 02:53 PM Revision 5652: schemas/vegbien.sql: taxonlabel_unique unique index: Removed binomial, author, taxonomicname, and morphospecies because these are now part of the identifyingtaxonomicname, which is also in the unique index
- 02:44 PM Revision 5651: schemas/vegbien.sql: taxonlabel: Require either an identifyingtaxonomicname or a taxonepithet. The NCBI inserted row count decreases by one because this prunes off a taxonlabel created for a parent node which was not contained in the first two rows (remember that NCBI taxa are not in dependency order, so parents are often imported after children).
- 02:41 PM Revision 5650: mappings/VegCore-VegBIEN.csv: Also generate the identifyingtaxonomicname for the original* taxondetermination's taxonlabel
- 02:31 PM Revision 5649: schemas/vegbien.sql: taxonlabel: Renamed taxonomicnamewithauthor to taxonomicname because it is equivalent to Darwin Core's scientificName
- 02:25 PM Revision 5648: mappings/VegCore-VegBIEN.csv: Also include morphospecies in the identifyingtaxonomicname, except for the matched TNRS taxonlabel, which should not contain morphospecies information
- 02:14 PM Revision 5647: mappings/VegCore-VegBIEN.csv: Mapped acceptedScientificName
- 01:51 PM Revision 5646: mappings/VegCore-VegBIEN.csv: Also create the identifyingtaxonomicname on the verbatim taxonlabel supplied by the datasource, in addition to on the TNRS input taxonlabel that the verbatim taxonlabel is matched up with
- 01:46 PM Revision 5645: mappings/VegCore-VegBIEN.csv: Expanded brace expressions for putting together the identifyingtaxonomicname
- 01:21 PM Revision 5644: mappings/VegCore-VegBIEN.csv: Always generate the concatenated identifyingtaxonomicname, even for higher taxa, to ensure that this field is always populated. Note that this will cause names of higher taxa to be scrubbed by TNRS, but this is usually not a problem because such names either have no match or not a close enough match based on the name only. Naming conventions generally cause names at different ranks to be different, so that collisions with lower ranks should not be a problem.
- 01:05 PM Revision 5643: tnrs_db: Fixed bug where needed to remove internal identifyingtaxonomicname duplicates as well as duplicates with existing Name_submitted values, to avoid violating the TNRS.tnrs pkey constraint when the scrubbed names are later inserted. Note that the taxonlabel_0_unique_identifying_name unique index is not sufficient to prevent internal duplicates, because it includes the creator_id (and thus allows multiple instances of the same name defined by different creators).
- 01:01 PM Revision 5642: sql.py: mk_select(): Don't add table0 to order_by with no table, because this could cause it not to match a corresponding DISTINCT ON column with no explicit table. PostgreSQL apparently does not treat a column with no explicit table and a column with the applicable table as identical for purposes of ORDER BY/DISTINCT ON checking, even when they refer to the same physical column.
- 12:53 PM Revision 5641: sql.py: mk_select(): order_by defaults to first distinct_on column when distinct_on provided
- 12:36 PM Revision 5640: tnrs_db: Updated with schema changes
- 12:33 PM Revision 5639: schemas/vegbien.sql: taxonlabel: Renamed taxonomicnamewithauthor to taxonomicname because it is equivalent to Darwin Core's scientificName
- 12:25 PM Revision 5638: schemas/vegbien.sql: taxonlabel: Renamed taxonomicname to binomial because it excludes the author
- 12:15 PM Revision 5637: schemas/vegbien.sql: taxonlabel.taxonomicname, taxonomicnamewithauthor comments: Corrected to show that taxonomicnamewithauthor is actually scientificName, while taxonomicname does not directly correspond to a DwC term (but would be the binomial)
- 12:13 PM Revision 5636: schemas/vegbien.sql: taxonlabel.taxonomicnamewithauthor comment: Removed no longer applicable 'Equivalent to "Name sec. x"'. The "sec" is now stored in taxonconcept.concept_reference_id.
- 12:10 PM Revision 5635: mappings/Makefile: .VegCore.csv.last_cleanup: Remove duplicate entries using uniq
- 12:09 PM Revision 5634: mappings/VegCore.csv: Removed duplicate entries using uniq
- 12:06 PM Revision 5633: mappings/VegCore.csv: Removed *scientificNameWithAuthorship, which are now represented by *scientificName
- 12:04 PM Revision 5632: mappings: Renamed *scientificNameWithAuthorship to *scientificName because scientificNameWithAuthorship is actually a synonym of DwC's scientificName ("The full scientific name, with authorship and date information if known" <http://rs.tdwg.org/dwc/terms/#scientificName>)
- 11:57 AM Revision 5631: mappings: Renamed *scientificName to *binomial because DwC defines the scientificName as "The full scientific name, with authorship and date information if known", but many datasources do not include the author in their scientific name, and the fields scientificName is mapped to in VegBIEN assume it does not include the author
- 11:44 AM Revision 5630: mappings/VegCore.csv: Added verbatimBinomial
- 11:41 AM Revision 5629: mappings/VegCore.csv: Redefined *binomial to "Taxonomic name without author", rather than genus+species
- 11:32 AM Revision 5628: schemas/vegbien.sql: taxonconcept.taxonlabel_id: Changed type from serial to integer because this is a subclass, and therefore each taxonconcept must first have a corresponding entry in taxonlabel
- 11:29 AM Revision 5627: schemas/vegbien.sql: Moved taxonlabel.concept_reference_id to new taxonconcept table, which is a subclass of taxonlabel that adds information about who the taxon concept is according to
- 11:13 AM Revision 5626: taxonlabel: Renamed accepted_label_id to canon_label_id to allow any taxonlabel to be the canonical taxonlabel for this taxonlabel, whether or not its status is accepted
- 11:01 AM Revision 5625: schemas/filter_ERD.csv: Remove the methodtaxonclass.submethod_id fkey to taxonlabel, to make room in the ERD for additional taxon tables
- 10:52 AM Revision 5624: schemas/vegbien.sql: establishmentmeans_dwc: Corrected source comment
- 10:51 AM Revision 5623: schemas/vegbien.sql: taxonomic_status enum: Added source comment
- 10:49 AM Revision 5622: schemas/vegbien.sql: taxonlabel_relationship: Added relationship, with relationship enum
- 10:21 AM Revision 5621: mappings/VegCore-VegBIEN.csv: Mapped taxonomicStatus
- 10:20 AM Revision 5620: inputs/.TNRS/tnrs/test.xml.ref: Updated inserted row count
- 10:15 AM Revision 5619: mappings/VegCore.csv: Removed duplicate entry for taxonomicStatus, which is also a DwC term
- 10:14 AM Revision 5618: mappings/VegCore.csv: Added taxonomicStatus
- 10:13 AM Revision 5617: schemas/vegbien.sql: taxonlabel: Added taxonstatus, with taxonomic_status enum
- 09:40 AM Revision 5616: schemas/vegbien.sql: taxonlabel.creator_id comment: Removed no longer accurate comment that this is the "according to" and "Name sec. x", which is now stored in concept_reference_id
- 09:37 AM Revision 5615: schemas/vegbien.sql: taxonlabel: Added concept_reference_id, which is the entity that defined the taxon concept (who the taxon label is according to)
- 09:22 AM Revision 5614: schemas/vegbien.ERD.mwb: Moved taxonlabel_relationship to the right of taxonlabel to provide room for taxonlabel to grow
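
Revisions 5641 and 5642 describe how mk_select() picks a default ORDER BY when DISTINCT ON is used. The following is a simplified sketch of that rule only, not the actual sql.py code: the function `mk_select_sketch()` and its parameters are hypothetical, and identifiers are interpolated without quoting for brevity. PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, so the default is left unqualified to match an unqualified DISTINCT ON column.

```python
def mk_select_sketch(table, cols, distinct_on=None, order_by=None):
    """Build a SELECT statement string (illustrative only, no quoting)."""
    distinct_on = list(distinct_on or [])
    # Default the ordering to the first DISTINCT ON column, exactly as
    # written (no table prefix), so the two clauses stay textually aligned.
    if order_by is None and distinct_on:
        order_by = distinct_on[0]

    sql = 'SELECT'
    if distinct_on:
        sql += ' DISTINCT ON (' + ', '.join(distinct_on) + ')'
    sql += ' ' + ', '.join(cols) + ' FROM ' + table
    if order_by is not None:
        sql += ' ORDER BY ' + order_by
    return sql
```

For example, `mk_select_sketch('taxonlabel', ['taxonomicname'], distinct_on=['taxonomicname'])` orders by `taxonomicname` without the caller having to pass order_by.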
10/17/2012
- 04:27 PM Revision 5613: Regenerated vegbien.ERD exports
- 04:25 PM Revision 5612: mappings/VegCore-VegBIEN.csv: Remapped morphospecies to new taxonlabel.morphospecies per today's conference call
- 04:23 PM Revision 5611: schemas/vegbien.sql: taxonlabel: Added separate morphospecies field per today's conference call, where it was decided it could not go in taxonepithet (the lowest-rank component of the name)
- 04:17 PM Revision 5610: schemas/vegbien.sql: Deleted taxonusage table per today's conference call, where it was decided that it was not needed
- 04:14 PM Revision 5609: schemas/vegbien.sql: Renamed taxonlabel_ancestor to taxonlabel_relationship per today's conference call, where it was decided that it would eventually contain asserted relationships (such as synonym and parent) in addition to autopopulated ancestor relationships
- 04:12 PM Revision 5608: schemas/vegbien.sql: Renamed taxonconcept to taxonlabel per today's conference call, where it was decided that taxonconcept contained too many unrelated fields to be purely a taxon concept
- 04:01 PM Revision 5607: inputs/import.stats.xls: Updated import times
- 04:01 PM Revision 5606: inputs/test_taxonomic_names/_scrub/public.sql, TNRS.sql: Regenerated with schema changes
- 01:47 PM Revision 5605: schemas/vegbien.ERD.mwb: Fixed lines
- 01:45 PM Revision 5604: Regenerated vegbien.ERD exports
- 01:44 PM Revision 5603: schemas/vegbien.sql: taxonconcept_ancestor: Renamed taxonconcept_id to descendant_id to emphasize the direction of the relationship between the two taxonconcepts
- 01:35 PM Revision 5602: schemas/vegbien.ERD.mwb: Added taxonconcept_ancestor to the diagram since it is now a core table for storing taxonomic information
- 01:15 PM Revision 5601: mappings/VegCore-VegBIEN.csv: Mapped accordingTo to taxonconcept.creator_id, and have it take the place of identifiedBy when both are present
- 01:12 PM Revision 5600: mappings/VegCore-VegBIEN.csv: Remapped people's names split apart into name components in party to new party.fullname, which does not require splitting or make assumptions about the number of people who may be listed in a particular name field and which components of their name(s) are present
- 01:02 PM Revision 5599: schemas/vegbien.sql: party: Added fullname
- 12:55 PM Revision 5598: mappings/VegCore.csv: Added accordingTo
- 12:47 PM Revision 5597: inputs/.TNRS/tnrs/map.csv: Mapped Name_matched_url to scientificNameID, since the URL uniquely identifies the matched taxonconcept
- 12:43 PM Revision 5596: schemas/vegbien.sql: taxonconcept: Renamed taxonname to taxonepithet for clarity and to be consistent with TCS's use of "epithet" to denote what the taxonname was intended to be (http://www.tdwg.org/standards/117/download/#/UserGuidev_1.3.pdf)
- 12:18 PM Revision 5595: schemas/vegbien.sql: taxonconcept.creator_id: Documented that this is the concept reference for a taxon concept with an "according to", or the identifier's name for a nominal concept, and is equivalent to "Name sec. x"
- 12:02 PM Task #497 (Resolved): create examples of taxonomic names to test the limits of the new taxonomic schema
- See testNames.txt <https://projects.nceas.ucsb.edu/nceas/attachments/download/377/testNames.txt>
- 11:50 AM Revision 5594: sql_io.py: import_csv(): Add a row_num column at the beginning of the table, which is autopopulated by csvs.RowNumFilter (it cannot be autopopulated by the serial datatype, because this does not support COPY FROM with a NULL-equivalent value in the serial field). This fixes a bug in csv2db where rows would not stay in inserted order upon querying the table, and would be returned in a different order each query, which prevented LIMIT/OFFSET based subsetting from returning consistent, nonoverlapping results. This occurs because PostgreSQL unfortunately does not return rows in inserted order (or any stable order: "If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.
- 11:43 AM Revision 5593: csvs.py: Added RowNumFilter, which adds a row # column at the beginning of each row
- 11:42 AM Revision 5592: streams.py: LineCountStream, LineCountInputStream: Fixed bug where line_num was 1 too high because it started at 1 *and* was incremented *before* each line was returned. The first line is now properly numbered 1: the initial line_num value is 0, which is incremented to 1 upon encountering the first line. The previous off-by-one behavior may have been needed for code that associates an error message with a line #, but such code should add 1 to the line_num to get the line # of the error *if* the error prevents the next line from being read by the LineCount*Stream.
- 11:04 AM Revision 5591: sql_io.py: import_csv(): Take a reader and header rather than a stream to allow callers to pass in a wrapped CSV reader for filtering, etc.
- 11:00 AM Revision 5590: sql_io.py: append_csv(): Take a reader and header rather than a stream_info and stream to allow callers to use the simpler csvs.reader_and_header() function. This also allows callers to pass in a wrapped CSV reader for filtering, etc.
- 10:44 AM Revision 5589: csv2db, tnrs_db: Removed ProgressInputStream wrapper around input stream, which is no longer needed (and causes overlapping output) now that sql_io.append_csv() prints # rows read
- 10:42 AM Revision 5588: sql_io.py: append_csv(): Wrap input stream in a ProgressInputStream that reports rows (rather than lines) read
- 10:40 AM Revision 5587: csvs.py: InputRewriter: Use new StreamFilter to translate StopIteration EOF to ''
- 10:36 AM Revision 5586: csvs.py: Added StreamFilter
- 10:36 AM Revision 5585: csvs.py: InputRewriter: Also support stream inputs which report EOF as '' instead of StopIteration
- 09:55 AM Revision 5584: sql_io.py: append_csv(): Removed no longer used INSERT mode, since all callers now use the default COPY FROM
- 09:53 AM Revision 5583: sql_io.py: import_csv(): Removed no longer needed manual setting of use_copy_from, which defaults to True in append_csv()
- 09:50 AM Revision 5582: csv2db: Removed no longer needed manual setting of use_copy_from, which defaults to True in sql_io.import_csv()
- 09:49 AM Revision 5581: csv2db: Removed no longer needed separate handling of sql.DatabaseErrors, because all recoverable errors caused by COPY FROM (EncodingException and ragged rows) are now handled or avoided
- 09:46 AM Revision 5580: csv2db: Handle EncodingException separately by changing the connection encoding to LATIN1 and retrying
- 09:45 AM Revision 5579: sql.py: DbConn: Added set_encoding()
- 09:32 AM Revision 5578: sql_io.py: append_csv(): Parse any exceptions generated by the COPY FROM using new sql.parse_exception()
- 09:28 AM Revision 5577: sql.py: run_query(): Factored exception parsing out into new parse_exception()
- 09:22 AM Revision 5576: sql.py: Added EncodingException and parse it in run_query()
- 09:14 AM Revision 5575: sql.py: Removed no longer used NameException
- 09:14 AM Revision 5574: csvs.py: Filter: Added empty close() method to support using it as a stream (such as with streams.ProgressInputStream)
- 09:01 AM Revision 5573: sql_io.py: append_csv(): Don't disable COPY FROM for TSVs, which are now supported using csvs.InputRewriter
- 08:59 AM Revision 5572: sql_io.py: append_csv(): COPY FROM: Wrap provided stream in standardizing stream to fix ragged rows (with unequal # columns) and nonstandard CSV dialects (such as TSV with \-escaped newlines)
- 08:56 AM Revision 5571: csvs.py: Added InputRewriter, which wraps a reader, writing each row back to CSV
- 08:54 AM Revision 5570: csvs.py: Added ColCtFilter, which gives all rows the same # columns
- 07:25 AM Revision 5569: sql_io.py: row_num_col_def: Changed type to integer so the row_num can be populated directly by the insert process
- 07:19 AM Revision 5568: sql_io.py: Added row_num_col_def for use by import_csv(). The row_num column will be necessary again because PostgreSQL unfortunately does not return rows in inserted order (or any stable order: "If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.
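
The CSV standardizing steps described in Revisions 5570, 5593 and 5594 (equalizing column counts, then prepending a row number) can be sketched as two chained generators. This is an illustration in Python 3, not the csvs.py implementation: `pad_rows()`, `number_rows()` and `standardize()` are hypothetical names, and truncating over-long rows to the header width is an assumption about how ragged rows are handled.

```python
import csv
import io


def pad_rows(rows, num_cols):
    """Give every row exactly num_cols columns."""
    for row in rows:
        row = list(row[:num_cols])  # assumption: extra columns are dropped
        row += [''] * (num_cols - len(row))  # short rows are padded
        yield row


def number_rows(rows, start=1):
    """Prepend a row number so insertion order can be recovered later."""
    for row_num, row in enumerate(rows, start):
        yield [row_num] + row


def standardize(stream):
    """Return (header, rows) with a leading row_num column."""
    reader = csv.reader(stream)
    header = next(reader)
    rows = number_rows(pad_rows(reader, len(header)))
    return ['row_num'] + header, rows


if __name__ == '__main__':
    header, rows = standardize(io.StringIO('a,b,c\n1,2\n3,4,5,6\n'))
    print(header)
    for row in rows:
        print(row)
```

Because the row number is generated on the client side, it can be loaded as a plain integer column and later used in an explicit ORDER BY.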
10/16/2012
- 10:58 PM Revision 5567: mappings/VegCore.csv: Removed unit-ambiguous height. Use height_m, height_ft instead.
- 10:57 PM Revision 5566: mappings/Veg+-VegCore.csv: Added height
- 10:57 PM Revision 5565: mappings/Veg+-VegCore.csv: Added height
- 10:52 PM Revision 5564: mappings/VegCore-VegBIEN.csv: Removed no longer used height mapping. Use height_m, height_ft instead.
- 10:39 PM Revision 5563: README.TXT: Data import: import_all: Added NCBI backbone to note about import_all not immediately returning control to the shell
- 10:30 PM Revision 5562: inputs/FIA/Organism/map.csv: Height: Remapped to height_ft, assuming units based on the range of values, the height of the tallest tree, and location inside the U.S.
- 10:23 PM Revision 5561: inputs/FIA/Organism/test.xml.ref: Accepted new inserted row count
- 10:01 PM Revision 5560: mappings/VegCore-VegBIEN.csv: Mapped height_ft
- 09:58 PM Revision 5559: schemas/functions.sql: Added _ft_to_m()
- 09:52 PM Revision 5558: mappings/VegCore.csv: Added height_ft
- 09:38 PM Revision 5557: inputs/SALVIAS/stems/map.csv: stem_height_m: Remapped to height_m using units from <http://salvias.net/Documents/salvias_data_dictionary.html#Plot+data>
- 09:37 PM Revision 5556: inputs/SALVIAS-CSV/Organism/map.csv: stem_height_m: Re-sourced units to stem_height_m rather than height_m definition in SALVIAS data dictionary
- 09:29 PM Revision 5555: Regenerated vegbien.ERD exports
- 09:23 PM Revision 5554: schemas/vegbien.sql: taxonconcept: taxonconcept_update_ancestors() trigger: Fixed bug where matched_concept_id needed to be changed to NULL when equal to taxonconcept_id, to avoid including the node itself with its parent's ancestors (which would violate the taxonconcept_ancestor pkey)
- 09:19 PM Revision 5553: sql_io.py: put_table(): Ensuring into's out_pkey is different from in_pkey: Prepend "out." instead of out_table to avoid long column names for the output pkey
- 09:18 PM Revision 5552: sql_gen.py: concat(): Allow multiple "column" suffixes with "." when matching the existing suffix
- 08:47 PM Revision 5551: schemas/vegbien.sql: taxonconcept: taxonconcept_update_ancestors() trigger: Corrected comment explaining why we don't need an ON DELETE trigger to say that this is because the foreign key for *taxonconcept_ancestor.ancestor_id*, not taxonconcept.parent_id, is ON DELETE CASCADE. The auto-deletion will also occur if taxonconcept.parent_id is ON DELETE CASCADE, because taxonconcept_ancestor.taxonconcept_id is ON DELETE CASCADE, but it is not actually necessary to have cascading deletes on taxonconcept.parent_id (and SET NULL may in fact sometimes be more appropriate).
- 08:33 PM Revision 5550: schemas/tree_cross-links.sql: Removed header comments added by pgAdmin
- 08:30 PM Revision 5549: schemas/tree_cross-links.sql: Updated for new taxonconcept_update_ancestors() trigger
- 08:21 PM Revision 5548: schemas/vegbien.sql: taxonconcept: Rewrote taxonconcept() trigger to avoid completely reinserting the taxonconcept_ancestor entries of all descendants every time taxonconcept changes or using trigger recursion to find descendants. Instead, just delete the old parent's ancestors from and add the new parent's ancestors to each descendant, using taxonconcept_ancestor itself (with the new taxonconcept_ancestor_descendants index) to find all descendants. As an additional optimization, only update taxonconcept_ancestor if the parent_id or matched_concept_id has actually changed. This fixes a bug in NCBI where inserting taxonconcepts out of dependency order caused taxonconcept_ancestor entries to be repeatedly regenerated, slowing the import down to a crawl.
- 07:42 PM Revision 5547: schemas/vegbien.sql: taxonconcept: Added taxonconcept_3_parent_id_avoid_self_ref() trigger to avoid recursive references in root taxonconcepts (taxonconcepts with no parent). This will simplify the new taxonconcept_update_ancestors() trigger.
- 06:32 PM Revision 5546: schemas/vegbien.sql: taxonconcept_ancestor: Added taxonconcept_ancestor_descendants index to support looking up all the descendants for a taxonconcept. This will be used by the new taxonconcept_update_ancestors() trigger, which will support inserting taxonconcepts out of dependency order (such as for NCBI).
- 04:35 PM Revision 5545: schemas/vegbien.sql: *_update_ancestors(): Made trigger deferred, so that it would run after all rows have been inserted in a bulk insert, such as during column-based import. This ensures that ancestors lists are not populated until all parents are inserted, which may occur out of order for datasources (such as NCBI) whose nodes are not in dependency order. (A node that newly acquires a parent will have to update all its descendants, which will then be updated again when its parent acquires its own parent.)
- 04:28 PM Revision 5544: lib/PostgreSQL-MySQL.csv: Also filter out constraint triggers in addition to regular triggers
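
The incremental approach described in Revision 5548 (remove the old parent's ancestors from, and add the new parent's ancestors to, the node and each of its descendants) can be sketched in memory. This is a simplified model under stated assumptions, not the PL/pgSQL trigger: the dict-of-sets `ancestors` stands in for the taxonconcept_ancestor table, `set_parent()` is a hypothetical name, and matched_concept_id handling is left out.

```python
def set_parent(ancestors, node, old_parent, new_parent):
    """Update ancestor sets after node's parent changes.

    ancestors maps each node to the set of its ancestors (excluding itself).
    """
    # The node itself plus every node that lists it as an ancestor.
    affected = [node] + [n for n, anc in ancestors.items() if node in anc]

    # Ancestor chains contributed by the old and the new parent.
    old = set()
    if old_parent is not None:
        old = ancestors.get(old_parent, set()) | {old_parent}
    new = set()
    if new_parent is not None:
        new = ancestors.get(new_parent, set()) | {new_parent}

    for n in affected:
        current = ancestors.setdefault(n, set())
        current -= old
        current |= new


if __name__ == '__main__':
    # Nodes arrive out of dependency order: the child before its parent.
    ancestors = {}
    set_parent(ancestors, 'species', None, 'genus')
    set_parent(ancestors, 'genus', None, 'family')
    print(sorted(ancestors['species']))
```

Only the affected descendants are touched on each change, which is the point of the rewrite: nothing is regenerated from scratch when a parent is inserted after its children.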
10/15/2012
- 05:37 PM Revision 5543: inputs/Madidi/Organism/map.csv: Total height: Remapped to height_m, assuming units based on the range and precision of values
- 05:33 PM Revision 5542: inputs/VegBank/stemcount/map.csv: stemheight: Remapped to height_m using units from <http://vegbank.org/vegbank/views/dba_tabledescription_detail.jsp?view=detail&wparam=stemcount&entity=dba_tabledescription&where=where_tablename>
- 05:29 PM Revision 5541: inputs/SALVIAS/plotObservations/map.csv, inputs/SALVIAS-CSV/Organism/map.csv: height_m, stem_height_m: Remapped to height_m using units from <http://salvias.net/Documents/salvias_data_dictionary.html#Plot+data>
- 05:24 PM Revision 5540: mappings/VegCore-VegBIEN.csv: Mapped height_m
- 05:15 PM Revision 5539: mappings/VegCore.csv: Added height_m
- 04:20 PM Revision 5538: mappings/VegCore.csv, VegCore-VegBIEN.csv: Removed no longer used and unit-ambiguous organismX, organismY. Use organismX_m, organismY_m instead.
- 04:18 PM Revision 5537: inputs/VegBank/stemlocation/map.csv: stemxposition, stemyposition: Remapped to organismX_m/organismY_m using units from <http://vegbank.org/vegbank/views/dba_tabledescription_detail.jsp?view=detail&wparam=stemlocation&entity=dba_tabledescription&where=where_tablename>
- 04:06 PM Revision 5536: inputs/TEAM/*/map.csv: 1ha Plot X Coordinate, 1ha Plot Y Coordinate: Remapped to organismX_m/organismY_m using units from <https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/inputs/TEAM/_src/TEAM-DataPackage-20120920191251_3859/Vegetation+-+Trees+&+Lianas/Vegetation-Tree-and-Liana-Metadata-1.5.pdf>
- 03:59 PM Revision 5535: inputs/SALVIAS/plotObservations/map.csv, inputs/SALVIAS-CSV/Organism/map.csv: x_position, y_position: Remapped to organismX_m/organismY_m using units from <http://salvias.net/Documents/salvias_data_dictionary.html#Plot+data>
- 03:51 PM Revision 5534: inputs/Madidi/Organism/map.csv: Subplot X, Subplot Y: Remapped to organismX_m/organismY_m, assuming units based on the size of values relative to the plot area, which has units of ha
- 03:44 PM Revision 5533: inputs/CTFS/StemObservation/map.csv: x, y: Remapped to organismX_m/organismY_m, assuming units based on the size of values relative to plot area, which has units of ha
- 03:30 PM Revision 5532: mappings/VegCore-VegBIEN.csv: Mapped organismX_m, organismY_m
- 03:29 PM Revision 5531: mappings/VegCore.csv: Added organismX_m, organismY_m
- 03:23 PM Revision 5530: sql_io.py: put_table(): full_in_table: Create it using new sql.copy_table() instead of sql.run_query_into()
- 03:23 PM Revision 5529: sql.py: Added copy_table()
- 03:14 PM Revision 5528: sql.mk_select() calls: Removed no longer needed order_by=None when limit=0
- 03:11 PM Revision 5527: sql.py: mk_select(): Set order_by to None if limit == 0
- 03:09 PM Revision 5526: inputs/.TNRS/schema.sql: Documented that accepted names must be processed before any names that resolve to them, because the entry for the accepted name contains all the ranks parsed out but the resolved name of another entry contains just some ranks and the taxonomic name. Column-based import will do this automatically when the total # of rows is <= the partition_size (because _taxonconcept_set_matched_concept_id()'s accepted taxonconcept is created after the main taxonconcept), but TNRS has more rows than this so sorting is needed to ensure that all the accepted names are processed in the first partitions.
- 02:52 PM Revision 5525: sql.py: table_order_by(): Cache the order_by in table.order_by and propagate it when a LIKE table is created
- 02:51 PM Revision 5524: sql_gen.py: Table: Added order_by attr to cache the results of table_order_by()
- 02:36 PM Revision 5523: sql.select() calls: Removed order_by=None everywhere that a stable row order is required (i.e. consistent between selects, or consistent between table transformations). This causes several tests to return different inserted row counts, because the input table is now being accessed in pkey order instead of in table order. This fixes a bug where tables with more than ~100 rows would return different results for repeated calls of the same non-ordered select.
- 02:27 PM Revision 5522: sql.py: mk_select(): Use table_order_by() instead of table_pkey_col() to determine what column(s) to order by if order_by is set to order_by_pkey
- 02:26 PM Revision 5521: sql.py: Added table_pkey_index(), index_order_by(), table_cluster_on(), table_order_by()
- 01:10 PM Revision 5520: sql.py: Added index_exprs() and use it in index_cols()
- 01:08 PM Revision 5519: README.TXT: Data import: On local machine: Added `make inputs/.TNRS/cleanup`, which is necessary because the PostgreSQL collation may differ between vegbiendev's and your DB
- 12:24 PM Revision 5518: schemas/vegbien.sql: taxonconcept: taxonconcept_update_ancestors(): Use matched_concept_id's ancestors instead if available. (Recursively applied, this will use the ancestors of the accepted concept.) This facilitates finding all children of and matches to an accepted concept, which will all have an entry for that concept in taxonconcept_ancestor. Note that the concept's own parents will not be indexed in taxonconcept_ancestor, because only accepted ancestors are now stored in taxonconcept_ancestor. Documented that taxonconcept_ancestor now stores the *accepted* ancestors of a taxonconcept.
- 12:14 PM Revision 5517: schemas/vegbien.sql: taxonconcept: taxonconcept_2_propagate_accepted_concept_id(): Also update accepted_concept_id on concepts that resolve to this concept, which may have been created before this concept was marked as accepted if concepts are not imported in dependency order (accepted concepts first). Added index on matched_concept_id to speed up finding concepts that resolve to this concept.
- 12:10 PM Revision 5516: sql.py: mk_select(): order_by is order_by_pkey: Only order by the table's actual pkey, if it has one, rather than using the first column if it doesn't
- 12:08 PM Revision 5515: inputs/.TNRS/tnrs/test.xml.ref: Updated inserted row count
- 10:21 AM Revision 5514: db_xml.py: partition_size: Increased to 1,000,000 (>= NCBI.higher_taxa's size) so NCBI.higher_taxa can be imported completely in one partition. This is necessary because NCBI's taxonconcepts are not in dependency order (parents first), so a later partition cannot rely on the parents of its taxonconcepts having already been imported. Instead, all taxonconcepts must be imported at once, and then, separately, the parents of all taxonconcepts must be set.
- 10:08 AM Revision 5513: mappings/VegCore-VegBIEN.csv: taxonconcept.parent_id when explicit parent provided: Set taxonconcept.parent_id using new _taxonconcept_set_parent_id() *after* creating the child taxonconcept, so that the parent_id will point to the already-inserted parent taxonconcept instead of creating a new, empty parent taxonconcept. This creates a two-step import, where first the taxonconcepts are imported, and then the parent_ids are matched up. This is necessary for column-based import because all the parent taxonconcepts are imported in a separate iteration from the child taxonconcepts with only their sourceaccessioncode, so this iteration must occur after the child taxonconcept iteration in order to match up with fully-populated taxonconcepts. Row-based import, on the other hand, does not require _taxonconcept_set_parent_id() but does require the taxonconcepts to be provided in dependency order (parents first), which is unfortunately not the case for NCBI.
- 09:57 AM Revision 5512: schemas/vegbien.sql: *_update_ancestors(): Telling immediate children to update their ancestors lists: Exclude self to avoid infinite recursion
- 09:57 AM Revision 5511: schemas/vegbien.sql: *_update_ancestors(): Telling immediate children to update their ancestors lists: Exclude self to avoid infinite recursion
- 09:41 AM Revision 5510: schemas/vegbien.sql: Added _taxonconcept_set_parent_id()
- 09:37 AM Revision 5509: schemas/vegbien.sql: Renamed _set_matched_taxonconcept() to _taxonconcept_set_matched_concept_id() so that the function name is prefixed with the table it applies to
- 09:35 AM Revision 5508: db_xml.py: put(): Treat a child node which is a function (starts with _) as a child with fkey to parent rather than as a field in the table. Such a function accepts the table's pkey as one of its arguments.
- 09:05 AM Revision 5507: sql_gen.py: map_expr(): Don't replace an unquoted name when followed by ",", as it would be in an into table name for a function with multiple arguments (e.g. family in "_join_words(1=Field family, 2=Field name)")
- 08:49 AM Revision 5506: schemas/vegbien.sql: locationevent: Moved obsstartdate, obsenddate to top of table so they would be visible in the ERD
- 08:45 AM Revision 5505: sql_io.py: put_table(): ensure_cond(): track_data_error(): Concatenate the columns in the constraint together using "," rather than adding a separate entry for each column, because the constraint is applicable to all columns together rather than to each column separately
- 08:26 AM Revision 5504: sql_io.py: put_table(): Renamed ignore_cond() to ensure_cond() for clarity
- 08:22 AM Revision 5503: import_all: Also import the NCBI tree of life, before the TNRS names
- 08:17 AM Revision 5502: mappings/VegCore-VegBIEN.csv: Also map acceptedFamily to the corresponding NCBI family
- 08:07 AM Revision 5501: lib/PostgreSQL-MySQL.csv: custom types: Also exclude time. Reordered excluded (built-in) types by name.
- 07:57 AM Revision 5500: inputs/import.stats.xls: Updated import times
- 07:50 AM Revision 5499: schemas/vegbien.sql: Changed `timestamp with time zone` fields to `date` because time information is not stored in these fields, and it's confusing to have an arbitrary timezone (the server's timezone) and an arbitrary time (midnight) set for input data that only has a precision to the nearest day
- 07:43 AM Revision 5498: sql_gen.py: null_sentinels: Added entry for date
- 07:40 AM Revision 5497: lib/PostgreSQL-MySQL.csv: custom types: Also exclude date, datetime
- 07:11 AM Revision 5496: README.TXT: Documentation: To import and scrub just the test taxonomic names: Run `make backups/TNRS.backup/restore` in the background because it takes a while
- 06:45 AM Revision 5495: mappings/VegCore.csv: Re-sourced TaxonomicRankEnum fields to the official TCS schema rather than the TCS version in VegX
- 06:42 AM Revision 5494: schemas/vegbien.sql: taxonrank: Updated source to the TCS schema (rather than VegBank) for the new, expanded list. Note that although the list itself was compiled from the TCS version in VegX, the official TCS download does not differ from the VegX TCS in the TaxonomicRankEnum fields (the xs: namespace has just been replaced with xsd: by VegX).
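
The two-step import described in revisions 5513 and 5514 above (insert every taxonconcept first, then match up the parent_ids) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Concept` and `import_two_step()` are hypothetical names, and plain dicts stand in for the taxonconcept table; the actual import uses _taxonconcept_set_parent_id() in the database.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Concept:
    """Hypothetical in-memory stand-in for a taxonconcept row."""
    sourceaccessioncode: str
    name: str
    parent_id: Optional[int] = None


def import_two_step(
    rows: Iterable[Tuple[str, str, Optional[str]]]
) -> Dict[int, Concept]:
    """rows: (sourceaccessioncode, name, parent's sourceaccessioncode or None).

    The rows may arrive in any order; parents need not come first.
    """
    rows = list(rows)
    ids: Dict[str, int] = {}        # sourceaccessioncode -> generated id
    table: Dict[int, Concept] = {}  # generated id -> row

    # Step 1: insert every concept without its parent link.
    for code, name, _parent_code in rows:
        if code not in ids:
            new_id = len(table) + 1
            ids[code] = new_id
            table[new_id] = Concept(code, name)

    # Step 2: now that all parents exist, point each child at the
    # already-inserted parent instead of creating an empty one.
    for code, _name, parent_code in rows:
        if parent_code is not None:
            table[ids[code]].parent_id = ids.get(parent_code)

    return table
```

Because step 2 only runs after every row has an id, a child listed before its parent still resolves to the fully populated parent row.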
10/12/2012
- 05:21 PM Revision 5493: schemas/vegbien.sql: analytical_db_view: taxonconcept: Join again on the accepted_concept_id in order to use the accepted taxonconcept rather than the verbatim taxonconcept from the datasource
- 05:14 PM Revision 5492: schemas/: svn:ignore log files
- 05:11 PM Revision 5491: Added inputs/.NCBI/. This uses many of the new schema and mappings features, such as taxonconcept.sourceaccessioncode and parentTaxonID
- 05:07 PM Revision 5490: mappings/VegCore-VegBIEN.csv: identifyingtaxonomicname: Don't create if taxonconcept has an explicit parent, because the taxonName (which is generally only a component of the full taxonomic name, e.g. specificEpithet) is not globally unique. Datasources that provide name components in such a way that levels at or below family can't be directly concatenated cannot currently receive an identifyingtaxonomicname for input to TNRS.
- 04:54 PM Revision 5489: mappings/VegCore-VegBIEN.csv: taxonName->identifyingtaxonomicname: Don't include the rank with the taxonName, because TNRS only allows the rank to be included in the taxonomic name if it's infraspecific (otherwise, it returns no or an invalid match due to the presence of what it sees as an invalid term or a name component)
- 04:48 PM Revision 5488: mappings/VegCore-VegBIEN.csv: Mapped taxonName to the TNRS input taxonconcept's identifyingtaxonomicname
- 04:28 PM Revision 5487: mappings/VegCore-VegBIEN.csv: Only forward taxonRank to the parent taxonconcept (which stores the infraspecific taxonconcept when the infraspecificEpithet is provided) if there is no explicit parent provided via parentTaxonID/etc.
- 04:09 PM Revision 5486: mappings/VegCore-VegBIEN.csv: Mapped parentScientificNameID, parentTaxonConceptID, parentTaxonID
- 04:03 PM Revision 5485: mappings/VegCore.csv: Added parentScientificNameID, parentTaxonConceptID, parentTaxonID
- 03:53 PM Revision 5484: input.Makefile: $(inDatasrc): Also include the vegbien_dest $schemas in the search_path, so that the datasource's SQL scripts (create.sql, etc.) can use VegBIEN functions and types
- 03:44 PM Revision 5483: lib/common.Makefile: Added $(comma)
- 02:41 PM Revision 5482: inputs/test_taxonomic_names/_scrub/public.sql: Regenerated with schema changes
- 02:38 PM Revision 5481: input.Makefile: Maps building: %/.map.csv.last_cleanup: Fixed bug where $(coreMap) needed to be included as a prerequisite, because even though it is not used directly in this target's recipe, it is used by targets invoked via recursive make after the main recipe runs. In general, whenever targets forward commands to a recursive make target, they also need to forward those recursive targets' prerequisites by including them in their own prerequisites list.
- 02:29 PM Revision 5480: mappings/VegCore-VegBIEN.csv: Mapped taxonConceptID, taxonID, scientificNameID to taxonconcept.sourceaccessioncode. Note that taxonconcept stores all of these taxonomic entities, using creator_id+creationdate, taxonname+rank+parent_id, and identifyingtaxonomicname, respectively.
- 02:28 PM Revision 5479: mappings/VegCore-VegBIEN.csv: Mapped taxonConceptID, taxonID, scientificNameID to taxonconcept.sourceaccessioncode. Note that taxonconcept stores all of these taxonomic entities, using creator_id+creationdate, taxonname+rank+parent_id, and identifyingtaxonomicname, respectively.
- 02:13 PM Revision 5478: mappings/VegCore-VegBIEN.csv: Mapped taxonName
- 02:11 PM Revision 5477: mappings/VegCore.csv: Added taxonName
- 02:05 PM Revision 5476: schemas/vegbien.ERD.mwb: Fixed lines
- 01:55 PM Revision 5475: schemas/vegbien.sql: Copied functions in the functions schema that are also used by the public schema to the public schema, so that reinstalling the functions schema would not cause anything that depends on a function in it to be cascadingly deleted. Currently, this just affects analytical_db_view, which uses _fraction_to_percent().
- 01:44 PM Revision 5474: inputs/test_taxonomic_names/_scrub/public.sql: Regenerated with schema changes
- 01:36 PM Revision 5473: schemas/vegbien.sql: taxonconcept: Added taxonconcept_2_propagate_accepted_concept_id() trigger to auto-populate the accepted_concept_id
- 12:53 PM Revision 5472: schemas/vegbien.sql: taxonconcept.sourceaccessioncode: Added descriptive comment
- 12:53 PM Revision 5471: schemas/vegbien.sql: taxonconcept.accepted_concept_id: Added descriptive comment
- 12:48 PM Revision 5470: Regenerated vegbien.ERD exports
- 12:47 PM Revision 5469: schemas/vegbien.sql: taxonconcept: Added sourceaccessioncode, and allow it to scope the taxonconcept when provided
- 12:33 PM Revision 5468: inputs/test_taxonomic_names/_scrub/public.sql: Regenerated with schema changes
- 12:29 PM Revision 5467: schemas/vegbien.sql: taxonconcept: Renamed canon_concept_id to matched_concept_id, because this is actually the closest-match taxonconcept in the match hierarchy (datasource concept -> parsed concept -> matched concept -> accepted concept) rather than the accepted synonym, which goes in accepted_concept_id
- 05:51 AM Revision 5466: Regenerated vegbien.ERD exports
- 05:47 AM Revision 5465: schemas/vegbien.sql: taxonconcept: Renamed canon_concept_id to matched_concept_id, because this is actually the closest-match taxonconcept in the match hierarchy (datasource concept -> parsed concept -> matched concept -> accepted concept) rather than the accepted synonym, which goes in accepted_concept_id
- 05:34 AM Revision 5464: schemas/vegbien.sql: taxonconcept: Added accepted_concept_id
- 05:27 AM Revision 5463: schemas/vegbien.sql: taxonconcept.canon_concept_id: comment: Changed "accepted synonym" to "closest match", since canon_concept_id is actually a hierarchy from datasource concept -> parsed concept -> matched concept -> accepted concept
- 05:22 AM Revision 5462: schemas/vegbien.sql: taxonconcept: Added order # to trigger names so they run in a defined order (triggers are run in alphabetical order)
- 05:01 AM Task #498 (Resolved): add definitions to columns in "green tables"
- Definitions have been added on all new tables (currently taxonconcept and taxonconcept_ancestor). Other tables' colum...
- 04:56 AM Task #462 (Resolved): name backups according to svn revision instead of or in addition to the date
- This now happens for backups as well as log files
- 04:53 AM Revision 5461: README.TXT: Use new revision # in log filenames to get all the logs for an import. Changed <datetime> to <version> because the rotated public schema now also includes the svn revision.
- 04:44 AM Revision 5460: lib/common.Makefile: $(version): Include both the svn revision when make was started as well as the svn revision when the command is actually run (when these values differ), in case svn was updated between the time an import was started and the time a particular table started being imported. Because tables within a datasource are imported sequentially, it is possible that an update would have happened before the last table started importing.
- 04:23 AM Revision 5459: Makefile: Moved setting of $(root) before include of lib/common.Makefile because it's used by lib/common.Makefile
- 04:21 AM Revision 5458: Factored OS section out from Makefile, input.Makefile into lib/common.Makefile
- 04:13 AM Revision 5457: Makefile, input.Makefile: Use new $(version), which unlike $(date) also includes the svn revision, to version log files, etc. This way, the working copy can be put back to the way it was at the time of a given import (excluding changes to nonversioned files). This also makes it easier to get all the log files for a particular import when different tables' imports started at different times.
- 04:08 AM Revision 5456: Makefile: Added $(root) for use with $(rootRevision)
- 04:08 AM Revision 5455: lib/common.Makefile: Added $(version), to replace $(date) for versioning log files, etc., and helper function $(rootRevision)
- 04:07 AM Revision 5454: lib/common.Makefile: Added $(revision)
- 04:04 AM Revision 5453: input.Makefile: Removed no longer used $(SED)
- 04:03 AM Revision 5452: lib/common.Makefile: Added $(sed)
- 03:58 AM Revision 5451: Factored $(date) out from Makefile, input.Makefile into lib/common.Makefile
- 03:18 AM Revision 5450: sql_io.py: put_table(): DuplicateKeyException: Fixed bug where indexes with conditions needed to have the input rows filtered by the condition, to prevent trying to retrieve an existing/inserted row using a join on the index columns when the index in fact does not apply. This fixes a bug in the import of taxonconcept where the taxonconcept_0_unique_identifying_name unique index has a condition which was not satisfied for input rows with no identifyingtaxonomicname, causing any input row with NULL in this column to match *all* taxonconcepts with a NULL identifyingtaxonomicname. This uses ignore_cond()'s new support for constraints that did not fail at least once.
- 03:12 AM Revision 5449: sql_io.py: put_table(): ignore_cond(): Added support for constraints that did not fail at least once, and therefore should not be required to simplify to a non-false value. As part of this, only track the failed constraint in the errors table if it actually failed at least once based on the deleted row count or the `failed` param.
- 03:05 AM Revision 5448: sql_gen.py: map_expr(): Fixed bug where names were being replaced when they were inside another name. This occurred with combined names created by sql_io.into_table_name().
- 01:11 AM Revision 5447: sql.py: ConstraintException: message: Wrap condition in strings.as_tt()
- 01:06 AM Task #522 (New): fix deadlock when multiple testers are running simultaneously
- Commands running simultaneously:
* @make inputs/.TNRS/tnrs/test.by_col.xml verbosity=3@
* @make test by_col=1@
...
- 12:30 AM Revision 5446: sql.py: run_query(): DuplicateKeyException: Also retrieve the index's condition using new index_cond()
- 12:28 AM Revision 5445: sql.py: Added index_cond()
- 12:11 AM Revision 5444: sql_io.py: put_table(): insert_into_pkeys(): Take a query as the param instead of sql.mk_select()'s params, to allow the caller to pass in any query without needing insert_into_pkeys() to manually pass through those args
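
The conditional-index fix in revision 5450 above can be illustrated with a small sketch. It assumes the index condition amounts to "identifyingtaxonomicname is not NULL" (the log only says the condition was not satisfied for rows without one), and `find_existing_ids()` is a hypothetical helper, not a function in sql_io.py. The point is that input rows are filtered by the index's condition before they are joined to existing rows on the index columns.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, Optional[object]]


def find_existing_ids(
    in_rows: List[Row],
    existing: List[Row],
    index_cols: Sequence[str],
    index_cond: Callable[[Row], bool],
) -> Dict[int, object]:
    """Map input row position -> id of the existing row it duplicates.

    Only rows that satisfy the partial unique index's condition can
    collide on it, so both sides are filtered by index_cond first.
    """
    lookup = {
        tuple(row[c] for c in index_cols): row["id"]
        for row in existing
        if index_cond(row)
    }
    matches: Dict[int, object] = {}
    for pos, row in enumerate(in_rows):
        if not index_cond(row):
            continue  # the index does not apply, so no duplicate is possible
        key = tuple(row[c] for c in index_cols)
        if key in lookup:
            matches[pos] = lookup[key]
    return matches
```

Without the filter, an input row with NULL in the index column would be joined against every existing row that also has NULL there, which is the behavior the revision describes as a bug.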
10/11/2012
- 11:40 PM Revision 5443: sql.py: constraint_cond(): Fixed NotImplementedError message to apply to this function
- 09:53 PM Task #521 (Resolved): make place* tables use a structure similar to taxonconcept
- 09:36 PM Revision 5442: sql_io.py: put_table(): ignore_cond(): Log message: Replaced don't with do not so it wouldn't mess up syntax highlighting when viewing the log file in a text editor
- 09:07 PM Revision 5441: input.Makefile: Staging tables installation: Don't delete %/header.csv on error, because header.csv is a byproduct rather than the primary output and is created roughly atomically
- 08:40 PM Revision 5440: schemas/vegbien.sql: *_ancestor tables: Added descriptive comment that these are ancestor cross link tables
- 08:23 PM Revision 5439: csvs.py: sniff(): Support multi-char delims using \t, such as \t|\t used by NCBI. Support custom line suffixes, such as \t| used by NCBI.
- 08:18 PM Revision 5438: csvs.py: TsvReader.next(): Remove only the autodetected line ending instead of any standard line ending. Note that this requires all header override files to use the same line ending as the CSV they override, which is now the case.
- 08:15 PM Revision 5437: csvs.py: is_tsv(): Support multi-char delimiters by checking only the first char of the delimiter
- 08:12 PM Revision 5436: csvs.py: sniff(): Also autodetect the line ending
- 08:11 PM Revision 5435: csvs.py: sniff(): Also autodetect the line ending
- 08:02 PM Revision 5434: inputs/test_taxonomic_names/Taxon/+header.txt: Changed line endings to \r\n to match testNames.txt line endings. This will be necessary when the line ending is autodetected by csvs.sniff().
- 07:59 PM Revision 5433: csvs.py: TsvReader.next(): Renamed raw_contents var to line, since this is just the line with the ending removed
- 07:36 PM Revision 5432: strings.py: Replaced no longer used contains_any() with find_any(), which returns any found substring, or None if none of the substrings were found
- 07:22 PM Revision 5431: csvs.py: Modify csv.Dialect._validate() to ignore "delimiter must be a 1-character string" errors, in order to support multi-char delimiters used by TsvReader
- 07:21 PM Revision 5430: csvs.py: Modify csv.Dialect._validate() to ignore "delimiter must be a 1-character string" errors, in order to support multi-char delimiters used by TsvReader
- 06:58 PM Revision 5429: csvs.py: TsvReader: Use str.split() instead of csv.reader().next() to parse the row, for efficiency and to support multi-char delimiters. This is possible because the TSV dialect doesn't use CSV parsing features other than the delimiter and newline-escaping (which is handled separately).
- 06:02 PM Revision 5428: Regenerated vegbien.ERD exports
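
The csvs.py changes above (revisions 5429 to 5439) parse TSV-style rows with str.split(), which allows multi-char delimiters such as the \t|\t used by NCBI and a line suffix such as \t|. A minimal sketch of that idea follows; `parse_line()` and its parameter names are hypothetical and are not the TsvReader API, and newline-escaping (which the log says is handled separately) is left out.

```python
from typing import List


def parse_line(
    line: str,
    delim: str = "\t|\t",
    line_suffix: str = "\t|",
    line_ending: str = "\n",
) -> List[str]:
    """Split one raw line into fields using a possibly multi-char delimiter."""
    # Remove only the expected (autodetected) line ending, not any ending.
    if line_ending and line.endswith(line_ending):
        line = line[: -len(line_ending)]
    # Remove the custom line suffix, if the format uses one.
    if line_suffix and line.endswith(line_suffix):
        line = line[: -len(line_suffix)]
    return line.split(delim)
```

For a plain TSV file with \r\n endings, the same function would be called with delim="\t", line_suffix="" and line_ending="\r\n".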
10/10/2012
- 11:43 AM Revision 5427: input.Makefile: $(exts): Added .dmp
- 11:43 AM Revision 5426: csvs.py: delims: Added |
- 11:28 AM Revision 5425: Removed no longer used inputs/.public/. Use inputs/.TNRS/ and inputs/.TNRS/tnrs/tnrs.make instead.
- 11:23 AM Revision 5424: README.TXT: Documentation: To import and scrub just the test taxonomic names: Added steps to restore the original DB when the test scrub is complete
- 11:22 AM Revision 5423: inputs/test_taxonomic_names/test_scrub: Also export the results to inputs/test_taxonomic_names/_scrub/
- 11:06 AM Revision 5422: inputs/test_taxonomic_names/test_scrub: Use regular for .. in loop with a list of what's being processed in each iteration (match_input_names, parse_accepted_names)
- 10:58 AM Revision 5421: inputs/.TNRS/tnrs/map.csv: Mapped Genus_score, Specific_epithet_score
- 10:56 AM Revision 5420: mappings/VegCore-VegBIEN.csv: Mapped matchedGenusFit_fraction, matchedSpeciesFit_fraction. Reordered canon_concept_fit_fraction _maxs in the order they would be used if _alt were being used instead.
- 10:52 AM Revision 5419: mappings/VegCore.csv: Added matchedSpeciesFit_fraction
- 10:47 AM Revision 5418: mappings/VegCore.csv: matchedFamilyFit_fraction: Source the "matched" to Family_matched, which is a closer fit than Name_matched. matchedGenusFit_fraction: Fixed Genus_matched source to use #detailed_download instead of #simple_download.
- 10:42 AM Revision 5417: mappings/VegCore.csv: Added matchedGenusFit_fraction
- 10:18 AM Revision 5416: README.TXT: Removed extra trailing whitespace
- 10:18 AM Revision 5415: README.TXT: Documentation: To import and scrub just the test taxonomic names: Use new inputs/test_taxonomic_names/test_scrub
- 10:17 AM Revision 5414: Added inputs/test_taxonomic_names/test_scrub
- 10:01 AM Revision 5413: schemas/vegbien.sql: taxonconcept: Renamed canon_taxonconcept_id to canon_concept_id to shorten the name, which is used often
- 09:45 AM Revision 5412: schemas/vegbien.sql: taxonconcept: Added taxonconcept_canon_concept_min_fit() trigger to remove the canon_concept_id link from insufficient matches. These occur when e.g. a name in another language is approximated to a Latin name or when the input name is not a proper taxon but TNRS provides a best-guess match anyway.
- 09:42 AM Revision 5411: inputs/.TNRS/tnrs/map.csv: Mapped Family_score to new matchedFamilyFit_fraction
- 09:39 AM Revision 5410: mappings/VegCore-VegBIEN.csv: Use matchedFamilyFit_fraction as canon_concept_fit_fraction when greater than matchedTaxonFit_fraction, because if there is at least a matched family, there is a valid taxonconcept to attach to
- 09:39 AM Revision 5409: xml_func.py: Simplifying functions: Added _min, _max as passthroughs
- 09:34 AM Revision 5408: schemas/functions.sql: Added _max(), _min()
- 09:21 AM Revision 5407: mappings/VegCore.csv: Added matchedFamilyFit_fraction
- 09:04 AM Revision 5406: mappings/VegCore-VegBIEN.csv: Remapped matchedTaxonFit_fraction to the verbatim* taxonconcept, because this is actually for the verbatim* concept's fit to the matched concept, not the matched concept's fit to the accepted concept
- 08:59 AM Revision 5405: inputs/.TNRS/tnrs/map.csv: Restored *-prefixed output terms for unmapped terms that had initially been mapped to OMIT but could reasonably match to something in the future. Continue mapping Name_number to OMIT because it isn't globally unique (it identifies the name only within one TNRS batch).
- 08:45 AM Revision 5404: inputs/.TNRS/tnrs/map.csv: Mapped Overall_score to new matchedTaxonFit_fraction
- 08:44 AM Revision 5403: mappings/VegCore-VegBIEN.csv: Mapped matchedTaxonFit_fraction to _set_canon_taxonconcept(canon_concept_fit_fraction)
- 08:37 AM Revision 5402: mappings/VegCore.csv: Added matchedTaxonFit_fraction
- 08:20 AM Revision 5401: schemas/vegbien.sql: _set_canon_taxonconcept(): Also set the canon_concept_fit_fraction
- 08:10 AM Revision 5400: schemas/vegbien.sql: taxonconcept: Added canon_concept_fit_fraction to store the closeness of fit of the canon_concept
- 07:55 AM Revision 5399: schemas/vegbien.sql: taxonconcept: Renamed canon_taxonconcept_id to canon_concept_id to shorten the name, which is used often
- 07:10 AM Revision 5398: sql.py: mk_update(): in_place: Convert columns of type character varying to text so that they can be merge-joined with text columns. Note that these two types are equivalent but not aliases of one another, so the explicit type change is needed.
- 07:07 AM Revision 5397: sql_gen.py: Added canon_type()
- 06:52 AM Revision 5396: sql.py: mk_update(): in_place: Factored retrieval of column type out into separate statement for clarity
- 06:27 AM Revision 5395: schemas/functions.sql: _join*(): Fixed bug where these functions were returning '' instead of NULL when only NULL inputs were provided, because array_to_string() always returns a non-NULL string. Functions must always return NULL in place of '' to ensure that empty strings do not find their way into VegBIEN, and to prevent inconsistencies between row-based and column-based import (row-based import folds empty strings to NULL while column-based import relies on having a clean input table).
- 06:10 AM Revision 5394: sql_io.py: cleanup_table(): Use sql.table_pkey_col() instead of sql.pkey_col() so that only an actual pkey column is removed from the list of columns to clean. This fixes a bug where the first column in the table was not cleaned up if there was no pkey. Note that this bug only affected newly re-created staging tables, because staging tables previously had a special row_num pkey column added if they did not already have a pkey. The row_num column is now added by column-based import instead.
- 05:51 AM Revision 5393: sql.py: table_pkey_col(): Raise a DoesNotExistException if the table has no pkey
- 05:23 AM Revision 5392: sql.py: pkey_col(): Call table_pkey_col() directly rather than via pkey_name(). pkey_name(): Call pkey_col() instead of table_pkey_col() now that pkey_col() calls table_pkey_col().
- 05:14 AM Revision 5391: sql.py: pkey_col(): Documented that if there is no pkey, returns the first column in the table
- 05:13 AM Revision 5390: sql.py: pkey_col(): Specify recover directly as a kw_arg because it's the only kw_arg passed to pkey_name()
- 05:10 AM Revision 5389: sql.py: Added table_pkey_col() and use it in pkey_name()
- 05:01 AM Revision 5388: sql.py: Renamed pkey() to pkey_name()
- 04:45 AM Revision 5387: sql.py: Renamed pkey_col_() to pkey_col()
- 04:43 AM Revision 5386: sql.py: Removed no longer used pkey_col
- 04:43 AM Revision 5385: db_xml.py: cleanup_table(): Inline sql.pkey_col ('row_num') because this is the only place it's used
- 04:37 AM Revision 5384: cleanup_table(): Use new sql.table_cols() instead of sql.table_col_names()
- 04:36 AM Revision 5383: sql.py: Added table_cols()
- 04:16 AM Revision 5382: db_xml.py: put(): Fixed bug where needed to avoid truncating the pkeys_loc table, in case it's the same as one of the in_tables. This occurs now that sql_io.put_table() passes through the actual input column instead of the joined-together input table's column when ignoring all rows.
- 03:33 AM Revision 5381: sql_io.py: put_table(): Resolving default value column: If ignoring all rows, use input cols directly instead of cols from joined-together input table. In addition to being simpler, this prevents the returned column's name from growing longer and longer as each iteration prepends its input table's name to the default value column name.
- 03:07 AM Revision 5380: sql_io.py: put_table(): Moved changing the table of the default value column from Resolving the default value column to Setting pkeys of missing rows, because the table change is only needed in this section
- 03:04 AM Revision 5379: sql_io.py: put_table(): Resolving default value column: Always call sql_gen.remove_col_rename() because it will just pass the value through if it's not a column
- 02:41 AM Revision 5378: sql_gen.py: simplify_parens(): Removed extra simplify_parens() at end because it is done in the final iteration that performs no other replacements, so it is not necessary to also do it explicitly
- 02:30 AM Revision 5377: sql_io.py: put_table(): Replaced limit_ref integer with ignore_all_ref boolean, because it is no longer used as a select statement limit
- 02:29 AM Revision 5376: sql_io.py: put_table(): remove_all_rows(): Corrected "just create an empty pkeys table" comment to "just return the default value column"
- 02:27 AM Revision 5375: sql_io.py: put_table(): mk_main_select(): Removed setting limit to limit_ref[0], because an empty pkeys table is no longer created when ignoring all rows
- 02:19 AM Revision 5374: sql_io.py: put_table(): Setting pkeys of missing rows: Removed "limit_ref[0] == 0" check because this code is never reached in that case
- 02:16 AM Revision 5373: sql_io.py: put_table(): Ignoring all rows for unrecoverable errors: Even in multi-row mode, just return whatever the default value or column was, instead of creating an output table containing the default value filled in for every row. This also assists the optimization to skip empty levels of taxonconcepts, because it folds the empty level to that level's parent level rather than creating a whole new temp table with ultimately the same contents.
- 01:57 AM Revision 5372: sql_gen.py: not_false_re, not_true_re: Appended \b to ensure that true/false is only matched as a single word
- 01:56 AM Revision 5371: sql_gen.py: simplify_expr(): Also simplify "NOT false" to true
- 01:53 AM Revision 5370: sql_gen.py: simplify_expr(): Also simplify "NOT true" to false
- 01:24 AM Revision 5369: sql_io.py: put_table(): ignore_cond(): Changed "Ignoring rows where" message with the negated (filter-out) condition to "Ignoring rows that don't satisfy" with the filter condition for clarity
- 01:22 AM Revision 5368: sql_io.py: put_table(): ignore_cond(): If cond simplifies to false, remove all rows instead of filtering out individual rows which will all be filtered out. This optimization should improve import times of tables, such as taxonconcept, which use a check constraint instead of NOT NULL constraints to prevent empty rows. The taxonomic schema refactoring caused the creation of many more levels of taxonconcepts, many of which (such as variety, forma, cultivar) are empty for most datasources, so this optimization should also reduce overall import times for datasources that have any empty levels of taxonconcept. Note that this optimization is only possible now that sql_gen.simplify_expr() is able to simplify all the way to a single boolean value for the taxonconcept_required_key constraint.
- 12:55 AM Revision 5367: Moved expression transforming functions from sql.py to sql_gen.py because they do not manipulate an actual database and merely generate SQL
- 12:38 AM Revision 5366: sql.py: Added true_expr, false_expr and use them where their values are used
- 12:34 AM Revision 5365: sql.py: simplify_expr(): Also simplify "AND true" expressions
- 12:30 AM Revision 5364: sql.py: simplify_expr(): Also simplify "AND false" expressions
- 12:19 AM Revision 5363: sql.py: Added atom_re and use it in simplify_parens()
- 12:19 AM Revision 5362: sql.py: Added or_re and use it in simplify_expr()
- 12:18 AM Revision 5361: sql.py: logic_op_re(): Added expr_re param for an expr on the other side of the operator
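
The NULL-folding rule from revision 5395 above (a join function returns NULL rather than '' when it has nothing to join) can be mirrored in Python as a rough sketch. `join_words()` is a hypothetical analogue of the SQL _join*() functions, the space separator is an assumption, and treating '' inputs the same as NULL is also an assumption made here to keep empty strings out of the result.

```python
from typing import Optional


def join_words(*words: Optional[str], sep: str = " ") -> Optional[str]:
    """Join the non-empty inputs, returning None when there are none."""
    parts = [w for w in words if w is not None and w != ""]
    # An empty join would produce '', so fold that case to None instead.
    return sep.join(parts) if parts else None
```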
10/09/2012
- 11:54 PM Revision 5360: sql.py: simplify_parens(): Use bool_re
- 11:54 PM Revision 5359: sql.py: Removed no longer needed paren_re()
- 11:53 PM Revision 5358: sql.py: true_re, false_re: Removed no longer needed paren_re() because simplify_parens() now handles this
- 11:50 PM Revision 5357: sql.py: simplify_expr(): Removed final simplify_parens() because this is now done by simplify_recursive()
- 11:49 PM Revision 5356: sql.py: simplify_expr(): Use new simplify_recursive(). This also fixes a bug where some logic expressions are not simplified because of extra parens.
- 11:48 PM Revision 5355: sql.py: Added simplify_recursive()
- 11:31 PM Revision 5354: sql.py: simplify_parens(): Also remove parens around true and false
- 11:26 PM Revision 5353: regexp.py: sub_nested(): Use new sub_recursive()
- 11:25 PM Revision 5352: regexp.py: Added sub_recursive()
- 11:21 PM Revision 5351: sql.py: simplify_expr(): Use new simplify_parens()
- 11:20 PM Revision 5350: sql.py: Added simplify_parens()
- 11:14 PM Revision 5349: sql.py: simplify_expr(): Use new regexp.sub_nested()
- 11:14 PM Revision 5348: Added regexp.py
- 10:46 PM Revision 5347: sql.py: simplify_expr(): Use new logic_op_re()
- 10:46 PM Revision 5346: sql.py: Added logic_op_re()
- 10:40 PM Revision 5345: sql.py: bool_re: Use new true_re, false_re
- 10:40 PM Revision 5344: sql.py: Added true_re, false_re
- 10:37 PM Revision 5343: sql.py: bool_re: Use new paren_re()
- 10:36 PM Revision 5342: sql.py: bool_re: Use new paren_re()
- 10:36 PM Revision 5341: sql.py: Added paren_re()
- 10:31 PM Revision 5340: sql.py: simplify_expr(): Combined replacements of bool_re+' OR ' with the value in either order into one replacement
- 10:27 PM Revision 5339: mappings/VegCore-VegBIEN.csv: verbatim* taxonconcept: Don't store Name_submitted in taxonomicnamewithauthor in addition to identifyingtaxonomicname, because the fields other than identifyingtaxonomicname are meant to store parsed values rather than raw, unscrubbed values and TNRS does not directly provide a concatenated taxonomic name with author
- 10:23 PM Revision 5338: mappings/VegCore-VegBIEN.csv: verbatim* taxonconcept: Don't create hierarchy of parent taxonconcepts, because the parsed names (rather than the names for the matched taxonconcept) are from the input taxonomic name, rather than from the official tree of life used by TNRS. Otherwise, if a taxonomic name provides e.g. no family (common), a separate genus taxonconcept would have been created with no parent_id, which would not compare equal to the matched taxonconcept's genus *with* a parent_id. Continue to store the parsed family, genus, species in the family, genus, species cached fields, because the parsed family is often different from the matched taxonconcept's family when e.g. no family is provided in the taxonomic name.
- 10:16 PM Revision 5337: sql.py: Renamed table_cols() to table_col_names() for clarity, because it does not return sql_gen.Col objects
- 10:12 PM Revision 5336: inputs/.TNRS/tnrs/test.xml.ref: Accepted new inserted row count. The change is most likely from several revisions back, but the cause of the change is unknown (it is not due to the updated TNRS.tnrs table, which is still sorted with the same rows first).
- 09:09 PM Revision 5335: sql_gen.py: is_text_col(): Use new is_text_type()
- 09:09 PM Revision 5334: sql_gen.py: Added is_text_type()
- 09:05 PM Revision 5333: sql_gen.py: ensure_not_null(): Documented that NULL has no type, hence the NoUnderlyingTableException being re-raised
- 09:04 PM Revision 5332: sql_gen.py: ensure_not_null(): Just store the column type in col_type, instead of storing typed_col and using typed_col.type, now that other info in typed_col is no longer needed
- 09:02 PM Revision 5331: sql_gen.py: ensure_not_null(): Use is_nullable() instead of determining nullability itself, for clarity
- 08:59 PM Revision 5330: sql_gen.py: is_nullable(): Fixed bug where non-columns could not be sent to db.col_info()
- 08:53 PM Revision 5329: sql_gen.py: ensure_not_null(): Always remove_col_rename() the column to ensure that it is acceptable by helper functions like is_nullable()
- 08:11 PM Revision 5328: lib/PostgreSQL-MySQL.csv: COMMENT statement: Fixed bug by matching the ending ; only when it is preceded by ' and followed by a newline, to avoid matching a ; embedded in the comment
- 08:07 PM Revision 5327: schemas/vegbien.sql: taxonconcept: family, genus, species comments: Changed "scoping" to "identifying" for clarity
- 08:06 PM Revision 5326: schemas/vegbien.sql: taxonconcept: family, genus, species: Added comment that each is a cached field for easy querying and the scoping version of it is stored in the chain of parent_id ancestors
- 08:03 PM Revision 5325: schemas/vegbien.sql: taxonconcept: taxonconcept_unique: Removed family, genus, species because these are now just cached fields for analytical_db_view rather than scoping fields. The scoping versions of these fields are stored in the chain of parent_id ancestors.
- 07:42 PM Revision 5324: tnrs_db: Moved "Processing # taxonconcepts" log message to before waiting or exiting if no taxonconcepts left, so that it would be printed right after the query is run and say that no taxonconcepts were found
- 07:39 PM Revision 5323: tnrs_db: Updated comments and log messages for schema changes
- 07:33 PM Revision 5322: tnrs_db: Updated query for schema changes
- 07:33 PM Revision 5321: README.TXT: Schema changes: files to update with renamings: Added bin/tnrs_db
- 07:25 PM Revision 5320: inputs/import.stats.xls: Updated import times
- 07:04 PM Revision 5319: README.TXT: Data import: Changed `inputs/*/*/logs` to `inputs/{.,}*/*/logs` to also include the TNRS names import log
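The sql.py and regexp.py entries above (sub_recursive(), simplify_parens(), and the "AND true"/"AND false" rules in simplify_expr()) describe a regex-based simplifier that is reapplied until the expression stops changing. The following is only a minimal sketch of that idea, not the project's code: the patterns, the rule table, and the restricted grammar (bare identifiers, lowercase true/false, uppercase AND, and parentheses; no NOT, comparisons, or function calls) are all assumptions made for illustration. OR rules are left out because they would need extra guards for operator precedence.

```python
import re

def sub_recursive(pattern, repl, text):
    """Apply re.sub() repeatedly until the text stops changing."""
    while True:
        new_text = re.sub(pattern, repl, text)
        if new_text == text:
            return text
        text = new_text

def simplify_parens(expr):
    """Remove parentheses that wrap a single atom, e.g. "((true))" -> "true"."""
    return sub_recursive(r'\((\w+)\)', r'\1', expr)

# Hypothetical rule table (assumed, not taken from sql.py).
# Each rule only touches the two atoms directly joined by AND, so it stays
# valid next to OR, which binds less tightly.
AND_RULES = [
    (r'\b\w+ AND false\b', 'false'),   # x AND false -> false
    (r'\bfalse AND \w+\b', 'false'),   # false AND x -> false
    (r'\b(\w+) AND true\b', r'\1'),    # x AND true  -> x
    (r'\btrue AND (\w+)\b', r'\1'),    # true AND x  -> x
]

def simplify_expr(expr):
    """Alternate paren removal and AND rules until a fixed point is reached."""
    while True:
        new_expr = simplify_parens(expr)
        for pattern, repl in AND_RULES:
            new_expr = sub_recursive(pattern, repl, new_expr)
        if new_expr == expr:
            return expr
        expr = new_expr

print(simplify_expr('(a AND (true)) AND ((false))'))
print(simplify_expr('x AND (true AND y)'))
```

Looping to a fixed point matters because removing one layer of parentheses can expose a new "AND true" or "AND false" match, which is the kind of case r5356 reports as previously left unsimplified.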
10/08/2012
- 09:58 PM Revision 5318: import_all: Added commands to import TNRS names so the user doesn't have to do this manually
- 09:55 PM Revision 5317: sql.py: map_expr(): Fixed bug where names were being matched inside punctuated names replaced in previous calls of map_expr()
- 09:45 PM Revision 5316: schemas/vegbien.sql: party: party_required_key: Only allow NULL organizationname if party is not a root party (i.e. creator_id != party_id)
- 09:39 PM Revision 5315: mappings/VegCore-VegBIEN.csv: Mapped to new taxonconcept.creationdate
- 09:37 PM Revision 5314: schemas/vegbien.sql: taxonconcept: taxonconcept_required_key: Added creationdate as an allowable minimum field when parent_id (containing the associated hierarchical concept) is specified
- 09:30 PM Revision 5313: schemas/vegbien.sql: taxonconcept: taxonconcept_required_key: Removed family and genus because these are now cached fields only, and are not used for scoping a taxonconcept. Instead, *taxonomicname and taxonname+parent_id are used for this purpose. This removes several leaf taxonconcepts with insufficient scoping information to create a taxonconcept separate from the main tree. With the upcoming population of creationdate, some of these taxonconcepts will reappear due to the date's additional distinguishing information.
- 09:16 PM Revision 5312: schemas/vegbien.sql: taxonconcept: Added creationdate (the date the taxonconcept was created or defined), and include it in the taxonconcept_unique unique index
- 09:05 PM Revision 5311: schemas/vegbien.sql: taxonconcept: Added comment with the definition of a taxon: "a group of one (or more) populations of organism(s), which a taxonomist adjudges to be a unit" (http://en.wikipedia.org/wiki/Taxon). This is useful in clarifying that our taxon concepts are intended to serve a similar purpose, by storing one person's defined taxon.
- 08:58 PM Revision 5310: schemas/vegbien.sql: taxonconcept: taxonconcept_required_key: Removed family and genus because these are now cached fields only, and are not used for scoping a taxonconcept. Instead, *taxonomicname and taxonname+parent_id are used for this purpose.
- 08:54 PM Revision 5309: schemas/vegbien.sql: taxonconcept: Moved identifyingtaxonomicname near other full-taxonomic-name-related fields, after the fields that contain just the current level's component of the full name
- 08:48 PM Revision 5308: schemas/vegbien.sql: taxonconcept.canon_taxonconcept_id: Changed four-level hierarchy to use "parsed concept" and "matched concept" instead of concatenated and parsed, because the directly-parsed name components actually go in level 2 of the hierarchy (the TNRS input name), while the name components based on the matched taxon concept go in level 3
- 08:44 PM Revision 5307: schemas/vegbien.sql: taxonconcept.parent_id: Documented that while a taxon *name* may have multiple parents, a taxon *concept* has only one, based on the creator's opinion of where that taxonconcept goes in the taxonomic hierarchy
- 08:38 PM Revision 5306: mappings/VegCore-VegBIEN.csv: taxonconcept: Moved infraspecific taxonconcept to its own level, rather than combining it with the level that contains the full taxonomic name and author (as well as any morphospecies), for consistency with the storage of other ranked taxonomic name components, which each get their own taxonconcept. The infraspecific taxon concept is general to all parties making identifications (within a datasource), while the concatenated name and author and any morphospecies are specific to the person who defined the taxonconcept used by a taxondetermination.
- 08:05 PM Revision 5305: schemas/vegbien.sql: taxonconcept: Removed no longer used higher- and infraspecific taxonomic rank fields because these terms are now stored in their own taxonconcepts. family, genus, and species have not been removed because these are used to cache names of parent taxa for fast access by analytical_db_view.
- 07:57 PM Revision 5304: schemas/vegbien.sql: analytical_db_view: Changed taxonMorphospecies to use taxonconcept.taxonname, where any morphospecies is now stored
- 07:53 PM Revision 5303: mappings/VegCore-VegBIEN.csv: infraspecific taxonomic terms: Removed mappings to first-class taxonconcept fields because these terms are now stored in their own taxonconcepts, or in the lowest-level taxonconcept as the taxonname and rank
- 07:43 PM Revision 5302: mappings/VegCore-VegBIEN.csv: higher-level taxonomic terms: Removed mappings to first-class taxonconcept fields because these terms are now stored in their own taxonconcepts
- 07:41 PM Revision 5301: schemas/vegbien.sql: taxonconcept: Merged taxonconcept_unique_within_creator_by_name unique index into taxonconcept_unique_within_parent, placed parent_id first, and removed index condition, so that this index can be used as a lookup index by taxonconcept_update_ancestors() (which requires no index condition in order to apply to *all* taxonconcepts) in addition to as a unique index. Note that an index condition should not be necessary for the index's uniquifying task, because if a set of taxonconcepts provides only the identifyingtaxonomicname, that should collide in the taxonconcept_unique_within_creator_by_identifying_name unique index before this index collides. This assumes that the collision order when multiple indexes collide is alphabetical by the index name.
- 07:16 PM Task #486: add unit-conversion mechanism
- All applicable VegBIEN fields have unit suffixes. Most corresponding VegCore terms also have unit suffixes.
- 07:15 PM Task #499 (Resolved): map example terms into the taxonomic schema
- See "README.TXT":https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/entry/README.TXT section "To import a...
- 06:38 PM Revision 5300: schemas/vegbien.sql: taxonconcept: taxonconcept_required_key check constraint: Also allow a taxonconcept to have just an author when it has a parent_id, so that an author can uniquely identify a taxon within a more general taxon, such as a species name, that has no author
- 06:22 PM Revision 5299: strings.py: concat(): Fixed bug where end index of returned str0 portion would wrap around to a negative number if str1 itself was too long, causing incorrect truncation
- 05:44 PM Revision 5298: schemas/vegbien.sql: taxonconcept: Renamed taxonconcept_unique_within_parent to taxonconcept_unique because the index does not apply only to taxonconcepts with a parent, and because it's the primary unique index for taxonconcept
- 05:42 PM Revision 5297: schemas/vegbien.sql: taxonconcept: Renamed taxonconcept_unique_within_creator_by_identifying_name to taxonconcept_0_unique_identifying_name to ensure that it is always applied before taxonconcept_unique_within_parent if both collide
- 05:36 PM Revision 5296: schemas/vegbien.sql: taxonconcept: Merged taxonconcept_unique_within_creator_by_name unique index into taxonconcept_unique_within_parent, placed parent_id first, and removed index condition, so that this index can be used as a lookup index by taxonconcept_update_ancestors() (which requires no index condition in order to apply to *all* taxonconcepts) in addition to as a unique index. Note that an index condition should not be necessary for the index's uniquifying task, because if a set of taxonconcepts provides only the identifyingtaxonomicname, that should collide in the taxonconcept_unique_within_creator_by_identifying_name unique index before this index collides. This assumes that the collision order when multiple indexes collide is alphabetical by the index name.
- 04:47 PM Revision 5295: mappings/VegCore-VegBIEN.csv: taxonconcepts: Also create the taxonconcept tree for taxonconcepts created from original*, verbatim*, and accepted* taxonomic terms
- 04:35 PM Revision 5294: mappings/VegCore-VegBIEN.csv: taxonconcepts: Also create the taxonconcept tree if the datasource provided separated components of the taxonomic name and/or its own tree of life with higher classifications. This enables storing the datasource's own tree of life to supplement any official tree (TROPICOS, USDA, etc.).
- 04:25 PM Revision 5293: mappings/VegCore-VegBIEN.csv: taxonconcept tree: Don't map infraspecificEpithet+taxonRank to a taxonconcept in the tree of parent concepts because it has already been mapped to the primary, lowest-level taxonconcept
- 04:00 PM Revision 5292: schemas/vegbien.sql: taxonconcept: taxonconcept_unique_within_creator_by_name unique index: Fixed bug where index filter overlapped with taxonconcept_unique_within_parent's index filter, causing these unique indexes to sometimes both apply at the same time and prevent column-based import from correctly choosing which index to use for each taxonconcept import
- 01:15 PM Revision 5291: schemas/vegbien.ERD.mwb: Fixed lines
- 01:02 PM Revision 5290: schemas/vegbien.sql: taxonconcept.canon_taxonconcept_id comment: Changed comment to use "concept" rather than "name" where applicable. Documented that a synonym between taxonconcepts of different sources is indicated by choosing one taxonconcept to be authoritative and pointing the other taxonconcept to it using this field.
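The strings.py concat() fix in r5299 above concerns a slice end index that goes negative. The following is a minimal sketch of that failure mode and one way to guard against it, not the project's implementation: the signature concat(str0, str1, max_len), the rule that str0 is truncated first, and the final truncation of str1 are assumptions made for illustration.

```python
def concat_buggy(str0, str1, max_len):
    """Keeps str1 whole and truncates str0, but mishandles an over-long str1."""
    # If len(str1) > max_len, the end index is negative, and Python slices
    # relative to the END of str0 instead of returning an empty prefix.
    return str0[:max_len - len(str1)] + str1

def concat(str0, str1, max_len):
    """Concatenate two strings so the result is at most max_len characters."""
    # Clamp at 0 so an over-long str1 leaves no room for str0 at all.
    str0_len = max(max_len - len(str1), 0)
    # Assumed policy: if str1 alone is too long, truncate it as well.
    return (str0[:str0_len] + str1)[:max_len]

print(concat_buggy('abcdef', '_suffix', 5))  # too long: the negative index kept 'abcd'
print(concat('abcdef', '_suffix', 5))
print(concat('abcdef', '_xy', 5))
```

With the clamp in place the result never exceeds max_len, whereas the unclamped version returns an 11-character string for a limit of 5.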
10/05/2012
- 10:52 PM Revision 5289: sql_io.py: put_table(): Resolving default value column: Fixed bug where the default value col needed to have its table changed from in_table to full_in_table if it's a table column, and needed to have any column rename removed if it's a literal value
- 10:29 PM Revision 5288: Regenerated vegbien.ERD exports
- 10:28 PM Revision 5287: schemas/vegbien.ERD.mwb: Fixed lines
- 10:23 PM Revision 5286: schemas/vegbien.sql: Renamed plant* taxonomic tables -> taxon*, as part of the taxonomic schema refactoring at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/2012-10-03_conference_call#Taxonomic-schema-refactoring>
- 10:15 PM Revision 5285: schemas/vegbien.ERD.mwb: Rearranged to fit more of location table on the diagram, using the newly available space from taxon
- 10:00 PM Revision 5284: schemas/vegbien.ERD.mwb: Fixed lines
- 09:59 PM Revision 5283: schemas/tree_cross-links.sql: Synced with schema, updating with new table names
- 09:54 PM Revision 5282: schemas/vegbien.sql: Removed no longer used taxon table. Use taxonconcept instead.
- 09:51 PM Revision 5281: schemas/vegbien.sql: taxonconcept.taxonname: comment: Stated that this is the name of the taxon within its parent taxon
- 09:48 PM Revision 5280: schemas/vegbien.sql: taxonconcept: comment: Removed no longer accurate comment that an accepted taxonconcept points to the identified taxon in the tree of life, because it *is* the identified taxon in the tree of life
- 09:39 PM Revision 5279: schemas/filter_ERD.csv: Changed the table with the visible fkey from plant* to taxon* to be plantstatus rather than plantusage, since it contains more core fields
- 09:25 PM Revision 5278: schemas/vegbien.sql: taxonconcept: Removed taxon_id, since taxonconcept now contains all the information needed to represent a taxonomic hierarchy, including both conceptual and nomenclature information
- 09:20 PM Revision 5277: schemas/vegbien.sql: plantusage: Point just to taxonconcept instead of both to taxonconcept and taxon
- 09:16 PM Revision 5276: schemas/vegbien.sql: taxonconcept: rank, verbatimrank comments: Added info from corresponding fields in taxon that also applies to taxonconcept
- 09:14 PM Revision 5275: schemas/vegbien.sql: taxonconcept: comment: Added info from taxon that also applies to taxonconcept
- 09:06 PM Revision 5274: schemas/vegbien.sql: Added taxonconcept_ancestor cross-link table
- 08:40 PM Revision 5273: schemas/vegbien.sql: taxonconcept: Added description field
- 08:38 PM Revision 5272: mappings/VegCore-VegBIEN.csv: Remapped taxon hierarchy for accepted taxonconcepts to taxonconcept parent_id hierarchy
- 08:12 PM Revision 5271: schemas/vegbien.sql: Fixed bug where taxonconcept.parent_id was missing a foreign key constraint
- 08:10 PM Revision 5270: schemas/vegbien.sql: taxonconcept: Changed instructions for including a taxon name at a rank with no explicit column to create a parent taxonconcept for it and point to it using parent_id instead of using otherranks. Removed no longer used otherranks field.
- 08:05 PM Revision 5269: schemas/vegbien.sql: taxonconcept: taxonconcept_required_key check constraint: Added taxonname
- 07:58 PM Revision 5268: schemas/vegbien.sql: taxonconcept: taxonconcept_unique_within_creator_by_name unique index: Removed duplicate entry for creator_id
- 07:57 PM Revision 5267: schemas/vegbien.sql: taxonconcept: Added parent_id to point to the parent taxonconcept
- 07:56 PM Revision 5266: sql_gen.py: null_sentinels: Added 'unknown' for taxonrank
- 07:44 PM Revision 5265: schemas/vegbien.sql: taxonrank: Added 'unknown'
- 07:30 PM Revision 5264: mappings/VegCore-VegBIEN.csv: Also map *taxonRank to taxonconcept.rank, so that if it's in the taxonrank enum, it will automatically populate this field
- 07:14 PM Revision 5263: mappings/VegCore-VegBIEN.csv: Remapped *infraspecificEpithet to new taxonconcept.taxonname rather than placing it in subspecies prefixed with the taxonRank, because it isn't necessarily the subspecies and because taxonname is defined to contain the lowest-rank portion of the taxonomic name. Note that when both morphospecies and infraspecificEpithet are provided, infraspecificEpithet takes priority for the taxonname field, because if TNRS leaves unmatched terms (which are tentatively mapped to morphospecies) but also matches an infraspecificEpithet, then the unmatched terms can't be for a morphospecies (because an infraspecificEpithet and therefore also a specificEpithet was matched, so the species is definite and formally named).
- 06:45 PM Revision 5262: schemas/vegbien.sql: taxonconcept: Renamed morphospecies to taxonname since it's used in the same way as taxon.taxonname: to store the lowest-rank portion of the taxonomic name, such as the morphospecies suffix
- 06:21 PM Revision 5261: inputs/.TNRS/tnrs/map.csv: Mapped *_matched terms that are both matched in the input name and which correspond to the matched taxonconcept (Genus_matched, Specific_epithet_matched, etc.) to both the input and matched taxonconcepts
- 06:09 PM Revision 5260: inputs/.TNRS/tnrs/map.csv: Mapped terms matched in the original string (rather than deduced from the matched taxonconcept) to new verbatim* taxonomic terms
- 06:03 PM Revision 5259: mappings/VegCore-VegBIEN.csv: Mapped verbatim* taxonomic terms to the TNRS input taxonconcept
- 05:48 PM Revision 5258: mappings/VegCore-VegBIEN.csv: TNRS input taxonconcept: Split single _if statement controlling where morphospecies goes into two _if statements for each case, so that other verbatim* terms don't need to have an _if statement in their mapping to the input taxonconcept
- 05:29 PM Revision 5257: mappings/VegCore.csv: Added back verbatim* taxonomic terms, which will now be used for the TNRS input taxonconcept. Note that they will have a different meaning than the original* taxonomic terms that they were renamed to in r5062.
- 05:22 PM Revision 5256: mappings/VegCore-VegBIEN.csv: In TNRS mode, remapped morphospecies (Unmatched_terms) to the input name's taxonconcept, because this does not relate to the matched taxon concept
- 05:12 PM Revision 5255: mappings/VegCore-VegBIEN.csv: TNRS-only mappings: Switch them on when verbatimScientificNameWithAuthorship is provided rather than when acceptedScientificNameWithAuthorship is provided, because it's the presence of a separate TNRS input name that really determines when TNRS is being mapped
- 05:07 PM Revision 5254: Makefiles: .last_cleanup targets: Also make the file that's being cleaned up .PRECIOUS so it doesn't get deleted if the .last_cleanup target has an error
- 05:04 PM Revision 5253: Makefiles: .last_cleanup targets: Make each individual target .PRECIOUS (don't delete on error) because just making %.last_cleanup precious doesn't seem to prevent deletion
10/04/2012
- 11:19 PM Revision 5252: mappings/VegCore-VegBIEN.csv: Mapped *taxonRank to new taxonconcept.verbatimrank
- 11:15 PM Revision 5251: schemas/vegbien.sql: taxonconcept: Added rank, verbatimrank analogous to those fields in taxon
- 09:59 PM Revision 5250: Makefiles: Don't delete %.last_cleanup on error because it's a mod time record rather than a generated file, and so that it's left at the last successful cleanup time when a cleanup operation is cancelled
- 09:52 PM Revision 5249: input.Makefile: Maps building: %/.map.csv.last_cleanup: Removed no longer accurate comment about mappings being autoremoved
- 09:34 PM Revision 5248: inputs/.TNRS/tnrs/map.csv: Remapped Name_submitted to new verbatimScientificNameWithAuthorship to create an additional level of taxonconcept for the concatenated (TNRS input) name separate from the parsed (TNRS output) name
- 09:33 PM Revision 5247: mappings/VegCore-VegBIEN.csv: Mapped verbatimScientificNameWithAuthorship as an additional level of taxonconcept for the concatenated (TNRS input) name separate from the parsed (TNRS output) name
- 09:26 PM Revision 5246: schemas/vegbien.sql: taxonconcept.canon_taxonconcept_id: comment: Changed three-level hierarchy to four-level hierarchy which separates the concatenated (TNRS input) name from the parsed (TNRS output) name
- 09:22 PM Revision 5245: mappings/VegCore.csv: Added back verbatimScientificNameWithAuthorship, which will now be used to store the TNRS input name
- 08:45 PM Revision 5244: schemas/filter_ERD.csv: Removed no longer used table taxonscope
- 08:32 PM Revision 5243: schemas/vegbien.sql: voucher: Removed accessioncode because this table has no sourceaccessioncode which it would be generated from (it just links a taxonoccurrence to a vouchering specimenreplicate)
- 08:26 PM Revision 5242: schemas/vegbien.sql: Renamed datasource_id to creator_id so it can apply generally to any entity (such as a person), not just an aggregated datasource. This also enables taxonconcept.datasource_id to merge with creator_id, which now serves the same purpose.
- 08:05 PM Revision 5241: schemas/vegbien.sql: taxonconcept: Renamed definer_id to creator_id to allow merging with datasource_id when datasource_id is renamed to creator_id
- 07:50 PM Revision 5240: mappings/VegCore-VegBIEN.csv: Populated new taxonconcept.definer_id from identifiedBy, or when no identifiedBy is specified, from the datasource itself (using _simplifyPath:[next=datasource_id])
- 07:43 PM Revision 5239: sql_io.py: put_table(): Resolve default value column *after* the main loop (inserts and selects), so that the default value column can refer to an output column that is not in the original mapping but is added to the mapping from a col_defaults entry. This requires deferring the "Missing mapping for NOT NULL column" warning until the default value column is resolved, and including all columns in the full_in_table since the default value input column is not yet known.
- 06:59 PM Revision 5238: schemas/vegbien.sql: taxonconcept: comment: Changed definition to "A taxon concept defined by an entity" to correspond with the table's new name and usage
- 06:51 PM Revision 5237: mappings/VegCore-VegBIEN.csv: Fixed bug where datasource_id=0 needed to be set on the TNRS party (which concatenated names/TNRS inputs are owned by) in order to make it a datasource (a root party)
- 06:44 PM Revision 5236: schemas/vegbien.sql: party: Fixed bug where a separate unique index was needed for roots (datasources), whose organizationnames must be globally unique rather than unique within a datasource
- 06:28 PM Revision 5235: schemas/vegbien.sql: taxonconcept: Renamed concept_reference_id to definer_id because this is a clearer name and because this will allow merging with datasource_id, which serves the same purpose
- 06:15 PM Revision 5234: schemas/vegbien.sql: party: Made it datasource-scoped. Since this creates a recursive fkey, a datasource (a root party) should point to itself in this field, which will happen automatically by setting it to the special value 0.
- 05:51 PM Revision 5233: lib/PostgreSQL-MySQL.csv: Changed translation of fulltext to quote the identifier instead of appending characters to make it not a reserved word
- 05:36 PM Revision 5232: schemas/vegbien.sql: taxonconcept: Moved concept_reference_id to the top of the table because it is now a key scoping field
- 05:30 PM Revision 5231: schemas/vegbien.sql: concept_reference_id: Made it an fkey to party instead of taxonscope, because this is now the entity that defined the taxon concept, and is no longer specific to morphospecies. Removed no longer used table taxonscope.
- 05:13 PM Revision 5230: schemas/vegbien.sql: taxonconcept: Documented that it's equivalent to VegBank's plantConcept table
- 04:56 PM Revision 5229: schemas/filter_ERD.csv: taxonconcept inward fkeys: Removed not applicable taxon filtered table, since the fkey points in the opposite direction and thus is not part of this filter
- 04:52 PM Revision 5228: schemas/vegbien.sql: taxonconcept: Renamed scope_id -> concept_reference_id as part of taxonomic schema refactoring at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/2012-10-03_conference_call#Taxonomic-schema-refactoring>
- 04:47 PM Revision 5227: README.TXT: Schema changes: Moved "update the following files with any renamings" out of "Sync ERD with vegbien.sql schema" because this is needed for any schema changes, not just as part of syncing the ERD
- 04:42 PM Revision 5226: README.TXT: Schema changes: Added Refactoring tips section with steps to rename a table and a column
- 04:23 PM Revision 5225: schemas/vegbien.sql: Renamed taxonpath -> taxonconcept as part of taxonomic schema refactoring at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/2012-10-03_conference_call#Taxonomic-schema-refactoring>
- 04:17 PM Revision 5224: README.TXT: Schema changes: Syncing ERD with vegbien.sql schema: Added step to update mappings/VegCore-VegBIEN.csv with any renamings
- 04:10 PM Revision 5223: README.TXT: Schema changes: Syncing ERD with vegbien.sql schema: Added step to update schemas/filter_ERD.csv with any table renamings
- 03:58 PM Revision 5222: inputs/import.stats.xls: Updated import times. This now includes the half-hour-long pre-import of the TNRS taxonomic names (which the datasources then match up with), as well as the concatenation of the datasource's taxonomic name components to create or match up with the TNRS input name.
- 03:54 PM Revision 5221: README.TXT: Data import: make backups/TNRS.backup/restore: Run it in the background because it takes a while
- 03:53 PM Revision 5220: README.TXT: Data import: Added steps to sync the TNRS schema to the latest version on vegbiendev
- 03:38 PM Revision 5219: README.TXT: Data import: make inputs/download-logs: Added tnrs_log=1 so the TNRS daemon log is downloaded as well
10/03/2012
- 01:55 PM Revision 5218: Added inputs/test_taxonomic_names/Taxon/testNames.txt since this is test data, and thus can be under version control
- 01:55 PM Revision 5217: Added inputs/test_taxonomic_names/README.TXT with Bob's comments
- 01:41 PM Revision 5216: schemas/vegbien.sql: taxonpath.taxon_id: Changed comment to indicate that this is used for parsed, not just accepted names. Parsed names have been standardized by TNRS but may be synonyms.
- 01:27 PM Revision 5215: README.TXT: Documentation: To import and scrub just the test taxonomic names: Added `yes|` before make schemas/public/reinstall so the user isn't prompted to confirm the reinstallation a second time, and can just copy and paste the set of 5 commands directly into the terminal
- 01:11 PM Revision 5214: tnrs_db: Made wait option default to off to facilitate running tnrs_db by itself, rather than as part of an import
- 01:08 PM Revision 5213: tnrs_db: Added wait option to have tnrs_db exit as soon as no more names are available. This is useful for running tnrs_db when there is no concurrent import running, and therefore no need to wait for new data.
- 01:00 PM Revision 5212: tnrs_db: Fixed the timing of the "Waited" message so that the total_pause (containing the next wait) is incremented *after* the message is displayed. Split the "Waited" and "Waiting" messages into two separate messages.
- 12:51 PM Revision 5211: README.TXT: Data import: Added steps to back up the TNRS cache, since it takes a long time to recreate. This also enables syncing it with a local machine when `make backups/download` is run.
- 12:47 PM Revision 5210: README.TXT: Documentation: Added instructions to import and scrub just the test taxonomic names
- 12:41 PM Revision 5209: input.Makefile: Staging tables installation: uninstall: For the TNRS datasource, prompt the user before deleting the schema, since the data in it is not easily reconstructible from a flat file
- 11:41 AM Revision 5208: sql.py: map_expr(): When matching without quotes, support names containing spaces by not matching words when preceded or followed by quotes
- 11:24 AM Revision 5207: sql.py: Expressions: bool_re: Also match parentheses surrounding the boolean value
- 08:57 AM Revision 5206: README.TXT: Data import: import_all: Don't run with & because this prevents the created jobs from being owned by the calling shell. Instead, import the TNRS names as a separate backgrounded step and wait for it to finish before starting import_all. Removed TNRS import steps from import_all since these are now invoked separately.
- 08:35 AM Revision 5205: README.TXT: Data import: Run import_all in the background, because it needs to import all the taxonomic names synchronously before it can start the datasource import in the background
- 08:19 AM Revision 5204: Regenerated vegbien.ERD exports
- 08:14 AM Revision 5203: inputs/.TNRS/tnrs/map.csv: Mapped Unmatched_terms to morphospecies because the morphospecies is what's left once named ranks are matched
- 08:11 AM Revision 5202: mappings/VegCore-VegBIEN.csv: Mapped morphospecies
- 08:08 AM Revision 5201: mappings/VegCore.csv: Added morphospecies
- 08:04 AM Revision 5200: schemas/vegbien.sql: taxonpath: Added morphospecies
- 07:43 AM Revision 5199: inputs/.TNRS/tnrs/test.xml.ref: Updated for latest TNRS output
- 06:40 AM Revision 5198: inputs/.TNRS/tnrs/map.csv: Infraspecific_rank_2, Infraspecific_epithet_2_*: Mapped to UNUSED because they do not appear to be provided by TNRS (it just puts additional infraspecific names in Unmatched_terms)
- 06:34 AM Revision 5197: inputs/.TNRS/tnrs/map.csv: Omit Infraspecific_rank because Name_matched_rank contains the unabbreviated rank and is provided more often
- 06:29 AM Revision 5196: mappings/VegCore-VegBIEN.csv: Also map TNRS-parsed infraspecificEpithet (Infraspecific_epithet_matched) to taxon at the infraspecies rank
- 06:07 AM Revision 5195: mappings/VegCore-VegBIEN.csv: Also map TNRS-parsed taxonomic ranks to the tree of life in the taxon table
- 05:18 AM Revision 5194: schemas/vegbien.sql: taxon: Added comment that this table stores the tree of life
- 05:00 AM Revision 5193: mappings/VegCore-VegBIEN.csv: accepted taxonomic terms: Use new _set_canon_taxonpath() to set the canon_taxonpath_id *after* the taxonpath has been inserted, so that if the taxonpath is an accepted name (scrubs to itself), it will link up to the just-inserted taxonpath with the taxonomic ranks parsed out, rather than to a new taxonpath containing only the few taxonomic ranks of the accepted name that TNRS provides. In particular, this (together with the tnrs_accepted_names sorting index on TNRS.tnrs) ensures that an accepted name is imported with its genus and species parsed out by TNRS instead of concatenated together in the Accepted_name_species field (genus+species). This enables the individual taxonomic ranks to be used in constructing the leaves of the tree of life (the taxon table).
- 04:50 AM Revision 5192: sql_io.py: put_table(): Fixed bug where row_ct_ref was incorrectly being incremented when the iteration is a function call. This bug only occurred in row-based mode, because the DB cursor for a function call is not stored in column-based mode.
- 04:30 AM Revision 5191: inputs/.TNRS/tnrs/map.csv: Use Name_matched_author/Name_matched_accepted_family instead of Author_matched/Family_matched because these fields are provided more often, due to being determined from the matched name itself rather than from the original string. This helps to fill in as many fields as possible. For accepted names (which scrub to themselves), this is especially important, because it adds the accepted name's family, which is not present in the input taxonomic name.
- 03:58 AM Revision 5190: xml_func.py: process(): Fixed bug where complex functions that have unevaluated XML nodes as arguments were not preserved. They need to be preserved because XML nodes are not accepted by sql_io.put() (they are handled by db_xml.put())
- 03:08 AM Revision 5189: schemas/vegbien.sql: Renamed set_canon_taxonpath() to _set_canon_taxonpath() (adding _ prefix) so that db_xml.put() treats its arguments as arguments rather than as children with fkeys to parent
- 03:02 AM Revision 5188: schemas/vegbien.sql: Added set_canon_taxonpath() to set a taxonpath's canon_taxonpath_id after it has been created
- 02:48 AM Revision 5187: Added inputs/.TNRS/tnrs/cleanup.sql to cluster TNRS.tnrs on tnrs_accepted_names. This keeps TNRS.tnrs sorted with the accepted names first.
- 02:46 AM Revision 5186: input.Makefile: Staging tables installation: %/cleanup: Also run any custom cleanup.sql provided in the subdir. %/install: Removed processing of postprocess.sql because no datasources are using it and because cleanup.sql can now be used for this purpose.
- 02:39 AM Revision 5185: inputs/.TNRS/schema.sql: tnrs: Added tnrs_accepted_names index, which sorts accepted names first, and cluster the table on this index. This ensures that the component-parsed entries for accepted names are created before any verbatim names that point to them.
- 02:37 AM Revision 5184: input.Makefile: Staging tables installation: %/cleanup: Documented that this removes any index comments, due to a PostgreSQL bug. (This occurs because ALTER TABLE recreates the index but not its comment.)
- 01:55 AM Revision 5183: inputs/.TNRS/schema.sql: Removed hardcoded schema name
- 01:18 AM Revision 5182: inputs/.TNRS/tnrs/map.csv: Changed Name_matched_accepted_family comment to match analogous Name_matched_author comment
- 01:17 AM Revision 5181: inputs/.TNRS/tnrs/map.csv: Remapped Author_matched as the scientificNameAuthorship instead of Name_matched_author, because Name_matched_author contains the author based on the matched name, not the author in the original string, so it's not strictly from the original name
- 12:33 AM Revision 5180: mappings/VegCore.csv: Added acceptedBinomial, originalBinomial
- 12:29 AM Revision 5179: mappings/VegCore.csv: Added binomial
- 12:03 AM Revision 5178: inputs/.TNRS/tnrs/map.csv: Mapped Specific_epithet_matched
10/02/2012
- 11:53 PM Revision 5177: Added inputs/test_taxonomic_names/
- 11:37 PM Revision 5176: mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode: Only populate if needed to distinguish the taxonoccurrence within a plot
- 11:24 PM Revision 5175: schemas/vegbien.sql: placepath: Removed no longer used placepath_unique constraint on place_id. Removed place_id from placepath_unique_within_datasource_by_name unique index because otherranks is now used to store custom ranks.
- 11:23 PM Revision 5174: schemas/vegbien.sql: placepath: Removed no longer used placepath_unique constraint on place_id. Removed place_id from placepath_unique_within_datasource_by_name unique index because otherranks is now used to store custom ranks.
- 11:14 PM Revision 5173: schemas/vegbien.sql: taxonpath, placepath: Added *_required_key check constraints to ensure that empty entries are not created when a row does not have taxonpath/placepath data
- 10:35 PM Revision 5172: import_all: Use new dedicated cleanup make target to clean up TNRS.tnrs
- 09:54 PM Revision 5171: tnrs.py: encode_map: Added hidden minus sign, which TNRS removes
- 09:44 PM Revision 5170: csvs.py: tsv_encode_map: Escape a newline as the two-character sequence \n (instead of as a \ followed by a newline) for clarity. Added escape for \r by using strings.json_encode_map. TsvReader: Decode all escapes in tsv_encode_map.
- 09:25 PM Revision 5169: tnrs.py: encode_map: Added × (times), which TNRS replaces with x
- 09:18 PM Revision 5168: tnrs.py: encode_map: Added " and ', which TNRS removes when at the beginning or end
- 09:12 PM Revision 5167: tnrs.py: encode_map: Documented why each character needs to be encoded
- 09:04 PM Revision 5166: tnrs.py: encode_map: Removed '&', which is actually not a special character for TNRS (although ';' is)
- 09:02 PM Revision 5165: tnrs.py: encode_map: Added '_', which TNRS replaces with space
- 08:56 PM Revision 5164: sql_io.py: append_csv(): In INSERT mode, print # rows read (different from # lines read if some fields contained embedded newlines) and # rows inserted (different from # rows read if some violated a constraint)
- 08:42 PM Revision 5163: sql.py: insert(): Explicitly return None if the insert failed and a DuplicateKeyException or NullValueException was suppressed
- 07:13 PM Revision 5162: input.Makefile: Staging tables installation: $(logInstall*Add): Fixed bug where the existing install log would be overwritten in quiet mode, even though this function should append its output to the log. Note that plain $(logInstall*) always overwrites the existing install log because it is used by the first install command.
- 06:53 PM Revision 5161: strings.py: json_encode(): Fixed bug where '\n' and '\r' also needed to be encoded
- 06:50 PM Revision 5160: tnrs.py: repeated_tnrs_request(): Also retry request in debug mode if an HTTPError is thrown, so that debugging info can also be obtained if there is a bug in the TNRS client
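
Note on revisions 5145, 5146 and 5170: the TSV escaping described above can be sketched roughly as follows. This is a minimal illustration in Python 3, not the code in csvs.py. The names `TSV_ENCODE_MAP`, `tsv_encode()` and `tsv_decode()` are hypothetical, and escaping the backslash itself is an assumption made here so that decoding is unambiguous.

```python
import re

# Hypothetical map (not the actual csvs.tsv_encode_map): raw character -> escape.
# The backslash entry is an assumption; it keeps a literal "\n" in the data
# distinguishable from an escaped newline.
TSV_ENCODE_MAP = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
TSV_DECODE_MAP = {esc: raw for raw, esc in TSV_ENCODE_MAP.items()}


def tsv_encode(value):
    """Escape characters that would otherwise break a one-row-per-line TSV."""
    return "".join(TSV_ENCODE_MAP.get(ch, ch) for ch in value)


def tsv_decode(value):
    """Reverse tsv_encode(); unknown backslash sequences are left unchanged."""
    return re.sub(
        r"\\.",
        lambda m: TSV_DECODE_MAP.get(m.group(0), m.group(0)),
        value,
        flags=re.DOTALL,
    )
```

Because every field then stays on one physical line, the number of lines read equals the number of rows read, which is the distinction revision 5164 reports on.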
10/01/2012
- 10:44 PM Revision 5159: tnrs_db: Updated query for new three-level taxonpath hierarchy, where the concatenated name is now stored in identifyingtaxonomicname instead of taxonomicnamewithauthor
- 10:41 PM Revision 5158: root map: Removed no longer needed public schema override, which is now handled by vegbien_dest
- 10:40 PM Revision 5157: vegbien_dest: Allow user to specify a custom public schema in the $public env var. This makes custom public schema functionality available to all VegBIEN-accessing scripts, not just map.
- 10:12 PM Revision 5156: tnrs_db: Adjusted pause, max_pause so the daemon waits longer before exiting, because after the initial TNRS run, most names have already been scrubbed and new names may not be added until the end of the import (in the case of a very large new datasource)
- 09:44 PM Revision 5155: input.Makefile: Staging tables installation: Added cleanup, %/cleanup to clean up already-installed tables
- 09:36 PM Revision 5154: tnrs.py: encode(): Also prepend special padding string to empty and whitespace-only strings because these names are otherwise ignored by TNRS (no response row)
- 09:15 PM Revision 5153: tnrs_db: pause: Increased to 30 min because if no new names are available in TNRS.tnrs, there is no need to check every minute for new names (which clutters up the log file output). The pause feature is designed to allow tnrs_db to run in parallel with the import process, and process new names as they are made available, which only happens once for each partition of each datasource.
- 09:11 PM Revision 5152: tnrs_db: Fixed bug where the new filtering out of already-scrubbed names caused names to be skipped, because the loop would both advance by the number of rows found *and* those rows would no longer be returned by the query, causing only every other set of rows to be processed
- 08:58 PM Revision 5151: tnrs.py: tnrs_request(): Rewrapped lines (became >80 chars after adding profiling)
- 08:52 PM Revision 5150: tnrs.py: tnrs_request(): Use new encode() and TnrsOutputStream to escape TNRS-invalid characters
- 08:51 PM Revision 5149: tnrs.py: Added encode(), decode(), decode_for_tsv(), and TnrsOutputStream to handle escaping TNRS-invalid characters
- 08:48 PM Revision 5148: strings.py: Added regexp_repl_esc()
- 08:47 PM Revision 5147: strings.py: Added replace_all() and replace_all_re(), as well as flip_map() for use with maps for these functions
- 08:46 PM Revision 5146: csvs.py: Added tsv_encode_map for use in creating TSVs parsed by TsvReader
- 06:42 PM Revision 5145: csvs.py: TsvReader: Also interpret '\t' as a tab, to provide a mechanism for encoding embedded tabs
- 05:47 PM Revision 5144: tnrs.py: gwt_encode(): Escape special characters in the string instead of removing them, so that TNRS receives the original name rather than a modified version. This will help make the submitted names match up with the returned Name_submitted.
- 05:45 PM Revision 5143: strings.py: Added json_encode()
- 05:44 PM Revision 5142: strings.py: Added esc_quotes()
- 04:52 PM Revision 5141: schemas/vegbien.sql: placepath.canon_placepath_id: Changed hierarchy comment to match the taxonpath.canon_taxonpath_id comment, but with a two-level hierarchy of datasource name -> accepted name. This may later be changed to a three-level hierarchy like taxonpath.canon_taxonpath_id depending on how GNRS works.
- 04:49 PM Revision 5140: schemas/vegbien.sql: taxonpath.canon_taxonpath_id: Changed comment to specify that taxonpaths should now be linked in a three-level hierarchy of datasource name -> concatenated name -> accepted name
- 04:45 PM Revision 5139: schemas/vegbien.sql: taxonpath, placepath: Changed "scrubbed" to "accepted" to emphasize that the name is the accepted name returned by TNRS or GNRS, rather than merely the matched name
- 04:38 PM Revision 5138: mappings/VegCore-VegBIEN.csv: non-TNRS taxonpaths: Store the concatenated identifyingtaxonomicname in a separate taxonpath owned by the TNRS datasource, so that it will match up with (and create a link to) the corresponding submitted TNRS name's taxonpath. This in turn is linked to the TNRS-determined accepted name, thus creating a three-level hierarchy of datasource name -> concatenated name -> accepted name.
- 03:59 PM Revision 5137: mappings/VegCore-VegBIEN.csv: taxonomic terms: Remapped the concatenated taxonomic name to new identifyingtaxonomicname to use it directly to match up with the TNRS submitted name. Continue to map scientificNameWithAuthorship to taxonomicnamewithauthor.
- 03:56 PM Revision 5136: schemas/vegbien.sql: taxonpath: Renamed plantcode to identifyingtaxonomicname so that it can be used to store the concatenated taxonomicname that gets scrubbed. This enables ignoring the name components when the full name is specified, so that when a TNRS submitted name's matched components are included in its taxonpath, this will not prevent a datasource's concatenated name (without the matched components) from matching up with the corresponding TNRS submitted name.
- 03:25 PM Revision 5135: schemas/vegbien.sql: taxonpath: Made taxonomicnamewithauthor optional again and include all columns in the taxonpath_unique_within_datasource_by_name unique index so that the original name components can be stored in a separate taxonpath from the taxonpath with the concatenated taxonomic name. (The datasource's taxonpath would not always contain an entry for taxonomicnamewithauthor, so the other columns also need to be used in the unique index.)
- 02:57 PM Revision 5134: schemas/vegbien.sql: taxonpath: Added back datasource_id, plantcode to make taxonpath datasource-specific again. This way, the original name components can still be stored in taxonpath, in addition to storing the concatenated name in a datasource-general taxonpath for use by TNRS.
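
Note on revision 5152: the skipped-rows bug can be reproduced with a small in-memory model. This is a sketch in Python 3 under stated assumptions, not the tnrs_db code. `NameStore`, `run_buggy()` and `run_fixed()` are hypothetical names, and the list filter stands in for the filter-out LEFT JOIN on TNRS.tnrs.

```python
class NameStore:
    """Hypothetical stand-in for a query that returns only unscrubbed names."""

    def __init__(self, names):
        self.names = list(names)
        self.scrubbed = set()

    def unscrubbed(self, offset, limit):
        # Already-scrubbed names are filtered out before the offset is applied.
        pending = [n for n in self.names if n not in self.scrubbed]
        return pending[offset:offset + limit]

    def mark_scrubbed(self, chunk):
        self.scrubbed.update(chunk)


def run_buggy(store, chunk_size):
    """Advances the offset even though processed rows leave the result set."""
    processed, offset = [], 0
    while True:
        chunk = store.unscrubbed(offset, chunk_size)
        if not chunk:
            break
        store.mark_scrubbed(chunk)
        processed.extend(chunk)
        offset += len(chunk)  # bug: these rows are no longer returned anyway
    return processed


def run_fixed(store, chunk_size):
    """Always reads from the start; the filter itself advances the loop."""
    processed = []
    while True:
        chunk = store.unscrubbed(0, chunk_size)
        if not chunk:
            break
        store.mark_scrubbed(chunk)
        processed.extend(chunk)
    return processed
```

With eight names and a chunk size of two, the buggy loop processes only every other pair, while the fixed loop processes all of them.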
09/28/2012
- 03:46 PM Revision 5133: inputs/.TNRS/tnrs/map.csv: Mapped columns for components of original, submitted name
- 03:33 PM Revision 5132: mappings/VegCore-VegBIEN.csv, VegCore.csv: Removed no longer used verbatimScientificNameWithAuthorship. Use scientificNameWithAuthorship instead, and map accepted (scrubbed) names to acceptedScientificNameWithAuthorship to create the canon_taxonpath_id link.
- 03:28 PM Revision 5131: inputs/.TNRS/tnrs/map.csv: Remapped to new accepted* taxonomic terms
- 03:23 PM Revision 5130: mappings/VegCore-VegBIEN.csv: Mapped accepted* taxonomic terms
- 03:00 PM Revision 5129: sql_io.py: cleanup_table(): Don't clean up the pkey, because the canonicalization involved may produce collisions (as it does for TNRS.tnrs)
- 02:58 PM Revision 5128: sql.py: Added pkey_col_()
- 02:31 PM Revision 5127: tnrs.py: tnrs_request(): Added comment that names containing only whitespace characters are ignored by TNRS and do not receive a response row. Our tnrs_db and reimport pipeline handles the necessary re-matching-up by just creating taxonpaths for each Name_submitted, and then letting the data import process on the following import attach to the prepopulated taxonpaths.
- 02:17 PM Revision 5126: tnrs_db: Exclude taxonomic names which have already been scrubbed, by using a filter-out LEFT JOIN on TNRS.tnrs
- 02:02 PM Revision 5125: tnrs.py: max_pause: Changed to 30 min because TNRS sometimes freezes for ~10 min. The freezing usually happens while the data is being uploaded rather than when it's being retrieved, so that the max_pause would not apply, but to be on the safe side, requests should not time out unnecessarily.
- 01:27 PM Revision 5124: tnrs_db: tnrs_profiler: Use iter_text='name' for consistency with tnrs.tnrs_request()'s own profiler's iter_text
- 01:25 PM Revision 5123: tnrs_db: Print cumulative profiling information after every TNRS request, rather than just at the end
- 01:22 PM Revision 5122: inputs/.TNRS/tnrs/tnrs.make: Append to the log file instead of overwriting it, so that the TNRS scrubbing of each import's new taxonomic names can be included in one log file. Echo the command to the log file to identify separate runs.
- 01:15 PM Revision 5121: TNRS-related programs: Use "names" instead of "taxons" for variable names because what's being submitted are actually verbatim taxonomic names, not official references to specific taxa
- 01:08 PM Revision 5120: tnrs.py: tnrs_request(): Profile the TNRS request
- 12:58 PM Revision 5119: tnrs.py: tnrs_request(): Fixed bug where initial_headers needed to be copied instead of just assigned to headers, because initial_headers is a global constant and should not be changed when the Cookie header is added
- 12:17 PM Revision 5118: mappings/VegCore.csv: originalTaxonRank, acceptedTaxonRank: Fixed sources to use verbatimTaxonRank, not taxonRank
- 12:15 PM Revision 5117: mappings/VegCore.csv: originalTaxonRank: Added source of the original* prefix
- 12:14 PM Revision 5116: mappings/VegCore.csv: acceptedTaxonRank: Added source of the accepted prefix
- 12:12 PM Revision 5115: mappings/VegCore.csv: accepted* taxonomic terms: Fixed sources of the accepted prefix to use acceptedNameUsage, not acceptedNameUsageID
- 12:09 PM Revision 5114: mappings/VegCore.csv: original* taxonomic terms: Source the original prefix to DwC originalNameUsage, which is a more official source than SALVIAS orig_species
- 12:09 PM Revision 5113: mappings/VegCore.csv: original* taxonomic terms: Source the original prefix to DwC originalNameUsage, which is a more official source than SALVIAS orig_species
- 11:56 AM Revision 5112: mappings/VegCore.csv: Added accepted* taxonomic terms to store the scrubbed name
- 11:42 AM Revision 5111: import_all: Clean up any new TNRS.tnrs entries before importing the TNRS data
- 11:36 AM Revision 5110: inputs/.TNRS/tnrs/: Create using datasource schema.sql file instead of text header and postprocess.sql, for clarity and to enable using `make inputs/.TNRS/tnrs/install` to clean up the tnrs entries populated by tnrs_db
- 11:21 AM Revision 5109: mappings/VegCore-VegBIEN.csv: Don't combine taxonRank with infraspecificEpithet if there is no infraspecificEpithet, because the taxonRank is only the infraspecificEpithet's prefix when there is an actual infraspecificEpithet. Often, taxonRank contains values like "genus" or "species" which cannot be used for this purpose.
- 10:54 AM Revision 5108: tnrs.py: repeated_tnrs_request(): Just retry the request once with debug turned on, to avoid cluttering the log output with the verbose debug info of multiple failed requests if the error is not resolved on retry
- 10:47 AM Revision 5107: tnrs.py: tnrs_request(): repeated_tnrs_request(): Print all suppressed exceptions to stderr
- 10:41 AM Revision 5106: tnrs.py: tnrs_request(): parse_response(): Include both the response headers and the response body in the InvalidResponse message
- 10:23 AM Revision 5105: inputs/import.stats.xls: Updated import times
- 10:15 AM Revision 5104: profiling.py: Profiler: Fixed bug where instance variable start had the same name as method start()
- 10:08 AM Revision 5103: mappings/VegCore-VegBIEN.csv: verbatimScientificNameWithAuthorship: Set canon_taxonpath_id to 0 on the first, scrubbed taxonpath to auto-create the self reference that indicates a scrubbed taxonpath
- 10:03 AM Revision 5102: mappings/VegCore-VegBIEN.csv: Don't forward scientificName to taxonoccurrence.authortaxoncode when importing just taxonpaths, as for TNRS
- 09:51 AM Revision 5101: tnrs_db: Moved lower max_taxons limit to tnrs.py because it's really required to avoid crashing the TNRS server and should apply to all callers
- 09:35 AM Revision 5100: tnrs_db: Print log message with # of taxonpaths being sent to TNRS
- 09:30 AM Revision 5099: tnrs_db: Fixed bug where InvalidResponse was missing module name
- 09:29 AM Revision 5098: tnrs_db: Profile the TNRS requests. This involves using a finally block to ensure that the profiling stats are printed even if the program exits with an error.
- 09:13 AM Revision 5097: tnrs_db: Reduced the chunk size to avoid slowing down the TNRS server
- 09:07 AM Revision 5096: inputs/.TNRS/tnrs/tnrs.make: Added log option which outputs to the terminal instead when set to ""
- 09:01 AM Revision 5095: tnrs_db: Added log messages for Making TNRS request and Storing TNRS response data so that if the TNRS daemon pauses, it's obvious which step it's waiting on
- 08:58 AM Revision 5094: sql.py: insert(): ignore optimization: Fixed bug where insert_select() needed to be run recoverably so that the aborted transaction is rolled back after a DuplicateKeyException or NullValueException
- 08:43 AM Revision 5093: tnrs_db: If tnrs.repeated_tnrs_request() still throws InvalidResponse, skip the current set in case its data caused the error. Note that it will still be tried again the next time tnrs_db is run.
- 08:34 AM Revision 5092: mappings/VegCore-VegBIEN.csv: Don't forward scientificName to taxonoccurrence.authortaxoncode when importing just taxonpaths, as for TNRS
- 08:30 AM Revision 5091: repeated_tnrs_request(): When retrying after an invalid response, output protocol info for debugging
- 08:29 AM Revision 5090: inputs/Makefile: Import logs: Don't download .TNRS/tnrs/tnrs.make.log by default because it changes each time `make inputs/.TNRS/tnrs/tnrs-remake` is run, and any version downloaded for debugging should be preserved. It can still be downloaded by setting the tnrs_log env var.
- 08:17 AM Revision 5089: tnrs_client, tnrs_db: Use new tnrs.repeated_tnrs_request()
- 08:16 AM Revision 5088: tnrs.py: Added repeated_tnrs_request() to retry a TNRS request which returned an invalid response
- 08:05 AM Revision 5087: db_xml.py: put_table(): Fixed bug where pkeys_loc needed to be initialized. Note that this bug was only triggered when importing a table with zero rows (in this case, the initial empty TNRS.tnrs table), because otherwise it would be set in the loop.
- 07:57 AM Revision 5086: inputs/Makefile: Import logs: Also download inputs/.TNRS/tnrs/tnrs.make.log
- 07:56 AM Revision 5085: inputs/Makefile: Import logs: Use new $(rsync*) to also sync datasources starting with ., such as .TNRS
- 07:55 AM Revision 5084: lib/common.Makefile: rsync: Added $(rsync*) to rsync all files, including those starting with "."
- 07:43 AM Revision 5083: tnrs.py: parse_response(): Raise custom InvalidResponse exception instead of SystemExit, so callers can catch the exception and respond to it
- 07:38 AM Revision 5082: mappings/VegCore-VegBIEN.csv: taxonpath.taxonomicnamewithauthor _join_words mappings: Added space after taxon rank prefix (var., etc.) for infraspecific ranks
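
Note on revisions 5083, 5088, 5107 and 5108: the retry behavior described above can be sketched as follows. This is an illustrative Python 3 sketch, not the code in tnrs.py. The `make_request` callable and its `debug` keyword are assumptions, and `repeated_request()` is a hypothetical stand-in for repeated_tnrs_request().

```python
import sys


class InvalidResponse(Exception):
    """Raised instead of exiting, so that callers can catch it and respond."""


def repeated_request(make_request, names, log=sys.stderr):
    """Try a request; on an invalid response, retry exactly once with debug on.

    make_request is an assumed callable taking (names, debug=...). A second
    InvalidResponse propagates to the caller, which may skip the current set.
    """
    try:
        return make_request(names, debug=False)
    except InvalidResponse as e:
        # Report the suppressed exception rather than discarding it silently.
        log.write("Invalid response, retrying once with debug on: %s\n" % e)
        return make_request(names, debug=True)
```

Retrying only once keeps the verbose debug output to a single failed request if the error does not resolve.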
09/27/2012
- 11:28 AM Revision 5081: import_all: Start the tnrs daemon using `make inputs/.TNRS/tnrs/tnrs-remake &`
- 11:25 AM Revision 5080: Added inputs/.TNRS/tnrs/tnrs.make to run tnrs_db on VegBIEN
- 11:25 AM Revision 5079: Added tnrs_db to scrub the taxonpaths in VegBIEN using TNRS
- 11:19 AM Revision 5078: Regenerated vegbien.ERD exports
- 11:17 AM Revision 5077: schemas/vegbien.sql: taxonpath: Made it datasource-general and uniquely identified only by its taxonomicnamewithauthor so that the taxonpaths imported by the TNRS datasource will be matched and used directly when the other datasources are imported
- 11:10 AM Revision 5076: schemas/vegbien.sql: taxonpath: taxonpath_unique_within_datasource_by_name unique index: Just do duplicate elimination on the taxonomicnamewithauthor, since that is now a required field and is generated by concatenating all the other fields. Note that the inserted row counts change slightly because the concatenation makes some names equal that are split among the fields differently, such as when the genus is included in the species field.
- 10:51 AM Revision 5075: db_xml.py: put(): Added _alt optimization that just returns the first arg if it's non-NULL
- 10:49 AM Revision 5074: sql_gen.py: Added is_nullable()
- 10:49 AM Revision 5073: schemas/vegbien.sql: taxonpath.taxonomicnamewithauthor: Made it NOT NULL, so that all taxonpaths would have a concatenated name to feed to TNRS
- 10:37 AM Revision 5072: mappings/VegCore-VegBIEN.csv: taxonomic terms: Changed _first to _alt because some datasources have NULL values in scientificNameWithAuthorship or scientificName, so it can't just be used in place of the joined-together taxonomic ranks
- 10:19 AM Revision 5071: db_xml.py: put(): Parse input columns and process values in separate loops, so that structural XML function optimization code can be inserted between them
- 10:12 AM Revision 5070: sql_io.py: put_table(): Removed comment that put_table() can support in_tables of any fixed-size iterable type, because the iterable must be ordered so that the first table can be treated specially
- 10:09 AM Revision 5069: sql_io.py: put_table(): Support in_tables of any fixed-size iterable type
- 09:13 AM Revision 5068: mappings/Veg+-VegCore.csv: cationExchangeCapacity->cationExchangeCapacity_cmol_kg mapping: Removed ? prefix because a mapping to only one set of units is unambiguous (if additional units for cationExchangeCapacity are found, this will become an ambiguous mapping). Note that canon automatically removes punctuation from VegCore terms, so this mapping would previously have had the ? prefix autoremoved anyway (both in inputs/*/*/map.csv and recently also in Veg+-VegCore.csv).
- 09:06 AM Revision 5067: mappings/Makefile: .Veg+-VegCore.csv.last_cleanup: Translate VegCore terms using itself so that any mapping to another Veg+ term automatically becomes a mapping to a VegCore term. .VegX-VegCore.csv.last_cleanup: Translate VegCore terms using Veg+-VegCore.csv to keep the terms up to date.
- 09:04 AM Revision 5066: mappings/VegX-VegCore.csv: Translated VegCore terms using Veg+-VegCore.csv
- 09:00 AM Revision 5065: mappings/Makefile: .VegCore.csv.last_cleanup, .VegCore-VegBIEN.csv.last_cleanup: Apply Veg+-VegCore.csv so that terms can easily be renamed just by adding a mapping in Veg+-VegCore.csv, which will auto-translate all places that use the term. .VegCore-VegBIEN.csv.last_cleanup: Canonicalize to VegCore.csv so case changes in VegCore terms will automatically propagate to VegCore-VegBIEN.csv.
- 08:46 AM Revision 5064: mappings/VegCore-VegBIEN.csv: Mapped verbatimScientificNameWithAuthorship, so that it links a verbatim taxonpath to the scrubbed taxonpath created from the primary taxonomic terms
- 08:36 AM Revision 5063: mappings/VegCore.csv: Renamed unscrubbedScientificNameWithAuthorship to the more standard verbatimScientificNameWithAuthorship, which is available now that the original taxondetermination terms use the original* prefix
- 08:31 AM Revision 5062: mappings/VegCore.csv: Renamed verbatim* taxonomic terms to original* because in most datasources, they are in fact for the *original* taxon determination of the organism (which can be a completely different name than the primary determination), rather than merely unscrubbed versions of the primary taxonomic name elements. Note that SALVIAS's orig_* terms do appear to be merely unscrubbed versions, but it's not a problem to add an additional taxon determination for them.
- 08:14 AM Revision 5061: sql.py: pkey(): Get the table's actual primary key column, rather than just using the first column in the table. Continue to return the first column in the table if the table has no primary key.
- 07:31 AM Revision 5060: inputs/.TNRS/tnrs/postprocess.sql: Use :table var instead of hardcoding the table name
- 07:30 AM Revision 5059: inputs/.TNRS/tnrs/postprocess.sql: Also add a primary key on Name_submitted, to prevent duplicate entries
- 07:27 AM Revision 5058: inputs/.TNRS/tnrs/: Added postprocess.sql which makes Name_submitted NOT NULL
- 07:25 AM Revision 5057: sql.py: insert(): ignore mode: Also ignore NullValueException
- 07:24 AM Revision 5056: input.Makefile: Staging tables installation: %/install: Support custom postprocess.sql which specifies commands to run after the table is imported
- 07:10 AM Revision 5055: import_all: Added import of .TNRS datasource, which happens synchronously before other datasources are imported
- 07:08 AM Revision 5054: Moved tnrs table from public (schemas/vegbien.sql) to its own TNRS schema, which is created by a new .TNRS datasource. Note that .TNRS is included in the automated testing, but not yet in the import.
- 06:57 AM Revision 5053: mappings/VegCore-VegBIEN.csv: Restored subplotID -> if subplot cond mapping, which had been overwritten
- 06:46 AM Revision 5052: inputs/ACAD/Specimen/map.csv: Remapped scientificName to scientificNameWithAuthorship
- 06:06 AM Revision 5051: sql_io.py: append_csv(): Using INSERT: Use ignore mode to support inserting rows into a table with a unique constraint
- 06:05 AM Revision 5050: sql.py: insert(): Added ignore optimization that just suppresses any DuplicateKeyException on the client side, to avoid needing to create a wrapper function just to insert-ignore one row
- 05:23 AM Revision 5049: mappings/VegCore-VegBIEN.csv: Synchronized verbatim* and non-verbatim taxonomic terms' mappings
- 05:08 AM Revision 5048: mappings/VegCore.csv: Added special term unscrubbedScientificNameWithAuthorship
- 05:05 AM Revision 5047: mappings/VegCore.csv: Added verbatimSubspecies, verbatimVariety, verbatimForma, verbatimCultivar (already mapped in VegCore-VegBIEN.csv)
- 05:04 AM Revision 5046: mappings/Makefile: .VegCore.csv.last_cleanup: Also remake VegCore-VegBIEN.unsourced_terms.csv here, not just in .VegCore-VegBIEN.csv.last_cleanup, so that the unsourced_terms.csv will be remade if the user adds the missing sources to VegCore.csv
- 05:03 AM Revision 5045: mappings/Makefile: VegCore-VegBIEN.unsourced_terms.csv: Factored remake code into its own make target
- 04:51 AM Revision 5044: mappings/VegCore-VegBIEN.csv: verbatim* taxonomic terms: Added taxonomicnamewithauthor mappings analogous to those for the non-verbatim taxonomic terms
- 04:29 AM Revision 5043: mappings/VegCore.csv: Added verbatimScientificNameWithAuthorship
- 03:50 AM Revision 5042: Added inputs/.public/, which stores mappings that manipulate VegBIEN itself
- 03:49 AM Revision 5041: forwarding.Makefile: Differentiate between subdirs which can be sent a command and subdirs which will receive a command broadcast to "all" subdirs
- 03:39 AM Revision 5040: README.TXT: Data import: Starting column-based import: Use import_all, which now supports passing custom vars like by_col=1
- 03:37 AM Revision 5039: import_all: Pass any args, such as vars, through to with_all
- 03:35 AM Revision 5038: with_all: Support additional command-line args for the make target, such as vars
- 03:11 AM Revision 5037: sql_io.py: append_csv(): Check that the CSV's header matches the table's columns
- 03:08 AM Revision 5036: schemas/vegbien.sql: Added tnrs table to hold contents of TNRS response
- 02:20 AM Revision 5035: input.Makefile: Existing maps discovery: $(anyMap): Inlined patterns used because they are only used here
- 01:27 AM Revision 5034: schemas/vegbien.sql: taxonpath_canon_taxonpath_id_self_ref(), placepath_canon_placepath_id_self_ref(): Fixed bug where the pkey could only be prepopulated if it was not already set, in order to support UPDATE as well as INSERT statements
- 01:15 AM Revision 5033: schemas/vegbien.sql: taxonpath.canon_taxonpath_id, placepath.canon_placepath_id: Fixed comment describing that the special value 0 creates an automatic self-reference
- 01:09 AM Revision 5032: schemas/vegbien.sql: taxonpath.canon_taxonpath_id, placepath.canon_placepath_id: Added trigger to automatically create a self-reference (indicating a scrubbed name) when set to the special value 0
- 12:33 AM Revision 5031: input.Makefile: Staging tables installation: `%/install: %/create.sql`: Don't add a row number column to the created table because it is now added automatically to the temp table by column-based import (row-based import now also does not require a pkey for DB inputs)
- 12:28 AM Revision 5030: bin/map, db_xml.put_table() (row-based and column-based import): Don't sort the input table by its pkey, in order to support input tables with no pkey. Note that reading the input table in table order and having this match the input flat file's order is only possible with sql_io.import_csv()'s truncation of the table on a failed import, which ensures that the rows will be stored in inserted order.
- 12:19 AM Revision 5029: input.Makefile: Staging tables installation: Removed no longer used $(isJoinedTable). Note that it is no longer necessary for joined tables to be suffixed with ".src" to prevent the creation of a row_num column, which collided during joins.
- 12:17 AM Revision 5028: csv2db: Removed no longer used has_row_num param
- 12:14 AM Revision 5027: sql_io.py: import_csv(): Don't add a row number column to the created table because it is now added automatically to the temp table by column-based import (row-based import now also does not require a pkey for DB inputs)
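
Note on revisions 5050, 5057, 5094 and 5163: an insert in ignore mode can be sketched as follows. This is a minimal Python 3 illustration that uses the standard-library sqlite3 module in place of PostgreSQL, so it is not the sql.py implementation. `insert_ignore()` is a hypothetical name, and sqlite3.IntegrityError stands in for DuplicateKeyException and NullValueException.

```python
import sqlite3


def insert_ignore(conn, table, row):
    """Insert one row; return None if a unique or NOT NULL constraint rejects it.

    table and the keys of row are assumed to be trusted identifiers.
    """
    cols = ", ".join(row)
    params = ", ".join("?" for _ in row)
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, cols, params)
    # The savepoint makes the insert recoverable: a failed insert is rolled
    # back without aborting any enclosing transaction.
    conn.execute("SAVEPOINT insert_ignore")
    try:
        cur = conn.execute(sql, tuple(row.values()))
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK TO SAVEPOINT insert_ignore")
        conn.execute("RELEASE SAVEPOINT insert_ignore")
        return None  # explicit, so callers can tell the insert was suppressed
    conn.execute("RELEASE SAVEPOINT insert_ignore")
    return cur.lastrowid
```

Handling the error on the client side avoids creating a wrapper function in the database just to insert-ignore one row.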
09/26/2012
- 11:49 PM Revision 5026: bin/map, db_xml.put_table() (row-based and column-based import): Don't sort the input table by its pkey, in order to support input tables with no pkey. Note that reading the input table in table order and having this match the input flat file's order is only possible with sql_io.import_csv()'s truncation of the table on a failed import, which ensures that the rows will be stored in inserted order.
- 11:34 PM Revision 5025: sql_io.py: import_csv(): Only do the import in a savepoint if using COPY FROM, to allow autocommits after each insert and thus make rows visible immediately after they are inserted
- 10:53 PM Revision 5024: db_xml.py: put_table(): Subsetting in_table: Add a row number column if in_table does not already have a pkey
- 10:48 PM Revision 5023: db_xml.py: put_table(): Subsetting in_table: Copy all of in_table's structure, rather than just the column types, by using sql.copy_table_struct() and sql.insert_select(). This preserves pkeys and NOT NULL constraints, which are useful for column-based import.
- 10:47 PM Revision 5022: db_xml.py: put_table(): Subsetting in_table: Create in_table as a completely new sql_gen.Table instead of copying full_in_table and relying on sql.run_query_into() to set is_temp and remove the schema
- 10:40 PM Revision 5021: sql.py: add_row_num(): Use if_not_exists in order to abort if the column already exists rather than adding a version #
- 10:36 PM Revision 5020: sql.py: add_col(): Added if_not_exists param to abort if the column already exists rather than adding a version #
- 10:14 PM Revision 5019: db_xml.py: put_table(): Removed no longer accurate comment that full_in_table will be shadowed (hidden) by the created temp table. (The temp table is now named differently, so the shadowing does not occur.)
- 10:02 PM Revision 5018: db_xml.py: put_table(): Replaced no longer accurate Recurse comment with Import data. Rewrapped lines.
- 09:12 PM Revision 5017: sql_io.py: import_csv(): Factored insertion code out into new append_csv()
- 08:47 PM Revision 5016: README.TXT: Data import: `make test by_col=1`: Replaced errors explanation with pointer to updated explanation in the Testing section
- 08:31 PM Revision 5015: xml_func.py: Removed no longer used _name(). Use _join_words() instead.
- 08:30 PM Revision 5014: mappings/VegCore-VegBIEN.csv: Use new, more general _join_words() instead of _name()
- 08:22 PM Revision 5013: mappings/Veg+-VegCore.csv: Prefix ambiguous terms' VegCore replacement with "?" so it's visually flagged in map.csv, in the same way that unmatched terms are flagged with a "*" prefix
- 08:19 PM Revision 5012: mappings/VegCore-VegBIEN.csv: Taxonomic terms: Also join terms together in taxonomicnamewithauthor if scientificNameWithAuthorship is not provided, for use by TNRS
- 08:15 PM Revision 5011: xml_func.py: Simplifying functions: Merging: Added _join_words()
- 07:57 PM Revision 5010: inputs/ARIZ/Specimen/map.csv: Remapped ScientificNameAuthor to scientificNameWithAuthorship because it contains the binomial in addition to the authority
- 07:39 PM Revision 5009: schemas/functions.sql: Added _join_words()
- 07:33 PM Revision 5008: input.Makefile: Paths: $(datasrc): Remove any "." prefix from the subdir name. The "." prefix allows a subdir to be hidden from the normal import process.
- 06:56 PM Revision 5007: db_xml.py: put_table(): Allow caller to specify custom partition_size
- 06:45 PM Revision 5006: tnrs.py: tnrs_request(): Return the CSV stream directly instead of reading it into a string
- 06:42 PM Revision 5005: tnrs.py: tnrs_request(): Moved CSV-download-specific functionality from do_request() to the Download section
- 06:34 PM Revision 5004: inputs/import.stats.xls: Updated import times
09/25/2012
- 11:13 PM Revision 5003: tnrs.py: tnrs_request(): Return the response instead of printing it to stdout
- 10:59 PM Revision 5002: schemas/py_functions.sql: _namePart(): Fixed bug where it was returning the empty string instead of NULL
- 10:46 PM Revision 5001: sql_io.py: import_csv(): Documented that sql.truncate() MUST be run so that the rows will be stored in inserted order, and the row_num added after import will match up with the CSV's row order
- 10:35 PM Revision 5000: sql.py: add_row_num(): Add distinguishing comment to ADD COLUMN statement so that it will be cached. The distinguishing comment is required because sometimes column names are truncated, leading to unwanted collisions with previously-cached ADD COLUMN statements. It provides a way of distinguishing the full column name behind a particular ADD COLUMN statement.
- 10:24 PM Revision 4999: sql_io.py: import_csv(): Free memory used by deleted rows from any failed import. Documented that sql.create_table() is not rolled back if the import fails, but instead is cached, and will not be re-run if the import is retried.
- 09:37 PM Revision 4998: sql_io.py: import_csv(): Fixed bug where the added row number column needed to be named row_num instead of _row_num to be autodetected as the pkey column (sql.pkey_col) by sql.pkey() and to avoid name collisions with the row number column added in column-based import
- 09:34 PM Revision 4997: sql.py: add_row_num(): Support custom row number column name
- 09:12 PM Revision 4996: csv2db: Use new sql_io.import_csv()
- 09:10 PM Revision 4995: sql_io.py: Added import_csv()
- 09:05 PM Revision 4994: csv2db: Don't truncate the table before loading rows because it has just been created, and is therefore empty. This statement may be left over from a time when the table was created only once, and its creation was not rolled back if the import fails.
- 08:44 PM Revision 4993: sql_io.py: cleanup_table(): Print 'Cleaning up table' log message
- 08:41 PM Revision 4992: sql_io.py: cleanup_table(): Also vacuum and reanalyze table
- 07:43 PM Revision 4991: tnrs_client: Use new tnrs.tnrs_request()
- 07:43 PM Revision 4990: Added tnrs.py
- 07:34 PM Revision 4989: tnrs_client: Factored TNRS request code into separate function tnrs_request()
- 07:23 PM Revision 4988: inputs/VegBank/taxonimportance/map.csv: Documented that taxonimportance is not 1:1 with taxonobservation
- 07:22 PM Revision 4987: mappings/VegCore-VegBIEN.csv: Removed unnecessary /_first/# suffix for multiple terms in the same _exists expression, because _exists() only checks whether its node is non-empty, and it does not matter how many child nodes it contains
- 06:57 PM Revision 4986: schemas/vegbien.sql: taxonoccurrence: taxonoccurrence_unique_within_locationevent unique index: Fixed bug where locationevent_id needed to be enclosed in COALESCE(..., 2147483647) so that the unique constraint also applies to rows with NULL locationevent_ids (there is no other unique constraint handling these rows)
- 06:52 PM Revision 4985: README.TXT: Documented that if the row-based and column-based imports produce different inserted row counts, this usually means that a table is underconstrained (the unique indexes don't cover all possible rows). The inserted row count difference occurs because column-based import collapses empty table rows into one insert, while row-based import performs an insert of the empty row for each input row. Without a unique index to combine multiple row-based inserts, extra rows will be added.
- 06:48 PM Revision 4984: sql_io.py: put_table(): Warn if inserting empty table rows
- 06:13 PM Revision 4983: schemas/py_functions.sql: _namePart(): Fixed bug where it was returning the empty string instead of NULL
- 05:57 PM Revision 4982: schemas/functions.sql, py_functions.sql: Added schema comment that functions must always return NULL in place of the empty string, to ensure that empty strings do not find their way into VegBIEN. Note that row-based import automatically removes empty strings because the intermediate values are stored in XML and our XML DOM traversing code auto-replaces the empty string with NULL. Column-based import, on the other hand, does not, because the intermediate data is stored in database temp tables instead of a DOM tree.
- 05:31 PM Revision 4981: root map: Fixed custom public schema override to work with schemas lists that include public, by replacing public with the new public schema instead of just appending it
- 04:53 PM Revision 4980: inputs/*/*/map.csv: Prefix a * to every term that's not in Veg+ for easy identification of unmapped terms when editing map.csv. Note that canon will remove the * when it finds a matching Veg+ term.
- 04:52 PM Revision 4979: inputs/*/*/map.csv: Prefix a * to every term that's not in Veg+ for easy identification of unmapped terms when editing map.csv. Note that canon will remove the * when it finds a matching Veg+ term.
- 04:36 PM Revision 4978: ins_col: Added column fill value param
- 04:16 PM Revision 4977: inputs/VegBank/stemcount/map.csv: Fixed bug where taxonimportance_id needed to point to aggregateOccurrenceID instead of taxonOccurrenceID
- 04:15 PM Revision 4976: mappings/VegCore-VegBIEN.csv: Don't forward individualID to taxonoccurrence.sourceaccessioncode when aggregateOccurrenceID is present
- 03:52 PM Revision 4975: inputs/import.stats.xls: Updated import times
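The COALESCE fix in revision 4986 works because a plain unique index treats NULLs as distinct, so rows with a NULL locationevent_id are never deduplicated. Below is a minimal sketch of the idea, using Python's built-in sqlite3 as a stand-in for PostgreSQL. The table is a simplified assumption rather than the actual VegBIEN schema, and the `code` column and the helper names are hypothetical.

```python
import sqlite3

SENTINEL = 2147483647  # value substituted for NULL inside the index (from revision 4986)


def make_db():
    """Create a simplified, hypothetical taxonoccurrence table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE taxonoccurrence ("
        " locationevent_id INTEGER,"
        " code TEXT NOT NULL)"  # hypothetical identifying column
    )
    # Index on an expression so that NULL locationevent_ids compare as equal.
    db.execute(
        "CREATE UNIQUE INDEX taxonoccurrence_unique_within_locationevent"
        " ON taxonoccurrence"
        f" (COALESCE(locationevent_id, {SENTINEL}), code)"
    )
    return db


def try_insert(db, locationevent_id, code):
    """Return True if the row was inserted, False if the unique index rejected it."""
    try:
        db.execute(
            "INSERT INTO taxonoccurrence VALUES (?, ?)", (locationevent_id, code)
        )
        return True
    except sqlite3.IntegrityError:
        return False


db = make_db()
first = try_insert(db, None, "A1")   # new row with a NULL locationevent_id
second = try_insert(db, None, "A1")  # same row again: rejected by the index
other = try_insert(db, 7, "A1")      # different locationevent_id: accepted
```

One trade-off of this approach is that a real locationevent_id equal to the sentinel value would collide with the NULL rows.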
09/24/2012
- 06:45 PM Revision 4974: Regenerated vegbien.ERD exports
- 06:33 PM Revision 4973: schemas/vegbien.sql: placepath.otherranks comment: Added analogous text from taxonpath.otherranks
- 06:31 PM Revision 4972: schemas/vegbien.sql: taxonpath.author comment: Added equivalent Darwin Core term
- 06:27 PM Revision 4971: schemas/vegbien.sql: taxon columns: Added descriptive comments for data dictionary
- 06:15 PM Revision 4970: schemas/vegbien.sql: placepath: Added canon_placepath_id, analogous to taxonpath.canon_taxonpath_id
- 06:09 PM Revision 4969: schemas/vegbien.sql: place, placepath descriptive comments: Added analogous text from taxon/taxonpath
- 06:05 PM Revision 4968: schemas/vegbien.sql: taxonpath: descriptive comment: Changed "applicable taxon" to "identified taxon"
- 05:58 PM Revision 4967: schemas/vegbien.sql: taxon: descriptive comment: Reworded to emphasize that this stores only one rank (e.g. family) of the full taxonomic name, in contrast to taxonpath, which stores all of them
- 05:54 PM Revision 4966: schemas/vegbien.sql: taxonpath: descriptive comment: Clarified that this is the full path to a taxon, including all components of the taxonomic name
- 05:48 PM Revision 4965: schemas/vegbien.sql: Replaced "scientific name" with "taxonomic name" for schema-wide consistency and for consistency with the taxon/taxonomic name vocabulary
- 05:38 PM Revision 4964: schemas/vegbien.sql: taxonpath named ranks: Added descriptive comments for data dictionary
- 05:34 PM Revision 4963: schemas/vegbien.sql: taxonpath columns other than named ranks: Added descriptive comments for data dictionary
- 05:14 PM Revision 4962: schemas/vegbien.sql: taxonscope: descriptive comment: Reworded to make the first sentence a noun, for consistency with other descriptive table comments
- 05:13 PM Revision 4961: schemas/vegbien.sql: taxon: descriptive comment: Added note that the taxonname stores only one rank (e.g. family) of the full identifying name
- 05:07 PM Revision 4960: schemas/vegbien.sql: taxonpath: descriptive comment: Reworded to make the first sentence a noun, for consistency with other descriptive table comments. The convention is for the first "sentence" to be a noun which describes the entity that the table models.
- 05:00 PM Revision 4959: schemas/vegbien.sql: comments: Removed units from comments on fields which already have a units suffix, to avoid having to keep the units in sync between the suffix and the comment. Note that the units were abbreviated equally in the suffixes and comments, so this did not result in a loss of information other than the ^ for a quantity squared (but it's obvious enough that m2 is m^2).
- 04:54 PM Revision 4958: schemas/vegbien.sql: taxonscope: descriptive comment: Added period for consistency with other descriptive table comments
- 04:50 PM Revision 4957: schemas/vegbien.sql: taxon: Added descriptive comment for data dictionary
- 04:48 PM Revision 4956: schemas/vegbien.sql: VegBank-equivalent tables comments: Prepended "Equivalent to" before VegBank, so the equivalent tables statement can fit grammatically after a description of the table instead of having to be the first phrase in the descriptive table comment
- 04:41 PM Revision 4955: schemas/vegbien.sql: taxon: VegBank-equivalent tables comment: Added plantName and applicable columns from plantStatus, which are also part of the taxon table
- 04:37 PM Revision 4954: schemas/vegbien.sql: placepath: Added otherranks field, analogous to taxonpath.otherranks
- 04:26 PM Revision 4953: schemas/vegbien.sql: taxonpath: Added descriptive comment for data dictionary
- 03:36 PM Revision 4952: inputs/import.stats.xls: Updated import times
- 02:58 PM Revision 4951: inputs/UNCC/Specimen/map.csv: accession: Documented that it's globally unique, although occasionally duplicated
- 02:54 PM Revision 4950: inputs/REMIB/Specimen/map.csv: Remapped accession_number to catalogNumber because it is not globally unique, only (usually) unique within the institution providing the data ("acronym"). Note that there are nevertheless 11,869 rows where an accession_number appears multiple times within the same institution.
- 02:45 PM Revision 4949: mappings/VegCore-VegBIEN.csv: Only use institutionCode+collectionCode+catalogNumber as the authorlocationcode (location-scoping ID) if there is actually a catalogNumber. Otherwise, the mapping process would attempt to create one location for each collection in the datasource, when there should be one location for each specimen.
- 02:36 PM Revision 4948: schemas/py_functions.sql: _namePart(): Slice the first name from the beginning of the string to one word before the end, instead of one after the beginning, in order to avoid overlap with the last name, which starts one before the end, when there is only one word. Note that only one word means the name is assumed to be a last name. This assumption may not always be true, but when a datasource provides the name concatenated, an assumption must be made when not all name components are present.
- 02:30 PM Revision 4947: schemas/vegbien.sql: party: Added check constraint to require at least an organizationname or surname. Previously, NULL entries for the collector or identifier incorrectly caused the creation of an empty party entry, hence the lower inserted row counts now that this is no longer created.
- 02:17 PM Revision 4946: inputs/REMIB/Specimen/map.csv: Remapped acronym to institutionCode because this is an aggregator, and the field lists the datasource each record was aggregated from. Note that the inserted row count changes because of different duplicate elimination strategies in specimenreplicate and party (which institutionCode is placed in).
- 02:11 PM Revision 4945: inputs/REMIB/Specimen/create.sql: Also filter out rows where acronym (collectionCode) is NULL because this is a required field for valid records
- 01:28 PM Revision 4944: schemas/vegbien.sql: taxonpath: Renamed scientificnameauthor to author so the column name doesn't have "scientificname" in it, which made the term look confusingly like scientificname itself. Added descriptive comment that this is the author of the scientific name.
- 01:19 PM Revision 4943: schemas/vegbien.sql: taxonpath: Renamed canon_id to canon_taxonpath_id to clarify that this is a recursive fkey. The convention is that a recursive fkey includes the table name plus a descriptive prefix.
- 01:14 PM Revision 4942: schemas/filter_ERD.csv: Don't filter out fkeys from taxonpath to itself
- 01:04 PM Task #501 (Resolved): find out which datasources won't allow their data to be publicly accessible
- * needed before we can make VegBIEN public
These datasources are:
* "REMIB":http://www.conabio.gob.mx/remib/cgi...
- 01:02 PM Task #500 (New): when lower rank has name concatenated together, use lowest rank as the scientific name
- 12:57 PM Task #499 (Resolved): map example terms into the taxonomic schema
- 12:57 PM Task #498 (Resolved): add definitions to columns in "green tables"
- 12:57 PM Task #497 (Resolved): create examples of taxonomic names to test the limits of the new taxonomic schema
- * need types of morphospecies indicators
- 11:32 AM Revision 4941: schemas/vegbien.sql: taxonpath: Added canon_id for the canonical (scrubbed) taxonpath determined by TNRS
- 11:24 AM Revision 4940: schemas/vegbien.sql: taxonpath: taxonpath_unique_within_datasource_by_name unique index: Added otherranks, so that ranks without a named column will be used in uniquely identifying the taxonpath
- 11:22 AM Revision 4939: sql.py: DbConn.col_info(): Parse array types as sql_gen.ArrayType
- 11:22 AM Revision 4938: sql_gen.py: EnsureNotNull: Support ArrayType types
- 11:21 AM Revision 4937: strings.py: remove_prefix(), remove_suffix(): Added require param to raise an exception if the string does not have the given prefix/suffix
- 11:06 AM Revision 4936: sql.py: DbConn.col_info(): Moved parsing of user-defined datatypes to Python code, so that parsing for other composite types which also requires both data_type and udt_name can easily be added
- 11:03 AM Revision 4935: sql_gen.py: Added ArrayType
- 10:29 AM Revision 4934: schemas/vegbien.sql: Scope taxonpath instead of taxon with taxonscope, because a morphospecies name is specific to a datasource entity, so it should go in the datasource-specific taxonpath table instead of the datasource-general taxon table
- 10:14 AM Revision 4933: schemas/vegbien.sql: taxonpath: Added otherranks array column to store ranked names without a named column. Documented that ranks with no named column should be stored in this new field instead of in a chain of taxons pointed to by taxon_id. This ensures that only the tree of life uses the taxon table.
- 09:47 AM Revision 4932: schemas/vegbien.sql: Removed no longer used table stemtag, which has been replaced by stemobservation.tag, stemobservation.tags
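The slicing change described in revision 4948 can be illustrated with a short sketch. This is not the actual _namePart() code: the function name `name_parts` and the whitespace-based word splitting are assumptions made for illustration. It also returns None rather than the empty string for a missing part, in line with the NULL convention noted in the 09/25/2012 entries.

```python
def name_parts(full_name):
    """Split a concatenated person name into (first, last).

    The first name runs from the beginning to one word before the end, and
    the last name is the final word, so a single-word name is treated as a
    last name only. Missing parts are None, never the empty string.
    """
    words = full_name.split()
    first = " ".join(words[:-1]) or None  # everything except the final word
    last = " ".join(words[-1:]) or None   # the final word only
    return first, last


two_words = name_parts("Carl Linnaeus")
one_word = name_parts("Linnaeus")
empty = name_parts("")
```

With the earlier slice (one word after the beginning, i.e. `words[:1]`), a single-word name would have been returned as both the first and the last name.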
09/21/2012
- 04:28 PM Revision 4931: inputs/ARIZ/Specimen/test.xml.ref: Updated after reinstalling staging table with new sql_io.null_strs
- 04:22 PM Revision 4930: inputs/VegBank/: Added stemlocation/
- 04:17 PM Revision 4929: inputs/VegBank/: Added stemcount/
- 04:10 PM Revision 4928: sql_io.py: cleanup_table(): Fixed bug where it couldn't run any update statement when no columns are text
- 03:57 PM Revision 4927: csv2db: COPY FROM mode: Removed no longer needed explicit column list, now that the initial table has the exact width of the CSV (the row_num is added later)
- 03:55 PM Revision 4926: csv2db: Add any row_num column after creating the table, so it does not interfere with row widths when using COPY FROM without explicit column names
- 03:48 PM Revision 4925: csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
- 03:44 PM Revision 4924: csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
- 03:28 PM Revision 4923: inputs/VegBank/: Added taxonimportance/
- 03:20 PM Revision 4922: mappings/VegCore.csv: Added and mapped aggregateOccurrenceID
- 03:12 PM Revision 4921: mappings/VegCore.csv: taxonOccurrenceID: Re-sourced to VegBank taxonobservation and DwC occurrenceID, because this is where the VegBIEN table name came from
- 02:57 PM Revision 4920: tnrs_client: Support parsing multiple taxons at once, by specifying each as a command-line argument. Increased the max_pause to 10 min to support large batches. Limited the batch size to 5000 names, using the limit at <http://tnrs.iplantcollaborative.org/TNRSapp.html>. Note that when using xargs to pass many names, xargs will by default split its arguments into chunks of 5000. You can change this using the -n option.
- 02:29 PM Revision 4919: inputs/import.stats.xls: Updated import times
- 01:20 PM Revision 4918: Added tnrs_client. Note that obtaining an actual CSV requires four (!) steps: submit, retrieve, prepare download, and download. The output of the retrieve step is unusable because the array has different lengths depending on the taxonomic ranks present in the provided taxon name. This initial version runs one name at a time, but could later be expanded to batch process because TNRS can run multiple names at once.
- 12:36 PM Revision 4917: streams.py: Line iteration: Added read_all()
- 08:24 AM Revision 4916: inputs/Madidi/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values
- 08:18 AM Revision 4915: sql_io.py: null_strs: Added '-'
- 08:18 AM Revision 4914: sql_io.py: cleanup_table(): Fixed bug where each column name needed to be converted to Unicode before being concatenated with other strings, to support non-ASCII characters
- 07:57 AM Revision 4913: inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values
- 07:52 AM Revision 4912: inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Removed no longer needed old-style _units filter, now that unit conversion is handled by mappings/VegCore-VegBIEN.csv using _percent_to_fraction
- 07:48 AM Revision 4911: inputs/VegBank/observation_/map.csv: soilObs fields: Cited data dictionary source of units
- 07:15 AM Revision 4910: mappings/Veg+-VegCore.csv: Soil component measurements: Added unitless terms that automap to all alternatives of units
- 07:08 AM Revision 4909: mappings/VegCore.csv: Added term with *_fraction units for every *_percent term
- 07:03 AM Revision 4908: mappings/VegCore.csv: Soil component measurements: Added default units of percent (cmol_kg for cationExchangeCapacity). This involves translating the names everywhere and adding a _percent_to_fraction conversion in mappings/VegCore-VegBIEN.csv.
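Revision 4920 limits tnrs_client to 5000 names per batch. A caller with a longer list might split it into batches before invoking the client, roughly as sketched below. The helper name `batches` is hypothetical and is not part of tnrs_client.

```python
MAX_NAMES = 5000  # batch size limit cited in revision 4920


def batches(names, size=MAX_NAMES):
    """Yield successive lists of at most `size` names, preserving order."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


# Example: 12001 placeholder names split into three batches.
example_names = ["name %d" % i for i in range(12001)]
batch_sizes = [len(batch) for batch in batches(example_names)]
```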