Project

General

Profile

2014-02-27 conference call

Brad's notes

Quantitative validations

- SALVIAS validation pipeline is still incomplete. So far, only 4 of the 18 total queries have been completed and run as a pipeline. Trait validations are complete, but work on specimens or other plot sources has not been started. Aaron needs to make faster progress or he will not be able to finish.
- Aaron has encountered the following problems:
1. Column names do not match between input and output queries
2. Data types do not match between some columns involved in joins
3. Some output queries are structured incorrectly, returning records for plots+subplots instead of just plots
4. Some queries have not been written yet
- Solutions:
1. Alias columns as needed to ensure that the number and names of columns match between output and input queries
2. re-CAST data types as needed to ensure that data types match between joining columns
3. Rename column location.top_plot to location.plot_location_id. Fix output queries which incorrectly return plots+subplots instead of plots by joining location to itself on location.plot_location_id=location.location_id.
4. Write the missing queries
- Priority (order) of tasks for coming week:
1. Rename column location.top_plot to location.plot_location_id in all queries, triggers, etc. in which this column appears.
2. For SALVIAS validations, complete and fix all pairs of queries in ascending order by query number. In other words, complete input query 1 and output query 1 before moving onto input query 2 and output query 2. Do this by running the SQL directly and inspecting the results; do not test by running queries in pipeline. It is important to ensure that the queries themselves are correct before adding them to the pipeline.
3. Add the queries to the pipeline in ascending order by query number, ensuring that each query pairs runs without error before moving onto the next query pair.
- Aaron will report back to Brad and the rest of the BIEN DB group as he completes each of tasks 1-3.

The following items need to be done before the end of March, but should NOT be done next week. Aaron needs to focus on completing quantitative validations next week.

Traits:
- Change the core DB schema and loading scripts to add the missing fields (marked with asterisks) to table trait
- Reload trait data from bien2_staging.TraitObservation in MySQL on nimoy after table contents have been refreshed by Brad. Brad will let Aaron know when the trait data have been refreshed.

Taxon names:
- All taxon names from all sources will be re-run through the TNRS using The Plant List as the taxonomic source
- This needs to be done after Brad has finished adding The Plant List to the TNRS, and before releasing the database. Brad will let Aaron know when the TNRS is ready
- Aaron estimates it will take at least 1 week to run all names through the TNRS

upcoming

  • the next regular conference call is next week at the usual time (Th. 9am PT/10am Tucson/12pm ET)

availability

  • please add your availability for spring 2014 to the *spreadsheet*:

Loading Google Spreadsheet...

decisions

SALVIAS aggregating validations

  • implement and test queries in the order they are numbered, instead of writing the blank queries first (Brad)
    • write blank queries if they are the next query in the sequence
  • test queries outside of the pipeline first

traits data

  • add TraitObservation unmapped columns to the trait table rather than mapping them to party (Brad) instead, these will be linked from dataset-<sourcecontributor once the attribution schema is in place: "Every taxonoccurrence is linked to a dataset [...] Traits, including those which do not have locations or locationevents, are thus linked to datasets" (Brad) .
  • OK to just add new columns to the version-controlled schema, and let them get added to the live DB in the next full-database import
  • Brad will load new data to bien2_staging.TraitObservation, rather than us importing it directly into VegBIEN from the CSVs, in the interest of time (Brad)

to do for Aaron

schema changes

  1. rename location.top_plot -> plot_location_id
    • in the live and testing databases, the triggers for each, the ERD, and the FAQ
  2. send e-mail that top_plot renamed
  3. rename top_plot view -> plot
    • in the live and testing databases, and the FAQ

SALVIAS aggregating validations

  1. test each query outside of the pipeline, in order
  2. send e-mail that queries written
  3. implement each query in the pipeline

traits data

  1. add TraitObservation unmapped columns to the trait table instead, these will be linked from dataset-<sourcecontributor once the attribution schema is in place: "Every taxonoccurrence is linked to a dataset [...] Traits, including those which do not have locations or locationevents, are thus linked to datasets" (Brad) .
  2. reload staging tables from bien2_staging.TraitObservation once Brad has updated it to the new traits data

TNRS client

  • add new source that Brad is adding