2013-06-20 conference call

Decisions made during the call

  • Switching to new-style import is now a higher priority than range modeling (previously it was the other way around)
    • Brian is OK with this (he is John's supervisor)
  • Validations should be added to the normalized database rather than the denormalized full_occurrence table
    • Although it is easier to add them to full_occurrence, adding them to the normalized database makes them available in the normalized portion of the database as well
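
As a rough illustration of what a validation in the normalized database could look like, the sketch below uses Python's built-in sqlite3 module with a hypothetical two-table schema. The table and column names (location, occurrence, latitude, longitude) and the full_occurrence_demo view are assumptions made for this example, not the actual VegBIEN schema, and SQLite is used only because it ships with Python. The point is just that a constraint on a normalized table rejects bad rows before they can reach any denormalized output derived from it.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical normalized schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical normalized table; the CHECK constraints are the validations.
        CREATE TABLE location (
            location_id INTEGER PRIMARY KEY,
            latitude  REAL CHECK (latitude  BETWEEN -90  AND 90),
            longitude REAL CHECK (longitude BETWEEN -180 AND 180)
        );
        CREATE TABLE occurrence (
            occurrence_id INTEGER PRIMARY KEY,
            location_id INTEGER NOT NULL REFERENCES location (location_id),
            taxon TEXT NOT NULL
        );
        -- Stand-in for a denormalized output built from the normalized tables.
        CREATE VIEW full_occurrence_demo AS
            SELECT o.occurrence_id, o.taxon, l.latitude, l.longitude
            FROM occurrence AS o
            JOIN location AS l ON l.location_id = o.location_id;
        """
    )
    return conn


def try_insert_location(conn, latitude, longitude):
    """Insert a location; return its id, or None if a validation rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO location (latitude, longitude) VALUES (?, ?)",
                (latitude, longitude),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


if __name__ == "__main__":
    conn = build_demo_db()
    good_id = try_insert_location(conn, 45.0, -120.0)
    bad_id = try_insert_location(conn, 123.0, 10.0)  # latitude out of range
    print("valid row id:", good_id, "| invalid row id:", bad_id)
```

Because the check lives on the normalized table, anything built from it (here, the demo view) only ever sees rows that already passed validation.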

New-style import

see New-style import

Decisions we still need to make

New-style import

  1. Do we prefer stage II or stage IV validations?
    Stage II is probably better, because the derived columns will be created without needing to go through all of column-based import.
    Also, stage II validations support the refactor-in-place method of translating datasources to VegBIEN, which uses the existing staging tables as the output tables. The alternatives are either running all datasources into the same set of output tables (which makes removing a datasource difficult) or running each datasource into its own set of VegBIEN tables, which would add ~70 VegBIEN tables for each of the ~40 datasources (~2,800 tables in total).
    • Is it important to have the derived columns in the staging tables?
    • Is it important for the database to be able to add the derived columns automatically?
  2. Should we allow stage III validations during the normalization step?
    probably not, but we can keep existing ones for now

To do for Aaron

  1. Complete new-style import diagram

New-style import

see Switching to new-style import

Range modeling

  • now a lower priority than new-style import

Misc

From Martha:

Aaron, just add the following items to your overall ‘To Do’ list and implement them at the appropriate step in the desired (new) db building process:

  • File a JIRA request to structure the TNRS metadata text file into columns. (I instead added a comment to the existing JIRA request for TNRS metadata columns, saying that we would also be happy with the Download settings file itself in CSV format.)
  • For homonyms, regarding Aaron’s addition of using author name, Aaron will add a threshold of >0.6
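
A minimal sketch of how the >0.6 threshold might be applied is shown below. The notes do not say what the score measures or how it is computed, so this assumes each candidate match carries a numeric author-name match score; the function name, the author_score field, and the sample records are all hypothetical.

```python
AUTHOR_MATCH_THRESHOLD = 0.6  # from the notes: a threshold of >0.6


def filter_by_author_score(candidates, threshold=AUTHOR_MATCH_THRESHOLD):
    """Keep candidates whose author-name score is strictly above the threshold.

    `candidates` is a list of dicts with a hypothetical "author_score" key.
    """
    return [c for c in candidates if c["author_score"] > threshold]


if __name__ == "__main__":
    # Made-up candidate matches for one homonym, for illustration only.
    candidates = [
        {"match": "candidate A", "author_score": 0.95},
        {"match": "candidate B", "author_score": 0.60},
        {"match": "candidate C", "author_score": 0.30},
    ]
    for kept in filter_by_author_score(candidates):
        print(kept["match"], kept["author_score"])
```

The comparison is strict, so a candidate scoring exactly 0.6 is dropped, which follows the ">0.6" wording in the notes.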

To Do for Others

  • Martha: Will reschedule the call the week of July 4th by polling for Tuesday or Wednesday.
  • Brad: Will send Aaron the query to build a lookup table of subclasses and families.
  • Aaron and Martha: Will edit Brad’s slides for the desired process (workflow) to build the normalized BIEN database.
  • Martha: Will send the slides to everyone. (Aaron sent them.)
  • All: Review the database building process in the slides and post questions and comments on the Redmine wiki for discussion during a future call: Db process comments
    Comments were instead added to the import process PowerPoint.
  • Martha: Will follow up with Naim on Aaron’s TNRS request.

Martha's Notes

notes

Availability

  • Brad available this summer on June 27-28; week of July 4; July 25; Aug 15
    • will talk to Martha on June 28
    • will talk to Aaron on week of July 4
    • reachable by phone and possibly laptop cell modem
    • Ramona Walls will be joining the group to help provide scientific advice since Brad will be less available in July and August. (corrected by Martha)
      • she has botany and ontology expertise and can provide domain knowledge
      • she will be on the conference calls and is on the BIEN mailing list
  • Martha will be gone this Friday afternoon
  • Bob won't be available