2013-09-12 conference call

with some edits by Martha

Upcoming

  • call next week at the usual time (Thursday, 9am PDT/Tucson)

Availability

  • Brad won't be able to respond to e-mails until after Monday because he has a huge deadline
  • See the *Google spreadsheet* (and please add your availability for future weeks once it's known).

Decisions made

VegBIEN will not be public

  • the release for the October deadline should just be within the BIEN group, not public (Brian)

datasource validations

  • refactoring is a higher priority than validation (Brad, Martha)
    • Clarification (Martha): This refers only to VegBank and CVS, not to all data sources.
  • VegBank is a higher priority because CVS depends on it (Brian)
    • Clarification (Martha):
      • Aaron will refactor VegBank and CVS to load using the new architecture (process).
      • This is a change to the decision we made a few weeks ago not to refactor code for data sources that were already loading into VegBIEN.
      • The reason for changing this decision is that Aaron said refactoring will be more efficient than continuing with the current loading process, given the major validation issues.
      • After the call, Martha asked Aaron how long this would take, to make sure it wouldn't consume an inordinate share of the remaining time. Aaron's estimate: "For just VegBank and CVS, more likely on the order of a few hours each for the left-joins, and a few hours to a day each for switching to new-style import."

attribution and conditions of use

  • putting together the attribution table is not a showstopper; we just need a solution by the October deadline

Possible tasks for iPlant developer (from Martha)

  • Geoscrubbing is something an iPlant developer could do.
  • Data provider attribution is NOT something an iPlant developer could do.

To do for Brad, Brian, Bob

data provider metadata

  • have a separate meeting to determine the use conditions of each datasource
    • this may require going back through old e-mails to see what each data provider said when they provided us with their data
  • Clarification (Martha):
    • Brian, Brad, and Bob to have a separate call to discuss data provider metadata and attribution
      • The schema changes to support this need to be in place.
      • Please decide/clarify whether this needs to be implemented in the workflow by the October deadline. (It wasn't clear to MN from the discussion on the call.)

To do for Bob

datasource validations

  • validate CVS when extract is ready

To do for Martha

geoscrubbing

  • find an iPlant person who can run Jim's scripts
    "Prospects are very good for getting someone to help with automating the geoscrubbing pipeline (less sure about running it as needed in the mean time), but, of course the guys first need to look into the documentation and code. The earliest someone can really dig in and take a look will be October 1." (Martha)
    • modifications needed:
      • make sure the scripts are "headless" (fully automated)
      • clear out tables in between runs
      • trigger the scripts when a new dataset comes in
      • don't re-scrub already-scrubbed data
  • Clarification (Martha): Martha to see if and how soon an iPlant developer could work on this.

BIEN2 queries

  • find someone besides Aaron/Brad to run these for Brian
    Martha: There is currently no solution to this problem, and it is not in scope for iPlant. The best we can do for now is for Brian to make requests only when absolutely necessary (as this one was) and to ask Aaron how much time a request will take to fulfill before giving him the go-ahead.
  • From Brad (added by Martha):
    • I am happy to run BIEN2 queries. So long as I am not up against a "huge deadline" such as this coming Monday, I am glad to run queries against the BIEN2 database. I know it well and can do so quickly. It takes very little of my time.
    • I cannot run queries on the BIEN3 database until it is complete. Once we have completed BIEN3 development, I should be able to run queries on the normalized BIEN3 database as well, as long as the demands on my time are not excessive.

To do for Aaron

datasource validations

  1. add a column to the datasources table indicating whether each datasource has been switched to new-style import
  2. switch VegBank, CVS to new-style import
  3. flatten VegBank, CVS
    • by denormalizing, we hope to avoid some of the validation bugs that normalized datasources tend to have
    • this also avoids the need to run the left-join as part of the validation process, because the data will already be left-joined
  4. fix remaining VegBank issues
  5. send Bob a CVS extract
  6. continue working on the remaining data sources that have validation issues (Martha)
    • Email the group if you need input.
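
Steps 1 and 3 above could look roughly like the following. This is a hedged sketch, not the VegBIEN code: it uses SQLite only to stay self-contained, and the column name `new_style_import`, the tables `plots`, `observations`, and `flat`, and the sample rows are hypothetical. It shows the flag column and a one-time LEFT JOIN that produces a flat table, so validation queries no longer need to perform the join themselves.

```python
import sqlite3

def add_import_flag(conn):
    # Step 1: flag column; 0 = old-style import, 1 = new-style import
    conn.execute("ALTER TABLE datasources "
                 "ADD COLUMN new_style_import INTEGER NOT NULL DEFAULT 0")

def mark_switched(conn, name):
    # Record that a datasource now loads via new-style import (step 2)
    conn.execute("UPDATE datasources SET new_style_import = 1 WHERE name = ?",
                 (name,))

def flatten(conn):
    # Step 3: denormalize once with a LEFT JOIN; plots without
    # observations are kept, with NULL observation columns
    conn.executescript("""
        DROP TABLE IF EXISTS flat;
        CREATE TABLE flat AS
        SELECT p.plot_id, p.locality, o.obs_id, o.taxon
        FROM plots AS p
        LEFT JOIN observations AS o ON o.plot_id = p.plot_id;
    """)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datasources (name TEXT PRIMARY KEY);
    INSERT INTO datasources VALUES ('VegBank'), ('CVS');
    -- Hypothetical normalized source tables with sample rows
    CREATE TABLE plots (plot_id INTEGER PRIMARY KEY, locality TEXT);
    CREATE TABLE observations (obs_id INTEGER PRIMARY KEY, plot_id INTEGER, taxon TEXT);
    INSERT INTO plots VALUES (1, 'Plot A'), (2, 'Plot B');
    INSERT INTO observations VALUES (10, 1, 'Quercus alba');
""")
add_import_flag(conn)
mark_switched(conn, "VegBank")
flatten(conn)
print(conn.execute("SELECT * FROM flat ORDER BY plot_id").fetchall())
# [(1, 'Plot A', 10, 'Quercus alba'), (2, 'Plot B', None, None)]
```

The row for "Plot B" illustrates why a LEFT JOIN (rather than an inner join) is used: records without matching child rows stay visible to validation instead of silently disappearing.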

data provider metadata

  • check that the schema supports the necessary metadata fields
    • sourcecontributor needs a field for individual contributors' data use conditions
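
The schema check could be scripted along these lines. This is a minimal sketch assuming SQLite (`PRAGMA table_info` returns one row per column, with the column name at index 1); on PostgreSQL the equivalent lookup would query `information_schema.columns`. The column name `use_conditions` and the other `sourcecontributor` columns are hypothetical placeholders, not the actual VegBIEN schema.

```python
import sqlite3

# Hypothetical required fields, keyed by table name (trusted constants)
REQUIRED = {"sourcecontributor": {"use_conditions"}}

def missing_columns(conn, required):
    """Return {table: [missing column names]} for tables lacking fields."""
    missing = {}
    for table, cols in required.items():
        # Index 1 of each PRAGMA table_info row is the column name;
        # a nonexistent table yields no rows, so all columns count as missing
        present = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        gap = cols - present
        if gap:
            missing[table] = sorted(gap)
    return missing

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sourcecontributor (contributor_id INTEGER, source TEXT)")
print(missing_columns(conn, REQUIRED))  # {'sourcecontributor': ['use_conditions']}
conn.execute("ALTER TABLE sourcecontributor ADD COLUMN use_conditions TEXT")
print(missing_columns(conn, REQUIRED))  # {}
```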

Addition (Martha): remaining data loading/validation

  • When the above tasks are completed, continue working on the remaining data sources that have validation issues.
  • Email the group if you need input.

Martha's notes