2013-08-22 conference call

Martha's notes

Upcoming

  • call next week at usual time (Th. 8:30am)
    • Brad will be available
  • fall conference calls will be Thursdays at 9am
    • would work for Bob and Brian E

Availability

  • See the *Google spreadsheet* (and please add your availability for future weeks once it's known)
  • Bob's fall teaching schedule is MWF 11am-12pm ET

Decisions made

  • "The intent is to be able to readily add new data sources (or reload individual sources)." (Martha)

datasource validations (spot-checking)

  • a lower priority than making it easy to add/reload data (Martha)
  • the issue is data vs. tools: is it more important to have a confederated database or the tools to create one? Brad: tools
    • maybe a case of giving someone a fish vs. teaching them how to fish, i.e. it is better to provide a workflow that allows nonprogrammers to fix the validation issues themselves

schema

  • the architecture must be frozen before the schema can be frozen

To do for Martha

  • ask Naim how to make TNRS version info available
    • it's currently only available as part of a separate text file, rather than part of the downloaded CSV (2013-06-13 conference call > include TNRS version and settings in TNRS cache)

To do for Aaron

timeline

  • flag the timeline issues that can be done by iPlant personnel. These are the following:
    • Attribution and conditions of use
    • Geoscrubbing re-run
    • Geoscrubbing automated pipeline
    • Improve and complete data provider metadata
    • Obtain any additional new data

add derived data version info

  • make schema changes to accommodate this info: TNRS, geoscrubbing
  • add timestamp, version, URI: TNRS, geoscrubbing (source code, GADM shapefiles, geonames.org data)
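
As a rough illustration of the timestamp/version/URI info listed above, one record per derived-data component might look like the sketch below. The class name, field names, and all values are hypothetical placeholders, not the actual BIEN schema; in particular the version string and URI are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DerivedDataVersion:
    """Version info for one derived-data component (hypothetical layout)."""
    step: str            # e.g. "TNRS" or "geoscrubbing"
    component: str       # e.g. "source code", "GADM shapefiles"
    version: str         # version label reported by the tool or dataset
    uri: str             # where that version was obtained
    timestamp: datetime  # when the derived data was generated


def describe(v: DerivedDataVersion) -> str:
    """Render one record as a single human-readable line."""
    return f"{v.step}/{v.component} {v.version} @ {v.timestamp.isoformat()} <{v.uri}>"


# Placeholder values only: the version string and URI are invented.
example = DerivedDataVersion(
    step="geoscrubbing",
    component="GADM shapefiles",
    version="x.y",
    uri="https://example.org/gadm",
    timestamp=datetime(2013, 8, 22, tzinfo=timezone.utc),
)
print(describe(example))
```

Keeping one record per component (rather than one per step) would let the source code, shapefiles, and geonames.org data each carry their own version.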

individual datasource removal

  • determine runtime of individual datasource removal: ACAD (medium): 30 s, 0.61 ms/row; MO (large): 55 min, 0.85 ms/row
    • what was it for CTFS? see ACAD runtime above instead
  • add fkey covering indexes where needed: see Adding covering indexes on foreign keys
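
To illustrate why an index on a foreign key column matters for datasource removal, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in (the real database engine, table names, and column names will differ; `datasource` and `observation` are hypothetical). Removing a datasource cascades to its child rows, and the index lets the engine find those rows without scanning the whole child table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades
conn.executescript("""
CREATE TABLE datasource (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE observation (
    id INTEGER PRIMARY KEY,
    datasource_id INTEGER NOT NULL
        REFERENCES datasource(id) ON DELETE CASCADE
);
-- Index on the foreign key column (hypothetical name).
CREATE INDEX observation_datasource_id_idx ON observation (datasource_id);
""")

conn.executemany(
    "INSERT INTO datasource (id, name) VALUES (?, ?)",
    [(1, "ACAD"), (2, "MO")],
)
conn.executemany(
    "INSERT INTO observation (datasource_id) VALUES (?)",
    [(1,)] * 3 + [(2,)] * 2,
)

# Ask the planner how it would locate one datasource's rows.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM observation WHERE datasource_id = ?",
    (1,),
).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)

# Remove one datasource; its observations are deleted by the cascade.
conn.execute("DELETE FROM datasource WHERE name = ?", ("ACAD",))
remaining = conn.execute("SELECT COUNT(*) FROM observation").fetchone()[0]
print(remaining)
```

The printed plan should mention the index by name; without the index, the same lookup would fall back to a full scan of the child table for each removal.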

Individual datasource refresh

source-level tracking of import and revision

  • see fields under "record-level tracking of import and revision" below
  • the schema revision # and import date (in the schema comment, search for "Version:") wouldn't be enough because datasources will be individually reloaded (Brad)

record-level tracking of import and revision

from Brad:

For full forward-compatibility, I would suggest the following four fields for every table:

datecreated
createdby
datelastmodified
lastmodifiedby

  • "In terms of importance, source-level tracking of import and revision is essential. Record-level tracking is desirable but less important. Make the latter change only after all other schema changes have been completed, and only if it won't interfere with meeting the major BIEN deadlines." (Brad)
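
One way to apply Brad's four fields uniformly would be to append them to every table definition from a single list. The sketch below is only an illustration: it uses SQLite as a stand-in, the column types and defaults are assumptions, and the `plot` table and the helper name `create_table_sql` are hypothetical.

```python
import sqlite3

# The four tracking fields suggested by Brad; types/defaults are assumptions.
TRACKING_COLUMNS = (
    ("datecreated", "TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP"),
    ("createdby", "TEXT NOT NULL"),
    ("datelastmodified", "TEXT"),
    ("lastmodifiedby", "TEXT"),
)


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement with the tracking columns appended."""
    all_columns = list(columns) + list(TRACKING_COLUMNS)
    body = ",\n    ".join(f"{name} {decl}" for name, decl in all_columns)
    return f"CREATE TABLE {table} (\n    {body}\n)"


conn = sqlite3.connect(":memory:")
conn.execute(
    create_table_sql("plot", [("id", "INTEGER PRIMARY KEY"), ("name", "TEXT")])
)

# List the resulting columns in definition order.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(plot)")]
print(column_names)
```

Defining the fields once keeps their names and types identical across tables, which matters if the change is deferred until after the other schema changes as Brad suggests.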