2012-06-01 conference call

Update

  • Column-based import is now documented on the wiki under Column-based import with the SQL steps for sample columns.
    • The examples use the QMOR dataset, which is on vegbiendev in vegbien's QMOR.specimens table (access instructions on the wiki under PhpPgAdmin).
  • I simplified the import process by translating "relational functions" into plain SQL functions. This sped up the QMOR import by 5x, so that it is now 122x faster than row-based import. New benchmarks are on the wiki under Column-based import.
  • I made the SQL code more self-documenting by taking advantage of PostgreSQL's support for special characters in names.
  • To import non-CSV inputs into staging tables, I will need to add equivalent scripts for DB and XML inputs (SALVIAS and CTFS). Note that SALVIAS's CSV format is already supported for individual SALVIAS downloads.
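
As a rough illustration of the difference between row-based and column-based import described above, here is a minimal sketch. It is not the project's actual import code: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table names (staging, taxon), the column name "scientific name (verbatim)", and the sample rows are all hypothetical. The quoted column name is only meant to show the kind of descriptive, special-character identifier that SQL quoting allows.

```python
import sqlite3

# Hypothetical sample data; not taken from the QMOR dataset.
SAMPLE_ROWS = [("Quercus alba",), ("Quercus rubra",), ("Quercus alba",)]


def make_db():
    """Create an in-memory database with a staging table and an empty target table."""
    conn = sqlite3.connect(":memory:")
    # Double-quoted identifiers may contain spaces and punctuation.
    conn.execute('CREATE TABLE staging ("scientific name (verbatim)" TEXT)')
    conn.execute("CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.executemany("INSERT INTO staging VALUES (?)", SAMPLE_ROWS)
    return conn


def row_based_import(conn):
    """Row-based: fetch every staging row and issue one INSERT per row."""
    rows = conn.execute('SELECT "scientific name (verbatim)" FROM staging').fetchall()
    for (name,) in rows:
        conn.execute("INSERT OR IGNORE INTO taxon (name) VALUES (?)", (name,))


def column_based_import(conn):
    """Column-based: import the whole column with a single set-based statement."""
    conn.execute(
        "INSERT OR IGNORE INTO taxon (name) "
        'SELECT DISTINCT "scientific name (verbatim)" FROM staging'
    )


def taxon_names(conn):
    """Return the imported taxon names in sorted order."""
    return [name for (name,) in conn.execute("SELECT name FROM taxon ORDER BY name")]


row_db, col_db = make_db(), make_db()
row_based_import(row_db)
column_based_import(col_db)
print(taxon_names(row_db) == taxon_names(col_db))
```

Both approaches end with the same rows in the target table. The column-based version lets the database do the work in one statement instead of one round trip per input row, which is the general reason such an approach can be much faster on large inputs.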

Agenda

  • Update on column-based import
  • Update milestones timeline

To Do

  1. Discuss problems with large VegX files
  2. Finish translating XML functions to SQL functions for column-based import
  3. Reimplement row-based logging mechanism for column-based import
  4. Reload DB using column-based import
  5. Load all plots data
  6. Generic data provider feedback mechanism
  7. Translate existing validation utilities to Python/Postgres (presumably this refers to the BIEN2 standardizations?)

Notes

  • The milestones development timeline has been updated to reflect the month we spent on optimization and column-based import
  • Brad will be gone until late July
    • He will be available via e-mail (nightly?), except for one week of "radio blackout"