2012-04-09 conference call

To do

  1. reload all datasources except those excluded by Brad
    • investigate host vs. VM performance first? (answered: the VM is actually 9% faster)
  2. import CTFS VegX raw data instead
  3. parallelize Python import scripts using column-based import instead
    • split time between this and importing CTFS
    • will allow SpeciesLink to be imported much faster by using all cores at once
  4. automate validation of new data sources
    • replace column names for each datasource in a SQL validation script that uses DwC2 names
  5. serialization of VegX
    • CSV? JSON?
  6. find John Donoghue's geo-validation scripts
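
To-do item 3 above can be sketched as follows. This is a minimal illustration only, not the actual BIEN import scripts: it assumes a datasource arrives as a mapping of column name to values, and loads each column in a separate worker process so a wide source like SpeciesLink can use every core at once.

```python
# Hypothetical column-based parallel import: one worker per column,
# one worker process per CPU core by default.
from multiprocessing import Pool

def import_column(args):
    """Illustrative worker: 'load' one column's values.

    A real importer would COPY these values into a per-column
    staging table instead of just counting them.
    """
    name, values = args
    return name, len(values)

def parallel_import(table):
    """table: dict mapping column name -> list of values."""
    with Pool() as pool:  # Pool() defaults to os.cpu_count() workers
        return dict(pool.map(import_column, table.items()))

rows = {"genus": ["Quercus", "Acer"], "species": ["alba", "rubrum"]}
print(parallel_import(rows))  # {'genus': 2, 'species': 2}
```

Because each column is independent, this parallelizes cleanly; the serial per-row import is what makes the current scripts slow on large sources.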
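
To-do item 4 (automated validation) could work along these lines. All table and column names here are illustrative assumptions, not the real schema: the validation query is written once against standard DwC2 names, and a per-datasource mapping substitutes that source's own column names before the query runs.

```python
# Hypothetical DwC2-named validation template; placeholders are
# replaced per datasource before execution.
VALIDATION_SQL = (
    "SELECT count(*) FROM {table} "
    "WHERE {decimalLatitude} NOT BETWEEN -90 AND 90 "
    "   OR {decimalLongitude} NOT BETWEEN -180 AND 180"
)

# Illustrative mapping for one datasource: DwC2 name -> source column.
SPECIESLINK_COLS = {
    "table": "specieslink_raw",
    "decimalLatitude": "latitude",
    "decimalLongitude": "longitude",
}

def render_validation(mapping):
    """Produce datasource-specific SQL from the DwC2-named template."""
    return VALIDATION_SQL.format(**mapping)

print(render_validation(SPECIESLINK_COLS))
```

Adding a new datasource then only requires a new column mapping, not a new validation script.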

New data sources

  • quality over quantity: do existing sources well first
  • no new plots data until we have all BIEN2 data in VegBIEN
  1. TurboVeg
  2. U.S. National Parks data from Brad
  3. Argentina/-Chile-
  4. GBIF reimport
    • talk to Dave Remsen
  5. BCI DiGIR servers

Geo-validation

  • sources of geographical names
  • spelling errors in names
  • names in different languages for the same place
  • shapefiles (also correct their spelling errors)
  • Biogeomancer
  • TNRS: PHP/MySQL from Brad on nimoy
  • Geoscrub: Python/PostgreSQL/R from John on eos, and PHP/MySQL from Brad on nimoy

Martha's notes