2012-04-09 conference call
To do
- reload all datasources
  - except those excluded by Brad
  - investigate host vs. VM performance first?: VM is actually 9% faster
- import CTFS VegX
  - raw data instead
- parallelize Python import scripts
  - using column-based import instead
  - split time between this and importing CTFS
  - will allow SpeciesLink to be imported much faster by using all cores at once
- automate validation of new data sources
  - replace column names for each datasource in a SQL validation script that uses DwC2 names
- serialization of VegX
  - CSV?
  - JSON?
- find John Donoghue's geo-validation scripts
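
For the "parallelize Python import scripts" item, a minimal sketch of how an import could be spread over all cores, assuming the input can be split into independent chunks of rows. `import_chunk`, `chunked`, and `parallel_import` are hypothetical names, not functions from the existing import scripts, and the stand-in worker only counts rows instead of writing to the database.

```python
from multiprocessing import Pool


def import_chunk(rows):
    """Hypothetical stand-in for the real per-chunk import; it only counts rows."""
    return len(rows)


def chunked(rows, size):
    """Split a list of rows into consecutive chunks of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def parallel_import(rows, chunk_size=1000, processes=None):
    """Import all chunks with a worker pool; processes=None uses every core."""
    with Pool(processes=processes) as pool:
        return sum(pool.map(import_chunk, chunked(rows, chunk_size)))


if __name__ == "__main__":
    rows = [{"id": i} for i in range(2500)]
    print(parallel_import(rows, chunk_size=1000))  # prints 2500
```

Whether this helps in practice depends on the database accepting concurrent inserts from several workers, which the notes do not settle.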
New data sources
- quality over quantity: do existing sources well first
- no new plots data until we have all BIEN2 data in VegBIEN
- TurboVeg
- U.S. National Parks data from Brad
- Argentina/-Chile-
- GBIF reimport
  - talk to Dave Remsen
- BCI DiGIR servers
Geo-validation
- sources of geographical names
- spelling errors in names
- names in different languages for the same place
- shapefiles (also correct their spelling errors)
- Biogeomancer
- TNRS: PHP/MySQL from Brad on nimoy
- Geoscrub: Python/PostgreSQL/R from John on eos, and PHP/MySQL from Brad on nimoy
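
The two name problems listed above (spelling errors, and different-language names for the same place) could be handled along these lines. This is only a sketch using Python's standard `difflib`; it is not how Geoscrub, Biogeomancer, or TNRS work, and the gazetteer entries and function names are made up for illustration.

```python
import difflib

# Hypothetical gazetteer: canonical name -> known alternate-language names.
GAZETTEER = {
    "Brazil": ["Brasil", "Brasilien"],
    "Germany": ["Deutschland", "Alemania"],
    "Argentina": [],
}


def build_lookup(gazetteer):
    """Map every lowercased canonical name and alias to its canonical name."""
    lookup = {}
    for canonical, aliases in gazetteer.items():
        for name in [canonical] + aliases:
            lookup[name.lower()] = canonical
    return lookup


def resolve_place(name, lookup, cutoff=0.8):
    """Return the canonical name, tolerating small misspellings; None if no match."""
    key = name.strip().lower()
    if key in lookup:
        return lookup[key]
    # Fall back to fuzzy matching against all known names and aliases.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


lookup = build_lookup(GAZETTEER)
print(resolve_place("Brasil", lookup))    # alias match: Brazil
print(resolve_place("Argentna", lookup))  # misspelling: Argentina
```

The `cutoff` value is an assumption; a real gazetteer with many similar names would need a stricter threshold or manual review of fuzzy matches.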
Martha's notes