Data loading scripts

Locations

  • nimoy
    • /home/bien_shared/bien2_scripts/
    • /home/dolins/Scripts/
  • https://code.nceas.ucsb.edu/code/projects/bien/ (UI)
    • bienDjango, makeVegxModel, tapir2flatClient

Databanks

  • nimoy:/home/bien_shared/bien2_scripts/data_loading_scripts/specimens/ (PHP)
    • ariz, ncu-ncsc, ny, remib, specieslink, uncc, utrecht
    • raw data: nimoy:/home/bien_shared/raw_data/
      • index: nimoy:/home/bien_shared/raw_data/bien_data_sources.xlsx
  • nimoy:/home/dolins/Scripts/ (PHP)
    • GBIF, MOBOT, FIA

Data formats

  • VegX: https://code.nceas.ucsb.edu/code/projects/bien/makeVegxModel/trunk/
    • contact: Matt Wheeler
    • requires GeoDjango
    • usage: php makeModel.php | python
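
The pipe above suggests that makeModel.php writes Python source to stdout (GeoDjango model definitions built from the VegX schema), which the python interpreter then executes. The block below is only a hypothetical illustration of the kind of code such a generator might emit; the real class and field names are determined by makeModel.php and the VegX schema.

    # Hypothetical illustration only -- not the actual output of makeModel.php.
    # Class and field names are invented; code like this would live in a Django
    # app's models.py, with GeoDjango (django.contrib.gis) installed.
    from django.contrib.gis.db import models


    class PlotObservation(models.Model):
        # Ordinary VegX attributes map to regular Django fields.
        plot_name = models.CharField(max_length=255)
        observation_date = models.DateField(null=True, blank=True)
        # Plot coordinates map to a geometry column, which is why GeoDjango
        # is required.
        location = models.PointField(srid=4326, null=True, blank=True)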

Taxonomy

  • nimoy:/home/bien_shared/bien2_scripts/geoscrub/ (PHP)

Geovalidation

  • nimoy:/home/bien_shared/bien2_scripts/geoscrub/ (PHP)

Traits

  • nimoy:/home/bien_shared/bien2_scripts/data_loading_scripts/traits/load_traits/ (PHP)
    • import.php

Loading process

E-mail from Brad Boyle on 2011-10-18:

"Data was loaded in two steps: (1) raw data loaded to staging tables (see tables in database bien_staging) and then (2) loaded in a separate step from staging tables to the core db tables in bien2.

Steven Dolins and his students did the step 1 loading for FIA and CTFS (plots) and MO+GBIF (specimens); I did the rest (see the "loaded by" column in that spreadsheet I sent you). Step 2 was done entirely by Steve and his students.

The specimen data is uniform largely because it was provided to us as DwC files (the beauty of exchange schemas). Also because specimen data is inherently more uniform. You can find my Step 1 loading scripts in bien_shared/bien2_scripts/data_loading_scripts/. I believe some or all of Steve's loading scripts are in his directory (home/dolins/). Steve, do you mind if Aaron looks over your loading scripts?

The plot data loaded by me (to staging) is more complicated. Some of it was provided as Excel spreadsheets, some as plain text files, some as Access databases (salvias, vegbank, Carolina Vegetation Survey). Sorting out what the fields meant involved a lot of correspondence with data providers. I'm pretty sure I loaded most of the plot data first to MS Access, then transferred it to staging tables using ODBC. Scripting would have been better, but most of the loading was done in haste just prior to the 2009 BIEN meeting.

For some of the plots data (ci-team, madidi) the data in the raw_data directory is original, as provided. For other data (salvias, vegbank) I've provided the partially processed data already loaded to the staging tables, as the data as delivered was originally in MS Access databases. If you are interested in looking at the data in that format and have a copy of MS Access (2003 version would do) let me know and I will provide the Access databases as well.

I believe the plot data loaded by Steve, and the loading scripts, are in his directory."
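
For orientation, the sketch below illustrates the two-step pattern the e-mail describes (step 1: raw data into staging tables in bien_staging; step 2: staging tables into the core bien2 tables). It is not the actual BIEN loading code, which lives in the script directories listed above; the PostgreSQL driver and every table and column name are assumptions made purely for illustration.

    # Minimal sketch of the two-step load described above -- NOT the actual
    # BIEN scripts. psycopg2/PostgreSQL and all table/column names are
    # assumptions for illustration only.
    import csv
    import psycopg2

    staging = psycopg2.connect(dbname="bien_staging")  # hypothetical connection
    core = psycopg2.connect(dbname="bien2")            # hypothetical connection

    # Step 1: raw data (here, a DwC-style CSV) -> staging table, with columns
    # kept roughly as delivered by the provider.
    with staging, staging.cursor() as cur, open("specimens_dwc.csv", newline="") as fh:
        for row in csv.DictReader(fh):
            cur.execute(
                "INSERT INTO specimen_raw (catalog_number, scientific_name, "
                "decimal_latitude, decimal_longitude) VALUES (%s, %s, %s, %s)",
                (row["catalogNumber"], row["scientificName"],
                 row["decimalLatitude"], row["decimalLongitude"]),
            )

    # Step 2: staging table -> core table, normalizing names and types.
    with core, staging.cursor() as read_cur, core.cursor() as write_cur:
        read_cur.execute(
            "SELECT catalog_number, scientific_name, decimal_latitude, "
            "decimal_longitude FROM specimen_raw"
        )
        for catalog_no, name, lat, lon in read_cur:
            write_cur.execute(
                "INSERT INTO specimen (catalog_number, taxon_name, latitude, "
                "longitude) VALUES (%s, %s, %s, %s)",
                (catalog_no, name, float(lat) if lat else None,
                 float(lon) if lon else None),
            )

    staging.close()
    core.close()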