Project

General

Profile

Mapping a new table in VegBank--a SQL plots datasource

what is needed from the user

  1. datasource folder name
  2. table name
  3. mappings to VegCore
  4. code to create source-specific derived columns
  5. position to add table in import_order.txt
  6. position to add table in left-join
  7. position to add columns in validation subset

steps

underlined: user input needed (other steps can be automated)

1. set vars

  1. start a subshell so the vars only affect these commands:
    $0
    
  2. set vars:
    datasrc=VegBank table=project
    

2. map the table

  1. add subdir:
    make inputs/$datasrc/$table/add
    "cp" -f inputs/.NCBI/nodes/run inputs/$datasrc/$table/ # add new-style import files
    
  2. install the staging table and create the map spreadsheet:
    inputs/$datasrc/$table/run
    
  3. prevent column collisions upon left-join4:
    bin/in_place inputs/$datasrc/$table/map.csv sed -r "s/,\*/,*$table--/g" 
    
  4. map the columns in inputs/$datasrc/$table/map.csv to terms in the VegCore data dictionary
  5. rename staging table columns to VegCore:
    inputs/$datasrc/$table/run
    
  6. if any datasource-specific postprocessing is needed:
    1. add postprocess.sql:
      "cp" -f inputs/VegBank/plot_/postprocess.sql inputs/$datasrc/$table/
      
    2. in postprocess.sql, modify the existing code to fit the datasource. remove any template code that doesn't apply.
    3. run the postprocessing:
      inputs/$datasrc/$table/run
      
  7. in inputs/$datasrc/import_order.txt, insert $table in dependency order
  8. commit:
    make inputs/$datasrc/add # place files under version control
    svn di # make sure the test output is correct
    svn st # make sure all non-data files are under version control
    svn ci -m "mapped inputs/$datasrc/$table/" 
    

3. add the table to the left-join

  1. in inputs/$datasrc/taxon_observation.**/postprocess.sql, insert the following at the correct position:
    NATURAL LEFT JOIN $table
    
  2. update the left-join view:
    inputs/$datasrc/taxon_observation.\*\*/run
    
  3. commit:
    svn di # make sure the test output is correct
    svn ci -m "inputs/$datasrc/taxon_observation.**/postprocess.sql: added the $table table" 
    

4. add new columns to the validation subset

  1. in inputs/$datasrc/$table/map.csv, note all output columns that don't start with *
  2. in inputs/$datasrc/^taxon_observation.**.sample/create.sql and map.csv, insert these columns at the appropriate position
  3. update the materialized view:
    rm=1 inputs/$datasrc/\^taxon_observation.\*\*.sample/run
    
  4. commit:
    svn di # make sure the test output is correct
    svn ci -m "inputs/$datasrc/^taxon_observation.**.sample/create.sql, map.csv: added new $table columns" 
    

5. sync vegbiendev

  1. switch to vegbiendev:
    ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
    
  2. set vars as above
  3. apply staging table changes:
    svn up
    inputs/$datasrc/$table/run
    inputs/$datasrc/taxon_observation.\*\*/run
    rm=1 inputs/$datasrc/\^taxon_observation.\*\*.sample/run