Adding new-style import to a datasource

1. Translate filters to postprocessing derived columns

  1. add postprocess.sql containing the following:
    SELECT util.search_path_append('util');
    
  2. in postprocess.sql, add
    SELECT mk_derived_col((:table_str, 'derived_col'),
    $$formula (using the *original* column names)$$)
    ; -- runtime: # s ("# ms") @hostname
    
  3. if any columns used in the filter have the same name as a VegCore output column, also add the following:
    SELECT util.rename_cols(:table_str, $$
    input_col => *input_col,
    $$::hstore);
    
  4. in map.csv:
    • for a single-column filter: append _verbatim to the output name of the original column and remove the filter
    • for a multi-column filter: append a distinguishing suffix to the output name of each original column and remove the filter
  5. at the very end of map.csv (after any rows starting with : ), add a mapping for the new derived column (needed before switching to new-style import so that the derived column is included in VegBIEN.csv):
    derived_column,derived_column,,
    
  6. run:
    # if you get errors about lowercase schema names, this is a Mac filesystem bug and is fixed by renaming the datasource folder to something else and then back to the capitalized name
    make inputs/$dest/$subdir/postprocess
    make inputs/$dest/$subdir/test
    # check that the inserted row count does not change. if it changes, you are missing a map.csv entry for the derived column (above).
    
  7. IMPORTANT: run:
    make inputs/$dest/add
    
  8. commit:
    svn di
    svn ci -m 'inputs/$dest/$subdir/: translated filters to postprocessing derived columns, using the steps at http://vegbiendev.nceas.ucsb.edu/wiki/Adding_new-style_import_to_a_datasource#1-Translate-filters-to-postprocessing-derived-columns'
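
The map.csv edit in step 5 above (appending a derived_column,derived_column,, row at the very end) can be sketched as a small shell helper. This is only an illustration: the function name add_derived_mapping and its arguments are hypothetical and not part of the project's tooling, and it assumes the row format shown in step 5.

```shell
#!/usr/bin/env bash
# Sketch only: append a "col,col,," mapping row to the end of a map.csv,
# unless that exact row is already present.
# add_derived_mapping is a hypothetical helper, not a project command.
add_derived_mapping() {
    local map_csv=$1 col=$2
    local row="$col,$col,,"
    # -x matches the whole line; -F treats the row as a literal string
    if grep -qxF -- "$row" "$map_csv"; then
        echo "already mapped: $col"
    else
        # add a final newline first if the file does not end with one
        [ -n "$(tail -c1 "$map_csv")" ] && echo >> "$map_csv"
        echo "$row" >> "$map_csv"
        echo "added: $row"
    fi
}
```

Possible usage: add_derived_mapping inputs/$dest/$subdir/map.csv derived_col. Because the row is always appended, it lands after any rows starting with : as step 5 requires; the row count check in step 6 is still needed to confirm that the mapping took effect.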
    

2. Switch to runscripts that rename staging table columns

  1. IMPORTANT: remove the mappings for any derived columns from each map.csv (they will not cause an error locally because the derived columns are already in the staging table, but will break on vegbiendev)
  2. locally:
    (src=.NCBI dest=___
    "cp" -f inputs/$src/run inputs/$dest/
    for subdir in $(cat inputs/$dest/import_order.txt); do
        "cp" -f inputs/$src/nodes/run inputs/$dest/$subdir/
        vrm inputs/$dest/$subdir/VegBIEN.csv
        to inputs/$dest/$subdir/map.csv # needed for postprocess.sql to be remade
        make inputs/$dest/$subdir/postprocess.sql 
    done
    rv inputs/$dest/Source/VegBIEN.csv
    )
    
    • ignore benign errors of the form .../postprocess.sql: No such file or directory, which do not affect the exit status
  3. move any util.rename_cols() renamings in each postprocess.sql to map.csv, if they are not there already
  4. check that each postprocess.sql's columns have been properly mapped to the output column names
    1. svn di inputs/$dest/*/postprocess.sql
    2. remove any util.rename_cols() renames of input columns that are also VegCore output columns
    3. undo renames of output columns (on the right in map.csv) that are also input columns (on the left in map.csv) (cross-namespace collisions)
    4. add renames of input columns that are also output columns (cross-namespace collisions)
    5. enclose camelCase terms in quotes if the term they replaced did not have them
  5. (dest=___; inputs/$dest/run postprocess) # runtime: ~1 min
    • make sure that the renames the runscript performs are correct, and that input and output column names match up. the renames are logged in the form:
      NOTICE:  ALTER TABLE table RENAME "column" TO "*column" 
      
    • fix bugs and repeat until no errors
    • if you get the error "ERROR: arrays must have same bounds" and map.csv contains metadata values (lines starting with :):
      1. drop all derived columns that have an input column of the same name (the input column will now have a * prefix)
      2. run:
        inputs/$dest/$subdir/run , reset_col_names trim_table # reset_col_names causes trim_table to preserve input-namespace instead of output-namespace columns
        
    • ignore benign errors that do not affect the exit status
  6. (dest=___; make inputs/$dest/add)
  7. commit:
    svn di inputs/$dest/*/test.xml.ref
    # make sure the inserted row counts (in test.xml.ref) have stayed the same
    svn di
    svn ci -m 'inputs/$dest/: switched to new-style import, using the steps at http://vegbiendev.nceas.ucsb.edu/wiki/Adding_new-style_import_to_a_datasource'
    
  8. on vegbiendev:
    svn up
    inputs/$dest/run postprocess # runtime: 4+ min (derived columns are also added in this step)
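
When reviewing the renames in step 5 of this section, it can help to pull the rename notices out of the log and read them as a short list of old and new names. The following is a minimal sketch: list_renames is a hypothetical helper, and it assumes that each notice sits on its own line in exactly the NOTICE:  ALTER TABLE ... RENAME "..." TO "..." form shown in that step.

```shell
#!/usr/bin/env bash
# Sketch only: read log text on stdin and print "old -> new" for every
# column-rename NOTICE; all other lines are dropped.
# list_renames is a hypothetical helper, not a project command.
list_renames() {
    sed -n 's/^NOTICE:  *ALTER TABLE .* RENAME "\(.*\)" TO "\(.*\)" *$/\1 -> \2/p'
}
```

Possible usage, assuming the notices reach stdout or stderr: inputs/$dest/run postprocess 2>&1 | list_renames. The output still has to be compared by eye against map.csv; the helper only makes the renames easier to scan.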