Adding new-style import to a datasource
1. Translate filters to postprocessing derived columns
- add postprocess.sql containing the following:
  SELECT util.search_path_append('util');
- in postprocess.sql, add:
  SELECT mk_derived_col((:table_str, 'derived_col'), $$formula (using the *original* column names)$$); -- runtime: # s ("# ms") @hostname
- if any columns used in the filter have the same name as a VegCore output column, also add the following:
  SELECT util.rename_cols(:table_str, $$ input_col => *input_col, $$::hstore);
- in map.csv:
  - for a single-column filter: append _verbatim to the output name of the original column and remove the filter
  - for a multi-column filter: append a distinguishing suffix to the output name of each original column and remove the filter
- at the very end of map.csv (after any rows starting with ":"), add a mapping for the new derived column (needed before switching to new-style import so that the derived column is included in VegBIEN.csv):
  derived_column,derived_column,,
- run:
  # if you get errors about lowercase schema names, this is a Mac filesystem bug and is fixed by renaming the datasource folder to something else and then back to the capitalized name
  make inputs/$dest/$subdir/postprocess
  make inputs/$dest/$subdir/test # check that the inserted row count does not change. if it changes, you are missing a map.csv entry for the derived column (above).
- IMPORTANT: run:
  make inputs/$dest/add
- commit:
  svn di
  svn ci -m 'inputs/$dest/$subdir/: translated filters to postprocessing derived columns, using the steps at http://vegbiendev.nceas.ucsb.edu/wiki/Adding_new-style_import_to_a_datasource#1-Translate-filters-to-postprocessing-derived-columns'
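The map.csv step above (adding the derived-column mapping at the very end of the file) can be sketched as a small shell helper. This is only an illustration: `add_derived_mapping` is a hypothetical name, not part of the VegBIEN tooling, and the row format is taken from the `derived_column,derived_column,,` example above. The helper appends the row only if an identical row is not already present.

```shell
# Hypothetical helper (an assumption, not part of the VegBIEN tooling):
# append the "name,name,," mapping row for a derived column to the very
# end of a map.csv, unless an identical row is already there.
add_derived_mapping() {
    map_csv=$1
    col=$2
    row="$col,$col,,"
    # -x: match the whole line, -F: fixed string, -q: no output
    if grep -qxF -- "$row" "$map_csv"; then
        return 0
    fi
    # If the file is non-empty and does not end in a newline, add one,
    # so that the new row starts on its own line.
    if [ -s "$map_csv" ] && [ -n "$(tail -c 1 "$map_csv")" ]; then
        printf '\n' >> "$map_csv"
    fi
    printf '%s\n' "$row" >> "$map_csv"
}
```

It would be called as, for example, `add_derived_mapping inputs/$dest/$subdir/map.csv derived_col`. Because the row is appended, it lands after any rows starting with ":", as the step requires.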
2. Switch to runscripts that rename staging table columns
- IMPORTANT: remove any derived columns in each map.csv (they will not cause an error locally because the derived columns are already in the staging table, but will break on vegbiendev)
- locally:
  (src=.NCBI dest=___
  "cp" -f inputs/$src/run inputs/$dest/
  for subdir in $(cat inputs/$dest/import_order.txt); do
    "cp" -f inputs/$src/nodes/run inputs/$dest/$subdir/
    vrm inputs/$dest/$subdir/VegBIEN.csv
    to inputs/$dest/$subdir/map.csv # needed for postprocess.sql to be remade
    make inputs/$dest/$subdir/postprocess.sql
  done
  rv inputs/$dest/Source/VegBIEN.csv
  )
  - ignore benign errors of the form ".../postprocess.sql: No such file or directory", which do not affect the exit status
- move any util.rename_cols() renamings in each postprocess.sql to map.csv, if they are not there already
- check that each postprocess.sql's columns have been properly mapped to the output column names:
  svn di inputs/$dest/*/postprocess.sql
  - remove any util.rename_cols() renames of input columns that are also VegCore output columns
  - undo renames of output columns (on the right in map.csv) that are also input columns (on the left in map.csv) (cross-namespace collisions)
  - add renames of input columns that are also output columns (cross-namespace collisions)
  - enclose camelCase terms in quotes if the term they replaced did not have them
- run:
  (dest=___; inputs/$dest/run postprocess) # runtime: ~1 min
  - make sure that the renames it's performing are correct, and that input and output column names match up. the renames have the form:
    NOTICE: ALTER TABLE table RENAME "column" TO "*column"
  - fix bugs and repeat until no errors
  - if you get an error "ERROR: arrays must have same bounds", and map.csv contains metadata values (lines starting with ":"):
    - drop all derived columns that have an input column of the same name (the input column will now have a * prefix)
    - run:
      inputs/$dest/$subdir/run , reset_col_names trim_table # reset_col_names causes trim_table to preserve input-namespace instead of output-namespace columns
  - ignore benign errors that do not affect the exit status
- run:
  (dest=___; make inputs/$dest/add)
- commit:
  svn di inputs/$dest/*/test.xml.ref # make sure the inserted row counts (in test.xml.ref) have stayed the same
  svn di
  svn ci -m 'inputs/$dest/: switched to new-style import, using the steps at http://vegbiendev.nceas.ucsb.edu/wiki/Adding_new-style_import_to_a_datasource'
- on vegbiendev:
  svn up
  inputs/$dest/run postprocess # runtime: 4+ min (derived columns are also added in this step)
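Reviewing the renames that the postprocess run reports (the `NOTICE: ALTER TABLE ... RENAME ...` lines described above) is easier when they are pulled out of the rest of the output. The following is a minimal sketch: `list_renames` is a hypothetical helper, and it assumes the notices reach the runscript's stdout or stderr in the form quoted above. It only lists the renames for comparison against map.csv and does not judge whether a rename is correct.

```shell
# Hypothetical helper (an assumption, not part of the VegBIEN tooling):
# read log text on stdin and print one line per rename notice, in the
# form "table: old_name -> new_name". All other lines are dropped.
list_renames() {
    sed -n 's/^NOTICE: *ALTER TABLE \(.*\) RENAME "\([^"]*\)" TO "\([^"]*\)"$/\1: \2 -> \3/p'
}
```

A possible use, assuming the notices are written to the terminal by the runscript: `(dest=___; inputs/$dest/run postprocess) 2>&1 | list_renames`. Each output line can then be checked against the input (left) and output (right) column names in map.csv.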