Refreshing CVS, an MS Access plots datasource

what is needed from the user

  • updated extract (so we can go back to the raw data if needed)

steps

underlined steps: user input needed (the other steps can be automated)

from README.TXT > Single datasource refresh

  1. connect to vegbiendev:
    ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
    
  2. obtain updated extract
    • from Mike Lee
  3. place extract in inputs/CVS/_src/
  4. unzip the extract
  5. IMPORTANT: move previous versions of the extract out of the way:
    mv inputs/CVS/*.{data,schema}.sql inputs/CVS/_archive/
    
    otherwise, you will get strange errors when it tries to load new data on top of an old schema!
  6. see README.TXT > Datasource setup > For MS Access databases
    • for the .ini files, use inputs/CVS/_src/{data,schema}.sql.ini
  7. save the existing staging tables to revert to in case of errors:
    bin/psql_verbose_vegbien <<EOF
    ALTER SCHEMA "CVS" RENAME TO "CVS_prev";
    EOF
    
  8. reload staging tables:
    rm=1 inputs/CVS/run
    
  9. remove the previous staging tables:
    bin/psql_verbose_vegbien <<EOF
    DROP SCHEMA "CVS_prev" CASCADE;
    EOF
    
  10. run column-based import:
    make inputs/CVS/reimport_scrub by_col=1 &
    tail -n 150 inputs/CVS/*/logs/r[#].log.sql # view progress
    
  11. see README.TXT > Single datasource refresh > steps after reimport_scrub
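Steps 7-9 can be wrapped in one shell function that also performs the revert that step 7 exists for. This is a hypothetical sketch, not a script from the repository: the function name `refresh_staging`, the `PSQL` override variable, and the choice to drop a partially loaded schema before restoring `CVS_prev` are all assumptions. Run it from the repository root only after step 6, so it never touches the freshly generated .sql files.

```shell
# Hypothetical helper for steps 7-9 (not part of the repository).
# Run from the repository root, after step 6.
# PSQL is an assumed override hook (e.g. for a dry run); it defaults to the real wrapper.
refresh_staging() {
    local ds=${1:-CVS}
    local psql=${PSQL:-bin/psql_verbose_vegbien}

    # Step 7: keep the current staging tables so they can be restored on error.
    "$psql" <<EOF || return 1
ALTER SCHEMA "$ds" RENAME TO "${ds}_prev";
EOF

    # Step 8: reload the staging tables.
    if ! rm=1 "inputs/$ds/run"; then
        echo "refresh_staging: reload of $ds failed; restoring ${ds}_prev" >&2
        # Assumption: a failed run may leave a partial "$ds" schema behind,
        # so drop it (if present) before renaming the saved copy back.
        "$psql" <<EOF
DROP SCHEMA IF EXISTS "$ds" CASCADE;
ALTER SCHEMA "${ds}_prev" RENAME TO "$ds";
EOF
        return 1
    fi

    # Step 9: the reload succeeded, so the saved copy is no longer needed.
    "$psql" <<EOF
DROP SCHEMA "${ds}_prev" CASCADE;
EOF
}
```

Usage: `refresh_staging CVS`, then continue with step 10. The `|| return 1` after step 7 only helps if the psql wrapper exits nonzero on an SQL error; plain psql does that only when `ON_ERROR_STOP` is set, and whether bin/psql_verbose_vegbien does so is not stated here.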

runtimes

  • inputs/CVS/data.sql creation by MS Access to PostgreSQL: 30 min ("06:28:58".."07:00:04")
  • rm=1 inputs/CVS/run: @vegbiendev: 40 min ("38m43.663s"); @frenzy: 17 min ("16m58.735s")
  • make inputs/CVS/reimport_scrub by_col=1 &: 1 day ("1.1 days")
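Runtimes recorded as wall-clock start and end times, like the data.sql creation above, can be turned into an elapsed time with a little shell arithmetic. A minimal sketch; the helper names `hms_to_s` and `elapsed` are hypothetical, and it assumes both timestamps fall on the same day.

```shell
# Convert an HH:MM:SS wall-clock time to seconds since midnight.
# 10# forces base 10 so fields such as "08" or "09" are not parsed as octal.
hms_to_s() {
    local h m s
    IFS=: read -r h m s <<<"$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Print the elapsed time between two same-day HH:MM:SS timestamps as "XmYs".
elapsed() {
    local d=$(( $(hms_to_s "$2") - $(hms_to_s "$1") ))
    echo "$(( d / 60 ))m$(( d % 60 ))s"
}
```

For example, `elapsed 06:28:58 07:00:04` prints `31m6s`, which the list above rounds to 30 min.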