Refreshing ACAD, a Darwin Core datasource

what is needed from the user

  • updated extract (so we can go back to the raw data if needed)
  • extracted flat file(s) that should be imported

steps

underlined: user input needed (other steps can be automated)

this example refreshes ACAD

from README.TXT > Single datasource refresh

  1. connect to vegbiendev:
    ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
    
  2. obtain updated extract
  3. place extract in inputs/ACAD/_src/
  4. place extracted flat file(s) in the Specimen/ subdir
    • these should be CSV or TSV files
    • if the extract is split into segments and each segment repeats the header, the header must be exactly the same in every segment
  5. reload staging tables:
    rm=1 inputs/ACAD/run
    
  6. run column-based import:
    make inputs/ACAD/reimport_scrub by_col=1 & # runtime: 15 min ("14:45.73")
    tail -150 inputs/ACAD/*/logs/public.log.sql # view progress
    
  7. see README.TXT > Single datasource refresh > steps after reimport_scrub
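
The header requirement in step 4 can be checked before reloading the staging tables. The following is a minimal sketch only: check_headers is a hypothetical helper, not part of the repository, and it assumes the flat files end in .csv or .tsv and that the first line of each file is its header.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): returns 0 if every
# .csv/.tsv file in the given directory starts with the same header line,
# and 1 otherwise. The file extensions are an assumption.
check_headers() {
    dir=$1
    first_file=
    first_header=
    status=0
    for f in "$dir"/*.csv "$dir"/*.tsv; do
        [ -f "$f" ] || continue          # skip glob patterns that matched nothing
        header=$(head -n 1 "$f")         # first line only; a stray CR counts as a difference
        if [ -z "$first_file" ]; then
            first_file=$f
            first_header=$header
        elif [ "$header" != "$first_header" ]; then
            echo "header mismatch: $f differs from $first_file" >&2
            status=1
        fi
    done
    return $status
}
```

For example, assuming the Specimen/ subdir sits directly under inputs/ACAD/, running check_headers inputs/ACAD/Specimen before step 5 would report any segment whose header differs from the first file's. The comparison is byte-for-byte, so differences in quoting, column order, or line endings are all reported.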