added derived/TNRS/Modifications to procedure to scrubbing names using TNRS.docx from Brad
added derived/TNRS/web_app/protocol/
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
derived/biengeo/README.txt: updated geoscrub.sh runtime
derived/biengeo/load-geoscrub-input.sh: allow the caller to override $DATAFILE in the environment, to use a file named other than "geoscrub-corpus.csv"
derived/biengeo/load-geoscrub-input.sh: updated $DATA_URL for new input filename
/run geoscrub_input/make(): include a header on the CSV file, so that the column names don't risk getting spliced from the data (and to shorten the CSV filename, which had to contain the column names instead). this requires changing the geoscrubbing scripts to accept a CSV header.
Added an output CSV file option to geoscrub.sh.
Added notes on running biengeo scripts to README.
Added biengeo script options for data directories.
Added GADM and geonames.org data dir options to update_validation_data.sh scripts. Added geoscrub input data dir option to geoscrub.sh scripts.
Added update options to biengeo update_validation_data.sh
Added options to update only GADM data, only Geonames.org data, or neither. In every case, the geonames-to-gadm scripts are always run.
Added cmd-line options to biengeo bash scripts.
All biengeo bash scripts now accept command line options to specify psql user, host, and database values. These options are the same as those defined by the psql command. If an invalid option is given to a script, a usage message is printed...
Fix biengeo script password prompt for postgres user.
Changed the DB_HOST variables in the biengeo bash scripts to a DB_HOST_OPT variable that is blank by default. Updated all psql calls that used "-h $DB_HOST" to use just $DB_HOST_OPT instead. This means that to specify a different db host, the DB_HOST_OPT...
Fixed TRUNCATE statement in truncate.geonames.sql.
Fixed the biengeo truncate.geonames.sql script to include, in one TRUNCATE statement, all tables that have foreign-key references to the geonames and country tables.
Added more approx. runtimes to biengeo README.
Renamed biengeo install scripts to setup scripts.
It seems to make more sense to call these setup scripts, since they are only setting up the database and tables, and not actually installing any files anywhere on the OS.
Updated biengeo README with new script workflow.
Split geovalidate.sh into install and update scripts.
Split geovalidate.sh into install.sh and update_gadm_data.sh scripts. The install.sh script creates the database and uses the install SQL scripts to create all required tables. The update_gadm_data.sh script downloads the GADM data and creates the...
Refactored geonames.sh to update_geonames_data.sh
Renamed geonames.sh to update_geonames_data.sh and moved many of the SQL statements from the bash script into supporting update and truncate SQL scripts. These SQL and update_geonames_data.sh scripts now assume all required...
Split up geonames-to-gadm.sql into 3 scripts.
Each script only operates on one table within a transaction. These scripts now assume the tables have already been created (by install scripts added in a previous commit), and each starts out by truncating the table it will update with new data.
Added geoscrub.sh script.
This script runs the load-geoscrub-input.sh, geonames.sql, and geovalidate.sql scripts in order to load and scrub vegbien input data. Updated README to explain the new script. Minor updates to load-geoscrub-input.sh.
Updated load-geoscrub script with configurable db.
load-geoscrub-input.sh now uses a variable with the db name defined at the top of the script. Updated the default db host to 'localhost' for this script.
derived/biengeo/README.txt: geoscrub new data: geovalidate.sql: added runtime from Paul
Added db user and host to load-geoscrub-input.sh
The psql commands in load-geoscrub-input.sh will now connect with a specific user on a specific host. Updated the SQL 'COPY' statement to a psql '\COPY' command, so that the psql user does not have to be a PostgreSQL superuser.
derived/biengeo/README.txt: geoscrub new data: steps that use .sql scripts: added the psql commands to run these
Updated install instructions in the README.
derived/biengeo/README.txt: geoscrub new data: noted that this now deletes any previous geoscrubbing results
derived/biengeo/README.txt: added steps to set the working dir for each set of steps
derived/biengeo/README.txt: added section on obtaining source code, including path to Paul's in-progress files on vegbiendev (not sure whether the in-progress files are needed to run the core scripts in steps 1-6)
derived/biengeo/README.txt: moved commands to run to the top of the README. flagged commands-sections with *** and an identifying label.
Initial checkin of geoscrub install SQL files.
Added install.*.sql files that will do initial table creation for all required tables. Added a truncate.vegbien_geoscrub.sql script that will clear tables related to data downloaded in load-geoscrub-input.sh....
Update load-geoscrub-input.sh to download from URL.
Removed the logic that dumped input data directly from the vegbien database; the script now downloads the input from a URL provided by AMK instead. Also updated this script to download the file into an input data directory, rather than just into the current working directory.
Added instructions for dependencies in the README.
Added indexes to speed up geonames-to-gadm.sql.
Without these indexes, these queries could take hours to complete. With them, the times more closely matched the times Jim noted in the SQL comments.
Fixed a couple of syntax errors in geovalidate.sh.
Fixed a sql syntax error and a bash syntax error in the next line.
added derived/biengeo/Geovalidation_and_geoscrubbing_update.presentation.url
added derived/biengeo/ from https://projects.nceas.ucsb.edu/nceas/projects/biengeo/repository/
added /derived