# Date Author Comment
14295 07/22/2014 12:18 AM Aaron Marcuse-Kubitza

added derived/TNRS/Modifications to procedure to scrubbing names using TNRS.docx from Brad

13854 06/25/2014 07:30 PM Aaron Marcuse-Kubitza

added derived/TNRS/web_app/protocol/

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
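
The switch described above might look roughly like the sketch below. It only builds and prints the command, and it is not taken from the repository: the helper name `switch_cmd`, the repository root URL, and the relative path are all hypothetical placeholders. It assumes the working copy was checked out from `<root>/<path>` and should now point at `<root>/trunk/<path>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the `svn switch` command that re-points an
# existing working copy at the same path under /trunk/.
#   $1 = repository root URL (assumed; substitute the real one)
#   $2 = path of the working copy inside the repository (may be empty)
switch_cmd() {
    local root="${1%/}"   # drop one trailing slash, if any
    local rel="${2#/}"    # drop one leading slash, if any
    # Append "/<rel>" only when a relative path was given.
    echo "svn switch ${root}/trunk${rel:+/$rel}"
}

# Example with a placeholder URL:
switch_cmd "https://svn.example.org/repo" "derived/biengeo"
```

Run the printed command from the top of the working copy; `svn switch URL` updates the working copy in place instead of checking the files out again.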

11783 11/26/2013 10:03 PM Aaron Marcuse-Kubitza

derived/biengeo/README.txt: updated geoscrub.sh runtime

11591 11/06/2013 04:39 PM Aaron Marcuse-Kubitza

derived/biengeo/load-geoscrub-input.sh: allow the caller to override $DATAFILE in the environment, to use a file named other than "geoscrub-corpus.csv"
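
A common way to get this behavior in bash is default-value parameter expansion. The following is a minimal sketch of that idiom, not the script's actual code; the function name `resolve_datafile` is hypothetical, and only the default filename comes from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return the caller's $DATAFILE if it is set and
# non-empty, otherwise fall back to the default input filename.
resolve_datafile() {
    echo "${DATAFILE:-geoscrub-corpus.csv}"
}

echo "input file: $(resolve_datafile)"
```

With this pattern, a caller could override the name for one run with something like `DATAFILE=my-input.csv ./load-geoscrub-input.sh` (the filename here is a placeholder).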

11587 11/06/2013 12:34 PM Aaron Marcuse-Kubitza

derived/biengeo/load-geoscrub-input.sh: updated $DATA_URL for new input filename

11586 11/06/2013 12:27 PM Aaron Marcuse-Kubitza

/run geoscrub_input/make(): include a header on the CSV file, so that the column names don't risk getting spliced from the data (and to shorten the CSV filename, which had to contain the column names instead). this requires changing the geoscrubbing scripts to accept a CSV header.

11563 11/05/2013 11:49 AM Paul Sarando

Added an output CSV file option to geoscrub.sh.

11562 11/04/2013 03:25 PM Paul Sarando

Added notes on running biengeo scripts to README.

11561 10/31/2013 05:35 PM Paul Sarando

Added biengeo script options for data directories.

Added GADM and geonames.org data dir options to
update_validation_data.sh scripts.
Added geoscrub input data dir option to geoscrub.sh scripts.

11560 10/31/2013 05:35 PM Paul Sarando

Added update options to biengeo update_validation_data.sh

Added options to update only GADM data, only Geonames.org data, or
neither. In every case, the geonames-to-gadm scripts are always run.
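
The control flow described in this entry can be sketched as follows. The mode names (`all`, `gadm`, `geonames`, `none`) and the function `plan_update` are hypothetical; the log does not show the script's real option names. The sketch only prints the steps it would take.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: decide which data sets to refresh, then always
# finish with the geonames-to-gadm step.
plan_update() {
    local mode="${1:-all}"
    case "$mode" in
        all)      echo "update GADM"; echo "update geonames.org" ;;
        gadm)     echo "update GADM" ;;
        geonames) echo "update geonames.org" ;;
        none)     ;;  # refresh neither data set
        *)        echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # Runs in every valid case, as the commit message states.
    echo "run geonames-to-gadm scripts"
}

plan_update gadm
```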

11559 10/31/2013 05:35 PM Paul Sarando

Added cmd-line options to biengeo bash scripts.

All biengeo bash scripts now accept command line options to specify psql
user, host, and database values.
These options are the same as those defined by the psql command.
If an invalid option is given to a script, a usage message is printed...
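
For illustration, option parsing of this kind is often written with the bash `getopts` builtin. The sketch below is an assumption about the general shape, not the scripts' actual code: the default values and the function names are hypothetical. The flags `-U`, `-h`, and `-d` are the ones psql itself uses for user, host, and database.

```shell
#!/usr/bin/env bash
# Hypothetical defaults; the real scripts' defaults are not shown in the log.
DB_USER="postgres"
DB_HOST=""
DB_NAME="geoscrub"

usage() {
    echo "Usage: $0 [-U user] [-h host] [-d dbname]" >&2
}

parse_opts() {
    local OPTIND=1 opt
    # The leading ":" makes getopts silent, so errors are reported here.
    while getopts ":U:h:d:" opt; do
        case "$opt" in
            U) DB_USER="$OPTARG" ;;
            h) DB_HOST="$OPTARG" ;;
            d) DB_NAME="$OPTARG" ;;
            *) usage; return 1 ;;  # unknown option or missing argument
        esac
    done
}

parse_opts "$@" || exit 1
```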

11558 10/31/2013 05:35 PM Paul Sarando

Fix biengeo script password prompt for postgres user.

Changed the DB_HOST variables in the biengeo bash scripts to a
DB_HOST_OPT variable that is blank by default.
Updated all psql calls that used "-h $DB_HOST" to use just $DB_HOST_OPT
instead.
This means that to specify a different db host, the DB_HOST_OPT...
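
A rough sketch of the pattern this entry describes: the host flag lives in a single variable that is empty by default and is expanded unquoted in each psql call. The user name, database name, and the helper `psql_cmd` are hypothetical, and the helper only prints the command rather than running it. Setting the variable to `-h somehost` is an assumption inferred from the old `-h $DB_HOST` form; the truncated message above does not confirm the exact usage.

```shell
#!/usr/bin/env bash
DB_HOST_OPT=""       # blank by default: psql uses its default connection
DB_USER="bien"       # hypothetical
DB_NAME="geoscrub"   # hypothetical

# Print the psql command line that would be run.
psql_cmd() {
    # $DB_HOST_OPT is deliberately unquoted: when blank it expands to
    # nothing, and when set to "-h somehost" it splits into two arguments.
    echo psql $DB_HOST_OPT -U "$DB_USER" -d "$DB_NAME" "$@"
}

psql_cmd -f setup.sql
```

When no host is given, psql typically connects over a local Unix-domain socket rather than TCP, and the two are often subject to different authentication rules.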

11557 10/31/2013 05:35 PM Paul Sarando

Fixed TRUNCATE statement in truncate.geonames.sql.

Fixed the biengeo truncate.geonames.sql script to include, in one TRUNCATE
statement, all tables that have foreign-key references to the geonames and
country tables.

11556 10/31/2013 05:35 PM Paul Sarando

Added more approx. runtimes to biengeo README.

11555 10/31/2013 05:35 PM Paul Sarando

Renamed biengeo install scripts to setup scripts.

It seems to make more sense to call these setup scripts, since they are
only setting up the database and tables, and not actually installing any
files anywhere on the OS.

11497 10/30/2013 06:24 PM Paul Sarando

Updated biengeo README with new script workflow.

11496 10/30/2013 06:24 PM Paul Sarando

Split geovalidate.sh into install and update scripts.

Split geovalidate.sh into install.sh and update_gadm_data.sh scripts.
The install.sh script creates the database and uses the install sql
scripts to create all required tables.
The update_gadm_data.sh script downloads the GADM data and creates the...

11495 10/30/2013 06:24 PM Paul Sarando

Refactored geonames.sh to update_geonames_data.sh

Renamed geonames.sh to update_geonames_data.sh and moved many of the SQL
statements from the bash script into supporting update and truncate sql
scripts.
These sql and update_geonames_data.sh scripts now assume all required...

11494 10/30/2013 06:24 PM Paul Sarando

Split up geonames-to-gadm.sql into 3 scripts.

Each script only operates on one table within a transaction.
These scripts now assume the tables have already been created (by
install scripts added in a previous commit), and each starts out by
truncating the table it will update with new data.

11493 10/30/2013 06:24 PM Paul Sarando

Added geoscrub.sh script.

This script runs the load-geoscrub-input.sh, geonames.sql, and
geovalidate.sql scripts in order to load and scrub vegbien input data.
Updated README to explain the new script.
Minor updates to load-geoscrub-input.sh.

11479 10/30/2013 01:53 PM Paul Sarando

Updated load-geoscrub script with configurable db.

load-geoscrub-input.sh now uses a variable with the db name defined at
the top of the script.
Updated the default db host to 'localhost' for this script.

11467 10/29/2013 06:29 PM Aaron Marcuse-Kubitza

derived/biengeo/README.txt: geoscrub new data: geovalidate.sql: added runtime from Paul

11450 10/25/2013 06:15 PM Paul Sarando

Added db user and host to load-geoscrub-input.sh

The psql commands in load-geoscrub-input.sh will now connect with a
specific user on a specific host.
Updated the 'COPY' sql statement to a '\COPY' statement, so that the
psql user does not have to be a PostgreSQL superuser.
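
The difference is that a server-side `COPY ... FROM 'file'` reads the file on the database server and requires elevated privileges, while psql's `\copy` meta-command reads the file on the client and sends the rows over the connection. A minimal sketch of building such a command follows; the helper name, table name, and file name are hypothetical, and the actual statement in load-geoscrub-input.sh is not shown in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a client-side \copy command for psql.
#   $1 = target table, $2 = local CSV file
copy_command() {
    local table="$1" file="$2"
    # "\\copy" inside double quotes yields a literal \copy.
    printf '%s\n' "\\copy $table FROM '$file' WITH CSV"
}

copy_command vegbien_geoscrub geoscrub-corpus.csv
```

The resulting string could then be passed to psql as a single backslash command, for example `psql -c "$(copy_command vegbien_geoscrub geoscrub-corpus.csv)"`.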

11449 10/25/2013 04:51 PM Aaron Marcuse-Kubitza

derived/biengeo/README.txt: geoscrub new data: steps that use .sql scripts: added the psql commands to run these

11448 10/25/2013 04:22 PM Paul Sarando

Updated install instructions in the README.

11447 10/25/2013 03:00 PM Aaron Marcuse-Kubitza

derived/biengeo/README.txt: geoscrub new data: noted that this now deletes any previous geoscrubbing results

11446 10/25/2013 02:58 PM Aaron Marcuse-Kubitza

derived/biengeo/README.txt: added steps to set the working dir for each set of steps

11445 10/25/2013 02:54 PM Aaron Marcuse-Kubitza

derived/biengeo/README.txt: added section on obtaining source code, including path to Paul's in-progress files on vegbiendev (not sure whether the in-progress files are needed to run the core scripts in steps 1-6)

11444 10/25/2013 02:44 PM Aaron Marcuse-Kubitza

derived/biengeo/README.txt: moved commands to run to the top of the README. flagged commands-sections with *** and an identifying label.

11443 10/25/2013 02:04 PM Paul Sarando

Initial checkin of geoscrub install SQL files.

Added install.*.sql files that will do initial table creation for all
required tables.
Added a truncate.vegbien_geoscrub.sql script that will clear tables related to
data downloaded in load-geoscrub-input.sh....

11442 10/25/2013 02:04 PM Paul Sarando

Update load-geoscrub-input.sh to download from URL.

Replaced the logic that dumped input data directly from the vegbien database
with a download of the input from a URL provided by AMK.
Also updated this script to download the file into an input data
directory, rather than just into the current working directory.

11347 10/18/2013 05:23 PM Paul Sarando

Added instructions for dependencies in the README.

11346 10/18/2013 05:23 PM Paul Sarando

Added indexes to speed up geonames-to-gadm.sql.

Without these indexes, these queries could take hours to complete.
With them, the times more closely matched the times Jim noted in the sql
comments.

11345 10/18/2013 05:23 PM Paul Sarando

Fixed a couple of syntax errors in geovalidate.sh.

Fixed a sql syntax error and a bash syntax error in the next line.

10897 09/11/2013 02:52 PM Aaron Marcuse-Kubitza

added derived/biengeo/Geovalidation_and_geoscrubbing_update.presentation.url

10707 08/22/2013 02:54 PM Aaron Marcuse-Kubitza

added derived/biengeo/ from https://projects.nceas.ucsb.edu/nceas/projects/biengeo/repository/

10706 08/22/2013 02:50 PM Aaron Marcuse-Kubitza

added /derived