BIEN geovalidation notes
========================

***** obtain source code:
svn co https://code.nceas.ucsb.edu/code/projects/bien/derived/biengeo/
additional, in-progress files are at
sftp://vegbiendev.nceas.ucsb.edu/home/psarando/src/bien/derived/biengeo/

***** install dependencies:
The only dependencies for running these scripts are PostgreSQL 9.1, postgis 2.0,
and unzip.
Installing these packages on Ubuntu 13.04 should be as simple as these commands:
sudo apt-get install postgresql
sudo apt-get install postgresql-client
sudo apt-get install postgresql-client-common
sudo apt-get install postgis
sudo apt-get install postgresql-9.1-postgis-2.0
sudo apt-get install unzip
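After installing, it may help to confirm that the command-line tools are
actually on the PATH before running any of the scripts. The following is a
minimal sketch, not part of the biengeo code: the function name missing_tools
is hypothetical, and the tool list (psql, unzip) is an assumption drawn from
the dependency list above.

```shell
#!/usr/bin/env bash
# Sketch: report which required command-line tools are not on PATH.
# Assumption: psql (from the PostgreSQL client packages) and unzip are
# the executables the scripts need; adjust the list as required.

missing_tools() {
    # Print the name of every argument that is not an executable on PATH.
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

missing="$(missing_tools psql unzip)"
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined onto one line.
    echo "missing tools:" $missing
else
    echo "all required tools found"
fi
```

This only checks for executables; it does not verify the PostgreSQL or postgis
versions.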

***** initialize the DB:
cd <svn_biengeo_root>
1. geovalidate.sh
   - creates postgis DB and loads GADM2 data
2. geonames.sh
   - loads geonames.org data and adds some custom mapping logic
3. geonames-to-gadm.sql
   - contains SQL statements that build linkages between geonames.org
     names and GADM2 names
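The three initialization steps above could be chained roughly as follows.
This is a sketch under assumptions these notes do not confirm: that the two
shell scripts take no arguments, that the SQL file is applied with psql, and
that the database is named "geoscrub". The run helper and the DRY_RUN and
GEOSCRUB_DB variables are hypothetical. By default the commands are only
printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch: run the three initialization steps in order, from <svn_biengeo_root>.
# Assumptions: scripts take no arguments; the database is named "geoscrub".
set -eu

GEOSCRUB_DB="${GEOSCRUB_DB:-geoscrub}"   # assumed database name
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the commands

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

init_db() {
    run bash geovalidate.sh
    run bash geonames.sh
    # ON_ERROR_STOP makes psql abort at the first failing statement.
    run psql -d "$GEOSCRUB_DB" -v ON_ERROR_STOP=1 -f geonames-to-gadm.sql
}

init_db
```

Setting DRY_RUN=0 executes the commands; with set -e, the run stops at the
first step that fails.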

***** geoscrub new data:
WARNING: deletes any previous geoscrubbing results!
cd <svn_biengeo_root>
4. load-geoscrub-input.sh
   - dumps geoscrub_input from vegbien and loads it into the geoscrub db
5. geonames.sql
   - contains SQL statements that scrub asserted names and (to the
     extent possible) map them to GADM2
6. geovalidate.sql
   - contains (postgis-extended) SQL statements that score the validity
     of GADM2-scrubbed names against given point coordinates
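Because this run deletes previous geoscrubbing results, one option is to put
an explicit confirmation in front of steps 4 to 6. The sketch below is
illustrative only: CONFIRM_GEOSCRUB, GEOSCRUB_DB, and both function names are
hypothetical, and the psql invocations and the "geoscrub" database name are
assumptions about how the SQL files are applied.

```shell
#!/usr/bin/env bash
# Sketch: refuse to start the geoscrub run unless explicitly confirmed,
# because the run deletes any previous geoscrubbing results.
# CONFIRM_GEOSCRUB is a hypothetical variable, not part of the biengeo scripts.

confirm_geoscrub() {
    # Succeed only when the caller has set CONFIRM_GEOSCRUB=yes.
    if [ "${CONFIRM_GEOSCRUB:-no}" != "yes" ]; then
        echo "refusing to run: previous geoscrubbing results would be deleted" >&2
        echo "set CONFIRM_GEOSCRUB=yes to proceed" >&2
        return 1
    fi
}

geoscrub_commands() {
    # Print the assumed command for each step, in order (steps 4 to 6).
    db="${GEOSCRUB_DB:-geoscrub}"   # assumed database name
    echo "bash load-geoscrub-input.sh"
    echo "psql -d $db -v ON_ERROR_STOP=1 -f geonames.sql"
    echo "psql -d $db -v ON_ERROR_STOP=1 -f geovalidate.sql"
}

if confirm_geoscrub; then
    geoscrub_commands
fi
```

The commands are only printed here; after reviewing them, they can be run by
hand from <svn_biengeo_root>.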

[Also see comments embedded in specific scripts in this directory.]

The bash and SQL statements contained in the files as ordered above
should be applied to carry out geographic name scrubbing and
geovalidation on a given corpus of BIEN location records.

That said, given the tight deadline under which this was done in order
to produce a geovalidated BIEN3 corpus in advance of the Nov 2013
working group meeting, and the corresponding manner in which much of
this was actually executed piecemeal in an iterative and interactive
fashion within a bash shell and psql session, I can't guarantee that the
code in its current state could be run end-to-end without intervention.
It's close, but probably not bulletproof.

The resulting 'geoscrub' table is what contains the scrubbed (i.e.,
GADM2-matched) names and various geovalidation scores.

Notes/Caveats/Todos:
* Clearly the SQL statements used in this procedure suffer from a lot of
  redundancy, and it might be worth trying to refactor once we're happy
  with the particular approach taken.
* Need to pull out more known notes/caveats/todos and highlight them :)