BIEN geovalidation notes
========================

    
[Also see comments embedded in specific scripts in this directory.]

    
The bash and SQL statements contained in the files below should be
applied, in the order listed, to carry out geographic name scrubbing
and geovalidation on a given corpus of BIEN location records.

    
That said, this was done under a tight deadline in order to produce a
geovalidated BIEN3 corpus in advance of the Nov 2013 working group
meeting, and much of it was actually executed piecemeal, in an iterative
and interactive fashion, within a bash shell and psql session. As a
result, I can't guarantee that the code in its current state could be
run end-to-end without intervention. It's close, but probably not
bulletproof.

    
1. geovalidate.sh
   - creates postgis DB and loads GADM2 data
2. geonames.sh
   - loads geonames.org data and adds some custom mapping logic
3. geonames-to-gadm.sql
   - contains SQL statements that build linkages between geonames.org
     names and GADM2 names
4. load-geoscrub-input.sh
   - dumps geoscrub_input from vegbien and loads it into the geoscrub db
5. geonames.sql
   - contains SQL statements that scrub asserted names and (to the
     extent possible) map them to GADM2
6. geovalidate.sql
   - contains (postgis-extended) SQL statements that score the validity
     of GADM2-scrubbed names against given point coordinates
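
The ordering above could be captured in a small driver script. What
follows is only a sketch under stated assumptions, not something that
exists in this directory: the names step_command and run_all and the
GEOSCRUB_DB and DRY_RUN variables are hypothetical, the database name
defaults to 'geoscrub' as mentioned in step 4, and the .sh files are
assumed to run without arguments from this directory. Since the code may
still need manual intervention, the sketch only prints each command
unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical driver (not part of the original scripts): walk through
# the six files in the documented order and stop at the first failure.
set -euo pipefail

DB="${GEOSCRUB_DB:-geoscrub}"   # assumed database name
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print commands (default)

# Files in the order given in the list above.
steps=(
  geovalidate.sh
  geonames.sh
  geonames-to-gadm.sql
  load-geoscrub-input.sh
  geonames.sql
  geovalidate.sql
)

# Print the command line for one step: shell scripts go through bash,
# SQL files go through psql, which aborts on the first SQL error.
step_command() {
  case "$1" in
    *.sh)  printf 'bash %s\n' "$1" ;;
    *.sql) printf 'psql -v ON_ERROR_STOP=1 -d %s -f %s\n' "$DB" "$1" ;;
  esac
}

# Echo every command; execute it too when DRY_RUN is not 1.
run_all() {
  local step cmd
  for step in "${steps[@]}"; do
    cmd="$(step_command "$step")"
    echo "$cmd"
    if [ "$DRY_RUN" != "1" ]; then
      $cmd   # unquoted on purpose: split into command and arguments
    fi
  done
}

run_all
```

Running it as-is lists the six commands in order; setting DRY_RUN=0
would actually execute them, with set -e and ON_ERROR_STOP halting the
run at the first step that fails.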

    
The resulting 'geoscrub' table contains the scrubbed (i.e.,
GADM2-matched) names and various geovalidation scores.

    
Notes/Caveats/Todos:
* Clearly the SQL statements used in this procedure suffer from a lot of
  redundancy, and it might be worth trying to refactor once we're happy
  with the particular approach taken.
* Need to pull out more known notes/caveats/todos and highlight them :)