BIEN geovalidation notes
========================

[Also see comments embedded in specific scripts in this directory.]

Apply the bash and SQL statements in the files listed below, in the
order given, to carry out geographic name scrubbing and geovalidation
on a given corpus of BIEN location records.

That said, this was done under a tight deadline in order to produce a
geovalidated BIEN3 corpus in advance of the Nov 2013 working group
meeting, and much of it was actually executed piecemeal, iteratively
and interactively, within a bash shell and psql session. As a result, I
can't guarantee that the code in its current state could be run
end-to-end without intervention. It's close, but probably not
bulletproof.

1. geovalidate.sh
   - creates PostGIS DB and loads GADM2 data
2. geonames.sh
   - loads geonames.org data and adds some custom mapping logic
3. geonames-to-gadm.sql
   - contains SQL statements that build linkages between geonames.org
     names and GADM2 names
4. load-geoscrub-input.sh
   - dumps geoscrub_input from vegbien and loads it into the geoscrub db
5. geonames.sql
   - contains SQL statements that scrub asserted names and (to the
     extent possible) map them to GADM2
6. geovalidate.sql
   - contains (PostGIS-extended) SQL statements that score the validity
     of GADM2-scrubbed names against given point coordinates

The resulting 'geoscrub' table contains the scrubbed (i.e.,
GADM2-matched) names and various geovalidation scores.
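The six steps listed above could be chained with a small driver script
along the following lines. This is only a sketch under stated
assumptions, not something that exists in this directory: the database
name 'geoscrub' is inferred from the "geoscrub db" mentioned in step 4,
the names run_step, run_all, DRY_RUN and GEOSCRUB_DB are hypothetical,
and it assumes the .sql files can be fed to psql as-is and that the .sh
scripts handle their own connections. Because the code has not been
verified to run end-to-end, the sketch defaults to printing the
commands rather than executing them.

```shell
#!/bin/sh
# Sketch: run the six geovalidation steps in the documented order.
set -eu

GEOSCRUB_DB="${GEOSCRUB_DB:-geoscrub}"  # assumed database name
DRY_RUN="${DRY_RUN:-1}"                 # hypothetical switch: 1 = print only

# Files in the order given in the notes above.
STEPS="geovalidate.sh geonames.sh geonames-to-gadm.sql
load-geoscrub-input.sh geonames.sql geovalidate.sql"

# Build the command for one step, print it, and run it unless DRY_RUN=1.
run_step() {
  case "$1" in
    *.sh)  set -- bash "$1" ;;
    # ON_ERROR_STOP makes psql exit non-zero at the first failing statement.
    *.sql) set -- psql -d "$GEOSCRUB_DB" -v ON_ERROR_STOP=1 -f "$1" ;;
    *)     echo "unknown step type: $1" >&2; return 1 ;;
  esac
  echo "==> $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Run every step; with 'set -e' the first failure aborts the whole run.
run_all() {
  for step in $STEPS; do
    run_step "$step"
  done
}

run_all
```

Run as-is from the directory containing the scripts, this only prints
the six commands it would execute. Setting DRY_RUN=0 in the environment
would actually run them, stopping at the first step that fails so the
problem can be fixed by hand before continuing.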

Notes/Caveats/Todos:
* Clearly the SQL statements used in this procedure suffer from a lot of
  redundancy, and it might be worth trying to refactor once we're happy
  with the particular approach taken.
* Need to pull out more known notes/caveats/todos and highlight them :)