Revision 11444
Added by Aaron Marcuse-Kubitza about 11 years ago
| old | new | README.txt |
|---|---|---|
| 1 | 1 | BIEN geovalidation notes |
| 2 | 2 | ======================== |
| 3 | 3 | |
| 4 | | Dependencies: |
| | 4 | ***** install dependencies: |
| 5 | 5 | The only dependencies for running these scripts are PostgreSQL 9.1, postgis 2.0, |
| 6 | 6 | and unzip. |
| 7 | 7 | Installing these packages on Ubuntu 13.04 should be as simple as these commands: |
| ... | ... | |
| 12 | 12 | sudo apt-get install postgresql-9.1-postgis-2.0 |
| 13 | 13 | sudo apt-get install unzip |
| 14 | 14 | |
| 15 | | [Also see comments embedded in specific scripts in this directory.] |
| 16 | | |
| 17 | | The bash and SQL statements contained in the files as ordered below |
| 18 | | should be applied to carry out geographic name scrubbing and |
| 19 | | geovalidation on a given corpus of BIEN location records. |
| 20 | | |
| 21 | | That said, given the tight deadline under which this was done in order |
| 22 | | to produce a geovalidated BIEN3 corpus in advance of the Nov 2013 |
| 23 | | working group meeting, and the corresponding manner in which much of |
| 24 | | this was actually executed piecemeal in an iterative and interactive |
| 25 | | fashion within a bash shell and psql session, I can't guarantee that the |
| 26 | | code in its current state could be run end-to-end without intervention. |
| 27 | | It's close, but probably not bulletproof. |
| 28 | | |
| | 15 | ***** initialize the DB: |
| 29 | 16 | 1. geovalidate.sh |
| 30 | 17 | - creates postgis DB and loads GADM2 data |
| 31 | 18 | 2. geonames.sh |
| ... | ... | |
| 33 | 20 | 3. geonames-to-gadm.sql |
| 34 | 21 | - contains SQL statements that build linkages between geonames.org |
| 35 | 22 | names and GADM2 names |
| | 23 | |
| | 24 | ***** geoscrub new data: |
| 36 | 25 | 4. load-geoscrub-input.sh |
| 37 | 26 | - dumps geoscrub_input from vegbien and loads it into the geoscrub db |
| 38 | 27 | 5. geonames.sql |
| ... | ... | |
| 42 | 31 | - contains (postgis-extended) SQL statements that score the validity |
| 43 | 32 | of GADM2-scrubbed names against given point coordinates |
| 44 | 33 | |
| | 34 | [Also see comments embedded in specific scripts in this directory.] |
| | 35 | |
| | 36 | The bash and SQL statements contained in the files as ordered below |
| | 37 | should be applied to carry out geographic name scrubbing and |
| | 38 | geovalidation on a given corpus of BIEN location records. |
| | 39 | |
| | 40 | That said, given the tight deadline under which this was done in order |
| | 41 | to produce a geovalidated BIEN3 corpus in advance of the Nov 2013 |
| | 42 | working group meeting, and the corresponding manner in which much of |
| | 43 | this was actually executed piecemeal in an iterative and interactive |
| | 44 | fashion within a bash shell and psql session, I can't guarantee that the |
| | 45 | code in its current state could be run end-to-end without intervention. |
| | 46 | It's close, but probably not bulletproof. |
| | 47 | |
| 45 | 48 | The resulting 'geoscrub' table is what contains the scrubbed (i.e., |
| 46 | 49 | GADM2-matched) names and various geovalidation scores. |
| 47 | 50 | |
derived/biengeo/README.txt: moved commands to run to the top of the README. flagged commands-sections with *** and an identifying label.
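
Since the README warns that the scripts may not run end-to-end without intervention, one way to apply them "as ordered" is a small driver that stops at the first failing step. The following is only a sketch under assumptions: `run_step`, `run_steps`, `DRY_RUN` and `GEOSCRUB_DB` are hypothetical names that do not exist in the biengeo directory, the database name `geoscrub` is taken from the README's mention of "the geoscrub db", and the `.sh` scripts are assumed to run without arguments.

```shell
#!/usr/bin/env bash
# Hypothetical driver sketch; not part of the biengeo scripts.

# Database name assumed from the README's "geoscrub db"; override if yours differs.
GEOSCRUB_DB="${GEOSCRUB_DB:-geoscrub}"

# Run one step, choosing the command by file extension.
# With DRY_RUN=1 the command is printed instead of executed.
run_step() {
    case "$1" in
        *.sh)  set -- bash "$1" ;;
        # ON_ERROR_STOP makes psql abort the file at the first SQL error.
        *.sql) set -- psql -v ON_ERROR_STOP=1 -d "$GEOSCRUB_DB" -f "$1" ;;
        *)     echo "unsupported step: $1" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run the given steps in order, stopping at the first failure.
run_steps() {
    local step
    for step in "$@"; do
        run_step "$step" || { echo "stopped at: $step" >&2; return 1; }
    done
}
```

For the "initialize the DB" section as listed above, a dry run would be `DRY_RUN=1 run_steps geovalidate.sh geonames.sh geonames-to-gadm.sql`. The steps elided from this diff view are deliberately not filled in here; consult the full README for the complete order.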