Revision 7254
Added by Aaron Marcuse-Kubitza almost 12 years ago
README.TXT | ||
---|---|---|
1 |
E-mails from Jim:
|
|
1 |
e-mail from Jim on 2012-11-16:
|
|
2 | 2 |
----- |
3 | 3 |
As a quick but hopefully sufficient way of transferring the geoscrub results back to you, I dumped my geoscrub output table out to CSV and stuck it on vegbiendev at /tmp/public.2012-11-04-07-34-10.r5984.geoscrub_output.csv. |
4 | 4 |
|
... | ... | |
21 | 21 |
|
22 | 22 |
The added countrystd, stateprovincestd, and countystd columns contain the corresponding GADM place names in cases where the scrubbing procedure yielded a match to GADM. And the four *validity columns contain scores as described in my email to the bien-db list a few minutes ago. |
23 | 23 |
----- |
24 |
|
|
25 |
e-mail from Jim on 2012-11-16: |
|
26 |
----- |
|
24 | 27 |
Attached is a tabulation of provisional geo validity scores I generated for the full set of 1707970 geoscrub_input records Aaron provided me a couple of weeks ago (from schema public.2012-11-04-07-34-10.r5984). This goes all the way down to level of county/parish (i.e., 2nd order administrative divisions), although I know the scrubbing can still be improved especially at that lower level. Hence my "provisional" qualifier. |
25 | 28 |
|
26 | 29 |
To produce these scores, I first passed the data through a geoscrubbing pipeline that attempts to translate asserted names into GADM (http://gadm.org) names with the help of geonames.org data, some custom mappings, and a few other tricks. Then I pushed them through a geovalidation pipeline that assesses the proximity of asserted lat/lon coordinates to their putative administrative areas in cases where scrubbing was successful. All operations happen in a Postgis database, and the full procedure ran for me in ~2 hours on a virtual server similar to vegbiendev. (This doesn't include the time it takes to set things up by importing GADM and geonames data and building appropriate indexes, but that's a one-time cost anyway.) |
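
The geovalidation step described in the quoted e-mail (checking how close asserted lat/lon coordinates fall to their putative administrative area) can be illustrated with a small standalone sketch. This is not the PostGIS procedure from the e-mail, and it does not reproduce the validity scores mentioned there. It is a plain-Python approximation under stated assumptions: the administrative area is a single ring of (lon, lat) vertices, and distance for outside points is measured to the nearest vertex rather than the nearest edge. The names `point_in_polygon`, `haversine_km`, and `distance_to_area_km` are hypothetical.

```python
import math


def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the ring.

    `ring` is a list of (lon, lat) vertices; treating an administrative
    area as one simple ring is an assumption of this sketch.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def distance_to_area_km(lon, lat, ring):
    """0.0 if the point is inside the ring, else km to the nearest vertex.

    Nearest-vertex distance is a coarse stand-in for a true
    point-to-polygon distance.
    """
    if point_in_polygon(lon, lat, ring):
        return 0.0
    return min(haversine_km(lon, lat, x, y) for x, y in ring)


# Hypothetical 10-degree square "administrative area" at the equator.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
inside_distance = distance_to_area_km(5.0, 5.0, square)
outside_distance = distance_to_area_km(11.0, 0.0, square)
```

In a spatial database the same question would normally be answered with built-in containment and distance functions over indexed geometries, which is what makes a run over roughly 1.7 million records practical.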
inputs/.geoscrub/_src/README.TXT: Added dates for e-mails from Jim