Project

General

Profile

« Previous | Next » 

Revision 7324

inputs/.geoscrub/_src/README.TXT: Added e-mails from Jim about how the county_centroids data was generated

View differences:

inputs/.geoscrub/_src/README.TXT
115 115
if any, changes in the high-level admin units we're using, but who
116 116
knows.
117 117
-----
118

  
119
e-mail from Jim on 2013-1-18:
120
-----
121
Regarding calculation of precision estimates for counties, I just resuscitated something I had started working up (with Bob's inspiration) back on the final Friday afternoon of the Nov 2012 BIEN WG meeting at NCEAS. Using GADM2 administrative polygons data (http://www.gadm.org), this calculation works by first reprojecting a given county into whatever UTM zone contains the county centroid, then (just as Bob suggested in his email) computing the distance from the centroid to the most distance point on the county border.
122

  
123
Have a look at the attached CSV of results for US counties, and mull over whether this seems useful. Maybe even sanity-check against your favorite county or two! For those who care, the SQL statement is pasted below my sig, along with a 'utmzone' function that I borrowed from the PostGIS wiki.
124

  
125
Note that I did the whole UTM reprojection bit because the GADM2 data are in a geographic projection, for which planar distance calculations aren't in meaningful units and aren't consistent across latitudes. Although postgis provides some functions that know how to work directly on the WGS 84 ellipsoid and don't require a projection, the exact functions we need for this aren't available.
126

  
127
Some misc things that were in my head as I did this:
128

  
129
1. This precision measure is conservative insofar as it basically describes a circle of radius equal to the precision value, and most counties will only actually occupy a fraction of the circle depending on their shape -- think of long skinny counties, or those which have an offshore island under their jurisdiction. Totally reasonable! But this _is_ just one conception of what "precision" means for county data.
130

  
131
2. In GADM2, counties are the lowest admin level for the US, making things easy. But other countries have additional levels below county/parish, meaning that we'd need to add a step that first unions those smaller polygons into a single polygon for the county/parish.
132

  
133
3. I was a little concerned about the, um, precision of these UTM-based precision estimates. Although UTM is convenient insofar as a local zone can be applied when looking at a given smallish area (like a county), distance is not perfectly preserved in this projection, which suffers from distortion as you approach the edges of a given zone. However, I just did some testing of the UTM distortion using points at roughly opposite east-west ends of Santa Barbara County. SB makes for a decent test case because it's a fairly large county that lies across two UTM zones. Distance between the test points I used is 100833 meters in UTM, and 100782 meters using the more accurate ellipsoid calculation. So I'm pretty confident we could say that our county precision estimates are themselves precise to 1 km (and probably usually much better than that), which I assume is fine?
134

  
135
As an aside, I was curious to see Franklin County, VA, showing up as having one of the worst precision values. Turns out GADM2 counts Franklin County as including the independent city of Franklin located a few hundred km east (http://en.wikipedia.org/wiki/Franklin,_Virginia). What can I say, anything we do is only as good as our inputs.
136

  
137
Cheers,
138
Jim
139

  
140

  
141
-- generate precision estimates for all US counties
142
-- (requires utmzone function, defined below)
143
SELECT name_1 AS state, name_2 AS county,
144
       ROUND(ST_Length(
145
         ST_Longestline(
146
           ST_Transform(geom, utmzone(ST_Centroid(geom))),
147
           ST_Centroid(ST_Transform(geom, utmzone(ST_Centroid(geom))))
148
         )
149
       )::numeric/1000, 3) AS error_km
150
  FROM gadm2
151
  WHERE name_0='United States'
152
  ORDER BY state, county;
153

  
154
-- function taken from postgis users wiki
155
CREATE OR REPLACE FUNCTION utmzone(geometry)
156
   RETURNS integer AS
157
 $BODY$
158
 DECLARE
159
     geomgeog geometry;
160
     zone int;
161
     pref int;
162

  
163
 BEGIN
164
     geomgeog:= ST_Transform($1,4326);
165

  
166
     IF (ST_Y(geomgeog))>0 THEN
167
        pref:=32600;
168
     ELSE
169
        pref:=32700;
170
     END IF;
171

  
172
     zone:=floor((ST_X(geomgeog)+180)/6)+1;
173

  
174
     RETURN zone+pref;
175
 END;
176
 $BODY$ LANGUAGE 'plpgsql' IMMUTABLE
177
   COST 100;
178
-----
179

  
180
e-mail from Jim on 2013-1-18:
181
-----
182
See attached, same as before but now with centroid coordinates (after
183
back-transforming from the local UTM zone to WGS84), along with their
184
calculated precision in km. Note that these centroids may differ from
185
what other authorities claim, because of differences in the method
186
and/or differences in the input polygons.
187

  
188
Revised CSV attached, revised SQL statement below.
189

  
190
Cheers,
191
Jim
192

  
193
SELECT name_1 AS state, name_2 AS county,
194
       ST_Y(ST_Transform(ST_Centroid(geom), 4326)) AS latitude,
195
       ST_X(ST_Transform(ST_Centroid(geom), 4326)) AS longitude,
196
       ROUND(
197
         ST_Length(ST_Longestline(geom, ST_Centroid(geom))
198
       )::numeric/1000, 3) AS error_km
199
  FROM (
200
    SELECT name_0, name_1, name_2,
201
           ST_Transform(geom, utmzone(ST_Centroid(geom))) AS geom
202
      FROM gadm2
203
      WHERE name_0='United States') AS gadm2_utm
204
  ORDER BY state, county
205
-----

Also available in: Unified diff