Revision 7324
Added by Aaron Marcuse-Kubitza almost 12 years ago
inputs/.geoscrub/_src/README.TXT | ||
---|---|---|
115 | 115 |
if any, changes in the high-level admin units we're using, but who |
116 | 116 |
knows. |
117 | 117 |
----- |
118 |
|
|
119 |
e-mail from Jim on 2013-1-18: |
|
120 |
----- |
|
121 |
Regarding calculation of precision estimates for counties, I just resuscitated something I had started working up (with Bob's inspiration) back on the final Friday afternoon of the Nov 2012 BIEN WG meeting at NCEAS. Using GADM2 administrative polygons data (http://www.gadm.org), this calculation works by first reprojecting a given county into whatever UTM zone contains the county centroid, then (just as Bob suggested in his email) computing the distance from the centroid to the most distance point on the county border. |
|
122 |
|
|
123 |
Have a look at the attached CSV of results for US counties, and mull over whether this seems useful. Maybe even sanity-check against your favorite county or two! For those who care, the SQL statement is pasted below my sig, along with a 'utmzone' function that I borrowed from the PostGIS wiki. |
|
124 |
|
|
125 |
Note that I did the whole UTM reprojection bit because the GADM2 data are in a geographic projection, for which planar distance calculations aren't in meaningful units and aren't consistent across latitudes. Although postgis provides some functions that know how to work directly on the WGS 84 ellipsoid and don't require a projection, the exact functions we need for this aren't available. |
|
126 |
|
|
127 |
Some misc things that were in my head as I did this: |
|
128 |
|
|
129 |
1. This precision measure is conservative insofar as it basically describes a circle of radius equal to the precision value, and most counties will only actually occupy a fraction of the circle depending on their shape -- think of long skinny counties, or those which have an offshore island under their jurisdiction. Totally reasonable! But this _is_ just one conception of what "precision" means for county data. |
|
130 |
|
|
131 |
2. In GADM2, counties are the lowest admin level for the US, making things easy. But other countries have additional levels below county/parish, meaning that we'd need to add a step that first unions those smaller polygons into a single polygon for the county/parish. |
|
132 |
|
|
133 |
3. I was a little concerned about the, um, precision of these UTM-based precision estimates. Although UTM is convenient insofar as a local zone can be applied when looking at a given smallish area (like a county), distance is not perfectly preserved in this projection, which suffers from distortion as you approach the edges of a given zone. However, I just did some testing of the UTM distortion using points at roughly opposite east-west ends of Santa Barbara County. SB makes for a decent test case because it's a fairly large county that lies across two UTM zones. Distance between the test points I used is 100833 meters in UTM, and 100782 meters using the more accurate ellipsoid calculation. So I'm pretty confident we could say that our county precision estimates are themselves precise to 1 km (and probably usually much better than that), which I assume is fine? |
|
134 |
|
|
135 |
As an aside, I was curious to see Franklin County, VA, showing up as having one of the worst precision values. Turns out GADM2 counts Franklin County as including the independent city of Franklin located a few hundred km east (http://en.wikipedia.org/wiki/Franklin,_Virginia). What can I say, anything we do is only as good as our inputs. |
|
136 |
|
|
137 |
Cheers, |
|
138 |
Jim |
|
139 |
|
|
140 |
|
|
141 |
-- generate precision estimates for all US counties |
|
142 |
-- (requires utmzone function, defined below) |
|
143 |
SELECT name_1 AS state, name_2 AS county, |
|
144 |
ROUND(ST_Length( |
|
145 |
ST_Longestline( |
|
146 |
ST_Transform(geom, utmzone(ST_Centroid(geom))), |
|
147 |
ST_Centroid(ST_Transform(geom, utmzone(ST_Centroid(geom)))) |
|
148 |
) |
|
149 |
)::numeric/1000, 3) AS error_km |
|
150 |
FROM gadm2 |
|
151 |
WHERE name_0='United States' |
|
152 |
ORDER BY state, county; |
|
153 |
|
|
154 |
-- function taken from postgis users wiki |
|
155 |
CREATE OR REPLACE FUNCTION utmzone(geometry) |
|
156 |
RETURNS integer AS |
|
157 |
$BODY$ |
|
158 |
DECLARE |
|
159 |
geomgeog geometry; |
|
160 |
zone int; |
|
161 |
pref int; |
|
162 |
|
|
163 |
BEGIN |
|
164 |
geomgeog:= ST_Transform($1,4326); |
|
165 |
|
|
166 |
IF (ST_Y(geomgeog))>0 THEN |
|
167 |
pref:=32600; |
|
168 |
ELSE |
|
169 |
pref:=32700; |
|
170 |
END IF; |
|
171 |
|
|
172 |
zone:=floor((ST_X(geomgeog)+180)/6)+1; |
|
173 |
|
|
174 |
RETURN zone+pref; |
|
175 |
END; |
|
176 |
$BODY$ LANGUAGE 'plpgsql' IMMUTABLE |
|
177 |
COST 100; |
|
178 |
----- |
|
179 |
|
|
180 |
e-mail from Jim on 2013-1-18: |
|
181 |
----- |
|
182 |
See attached, same as before but now with centroid coordinates (after |
|
183 |
back-transforming from the local UTM zone to WGS84), along with their |
|
184 |
calculated precision in km. Note that these centroids may differ from |
|
185 |
what other authorities claim, because of differences in the method |
|
186 |
and/or differences in the input polygons. |
|
187 |
|
|
188 |
Revised CSV attached, revised SQL statement below. |
|
189 |
|
|
190 |
Cheers, |
|
191 |
Jim |
|
192 |
|
|
193 |
SELECT name_1 AS state, name_2 AS county, |
|
194 |
ST_Y(ST_Transform(ST_Centroid(geom), 4326)) AS latitude, |
|
195 |
ST_X(ST_Transform(ST_Centroid(geom), 4326)) AS longitude, |
|
196 |
ROUND( |
|
197 |
ST_Length(ST_Longestline(geom, ST_Centroid(geom)) |
|
198 |
)::numeric/1000, 3) AS error_km |
|
199 |
FROM ( |
|
200 |
SELECT name_0, name_1, name_2, |
|
201 |
ST_Transform(geom, utmzone(ST_Centroid(geom))) AS geom |
|
202 |
FROM gadm2 |
|
203 |
WHERE name_0='United States') AS gadm2_utm |
|
204 |
ORDER BY state, county |
|
205 |
----- |
Also available in: Unified diff
inputs/.geoscrub/_src/README.TXT: Added e-mails from Jim about how the county_centroids data was generated