Bug #950
Updated by Aaron Marcuse-Kubitza about 10 years ago
h3. test case
eg. this happens for CVS rows:
|datasource|country|state_province|county|latitude|longitude|is_geovalid|
|CVS|United States|TENNESSEE|Sevier|35.654350967|-83.444906936|<NULL>|
<pre><code class="SQL">
SET enable_seqscan = off;
SET enable_mergejoin = off;
SELECT * FROM view_full_occurrence_individual_view WHERE datasource = 'CVS' LIMIT 1;
</code></pre>
and FIA rows:
|datasource|country|state_province|county|latitude|longitude|is_geovalid|
|FIA|United States|Alabama|Covington|31.39|-86.36|<NULL>|
|FIA|United States|Alabama|Escambia|31.17|-86.72|<NULL>|
<pre><code class="SQL">
SET enable_seqscan = off;
SET enable_mergejoin = off;
SELECT
DISTINCT ON (country, state_province, county)
*
FROM (SELECT * FROM view_full_occurrence_individual_view WHERE datasource = 'FIA' LIMIT 1000) s
;
</code></pre>
h3. info
since is_geovalid is NULL, and state_province is falling back to the unscrubbed value, this indicates it is unable to find a @geoscrub.geoscrub_output@ row to join to
however, there is a matching row in the @geoscrub@ DB's result table:
|decimallatitude|decimallongitude|country|stateprovince|county|countrystd|stateprovincestd|countystd|latlonvalidity|countryvalidity|stateprovincevalidity|countyvalidity|
|35.654350967|-83.444906936|United States|TENNESSEE|Sevier|United States|Tennessee|Sevier|1|3|3|3|
<pre><code class="SQL">
SELECT * FROM geoscrub WHERE
country = 'United States'
AND stateprovince = 'TENNESSEE'
AND county = 'Sevier'
AND decimallatitude = 35.654350967
AND decimallongitude = -83.444906936
LIMIT 1
;
</code></pre>
this suggests that the problem is in the transfer from the @geoscrub@ DB to @vegbien@
<pre>
grep -E '^35\.654350967,-83\.444906936,United States,TENNESSEE,Sevier.*$' inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns no rows
</pre>
because this row is not present in the extract, the problem is in the export from the @geoscrub@ DB
then determine if decimal truncation is causing the problem:
<pre>
grep -E '^35\.65.*,-83\.44.*,United States,TENNESSEE,Sevier.*$' inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns no rows
</pre>
but this doesn't appear to be the case
trying just the first coordinate:
<pre>
grep -E '^35\.65.*$' inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns many rows
grep -E ^35.654350967 inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns 1 row:
---
35.654350967,-83.444906936,United States,Tennessee,Sevier,United States,Tennessee,Sevier,1,3,3,3
---
</pre>
the returned row has the wrong value for @stateprovince@ (the scrubbed value instead of the input value), so the problem is in the export of the placenames, not the coordinates
regenerating the extract:
<pre>
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
rm=1 inputs/.geoscrub/geoscrub_output/geoscrub.csv.run export_
grep -E ^35.654350967 inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns 1 row:
---
35.654350967,-83.444906936,United States,TENNESSEE,Sevier,United States,Tennessee,Sevier,1,3,3,3
---
</pre>
the returned row is now correct, so there was likely some problem in the exporting of the @geoscrub@ DB results
the next step will be to figure out if the other unmatched rows are also fixed by the reload