Bug #950
Updated by Aaron Marcuse-Kubitza about 10 years ago
h3. test case
e.g. this happens for CVS rows:
|datasource|country|state_province|county|latitude|longitude|is_geovalid|
|CVS|United States|TENNESSEE|Sevier|35.654350967|-83.444906936|<NULL>|
<pre><code class="SQL">
SET enable_seqscan = off;
SET enable_mergejoin = off;
SELECT * FROM view_full_occurrence_individual_view WHERE datasource = 'CVS' LIMIT 1;
</code></pre>
and FIA rows:
|datasource|country|state_province|county|latitude|longitude|is_geovalid|
|FIA|United States|Alabama|Covington|31.39|-86.36|<NULL>|
|FIA|United States|Alabama|Escambia|31.17|-86.72|<NULL>|
<pre><code class="SQL">
SET enable_seqscan = off;
SET enable_mergejoin = off;
SELECT
DISTINCT ON (country, state_province, county)
*
FROM (SELECT * FROM view_full_occurrence_individual_view WHERE datasource = 'FIA' LIMIT 1000) s
;
</code></pre>
h3. info
since @is_geovalid@ is NULL and @state_province@ falls back to the unscrubbed value (TENNESSEE rather than the standardized Tennessee), the join is not finding a matching @geoscrub.geoscrub_output@ row
however, there is a matching row in the @geoscrub@ DB's result table:
|decimallatitude|decimallongitude|country|stateprovince|county|countrystd|stateprovincestd|countystd|latlonvalidity|countryvalidity|stateprovincevalidity|countyvalidity|
|35.654350967|-83.444906936|United States|TENNESSEE|Sevier|United States|Tennessee|Sevier|1|3|3|3|
<pre><code class="SQL">
SELECT * FROM geoscrub WHERE
country = 'United States'
AND stateprovince = 'TENNESSEE'
AND county = 'Sevier'
AND decimallatitude = 35.654350967
AND decimallongitude = -83.444906936
LIMIT 1
;
</code></pre>
this suggests that the problem is in the transfer from the @geoscrub@ DB to @vegbien@
<pre>
grep -E '^35\.654350967,-83\.444906936,United States,TENNESSEE,Sevier' inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns no rows
</pre>
because this row is not present in the extract, the problem is in the export from the @geoscrub@ DB
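the grep above assumes the extract's column order and the exact casing of the text fields. as a cross-check, a looser search on the coordinates alone would rule out a mere formatting difference. a minimal sketch, assuming the coordinates appear in the CSV with the same digits as in the DB (@count_coord_matches@ is a hypothetical helper, not part of the existing scripts):

```shell
# Hypothetical helper: count the lines of a CSV extract that contain both
# coordinates as literal substrings, regardless of column order or the
# casing of the other fields.
count_coord_matches() {
    csv="$1"; lat="$2"; lon="$3"
    # -F matches fixed strings, so "." is not treated as a regex wildcard.
    # "--" ends option parsing, so a negative longitude is not read as a flag.
    # Substring matching can overcount (e.g. 35.65 also matches 135.65),
    # so treat a nonzero count as a hint to inspect the rows, not as proof.
    # "|| true" keeps the exit status 0 when the count is 0.
    grep -F -- "$lat" "$csv" | grep -c -F -- "$lon" || true
}
```

usage: @count_coord_matches inputs/.geoscrub/geoscrub_output/geoscrub.csv 35.654350967 -83.444906936@. a count of 0 would agree with the grep above. a nonzero count would mean the row is in the extract, but in a different format than the anchored pattern expects.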