Bug #950
Updated by Aaron Marcuse-Kubitza about 10 years ago
h3. test case eg. this happens for CVS rows: |datasource|country|state_province|county|latitude|longitude|is_geovalid| |CVS|United States|TENNESSEE|Sevier|35.654350967|-83.444906936|<NULL>| <pre><code class="SQL"> SET enable_seqscan = off; SET enable_mergejoin = off; SELECT * FROM view_full_occurrence_individual_view WHERE datasource = 'CVS' LIMIT 1; </code></pre> and FIA rows: |datasource|country|state_province|county|latitude|longitude|is_geovalid| |FIA|United States|Alabama|Covington|31.39|-86.36|<NULL>| |FIA|United States|Alabama|Escambia|31.17|-86.72|<NULL>| <pre><code class="SQL"> SET enable_seqscan = off; SET enable_mergejoin = off; SELECT DISTINCT ON (country, state_province, county) * FROM (SELECT * FROM view_full_occurrence_individual_view WHERE datasource = 'FIA' LIMIT 1000) s ; </code></pre> h3. info since is_geovalid is NULL, and state_province is falling back to the unscrubbed value, this indicates it is unable to find a @geoscrub.geoscrub_output@ row to join to however, there is a matching row in the @geoscrub@ DB's result table: |decimallatitude|decimallongitude|country|stateprovince|county|countrystd|stateprovincestd|countystd|latlonvalidity|countryvalidity|stateprovincevalidity|countyvalidity| |35.654350967|-83.444906936|United States|TENNESSEE|Sevier|United States|Tennessee|Sevier|1|3|3|3| <pre><code class="SQL"> SELECT * FROM geoscrub WHERE country = 'United States' AND stateprovince = 'TENNESSEE' AND county = 'Sevier' AND decimallatitude = 35.654350967 AND decimallongitude = -83.444906936 LIMIT 1 ; </code></pre> this suggests that the problem is in the transfer from the @geoscrub@ DB to @vegbien@ <pre> grep -E '^35\.654350967,-83\.444906936,United States,TENNESSEE,Sevier.*$' inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns no rows </pre> because this row is not present in the extract, the problem is in the export from the @geoscrub@ DB then determine if decimal truncation is causing the problem: <pre> grep -E '^35\.65.*,-83\.44.*,United States,TENNESSEE,Sevier.*$' inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns no rows </pre> but this doesn't appear to be the case trying just the first coordinate: <pre> grep -E '^35\.65.*$' inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns many rows grep -E ^35.654350967 inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns 1 row: --- 35.654350967,-83.444906936,United States,Tennessee,Sevier,United States,Tennessee,Sevier,1,3,3,3 --- row </pre> the returned row has the wrong value for @stateprovince@ (the scrubbed value instead of the input value), so the problem is in the export of the placenames, not the coordinates