Project

General

Profile

Bug #950

Updated by Aaron Marcuse-Kubitza over 9 years ago

h3. test case 

 eg. this happens for CVS rows: 

 |datasource|country|state_province|county|latitude|longitude|is_geovalid| 
 |CVS|United States|TENNESSEE|Sevier|35.654350967|-83.444906936|<NULL>| 

 <pre><code class="SQL"> 
 SET enable_seqscan = off; 
 SET enable_mergejoin = off; 
 SELECT * FROM view_full_occurrence_individual_view WHERE datasource = 'CVS' LIMIT 1; 
 </code></pre> 

 and FIA rows: 

 |datasource|country|state_province|county|latitude|longitude|is_geovalid| 
 |FIA|United States|Alabama|Covington|31.39|-86.36|<NULL>| 
 |FIA|United States|Alabama|Escambia|31.17|-86.72|<NULL>| 

 <pre><code class="SQL"> 
 SET enable_seqscan = off; 
 SET enable_mergejoin = off; 
 SELECT 
 DISTINCT ON (country, state_province, county) 
 * 
 FROM (SELECT * FROM view_full_occurrence_individual_view WHERE datasource = 'FIA' LIMIT 1000) s 
 ; 
 </code></pre> 

 h3. info 

 since is_geovalid is NULL, and state_province is falling back to the unscrubbed value, this indicates it is unable to find a @geoscrub.geoscrub_output@ row to join to 

 however, there is a matching row in the @geoscrub@ DB's result table: 

 |decimallatitude|decimallongitude|country|stateprovince|county|countrystd|stateprovincestd|countystd|latlonvalidity|countryvalidity|stateprovincevalidity|countyvalidity| 
 |35.654350967|-83.444906936|United States|TENNESSEE|Sevier|United States|Tennessee|Sevier|1|3|3|3| 

 <pre><code class="SQL"> 
 SELECT * FROM geoscrub WHERE 
	 country = 'United States' 
 AND 	 stateprovince = 'TENNESSEE' 
 AND 	 county = 'Sevier' 
 AND 	 decimallatitude = 35.654350967 
 AND 	 decimallongitude = -83.444906936 
 LIMIT 1 
 ; 
 </code></pre> 

 this suggests that the problem is in the transfer from the @geoscrub@ DB to @vegbien@ 

 <pre> 
 grep -E '^35\.654350967,-83\.444906936,United States,TENNESSEE,Sevier.*$' inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns no rows 
 </pre> 

 because this row is not present in the extract, the problem is in the export from the @geoscrub@ DB 

 then determine if decimal truncation is causing the problem: 

 <pre> 
 grep -E '^35\.65.*,-83\.44.*,United States,TENNESSEE,Sevier.*$' inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns no rows 
 </pre> 

 but this doesn't appear to be the case 

 trying just the first coordinate:  

 <pre> 
 grep -E '^35\.65.*$' inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns many rows 
 grep -E ^35.654350967 inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns 1 row: 
 --- 
 35.654350967,-83.444906936,United States,Tennessee,Sevier,United States,Tennessee,Sevier,1,3,3,3 
 --- 
 </pre> 

 the returned row has the wrong value for @stateprovince@ (the scrubbed value instead of the input value), so the problem is in the export of the placenames, not the coordinates 

 regenerating the extract: 

 <pre> 
 ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i 
 rm=1 inputs/.geoscrub/geoscrub_output/geoscrub.csv.run export_ 
 grep -E ^35.654350967 inputs/.geoscrub/geoscrub_output/geoscrub.csv # returns 1 row: 
 --- 
 35.654350967,-83.444906936,United States,TENNESSEE,Sevier,United States,Tennessee,Sevier,1,3,3,3 
 --- 
 </pre> 

 the returned row is now correct, so there was likely some problem in the exporting of the @geoscrub@ DB results 

 the next step will be to figure out if the other unmatched rows are also fixed by the reload

Back