Project

General

Profile

Bug #948

Updated by Aaron Marcuse-Kubitza over 9 years ago

h3. issue 

 "from Brody":mailto:brody.sandelATbiology.au.dk?Brody_Sandel.2014-8-22-9:43PT: 

 > SELECT * FROM view_full_occurrence_individual WHERE datasource = 
 > 'VegBank' AND plot_name = 'CO081202XY04' 
 > ; 
 >  
 > returns two lines with "Lupinus argenteus". Those lines appear to be complete duplicates, except for the last field (taxonobservation_id) which is 1230 for one and 1231 for the next. Both lines show cover_percent of 2. 
 >  
 > Looking at VegBank (http://vegbank.org/vegbank/views/observation_comprehensive.jsp?view=comprehensive&entity=observation&wparam=42558&strata2Show=1&params=42558&placeholder=), I see a single entry of Lupinus argenteus, with a cover of 2%. 
 >  
 > There's a similar situation for Quercus gambelii in this particular plot. 

 h3. info 

 h3. +fix+ fix 

 h3. h3(. -check staging tables- 

 p(. p((. _the duplication is not in the source data or the staging tables, as VegBank/CVS themselves have only one entry for this:_ 

 * VegBank: using @aggregate_organism_observation_id = 937165@ from those two rows: 
 <pre><code class="SQL"> 
 SELECT * FROM "VegBank".taxonimportance WHERE taxonimportance_id = 937165; -- 1 row 
 SELECT * FROM "VegBank".taxoninterpretation WHERE taxonobservation_id = 697934; -- 1 row 
 </code></pre> 
 * CVS: using @aggregate_organism_observation_id = 396239@ from two rows that have this problem: 
 <pre><code class="SQL"> 
 SELECT * FROM "CVS"."taxonImportance" WHERE "taxonImportance_ID" = 396239; -- 1 row 
 SELECT * FROM "CVS"."taxonObservation_" WHERE "taxonOccurrenceID__overall_plot" = 162259; -- 1 row 
 </code></pre> 

 h3. h3(. check normalized DB 

 p(. p((. go through all of the tables used by view_full_occurrence_individual_view (below), and figure out if any of them have duplication. those that have duplication will cause the corresponding @LEFT JOIN@ to add duplicated rows. 

 p(. p((. _using the example CVS row above:_ 

 p((. @FROM source@ 
 @JOIN location USING (source_id)@ 
 -@LEFT JOIN locationevent USING (location_id)@- _query:_ 
 <pre><code class="SQL"> 
 SELECT * FROM locationevent WHERE locationevent_id = 912711; -- 1 row 
 </code></pre> 

 p((. @LEFT JOIN place USING (place_id)@ 
 @LEFT JOIN location parent_location ON parent_location.location_id = location.parent_id@ 
 @LEFT JOIN coordinates USING (coordinates_id)@ 
 @LEFT JOIN geoscrub.geoscrub_output ON ARRAY[geoscrub_output."decimalLatitude"] = ARRAY[coordinates.latitude_deg] AND ARRAY[geoscrub_output."decimalLongitude"] = ARRAY[coordinates.longitude_deg] AND ARRAY[geoscrub_output.country] = ARRAY[place.country] AND ARRAY[geoscrub_output."stateProvince"] = ARRAY[place.stateprovince] AND ARRAY[geoscrub_output.county] = ARRAY[place.county]@ 
 @LEFT JOIN "newWorld".iso_code_gadm ON iso_code_gadm."*GADM country" = COALESCE(geoscrub_output."acceptedCountry", place.country)@ 
 @LEFT JOIN "newWorld"."newWorldCountries" ON "newWorldCountries"."*isoCode" = iso_code_gadm."*2-digit iso code"@ 
 @LEFT JOIN geoscrub.county_centroids ON place.country = 'United States'::text AND county_centroids."stateProvince" = COALESCE(geoscrub_output."acceptedStateProvince", place.stateprovince) AND county_centroids.county = COALESCE(geoscrub_output."acceptedCounty", place.county)@ 
 @LEFT JOIN locationevent parent_event ON parent_event.locationevent_id = locationevent.parent_id@ 
 @LEFT JOIN project ON project.project_id = COALESCE(locationevent.project_id, parent_event.project_id)@ 
 @LEFT JOIN stratum ON stratum.stratum_id = COALESCE(locationevent.stratum_id, parent_event.stratum_id)@ 
 @LEFT JOIN method ON method.method_id = COALESCE(locationevent.method_id, parent_event.method_id);@ 
 -@LEFT JOIN taxonoccurrence ON taxonoccurrence.locationevent_id = "plot.**".locationevent_id@- _query:_ 
 <pre><code class="SQL"> 
 SELECT * FROM taxonoccurrence WHERE taxonoccurrence_id = 11298823; -- 1 row 
 </code></pre> 

 p((. @LEFT JOIN party collector ON collector.party_id = taxonoccurrence.collector_id@ 
 -@LEFT JOIN aggregateoccurrence USING (taxonoccurrence_id)@- _query:_ 
 <pre><code class="SQL"> 
 SELECT * FROM aggregateoccurrence WHERE aggregateoccurrence_id = 11292621; -- 1 row 
 </code></pre> 

 p((. -@LEFT JOIN plantobservation USING (aggregateoccurrence_id)@- _query:_ 
 <pre><code class="SQL"> 
 SELECT * FROM plantobservation WHERE plantobservation_id = 11865135; -- 1 row 
 </code></pre> 

 p((. @LEFT JOIN specimenreplicate USING (plantobservation_id)@ 
 @LEFT JOIN sourcelist ON sourcelist.sourcelist_id = specimenreplicate.duplicate_institutions_sourcelist_id@ 
 -@LEFT JOIN taxondetermination ON taxondetermination.taxonoccurrence_id = taxonoccurrence.taxonoccurrence_id AND taxondetermination.iscurrent@- _query:_ 
 <pre><code class="SQL"> 
 SELECT * FROM taxondetermination WHERE taxonoccurrence_id = 11298823; -- 1 row 
 </code></pre> 

 p((. @LEFT JOIN party identifiedby ON identifiedby.party_id = taxondetermination.party_id@ 
 @LEFT JOIN taxonverbatim USING (taxonverbatim_id)@ 
 @LEFT JOIN taxonlabel USING (taxonlabel_id)@ 
 @LEFT JOIN "TNRS".taxon_scrub ON taxon_scrub."*Name_submitted" = taxonlabel.taxonomicname@ 
 @LEFT JOIN family_higher_plant_group ON family_higher_plant_group.family = taxon_scrub."[scrubbed_]family~(-Accepted_-)__@TNRS__@vegpath.org"@ 
 @LEFT JOIN cultivated_family_locations ON cultivated_family_locations.family = taxon_scrub."[scrubbed_]family~(-Accepted_-)__@TNRS__@vegpath.org" AND cultivated_family_locations.country = "plot.**".country;@ 

 p(. p((. _using the example VegBank row above:_ 

 p((. @FROM source@ 
 @JOIN location USING (source_id)@ 
 @LEFT JOIN locationevent USING (location_id)@ 
 @LEFT JOIN place USING (place_id)@ 
 @LEFT JOIN location parent_location ON parent_location.location_id = location.parent_id@ 
 @LEFT JOIN coordinates USING (coordinates_id)@ 
 @LEFT JOIN geoscrub.geoscrub_output ON ARRAY[geoscrub_output."decimalLatitude"] = ARRAY[coordinates.latitude_deg] AND ARRAY[geoscrub_output."decimalLongitude"] = ARRAY[coordinates.longitude_deg] AND ARRAY[geoscrub_output.country] = ARRAY[place.country] AND ARRAY[geoscrub_output."stateProvince"] = ARRAY[place.stateprovince] AND ARRAY[geoscrub_output.county] = ARRAY[place.county]@ 
 @LEFT JOIN "newWorld".iso_code_gadm ON iso_code_gadm."*GADM country" = COALESCE(geoscrub_output."acceptedCountry", place.country)@ 
 @LEFT JOIN "newWorld"."newWorldCountries" ON "newWorldCountries"."*isoCode" = iso_code_gadm."*2-digit iso code"@ 
 @LEFT JOIN geoscrub.county_centroids ON place.country = 'United States'::text AND county_centroids."stateProvince" = COALESCE(geoscrub_output."acceptedStateProvince", place.stateprovince) AND county_centroids.county = COALESCE(geoscrub_output."acceptedCounty", place.county)@ 
 @LEFT JOIN locationevent parent_event ON parent_event.locationevent_id = locationevent.parent_id@ 
 @LEFT JOIN project ON project.project_id = COALESCE(locationevent.project_id, parent_event.project_id)@ 
 @LEFT JOIN stratum ON stratum.stratum_id = COALESCE(locationevent.stratum_id, parent_event.stratum_id)@ 
 @LEFT JOIN method ON method.method_id = COALESCE(locationevent.method_id, parent_event.method_id);@ 
 @LEFT JOIN taxonoccurrence ON taxonoccurrence.locationevent_id = "plot.**".locationevent_id@ 
 @LEFT JOIN party collector ON collector.party_id = taxonoccurrence.collector_id@ 
 -@LEFT JOIN aggregateoccurrence USING (taxonoccurrence_id)@- _query:_ 
 <pre><code class="SQL"> 
 SELECT * FROM aggregateoccurrence WHERE (source_id, (COALESCE(sourceaccessioncode, '\N'::text)) COLLATE pg_catalog."default") = (38,'937165') AND sourceaccessioncode IS NOT NULL; -- 1 row 
 </code></pre> 

 p((. -@LEFT JOIN plantobservation USING (aggregateoccurrence_id)@- _query:_ 
 <pre><code class="SQL"> 
 SELECT * FROM plantobservation WHERE aggregateoccurrence_id = 3823285; -- 1 row 
 </code></pre> 

 p((. -@LEFT JOIN specimenreplicate USING (plantobservation_id)@- _query:_ 
 <pre><code class="SQL"> 
 SELECT * FROM specimenreplicate WHERE plantobservation_id = 4462528; -- 1 row 
 </code></pre> 

 p((. @LEFT JOIN sourcelist ON sourcelist.sourcelist_id = specimenreplicate.duplicate_institutions_sourcelist_id@ 
 @LEFT JOIN taxondetermination ON taxondetermination.taxonoccurrence_id = taxonoccurrence.taxonoccurrence_id AND taxondetermination.iscurrent@ 
 @LEFT JOIN party identifiedby ON identifiedby.party_id = taxondetermination.party_id@ 
 @LEFT JOIN taxonverbatim USING (taxonverbatim_id)@ 
 @LEFT JOIN taxonlabel USING (taxonlabel_id)@ 
 @LEFT JOIN "TNRS".taxon_scrub ON taxon_scrub."*Name_submitted" = taxonlabel.taxonomicname@ 
 @LEFT JOIN family_higher_plant_group ON family_higher_plant_group.family = taxon_scrub."[scrubbed_]family~(-Accepted_-)__@TNRS__@vegpath.org"@ 
 @LEFT JOIN cultivated_family_locations ON cultivated_family_locations.family = taxon_scrub."[scrubbed_]family~(-Accepted_-)__@TNRS__@vegpath.org" AND cultivated_family_locations.country = "plot.**".country;@

Back