Bug #948
Updated by Aaron Marcuse-Kubitza about 10 years ago
h3. issue "from Brody":mailto:brody.sandelATbiology.au.dk?Brody_Sandel.2014-8-22-9:43PT: > SELECT * FROM view_full_occurrence_individual WHERE datasource = > 'VegBank' AND plot_name = 'CO081202XY04' > ; > > returns two lines with "Lupinus argenteus". Those lines appear to be complete duplicates, except for the last field (taxonobservation_id) which is 1230 for one and 1231 for the next. Both lines show cover_percent of 2. > > Looking at VegBank (http://vegbank.org/vegbank/views/observation_comprehensive.jsp?view=comprehensive&entity=observation&wparam=42558&strata2Show=1¶ms=42558&placeholder=), I see a single entry of Lupinus argenteus, with a cover of 2%. > > There's a similar situation for Quercus gambelii in this particular plot. h3. info h3. fix h3(. -check staging tables- p((. _the duplication is not in the source data or the staging tables, as VegBank/CVS themselves have only one entry for this:_ * VegBank: using @aggregate_organism_observation_id = 937165@ from those two rows: <pre><code class="SQL"> SELECT * FROM "VegBank".taxonimportance WHERE taxonimportance_id = 937165; -- 1 row SELECT * FROM "VegBank".taxoninterpretation WHERE taxonobservation_id = 697934; -- 1 row </code></pre> * CVS: using @aggregate_organism_observation_id = 396239@ from two rows that have this problem: <pre><code class="SQL"> SELECT * FROM "CVS"."taxonImportance" WHERE "taxonImportance_ID" = 396239; -- 1 row SELECT * FROM "CVS"."taxonObservation_" WHERE "taxonOccurrenceID__overall_plot" = 162259; -- 1 row </code></pre> h3(. check normalized DB p((. go through all of the tables used by view_full_occurrence_individual_view (below), view_full_occurrence_individual_view, and figure out if any of them have duplication. those that have duplication will cause the corresponding @LEFT JOIN@ to add duplicated rows. duplication: p((. _using the example CVS row above:_ p((. @FROM source@ @JOIN location USING (source_id)@ -@LEFT JOIN locationevent USING (location_id)@- _query:_ <pre><code class="SQL"> SELECT * FROM locationevent WHERE locationevent_id = 912711; -- 1 row </code></pre> p((. @LEFT JOIN place USING (place_id)@ @LEFT JOIN location parent_location ON parent_location.location_id = location.parent_id@ @LEFT JOIN coordinates USING (coordinates_id)@ @LEFT JOIN geoscrub.geoscrub_output ON ARRAY[geoscrub_output."decimalLatitude"] = ARRAY[coordinates.latitude_deg] AND ARRAY[geoscrub_output."decimalLongitude"] = ARRAY[coordinates.longitude_deg] AND ARRAY[geoscrub_output.country] = ARRAY[place.country] AND ARRAY[geoscrub_output."stateProvince"] = ARRAY[place.stateprovince] AND ARRAY[geoscrub_output.county] = ARRAY[place.county]@ @LEFT JOIN "newWorld".iso_code_gadm ON iso_code_gadm."*GADM country" = COALESCE(geoscrub_output."acceptedCountry", place.country)@ @LEFT JOIN "newWorld"."newWorldCountries" ON "newWorldCountries"."*isoCode" = iso_code_gadm."*2-digit iso code"@ @LEFT JOIN geoscrub.county_centroids ON place.country = 'United States'::text AND county_centroids."stateProvince" = COALESCE(geoscrub_output."acceptedStateProvince", place.stateprovince) AND county_centroids.county = COALESCE(geoscrub_output."acceptedCounty", place.county)@ @LEFT JOIN locationevent parent_event ON parent_event.locationevent_id = locationevent.parent_id@ @LEFT JOIN project ON project.project_id = COALESCE(locationevent.project_id, parent_event.project_id)@ @LEFT JOIN stratum ON stratum.stratum_id = COALESCE(locationevent.stratum_id, parent_event.stratum_id)@ @LEFT JOIN method ON method.method_id = COALESCE(locationevent.method_id, parent_event.method_id);@ -@LEFT JOIN taxonoccurrence ON taxonoccurrence.locationevent_id = "plot.**".locationevent_id@- _query:_ <pre><code class="SQL"> SELECT * FROM taxonoccurrence WHERE taxonoccurrence_id = 11298823; -- 1 row </code></pre> p((. @LEFT JOIN party collector ON collector.party_id = taxonoccurrence.collector_id@ -@LEFT JOIN aggregateoccurrence USING (taxonoccurrence_id)@- _query:_ <pre><code class="SQL"> SELECT * FROM aggregateoccurrence WHERE aggregateoccurrence_id = 11292621; -- 1 row </code></pre> p((. -@LEFT JOIN plantobservation USING (aggregateoccurrence_id)@- _query:_ <pre><code class="SQL"> SELECT * FROM plantobservation WHERE plantobservation_id = 11865135; -- 1 row </code></pre> p((. @LEFT JOIN specimenreplicate USING (plantobservation_id)@ @LEFT JOIN sourcelist ON sourcelist.sourcelist_id = specimenreplicate.duplicate_institutions_sourcelist_id@ -@LEFT JOIN taxondetermination ON taxondetermination.taxonoccurrence_id = taxonoccurrence.taxonoccurrence_id AND taxondetermination.iscurrent@- _query:_ <pre><code class="SQL"> SELECT * FROM taxondetermination WHERE taxonoccurrence_id = 11298823; -- 1 row </code></pre> p((. @LEFT JOIN party identifiedby ON identifiedby.party_id = taxondetermination.party_id@ @LEFT JOIN taxonverbatim USING (taxonverbatim_id)@ @LEFT JOIN taxonlabel USING (taxonlabel_id)@ @LEFT JOIN "TNRS".taxon_scrub ON taxon_scrub."*Name_submitted" = taxonlabel.taxonomicname@ @LEFT JOIN family_higher_plant_group ON family_higher_plant_group.family = taxon_scrub."[scrubbed_]family~(-Accepted_-)__@TNRS__@vegpath.org"@ @LEFT JOIN cultivated_family_locations ON cultivated_family_locations.family = taxon_scrub."[scrubbed_]family~(-Accepted_-)__@TNRS__@vegpath.org" AND cultivated_family_locations.country = "plot.**".country;@ h3(. check view_full_occurrence_individual_view joins p((. go through all of the view_full_occurrence_individual_view joins, and figure out where the duplication is being introduced: p((. _using the example VegBank row above:_ p((. @FROM source@ @JOIN location USING (source_id)@ @LEFT JOIN locationevent USING (location_id)@ @LEFT JOIN place USING (place_id)@ @LEFT JOIN location parent_location ON parent_location.location_id = location.parent_id@ @LEFT JOIN coordinates USING (coordinates_id)@ @LEFT JOIN geoscrub.geoscrub_output ON ARRAY[geoscrub_output."decimalLatitude"] = ARRAY[coordinates.latitude_deg] AND ARRAY[geoscrub_output."decimalLongitude"] = ARRAY[coordinates.longitude_deg] AND ARRAY[geoscrub_output.country] = ARRAY[place.country] AND ARRAY[geoscrub_output."stateProvince"] = ARRAY[place.stateprovince] AND ARRAY[geoscrub_output.county] = ARRAY[place.county]@ @LEFT JOIN "newWorld".iso_code_gadm ON iso_code_gadm."*GADM country" = COALESCE(geoscrub_output."acceptedCountry", place.country)@ @LEFT JOIN "newWorld"."newWorldCountries" ON "newWorldCountries"."*isoCode" = iso_code_gadm."*2-digit iso code"@ @LEFT JOIN geoscrub.county_centroids ON place.country = 'United States'::text AND county_centroids."stateProvince" = COALESCE(geoscrub_output."acceptedStateProvince", place.stateprovince) AND county_centroids.county = COALESCE(geoscrub_output."acceptedCounty", place.county)@ @LEFT JOIN locationevent parent_event ON parent_event.locationevent_id = locationevent.parent_id@ @LEFT JOIN project ON project.project_id = COALESCE(locationevent.project_id, parent_event.project_id)@ @LEFT JOIN stratum ON stratum.stratum_id = COALESCE(locationevent.stratum_id, parent_event.stratum_id)@ @LEFT JOIN method ON method.method_id = COALESCE(locationevent.method_id, parent_event.method_id);@ @LEFT JOIN taxonoccurrence ON taxonoccurrence.locationevent_id = "plot.**".locationevent_id@ @LEFT JOIN party collector ON collector.party_id = taxonoccurrence.collector_id@ -@LEFT JOIN aggregateoccurrence USING (taxonoccurrence_id)@- _query:_ <pre><code class="SQL"> SELECT * FROM aggregateoccurrence WHERE (source_id, (COALESCE(sourceaccessioncode, '\N'::text)) COLLATE pg_catalog."default") = (38,'937165') AND sourceaccessioncode IS NOT NULL; -- 1 row </code></pre> p((. -@LEFT JOIN plantobservation USING (aggregateoccurrence_id)@- _query:_ <pre><code class="SQL"> SELECT * FROM plantobservation WHERE aggregateoccurrence_id = 3823285; -- 1 row </code></pre> p((. -@LEFT JOIN specimenreplicate USING (plantobservation_id)@- _query:_ <pre><code class="SQL"> SELECT * FROM specimenreplicate WHERE plantobservation_id = 4462528; -- 1 row </code></pre> p((. @LEFT JOIN sourcelist ON sourcelist.sourcelist_id = specimenreplicate.duplicate_institutions_sourcelist_id@ @LEFT JOIN taxondetermination ON taxondetermination.taxonoccurrence_id = taxonoccurrence.taxonoccurrence_id AND taxondetermination.iscurrent@ @LEFT JOIN party identifiedby ON identifiedby.party_id = taxondetermination.party_id@ @LEFT JOIN taxonverbatim USING (taxonverbatim_id)@ @LEFT JOIN taxonlabel USING (taxonlabel_id)@ @LEFT JOIN "TNRS".taxon_scrub ON taxon_scrub."*Name_submitted" = taxonlabel.taxonomicname@ @LEFT JOIN family_higher_plant_group ON family_higher_plant_group.family = taxon_scrub."[scrubbed_]family~(-Accepted_-)__@TNRS__@vegpath.org"@ @LEFT JOIN cultivated_family_locations ON cultivated_family_locations.family = taxon_scrub."[scrubbed_]family~(-Accepted_-)__@TNRS__@vegpath.org" AND cultivated_family_locations.country = "plot.**".country;@