VegBIEN FAQ¶
- Table of contents
- VegBIEN FAQ
- asked by Brad on 2013-12-4:
- Is there an entity in vegbien that corresponds 1:1 with the data source list on the BIEN wiki at https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Datasource_validation_status?
- how would I write a query that lists each source and the number of taxonoccurrences in each source?
- how [do] I distinguish a proximate data provider?
- the correct way to count the number of specimen observations in the database
- What about adding tree-traversal indices to all recursively-linked tables? That would speed/simplify retrieval of descendant records without compromising the model.
- asked by Brad on 2013-12-5:
- asked by Brad on 2013-12-15:
- *asked by Mark on 2014-1-14*:
- the entire data-flow process
- [data] acquisition
- some of the challenges involved with schema mapping
- the value of [...] making these data interoperable
- who is currently contributing to BIEN
- why and how any of the Darwin Core dumps (e.g. from NYBG and others) are different from what is already in GBIF
- a summary of the status of resolving intellectual property right issues among contributors
- our technology model for resolving [intellectual property right issues among contributors]
- quality control mechanisms
- a brief description of how you are doing [the validations]
- a list of data sources incorporated into BIEN3
- the total number of plots or occurrences contained in BIEN3
- where the Intellectual property rights stuff is being stored
- asked by Brad on 2014-1-17:
- asked by Brad on 2014-1-21:
- asked by Brad on 2014-2-23:
- count of total plots [+subplots]
- resolved species names, after correction by the tnrs
- count of total [top-level] plots
- the coordinates of those taxonoccurrences which (a) have passed geovalidation, (b) are not cultivated
- the coordinates of those taxonoccurrences which (a) have passed geovalidation, (b) are not cultivated and (c) are from New World countries only
- asked by Brian M on 2014-7-8:
- asked by Brian E on 2014-8-8:
- select all observation records from a given species
- select all observation records from a given family
- select all observation records from a list of taxa (taxon_a, taxon_b, taxon_c)
- Count how many plot records there are
- Count how many observation records there are
- Counts of the families, genera, and species in BIEN3db
- Counts of BIEN2 species in the associated groups (bryophytes", "ferns and allies", "flowering plants", "gymnosperms (conifers)", "gymnosperms (non-conifer)).
- Counts of plots by data source (plot network and data providers)
- Counts of specimens by source (herbarium)
- Create New World species list.
- Counts of families and species by country
- The total number of botanical observation records (from specimens, plots, and traits) for each country.
- Count the total number of BIEN3 species that also contain a trait observation
- Count the total number of traits in the BIEN database and count the total number -- of trait observations.
- For each trait, count the total number of BIEN2 species that have those trait values recorded
- asked by Brody on 2014-8-20:
- asked by Brian E on 2014-8-21
- asked by Cyrille on 2014-10-16:
- asked by Brian E on 2014-10-18:
- for a given species list extract a count of number trait observations per trait per species
- for a given species extract all traits within BIEN3, any associated geographic information, and the source of all trait values
- for a given species [and trait] return the average trait value, the number of observations of that trait and the variance of that trait
- for a given species extract a given trait value
- for a given list of families extract all trait[s]
- asked by Brian M on 2014-10-23:
- asked by Brian E on 2014-12-8:
- basic queries for accessing all observation records for a given species, genus, or family
- all observation records for a given species
- all observation records for a given genus
- all observation records for a given family
- basic queries for accessing all traits or a specific trait of a given species, genus, or family
- all traits of a given species
- a specific trait of a given species
- all traits of a given genus
- a specific trait of a given genus
- all traits of a given family
- a specific trait of a given family
- asked by Brian E on 2015-7-23:
- asked by Brian E on 2015-7-28:
- asked by Brad on 2013-12-4:
asked by Brad on 2013-12-4:¶
Is there an entity in vegbien that corresponds 1:1 with the data source list on the BIEN wiki at https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Datasource_validation_status?¶
entries for each top-level datasource are in source
. however, this is not 1:1 with the datasource list, because it also includes entries for primary data providers (i.e. herbaria).
how would I write a query that lists each source and the number of taxonoccurrences in each source?¶
SELECT
source.shortname AS datasource,
(SELECT COUNT(*) FROM taxonoccurrence WHERE source_id = source.source_id)
AS taxonoccurrences
FROM source
WHERE EXISTS(SELECT NULL FROM taxonoccurrence WHERE source_id = source.source_id LIMIT 1)
ORDER BY source.shortname
how [do] I distinguish a proximate data provider?¶
a proximate data provider is any source
table entry that has associated occurrences, i.e. whose source_id
is used in the taxonoccurrence
table:
SELECT
source.shortname AS datasource
FROM source
WHERE EXISTS(SELECT NULL FROM taxonoccurrence WHERE source_id = source.source_id LIMIT 1)
ORDER BY source.shortname
the correct way to count the number of specimen observations in the database¶
count the # rows in specimenreplicate
What about adding tree-traversal indices to all recursively-linked tables? That would speed/simplify retrieval of descendant records without compromising the model.¶
What we would instead add is a pointer from every locationevent directly to the corresponding project (instead of just from the outermost locationevent). Then, occurrences in outer locationevents would be listed along with occurrences in subplot locationevents when joining using the project_id.
Anyways, what we would be adding would not be tree-traversal indices, because these are somewhat cumbersome and need to be rebuilt whenever rows are added. We would instead use a many:many linking table that lists the descendants (and ancestors) for each node (e.g. see taxonlabel_relationship).
asked by Brad on 2013-12-5:¶
an accurate way to get a count of plots per proximate data source¶
SELECT
source.shortname AS datasource,
(SELECT COUNT(*) FROM location WHERE source_id = source.source_id) AS plots
FROM source
WHERE EXISTS(SELECT NULL FROM taxonoccurrence WHERE source_id = source.source_id LIMIT 1)
AND observationtype = 'plot'
ORDER BY source.shortname
asked by Brad on 2013-12-15:¶
where do I find `matched_taxon_name_with_author` or `accepted_taxon_name_with_author`?¶
These are variables defined in the function:
matched_taxon_name_with_author text = [...]
accepted_taxon_name_with_author text = [...]
new.scrubbed_unique_taxon_name = COALESCE(
accepted_taxon_name_with_author, matched_taxon_name_with_author);
Am I interpreting taxonlabel.taxonomicname correctly as the resolved name?¶
No, this is the concatenated input name sent to TNRS. To get the resolved name, join to the TNRS output:
SELECT *
FROM taxonlabel
LEFT JOIN "TNRS".taxon_scrub ON taxon_scrub."concatenatedScientificName" = taxonlabel.taxonomicname
LIMIT 10
To include invalid TNRS matches as well, instead join to the unfiltered TNRS output:
SELECT *
FROM taxonlabel
LEFT JOIN "TNRS"."MatchedTaxon" ON "MatchedTaxon"."concatenatedScientificName" = taxonlabel.taxonomicname
LIMIT 10
*asked by Mark on 2014-1-14*:¶
the entire data-flow process¶
We have an import steps wiki page which describes the current process, as well as a new-style import wiki page and associated *PowerPoint* that describes our planned improvements to it.
[data] acquisition¶
For the mapping process, we have two examples of how to use our import workflow:
some of the challenges involved with schema mapping¶
I think translating between normalized and denormalized formats has been by far the biggest challenge, especially the normalization step, as evidenced by the complexity of column-based import and the *XPath mappings*.
the value of [...] making these data interoperable¶
it enables scientists to just use a single CSV, rather than having to go through all the mapping work we did themselves.
who is currently contributing to BIEN¶
The datasources are listed on the wiki under VegBIEN contents (as well as in the *publishable datasources spreadsheet*), and details about who provided them are under Datasource conditions of use.
why and how any of the Darwin Core dumps (e.g. from NYBG and others) are different from what is already in GBIF¶
These were obtained directly from the herbarium in its native data format, so they are both newer and include more fields than what is in GBIF.
a summary of the status of resolving intellectual property right issues among contributors¶
The current status in clearing datasources for publication is in the *publishable datasources spreadsheet*
our technology model for resolving [intellectual property right issues among contributors]¶
We look at the provider's website to see if they have a formal use policy, and if they don't, we contact them to ask for permission to redistribute. We track their conditions of use on the wiki.
quality control mechanisms¶
That is the spot-checking validation, and eventually the aggregating validations.
a brief description of how you are doing [the validations]¶
Basically, I prepare a sample extract of the data in VegBIEN and the corresponding input data, and the provider or one of the BIEN members reviews the extract for both accuracy and completeness (inclusion of important fields).
a list of data sources incorporated into BIEN3¶
see VegBIEN contents > datasources
the total number of plots or occurrences contained in BIEN3¶
see VegBIEN contents > records
where the Intellectual property rights stuff is being stored¶
The *publishable datasources spreadsheet* will show you the most useful summary of this, and links to each datasource's details.
asked by Brad on 2014-1-17:¶
List of plots with stem measurements¶
for SALVIAS's first 100 rows:
SET search_path TO "SALVIAS_VegBIEN";
SELECT authorlocationcode
FROM plot
WHERE source_id = source_by_shortname('SALVIAS')
AND EXISTS(
SELECT NULL
FROM location
JOIN locationevent USING (location_id)
JOIN taxonoccurrence USING (locationevent_id)
JOIN aggregateoccurrence USING (taxonoccurrence_id)
JOIN plantobservation USING (aggregateoccurrence_id)
JOIN stemobservation USING (plantobservation_id)
WHERE location.plot_location_id = plot.location_id
LIMIT 1
)
ORDER BY authorlocationcode
asked by Brad on 2014-1-21:¶
Count taxa¶
SELECT COUNT(DISTINCT (taxonname, author)) AS taxa
FROM taxonverbatim
WHERE source_id = source_by_shortname('bien2_traits')
List taxa¶
SELECT DISTINCT concat_ws(' ', taxonname, author) AS taxonwithauthor
FROM taxonverbatim
WHERE source_id = source_by_shortname('bien2_traits')
ORDER BY concat_ws(' ', taxonname, author)
Taxon, trait and value for first 5000 records¶
SELECT concat_ws(' ', taxonname, author) AS taxonwithauthor, name AS trait, value
FROM taxonverbatim
JOIN taxondetermination ON taxondetermination.taxonverbatim_id = taxonverbatim.taxonverbatim_id AND taxondetermination.iscurrent
JOIN trait USING (taxonoccurrence_id)
WHERE taxonverbatim.source_id = source_by_shortname('bien2_traits')
ORDER BY taxonname
LIMIT 5000
asked by Brad on 2014-2-23:¶
count of total plots [+subplots]¶
edited from Brad's query
SELECT COUNT(DISTINCT location_id) AS plots
FROM location l JOIN source s
ON l.source_id=s.source_id
WHERE s.observationtype='plot';
resolved species names, after correction by the tnrs¶
SELECT COUNT(DISTINCT scrubbed_genus||' '||scrubbed_specific_epithet) AS resolved_species_names
FROM taxonlabel
JOIN "TNRS".taxon_scrub ON taxon_scrub."concatenatedScientificName" = taxonlabel.taxonomicname
-- runtime: 1.5 min ("82061 ms") @vegbiendev
count of total [top-level] plots¶
edited from Brad's query
SELECT COUNT(DISTINCT location_id) AS plots
FROM plot l JOIN source s
ON l.source_id=s.source_id
WHERE s.observationtype='plot';
the coordinates of those taxonoccurrences which (a) have passed geovalidation, (b) are not cultivated¶
SELECT
"decimalLatitude" AS latitude_deg,
"decimalLongitude" AS longitude_deg
FROM analytical_stem
-- COALESCE() makes sure each value is true/false, to avoid 3-valued logic
WHERE NOT COALESCE(cultivated_bien::boolean, false)
AND COALESCE(geovalid_bien::boolean, false)
AND "decimalLatitude" IS NOT NULL
AND "decimalLongitude" IS NOT NULL
-- runtime for COUNT(*): 4 min ("224559 ms") for 40 million rows ("38930250") @vegbiendev
the coordinates of those taxonoccurrences which (a) have passed geovalidation, (b) are not cultivated and (c) are from New World countries only¶
SELECT
"decimalLatitude" AS latitude_deg,
"decimalLongitude" AS longitude_deg
FROM analytical_stem
-- COALESCE() makes sure each value is true/false, to avoid 3-valued logic
WHERE NOT COALESCE(cultivated_bien::boolean, false)
AND COALESCE(geovalid_bien::boolean, false)
AND COALESCE("isNewWorld_bien"::boolean, false)
AND "decimalLatitude" IS NOT NULL
AND "decimalLongitude" IS NOT NULL
-- runtime: 8 min ("468036 ms") for 10 million rows ("10255303") @vegbiendev
asked by Brian M on 2014-7-8:¶
[the % of records which have] the aggregate occurrence date¶
SELECT
shortname AS source
, counts.*
FROM source
, LATERAL (
SELECT
COUNT(*) AS total
, COUNT(NULLIF(collectiondate IS NOT NULL, false)) AS with_date
FROM aggregateoccurrence
WHERE source_id = source.source_id
) counts
WHERE EXISTS(SELECT NULL FROM taxonoccurrence WHERE source_id = source.source_id+0 LIMIT 1)
ORDER BY shortname
; -- runtime: 1 min ("51210 ms")
% of input rows with date, for comparison to VegBIEN¶
SELECT
COUNT(*) AS total
, COUNT(NULLIF(date IS NOT NULL, false)) AS with_date
FROM "datasrc"."table"
asked by Brian E on 2014-8-8:¶
select all observation records from a given species¶
SET search_path = /*r14089*/public;
SET enable_seqscan = off;
SELECT taxonoccurrence.*
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
/*needed to enable index scan:*/
AND taxonomicname IS NOT NULL AND "TNRS".taxon_name_is_safe(taxonomicname)
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN taxonoccurrence USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*species:*/'Poa annua'
select all observation records from a given family¶
SET search_path = /*r14089*/public;
SET enable_seqscan = off;
SELECT taxonoccurrence.*
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
/*needed to enable index scan:*/
AND taxonomicname IS NOT NULL AND "TNRS".taxon_name_is_safe(taxonomicname)
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN taxonoccurrence USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_family__@TNRS__@vegpath.org" = /*family:*/'Poaceae'
select all observation records from a list of taxa (taxon_a, taxon_b, taxon_c)¶
SET search_path = /*r14089*/public;
SET enable_seqscan = off;
SELECT taxonoccurrence.*
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
/*needed to enable index scan:*/
AND taxonomicname IS NOT NULL AND "TNRS".taxon_name_is_safe(taxonomicname)
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN taxonoccurrence USING (taxonoccurrence_id)
WHERE taxon_scrub."*Accepted_name" IN (/*taxa:*/'Poa annua', 'Poa annua subvar. minima', 'Poa annua var. annua')
Count how many plot records there are¶
SET search_path = /*r14089*/public;
SELECT COUNT(*)
FROM source
JOIN plot USING (source_id)
WHERE observationtype = 'plot'
Count how many observation records there are¶
SET search_path = /*r14089*/public;
SELECT COUNT(*)
FROM taxonoccurrence
Counts of the families, genera, and species in BIEN3db¶
SELECT
COUNT(DISTINCT "Accepted_family__@TNRS__@vegpath.org") AS families
, COUNT(DISTINCT "[accepted_]genus__@DwC__@vegpath.org") AS genera
, COUNT(DISTINCT "Accepted_species[_binomial]__@TNRS__@vegpath.org") AS species
FROM "TNRS".taxon_scrub
Counts of BIEN2 species in the associated groups (bryophytes", "ferns and allies", "flowering plants", "gymnosperms (conifers)", "gymnosperms (non-conifer)).¶
SET search_path = /*r14089*/public;
SELECT
higher_plant_group
, COUNT(DISTINCT "Accepted_species[_binomial]__@TNRS__@vegpath.org") AS species
FROM "TNRS".taxon_scrub
JOIN family_higher_plant_group ON family_higher_plant_group.family = taxon_scrub."Accepted_family__@TNRS__@vegpath.org"
GROUP BY higher_plant_group
ORDER BY higher_plant_group
Counts of plots by data source (plot network and data providers)¶
SET search_path = /*r14089*/public;
SELECT
source.shortname AS datasource
, (SELECT COUNT(*) FROM plot WHERE source_id = source.source_id) AS plots
FROM source
WHERE observationtype = 'plot'
ORDER BY datasource
Counts of specimens by source (herbarium)¶
SET search_path = /*r14089*/public;
SELECT
source.shortname AS datasource
, (SELECT COUNT(*) FROM specimenreplicate WHERE source_id = source.source_id) AS specimens
FROM source
WHERE observationtype = 'specimen'
ORDER BY datasource
Create New World species list.¶
SET search_path = /*r14089*/public;
SELECT DISTINCT scrubbed_species_binomial AS species
FROM view_full_occurrence_individual
WHERE "isNewWorld_bien"::boolean
Counts of families and species by country¶
SET search_path = /*r14089*/public;
SELECT
"country__@DwC__@vegpath.org"
, COUNT(DISTINCT scrubbed_family) AS families
, COUNT(DISTINCT scrubbed_species_binomial) AS species
FROM view_full_occurrence_individual
GROUP BY "country__@DwC__@vegpath.org"
ORDER BY "country__@DwC__@vegpath.org"
The total number of botanical observation records (from specimens, plots, and traits) for each country.¶
SET search_path = /*r14089*/public;
SELECT
"country__@DwC__@vegpath.org"
, COUNT(*) AS observations
FROM view_full_occurrence_individual
GROUP BY "country__@DwC__@vegpath.org"
ORDER BY "country__@DwC__@vegpath.org"
Count the total number of BIEN3 species that also contain a trait observation¶
28,362
SET search_path = /*r14089*/public;
SET enable_seqscan = off; -- needed even with LIMIT
SELECT COUNT(DISTINCT "Accepted_species[_binomial]__@TNRS__@vegpath.org")
FROM
(
SELECT taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org"
FROM trait
JOIN taxondetermination USING (taxonoccurrence_id)
JOIN taxonverbatim USING (taxonverbatim_id)
JOIN taxonlabel USING (taxonlabel_id)
JOIN "TNRS".taxon_scrub ON taxon_scrub."*Name_submitted" = taxonlabel.taxonomicname
LIMIT 200000 /* added to use correct query plan; must be >= actual rowcount (128628;
note that there is some 1:many mapping on the 109659 trait rows) */
) s
Count the total number of traits in the BIEN database and count the total number -- of trait observations.¶
SET search_path = /*r14089*/public;
SELECT
COUNT(DISTINCT name) AS traits
, COUNT(*) AS trait_observations
FROM trait
For each trait, count the total number of BIEN2 species that have those trait values recorded¶
trait | count |
Area-based photosynthesis (Aarea) | 755 |
d13C | 88 |
Flowering date | 645 |
Flowering day | 645 |
Flowering month | 3684 |
Germination time | 72 |
Height | 4173 |
Leaf area | 1978 |
Leaf Cmass | 13 |
Leaf dry area | 1804 |
Leaf dry matter content (LDMC) | 1736 |
Leaf K | 29 |
Leaf lifespan (LLS) | 608 |
Leaf Narea | 1413 |
Leaf Nmass | 2042 |
Leaf Parea | 511 |
Leaf Pmass | 1172 |
Leaf thickness | 27 |
Mass-based photosynthesis (Amass) | 709 |
PNUE | 79 |
seed mass | 20734 |
Specific leaf area (SLA) | 3526 |
Stomatal conductance (Gs) | 436 |
wood density | 6868 |
SET search_path = /*r14089*/public;
SET enable_seqscan = off; -- needed even with LIMIT
SELECT
trait
, COUNT(DISTINCT "Accepted_species[_binomial]__@TNRS__@vegpath.org")
FROM
(
SELECT
trait.name AS trait
, taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org"
FROM trait
JOIN taxondetermination USING (taxonoccurrence_id)
JOIN taxonverbatim USING (taxonverbatim_id)
JOIN taxonlabel USING (taxonlabel_id)
JOIN "TNRS".taxon_scrub ON taxon_scrub."*Name_submitted" = taxonlabel.taxonomicname
LIMIT 200000 /* added to use correct query plan; must be >= actual rowcount (128628;
note that there is some 1:many mapping on the 109659 trait rows) */
) s
GROUP BY trait
ORDER BY trait
asked by Brody on 2014-8-20:¶
a list of all vegetation plots in BIEN¶
SET search_path = /*r14089*/public;
SET enable_seqscan = off;
SELECT
source.shortname AS datasource
, plot.authorlocationcode AS plot_name
FROM source
JOIN plot USING (source_id)
WHERE observationtype = 'plot'
ORDER BY datasource
how many [vegetation plots] there are total¶
SET search_path = /*r14089*/public;
SELECT COUNT(*)
FROM source
JOIN plot USING (source_id)
WHERE observationtype = 'plot'
how many [vegetation plots] came from each original source¶
SET search_path = /*r14089*/public;
SELECT
source.shortname AS datasource
, (SELECT COUNT(*) FROM plot WHERE source_id = source.source_id) AS plots
FROM source
WHERE observationtype = 'plot'
ORDER BY datasource
[the vegetation plots'] lat/longs¶
SET search_path = /*r14089*/public;
SET enable_seqscan = off;
SET enable_hashjoin = off;
SELECT
source.shortname AS datasource
, latitude_deg
, longitude_deg
FROM source
JOIN plot USING (source_id)
JOIN place USING (place_id)
JOIN coordinates USING (coordinates_id)
WHERE observationtype = 'plot'
ORDER BY datasource
all records from a plot¶
SET search_path = /*r14089*/public;
SELECT * FROM view_full_occurrence_individual
WHERE datasource = '__' AND plot_name = '__'
one line in [the list of SALVIAS plots] doesn't include any species names¶
This is because it's the line for the top-level plot (location_id = '474'), which doesn't itself contain species because they are all in the subplots (location_id = '474; 24', etc)
asked by Brian E on 2014-8-21¶
how to filter plots by area¶
SET search_path = /*r14089*/public;
SELECT *
FROM view_full_occurrence_individual
WHERE plot_area_ha = '__'
asked by Cyrille on 2014-10-16:¶
the number of occurrences you currently have in BIEN 3 for the species list attached¶
SET search_path = /*r14089*/public;
-- the simpler nested-SELECT approach ("384705 ms") is faster than LEFT JOIN ("536893 ms") for including all the input names, but slightly slower than `= ANY` ("308003 ms"), which does not include all the input names
SELECT
scrubbed_species_binomial AS species
, (
SELECT COUNT(*) FROM view_full_occurrence_individual
WHERE scrubbed_species_binomial = species_list.scrubbed_species_binomial
)
FROM unnest(/*scrubbed by TNRS:*/'{Abies concolor,Abies magnifica,Acer saccharum,Cimicifuga americana,Actaea elata,Allium tricoccum,Arisaema triphyllum,Astragalus cremnophylax,Astragalus scaphoides,Astragalus tyghensis,Atriplex acanthocarpa,Atriplex canescens,Balsamorhiza sagittata,Boechera fecunda,Bouteloua rigidiseta,Goeppertia ovandensis,Calocedrus decurrens,Calochortus albus,Calochortus lyallii,Calochortus obispoensis,Calochortus pulchellus,Calochortus tiburonensis,Chamaecrista lineata keyensis,Cirsium pitcheri,Cirsium undulatum,Cleistesiopsis divaricata,Cryptantha flava,Cypripedium fasciculatum,Dicerandra frutescens,Echinacea angustifolia,Eriogonum longifolium,Eryngium cuneifolium,Hydrastis canadensis,Hypericum cumulicola,Lepidium davisii,Liatris scariosa,Lomatium bradshawii,Lomatium cookii,Lupinus tidestromii,Mimulus cardinalis,Mimulus lewisii,Paronychia jamesii,Physaria ovalifolia,Pinus jeffreyi,Pinus lambertiana,Pinus ponderosa,Potentilla congesta,Psoralidium tenuiflorum,Purshia subintegra,Pyrrocoma radiata,Rhus aromatica,Schmaltzia copallinum,Silene spaldingii,Solidago mollis,Sphaeralcea coccinea,Stenaria nigricans,Trillium grandiflorum,Trollius laxus,Tsuga canadensis,Aechmea magdalenae,Agave marmorata,Echinocactus platyacanthus,Prosopis glandulosa,Pterocereus gaumeri,Ratibida columnifera,Thelesperma megapotamicum,Dioon edule,Prosopis laevigata,Cecropia obtusifolia,Chamaedorea elegans,Chamaedorea radicalis,Dioon merolae,Echeveria longissima,Euterpe edulis,Geonoma brevispatha,Geonoma deversa,Geonoma schottiana,Guettarda viburnoides,Mammillaria gaumeri,Mammillaria huitzilopochtli,Manilkara zapota,Neobuxbaumia macrocephala,Neobuxbaumia mezcalaensis,Neobuxbaumia tetetzo,Prioria copaifera,Sabal yapa,Syngonanthus nitens,Zamia amblyphyllidia,Zea diploperennis,Carex bigelowii,Epilobium latifolium,Kosteletzkya pentacarpos,Corallorhiza trifida,Cypripedium calceolus,Geum rivale,Poa alpina,Potentilla anserina,Saxifraga aizoides}'::text[]) species_list (scrubbed_species_binomial)
how many occurrences, additional to GBIF data, you have for each species¶
SET search_path = /*r14089*/public;
-- the simpler nested-SELECT approach ("384705 ms") is faster than LEFT JOIN ("536893 ms") for including all the input names, but slightly slower than `= ANY` ("308003 ms"), which does not include all the input names
SELECT
scrubbed_species_binomial AS species
, (
SELECT COUNT(*) FROM view_full_occurrence_individual
WHERE
scrubbed_species_binomial = species_list.scrubbed_species_binomial
AND datasource != 'GBIF'
)
FROM unnest(/*scrubbed by TNRS:*/'{Abies concolor,Abies magnifica,Acer saccharum,Cimicifuga americana,Actaea elata,Allium tricoccum,Arisaema triphyllum,Astragalus cremnophylax,Astragalus scaphoides,Astragalus tyghensis,Atriplex acanthocarpa,Atriplex canescens,Balsamorhiza sagittata,Boechera fecunda,Bouteloua rigidiseta,Goeppertia ovandensis,Calocedrus decurrens,Calochortus albus,Calochortus lyallii,Calochortus obispoensis,Calochortus pulchellus,Calochortus tiburonensis,Chamaecrista lineata keyensis,Cirsium pitcheri,Cirsium undulatum,Cleistesiopsis divaricata,Cryptantha flava,Cypripedium fasciculatum,Dicerandra frutescens,Echinacea angustifolia,Eriogonum longifolium,Eryngium cuneifolium,Hydrastis canadensis,Hypericum cumulicola,Lepidium davisii,Liatris scariosa,Lomatium bradshawii,Lomatium cookii,Lupinus tidestromii,Mimulus cardinalis,Mimulus lewisii,Paronychia jamesii,Physaria ovalifolia,Pinus jeffreyi,Pinus lambertiana,Pinus ponderosa,Potentilla congesta,Psoralidium tenuiflorum,Purshia subintegra,Pyrrocoma radiata,Rhus aromatica,Schmaltzia copallinum,Silene spaldingii,Solidago mollis,Sphaeralcea coccinea,Stenaria nigricans,Trillium grandiflorum,Trollius laxus,Tsuga canadensis,Aechmea magdalenae,Agave marmorata,Echinocactus platyacanthus,Prosopis glandulosa,Pterocereus gaumeri,Ratibida columnifera,Thelesperma megapotamicum,Dioon edule,Prosopis laevigata,Cecropia obtusifolia,Chamaedorea elegans,Chamaedorea radicalis,Dioon merolae,Echeveria longissima,Euterpe edulis,Geonoma brevispatha,Geonoma deversa,Geonoma schottiana,Guettarda viburnoides,Mammillaria gaumeri,Mammillaria huitzilopochtli,Manilkara zapota,Neobuxbaumia macrocephala,Neobuxbaumia mezcalaensis,Neobuxbaumia tetetzo,Prioria copaifera,Sabal yapa,Syngonanthus nitens,Zamia amblyphyllidia,Zea diploperennis,Carex bigelowii,Epilobium latifolium,Kosteletzkya pentacarpos,Corallorhiza trifida,Cypripedium calceolus,Geum rivale,Poa alpina,Potentilla anserina,Saxifraga aizoides}'::text[]) species_list (scrubbed_species_binomial)
; -- runtime: 6.5 min ("384705 ms") @r14838 @vegbiendev
asked by Brian E on 2014-10-18:¶
for a given species list extract a count of number trait observations per trait per species¶
SET search_path = /*r14089*/public;
SELECT *
FROM unnest(/*scrubbed by TNRS:*/'{Acer rubrum,Poa annua}'::text[]) species_list (species)
CROSS JOIN LATERAL
(
SELECT
trait.name AS trait
, COUNT(*)
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = species_list.species
GROUP BY trait
ORDER BY trait
) s
for a given species extract all traits within BIEN3, any associated geographic information, and the source of all trait values¶
SET search_path = /*r14089*/public;
SELECT
trait.name AS trait
, trait.value
, trait.units
, geoscrub_output.*
, collector.fullname AS collector
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
LEFT JOIN taxonoccurrence USING (taxonoccurrence_id)
LEFT JOIN locationevent USING (locationevent_id)
LEFT JOIN location USING (location_id)
LEFT JOIN place USING (place_id) -- place_id is pulled forward from the parent location when needed
LEFT JOIN coordinates USING (coordinates_id)
LEFT JOIN geoscrub.geoscrub_output ON ARRAY[geoscrub_output."decimalLatitude"] = ARRAY[coordinates.latitude_deg] AND ARRAY[geoscrub_output."decimalLongitude"] = ARRAY[coordinates.longitude_deg] AND ARRAY[geoscrub_output.country] = ARRAY[place.country] AND ARRAY[geoscrub_output."stateProvince"] = ARRAY[place.stateprovince] AND ARRAY[geoscrub_output.county] = ARRAY[place.county]
LEFT JOIN party collector ON collector.party_id = taxonoccurrence.collector_id
WHERE taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poa annua'
ORDER BY trait
for a given species [and trait] return the average trait value, the number of observations of that trait and the variance of that trait¶
SET search_path = /*r14089*/public;
SELECT
avg(trait.value::double precision)
, COUNT(*)
, variance(trait.value::double precision)
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE
taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poa annua'
AND trait.name = 'Height'
for a given species extract a given trait value¶
SET search_path = /*r14089*/public;
SELECT
trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE
taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poa annua'
AND trait.name = 'Height'
for a given list of families extract all trait[s]¶
SET search_path = /*r14089*/public;
SET enable_hashjoin = off;
SET enable_mergejoin = off;
SELECT *
FROM unnest(/*scrubbed by TNRS:*/'{Sapindaceae,Poaceae}'::text[]) family_list (family)
CROSS JOIN LATERAL
(
SELECT
trait.name AS trait
, trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_family__@TNRS__@vegpath.org" = family_list.family
ORDER BY trait
) s
; -- runtime: 2 min ("111228 ms") for 11986 rows @r14786 @vegbiendev
asked by Brian M on 2014-10-23:¶
all plot data other than plot area in view full occurrence¶
SET search_path = /*r14089*/public;
SELECT view_full_occurrence_individual.*
FROM source
JOIN view_full_occurrence_individual ON view_full_occurrence_individual.datasource = source.shortname
WHERE source.observationtype = 'plot'
--ORDER BY source.shortname -- does not work with hash joins
asked by Brian E on 2014-12-8:¶
basic queries for accessing all observation records for a given species, genus, or family¶
all observation records for a given species¶
SET search_path = /*r14089*/public;
SELECT *
FROM view_full_occurrence_individual
WHERE scrubbed_species_binomial = /*scrubbed by TNRS:*/'Poa annua'
all observation records for a given genus¶
SET search_path = /*r14089*/public;
SELECT *
FROM view_full_occurrence_individual
WHERE scrubbed_genus = /*scrubbed by TNRS:*/'Poa'
all observation records for a given family¶
SET search_path = /*r14089*/public;
SELECT *
FROM view_full_occurrence_individual
WHERE scrubbed_family = /*scrubbed by TNRS:*/'Poaceae'
basic queries for accessing all traits or a specific trait of a given species, genus, or family¶
all traits of a given species¶
SET search_path = /*r14089*/public;
SELECT
trait.name AS trait
, trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poa annua'
ORDER BY trait
a specific trait of a given species¶
SET search_path = /*r14089*/public;
SELECT
trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poa annua'
AND trait.name = 'Height'
all traits of a given genus¶
SET search_path = /*r14089*/public;
SELECT
trait.name AS trait
, trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."[accepted_]genus__@DwC__@vegpath.org" = /*scrubbed by TNRS:*/'Poa'
ORDER BY trait
a specific trait of a given genus¶
SET search_path = /*r14089*/public;
SELECT
trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."[accepted_]genus__@DwC__@vegpath.org" = /*scrubbed by TNRS:*/'Poa'
AND trait.name = 'Height'
all traits of a given family¶
SET search_path = /*r14089*/public;
SELECT
trait.name AS trait
, trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_family__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poaceae'
ORDER BY trait
a specific trait of a given family¶
SET search_path = /*r14089*/public;
SELECT
trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted"
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_family__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poaceae'
AND trait.name = 'Height'
asked by Brian E on 2015-7-23:¶
we would need the trait table in VegBIEN to have all of the associated metadata. In particular, the most important columns being ‘Source’, ‘URL source’, ‘source citation’, ‘Access’, ‘Project PI’¶
SET search_path = /*r14089*/public;
SELECT
trait.*
, source.shortname AS "Source"
, source.url AS "URL source"
, source.citation AS "source citation"
, location.accesslevel AS "Access"
, collector.recorded_by AS "Project PI"
FROM trait
LEFT JOIN taxonoccurrence USING (taxonoccurrence_id)
LEFT JOIN
(SELECT party_id AS collector_id, fullname AS recorded_by FROM party)
collector USING (collector_id)
LEFT JOIN locationevent USING (locationevent_id, source_id)
LEFT JOIN location USING (location_id, source_id)
LEFT JOIN source USING (source_id)
asked by Brian E on 2015-7-28:¶
a few examples of the R functions below that currently are very very slow. Country, State, County (essentially all of the political divisions) all take too long.¶
original query:
SET search_path = /*r14089*/public;
SELECT DISTINCT
country, scrubbed_species_binomial
FROM view_full_occurrence_individual
WHERE
country in ('United States')
AND (is_cultivated = 0 OR is_cultivated IS NULL)
AND is_new_world = 1
AND higher_plant_group IS NOT NULL
AND (is_geovalid = 1 OR is_geovalid IS NULL)
ORDER BY scrubbed_species_binomial
;
revised query:
SET search_path = /*r14089*/public;
SELECT DISTINCT * FROM
(
SELECT
country, scrubbed_species_binomial
FROM view_full_occurrence_individual
WHERE
country in ('United States')
AND (is_cultivated = 0 OR is_cultivated IS NULL)
--AND is_new_world = 1 -- not needed; must omit to prevent Postgres from trying to use this index
AND higher_plant_group IS NOT NULL
AND (is_geovalid = 1 OR is_geovalid IS NULL)
) s
ORDER BY scrubbed_species_binomial
;