Project

General

Profile

VegBIEN FAQ

asked by Brad on 2013-12-4:

Is there an entity in vegbien that corresponds 1:1 with the data source list on the BIEN wiki at https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Datasource_validation_status?

entries for each top-level datasource are in source. however, this is not 1:1 with the datasource list, because it also includes entries for primary data providers (i.e. herbaria).

how would I write a query that lists each source and the number of taxonoccurrences in each source?

SELECT
source.shortname AS datasource,
(SELECT COUNT(*) FROM taxonoccurrence WHERE source_id = source.source_id)
    AS taxonoccurrences
FROM source
WHERE EXISTS(SELECT NULL FROM taxonoccurrence WHERE source_id = source.source_id LIMIT 1)
ORDER BY source.shortname

how [do] I distinguish a proximate data provider?

a proximate data provider is any source table entry that has associated occurrences, i.e. whose source_id is used in the taxonoccurrence table:

SELECT
source.shortname AS datasource
FROM source
WHERE EXISTS(SELECT NULL FROM taxonoccurrence WHERE source_id = source.source_id LIMIT 1)
ORDER BY source.shortname

the correct way to count the number of specimen observations in the database

count the # rows in specimenreplicate

What about adding tree-traversal indices to all recursively-linked tables? That would speed/simplify retrieval of descendant records without compromising the model.

What we would instead add is a pointer from every locationevent directly to the corresponding project (instead of just from the outermost locationevent). Then, occurrences in outer locationevents would be listed along with occurrences in subplot locationevents when joining using the project_id.

Anyways, what we would be adding would not be tree-traversal indices, because these are somewhat cumbersome and need to be rebuilt whenever rows are added. We would instead use a many:many linking table that lists the descendants (and ancestors) for each node (e.g. see taxonlabel_relationship).

asked by Brad on 2013-12-5:

an accurate way to get a count of plots per proximate data source

SELECT
source.shortname AS datasource,
(SELECT COUNT(*) FROM location WHERE source_id = source.source_id) AS plots
FROM source
WHERE EXISTS(SELECT NULL FROM taxonoccurrence WHERE source_id = source.source_id LIMIT 1)
AND observationtype = 'plot'
ORDER BY source.shortname

asked by Brad on 2013-12-15:

where do I find `matched_taxon_name_with_author` or `accepted_taxon_name_with_author`?

These are variables defined in the function:

matched_taxon_name_with_author text = [...]
accepted_taxon_name_with_author text = [...]

new.scrubbed_unique_taxon_name = COALESCE(
    accepted_taxon_name_with_author, matched_taxon_name_with_author);

Am I interpreting taxonlabel.taxonomicname correctly as the resolved name?

No, this is the concatenated input name sent to TNRS. To get the resolved name, join to the TNRS output:

SELECT *
FROM taxonlabel
LEFT JOIN "TNRS".taxon_scrub ON taxon_scrub."concatenatedScientificName" = taxonlabel.taxonomicname
LIMIT 10

To include invalid TNRS matches as well, instead join to the unfiltered TNRS output:

SELECT *
FROM taxonlabel
LEFT JOIN "TNRS"."MatchedTaxon" ON "MatchedTaxon"."concatenatedScientificName" = taxonlabel.taxonomicname
LIMIT 10

*asked by Mark on 2014-1-14*:

the entire data-flow process

We have an import steps wiki page which describes the current process, as well as a new-style import wiki page and associated *PowerPoint* that describes our planned improvements to it.

[data] acquisition

For the mapping process, we have two examples of how to use our import workflow:

some of the challenges involved with schema mapping

I think translating between normalized and denormalized formats has been by far the biggest challenge, especially the normalization step, as evidenced by the complexity of column-based import and the *XPath mappings*.

the value of [...] making these data interoperable

it enables scientists to just use a single CSV, rather than having to go through all the mapping work we did themselves.

who is currently contributing to BIEN

The datasources are listed on the wiki under VegBIEN contents (as well as in the *publishable datasources spreadsheet*), and details about who provided them are under Datasource conditions of use.

why and how any of the Darwin Core dumps (e.g. from NYBG and others) are different from what is already in GBIF

These were obtained directly from the herbarium in its native data format, so they are both newer and include more fields than what is in GBIF.

a summary of the status of resolving intellectual property right issues among contributors

The current status in clearing datasources for publication is in the *publishable datasources spreadsheet*

our technology model for resolving [intellectual property right issues among contributors]

We look at the provider's website to see if they have a formal use policy, and if they don't, we contact them to ask for permission to redistribute. We track their conditions of use on the wiki.

quality control mechanisms

That is the spot-checking validation, and eventually the aggregating validations.

a brief description of how you are doing [the validations]

Basically, I prepare a sample extract of the data in VegBIEN and the corresponding input data, and the provider or one of the BIEN members reviews the extract for both accuracy and completeness (inclusion of important fields).

a list of data sources incorporated into BIEN3

see VegBIEN contents > datasources

the total number of plots or occurrences contained in BIEN3

see VegBIEN contents > records

where the Intellectual property rights stuff is being stored

The *publishable datasources spreadsheet* will show you the most useful summary of this, and links to each datasource's details.

asked by Brad on 2014-1-17:

List of plots with stem measurements

for SALVIAS's first 100 rows:

SET search_path TO "SALVIAS_VegBIEN";
SELECT authorlocationcode
FROM plot
WHERE source_id = source_by_shortname('SALVIAS')
AND EXISTS(
    SELECT NULL
    FROM location
    JOIN locationevent USING (location_id)
    JOIN taxonoccurrence USING (locationevent_id)
    JOIN aggregateoccurrence USING (taxonoccurrence_id)
    JOIN plantobservation USING (aggregateoccurrence_id)
    JOIN stemobservation USING (plantobservation_id)
    WHERE location.plot_location_id = plot.location_id
    LIMIT 1
)
ORDER BY authorlocationcode

asked by Brad on 2014-1-21:

Count taxa

SELECT COUNT(DISTINCT (taxonname, author)) AS taxa
FROM taxonverbatim
WHERE source_id = source_by_shortname('bien2_traits')

List taxa

SELECT DISTINCT concat_ws(' ', taxonname, author) AS taxonwithauthor
FROM taxonverbatim
WHERE source_id = source_by_shortname('bien2_traits')
ORDER BY concat_ws(' ', taxonname, author)

Taxon, trait and value for first 5000 records

SELECT concat_ws(' ', taxonname, author) AS taxonwithauthor, name AS trait, value
FROM taxonverbatim
JOIN taxondetermination ON taxondetermination.taxonverbatim_id = taxonverbatim.taxonverbatim_id AND taxondetermination.iscurrent
JOIN trait USING (taxonoccurrence_id)
WHERE taxonverbatim.source_id = source_by_shortname('bien2_traits')
ORDER BY taxonname
LIMIT 5000

asked by Brad on 2014-2-23:

count of total plots [+subplots]

edited from Brad's query

SELECT COUNT(DISTINCT location_id) AS plots
FROM location l JOIN source s
ON l.source_id=s.source_id
WHERE s.observationtype='plot';

resolved species names, after correction by the tnrs

SELECT COUNT(DISTINCT scrubbed_genus||' '||scrubbed_specific_epithet) AS resolved_species_names
FROM taxonlabel
JOIN "TNRS".taxon_scrub ON taxon_scrub."concatenatedScientificName" = taxonlabel.taxonomicname
-- runtime: 1.5 min ("82061 ms") @vegbiendev

count of total [top-level] plots

edited from Brad's query

SELECT COUNT(DISTINCT location_id) AS plots
FROM plot l JOIN source s
ON l.source_id=s.source_id
WHERE s.observationtype='plot';

the coordinates of those taxonoccurrences which (a) have passed geovalidation, (b) are not cultivated

SELECT
"decimalLatitude" AS latitude_deg,
"decimalLongitude" AS longitude_deg
FROM analytical_stem
-- COALESCE() makes sure each value is true/false, to avoid 3-valued logic
WHERE NOT COALESCE(cultivated_bien::boolean, false)
AND COALESCE(geovalid_bien::boolean, false)
AND "decimalLatitude" IS NOT NULL
AND "decimalLongitude" IS NOT NULL
-- runtime for COUNT(*): 4 min ("224559 ms") for 40 million rows ("38930250") @vegbiendev

the coordinates of those taxonoccurrences which (a) have passed geovalidation, (b) are not cultivated and (c) are from New World countries only

SELECT
"decimalLatitude" AS latitude_deg,
"decimalLongitude" AS longitude_deg
FROM analytical_stem
-- COALESCE() makes sure each value is true/false, to avoid 3-valued logic
WHERE NOT COALESCE(cultivated_bien::boolean, false)
AND COALESCE(geovalid_bien::boolean, false)
AND COALESCE("isNewWorld_bien"::boolean, false)
AND "decimalLatitude" IS NOT NULL
AND "decimalLongitude" IS NOT NULL
-- runtime: 8 min ("468036 ms") for 10 million rows ("10255303") @vegbiendev

asked by Brian M on 2014-7-8:

[the % of records which have] the aggregate occurrence date

SELECT
  shortname AS source
, counts.*
FROM source
, LATERAL (
    SELECT
      COUNT(*) AS total
    , COUNT(NULLIF(collectiondate IS NOT NULL, false)) AS with_date
    FROM aggregateoccurrence
    WHERE source_id = source.source_id
) counts
WHERE EXISTS(SELECT NULL FROM taxonoccurrence WHERE source_id = source.source_id+0 LIMIT 1) 
ORDER BY shortname
; -- runtime: 1 min ("51210 ms")

% of input rows with date, for comparison to VegBIEN

SELECT
  COUNT(*) AS total
, COUNT(NULLIF(date IS NOT NULL, false)) AS with_date
FROM "datasrc"."table" 

asked by Brian E on 2014-8-8:

select all observation records from a given species

SET search_path = /*r14089*/public;
SET enable_seqscan = off;
SELECT taxonoccurrence.*
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
    /*needed to enable index scan:*/
    AND taxonomicname IS NOT NULL AND "TNRS".taxon_name_is_safe(taxonomicname)
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN taxonoccurrence USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*species:*/'Poa annua'

select all observation records from a given family

SET search_path = /*r14089*/public;
SET enable_seqscan = off;
SELECT taxonoccurrence.*
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
    /*needed to enable index scan:*/
    AND taxonomicname IS NOT NULL AND "TNRS".taxon_name_is_safe(taxonomicname)
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN taxonoccurrence USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_family__@TNRS__@vegpath.org" = /*family:*/'Poaceae'

select all observation records from a list of taxa (taxon_a, taxon_b, taxon_c)

SET search_path = /*r14089*/public;
SET enable_seqscan = off;
SELECT taxonoccurrence.*
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
    /*needed to enable index scan:*/
    AND taxonomicname IS NOT NULL AND "TNRS".taxon_name_is_safe(taxonomicname)
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN taxonoccurrence USING (taxonoccurrence_id)
WHERE taxon_scrub."*Accepted_name" IN (/*taxa:*/'Poa annua', 'Poa annua subvar. minima', 'Poa annua var. annua')

Count how many plot records there are

SET search_path = /*r14089*/public;
SELECT COUNT(*)
FROM source
JOIN plot USING (source_id)
WHERE observationtype = 'plot'

Count how many observation records there are

SET search_path = /*r14089*/public;
SELECT COUNT(*)
FROM taxonoccurrence

Counts of the families, genera, and species in BIEN3db

SELECT
  COUNT(DISTINCT "Accepted_family__@TNRS__@vegpath.org") AS families
, COUNT(DISTINCT "[accepted_]genus__@DwC__@vegpath.org") AS genera
, COUNT(DISTINCT "Accepted_species[_binomial]__@TNRS__@vegpath.org") AS species
FROM "TNRS".taxon_scrub

Counts of BIEN2 species in the associated groups (bryophytes", "ferns and allies", "flowering plants", "gymnosperms (conifers)", "gymnosperms (non-conifer)).

SET search_path = /*r14089*/public;
SELECT
  higher_plant_group
, COUNT(DISTINCT "Accepted_species[_binomial]__@TNRS__@vegpath.org") AS species
FROM "TNRS".taxon_scrub
JOIN family_higher_plant_group ON family_higher_plant_group.family = taxon_scrub."Accepted_family__@TNRS__@vegpath.org" 
GROUP BY higher_plant_group
ORDER BY higher_plant_group

Counts of plots by data source (plot network and data providers)

SET search_path = /*r14089*/public;
SELECT
  source.shortname AS datasource
, (SELECT COUNT(*) FROM plot WHERE source_id = source.source_id) AS plots
FROM source
WHERE observationtype = 'plot'
ORDER BY datasource

Counts of specimens by source (herbarium)

SET search_path = /*r14089*/public;
SELECT
  source.shortname AS datasource
, (SELECT COUNT(*) FROM specimenreplicate WHERE source_id = source.source_id) AS specimens
FROM source
WHERE observationtype = 'specimen'
ORDER BY datasource

Create New World species list.

SET search_path = /*r14089*/public;
SELECT DISTINCT scrubbed_species_binomial AS species
FROM view_full_occurrence_individual
WHERE "isNewWorld_bien"::boolean

Counts of families and species by country

SET search_path = /*r14089*/public;
SELECT
  "country__@DwC__@vegpath.org" 
, COUNT(DISTINCT scrubbed_family) AS families
, COUNT(DISTINCT scrubbed_species_binomial) AS species
FROM view_full_occurrence_individual
GROUP BY "country__@DwC__@vegpath.org" 
ORDER BY "country__@DwC__@vegpath.org" 

The total number of botanical observation records (from specimens, plots, and traits) for each country.

SET search_path = /*r14089*/public;
SELECT
  "country__@DwC__@vegpath.org" 
, COUNT(*) AS observations
FROM view_full_occurrence_individual
GROUP BY "country__@DwC__@vegpath.org" 
ORDER BY "country__@DwC__@vegpath.org" 

Count the total number of BIEN3 species that also contain a trait observation

28,362

SET search_path = /*r14089*/public;
SET enable_seqscan = off; -- needed even with LIMIT
SELECT COUNT(DISTINCT "Accepted_species[_binomial]__@TNRS__@vegpath.org")
FROM
(
SELECT taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" 
FROM trait
JOIN taxondetermination USING (taxonoccurrence_id)
JOIN taxonverbatim USING (taxonverbatim_id)
JOIN taxonlabel USING (taxonlabel_id)
JOIN "TNRS".taxon_scrub ON taxon_scrub."*Name_submitted" = taxonlabel.taxonomicname
LIMIT 200000 /* added to use correct query plan; must be >= actual rowcount (128628;
    note that there is some 1:many mapping on the 109659 trait rows) */
) s

Count the total number of traits in the BIEN database and count the total number -- of trait observations.

SET search_path = /*r14089*/public;
SELECT
  COUNT(DISTINCT name) AS traits
, COUNT(*) AS trait_observations
FROM trait

For each trait, count the total number of BIEN2 species that have those trait values recorded

trait count
Area-based photosynthesis (Aarea) 755
d13C 88
Flowering date 645
Flowering day 645
Flowering month 3684
Germination time 72
Height 4173
Leaf area 1978
Leaf Cmass 13
Leaf dry area 1804
Leaf dry matter content (LDMC) 1736
Leaf K 29
Leaf lifespan (LLS) 608
Leaf Narea 1413
Leaf Nmass 2042
Leaf Parea 511
Leaf Pmass 1172
Leaf thickness 27
Mass-based photosynthesis (Amass) 709
PNUE 79
seed mass 20734
Specific leaf area (SLA) 3526
Stomatal conductance (Gs) 436
wood density 6868
SET search_path = /*r14089*/public;
SET enable_seqscan = off; -- needed even with LIMIT
SELECT
  trait
, COUNT(DISTINCT "Accepted_species[_binomial]__@TNRS__@vegpath.org")
FROM
(
SELECT
  trait.name AS trait
, taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" 
FROM trait
JOIN taxondetermination USING (taxonoccurrence_id)
JOIN taxonverbatim USING (taxonverbatim_id)
JOIN taxonlabel USING (taxonlabel_id)
JOIN "TNRS".taxon_scrub ON taxon_scrub."*Name_submitted" = taxonlabel.taxonomicname
LIMIT 200000 /* added to use correct query plan; must be >= actual rowcount (128628;
    note that there is some 1:many mapping on the 109659 trait rows) */
) s
GROUP BY trait
ORDER BY trait

asked by Brody on 2014-8-20:

a list of all vegetation plots in BIEN

SET search_path = /*r14089*/public;
SET enable_seqscan = off;
SELECT
  source.shortname AS datasource
, plot.authorlocationcode AS plot_name
FROM source
JOIN plot USING (source_id)
WHERE observationtype = 'plot'
ORDER BY datasource

how many [vegetation plots] there are total

SET search_path = /*r14089*/public;
SELECT COUNT(*)
FROM source
JOIN plot USING (source_id)
WHERE observationtype = 'plot'

how many [vegetation plots] came from each original source

SET search_path = /*r14089*/public;
SELECT
  source.shortname AS datasource
, (SELECT COUNT(*) FROM plot WHERE source_id = source.source_id) AS plots
FROM source
WHERE observationtype = 'plot'
ORDER BY datasource

[the vegetation plots'] lat/longs

SET search_path = /*r14089*/public;
SET enable_seqscan = off;
SET enable_hashjoin = off;
SELECT
  source.shortname AS datasource
, latitude_deg
, longitude_deg
FROM source
JOIN plot USING (source_id)
JOIN place USING (place_id)
JOIN coordinates USING (coordinates_id)
WHERE observationtype = 'plot'
ORDER BY datasource

all records from a plot

SET search_path = /*r14089*/public;
SELECT * FROM view_full_occurrence_individual
WHERE datasource = '__' AND plot_name = '__'

one line in [the list of SALVIAS plots] doesn't include any species names

This is because it's the line for the top-level plot (location_id = '474'), which doesn't itself contain species because they are all in the subplots (location_id = '474; 24', etc)

asked by Brian E on 2014-8-21

how to filter plots by area

SET search_path = /*r14089*/public;
SELECT *
FROM view_full_occurrence_individual
WHERE plot_area_ha = '__'

asked by Cyrille on 2014-10-16:

the number of occurrences you currently have in BIEN 3 for the species list attached

SET search_path = /*r14089*/public;
-- the simpler nested-SELECT approach ("384705 ms") is faster than LEFT JOIN ("536893 ms") for including all the input names, but slightly slower than `= ANY` ("308003 ms"), which does not include all the input names
SELECT
  scrubbed_species_binomial AS species
, (
    SELECT COUNT(*) FROM view_full_occurrence_individual
    WHERE scrubbed_species_binomial = species_list.scrubbed_species_binomial
)
FROM unnest(/*scrubbed by TNRS:*/'{Abies concolor,Abies magnifica,Acer saccharum,Cimicifuga americana,Actaea elata,Allium tricoccum,Arisaema triphyllum,Astragalus cremnophylax,Astragalus scaphoides,Astragalus tyghensis,Atriplex acanthocarpa,Atriplex canescens,Balsamorhiza sagittata,Boechera fecunda,Bouteloua rigidiseta,Goeppertia ovandensis,Calocedrus decurrens,Calochortus albus,Calochortus lyallii,Calochortus obispoensis,Calochortus pulchellus,Calochortus tiburonensis,Chamaecrista lineata keyensis,Cirsium pitcheri,Cirsium undulatum,Cleistesiopsis divaricata,Cryptantha flava,Cypripedium fasciculatum,Dicerandra frutescens,Echinacea angustifolia,Eriogonum longifolium,Eryngium cuneifolium,Hydrastis canadensis,Hypericum cumulicola,Lepidium davisii,Liatris scariosa,Lomatium bradshawii,Lomatium cookii,Lupinus tidestromii,Mimulus cardinalis,Mimulus lewisii,Paronychia jamesii,Physaria ovalifolia,Pinus jeffreyi,Pinus lambertiana,Pinus ponderosa,Potentilla congesta,Psoralidium tenuiflorum,Purshia subintegra,Pyrrocoma radiata,Rhus aromatica,Schmaltzia copallinum,Silene spaldingii,Solidago mollis,Sphaeralcea coccinea,Stenaria nigricans,Trillium grandiflorum,Trollius laxus,Tsuga canadensis,Aechmea magdalenae,Agave marmorata,Echinocactus platyacanthus,Prosopis glandulosa,Pterocereus gaumeri,Ratibida columnifera,Thelesperma megapotamicum,Dioon edule,Prosopis laevigata,Cecropia obtusifolia,Chamaedorea elegans,Chamaedorea radicalis,Dioon merolae,Echeveria longissima,Euterpe edulis,Geonoma brevispatha,Geonoma deversa,Geonoma schottiana,Guettarda viburnoides,Mammillaria gaumeri,Mammillaria huitzilopochtli,Manilkara zapota,Neobuxbaumia macrocephala,Neobuxbaumia mezcalaensis,Neobuxbaumia tetetzo,Prioria copaifera,Sabal yapa,Syngonanthus nitens,Zamia amblyphyllidia,Zea diploperennis,Carex bigelowii,Epilobium latifolium,Kosteletzkya pentacarpos,Corallorhiza trifida,Cypripedium calceolus,Geum rivale,Poa alpina,Potentilla anserina,Saxifraga aizoides}'::text[]) species_list (scrubbed_species_binomial)

how many occurrences, additional to GBIF data, you have for each species

SET search_path = /*r14089*/public;
-- the simpler nested-SELECT approach ("384705 ms") is faster than LEFT JOIN ("536893 ms") for including all the input names, but slightly slower than `= ANY` ("308003 ms"), which does not include all the input names
SELECT
  scrubbed_species_binomial AS species
, (
    SELECT COUNT(*) FROM view_full_occurrence_individual
    WHERE
        scrubbed_species_binomial = species_list.scrubbed_species_binomial
    AND    datasource != 'GBIF'
)
FROM unnest(/*scrubbed by TNRS:*/'{Abies concolor,Abies magnifica,Acer saccharum,Cimicifuga americana,Actaea elata,Allium tricoccum,Arisaema triphyllum,Astragalus cremnophylax,Astragalus scaphoides,Astragalus tyghensis,Atriplex acanthocarpa,Atriplex canescens,Balsamorhiza sagittata,Boechera fecunda,Bouteloua rigidiseta,Goeppertia ovandensis,Calocedrus decurrens,Calochortus albus,Calochortus lyallii,Calochortus obispoensis,Calochortus pulchellus,Calochortus tiburonensis,Chamaecrista lineata keyensis,Cirsium pitcheri,Cirsium undulatum,Cleistesiopsis divaricata,Cryptantha flava,Cypripedium fasciculatum,Dicerandra frutescens,Echinacea angustifolia,Eriogonum longifolium,Eryngium cuneifolium,Hydrastis canadensis,Hypericum cumulicola,Lepidium davisii,Liatris scariosa,Lomatium bradshawii,Lomatium cookii,Lupinus tidestromii,Mimulus cardinalis,Mimulus lewisii,Paronychia jamesii,Physaria ovalifolia,Pinus jeffreyi,Pinus lambertiana,Pinus ponderosa,Potentilla congesta,Psoralidium tenuiflorum,Purshia subintegra,Pyrrocoma radiata,Rhus aromatica,Schmaltzia copallinum,Silene spaldingii,Solidago mollis,Sphaeralcea coccinea,Stenaria nigricans,Trillium grandiflorum,Trollius laxus,Tsuga canadensis,Aechmea magdalenae,Agave marmorata,Echinocactus platyacanthus,Prosopis glandulosa,Pterocereus gaumeri,Ratibida columnifera,Thelesperma megapotamicum,Dioon edule,Prosopis laevigata,Cecropia obtusifolia,Chamaedorea elegans,Chamaedorea radicalis,Dioon merolae,Echeveria longissima,Euterpe edulis,Geonoma brevispatha,Geonoma deversa,Geonoma schottiana,Guettarda viburnoides,Mammillaria gaumeri,Mammillaria huitzilopochtli,Manilkara zapota,Neobuxbaumia macrocephala,Neobuxbaumia mezcalaensis,Neobuxbaumia tetetzo,Prioria copaifera,Sabal yapa,Syngonanthus nitens,Zamia amblyphyllidia,Zea diploperennis,Carex bigelowii,Epilobium latifolium,Kosteletzkya pentacarpos,Corallorhiza trifida,Cypripedium calceolus,Geum rivale,Poa alpina,Potentilla anserina,Saxifraga aizoides}'::text[]) species_list (scrubbed_species_binomial)
; -- runtime: 6.5 min ("384705 ms") @r14838 @vegbiendev

asked by Brian E on 2014-10-18:

for a given species list extract a count of number trait observations per trait per species

SET search_path = /*r14089*/public;
SELECT *
FROM unnest(/*scrubbed by TNRS:*/'{Acer rubrum,Poa annua}'::text[]) species_list (species)
CROSS JOIN LATERAL
(
    SELECT
      trait.name AS trait
    , COUNT(*)
    FROM "TNRS".taxon_scrub
    JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
    JOIN taxonverbatim USING (taxonlabel_id)
    JOIN taxondetermination USING (taxonverbatim_id)
    JOIN trait USING (taxonoccurrence_id)
    WHERE taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = species_list.species
    GROUP BY trait
    ORDER BY trait
) s

for a given species extract all traits within BIEN3, any associated geographic information, and the source of all trait values

SET search_path = /*r14089*/public;
SELECT
  trait.name AS trait
, trait.value
, trait.units
, geoscrub_output.*
, collector.fullname AS collector
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
LEFT JOIN taxonoccurrence USING (taxonoccurrence_id)
LEFT JOIN locationevent USING (locationevent_id)
LEFT JOIN location USING (location_id)
LEFT JOIN place USING (place_id) -- place_id is pulled forward from the parent location when needed
LEFT JOIN coordinates USING (coordinates_id)
LEFT JOIN geoscrub.geoscrub_output ON ARRAY[geoscrub_output."decimalLatitude"] = ARRAY[coordinates.latitude_deg] AND ARRAY[geoscrub_output."decimalLongitude"] = ARRAY[coordinates.longitude_deg] AND ARRAY[geoscrub_output.country] = ARRAY[place.country] AND ARRAY[geoscrub_output."stateProvince"] = ARRAY[place.stateprovince] AND ARRAY[geoscrub_output.county] = ARRAY[place.county]
LEFT JOIN party collector ON collector.party_id = taxonoccurrence.collector_id
WHERE taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poa annua'
ORDER BY trait

for a given species [and trait] return the average trait value, the number of observations of that trait and the variance of that trait

SET search_path = /*r14089*/public;
SELECT
  avg(trait.value::double precision)
, COUNT(*)
, variance(trait.value::double precision)
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE
    taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poa annua'
AND trait.name = 'Height'

for a given species extract a given trait value

SET search_path = /*r14089*/public;
SELECT
  trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE
    taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poa annua'
AND trait.name = 'Height'

for a given list of families extract all trait[s]

SET search_path = /*r14089*/public;
SET enable_hashjoin = off;
SET enable_mergejoin = off;
SELECT *
FROM unnest(/*scrubbed by TNRS:*/'{Sapindaceae,Poaceae}'::text[]) family_list (family)
CROSS JOIN LATERAL
(
    SELECT
      trait.name AS trait
    , trait.value
    , trait.units
    FROM "TNRS".taxon_scrub
    JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
    JOIN taxonverbatim USING (taxonlabel_id)
    JOIN taxondetermination USING (taxonverbatim_id)
    JOIN trait USING (taxonoccurrence_id)
    WHERE taxon_scrub."Accepted_family__@TNRS__@vegpath.org" = family_list.family
    ORDER BY trait
) s
; -- runtime: 2 min ("111228 ms") for 11986 rows @r14786 @vegbiendev

asked by Brian M on 2014-10-23:

all plot data other than plot area in view full occurrence

SET search_path = /*r14089*/public;
SELECT view_full_occurrence_individual.*
FROM source
JOIN view_full_occurrence_individual ON view_full_occurrence_individual.datasource = source.shortname
WHERE source.observationtype = 'plot'
--ORDER BY source.shortname -- does not work with hash joins

asked by Brian E on 2014-12-8:

basic queries for accessing all observation records for a given species, genus, or family

all observation records for a given species

SET search_path = /*r14089*/public;
SELECT *
FROM view_full_occurrence_individual
WHERE scrubbed_species_binomial = /*scrubbed by TNRS:*/'Poa annua'

all observation records for a given genus

SET search_path = /*r14089*/public;
SELECT *
FROM view_full_occurrence_individual
WHERE scrubbed_genus = /*scrubbed by TNRS:*/'Poa'

all observation records for a given family

SET search_path = /*r14089*/public;
SELECT *
FROM view_full_occurrence_individual
WHERE scrubbed_family = /*scrubbed by TNRS:*/'Poaceae'

basic queries for accessing all traits or a specific trait of a given species, genus, or family

all traits of a given species

SET search_path = /*r14089*/public;
SELECT
  trait.name AS trait
, trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poa annua'
ORDER BY trait

a specific trait of a given species

SET search_path = /*r14089*/public;
SELECT
  trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_species[_binomial]__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poa annua'
AND trait.name = 'Height'

all traits of a given genus

SET search_path = /*r14089*/public;
SELECT
  trait.name AS trait
, trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."[accepted_]genus__@DwC__@vegpath.org" = /*scrubbed by TNRS:*/'Poa'
ORDER BY trait

a specific trait of a given genus

SET search_path = /*r14089*/public;
SELECT
  trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."[accepted_]genus__@DwC__@vegpath.org" = /*scrubbed by TNRS:*/'Poa'
AND trait.name = 'Height'

all traits of a given family

SET search_path = /*r14089*/public;
SELECT
  trait.name AS trait
, trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_family__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poaceae'
ORDER BY trait

a specific trait of a given family

SET search_path = /*r14089*/public;
SELECT
  trait.value
, trait.units
FROM "TNRS".taxon_scrub
JOIN taxonlabel ON taxonlabel.taxonomicname = taxon_scrub."*Name_submitted" 
JOIN taxonverbatim USING (taxonlabel_id)
JOIN taxondetermination USING (taxonverbatim_id)
JOIN trait USING (taxonoccurrence_id)
WHERE taxon_scrub."Accepted_family__@TNRS__@vegpath.org" = /*scrubbed by TNRS:*/'Poaceae'
AND trait.name = 'Height'

asked by Brian E on 2015-7-23:

we would need the trait table in VegBIEN to have all of the associated metadata. In particular, the most important columns being ‘Source’, ‘URL source’, ‘source citation’, ‘Access’, ‘Project PI’

SET search_path = /*r14089*/public;
SELECT
  trait.*
, source.shortname AS "Source" 
, source.url AS "URL source" 
, source.citation AS "source citation" 
, location.accesslevel AS "Access" 
, collector.recorded_by AS "Project PI" 
FROM trait
LEFT JOIN taxonoccurrence USING (taxonoccurrence_id)
LEFT JOIN
(SELECT party_id AS collector_id, fullname AS recorded_by FROM party)
collector USING (collector_id)
LEFT JOIN locationevent USING (locationevent_id, source_id)
LEFT JOIN location USING (location_id, source_id)
LEFT JOIN source USING (source_id)

asked by Brian E on 2015-7-28:

a few examples of the R functions below that currently are very very slow. Country, State, County (essentially all of the political divisions) all take too long.

original query:

SET search_path = /*r14089*/public;
SELECT DISTINCT
country, scrubbed_species_binomial
FROM view_full_occurrence_individual
WHERE
    country in ('United States')
AND (is_cultivated = 0 OR is_cultivated IS NULL)
AND is_new_world = 1
AND higher_plant_group IS NOT NULL
AND (is_geovalid = 1 OR is_geovalid IS NULL)
ORDER BY scrubbed_species_binomial
;

revised query:

SET search_path = /*r14089*/public;
SELECT DISTINCT * FROM
(
SELECT
country, scrubbed_species_binomial
FROM view_full_occurrence_individual
WHERE
    country in ('United States')
AND (is_cultivated = 0 OR is_cultivated IS NULL)
--AND is_new_world = 1 -- not needed; must omit to prevent Postgres from trying to use this index
AND higher_plant_group IS NOT NULL
AND (is_geovalid = 1 OR is_geovalid IS NULL)
) s
ORDER BY scrubbed_species_binomial
;