Test queries against SALVIAS native database - all queries against MySQL db `salvias_plots` on nimoy 1. Count projects SELECT COUNT(*) FROM projects; +----------+ | COUNT(*) | +----------+ | 23 | +----------+ 1 row in set (0.00 sec) 2. Count plots SELECT COUNT(*) FROM plotMetadata; mysql> SELECT COUNT(*) FROM plotMetadata; +----------+ | COUNT(*) | +----------+ | 13661 | +----------+ 1 row in set (0.00 sec) 3. Count plots per project SELECT pr.project_id, project_name, COUNT(PlotID) FROM projects pr JOIN plotMetadata p ON pr.project_id=p.project_id GROUP BY pr.project_id, project_name; -> GROUP BY pr.project_id, project_name; +------------+------------------------------------------------+---------------+ | project_id | project_name | COUNT(PlotID) | +------------+------------------------------------------------+---------------+ | 1 | Gentry Transect Dataset | 228 | | 2 | Boyle Transects | 37 | | 3 | OTS Transects | 20 | | 5 | RAINFOR - 0.1 ha Madre de Dios, Peru | 102 | | 6 | Noel Kempff Forest Plots | 29 | | 7 | Noel Kempff Savanna Plots | 9 | | 8 | Inventarios de Bosques en Ecuador | 1 | | 9 | Inventarios de Bosques de la Costa del Ecuador | 1 | | 10 | Enquist Lab Transect Dataset | 22 | | 11 | INW Vegetation Plots | 12962 | | 12 | Cam Webb Borneo Plots | 28 | | 13 | Pilon Lajas Treeplots Bolivia | 2 | | 14 | Madidi Transects | 99 | | 16 | nsf_example | 2 | | 17 | Madidi Permanent Plots | 6 | | 18 | SERBO Selva Seca Oaxaca | 3 | | 19 | DeWalt Bolivia forest plots | 4 | | 21 | La Selva Secondary Forest Plots | 3 | | 22 | ACA Amazon Forest Inventories | 1 | | 23 | Madidi Savana Line Transects | 70 | | 24 | Bonifacino Forest Transects | 4 | | 25 | RAINFOR - 1 ha Peru | 28 | +------------+------------------------------------------------+---------------+ 22 rows in set (0.12 sec) 4. Count of individuals per plot for single project First, look up examples of methodologies used in different projects: SELECT DISTINCT pr.project_id, project_name, PlotMethod, m.MethodCode, m.Description FROM projects pr JOIN plotMetadata p JOIN lookup_MethodCode m ON pr.project_id=p.project_id AND p.MethodCode=m.MethodCode LIMIT 10; +------------+------------------------------------------------+--------------------------------------------+------------+--------------------------+ | project_id | project_name | PlotMethod | MethodCode | Description | +------------+------------------------------------------------+--------------------------------------------+------------+--------------------------+ | 1 | Gentry Transect Dataset | 0.1 ha transect, stems >= 2.5 cm dbh | 8 | species (stems) | | 2 | Boyle Transects | 0.1 ha transect, stems >= 2.5 cm dbh | 3 | individuals | | 3 | OTS Transects | 0.1 ha transect, stems >= 2.5 cm dbh | 3 | individuals | | 5 | RAINFOR - 0.1 ha Madre de Dios, Peru | 0.1 ha transect, stems >= 2.5 cm dbh | 3 | individuals | | 6 | Noel Kempff Forest Plots | 1 ha, stems >= 10 cm dbh | 3 | individuals | | 7 | Noel Kempff Savanna Plots | Point-intercept | 7 | species (line-intercept) | | 8 | Inventarios de Bosques en Ecuador | 1 ha, stems >= 10 cm dbh | 3 | individuals | | 9 | Inventarios de Bosques de la Costa del Ecuador | 1 ha, stems >= 10 cm dbh | 3 | individuals | | 10 | Enquist Lab Transect Dataset | 0.1 ha transect, stems >= 2.5 cm dbh | 3 | individuals | | 11 | INW Vegetation Plots | 20 m x 20 cover plots, all vascular plants | 1 | species (percent cover) | +------------+------------------------------------------------+--------------------------------------------+------------+--------------------------+ 10 rows in set (0.00 sec) project 2 will do (Boyle Transects). Are all individual observations. Now, let's count using two different methods: (a) method one: count observations (valid for lookup_MethodCode='individuals' only) SELECT p.PlotID, p.SiteCode, COUNT(*) FROM projects pr JOIN plotMetadata p JOIN plotObservations o ON pr.project_id=p.project_id AND p.PlotID=o.PlotID WHERE p.project_id=2 GROUP BY p.PlotID, p.SiteCode; +--------+----------+----------+ | PlotID | SiteCode | COUNT(*) | +--------+----------+----------+ | 290 | c1000-1 | 321 | | 291 | c1250-1 | 228 | | 292 | c1750-1 | 270 | | 293 | c1750-2 | 195 | | 294 | c1750-3 | 274 | | 295 | c2000-1 | 362 | | 296 | c2000-2 | 352 | | 297 | c2250-1 | 439 | | 298 | c2750-1 | 189 | | 299 | c2750-2 | 251 | | 300 | c2750-3 | 248 | | 301 | c500-1 | 273 | | 302 | c750-1 | 298 | | 303 | c750-2 | 303 | | 304 | c750-3 | 295 | | 305 | e1250-1 | 444 | | 306 | e1750-1 | 313 | | 307 | e1750-2 | 435 | | 308 | e1750-3 | 402 | | 309 | e2250-1 | 399 | | 310 | e2250-2 | 316 | | 311 | e2750-1 | 310 | | 312 | e2750-2 | 416 | | 313 | e750-1 | 252 | | 314 | e750-2 | 309 | | 315 | e750-3 | 346 | | 316 | m1250-1 | 397 | | 317 | m1250-2 | 396 | | 318 | m1750-1 | 346 | | 319 | m1750-2 | 247 | | 320 | m1750-3 | 410 | | 321 | m2250-1 | 390 | | 322 | m2750-1 | 247 | | 323 | m2750-2 | 220 | | 324 | m2750-3 | 212 | | 325 | m750-1 | 317 | | 326 | m750-2 | 340 | +--------+----------+----------+ 37 rows in set (0.02 sec) (b) method 2: sum `NoInd` (number of individuals). Valid for all individual-observations type plots The following query tells me that this approach should be valid for all plots in this project: SELECT NoInd, COUNT(*) AS observations FROM plotObservations o JOIN plotMetadata p ON o.PlotID=p.PlotID WHERE project_id=2 GROUP BY NoInd; +-------+--------------+ | NoInd | observations | +-------+--------------+ | 1 | 11762 | +-------+--------------+ 1 row in set (0.03 sec) Perfect. Now, let's count: SELECT p.PlotID, p.SiteCode, SUM(NoInd) AS Individuals FROM projects pr JOIN plotMetadata p JOIN plotObservations o ON pr.project_id=p.project_id AND p.PlotID=o.PlotID WHERE p.project_id=2 GROUP BY p.PlotID, p.SiteCode; +--------+----------+-------------+ | PlotID | SiteCode | Individuals | +--------+----------+-------------+ | 290 | c1000-1 | 321 | | 291 | c1250-1 | 228 | | 292 | c1750-1 | 270 | | 293 | c1750-2 | 195 | | 294 | c1750-3 | 274 | | 295 | c2000-1 | 362 | | 296 | c2000-2 | 352 | | 297 | c2250-1 | 439 | | 298 | c2750-1 | 189 | | 299 | c2750-2 | 251 | | 300 | c2750-3 | 248 | | 301 | c500-1 | 273 | | 302 | c750-1 | 298 | | 303 | c750-2 | 303 | | 304 | c750-3 | 295 | | 305 | e1250-1 | 444 | | 306 | e1750-1 | 313 | | 307 | e1750-2 | 435 | | 308 | e1750-3 | 402 | | 309 | e2250-1 | 399 | | 310 | e2250-2 | 316 | | 311 | e2750-1 | 310 | | 312 | e2750-2 | 416 | | 313 | e750-1 | 252 | | 314 | e750-2 | 309 | | 315 | e750-3 | 346 | | 316 | m1250-1 | 397 | | 317 | m1250-2 | 396 | | 318 | m1750-1 | 346 | | 319 | m1750-2 | 247 | | 320 | m1750-3 | 410 | | 321 | m2250-1 | 390 | | 322 | m2750-1 | 247 | | 323 | m2750-2 | 220 | | 324 | m2750-3 | 212 | | 325 | m750-1 | 317 | | 326 | m750-2 | 340 | +--------+----------+-------------+ 37 rows in set (0.04 sec) Good, matches result using previous method. Next, let's count individuals in a plot which uses aggregate counts of individuals (lookup_MethodCode='species (stems)'). Method 1 isn't valid for such plots; we must use method 2. First, what projects use this methodology? SELECT DISTINCT pr.project_id, project_name, PlotMethod, m.MethodCode, m.Description FROM projects pr JOIN plotMetadata p JOIN lookup_MethodCode m ON pr.project_id=p.project_id AND p.MethodCode=m.MethodCode WHERE m.MethodCode=8; +------------+-------------------------+---------------------------------------+------------+-----------------+ | project_id | project_name | PlotMethod | MethodCode | Description | +------------+-------------------------+---------------------------------------+------------+-----------------+ | 1 | Gentry Transect Dataset | 0.1 ha transect, stems >= 2.5 cm dbh | 8 | species (stems) | +------------+-------------------------+---------------------------------------+------------+-----------------+ 1 row in set (0.04 sec) Gentry plots are the only ones. First, let's check to see if we have the expected distribution of number of individuals: SELECT NoInd, COUNT(*) AS observations FROM plotObservations o JOIN plotMetadata p JOIN lookup_MethodCode m ON o.PlotID=p.PlotID AND p.MethodCode=m.MethodCode WHERE project_id=2 GROUP BY NoInd; +-------+--------------+ | NoInd | observations | +-------+--------------+ | NULL | 1 | | 1 | 32823 | | 2 | 6591 | | 3 | 2445 | | 4 | 1199 | | 5 | 676 | | 6 | 405 | | 7 | 266 | | 8 | 187 | | 9 | 113 | | 10 | 132 | | 11 | 60 | | 12 | 56 | | 13 | 37 | | 14 | 40 | | 15 | 27 | | 16 | 27 | | 17 | 10 | | 18 | 7 | | 19 | 7 | | 20 | 6 | | 21 | 5 | | 22 | 2 | | 23 | 1 | | 24 | 5 | | 25 | 2 | | 26 | 1 | | 27 | 2 | | 28 | 1 | | 30 | 2 | | 31 | 3 | | 32 | 1 | | 40 | 1 | | 41 | 1 | | 48 | 1 | | 77 | 1 | +-------+--------------+ 36 rows in set (0.12 sec) As expected, except for the one NULL value. Let's not worry about it. Let's do the count for the first 10 plots: SELECT p.PlotID, p.SiteCode, SUM(NoInd) AS Individuals FROM projects pr JOIN plotMetadata p JOIN plotObservations o ON pr.project_id=p.project_id AND p.PlotID=o.PlotID WHERE p.project_id=1 GROUP BY p.PlotID, p.SiteCode LIMIT 10; +--------+----------+-------------+ | PlotID | SiteCode | Individuals | +--------+----------+-------------+ | 1 | ACHUPALL | 428 | | 2 | ALLACHER | 248 | | 3 | ALLPAHUA | 402 | | 4 | ALTERDOC | 164 | | 5 | ALTODEMI | 322 | | 6 | ALTOSAPA | 391 | | 7 | AMOTAPE | 395 | | 8 | ANCHICAY | 412 | | 9 | ANKARIF | 436 | | 10 | ANTADO | 383 | +--------+----------+-------------+ 10 rows in set (0.00 sec) 5. Count of stems per plot First, let's find out which methodologies are associated with stem measurements: SELECT PlotMethod, MethodCode, Description, COUNT(project_id) as projects FROM ( SELECT DISTINCT p.PlotMethod, p.MethodCode, p.Description, p.project_id FROM ( SELECT project_id, PlotID, PlotMethod, m.MethodCode, m.Description FROM plotMetadata p JOIN lookup_MethodCode m ON p.MethodCode=m.MethodCode ) p JOIN plotObservations o JOIN stems s ON p.PlotID=o.PlotID AND o.PlotObsID=s.PlotObsID GROUP BY project_id ) AS a GROUP BY PlotMethod, MethodCode, Description; +-----------------------------------------------------------------------------------------------------------------+------------+-----------------+----------+ | PlotMethod | MethodCode | Description | projects | +-----------------------------------------------------------------------------------------------------------------+------------+-----------------+----------+ | 0.01 ha, stems >= 10 cm dbh; 16 subplots of 10 x 10 m per plot, 4 rows (UU, U, S, SS), 4 columns (BB, B, T, TT) | 3 | individuals | 1 | | 0.1 ha transect, stems >= 2.5 cm dbh | 3 | individuals | 9 | | 0.1 ha transect, stems >= 2.5 cm dbh | 8 | species (stems) | 1 | | 1 ha, stems >= 10 cm dbh | 3 | individuals | 7 | +-----------------------------------------------------------------------------------------------------------------+------------+-----------------+----------+ 4 rows in set (4.14 sec) Normally, you'd expect only "individuals" methodology to allow collecting of stem information. That one project which uses counts of stems per species ("species (stems)") is the Gentry plots. Just a quirk of the bad way in which the original data was recorded: it has individual stem measurement, and counts of individuals per species, but no way to associate sets of stems with individuals trees. However, we can still count stems even for these weird Gentry data. Next, let's find a project which uses "individuals" methodology and has associated stem measurements. SELECT DISTINCT pr.project_id, project_name FROM projects pr JOIN plotMetadata p JOIN plotObservations o JOIN stems s ON pr.project_id=p.project_id AND p.PlotID=o.PlotID AND o.PlotObsID=s.PlotObsID WHERE p.MethodCode=3; +------------+------------------------------------------------+ | project_id | project_name | +------------+------------------------------------------------+ | 2 | Boyle Transects | | 3 | OTS Transects | | 5 | RAINFOR - 0.1 ha Madre de Dios, Peru | | 6 | Noel Kempff Forest Plots | | 10 | Enquist Lab Transect Dataset | | 9 | Inventarios de Bosques de la Costa del Ecuador | | 8 | Inventarios de Bosques en Ecuador | | 12 | Cam Webb Borneo Plots | | 13 | Pilon Lajas Treeplots Bolivia | | 17 | Madidi Permanent Plots | | 18 | SERBO Selva Seca Oaxaca | | 19 | DeWalt Bolivia forest plots | | 21 | La Selva Secondary Forest Plots | | 22 | ACA Amazon Forest Inventories | | 24 | Bonifacino Forest Transects | | 25 | RAINFOR - 1 ha Peru | | 14 | Madidi Transects | +------------+------------------------------------------------+ 17 rows in set (1.16 sec) OK, Boyle Transect will do. Now, count stems for the first 10 plots: SELECT p.PlotID, SiteName as plot, COUNT(*) AS stems FROM plotMetadata p JOIN plotObservations o JOIN stems s ON p.PlotID=o.PlotID AND o.PlotObsID=s.PlotObsID WHERE p.project_id=2 GROUP BY p.PlotID, SiteName LIMIT 10; +--------+---------+-------+ | PlotID | plot | stems | +--------+---------+-------+ | 290 | c1000-1 | 371 | | 291 | c1250-1 | 258 | | 292 | c1750-1 | 343 | | 293 | c1750-2 | 285 | | 294 | c1750-3 | 357 | | 295 | c2000-1 | 424 | | 296 | c2000-2 | 434 | | 297 | c2250-1 | 525 | | 298 | c2750-1 | 266 | | 299 | c2750-2 | 324 | +--------+---------+-------+ 10 rows in set (0.13 sec) 6. Count of species (fully-identified species + morphospecies) per plot, Boyle Transects SELECT p.PlotID, SiteName AS plot, COUNT(DISTINCT Family, Genus, Species) AS SpeciesCount FROM plotMetadata p JOIN plotObservations o ON p.PlotID=o.PlotID WHERE project_id=2 GROUP BY p.PlotID, SiteName; +--------+---------+--------------+ | PlotID | plot | SpeciesCount | +--------+---------+--------------+ | 290 | c1000-1 | 133 | | 291 | c1250-1 | 95 | | 292 | c1750-1 | 86 | | 293 | c1750-2 | 73 | | 294 | c1750-3 | 82 | | 295 | c2000-1 | 74 | | 296 | c2000-2 | 62 | | 297 | c2250-1 | 65 | | 298 | c2750-1 | 24 | | 299 | c2750-2 | 28 | | 300 | c2750-3 | 39 | | 301 | c500-1 | 118 | | 302 | c750-1 | 130 | | 303 | c750-2 | 115 | | 304 | c750-3 | 128 | | 305 | e1250-1 | 137 | | 306 | e1750-1 | 113 | | 307 | e1750-2 | 132 | | 308 | e1750-3 | 105 | | 309 | e2250-1 | 74 | | 310 | e2250-2 | 101 | | 311 | e2750-1 | 50 | | 312 | e2750-2 | 52 | | 313 | e750-1 | 101 | | 314 | e750-2 | 115 | | 315 | e750-3 | 119 | | 316 | m1250-1 | 49 | | 317 | m1250-2 | 62 | | 318 | m1750-1 | 54 | | 319 | m1750-2 | 58 | | 320 | m1750-3 | 53 | | 321 | m2250-1 | 42 | | 322 | m2750-1 | 22 | | 323 | m2750-2 | 15 | | 324 | m2750-3 | 25 | | 325 | m750-1 | 60 | | 326 | m750-2 | 74 | +--------+---------+--------------+ 37 rows in set (0.03 sec) 7. List of all species (fully-identified species + morphospecies) for one plot As an example, let's use the plot from above with fewest species: SELECT DISTINCT p.PlotID, SiteName AS plot, Family, CONCAT_WS(' ',IFNULL(Genus,''), IFNULL(Species,'')) AS SpeciesName FROM plotMetadata p JOIN plotObservations o ON p.PlotID=o.PlotID WHERE p.PlotID=298; +--------+---------+-----------------+------------------------------+ | PlotID | plot | Family | SpeciesName | +--------+---------+-----------------+------------------------------+ | 298 | c2750-1 | Aquifoliaceae | Ilex lamprophylla | | 298 | c2750-1 | Araliaceae | Oreopanax xalapensis | | 298 | c2750-1 | Araliaceae | Schefflera rodriguesiana | | 298 | c2750-1 | Cunoniaceae | Weinmannia fagaroides | | 298 | c2750-1 | Ericaceae | Vaccinium poasanum | | 298 | c2750-1 | Clethraceae | Clethra pyrogena | | 298 | c2750-1 | Ericaceae | Cavendishia bracteata | | 298 | c2750-1 | Melastomataceae | Miconia schnellii | | 298 | c2750-1 | Cunoniaceae | Weinmannia pinnata | | 298 | c2750-1 | Myrsinaceae | Ardisia pleurobotrya | | 298 | c2750-1 | Saxifragaceae | Escallonia myrtilloides | | 298 | c2750-1 | Melastomataceae | Miconia longibracteata | | 298 | c2750-1 | Rhamnaceae | Rhamnus oreodendron | | 298 | c2750-1 | Araceae | Anthurium concinnatum | | 298 | c2750-1 | Caprifoliaceae | Viburnum venustum | | 298 | c2750-1 | Myrsinaceae | Myrsine pittieri | | 298 | c2750-1 | Ericaceae | Disterigma humboldtii | | 298 | c2750-1 | Winteraceae | Drimys granadensis | | 298 | c2750-1 | Piperaceae | Piper pittieri | | 298 | c2750-1 | Apiaceae | Myrrhidendron donnellsmithii | | 298 | c2750-1 | Loranthaceae | Gaiadendron punctatum | | 298 | c2750-1 | PTERIDOPHYTA | Cyathea suprastrigosa | | 298 | c2750-1 | Solanaceae | Solanum vacciniiflorum | | 298 | c2750-1 | Araliaceae | Dendropanax sp.1 | +--------+---------+-----------------+------------------------------+ 24 rows in set (0.01 sec) 9. List of fully-identified species only for one plot (using `name_status` as filter criterion) Notice that one of the species above is a morphospecies (Dendropanax sp.1). Unfortunately, SALVIAS keeps morphospecies strings in the same field as standard specific epithets. To exclude morphospecies and get a list only of fully determined species (scientific names), we need to use the flag `name_status`. Here are a few queries explaining what the column means: SELECT name_status, count(*) FROM plotObservations GROUP BY name_status; +-------------+----------+ | name_status | count(*) | +-------------+----------+ | NULL | 1026 | | 0 | 1841 | | 1 | 374676 | | 2 | 6940 | | 3 | 42371 | +-------------+----------+ 5 rows in set (0.70 sec) Too bad about the null values. Looks like `name_status` wasn't consistently used. Let's look at the non-null values: mysql> select name_status, Genus, Species from plotObservations where name_status=1 limit 10; +-------------+--------------+-------------+ | name_status | Genus | Species | +-------------+--------------+-------------+ | 1 | Geonoma | seleri | | 1 | Tovomitopsis | allenii | | 1 | Tovomitopsis | allenii | | 1 | Celastrus | vulcanicola | | 1 | Psychotria | sylvivaga | | 1 | Tovomitopsis | allenii | | 1 | Geonoma | seleri | | 1 | Geonoma | seleri | | 1 | Geonoma | seleri | | 1 | Geonoma | seleri | +-------------+--------------+-------------+ 10 rows in set (0.00 sec) The above are standard scientific names. mysql> select name_status, Genus, Species from plotObservations where name_status=2 limit 10; +-------------+--------------+----------+ | name_status | Genus | Species | +-------------+--------------+----------+ | 2 | gen_indet. | sp. | | 2 | gen_indet. | sp. | | 2 | gen_indet. | sp. | | 2 | Elaeocarpus | sp. nov. | | 2 | Cryptocarya | sp. nov. | | 2 | Planchonella | sp. nov. | | 2 | Planchonella | sp. nov. | | 2 | Planchonella | sp. nov. | | 2 | Planchonella | sp. nov. | | 2 | Planchonella | sp. nov. | +-------------+--------------+----------+ 10 rows in set (0.03 sec) The above are essentially "undefined" species, either definitely undescribed (sp. nov), or "who knows" (sp.). Not quite the same as morphospecies in that we do not know if two records labeled as "Miconia sp." refer to the same thing. mysql> select name_status, Genus, Species from plotObservations where name_status=3 limit 10; +-------------+---------+---------+ | name_status | Genus | Species | +-------------+---------+---------+ | 3 | Ardisia | sp.4 | | 3 | Ardisia | sp.1 | | 3 | Ardisia | sp.1 | | 3 | Ardisia | sp.1 | | 3 | Eugenia | sp.1 | | 3 | Ardisia | sp.4 | | 3 | Miconia | sp.6 | | 3 | Ardisia | sp.1 | | 3 | Miconia | sp.6 | | 3 | Ardisia | sp.3 | +-------------+---------+---------+ 10 rows in set (0.00 sec) The above are morphospecies. Unidentified formally (no complete scientific name), but the data collector was asserting that all records of, say, Ardisia sp.4, refer to the same species (only within a given plot. Between plots we don't know, or at least there is insufficient metadata in SALVIAS to determine this. mysql> select name_status, Genus, Species from plotObservations where name_status=0 limit 10; +-------------+----------+-----------+ | name_status | Genus | Species | +-------------+----------+-----------+ | 0 | Faramea | cogolloi | | 0 | Faramea | cogolloi | | 0 | Faramea | cogolloi | | 0 | Faramea | cogolloi | | 0 | Faramea | cogolloi | | 0 | Dussia | montana | | 0 | Siparuna | tecaphora | | 0 | Siparuna | tecaphora | | 0 | Siparuna | tecaphora | | 0 | Eleagnus | CONFORTA | +-------------+----------+-----------+ 10 rows in set (0.02 sec) The above are harder to explain. Basically, SALVIAS is recognizing that they are intended to be scientific names (not morphospecies) but they could not be matched to any published name in Tropicos or taxonomic databases such as IPNI. So, they are provisionally scientific names. So, let's have a look at the name_status flag for that plot we queried above: SELECT DISTINCT p.PlotID, SiteName AS plot, Family, CONCAT_WS(' ',IFNULL(Genus,''), IFNULL(Species,'')) AS SpeciesName, name_status FROM plotMetadata p JOIN plotObservations o ON p.PlotID=o.PlotID WHERE p.PlotID=298; +--------+---------+-----------------+------------------------------+-------------+ | PlotID | plot | Family | SpeciesName | name_status | +--------+---------+-----------------+------------------------------+-------------+ | 298 | c2750-1 | Aquifoliaceae | Ilex lamprophylla | 1 | | 298 | c2750-1 | Araliaceae | Oreopanax xalapensis | 1 | | 298 | c2750-1 | Araliaceae | Schefflera rodriguesiana | 1 | | 298 | c2750-1 | Cunoniaceae | Weinmannia fagaroides | 1 | | 298 | c2750-1 | Ericaceae | Vaccinium poasanum | 1 | | 298 | c2750-1 | Clethraceae | Clethra pyrogena | 1 | | 298 | c2750-1 | Ericaceae | Cavendishia bracteata | 1 | | 298 | c2750-1 | Melastomataceae | Miconia schnellii | 1 | | 298 | c2750-1 | Cunoniaceae | Weinmannia pinnata | 1 | | 298 | c2750-1 | Myrsinaceae | Ardisia pleurobotrya | 1 | | 298 | c2750-1 | Saxifragaceae | Escallonia myrtilloides | 1 | | 298 | c2750-1 | Melastomataceae | Miconia longibracteata | 1 | | 298 | c2750-1 | Rhamnaceae | Rhamnus oreodendron | 1 | | 298 | c2750-1 | Araceae | Anthurium concinnatum | 1 | | 298 | c2750-1 | Caprifoliaceae | Viburnum venustum | 1 | | 298 | c2750-1 | Myrsinaceae | Myrsine pittieri | 1 | | 298 | c2750-1 | Ericaceae | Disterigma humboldtii | 1 | | 298 | c2750-1 | Winteraceae | Drimys granadensis | 1 | | 298 | c2750-1 | Piperaceae | Piper pittieri | 1 | | 298 | c2750-1 | Apiaceae | Myrrhidendron donnellsmithii | 1 | | 298 | c2750-1 | Loranthaceae | Gaiadendron punctatum | 1 | | 298 | c2750-1 | PTERIDOPHYTA | Cyathea suprastrigosa | 1 | | 298 | c2750-1 | Solanaceae | Solanum vacciniiflorum | 1 | | 298 | c2750-1 | Araliaceae | Dendropanax sp.1 | 3 | +--------+---------+-----------------+------------------------------+-------------+ 24 rows in set (0.00 sec) So, finally, to get a list of proper scientific names only, we use name_status=1 as our criterion: SELECT DISTINCT p.PlotID, SiteName AS plot, Family, CONCAT_WS(' ',IFNULL(Genus,''), IFNULL(Species,'')) AS SpeciesName, name_status FROM plotMetadata p JOIN plotObservations o ON p.PlotID=o.PlotID WHERE p.PlotID=298 AND name_status=1; +--------+---------+-----------------+------------------------------+-------------+ | PlotID | plot | Family | SpeciesName | name_status | +--------+---------+-----------------+------------------------------+-------------+ | 298 | c2750-1 | Aquifoliaceae | Ilex lamprophylla | 1 | | 298 | c2750-1 | Araliaceae | Oreopanax xalapensis | 1 | | 298 | c2750-1 | Araliaceae | Schefflera rodriguesiana | 1 | | 298 | c2750-1 | Cunoniaceae | Weinmannia fagaroides | 1 | | 298 | c2750-1 | Ericaceae | Vaccinium poasanum | 1 | | 298 | c2750-1 | Clethraceae | Clethra pyrogena | 1 | | 298 | c2750-1 | Ericaceae | Cavendishia bracteata | 1 | | 298 | c2750-1 | Melastomataceae | Miconia schnellii | 1 | | 298 | c2750-1 | Cunoniaceae | Weinmannia pinnata | 1 | | 298 | c2750-1 | Myrsinaceae | Ardisia pleurobotrya | 1 | | 298 | c2750-1 | Saxifragaceae | Escallonia myrtilloides | 1 | | 298 | c2750-1 | Melastomataceae | Miconia longibracteata | 1 | | 298 | c2750-1 | Rhamnaceae | Rhamnus oreodendron | 1 | | 298 | c2750-1 | Araceae | Anthurium concinnatum | 1 | | 298 | c2750-1 | Caprifoliaceae | Viburnum venustum | 1 | | 298 | c2750-1 | Myrsinaceae | Myrsine pittieri | 1 | | 298 | c2750-1 | Ericaceae | Disterigma humboldtii | 1 | | 298 | c2750-1 | Winteraceae | Drimys granadensis | 1 | | 298 | c2750-1 | Piperaceae | Piper pittieri | 1 | | 298 | c2750-1 | Apiaceae | Myrrhidendron donnellsmithii | 1 | | 298 | c2750-1 | Loranthaceae | Gaiadendron punctatum | 1 | | 298 | c2750-1 | PTERIDOPHYTA | Cyathea suprastrigosa | 1 | | 298 | c2750-1 | Solanaceae | Solanum vacciniiflorum | 1 | +--------+---------+-----------------+------------------------------+-------------+ 23 rows in set (0.00 sec) If you weren't so strict and were willing to include unverified scientific names, you could also use the criterion "name_status=1 OR name_status=0". 10. Count of individuals in each subplot within a single plot (individual-observation plots only) In SALVIAS, subplot codes are in the column `Line` in the table `plotObservations`. (a) Method one, for 'individuals' method plots only. I won't do a lookup here, as we already know that all Boyle Transects use this methodology. SELECT p.PlotID, SiteCode AS plot, o.Line as subplot, COUNT(*) AS individuals FROM plotMetadata p JOIN plotObservations o ON p.PlotID=o.PlotID WHERE p.PlotID=298 GROUP BY p.PlotID, plot, subplot; +--------+---------+---------+-------------+ | PlotID | plot | subplot | individuals | +--------+---------+---------+-------------+ | 298 | c2750-1 | 1 | 14 | | 298 | c2750-1 | 10 | 20 | | 298 | c2750-1 | 2 | 29 | | 298 | c2750-1 | 3 | 26 | | 298 | c2750-1 | 4 | 26 | | 298 | c2750-1 | 5 | 19 | | 298 | c2750-1 | 6 | 12 | | 298 | c2750-1 | 7 | 8 | | 298 | c2750-1 | 8 | 20 | | 298 | c2750-1 | 9 | 15 | +--------+---------+---------+-------------+ 10 rows in set (0.00 sec) (b) Method 2, for 'individuals' method plots AND 'species (stems)' method plots. We MUST use this query method for 'species (stems)' plots (= the Gentry plots, see above). First, for the same plot as above: SELECT p.PlotID, SiteCode AS plot, o.Line as subplot, SUM(NoInd) AS individuals FROM plotMetadata p JOIN plotObservations o ON p.PlotID=o.PlotID WHERE p.PlotID=298 GROUP BY p.PlotID, plot, subplot; +--------+---------+---------+-------------+ | PlotID | plot | subplot | individuals | +--------+---------+---------+-------------+ | 298 | c2750-1 | 1 | 14 | | 298 | c2750-1 | 10 | 20 | | 298 | c2750-1 | 2 | 29 | | 298 | c2750-1 | 3 | 26 | | 298 | c2750-1 | 4 | 26 | | 298 | c2750-1 | 5 | 19 | | 298 | c2750-1 | 6 | 12 | | 298 | c2750-1 | 7 | 8 | | 298 | c2750-1 | 8 | 20 | | 298 | c2750-1 | 9 | 15 | +--------+---------+---------+-------------+ 10 rows in set (0.01 sec) Now a Gentry plot. I happen to know the plot #1 is a Gentry plot, so will use that as an example of a 'species (stems)' plot: SELECT p.PlotID, SiteCode AS plot, o.Line as subplot, SUM(NoInd) AS individuals FROM plotMetadata p JOIN plotObservations o ON p.PlotID=o.PlotID WHERE p.PlotID=1 GROUP BY p.PlotID, plot, subplot; +--------+----------+---------+-------------+ | PlotID | plot | subplot | individuals | +--------+----------+---------+-------------+ | 1 | ACHUPALL | 1 | 53 | | 1 | ACHUPALL | 2 | 62 | | 1 | ACHUPALL | 3 | 72 | | 1 | ACHUPALL | 4 | 83 | | 1 | ACHUPALL | 5 | 71 | | 1 | ACHUPALL | 6 | 87 | +--------+----------+---------+-------------+ 6 rows in set (0.00 sec) Hmm, so Gentry's plot ACHUPALL only had 6 lines. Actually an important bit of metadata I did not know before. Probably a good idea to exclude this plot from analyses given that it is incomplete. 11. List of species and their percent cover for a single aggregate observation plot This shows how to deal with the type of aggregate observation plot which uses percent cover, not counts of individuals. Recall from above that the MethodCode for percent cover plots is 1. First, how many plots use this methodology? SELECT COUNT(*) FROM plotMetadata WHERE MethodCode=1; +----------+ | COUNT(*) | +----------+ | 12962 | +----------+ 1 row in set (0.01 sec) Tons. OK, let's get one: SELECT PlotID FROM plotMetadata WHERE MethodCode=1 LIMIT 1; +--------+ | PlotID | +--------+ | 24589 | +--------+ 1 row in set (0.00 sec) Now get percent cover for all the species: SELECT p.PlotID, SiteCode AS plot, Family, CONCAT_WS(' ',IFNULL(Genus,''), IFNULL(Species,'')) AS SpeciesName, cover_percent FROM plotMetadata p JOIN plotObservations o ON p.PlotID=o.PlotID WHERE p.PlotID=24589; +--------+----------+-----------------+------------------------+---------------+ | PlotID | plot | Family | SpeciesName | cover_percent | +--------+----------+-----------------+------------------------+---------------+ | 24589 | INW33656 | Caprifoliaceae | Symphoricarpos albus | 3.00 | | 24589 | INW33656 | Caprifoliaceae | Sambucus cerulea | 1.00 | | 24589 | INW33656 | Rosaceae | Rosa woodsii | 3.00 | | 24589 | INW33656 | Pyrolaceae | Pterospora andromedea | 1.00 | | 24589 | INW33656 | Pinaceae | Pseudotsuga menziesii | 30.00 | | 24589 | INW33656 | Rosaceae | Prunus emarginata | 1.00 | | 24589 | INW33656 | Pinaceae | Pinus ponderosa | 40.00 | | 24589 | INW33656 | Saxifragaceae | Philadelphus lewisii | 10.00 | | 24589 | INW33656 | Apiaceae | Osmorhiza chilensis | 10.00 | | 24589 | INW33656 | Caryophyllaceae | Moehringia macrophylla | 3.00 | | 24589 | INW33656 | Boraginaceae | Lithospermum ruderale | 1.00 | | 24589 | INW33656 | Rosaceae | Holodiscus discolor | 3.00 | | 24589 | INW33656 | Asteraceae | Hieracium scouleri | 3.00 | | 24589 | INW33656 | Asteraceae | Hieracium albiflorum | 3.00 | | 24589 | INW33656 | Orchidaceae | Goodyera oblongifolia | 3.00 | | 24589 | INW33656 | Rubiaceae | Galium triflorum | 3.00 | | 24589 | INW33656 | Poaceae | Festuca occidentalis | 3.00 | | 24589 | INW33656 | Poaceae | Elymus glaucus | 10.00 | | 24589 | INW33656 | Cyperaceae | Carex geyeri | 20.00 | | 24589 | INW33656 | Poaceae | Bromopsis vulgaris | 3.00 | | 24589 | INW33656 | Berberidaceae | Berberis aquifolium | 10.00 | | 24589 | INW33656 | Aceraceae | Acer macrophyllum | 1.00 | | 24589 | INW33656 | Aceraceae | Acer glabrum | 3.00 | +--------+----------+-----------------+------------------------+---------------+ 23 rows in set (0.00 sec) In theory, this should be all that's needed. No summing. Unless of course the plot has subplots. That would be something to check before running a query such as this one. 12. List of locality and environmental details for all plots in a single project Here's what the basic metadata should look like for the first few Gentry plots: SELECT p.PlotID, p.SiteCode as plot, Country, PolDiv1 AS stateProvince, pol2 AS countyParish, LatDec as latitude, LongDec as longitude, Elev as elevation FROM projects pr JOIN plotMetadata p ON pr.project_id=p.project_id WHERE pr.project_id=1 LIMIT 20; +--------+----------+------------------+---------------+--------------+-----------+-----------+-----------+ | PlotID | plot | Country | stateProvince | countyParish | latitude | longitude | elevation | +--------+----------+------------------+---------------+--------------+-----------+-----------+-----------+ | 1 | ACHUPALL | Ecuador | NULL | NULL | -3.45 | -78.3667 | 2100 | | 2 | ALLACHER | Germany | NULL | NULL | 48.0667 | 11.5 | 530 | | 3 | ALLPAHUA | Peru | NULL | NULL | -3.95 | -73.4167 | 155 | | 4 | ALTERDOC | Brazil | NULL | NULL | -2.5 | -54.9667 | 50 | | 5 | ALTODEMI | Colombia | NULL | NULL | 10.9167 | -73.8333 | 1180 | | 6 | ALTOSAPA | Colombia | NULL | NULL | 7.16667 | -75.9 | 2660 | | 7 | AMOTAPE | Peru | NULL | NULL | -4.15 | -80.6167 | 830 | | 8 | ANCHICAY | Colombia | NULL | NULL | 3.75 | -76.8333 | 300 | | 9 | ANKARIF | Madagascar | NULL | NULL | -16.3167 | 46.8167 | 80 | | 10 | ANTADO | Colombia | NULL | NULL | 7.25 | -75.9167 | 1560 | | 11 | ARARACUA | Colombia | NULL | NULL | -0.416667 | -72.3333 | 200 | | 12 | ARCATING | Colombia | NULL | NULL | -0.416667 | -72.3167 | 250 | | 13 | AVALANCH | India | NULL | NULL | 11.3 | 76.5833 | 2100 | | 14 | BABLERSP | USA | Missouri | St. Louis | 38.5333 | -90.6667 | 150 | | 15 | BAITETE | Papua New Guinea | NULL | NULL | -5.16667 | 145.8 | 200 | | 16 | BAKOSAR | Malaysia | NULL | NULL | 1.75 | 110.417 | 90 | | 17 | BANKAMP | USA | Ohio | Belmont | 40.05 | -81.0167 | 250 | | 18 | BANYONG | Cameroun | NULL | NULL | 5 | 9.16667 | 420 | | 19 | BELEM | Brazil | NULL | NULL | -1.5 | -47.9833 | 20 | | 20 | BELINGA | Gabon | NULL | NULL | 1.15 | 13.2 | 750 | +--------+----------+------------------+---------------+--------------+-----------+-----------+-----------+ 20 rows in set (0.00 sec) 13. Display data access conditions (embargo levels) for individual plots Final thing. There is an extremely important column in the plot table in SALVIAS which indicates the allowed level of access for an individual plot. It is called "AccessCode", and here is a count of the number of plots with each value: SELECT AccessCode, COUNT(*) FROM plotMetadata GROUP BY AccessCode; +------------+----------+ | AccessCode | COUNT(*) | +------------+----------+ | 1 | 5 | | 2 | 236 | | 3 | 13420 | +------------+----------+ 3 rows in set (0.02 sec) There is no information anywhre in the `salvias_plots` database to tell you what AccessCode means. That is because it is used by a separate database, `salvias_users`, to control who can download what. However, there is a description of this code on the SALVIAS website, at http://www.salvias.net/eula/eula1.html: User Can Access: Level Description Description Data 1 Completely Restricted False False 2 Plot Description Only True False 3 Full Access True True So, level 3 plots can be downloaded by anyone (although they are still required to offer the data owner to option of co-authorship should they publish; SALVIAS records user name and timestamp of any download and reports to the data owner). For level 2 plots, the metadata only is visible on SALVIAS. Anyone wishing to use the data must contact the data owner to obtain the full data. Level 3 plots are completely hidden to the public, including metadata. They are visible only to the data owner and database administrators. `AccessCode` defines the default access for a given plot. This is the access a new user will have when she signs on. However, the data owner can assign different access levels to individual users. Please be careful with plots with AccessCode 1 or 2, especially level 1. Plots with AccessCode=1 should not be shared with anyone. We do not have permission to do so. Below is a bit of metadata about the level 1 plots and their owners: SELECT pr.project_id, project_name, project_pi, allow_download_all, PlotID, SiteCode, Country FROM projects pr JOIN plotMetadata p ON pr.project_id=p.project_id WHERE p.AccessCode=1; +------------+-------------------------+-------------+--------------------+--------+------------+---------+ | project_id | project_name | project_pi | allow_download_all | PlotID | SiteCode | Country | +------------+-------------------------+-------------+--------------------+--------+------------+---------+ | 16 | nsf_example | nsf_salvias | 1 | 3778 | nsf_1 | Bolivia | | 16 | nsf_example | nsf_salvias | 1 | 3779 | nsf_2 | USA | | 18 | SERBO Selva Seca Oaxaca | boyle_admin | 0 | 3791 | La Cotorra | Mexico | | 18 | SERBO Selva Seca Oaxaca | boyle_admin | 0 | 3792 | La Bamba | Mexico | | 18 | SERBO Selva Seca Oaxaca | boyle_admin | 0 | 3793 | El Chorro | Mexico | +------------+-------------------------+-------------+--------------------+--------+------------+---------+ 5 rows in set (0.05 sec) The first two plots are just examples to show how the access controls work. The last three however belong to a friend of mine. She has not yet published them, and we are not allowed to show them to anyone until she does and gives us the green light.