Project

General

Profile

higher_plant_group node names in Tropicos/APGIII

from Brad:

> how you derived the subclass lookup table for APGIII, in case we ever want to regenerate it on an updated version of APGIII

-- descendent lookup for class "Equisetopside" 
-- subclasses only
SELECT nameRank, scientificName
FROM name n JOIN classification c
ON n.nameID=c.nameID
WHERE
leftIndex>=
(
SELECT leftIndex
FROM name n JOIN classification c
ON n.nameID=c.nameID
WHERE scientificName='Equisetopsida'
AND c.sourceID=1
)
AND
rightIndex<=
(
SELECT rightIndex
FROM name n JOIN classification c
ON n.nameID=c.nameID
WHERE scientificName='Equisetopsida'
AND c.sourceID=1
)
AND sourceID=1
AND nameRank IN ('subclass')
ORDER BY leftIndex;

from Brad:

> Could you please translate our NCBI higher_plant_group node names to Tropicos/APGIII?

It *think* you should be able to do this by querying subclasses for source Tropicos (sourceID=1) in the tnrs database. See the attached spreadsheet of mappings between APG III (=Tropicos) subclasses and BIEN higherPlantGroup.

To query higher taxa using the TNRS database, you need join table name to table classification. Be sure to set sourceID=1.

Here is how you would query the ancestor of rank 'subclass' using scientific name. This is for demonstration purposes only: it is not safe to query by scientificName alone due to possible homonyms (each subquery MUST return one value only).

SELECT nameRank, scientificName
FROM name n JOIN classification c
ON n.nameID=c.nameID
WHERE
leftIndex<=
(
SELECT leftIndex
FROM name n JOIN classification c
ON n.nameID=c.nameID
WHERE scientificName='Billia hippocastanum'
AND c.sourceID=1
)
AND
rightIndex>=
(
SELECT rightIndex
FROM name n JOIN classification c
ON n.nameID=c.nameID
WHERE scientificName='Billia hippocastanum'
AND c.sourceID=1
)
AND sourceID=1
AND nameRank='subclass';

+----------+----------------+
| nameRank | scientificName |
+----------+----------------+
| subclass | Magnoliidae    |
+----------+----------------+
1 row in set (0.02 sec)

In general, however, it is safer to use the TNRS nameID:

mysql> SELECT n.nameID FROM name n JOIN name_source ns ON n.nameID=ns.nameID WHERE scientificName='Billia hippocastanum' AND ns.sourceID=1;
+--------+
| nameID |
+--------+
| 263948 |
+--------+
1 row in set (0.29 sec)

SELECT nameRank, scientificName
FROM name n JOIN classification c
ON n.nameID=c.nameID
WHERE
leftIndex<=
(
SELECT leftIndex
FROM classification
WHERE nameID=263948
AND c.sourceID=1
)
AND
rightIndex>=
(
SELECT rightIndex
FROM classification
WHERE nameID=263948
AND c.sourceID=1
)
AND sourceID=1
AND nameRank='subclass';

+----------+----------------+
| nameRank | scientificName |
+----------+----------------+
| subclass | Magnoliidae    |
+----------+----------------+
1 row in set (0.01 sec)

You can look up subclasses for multiple species at once by using a cross-product with a table containing multiple nameIDs. For example, given the following species:
SELECT n.nameID, n.scientificName, n.scientificNameAuthorship
FROM name n JOIN name_source ns
ON n.nameID=ns.nameID
WHERE  ns.sourceID=1
AND n.scientificName LIKE 'Billia%'
AND n.nameRank='species';

+--------+----------------------+-----------------------------------------+
| nameID | scientificName       | scientificNameAuthorship                |
+--------+----------------------+-----------------------------------------+
| 263940 | Billia columbiana    | Planch. & Linden                        |
| 263948 | Billia hippocastanum | Peyr.                                   |
| 875259 | Billia rosea         | (Planch. & Linden) C. Ulloa & P. Jørg.  |
+--------+----------------------+-----------------------------------------+
3 rows in set (0.00 sec)

You can look up subclasses as follows:

SELECT spp.nameID, subcl.scientificName AS subclass
FROM
(
SELECT n.nameID, c.leftIndex, c.rightIndex
FROM name n JOIN classification c
ON n.nameID=c.nameID
WHERE  c.sourceID=1
AND n.scientificName LIKE 'Billia%'
AND n.nameRank='species'
) AS spp,
(
SELECT nameRank, scientificName, leftIndex, rightIndex
FROM name n JOIN classification c
ON n.nameID=c.nameID
WHERE sourceID=1
AND n.nameRank='subclass'
) AS subcl
WHERE spp.leftIndex>subcl.leftIndex
AND spp.rightIndex<subcl.rightIndex;

+--------+-------------+
| nameID | subclass    |
+--------+-------------+
| 263940 | Magnoliidae |
| 263948 | Magnoliidae |
| 875259 | Magnoliidae |
+--------+-------------+
3 rows in set (0.00 sec)

Finally, you can look up the TNRS nameID using either the Tropicos name url or the Tropicos nameID; both are stored in table name_source:

mysql> SELECT n.nameID, n.scientificName, nameSourceOriginalID, nameSourceUrl
    -> FROM name n JOIN name_source ns
    -> ON n.nameID=ns.nameID
    -> WHERE sourceID=1
    -> AND n.nameID=263948;
+--------+----------------------+----------------------+---------------------------------------+
| nameID | scientificName       | nameSourceOriginalID | nameSourceUrl                         |
+--------+----------------------+----------------------+---------------------------------------+
| 263948 | Billia hippocastanum | 15500014             | http://www.tropicos.org/Name/15500014 |
+--------+----------------------+----------------------+---------------------------------------+
1 row in set (0.00 sec)