text IDs¶
A text ID, as defined here, is a unique key which is stable and human-readable.
Being stable allows it to retain its uniqueness into the future. This makes the ID safe to use to externally refer to the record.
Stability is especially important, because databases like VegBIEN that are frequently rebuilt will have changing numeric IDs, but the same text IDs.
note that the suggested formats in the examples create globally-unique text IDs, which can be used in place of other ID types. however, it is also possible to create locally-unique text IDs for use as local u-names.
(text IDs are similar to URNs but have a more general structure)
format¶
- It must be a u-name (unambiguous name) for the entity
- It should be the c-name (canonical name) for it whenever possible
- It should be human-readable whenever possible: you should be able to look at the ID and know immediately what it's referring to
- As part of being a c-name, it should be as short as possible while still being human-readable and unambiguous
- An abbreviation is preferable to a full name, but a name is preferable to a number or the URL generated by a URL shortener
- It should be a clickable URL whenever possible. The URL should take the user to the identified resource or a description of it.
- Where possible, the query portion of the URL (after the
?
) should be a SQL dotpath, because this is an unambiguous syntax - Path components that are not part of the URL should be placed after the
#
(or if there is already a#
, instead appended to the query string)
- Where possible, the query portion of the URL (after the
- Only use autogenerated primary keys if they are stable
- A record about an entity often has a different ID than the entity itself
- IDs should be uniform, using the following conventions:
examples | |
Separate path components with . or / |
US.AZ , US/AZ |
Separate labels from values with = |
country=US.state=AZ.city=Tucson |
Enclose literal values in () |
(bien3_architecture_denormalized.pptx) |
Escape special chars with \ |
[array_col=value1\,value2,other_col=...] |
Replace spaces with _ |
Brad_Boyle |
Use the standard capitalization of proper names | US , VegBank , ARIZ |
Use _ instead of camelCase |
catalog_number instead of catalogNumber |
Use filename-safe characters if possible | US.AZ instead of US:AZ |
Format dates as YYYY-M-D |
2000-1-1 |
When multiple IDs are available, include all of them (in separate columns) to allow looking up the record by any of them1. The column names should have the form id_by_...
. This way, when you fetch a record, you get the preferred natural key (id
), the id_by_source
, and any other natural keys that are available.
1 Note that only one of these IDs (the primary key) actually needs to be unique. The rest can contain duplicates, and would be used for queries that can return multiple rows.
.
Comparison to other ID formats¶
Human-readable | Clickable | Allows uppercase | Short | Nonrandom | Decentralized | Globally unique | Stable | Uniform | |
text ID | Y | Y | Y | Y | Y | Y | Y | Y | Y |
URL | Y | Y | Y | - | Y | - | Y | - | Y |
URN / LSID | Y | - | - | Y | Y | - | Y | Y | Y |
LDAP DN | Y | - | Y | Y | Y | Y | - | Y | Y |
VegBank accession code | Y | - | Y | Y | Y | Y | - | Y | Y |
GUID / UUID | - | - | - | - | - | Y | Y | - | Y |
DOI | - | - | - | Y | Y | - | Y | Y | Y |
OID | - | - | - | Y | Y | - | Y | Y | Y |
Performance considerations¶
When used in a foreign key, text IDs are subject to the performance limitations of string IDs. These can be overcome using a string interning technique.
Examples¶
eventually, these formats will be moved to the corresponding indexes in normalized VegCore
Time¶
form | examples |
date | 1945-8-24 |
date range | 1997-8-21..1998-9-3 |
indefinite date | 1950-8 (w/o day) |
datetime | 1945-8-24:13:00 |
Place¶
form | examples |
unlabeled path | US.AZ.Tucson |
labeled path | country=US.state=AZ.city=Tucson |
Resource¶
e.g. institution, database
form | examples |
HTTP URL without the protocol | salvias.net |
full URL | http://salvias.net |
database connection URL | postgresql://nimoy.nceas.ucsb.edu/salvias_plots |
standard abbreviation | ARIZ (an Index Herbariorum code) |
unambiguous name | VegBank , SALVIAS , UCSB |
place + locally unique name | US.CA.Santa_Barbara.University_of_California (multiple campuses) |
e-mail address (+ name) + date (+ subject) (+ attachment) | bboyle@email.arizona.edu?Brad_Boyle.2013-4-2-12:03PT.Re:+[Bien-db]+comparison+of+data+import+methods.(bien3_architecture_denormalized.pptx) i.e. e-mail@host?name.date.subject.(attachment.ext) |
person + date + presentation name (+ slide) | (UArizona.Brian_Enquist).2013-2-26.presentation=Toward+a+General+Informatics+Engine+to+Quantify+Botanical+Diversity+at+Large+Scales.slide=39 |
group + date + meeting type (+ meeting name) | bien.nceas.ucsb.edu?BIEN.2011-10-13.conference_call |
meeting notes URL | vegbiendev.nceas.ucsb.edu/wiki/2011-10-13_conference_call |
Data record¶
form | examples |
web interface URL | VegBank.org/cite/VB.Ob.3565.GRSM111 |
database + table + primary key | vegbiendev.nceas.ucsb.edu/VegBank?observation(3565) |
database + table + column + unique key | vegbiendev.nceas.ucsb.edu/VegBank?observation[authorobscode=(GRSM.111)] |
Person¶
form | examples |
e-mail address (+ name) | bboyle@email.arizona.edu?Brad_Boyle |
institution (+ department) + name | UArizona.Ecology.Brad_Boyle |
website + group name | bien.nceas.ucsb.edu?BIEN |
resource + username | vegbiendev.nceas.ucsb.edu/SALVIAS?@bboyle |
data record | vegbiendev.nceas.ucsb.edu/SALVIAS?tbl_users(7) |
Party¶
form | examples |
person | UArizona.Brad_Boyle |
organization | UArizona |
group | BIEN |
Event¶
form | examples |
resource + event type + resource-assigned name | vegbiendev.nceas.ucsb.edu/VegCore?proj(VegBank.(GRSM.111)) vegbiendev.nceas.ucsb.edu/VegBank?project[projectName=Great_Smoky_Mountains_National_Park] |
people/group + time + event type + event name | (UArizona.Brian_Enquist).2013-2-26.presentation=Toward+a+General+Informatics+Engine+to+Quantify+Botanical+Diversity+at+Large+Scales |
place/subject + time + people/group | vegbiendev.nceas.ucsb.edu/VegCore?obs_by_date((place_ranks.(36.105644,-111.237438)).1945-8-24.(ARIZ.R_A_Darrow)) vegbiendev.nceas.ucsb.edu/ARIZ?omoccurrences[place_ranks,decimalLatitude=36.105644,decimalLongitude=-111.237438,eventDate=1945-08-24,recordedBy=(R.A.Darrow)] |
data record | vegbiendev.nceas.ucsb.edu/VegBank?project(928) |
Thing¶
form | examples |
institution (+ scope) + ID type + institution-assigned ID | vegbiendev.nceas.ucsb.edu/VegCore?spec_by_acc_no((ARIZ.Plants).33907) vegbiendev.nceas.ucsb.edu/ARIZ?omoccurrences[collection=Plants,catalogNumber=33907] |
event + event-specific name | vegbiendev.nceas.ucsb.edu/VegCore?tax_pres(proj(VegBank.(GRSM.111)).(Acer rubrum L.)) vegbiendev.nceas.ucsb.edu/VegBank?observation[authorobscode=GRSM.111].taxonObservation[authorPlantName=(Acer rubrum L.)] |
event + event-specific ID | vegbiendev.nceas.ucsb.edu/VegCore?tax_occ_by_coll(obs_by_date(place_info.1945-8-24.(ARIZ.R_A_Darrow)).2756) vegbiendev.nceas.ucsb.edu/ARIZ?omoccurrences[place_info,eventDate=1945-08-24,recordedBy=(R.A.Darrow),recordNumber=2756] |
data record | vegbiendev.nceas.ucsb.edu/ARIZ?omoccurrences(45) |