text IDs

A text ID, as defined here, is a unique key which is stable and human-readable.

Being stable allows it to retain its uniqueness into the future. This makes the ID safe to use to externally refer to the record.
Stability is especially important, because databases like VegBIEN that are frequently rebuilt will have changing numeric IDs, but the same text IDs.

note that the suggested formats in the examples create globally-unique text IDs, which can be used in place of other ID types. however, it is also possible to create locally-unique text IDs for use as local u-names.

(text IDs are similar to URNs but have a more general structure)


  • It must be a u-name (unambiguous name) for the entity
  • It should be the c-name (canonical name) for it whenever possible
  • It should be human-readable whenever possible: you should be able to look at the ID and know immediately what it's referring to
    • use human-readable abbreviations to achieve shorter identifier names (comparable in length to other short identifiers, such as DOI, OID, etc.)
  • As part of being a c-name, it should be as short as possible while still being human-readable and unambiguous
    • An abbreviation is preferable to a full name, but a name is preferable to a number or the URL generated by a URL shortener
  • It should be a clickable URL whenever possible. The URL should take the user to the identified resource or a description of it.
    • Where possible, the query portion of the URL (after the ?) should be a SQL dotpath, because this is an unambiguous syntax
    • Path components that are not part of the URL should be placed after the # (or if there is already a #, instead appended to the query string)
  • Only use autogenerated primary keys if they are stable
  • A record about an entity often has a different ID than the entity itself
  • IDs should be uniform, using the following conventions:
Separate path components with . or / US.AZ , US/AZ
Separate labels from values with =
Enclose literal values in () (bien3_architecture_denormalized.pptx)
Escape special chars with \ [array_col=value1\,value2,other_col=...]
Replace spaces with _ Brad_Boyle
Use the standard capitalization of proper names US, VegBank, ARIZ
Use _ instead of camelCase catalog_number instead of catalogNumber
Use filename-safe characters if possible US.AZ instead of US:AZ
Format dates as YYYY-M-D 2000-1-1

When multiple IDs are available, include all of them (in separate columns) to allow looking up the record by any of them1. The column names should have the form id_by_... . This way, when you fetch a record, you get the preferred natural key (id), the id_by_source, and any other natural keys that are available.

1 Note that only one of these IDs (the primary key) actually needs to be unique. The rest can contain duplicates, and would be used for queries that can return multiple rows.

Comparison to other ID formats

Human-readable Clickable Allows uppercase Short Nonrandom Decentralized Globally unique Stable Uniform
text ID Y Y Y Y Y Y Y Y Y
URL Y Y Y - Y - Y - Y
URN / LSID Y - - Y Y - Y Y Y
VegBank accession code Y - Y Y Y Y - Y Y
GUID / UUID - - - - - Y Y - Y
DOI - - - Y Y - Y Y Y
OID - - - Y Y - Y Y Y

Performance considerations

When used in a foreign key, text IDs are subject to the performance limitations of string IDs. These can be overcome using a string interning technique.


eventually, these formats will be moved to the corresponding indexes in normalized VegCore


form examples
date 1945-8-24
date range 1997-8-21..1998-9-3
indefinite date 1950-8 (w/o day)
datetime 1945-8-24:13:00


form examples
unlabeled path US.AZ.Tucson
labeled path


e.g. institution, database

form examples
HTTP URL without the protocol
full URL
database connection URL postgresql://
standard abbreviation ARIZ (an Index Herbariorum code)
unambiguous name VegBank, SALVIAS, UCSB
place + locally unique name US.CA.Santa_Barbara.University_of_California (multiple campuses)
e-mail address (+ name) + date (+ subject) (+ attachment)[Bien-db]+comparison+of+data+import+methods.(bien3_architecture_denormalized.pptx)
i.e. e-mail@host?
person + date + presentation name (+ slide) (UArizona.Brian_Enquist).2013-2-26.presentation=Toward+a+General+Informatics+Engine+to+Quantify+Botanical+Diversity+at+Large+Scales.slide=39
group + date + meeting type (+ meeting name)
meeting notes URL

Data record

form examples
web interface URL
database + table + primary key
database + table + column + unique key[authorobscode=(GRSM.111)]


form examples
e-mail address (+ name)
institution (+ department) + name UArizona.Ecology.Brad_Boyle
website + group name
resource + username
data record


form examples
person UArizona.Brad_Boyle
organization UArizona
group BIEN


form examples
resource + event type + resource-assigned name[projectName=Great_Smoky_Mountains_National_Park]
people/group + time + event type + event name (UArizona.Brian_Enquist).2013-2-26.presentation=Toward+a+General+Informatics+Engine+to+Quantify+Botanical+Diversity+at+Large+Scales
place/subject + time + people/group,-111.237438)).1945-8-24.(ARIZ.R_A_Darrow))[place_ranks,decimalLatitude=36.105644,decimalLongitude=-111.237438,eventDate=1945-08-24,recordedBy=(R.A.Darrow)]
data record


form examples
institution (+ scope) + ID type + institution-assigned ID[collection=Plants,catalogNumber=33907]
event + event-specific name rubrum L.))[authorobscode=GRSM.111].taxonObservation[authorPlantName=(Acer rubrum L.)]
event + event-specific ID[place_info,eventDate=1945-08-24,recordedBy=(R.A.Darrow),recordNumber=2756]
data record