Project

General

Profile

text IDs

A text ID, as defined here, is a unique key which is stable and human-readable.

Being stable allows it to retain its uniqueness into the future. This makes the ID safe to use to externally refer to the record.
Stability is especially important, because databases like VegBIEN that are frequently rebuilt will have changing numeric IDs, but the same text IDs.

note that the suggested formats in the examples create globally-unique text IDs, which can be used in place of other ID types. however, it is also possible to create locally-unique text IDs for use as local u-names.

(text IDs are similar to URNs but have a more general structure)

format

  • It must be a u-name (unambiguous name) for the entity
  • It should be the c-name (canonical name) for it whenever possible
  • It should be human-readable whenever possible: you should be able to look at the ID and know immediately what it's referring to
    • use human-readable abbreviations to achieve shorter identifier names (comparable in length to other short identifiers, such as DOI, OID, etc.)
  • As part of being a c-name, it should be as short as possible while still being human-readable and unambiguous
    • An abbreviation is preferable to a full name, but a name is preferable to a number or the URL generated by a URL shortener
  • It should be a clickable URL whenever possible. The URL should take the user to the identified resource or a description of it.
    • Where possible, the query portion of the URL (after the ?) should be a SQL dotpath, because this is an unambiguous syntax
    • Path components that are not part of the URL should be placed after the # (or if there is already a #, instead appended to the query string)
  • Only use autogenerated primary keys if they are stable
  • A record about an entity often has a different ID than the entity itself
  • IDs should be uniform, using the following conventions:
examples
Separate path components with . or / US.AZ , US/AZ
Separate labels from values with = country=US.state=AZ.city=Tucson
Enclose literal values in () (bien3_architecture_denormalized.pptx)
Escape special chars with \ [array_col=value1\,value2,other_col=...]
Replace spaces with _ Brad_Boyle
Use the standard capitalization of proper names US, VegBank, ARIZ
Use _ instead of camelCase catalog_number instead of catalogNumber
Use filename-safe characters if possible US.AZ instead of US:AZ
Format dates as YYYY-M-D 2000-1-1

When multiple IDs are available, include all of them (in separate columns) to allow looking up the record by any of them1. The column names should have the form id_by_... . This way, when you fetch a record, you get the preferred natural key (id), the id_by_source, and any other natural keys that are available.

1 Note that only one of these IDs (the primary key) actually needs to be unique. The rest can contain duplicates, and would be used for queries that can return multiple rows.
.

Comparison to other ID formats

Human-readable Clickable Allows uppercase Short Nonrandom Decentralized Globally unique Stable Uniform
text ID Y Y Y Y Y Y Y Y Y
URL Y Y Y - Y - Y - Y
URN / LSID Y - - Y Y - Y Y Y
LDAP DN Y - Y Y Y Y - Y Y
VegBank accession code Y - Y Y Y Y - Y Y
GUID / UUID - - - - - Y Y - Y
DOI - - - Y Y - Y Y Y
OID - - - Y Y - Y Y Y

Performance considerations

When used in a foreign key, text IDs are subject to the performance limitations of string IDs. These can be overcome using a string interning technique.

Examples

eventually, these formats will be moved to the corresponding indexes in normalized VegCore

Time

form examples
date 1945-8-24
date range 1997-8-21..1998-9-3
indefinite date 1950-8 (w/o day)
datetime 1945-8-24:13:00

Place

form examples
unlabeled path US.AZ.Tucson
labeled path country=US.state=AZ.city=Tucson

Resource

e.g. institution, database

form examples
HTTP URL without the protocol salvias.net
full URL http://salvias.net
database connection URL postgresql://nimoy.nceas.ucsb.edu/salvias_plots
standard abbreviation ARIZ (an Index Herbariorum code)
unambiguous name VegBank, SALVIAS, UCSB
place + locally unique name US.CA.Santa_Barbara.University_of_California (multiple campuses)
e-mail address (+ name) + date (+ subject) (+ attachment) bboyle@email.arizona.edu?Brad_Boyle.2013-4-2-12:03PT.Re:+[Bien-db]+comparison+of+data+import+methods.(bien3_architecture_denormalized.pptx)
i.e. e-mail@host?name.date.subject.(attachment.ext)
person + date + presentation name (+ slide) (UArizona.Brian_Enquist).2013-2-26.presentation=Toward+a+General+Informatics+Engine+to+Quantify+Botanical+Diversity+at+Large+Scales.slide=39
group + date + meeting type (+ meeting name) bien.nceas.ucsb.edu?BIEN.2011-10-13.conference_call
meeting notes URL vegbiendev.nceas.ucsb.edu/wiki/2011-10-13_conference_call

Data record

form examples
web interface URL VegBank.org/cite/VB.Ob.3565.GRSM111
database + table + primary key vegbiendev.nceas.ucsb.edu/VegBank?observation(3565)
database + table + column + unique key vegbiendev.nceas.ucsb.edu/VegBank?observation[authorobscode=(GRSM.111)]

Person

form examples
e-mail address (+ name) bboyle@email.arizona.edu?Brad_Boyle
institution (+ department) + name UArizona.Ecology.Brad_Boyle
website + group name bien.nceas.ucsb.edu?BIEN
resource + username vegbiendev.nceas.ucsb.edu/SALVIAS?@bboyle
data record vegbiendev.nceas.ucsb.edu/SALVIAS?tbl_users(7)

Party

form examples
person UArizona.Brad_Boyle
organization UArizona
group BIEN

Event

form examples
resource + event type + resource-assigned name vegbiendev.nceas.ucsb.edu/VegCore?proj(VegBank.(GRSM.111))
vegbiendev.nceas.ucsb.edu/VegBank?project[projectName=Great_Smoky_Mountains_National_Park]
people/group + time + event type + event name (UArizona.Brian_Enquist).2013-2-26.presentation=Toward+a+General+Informatics+Engine+to+Quantify+Botanical+Diversity+at+Large+Scales
place/subject + time + people/group vegbiendev.nceas.ucsb.edu/VegCore?obs_by_date((place_ranks.(36.105644,-111.237438)).1945-8-24.(ARIZ.R_A_Darrow))
vegbiendev.nceas.ucsb.edu/ARIZ?omoccurrences[place_ranks,decimalLatitude=36.105644,decimalLongitude=-111.237438,eventDate=1945-08-24,recordedBy=(R.A.Darrow)]
data record vegbiendev.nceas.ucsb.edu/VegBank?project(928)

Thing

form examples
institution (+ scope) + ID type + institution-assigned ID vegbiendev.nceas.ucsb.edu/VegCore?spec_by_acc_no((ARIZ.Plants).33907)
vegbiendev.nceas.ucsb.edu/ARIZ?omoccurrences[collection=Plants,catalogNumber=33907]
event + event-specific name vegbiendev.nceas.ucsb.edu/VegCore?tax_pres(proj(VegBank.(GRSM.111)).(Acer rubrum L.))
vegbiendev.nceas.ucsb.edu/VegBank?observation[authorobscode=GRSM.111].taxonObservation[authorPlantName=(Acer rubrum L.)]
event + event-specific ID vegbiendev.nceas.ucsb.edu/VegCore?tax_occ_by_coll(obs_by_date(place_info.1945-8-24.(ARIZ.R_A_Darrow)).2756)
vegbiendev.nceas.ucsb.edu/ARIZ?omoccurrences[place_info,eventDate=1945-08-24,recordedBy=(R.A.Darrow),recordNumber=2756]
data record vegbiendev.nceas.ucsb.edu/ARIZ?omoccurrences(45)