u-name
a u-name (uniform unambiguous name) is a name for something that is both unambiguous and written in a uniform format.
note that unambiguous is not the same as unique: an unambiguous name must also communicate exactly what it is referring to. e.g. a randomly assigned numeric ID is unique, but not unambiguous, because the number itself could refer to anything.
u-names can be used for both primary keys and column names, as both of these should communicate exactly what they are referring to. in both cases, they create a stable, human-readable, and easily mergeable type of identifier.
scope
global
a global u-name should be globally unambiguous.
global u-names are used for things like permalinks, citation URLs, and data extract columns which must be globally unique. often, however, a local u-name (below) is more appropriate.
local
by contrast, a local u-name should only be unambiguous within the dataset or vocabulary it is part of¹, not globally unique. this is the most common type of u-name.
when datasets are combined, the u-name for the dataset itself can then be prepended to the pkeys in it to form a u-name that is unambiguous in the combined dataset. similarly, when tables are left-joined together, the u-name for each table can be prepended to the columns to form a u-name that is unambiguous in the combined table.
this also applies at higher levels of aggregation: the dataset u-name should only be unambiguous within the database, and the table u-name should only be unambiguous within the exchange schema (for a closed format) or knowledge domain (for an extensible format).
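the prefixing described above can be sketched in Python. this is a minimal sketch, not a prescribed implementation: it assumes `.` as the path separator (one of the forms allowed in the format rules), and the function names and the `uk_birds`/`us_birds` datasets are hypothetical.

```python
# Minimal sketch: qualify local u-names with their container's u-name so they
# stay unambiguous after datasets are combined or tables are joined.
# Assumption: "." as the path separator; all names below are hypothetical.

PATH_SEP = "."


def qualify(container: str, local_name: str) -> str:
    """Prepend a container's u-name (dataset, table, database, ...) to a local u-name."""
    return f"{container}{PATH_SEP}{local_name}"


def combine_datasets(datasets: dict[str, dict[str, dict]]) -> dict[str, dict]:
    """Merge {dataset_u_name: {pkey: record}} into one mapping keyed by qualified pkeys."""
    combined: dict[str, dict] = {}
    for dataset_name, records in datasets.items():
        for pkey, record in records.items():
            combined[qualify(dataset_name, pkey)] = record
    return combined


def join_rows(left_table: str, left_row: dict, right_table: str, right_row: dict) -> dict:
    """Join one matched pair of rows, prefixing each column with its table's u-name."""
    row = {qualify(left_table, col): val for col, val in left_row.items()}
    row.update({qualify(right_table, col): val for col, val in right_row.items()})
    return row


# "robin" is unambiguous within each dataset, but names a different species in each.
combined = combine_datasets({
    "uk_birds": {"robin": {"common_name": "European robin"}},
    "us_birds": {"robin": {"common_name": "American robin"}},
})
print(sorted(combined))  # ['uk_birds.robin', 'us_birds.robin']
```

applying `qualify` again, e.g. `qualify("birds_db", "uk_birds.robin")` with a hypothetical database u-name, gives the nested form for the higher levels of aggregation.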
when publishing data, it may be helpful to provide a globally unique permalink to where the record is hosted. however, the pkey itself should not be globally unique², because the URI prefix needed to make it so would add unnecessary clutter to the pkey values, forcing the user to widen the column in order to see the portion of the key that differs between rows. the same widening problem applies to column names. (but note that data extract columns must be globally unique, so a URL must be used for them instead of just the term name.)
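one way to keep the pkey short while still publishing a globally unique link is to derive the permalink from the dataset u-name and the local pkey, and store it in its own column. this sketch assumes a hypothetical host `https://data.example.org` and a `/dataset/pkey` URL layout; neither comes from the text above.

```python
from urllib.parse import quote

# Hypothetical host; replace with wherever the records are actually published.
BASE_URL = "https://data.example.org"


def permalink(dataset: str, pkey: str) -> str:
    """Build a globally unique URL for a record whose pkey is only locally unambiguous."""
    # quote() percent-encodes unsafe characters; "_", "-", "." and "~" pass through.
    return f"{BASE_URL}/{quote(dataset, safe='')}/{quote(pkey, safe='')}"


# The pkey column stays short; the global URL lives in a separate column.
rows = [{"pkey": pkey, "permalink": permalink("uk_birds", pkey)} for pkey in ["robin", "wren"]]
for row in rows:
    print(row["pkey"], row["permalink"])
```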
¹ this includes removing any shared prefix of the rows or columns. this shared prefix should be moved to the dataset's or table's own u-name.
² this is a change from earlier u-name recommendations, which encouraged all u-names to be globally unambiguous. for local u-names, this is now discouraged instead.
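footnote 1 can be illustrated with a small helper that finds the prefix shared by all column names (or row pkeys), so that it can be moved into the table's or dataset's own u-name. this is a sketch under assumptions: it splits only at `_` word boundaries, and the column names are hypothetical.

```python
# Sketch: find the word-level prefix shared by all names so it can be moved
# into the container's own u-name. Assumes words are separated by "_".


def strip_shared_prefix(names: list[str], sep: str = "_") -> tuple[str, list[str]]:
    """Return (shared_prefix, names_without_prefix), splitting only at `sep`."""
    if len(names) < 2:
        return "", list(names)  # a shared prefix needs at least two names
    split = [name.split(sep) for name in names]
    shared = 0
    for parts in zip(*split):  # walk word positions until the names diverge
        if len(set(parts)) != 1:
            break
        shared += 1
    # keep at least one word in every name, so none is stripped to ""
    shared = min(shared, min(len(parts) for parts in split) - 1)
    prefix = sep.join(split[0][:shared])
    return prefix, [sep.join(parts[shared:]) for parts in split]


prefix, columns = strip_shared_prefix(["bird_wingspan", "bird_weight", "bird_color"])
print(prefix, columns)  # bird ['wingspan', 'weight', 'color']
```

splitting at word boundaries rather than comparing characters avoids stripping a partial word: a character-level prefix of `bird_wingspan` and `bird_weight` would be `bird_w`.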
format
ideally, you should use a u-name which is also the c-name.
- except for case-sensitive names, the capitalization must always be standardized: there must be only one case of letter which is correct
- if possible, all letters should be lowercase, as this is the simplest form of standardization
- when allowing uppercase letters, proper nouns must be capitalized (title case) and acronyms must be all-caps
- unnecessary special symbols should be avoided
- for phrases, separate each word with one of `_` or `-` (a u-name parser must accept both symbols)
  - which separator is used may be restricted by the medium (e.g. source code identifiers (`_`) vs. domain names (`-`), or the need to linewrap on a hyphen)
  - when either would work, choose `_` because this makes words more visually distinct
  - do not use CamelCase, because it is harder to read, contains unnecessary capital letters, and loses word boundaries when lowercased
- for paths, separate components with one of `.`, `/`, `__`, or `--` (a u-name parser must accept all forms)
- for more complex u-names, use the text IDs format
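the parsing rules above (accept both word separators and all four path separators) can be sketched with regular expressions. this is a hypothetical helper, not a reference parser: it checks only the all-lowercase form, not the title-case/all-caps variant, and it assumes ASCII letters and digits are the only non-separator characters.

```python
import re

# Path separators from the format rules. Paths are split first, so "a__b" is
# two path components rather than words with an empty word between them.
PATH_SEP = re.compile(r"__|--|[./]")
WORD_SEP = re.compile(r"[_-]")
# Assumption: a lowercase word is one or more ASCII lowercase letters or digits.
LOWER_WORD = re.compile(r"[a-z0-9]+")


def parse_u_name(name: str) -> list[list[str]]:
    """Split a u-name into path components, each a list of words."""
    return [WORD_SEP.split(component) for component in PATH_SEP.split(name)]


def is_lowercase_u_name(name: str) -> bool:
    """True if every word is non-empty and uses only lowercase letters/digits."""
    return all(
        LOWER_WORD.fullmatch(word)
        for component in parse_u_name(name)
        for word in component
    )


print(parse_u_name("uk_birds.robin"))   # [['uk', 'birds'], ['robin']]
print(parse_u_name("uk-birds--robin"))  # [['uk', 'birds'], ['robin']]
print(is_lowercase_u_name("UniformUnambiguousName"))  # False
# lowercasing CamelCase loses the word boundaries entirely:
print(parse_u_name("UniformUnambiguousName".lower()))  # [['uniformunambiguousname']]
```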
examples
- globally, "u-name" (with the hyphen) is a u-name for u-name, but "uname" (without the hyphen) is not, because it could also refer to the Unix `uname` command
- "uniform_unambiguous_name" is another u-name for u-name (but it is not the c-name)
- "unambiguous_name" is not a u-name for u-name, because it leaves out that u-names also have a uniform format, so it could refer to any unambiguous name
- see also pkey examples
source
the name for this comes from the Unix `uname` command (where the u stands for "Unix" instead of "unambiguous").