SQL dotpaths¶
SQL dotpaths are a compact syntax for SQL queries, with additional features to assist in forming column names. They are designed especially for denormalizing a normalized database.
They are an offshoot of our normalization XPaths, which apply to SQL rather than XML and which are used for querying rather than loading data. Note that because there is a deterministic way of completely denormalizing a set of tables4 (see expand_recursive.**
below), and denormalized data is isomorphic to normalized data, this means there is also a deterministic way of normalizing data that is inserted into a denormalized updatable view. The normalization of data can therefore theoretically be fully automated, given an appropriate denormalized view whose columns are parseable SQL dotpaths. This is similar to the way we normalize data whose columns are XPaths, except that the XPaths would be automatically generated by expand_recursive.**
instead of manually created.
4 which can be used to generate normalization XPaths, or to autogenerate the denormalized VegCore terms from normalized VegCore
- Table of contents
- SQL dotpaths
Syntax¶
Overview¶
SQL dotpaths have the following format, where each element is optional:
index(col1_value.col2_value)[table_filter].{scalar,new_name=expand_one.col,expand_all.*,new_prefix=expand_subset.{col_1,col_2},expand_recursive.**,base_class::col,col->key@table.col,function(arg1,arg2),(subselect).*}[result_filter]
produces
scalar |
new_name |
expand_all.col_1 |
... |
expand_all.col_n |
new_prefix.col_1 |
new_prefix.col_2 |
expand_recursive.scalar |
expand_recursive.record.col_1 |
... |
expand_recursive.record.col_n |
... |
base_class::col |
col->key@table.col |
function(...) |
subselect.col_1 |
... |
- add
()
around values that contain.
or,
or start with(
- however, a single-value pkey/index/function does not require doubling the
()
, ie.table(pkey_value.containing_dot)
instead oftable((pkey_value.containing_dot))
.
this is because the()
are only used to separate.
-separated values from each other, but if there aren't multiple.
-separated values, there is no need for extra()
.
- however, a single-value pkey/index/function does not require doubling the
- instead of
index(...)
, you can also usetable(pkey_value)
,table(pkey_col1_value.pkey_col2_value)
, ortable[table_filter]
- to limit dotpath length, it's a good idea to create an abbreviation for each index and table, by creating a function with the abbreviated name which scans the index. this allows you to use eg. both
tbl
andtable
to refer totable
's pkey.
- to limit dotpath length, it's a good idea to create an abbreviation for each index and table, by creating a function with the abbreviated name which scans the index. this allows you to use eg. both
- some tables may want to omit the unique constraint name from their dotpath pkeys, so that eg.
collection(org_by_name(ARIZ).Plants)
reduces to justcollection(ARIZ.Plants)
- a join is indicated by an arrow or a fork, depending on the cardinality of the join:
->
for a 1:1 fkey (forwards or backwards) or-<
for a 1:many fkey (always backwards), in the form2col-><key@table
.
the table and/or the key column may be omitted if there is only one possibility (only one fkey):col1->col2
or simplycol1.col2
most joins, however, are implicit, and indicated by.
instead (e.g. the subtable join expressiontable1.table2::col
(=table1->table2::col
) expands totable1.pkey->pkey@table2::col
). after these implicit dot joins are expanded to explicit arrow joins (->
), there should be no.
in the expression except after the very first table name. - In
col.sub_col
(=col->pkey@sub_table::sub_col
),col
has a foreign key to a table in whichsub_col
is a column. Multiplesub_cols
can be specified using{}
. - when resolving an unqualified column name (without the
table::
prefix), 1:1 tables are searched in the following order (which is the same as forexpand_recursive.**
):- the table referred to by the fkey
- superclasses of it, in breadth-first order
- subclasses of it, in breadth-first order
- note that
expand_all.*
also includes parent class fields. the.*
notation can generally be used to name views which add parent class columns to a subclass.expand_recursive.**
adds to.*
, expanding downwards to all tables reachable through forward fkeys (including superclasses), and upwards to all subclasses (i.e. all tables that are 1:1 with the table).
(the**
comes from rsync filter patterns, which use this to denote recursive expansion into subdirs.)
- a filter can be any SQL expression using the column names available to the left of it:
- for
table_filter
, the table's column names - for
result_filter
, the column names in the{}
- these filter types correspond to
EXPLAIN
output as follows:Index Scan using index on table Index Cond: ([table_filter]) Filter: ([result_filter])
- for
2 this syntax is exactly like in the normalization XPaths, with a directional arrow and @
instead of []
indicating the target column
.
Levels of expansion¶
scalar |
a single primitive value |
expand |
a set of first-class columns |
expand_recursive |
a set of first-class columns, with all forward foreign key columns recursively expanded |
Debugging¶
A dotpath can be split into separate dotpaths for each column:
$ set -o braceexpand; shopt -u nullglob $ for i in dotpath; do echo "$i"; done
This can be used to predict what column names a dotpath query will return
Algorithm¶
the existence of a parsing algorithm ensures that dotpaths are always well-defined, so that a person could resolve them unambiguously even before a parser is written
- parse and fixup phase:
move inwards starting atroot_table
- when a
.
is found where a:
is expected, insert1:
+ the fkey referenced table for the column to the left of the.
- when a
:
is found where a column name is expected, insert1 the pkey name (of the enclosing scope's table) before it - when a column name is found where a
{
is expected, enclose the rest of the dotpath subtree (up to any}
or,
) in{}
- this is needed because the
.
operator is right-associative to the parser (i.e. inner rename-views are created before outer ones), even though it's semantically left-associative - instead of physically inserting a
}
, just increment a count of}
to add. when a}
or,
is seen in the input stream, process all the virtual}
's by decrementing the count and redirecting to the}
token. once the count reaches 0, process the}
or,
in the input.
- this is needed because the
- when a
- expand phase:
move back outwards from the innermost columns- for each expanded column:
- create a view on its referenced table that prefixes the column names with the name of the referencing column
- the view's name should be the dotpath join expression to join the referencing and referenced columns
if this is longer than 63 chars, shorten it and put the actual full name in the view's comment - the primary key should instead have the exact same name as the referencing column (i.e. using just the prefix)
- the view's name should be the dotpath join expression to join the referencing and referenced columns
- natural-join the referencing column to the view (the pkey column will overlap with the referencing column)
- in the select expressions, list the subset columns inside the
{}
, or all columns for*
- create a view on its referenced table that prefixes the column names with the name of the referencing column
- for each expanded column:
- the final select expression is just * , because the nested views have already performed the necessary subsetting
1 perform the insertion by redirecting to the grammar symbol for the token to insert. however, do not consume the unexpected character itself.
.
PostgreSQL extensions¶
Embedding¶
Instead of expanding a record as a set of first-class columns, it can be embedded as one of Postgres's aggregating types.
This is indicated by placing a special character between the .
and the {
:
symbol | type | mnemonic | notes |
! , none |
first-class columns | factorial operator expands to product | the default |
+ |
record | shaped like t for tuple | _3 |
@ |
array | Perl's array indicator | |
% |
hstore | Perl's hash indicator | see hstore-ing a column list |
3 for inheritance hierarchies, columns of the same name are merged. this may produce errors if the columns do not in fact store the same data.
.
hstore expansion¶
An hstore column can be expanded using the regular syntax.
Note that .*
expansions are slow (if there is no sufficient EXPLAIN
cache of the column) because they first require unioning all the hstore keys of every row. For this reason, .**
does not expand hstores.
Custom columns¶
If a column name is not found and the table contains an hstore column, it is treated as a key in the hstore
fkey handle types¶
when a custom handle type is used to implement string fkey interning, the handle type's input function can treat a non-existent pkey as a dotpath expression and evaluate it, instead of returning a domain error. this allows any index's associated dotpath, not just the pkey, to be used as a permalink to reference a table row.
Why dotpaths?¶
- SQL is a verbose syntax, which is not well-suited for embedding in URLs
- SQL produces an error if two columns share the same name, rather than automatically adding a table prefix to disambiguate them
- SQL does not support adding a prefix to multiple columns at once; instead each column must be renamed individually
.**
recursive expansion can also be used to export a subset of a database, following fkeys