Project

General

Profile

SQL dotpaths

SQL dotpaths are a compact syntax for SQL queries, with additional features to assist in forming column names. They are designed especially for denormalizing a normalized database.

They are an offshoot of our normalization XPaths, which apply to SQL rather than XML and which are used for querying rather than loading data. Note that because there is a deterministic way of completely denormalizing a set of tables4 (see expand_recursive.** below), and denormalized data is isomorphic to normalized data, this means there is also a deterministic way of normalizing data that is inserted into a denormalized updatable view. The normalization of data can therefore theoretically be fully automated, given an appropriate denormalized view whose columns are parseable SQL dotpaths. This is similar to the way we normalize data whose columns are XPaths, except that the XPaths would be automatically generated by expand_recursive.** instead of manually created.

4 which can be used to generate normalization XPaths, or to autogenerate the denormalized VegCore terms from normalized VegCore

Syntax

Overview

SQL dotpaths have the following format, where each element is optional:

index(col1_value.col2_value)[table_filter].{scalar,new_name=expand_one.col,expand_all.*,new_prefix=expand_subset.{col_1,col_2},expand_recursive.**,base_class::col,col->key@table.col,function(arg1,arg2),(subselect).*}[result_filter]

produces

scalar new_name expand_all.col_1 ... expand_all.col_n new_prefix.col_1 new_prefix.col_2 expand_recursive.scalar expand_recursive.record.col_1 ... expand_recursive.record.col_n ... base_class::col col->key@table.col function(...) subselect.col_1 ...
  • add () around values that contain . or , or start with (
    • however, a single-value pkey/index/function does not require doubling the (), ie. table(pkey_value.containing_dot) instead of table((pkey_value.containing_dot)) .
      this is because the () are only used to separate .-separated values from each other, but if there aren't multiple .-separated values, there is no need for extra ().
  • instead of index(...), you can also use table(pkey_value), table(pkey_col1_value.pkey_col2_value), or table[table_filter]
    • to limit dotpath length, it's a good idea to create an abbreviation for each index and table, by creating a function with the abbreviated name which scans the index. this allows you to use eg. both tbl and table to refer to table's pkey.
  • some tables may want to omit the unique constraint name from their dotpath pkeys, so that eg. collection(org_by_name(ARIZ).Plants) reduces to just collection(ARIZ.Plants)
  • a join is indicated by an arrow or a fork, depending on the cardinality of the join: -> for a 1:1 fkey (forwards or backwards) or -< for a 1:many fkey (always backwards), in the form2 col-><key@table .
    the table and/or the key column may be omitted if there is only one possibility (only one fkey): col1->col2 or simply col1.col2
    most joins, however, are implicit, and indicated by . instead (e.g. the subtable join expression table1.table2::col (= table1->table2::col) expands to table1.pkey->pkey@table2::col). after these implicit dot joins are expanded to explicit arrow joins (->), there should be no . in the expression except after the very first table name.
  • In col.sub_col (= col->pkey@sub_table::sub_col), col has a foreign key to a table in which sub_col is a column. Multiple sub_cols can be specified using {}.
  • when resolving an unqualified column name (without the table:: prefix), 1:1 tables are searched in the following order (which is the same as for expand_recursive.**):
    1. the table referred to by the fkey
    2. superclasses of it, in breadth-first order
    3. subclasses of it, in breadth-first order
  • note that expand_all.* also includes parent class fields. the .* notation can generally be used to name views which add parent class columns to a subclass.
    expand_recursive.** adds to .*, expanding downwards to all tables reachable through forward fkeys (including superclasses), and upwards to all subclasses (i.e. all tables that are 1:1 with the table).
    (the ** comes from rsync filter patterns, which use this to denote recursive expansion into subdirs.)
  • a filter can be any SQL expression using the column names available to the left of it:
    • for table_filter, the table's column names
    • for result_filter, the column names in the {}
    • these filter types correspond to EXPLAIN output as follows:
      Index Scan using index on table
        Index Cond: ([table_filter])
        Filter: ([result_filter])
      

2 this syntax is exactly like in the normalization XPaths, with a directional arrow and @ instead of [] indicating the target column
.

Levels of expansion

scalar a single primitive value
expand a set of first-class columns
expand_recursive a set of first-class columns, with all forward foreign key columns recursively expanded

Debugging

A dotpath can be split into separate dotpaths for each column:

$ set -o braceexpand; shopt -u nullglob
$ for i in dotpath; do echo "$i"; done

This can be used to predict what column names a dotpath query will return

Algorithm

the existence of a parsing algorithm ensures that dotpaths are always well-defined, so that a person could resolve them unambiguously even before a parser is written

  1. parse and fixup phase:
    move inwards starting at root_table
    1. when a . is found where a : is expected, insert1 : + the fkey referenced table for the column to the left of the .
    2. when a : is found where a column name is expected, insert1 the pkey name (of the enclosing scope's table) before it
    3. when a column name is found where a { is expected, enclose the rest of the dotpath subtree (up to any } or ,) in {}
      • this is needed because the . operator is right-associative to the parser (i.e. inner rename-views are created before outer ones), even though it's semantically left-associative
      • instead of physically inserting a }, just increment a count of } to add. when a } or , is seen in the input stream, process all the virtual }'s by decrementing the count and redirecting to the } token. once the count reaches 0, process the } or , in the input.
  2. expand phase:
    move back outwards from the innermost columns
    1. for each expanded column:
      1. create a view on its referenced table that prefixes the column names with the name of the referencing column
        • the view's name should be the dotpath join expression to join the referencing and referenced columns
          if this is longer than 63 chars, shorten it and put the actual full name in the view's comment
        • the primary key should instead have the exact same name as the referencing column (i.e. using just the prefix)
      2. natural-join the referencing column to the view (the pkey column will overlap with the referencing column)
      3. in the select expressions, list the subset columns inside the {}, or all columns for *
  3. the final select expression is just * , because the nested views have already performed the necessary subsetting

1 perform the insertion by redirecting to the grammar symbol for the token to insert. however, do not consume the unexpected character itself.
.

PostgreSQL extensions

Embedding

Instead of expanding a record as a set of first-class columns, it can be embedded as one of Postgres's aggregating types.
This is indicated by placing a special character between the . and the { :

symbol type mnemonic notes
!, none first-class columns factorial operator expands to product the default
+ record shaped like t for tuple _3
@ array Perl's array indicator
% hstore Perl's hash indicator see hstore-ing a column list

3 for inheritance hierarchies, columns of the same name are merged. this may produce errors if the columns do not in fact store the same data.
.

hstore expansion

An hstore column can be expanded using the regular syntax.

Note that .* expansions are slow (if there is no sufficient EXPLAIN cache of the column) because they first require unioning all the hstore keys of every row. For this reason, .** does not expand hstores.

Custom columns

If a column name is not found and the table contains an hstore column, it is treated as a key in the hstore

fkey handle types

when a custom handle type is used to implement string fkey interning, the handle type's input function can treat a non-existent pkey as a dotpath expression and evaluate it, instead of returning a domain error. this allows any index's associated dotpath, not just the pkey, to be used as a permalink to reference a table row.

Why dotpaths?

  • SQL is a verbose syntax, which is not well-suited for embedding in URLs
  • SQL produces an error if two columns share the same name, rather than automatically adding a table prefix to disambiguate them
  • SQL does not support adding a prefix to multiple columns at once; instead each column must be renamed individually
  • .** recursive expansion can also be used to export a subset of a database, following fkeys