main Makefile: VegBIEN DB: DB and bien user: Factored $(confirmRm<schema>) functions message text out into $(confirmRmSchema) function
schemas/Makefile, py_functions.sql.make: Generate py_functions.sql from vegbien's py_functions schema
main Makefile: postgres-Linux: Install postgresql-plpython
main Makefile: python-Linux, postgres-Linux: Fixed bug where apt-get installs needed to each be run in a separate command, so that if any package was not found, the other packages would still install. (apt-get aborts on the first invalid package name.)
db_dump_localize: Use new pg_version
Added pg_version
sql.py: into_table_name(): If relational function has a value argument, don't include other arguments, to save space
sql.py: add_pkey(): Version the index name just in case add_suffix() doesn't correctly preserve a needed version #
sql_gen.py: add_suffix(): Fixed bug where only strings already at the max length had the version preserved, even though appending the suffix could bring it past the max length and still cause the version to be overwritten. Fixed bug where last # in str, not first, should be considered to precede the version.
sql.py: put_table(): mapping param: Fixed documentation of supported key/value types
db_xml.py: put_table(): Removed no longer accurate comment about handling _simplifyPath
schemas/functions.sql: Added _nullIf relational function
sql_gen.py: add_suffix(): Preserve version so that it won't be truncated off the string, leading to collisions
sql_gen.py: identifier_max_len: Fixed bug where PostgreSQL's max length was actually 63, not 64
schemas/functions.sql: _label(): Fixed bug where some Python syntax had not been translated to PostgreSQL
schemas/functions.sql: Added _label relational function
db_xml.py: put_table(): Subsetting in_table: Fixed bug where in_table was not being ordered by the row_num, because order_by was set to None when it should have been omitted so it would default to the pkey
csv2db: Increased frequency of "Processed .. row(s)" messages to match slower, more common INSERT case instead of faster, less used COPY FROM case
schemas/functions.sql: _merge(): Fixed bug where values were ordered by value instead of by sort order (column name)
xml_func.py: process(): Refactored to emphasize special handling for row-based and column-based modes. In row-based mode, always use a DB relational function over a local XML function when possible, to faciliate testing of DB relational functions in row-based mode. (The shadowed local XML version will still be tested in non-DB modes, such as outputting to intermediate XML files.)
bin/map: Move retrieval of out_db's relational functions outside of process_input() so they can also be used by the non-by_col case
bin/map: out_is_db: Don't evaluate relational functions in xml_func.process() because these will be evaluated by db_xml.put()
xml_func.py: Removed no longer used strip()
bin/map: Use xml_func.process(..., strip=True) instead of xml_func.strip()
xml_func.py: process(): Added strip()'s functionality via strip option
schemas/functions.sql: Added _merge relational function
schemas/functions.sql: Added join_strs() aggregate
sql.py: Renamed index_pkey() to add_pkey() to be consistent with add_index()
sql.py: into_table_name(): In function args, omit column name for function result columns
sql.py: into_table_name(): In function args, keep the input table name for input columns to identify where they came from, except for the main input table name because it makes the string too long
sql_gen.py: esc_name(): Don't return plain name if is_safe_name(), because this makes the SQL inconsistent when some names have "_"s and some don't
sql.py: index_pkey(): Use sql_gen.add_suffix() to ensure index name isn't too long
sql.py: put_table(): insert_out_pkeys, insert_in_pkeys: Use sql_gen.add_suffix() to ensure name isn't too long
sql.py: next_version(): Use new sql_gen.add_suffix(). Removed identifier_max_len because it is now in sql_gen.
sql_gen.py: Added identifier_max_len and add_suffix()
next_version(): Append the version # so it looks more natural. Take into account the max identifier length.
strings.py: Added add_suffix()
sql.py: put_table(): Name the in_table just "in" plus the version #, and the insert_in_pkeys/insert_out_pkeys based on in_table, so that they don't take up so much space in the SQL
sql_gen.py: is_safe_name(): Fixed bug where keywords were incorrectly considered safe
strings.py: repr_no_u(): Fixed bug where "u" prefix was removed even in reprs of non-strings
db_xml.py: into_table_name(): Removed no longer necessary handling of simple functions, which is now done by sql.into_table_name(). Ensure that rank params in functions (not tables) are not treated specially as hierarchical.
sql.py: put_table(): If into == None: For function calls, include the arguments in the into table name
sql_gen.py: to_name_only_col(): Support non-Col Code inputs
sql_gen.py CompareCond.to_str(), callers of combine_conds(): Removed unnecessary grouping () to make SQL clearer
sql_gen.py: Added combine_conds() and use it in Join.to_str() and sql.py mk_select()
sql_gen.py Join.to_str(), sql.py mk_select(): Combining conditions: Don't add newlines where not needed, so that output is less vertically spread out
sql_gen.py: is_safe_name(): Fixed bug where names starting with a digit were incorrectly considered safe
sql.py: put_table(): Separate temp table names from into table name with "_" instead of "-" so that quoting the table name will usually be unnecessary
sql.py: esc_name_by_module(): Remove unused param ignore_case
sql_gen.py: esc_name(): If is_safe_name(), just return name, to avoid escessive escaping in debug output for Redmine
sql_gen.py: is_safe_name(): Don't consider uppercase letters safe because they would cause inconsistent behavior in PostgreSQL if quoted vs. not quoted (only unquoted identifiers are case-insensitive)
sql.py: Removed no longer needed check_name()
sql.py: esc_name_by_module(): psycopg2: If ignore_case is set but name is unsafe, just escape it instead of raising an exception
sql_gen.py: Added is_safe_name()
sql.py: put_table(): col_ustr(): Removed no longer needed sql_gen.as_Col() because mapping and join_cols now ensure that their contents are sql_gen.Col objects
schemas/functions.sql: Added _alt relational function
sql.py: put_table(): Make mapping and join_cols a sql_gen.ColDict so that literal values will always be turned into sql_gen.Col objects. DuplicateKeyException: Use dict_subset_right_join() instead of dict_subset() so that all columns in a constraint are included in joins on out_table (such as for a relational function with omitted arguments).
sql_gen.py: Added ColDict
sql_gen.py: as_Col(): Added optional name param to specify that non-Col input will be renamed using NamedCol with the given name
sql.py: put_table(): FunctionValueException: Fixed bug where only function calls, not plain columns, were handled, by using sql_gen.unwrap_func_call() to remove any function call only if there was one
sql_gen.py: Added unwrap_func_call()
bin/map: by_col: Stripping XML functions not in the DB: Fixed bug where preserve_funcs.add() was used when `preserve_funcs |=` should have been used to add the entire iterable that sql.tables() returns
sql.py: not_null_col: Changed value to 'not_null_col' so that column doesn't seem like a status indicator of whether some value is not null (in fact it's just a column that is always not null)
xml_func.py: Replaced xpath.get_1() with xpath.get_value() where possible, for simplicity
xml_func.py: strip(): Evaluate structural functions like _ignore and _ref by process() instead of removing them. Store structural functions' names in structural_funcs module var. This ensures that _ref targets are still expanded in column-based import.
xpath.py: get(): Create attrs: Put keys last so that any lookahead assertion's path will be created last as it would have without the assertion. This ensures that any value argument of an XML function will always go last even if a lookahead assertion would otherwise have caused it to be created with the element's keys, which previously were created before the attributes.
sql.py: put_table(): If is_func, default into table name ends in () instead of '-pkeys'
schemas/vegbien.sql, functions.sql: Made cast functions STRICT to enable the RETURNS NULL ON NULL INPUT optimization
db_xml.py: put_table(): Pass is_func to sql.put_table()
sql.py: put_table(): Added is_func param for whether out_table is the name of a SQL function, not a table
db_xml.py: put_table(): Treat every node name that starts with "_" as a function, not just members of put_table_special_funcs. This ensures that DB function args are always treated as values, not children with fkeys to parent.
bin/map: by_col: Strip only XML functions that are not in the DB
db_xml.py: put_table(): Make special_funcs externally available as module constant put_table_special_funcs
sql.py: tables(): Changed schema param to schema_like and filter the schema using LIKE so that all schemas can be selected
to_do/timeline.doc: Updated to reflect the month we spent on optimization and column-based import
sql.py: put_table(): in_table name: Remove '-pkeys' suffix from the into table name before adding '-input' so that the name is shorter and clearer
sql.py: put_table(): Wrap repr() calls for debug messages in strings.as_tt() to add Redmine formatting
sql.py: put_table(): Output "Adding index" debug message with level=2.5 so it's not part of the Redmine steps
schemas/vegbien.sql, functions.sql: Cast functions: Fixed bug where invalid value exceptions were not being caught, because implicit conversions to the return type apparently only happen outside the block containing the RETURN statement (i.e. at the end of the function). Fixed by adding explicit type conversion to return type, so that type conversion would happen inside try block.
sql.py: put_table(): Re-enabled FunctionValueException handling, by just filtering out the value on all input columns that use the named function (since the error message does not specify which column it was that had the invalid value). This is in some ways better, anyway, because that way the invalid value is filtered out right away in all columns that could contain it, instead of potentially once for each column (if the value appears in more than one input column).
sql.py: add_index(): Fixed bug where expressions could not be converted to a string until their table name had been removed
sql_gen.py: Added Expr
sql.py: add_index(): Fixed bug where expressions needed to be enclosed in () to distinguish them from plain columns
sql.py: add_index(): Support simple expressions as well as columns
sql.py: Renamed index_col() to add_index() so its name isn't similar to index_cols()
sql_gen.py: FunctionCall: Removed repr() because it's a Code object and its to_str() does not take extra arguments
sql.py: run_query(): FunctionValueException: Expanded parsing to include regular function calls, not just relational functions' trigger functions. put_table(): Disabled FunctionValueException handling because this expands FunctionValueException beyond what put_table() could handle.
sql.py: put_table(): MissingCastException: Fixed bug where renaming of cast literal value was not properly propagated to the returned value of the function call, causing the query to assume that a DISTINCT ON column referred to column in one of the joined tables instead of a named column in the SELECT columns list. This logic error would have been very difficult to catch without inspecting the code!
sql_gen.py: Added wrap_in_func()
sql_gen.py: FunctionCall: Filter args through remove_col_rename() to remove any renamings from the function args
sql.py: put_table(): No handler for exception: Print full exception instead of just first line to assist in debugging
schemas/vegbien.sql, functions.sql: Removed _to* relational functions because type casting for those types is now automatic
mappings/DwC2-VegBIEN.specimens.csv: Removed _to* relational functions because type casting for those types is now automatic
schemas/functions.sql: Added cast functions for _to* relational functions
schemas/vegbien.sql: Changed cast functions' input types to text because type must match exactly, not just be implicitly castable
sql.py: run_query(): MissingCastException parsing: Support multiple-word types
sql.py: put_table(): Handle MissingCastExceptions by attempting to call a function with the name of the type on the column
sql_gen.py: Added Functions section with Function and FunctionCall
sql.py: Added MissingCastException and parse it in run_query()
schemas/vegbien.sql: Added cast functions for enum types which map invalid values to NULL