schemas/functions.sql: Added _label relational function
db_xml.py: put_table(): Subsetting in_table: Fixed bug where in_table was not being ordered by the row_num, because order_by was set to None when it should have been omitted so it would default to the pkey
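The difference between passing None and omitting an argument can be sketched with a sentinel default. This is a minimal illustration, not the project's code: the mk_select() signature, the `_default` sentinel, and the use of 'row_num' as a stand-in for the pkey are all assumptions.

```python
# Hypothetical sketch: distinguishes "order_by omitted" from "order_by=None".
_default = object()  # sentinel meaning "argument was not passed"

def mk_select(table, order_by=_default):
    """Build a SELECT; an omitted order_by falls back to the (assumed) pkey."""
    if order_by is _default:
        order_by = 'row_num'  # stand-in for the table's pkey column
    query = 'SELECT * FROM ' + table
    if order_by is not None:  # an explicit None means "no ordering"
        query += ' ORDER BY ' + order_by
    return query
```

With this shape, explicitly passing order_by=None silently drops the ORDER BY clause, which matches the kind of bug described in the entry above.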
csv2db: Increased frequency of "Processed .. row(s)" messages to match slower, more common INSERT case instead of faster, less used COPY FROM case
schemas/functions.sql: _merge(): Fixed bug where values were ordered by value instead of by sort order (column name)
xml_func.py: process(): Refactored to emphasize special handling for row-based and column-based modes. In row-based mode, always use a DB relational function over a local XML function when possible, to facilitate testing of DB relational functions in row-based mode. (The shadowed local XML version will still be tested in non-DB modes, such as outputting to intermediate XML files.)
bin/map: Move retrieval of out_db's relational functions outside of process_input() so they can also be used by the non-by_col case
bin/map: out_is_db: Don't evaluate relational functions in xml_func.process() because these will be evaluated by db_xml.put()
xml_func.py: Removed no longer used strip()
bin/map: Use xml_func.process(..., strip=True) instead of xml_func.strip()
xml_func.py: process(): Added strip()'s functionality via strip option
schemas/functions.sql: Added _merge relational function
schemas/functions.sql: Added join_strs() aggregate
sql.py: Renamed index_pkey() to add_pkey() to be consistent with add_index()
sql.py: into_table_name(): In function args, omit column name for function result columns
sql.py: into_table_name(): In function args, keep the input table name for input columns to identify where they came from, except for the main input table name because it makes the string too long
sql_gen.py: esc_name(): Don't return plain name if is_safe_name(), because this makes the SQL inconsistent when some names have "_"s and some don't
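Always quoting identifiers, as this entry describes, could look roughly like the following. It is a sketch using standard SQL identifier quoting (double quotes, with embedded double quotes doubled); the body of the real esc_name() is not shown in this log and is assumed here.

```python
# Hypothetical sketch of unconditional identifier quoting (standard SQL rules).
def esc_name(name):
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'
```

Quoting every name, safe or not, keeps the generated SQL uniform at the cost of somewhat noisier debug output.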
sql.py: index_pkey(): Use sql_gen.add_suffix() to ensure index name isn't too long
sql.py: put_table(): insert_out_pkeys, insert_in_pkeys: Use sql_gen.add_suffix() to ensure name isn't too long
sql.py: next_version(): Use new sql_gen.add_suffix(). Removed identifier_max_len because it is now in sql_gen.
sql_gen.py: Added identifier_max_len and add_suffix()
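A minimal sketch of how a suffix can be appended while respecting a maximum identifier length. The function body is an assumption; 63 is PostgreSQL's default identifier limit, and this sketch counts characters rather than bytes.

```python
# Hypothetical sketch; names mirror the log entry but the body is assumed.
identifier_max_len = 63  # PostgreSQL's default limit (NAMEDATALEN - 1)

def add_suffix(name, suffix, max_len=identifier_max_len):
    """Append suffix to name, truncating name so the result fits in max_len."""
    if len(suffix) > max_len:
        raise ValueError('suffix is longer than max_len')
    # Truncate the name, never the suffix, so the suffix stays recognizable
    return name[:max_len - len(suffix)] + suffix
```

Truncating the base name instead of the suffix means a version number or '_pkey' marker survives even on very long table names.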
next_version(): Append the version # so it looks more natural. Take into account the max identifier length.
strings.py: Added add_suffix()
sql.py: put_table(): Name the in_table just "in" plus the version #, and the insert_in_pkeys/insert_out_pkeys based on in_table, so that they don't take up so much space in the SQL
sql_gen.py: is_safe_name(): Fixed bug where keywords were incorrectly considered safe
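Taken together, the is_safe_name() entries in this log (no uppercase letters, no leading digit, no keywords) suggest a check along these lines. This is a sketch only: the keyword set is a small illustrative subset, and the real function may be implemented differently.

```python
import re

# Illustrative subset only; a real check needs the DB's full reserved-word list.
_keywords = frozenset(['select', 'from', 'where', 'table', 'order', 'group'])

def is_safe_name(name):
    """True if name can be used unquoted without changing its meaning."""
    # Lowercase letters, digits and underscores only; must not start with a digit
    if re.fullmatch(r'[a-z_][a-z0-9_]*', name) is None:
        return False
    return name not in _keywords
```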
strings.py: repr_no_u(): Fixed bug where "u" prefix was removed even in reprs of non-strings
db_xml.py: into_table_name(): Removed no longer necessary handling of simple functions, which is now done by sql.into_table_name(). Ensure that rank params in functions (not tables) are not treated specially as hierarchical.
sql.py: put_table(): If into == None: For function calls, include the arguments in the into table name
sql_gen.py: to_name_only_col(): Support non-Col Code inputs
sql_gen.py CompareCond.to_str(), callers of combine_conds(): Removed unnecessary grouping () to make SQL clearer
sql_gen.py: Added combine_conds() and use it in Join.to_str() and sql.py mk_select()
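One plausible shape for such a helper, joining condition strings with AND on a single line and without extra grouping parentheses. The signature, including the `keyword` parameter, is hypothetical and not taken from sql_gen.py.

```python
# Hypothetical sketch; the real combine_conds() signature is not shown in the log.
def combine_conds(conds, keyword='WHERE'):
    """Join condition strings with AND after a leading keyword; '' if none."""
    conds = list(conds)
    if not conds:
        return ''
    return keyword + ' ' + ' AND '.join(conds)
```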
sql_gen.py Join.to_str(), sql.py mk_select(): Combining conditions: Don't add newlines where not needed, so that output is less vertically spread out
sql_gen.py: is_safe_name(): Fixed bug where names starting with a digit were incorrectly considered safe
sql.py: put_table(): Separate temp table names from into table name with "_" instead of "-" so that quoting the table name will usually be unnecessary
sql.py: esc_name_by_module(): Remove unused param ignore_case
sql_gen.py: esc_name(): If is_safe_name(), just return name, to avoid excessive escaping in debug output for Redmine
sql_gen.py: is_safe_name(): Don't consider uppercase letters safe because they would cause inconsistent behavior in PostgreSQL if quoted vs. not quoted (only unquoted identifiers are case-insensitive)
sql.py: Removed no longer needed check_name()
sql.py: esc_name_by_module(): psycopg2: If ignore_case is set but name is unsafe, just escape it instead of raising an exception
sql_gen.py: Added is_safe_name()
sql.py: put_table(): col_ustr(): Removed no longer needed sql_gen.as_Col() because mapping and join_cols now ensure that their contents are sql_gen.Col objects
schemas/functions.sql: Added _alt relational function
sql.py: put_table(): Make mapping and join_cols a sql_gen.ColDict so that literal values will always be turned into sql_gen.Col objects. DuplicateKeyException: Use dict_subset_right_join() instead of dict_subset() so that all columns in a constraint are included in joins on out_table (such as for a relational function with omitted arguments).
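The difference between the two subsetting helpers can be sketched as follows. The semantics are assumed from the entry's wording (a "right join" keeps every requested key, filling in None where the dict has no value); the real helpers may differ.

```python
# Hypothetical sketches; behavior inferred from the log entry, not from the source.
def dict_subset(dict_, keys):
    """Keep only the requested keys that are actually present."""
    return dict((k, dict_[k]) for k in keys if k in dict_)

def dict_subset_right_join(dict_, keys):
    """Keep every requested key, using None for keys missing from dict_."""
    return dict((k, dict_.get(k)) for k in keys)
```

Under these assumptions, a constraint column with no mapped value still appears in the join, compared against None instead of being dropped.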
sql_gen.py: Added ColDict
sql_gen.py: as_Col(): Added optional name param to specify that non-Col input will be renamed using NamedCol with the given name
sql.py: put_table(): FunctionValueException: Fixed bug where only function calls, not plain columns, were handled, by using sql_gen.unwrap_func_call() to remove any function call only if there was one
sql_gen.py: Added unwrap_func_call()
bin/map: by_col: Stripping XML functions not in the DB: Fixed bug where preserve_funcs.add() was used when `preserve_funcs |=` should have been used to add the entire iterable that sql.tables() returns
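The add() versus `|=` distinction is easy to reproduce in isolation. The function names below are placeholders, and `db_funcs` stands in for whatever sql.tables() returns.

```python
preserve_funcs = set(['_ignore', '_ref'])
db_funcs = ['_alt', '_merge', '_label']  # stand-in for the result of sql.tables()

# Buggy: add() treats its argument as ONE element, so the whole collection
# becomes a single member (a list here would raise TypeError: unhashable).
wrong = set(preserve_funcs)
wrong.add(tuple(db_funcs))

# Fixed: |= merges each element of the other set individually.
right = set(preserve_funcs)
right |= set(db_funcs)
```

Note that `|=` on a set requires a set operand, whereas set.update() accepts any iterable.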
sql.py: not_null_col: Changed value to 'not_null_col' so that column doesn't seem like a status indicator of whether some value is not null (in fact it's just a column that is always not null)
xml_func.py: Replaced xpath.get_1() with xpath.get_value() where possible, for simplicity
xml_func.py: strip(): Evaluate structural functions like _ignore and _ref by process() instead of removing them. Store structural functions' names in structural_funcs module var. This ensures that _ref targets are still expanded in column-based import.
xpath.py: get(): Create attrs: Put keys last so that any lookahead assertion's path will be created last as it would have without the assertion. This ensures that any value argument of an XML function will always go last even if a lookahead assertion would otherwise have caused it to be created with the element's keys, which previously were created before the attributes.
sql.py: put_table(): If is_func, default into table name ends in () instead of '-pkeys'
schemas/vegbien.sql, functions.sql: Made cast functions STRICT to enable the RETURNS NULL ON NULL INPUT optimization
db_xml.py: put_table(): Pass is_func to sql.put_table()
sql.py: put_table(): Added is_func param for whether out_table is the name of a SQL function, not a table
db_xml.py: put_table(): Treat every node name that starts with "_" as a function, not just members of put_table_special_funcs. This ensures that DB function args are always treated as values, not children with fkeys to parent.
bin/map: by_col: Strip only XML functions that are not in the DB
db_xml.py: put_table(): Make special_funcs externally available as module constant put_table_special_funcs
sql.py: tables(): Changed schema param to schema_like and filter the schema using LIKE so that all schemas can be selected
to_do/timeline.doc: Updated to reflect the month we spent on optimization and column-based import
sql.py: put_table(): in_table name: Remove '-pkeys' suffix from the into table name before adding '-input' so that the name is shorter and clearer
sql.py: put_table(): Wrap repr() calls for debug messages in strings.as_tt() to add Redmine formatting
sql.py: put_table(): Output "Adding index" debug message with level=2.5 so it's not part of the Redmine steps
schemas/vegbien.sql, functions.sql: Cast functions: Fixed bug where invalid value exceptions were not being caught, because implicit conversions to the return type apparently only happen outside the block containing the RETURN statement (i.e. at the end of the function). Fixed by adding explicit type conversion to return type, so that type conversion would happen inside try block.
sql.py: put_table(): Re-enabled FunctionValueException handling, by just filtering out the value on all input columns that use the named function (since the error message does not specify which column it was that had the invalid value). This is in some ways better, anyway, because that way the invalid value is filtered out right away in all columns that could contain it, instead of potentially once for each column (if the value appears in more than one input column).
sql.py: add_index(): Fixed bug where expressions could not be converted to a string until their table name had been removed
sql_gen.py: Added Expr
sql.py: add_index(): Fixed bug where expressions needed to be enclosed in () to distinguish them from plain columns
sql.py: add_index(): Support simple expressions as well as columns
sql.py: Renamed index_col() to add_index() so its name isn't similar to index_cols()
sql_gen.py: FunctionCall: Removed repr() because it's a Code object and its to_str() does not take extra arguments
sql.py: run_query(): FunctionValueException: Expanded parsing to include regular function calls, not just relational functions' trigger functions. put_table(): Disabled FunctionValueException handling because this expands FunctionValueException beyond what put_table() could handle.
sql.py: put_table(): MissingCastException: Fixed bug where renaming of cast literal value was not properly propagated to the returned value of the function call, causing the query to assume that a DISTINCT ON column referred to a column in one of the joined tables instead of a named column in the SELECT columns list. This logic error would have been very difficult to catch without inspecting the code!
sql_gen.py: Added wrap_in_func()
sql_gen.py: FunctionCall: Filter args through remove_col_rename() to remove any renamings from the function args
sql.py: put_table(): No handler for exception: Print full exception instead of just first line to assist in debugging
schemas/vegbien.sql, functions.sql: Removed _to* relational functions because type casting for those types is now automatic
mappings/DwC2-VegBIEN.specimens.csv: Removed _to* relational functions because type casting for those types is now automatic
schemas/functions.sql: Added cast functions for _to* relational functions
schemas/vegbien.sql: Changed cast functions' input types to text because type must match exactly, not just be implicitly castable
sql.py: run_query(): MissingCastException parsing: Support multiple-word types
sql.py: put_table(): Handle MissingCastExceptions by attempting to call a function with the name of the type on the column
sql_gen.py: Added Functions section with Function and FunctionCall
sql.py: Added MissingCastException and parse it in run_query()
schemas/vegbien.sql: Added cast functions for enum types which map invalid values to NULL
sql.py: put_table(): Fixed bug where some exceptions with no handler would not even allow insertion of no rows into the out_table (due to type mismatch issues), by creating an empty pkeys table as a special case
sql.py: put_table(): Preparing to insert new rows: Fixed bug where main_select needed to be generated after distinct_on was set in the if statement
sql.py: put_table(): log_exc(): Fixed bug where the exception strings rather than the exceptions themselves needed to be put in the set, because exceptions are not comparable with ==
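Why strings are needed here: two exception instances with the same message are not equal, so a set of exception objects never detects a repeat. The log_exc() below is a simplified stand-in, not the function in sql.py.

```python
# Exceptions compare by identity, so equal-looking instances are distinct.
same_message_equal = (ValueError('bad value') == ValueError('bad value'))

seen_excs = set()  # holds exception strings, not exception objects

def log_exc(e):
    """Return True the first time an exception with this message is seen."""
    key = str(e)
    if key in seen_excs:
        return False  # already handled once; caller can stop retrying
    seen_excs.add(key)
    return True
```

Returning False on a repeat gives the caller a way to break out instead of retrying the same failing query indefinitely.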
sql.py: put_table(): Moved mk_main_select() call out of try block since it is not related to the exceptions that may be thrown
sql.py: put_table(): log_exc(): Check if exception already caught before to avoid infinite loops
Added debug2redmine and helper file debug2redmine.csv
sql.py, db_xml.py: Removed unnecessary calls to sql_gen.clean_name() now that str() handles this automatically
sql_gen.py: sql_gen classes inherit from new base class BasicObject, whose str() calls clean_name() on the object's repr(). Changed the main debug-repr producing method to be repr() instead of str().
Moved clean_name() from sql.py to sql_gen.py because it's DB-general and so that it can be used by sql_gen.py without circular dependencies
db_xml.py: into_table_name(): Handle hierarchical tables specially by including their rank in the into table. Interpret any table with a value column as a function, regardless of out_table name.
sql.py: put_table(): Log "Default value column does not exist in mapping" error with level 2.1 so that it doesn't appear in Redmine output
db_xml.py: put_table(): Pass next as sql.put_table()'s default param now that it is supported
sql.py: put_table(): Changed default param to be an output column because that is what would be passed in by db_xml.put_table(), and because there is already a mapping that resolves that to a flattened input column
sql.py: put_table(): Added default param for the value or input column to use as the pkey for missing rows
sql.py: put_table(): Use single quotes rather than double quotes around strings where possible