sql.py: put_table(): mk_select_(): Fixed bug where order_by needed to be None because otherwise it wouldn't match the distinct_on cols if they were specified
sql.py: put_table(): insert_(): Fixed bug where distinct_on was not passed to mk_select_()
sql.py: put_table(): mk_select_(): Fixed bug where distinct_on needed to be passed as a keyword param
sql.py: put_table(): insert_() and mk_select_() take distinct_on param
sql.py: put_table(): Factored out code that inserts into pkeys table into run_query_into_pkeys() helper function
sql.py: mk_select(): Implemented DISTINCT ON according to the distinct_on param
sql.py: mk_select(): Added distinct_on param to set the columns to SELECT DISTINCT ON
sql.py: clean_name(): Convert names to lowercase so that PostgreSQL will behave the same whether the name is escaped with "" or not. This will help avoid bugs in code that uses temp tables created by the sql module.
sql.py: put_table(): Added order_by=None wherever rows were not supposed to be re-ordered. On DuplicateKeyException: Save existing pkeys in temp table for joining on.
db_xml.py: put_table(): Pass limit and start to sql.put_table()
db_xml.py: put_table(): Added limit and start options
sql.py: When creating a temporary entity (table, function, etc.), instead create it as a permanent entity in debug mode so it can be viewed after the program is run
sql.py: DbConn: Store whether in debug mode (log_debug != log_debug_none) for easy use by methods
bin/map: connect_db(): Turn on autocommit mode in debug mode if commit is on, so that incremental results can be seen in the DB
sql.py: DbConn: Use internal autocommit handling instead of DB connection autocommit attr to avoid autocommits inside a savepoint
sql.py: DbConn: Added autocommit option to turn on autocommit mode. Use set_session() instead of SQL command to set isolation level.
sql.py: mk_insert_select(): embeddable: Fixed bug where the function may do different things when run, because the function (and other statements whose cached strings depend on the function name) may be run after the function definition would have changed, by versioning the function name and using CREATE FUNCTION instead of CREATE OR REPLACE FUNCTION so that its definition never changes
sql.py: Parse "function already exists" errors as DuplicateFunctionException
sql.py: mk_select(): joins: Fixed bug where join_not_equal did not do what it was designed for, which is filtering out matches of the join condition (before the bug fix, it effectively did a cross join with matching rows excluded, causing duplication of rows). Renamed join_not_equal to filter_out to reflect its intended use. Support table-scoped column names in the WHERE conds list.
sql.py: put_table(): Fixed bug where ORDER BY column needed to have table0 name prefixed (if it didn't already have a table name), to avoid ambiguous column references
sql.py: mk_select(): Fixed bug in joins where right_col had the table name prepended before it was copied for use with a different table name in join_using and join_not_equal
Mapped some unmapped fields in DwC inputs
Added mappings/for_review/DwC2-VegBIEN.specimens.fields.csv
db_xml.py: put_table(): Fixed bug where didn't commit right after inserting node, but instead waited until children with fkeys to parent (independent of the node itself) were inserted
sql.py: put_table(): insert_(): Use insert_select() instead of run_query_into() if new option pkeys_table_exists is on
sql.py: mk_select(): Support joins with !=
sql.py: mk_select(): Support only some join columns being join_using
sql.py: put_table(): Renamed in_joins to insert_joins and joins to select_joins for clarity
db_xml.py: put_table(): Support children with fkeys to parent
sql.py: mk_select(): Make tuple optional for None literal values
sql.py: put_table(): Removed "SELECT statement missing a WHERE, LIMIT, or OFFSET clause" warnings
bin/map: by_col: row_ct = 0 because it's unknown for now
mk_select(): Support join conditions with literal values
sql.py: mk_insert_select(): embeddable: Don't cache function_query because function def could change and then change back
sql.py: with_savepoint(): Renamed savepoints to have "level" prefix, since the # indicates the level #
sql.py: get_cur_query(): Also accept input params to combine with input_query, and pass input params when get_cur_query() is called
sql.py: DbConn.run_query(): Pass input query to get_cur_query()
sql.py: get_cur_query() and _add_cursor_info(): Support input_query param that will be used if the raw query is None. Pass input_query in DbConn.execute().
sql.py: DbConn.run_query(): Check that query != None
bin/map: out_is_db: Only rollback() and close() out_db if it was connected
sql.py: DbConn: Added connected()
sql.py: Wrapped calls to get_cur_query() that are used as strings in str(), because get_cur_query() can return None
sql.py: next_version(): Versions start from 1, because first existing name was version 0
put_table(): Use short name for temp_suffix now that version # will be added if needed
sql.py: mk_select(): Parse join columns for literal values and table-scoped names as well
mappings/DwC2-VegBIEN.specimens.csv: establishmentMeans: Call _toGrowthform on growthform
schemas/vegbien.sql: Added _toGrowthform
sql.py: put_table(): Changed temp_prefix to a suffix so main name won't be removed if name is truncated
sql.py: mk_select(): fields: Support columns with tables. Changed syntax for literal values so that it wouldn't conflict with new syntax for columns with tables.
iters.py: flatten(): If not an iterable, just return the value
sql.py: put_table(): Pass in_pkeys and out_pkeys to run_query_into() by ref so they will be updated if the table names are changed
sql.py: put_table(): Pass pkeys to run_query_into() by ref so it will be updated if the table name is changed
sql.py: run_query_into(): If CREATE TABLE AS generates a DuplicateTableException, rename the table with a version # prepended
sql.py: run_query_into(): Made into param a reference so that the function can change it, and renamed it to into_ref
sql.py: put_table(): If DuplicateKeyException: run_query_into() recoverably, so that DB errors such as DuplicateTableException will be parsed
sql.py: Removed no-longer-needed try_insert()
sql.py: Merged with_parsed_errors() into run_query() so all recoverable queries would automatically benefit from DB error message parsing. DbConn: Moved _add_cursor_info() to DbCursor.execute().
sql.py: with_parsed_errors(): Raise DuplicateTableException for "relation already exists" errors instead of "table name specified more than once" errors
sql.py: run_query_into(): Removed "DROP TABLE IF EXISTS" because sometimes when there are collisions in the temp table names, the code actually uses both "copies" of the temp table. Eventually, this situation will be resolved by adding a counter to the temp table name.
sql.py: Cleaned up DbException's and subclasses' messages
exc.py: ExceptionWithCause: Added cause_newline option to put the cause on its own line instead of on the message line
sql.py: with_parsed_errors(): Also parse "table name specified more than once" errors as DuplicateTableExceptions
sql.py: put_table(): Handle DuplicateKeyExceptions by running a select query on the unique constraint columns
sql.py: mk_select(): Support tuples of tables, not just lists
sql.py: with_parsed_errors(): Support table names that start with "_"
sql.py: DbConn: Added with_savepoint(). with_savepoint(): Use new DbConn.with_savepoint().
schemas/functions.sql: Added _toBool
mappings/DwC2-VegBIEN.specimens.csv: establishmentMeans: Use _toBool on iscultivated, isnative
schemas/functions.sql: Made trigger functions IMMUTABLE since they do not modify other tables
sql.py: put_table(): Added support for putting just a window subset of the rows in the table. Removed "SELECT statement missing a WHERE, LIMIT, or OFFSET clause" warnings.
sql.py: put_table(): Return the column where the pkeys are made available (the out_pkey) instead of taking it as an argument
sql.py: put_table(): Get input pkeys corresponding to rows in insert and join together out_pkeys and in_pkeys into final pkeys table
sql.py: put_table(): Fully support multiple in_tables, joined together using the main input table's pkey
sql.py: mk_select(): joins: Fixed bug where USING-based joins did not have closing ")"
db_xml.py: put_table(): Fixed bug where in_table was last in in_tables instead of first, causing it to be ignored by the current put_table() implementation, which only considers the first table name
db_xml.py: put_table(): Fixed bug where pkeys_table returned by recursive call to put_table() needed to be prefixed with $ to be treated as an input column name rather than a literal value
sql.py: mk_select(): Support joins with USING, which can be used to merge multiple input cols into the same output col
sql.py: mk_insert_select(): embeddable: Fixed bug where query that uses function was being sorted by its first column (the default mk_select() setting), when it should be left in its original order
sql.py: put_table(): Take a dict mapping out to in cols instead of separate in and out cols lists
sql.py: mk_select(): Joins: Reversed order of left_col and right_col in the joins dict as well, so the joined table's columns are the keys
sql.py: mk_select(): Joins: Reversed order of left_col and right_col so the column of the table being joined is first, to match the form of a WHERE clause
sql.py: mk_select(): Support joins
sql.py: mk_select(): Accept a list of tables to join together (initial implementation just uses the first table)
sql.py: mk_select(): Support ORDER BY clause. By default, order by the pkey, since PostgreSQL apparently doesn't do this automatically (and this was causing some staging table tests to fail).
bin/map: In debug mode, print the row # and input row just like in error messages
bin/map: verbose_errors also defaults to on in debug mode
sql.py: add_row_num(): Make the row number column the primary key
csv2db: Use new sql.cleanup_table() to map NULL-equivalents to NULL. Consider the empty string to be NULL.
sql.py: Added cleanup_table()
csvs.py: Added row filters
db_xml.py: put_table(): Fixed bug where relational functions were not being treated as value nodes, and thus their containing child was treated as a child with a backwards pointer instead of a field
xml_func.py: Added is_func*() and is_xml_func*() and use them where their definitions were used
db_xml.py: Added value() and use it where xml_dom.first_elem() was used
mappings/DwC2-VegBIEN.specimens.csv: *Latitude/*Longitude: Moved _toDouble directly after the output col name, so that it's run after any translation functions (which all return strings). *ElevationInMeters: Added _toDouble around all output cols.
xpath.py: get(): Create attrs: Fixed bug where attrs were created with last_only on, which caused attrs to get created multiple times if there were multiple attrs of the same name but different values, becase the last_only optimization would only check the last attr of that name
mappings/DwC2-VegBIEN.specimens.csv: *Latitude/*Longitude: Use new _toDouble to convert strings to doubles (needed for by_col)
schemas/functions.sql: Added _toDouble
bin/map: When calling xml_func.process(), pass DB connection if available