mappings/DwC2-VegBIEN.specimens.csv: establishmentMeans: Use _toBool on iscultivated, isnative
schemas/functions.sql: Added _toBool
schemas/functions.sql: Made trigger functions IMMUTABLE since they do not modify other tables
sql.py: put_table(): Added support for putting just a window subset of the rows in the table. Removed "SELECT statement missing a WHERE, LIMIT, or OFFSET clause" warnings.
sql.py: put_table(): Return the column where the pkeys are made available (the out_pkey) instead of taking it as an argument
sql.py: put_table(): Get input pkeys corresponding to rows in insert and join together out_pkeys and in_pkeys into final pkeys table
sql.py: put_table(): Fully support multiple in_tables, joined together using the main input table's pkey
sql.py: mk_select(): joins: Fixed bug where USING-based joins did not have closing ")"
db_xml.py: put_table(): Fixed bug where in_table was last in in_tables instead of first, causing it to be ignored by the current put_table() implementation, which only considers the first table name
db_xml.py: put_table(): Fixed bug where pkeys_table returned by recursive call to put_table() needed to be prefixed with $ to be treated as an input column name rather than a literal value
sql.py: mk_select(): Support joins with USING, which can be used to merge multiple input cols into the same output col
sql.py: mk_insert_select(): embeddable: Fixed bug where query that uses function was being sorted by its first column (the default mk_select() setting), when it should be left in its original order
sql.py: put_table(): Take a dict mapping out to in cols instead of separate in and out cols lists
sql.py: mk_select(): Joins: Reversed order of left_col and right_col in the joins dict as well, so the joined table's columns are the keys
sql.py: mk_select(): Joins: Reversed order of left_col and right_col so the column of the table being joined is first, to match the form of a WHERE clause
sql.py: mk_select(): Support joins
sql.py: mk_select(): Accept a list of tables to join together (initial implementation just uses the first table)
sql.py: mk_select(): Support ORDER BY clause. By default, order by the pkey, since PostgreSQL apparently doesn't do this automatically (and this was causing some staging table tests to fail).
bin/map: In debug mode, print the row # and input row just like in error messages
bin/map: verbose_errors also defaults to on in debug mode
sql.py: add_row_num(): Make the row number column the primary key
csv2db: Use new sql.cleanup_table() to map NULL-equivalents to NULL. Consider the empty string to be NULL.
sql.py: Added cleanup_table()
csvs.py: Added row filters
db_xml.py: put_table(): Fixed bug where relational functions were not being treated as value nodes, and thus their containing child was treated as a child with a backwards pointer instead of a field
xml_func.py: Added is_func*() and is_xml_func*() and use them where their definitions were used
db_xml.py: Added value() and use it where xml_dom.first_elem() was used
mappings/DwC2-VegBIEN.specimens.csv: *Latitude/*Longitude: Moved _toDouble directly after the output col name, so that it's run after any translation functions (which all return strings). *ElevationInMeters: Added _toDouble around all output cols.
xpath.py: get(): Create attrs: Fixed bug where attrs were created with last_only on, which caused attrs to get created multiple times if there were multiple attrs of the same name but different values, becase the last_only optimization would only check the last attr of that name
mappings/DwC2-VegBIEN.specimens.csv: *Latitude/*Longitude: Use new _toDouble to convert strings to doubles (needed for by_col)
schemas/functions.sql: Added _toDouble
bin/map: When calling xml_func.process(), pass DB connection if available
xml_func.py: process(): If DB with relational functions available (passed in via db param), call any non-local XML functions as relational funcs
sql.py: put(): pkey param (now pkey_) defaults to table's pkey
bin/map: by_col: In debug mode, print stripped XML tree that guides import
vegbien_dest: Fixed bug where there was a missing line continuation char before schemas var
sql.py: DbConn: Fixed bug where schemas db_config value needed to be split apart into strings. Fixed bug where current_setting() returned a value rather than an identifier, so it had to be used with set_config() instead of SET, and run after SET TRANSACTION ISOLATION LEVEL. Moved Input validation section before Database connections because it's used by Database connections.
Regenerated vegbien.ERD exports
vegbien.ERD.mwb: Changed lines to a configuration that MySQLWorkbench wouldn't keep resetting whenever the ERD was reopened
vegbien_dest: Added "functions" to schemas
sql.py: db_config: Added schemas param. DbConn: Use any schemas db_config value to set search_path.
sql.py: add_row_num(): Name the column "_row_num" so that it doesn't conflict with any "row_num" column that's part of the table schema
main Makefile: VegBIEN DB: functions schema: Renamed schemas/functions/clear to .../reset to reflect that it also resets the schema to what's in the dump file. schemas/functions/reset: Use now-available schemas/functions.sql to create the schema.
Added autogen schemas/functions.sql
schemas/vegbien.sql.make: Use new pg_dump_vegbien
Added pg_dump_vegbien to dump a schema of the vegbien db
main Makefile: VegBIEN DB: Added functions schema targets
Makefile: $(confirm): Support a separate line outside of the highlighted line. Include the "Continue?" in the macro since all prompts include it.
Makefile: VegBIEN DB: Display different warning message depending on whether entire DB or just current public schema is being deleted
db_xml.py: put_table(): Recurse into forward pointers
sql.py: put_table(): Take multiple in_tables. Initial implementation just used the first in_table.
sql.py: Added add_row_num(). put_table(): Add row_num to pkeys_table, so it can be joined with in_table's pkeys.
sql.py: Added run_query_into() and use it in insert_select()
sql.py: pkey(): Support escaped table names
sql.py: mk_insert_select(): embeddable: Name the function alias "f" since it will just be wrapped in a nested SELECT, so the exact name doesn't matter (and won't be visible outside the nested SELECT anyway)
db_xml.py: put_table(): Return the (table, col) where the pkeys are made available, now that this information is available from sql.put_table()
sql.py: put_table(): Return just the name of the table where the pkeys are made available, since the column name in that table now equals the pkey name
sql.py: mk_insert_select(): embeddable: Make the column returned by the function have the same name as the returning column
db_xml.py: put_table() Use new sql.put_table()
sql.py: Added put_table()
sql.py: Added clean_name(). Use it where needed to make an escaped name appendable as a string.
sql.py: Added with_parsed_errors() and use it in try_insert()
sql.py: insert_select(): into != None: Fixed bug where cacheable was not passed through to DROP TABLE's run_query(), even though it was passed through to CREATE TABLE AS's run_query()
db_xml.py: put_table(): Place pkeys in temp table
sql.py: mk_insert_select(): Document that embeddable will cause the query to be fully cached, not just if it raises an exception. insert_select(): into != None: Pass recover and cacheable through to each run_query()
sql.py: insert_select(): Support placing RETURNING values in temp table
db_xml.py: put_table(): Support returning pkey from INSERT SELECT
sql.py: mk_insert_select(): Support using an INSERT RETURNING statement as a nested SELECT
sql.py: mk_insert_select(): Removed unused params recover and cacheable
sql.py: Added mogrify()
db_xml.py: put_table(): Corrected @return doc
sql.py: Added mk_insert_select() and use it in insert_select()
db_xml.py: put_table(): Use new insert_select()
sql.py: insert_select(): Changed order of cols and params arguments so select_query and params would be together
sql.py: Added insert_select() and use it in insert()
Calls to sql.esc_name*(): Removed preserve_case=True because it is now the default
sql.py: esc_name_by_module(): Changed preserve_case to ignore_case, which defaults to False
sql.py: esc_name_by_module(): preserve_case defaults to True
sql.py: mk_select(): Escape all names used (table, column, cond, etc.)
sql.py: esc_name_by_module(): If not enclosing name in quotes, call check_name() on it
sql.py: mk_select(): Support literal values in the list of cols to select
sql.py: mk_select(): Don't escape the table name, because it will either be check_name()d or it's already been escaped
sql.py: Added mk_select(), and use it in select()
bin/map: Always pass qual_name(table) to sql.select(). This is possible now that qual_name() can handle None schemas.
db_xml.py: put_table(): Take separate in_table and in_schema names, instead of in_table and table_is_esc, because the in_schema is needed to scope the temp tables appropriately
sql.py: qual_name(): If schema is None, don't prepend schema
bin/map, sql.py: Turned SQL query caching back on because benchmarks of just the caching on vs. off reveal that it does reduce processing time significantly. However, there is a slowdown that was introduced between the time caching was added and the time the same XML tree was used for each node, which was giving the false indication that the slowdown was due to the caching.
bin/map: Turn SQL query caching off by default
bin/map: Added cache_sql env var to enable SQL query caching
sql.py: Make caching DbConn enablable. Turn caching off by default because recent benchmarks (n=1000) were showing that it slows things down.
bin/map: Added new verbose_errors mode, enabled in test mode and off otherwise, which controls whether the output row and tracebacks are included in error messages. Having this off in import mode will reduce the size of error logs so they don't fill up the vegbiendev hard disk as quickly.
exc.py: print_ex(): Added detail option to turn off traceback
bin/map: Turn parallel processing off by default. This should fix "Cannot allocate memory" errors in large imports.
bin/map: in_is_db: Don't cache the main SELECT query
bin/map: by_col: Use the created template, which already has the column names in it, instead of mapping a sample row
bin/map: Fixed bug where db_xml could not be imported twice, or it was treated as an undefined variable for some reason
bin/map: map_table(): Make each column a db_xml.ColRef instead of a bare index, so that it will appear as the column name when converted to a string. This will provide better debugging info in the template tree and also avoid needing to create a separate sample row in by_col.
db_xml.py: Added ColRef
bin/map: Fixed bug where row count was off by one if all rows in the input were exhausted, because the row that raises StopIteration was counting as a row