Activity
From 04/17/2012 to 05/16/2012
05/15/2012
- 03:56 PM Revision 2205: sql.py: put_table(): mk_select_(): Fixed bug where order_by needed to be None because otherwise it wouldn't match the distinct_on cols if they were specified
- 03:55 PM Revision 2204: sql.py: put_table(): insert_(): Fixed bug where distinct_on was not passed to mk_select_()
- 03:30 PM Revision 2203: sql.py: put_table(): mk_select_(): Fixed bug where distinct_on needed to be passed as a keyword param
- 03:21 PM Revision 2202: sql.py: put_table(): insert_() and mk_select_() take distinct_on param
- 03:10 PM Revision 2201: sql.py: put_table(): Factored out code that inserts into pkeys table into run_query_into_pkeys() helper function
- 02:55 PM Revision 2200: sql.py: mk_select(): Implemented DISTINCT ON according to the distinct_on param
- 02:48 PM Revision 2199: sql.py: mk_select(): Added distinct_on param to set the columns to SELECT DISTINCT ON
- 02:31 PM Revision 2198: sql.py: clean_name(): Convert names to lowercase so that PostgreSQL will behave the same whether the name is escaped with "" or not. This will help avoid bugs in code that uses temp tables created by the sql module.
- 02:29 PM Revision 2197: sql.py: put_table(): Added order_by=None wherever rows were not supposed to be re-ordered. On DuplicateKeyException: Save existing pkeys in temp table for joining on.
- 01:31 PM Revision 2196: db_xml.py: put_table(): Pass limit and start to sql.put_table()
- 01:09 PM Revision 2195: db_xml.py: put_table(): Added limit and start options
- 11:54 AM Revision 2194: sql.py: When creating a temporary entity (table, function, etc.), instead create it as a permanent entity in debug mode so it can be viewed after the program is run
- 11:40 AM Revision 2193: sql.py: DbConn: Store whether in debug mode (log_debug != log_debug_none) for easy use by methods
- 11:31 AM Revision 2192: bin/map: connect_db(): Turn on autocommit mode in debug mode if commit is on, so that incremental results can be seen in the DB
- 11:30 AM Revision 2191: sql.py: DbConn: Use internal autocommit handling instead of DB connection autocommit attr to avoid autocommits inside a savepoint
- 11:15 AM Revision 2190: sql.py: DbConn: Added autocommit option to turn on autocommit mode. Use set_session() instead of SQL command to set isolation level.
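
The DISTINCT ON work in revisions 2199 through 2205 can be pictured with a small query builder. This is only a sketch under stated assumptions: `mk_select_sketch()` and its simplified signature are hypothetical and are not the real `sql.mk_select()`. It shows why order_by is left as None when distinct_on is given: PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, so a default ORDER BY on some other column would be rejected.

```python
def esc_name(name):
    """Quote an identifier for PostgreSQL, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def mk_select_sketch(table, cols, distinct_on=None, order_by=None):
    """Build a SELECT string (hypothetical, simplified signature)."""
    query = 'SELECT'
    if distinct_on:
        # DISTINCT ON keeps one row per distinct combination of these columns
        query += ' DISTINCT ON (' + ', '.join(map(esc_name, distinct_on)) + ')'
    query += ' ' + ', '.join(map(esc_name, cols)) + ' FROM ' + esc_name(table)
    if order_by is not None:
        # Only added on request; with distinct_on set, the caller must make
        # sure this column is one of the distinct_on columns
        query += ' ORDER BY ' + esc_name(order_by)
    return query


print(mk_select_sketch('in_table', ['a', 'b'], distinct_on=['a']))
```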
05/14/2012
- 05:50 PM Revision 2189: sql.py: mk_insert_select(): embeddable: Fixed bug where the function may do different things when run, because the function (and other statements whose cached strings depend on the function name) may be run after the function definition would have changed, by versioning the function name and using CREATE FUNCTION instead of CREATE OR REPLACE FUNCTION so that its definition never changes
- 05:28 PM Revision 2188: sql.py: Parse "function already exists" errors as DuplicateFunctionException
- 05:13 PM Revision 2187: sql.py: mk_select(): joins: Fixed bug where join_not_equal did not do what it was designed for, which is filtering out matches of the join condition (before the bug fix, it effectively did a cross join with matching rows excluded, causing duplication of rows). Renamed join_not_equal to filter_out to reflect its intended use. Support table-scoped column names in the WHERE conds list.
- 04:22 PM Revision 2186: sql.py: put_table(): Fixed bug where ORDER BY column needed to have table0 name prefixed (if it didn't already have a table name), to avoid ambiguous column references
- 04:11 PM Revision 2185: sql.py: mk_select(): Fixed bug in joins where right_col had the table name prepended *before* it was copied for use with a different table name in join_using and join_not_equal
- 03:42 PM Revision 2184: Mapped some unmapped fields in DwC inputs
- 02:19 PM Revision 2183: Added mappings/for_review/DwC2-VegBIEN.specimens.fields.csv
- 01:21 PM Revision 2182: db_xml.py: put_table(): Fixed bug where it didn't commit right after inserting the node, but instead waited until children with fkeys to the parent (independent of the node itself) were inserted
- 01:16 PM Revision 2181: sql.py: put_table(): insert_(): Use insert_select() instead of run_query_into() if new option pkeys_table_exists is on
- 12:51 PM Revision 2180: sql.py: mk_select(): Support joins with !=
- 12:45 PM Revision 2179: sql.py: mk_select(): Support only some join columns being join_using
- 12:40 PM Revision 2178: sql.py: put_table(): Renamed in_joins to insert_joins and joins to select_joins for clarity
- 12:21 PM Revision 2177: db_xml.py: put_table(): Support children with fkeys to parent
- 12:11 PM Revision 2176: sql.py: mk_select(): Make tuple optional for None literal values
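
The join_not_equal bug described in revision 2187 can be reproduced in miniature. The sketch below uses an in-memory SQLite database rather than PostgreSQL, and the table and column names are made up for illustration. It contrasts a join on `!=` (which pairs each row with every non-matching row) with an anti-join that filters out matches. The anti-join form shown is one common way to express the filter and is not necessarily how sql.py builds filter_out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE in_rows (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE existing (name TEXT);
INSERT INTO in_rows VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO existing VALUES ('a'), ('b');
""")

# Join on !=: each input row is paired with every existing row that differs
# from it, so rows are kept and duplicated instead of being filtered out.
not_equal = conn.execute(
    "SELECT in_rows.id FROM in_rows JOIN existing"
    " ON in_rows.name != existing.name ORDER BY in_rows.id"
).fetchall()

# Anti-join: keep only the input rows that have no match at all.
filtered = conn.execute(
    "SELECT in_rows.id FROM in_rows LEFT JOIN existing"
    " ON in_rows.name = existing.name"
    " WHERE existing.name IS NULL ORDER BY in_rows.id"
).fetchall()

print(not_equal)  # [(1,), (2,), (3,), (3,)]
print(filtered)   # [(3,)]
```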
05/13/2012
- 02:05 PM Revision 2175: sql.py: put_table(): Removed "SELECT statement missing a WHERE, LIMIT, or OFFSET clause" warnings
- 02:02 PM Revision 2174: bin/map: by_col: row_ct = 0 because it's unknown for now
- 02:00 PM Revision 2173: mk_select(): Support join conditions with literal values
- 01:42 PM Revision 2172: sql.py: mk_insert_select(): embeddable: Don't cache function_query because function def could change and then change back
- 01:35 PM Revision 2171: sql.py: with_savepoint(): Renamed savepoints to have "level" prefix, since the # indicates the level #
- 01:32 PM Revision 2170: sql.py: get_cur_query(): Also accept input params to combine with input_query, and pass input params when get_cur_query() is called
- 01:26 PM Revision 2169: sql.py: DbConn.run_query(): Pass input query to get_cur_query()
- 01:19 PM Revision 2168: sql.py: get_cur_query() and _add_cursor_info(): Support input_query param that will be used if the raw query is None. Pass input_query in DbConn.execute().
- 01:09 PM Revision 2167: sql.py: DbConn.run_query(): Check that query != None
- 01:05 PM Revision 2166: bin/map: out_is_db: Only rollback() and close() out_db if it was connected
- 01:04 PM Revision 2165: sql.py: DbConn: Added connected()
- 01:01 PM Revision 2164: sql.py: Wrapped calls to get_cur_query() that are used as strings in str(), because get_cur_query() can return None
- 12:57 PM Revision 2163: sql.py: next_version(): Versions start from 1, because first existing name was version 0
- 12:55 PM Revision 2162: put_table(): Use short name for temp_suffix now that version # will be added if needed
- 12:51 PM Revision 2161: sql.py: mk_select(): Parse join columns for literal values and table-scoped names as well
- 11:54 AM Revision 2160: mappings/DwC2-VegBIEN.specimens.csv: establishmentMeans: Call _toGrowthform on growthform
- 11:53 AM Revision 2159: schemas/vegbien.sql: Added _toGrowthform
- 11:19 AM Revision 2158: sql.py: put_table(): Changed temp_prefix to a suffix so main name won't be removed if name is truncated
- 11:14 AM Revision 2157: sql.py: mk_select(): fields: Support columns with tables. Changed syntax for literal values so that it wouldn't conflict with new syntax for columns with tables.
- 11:08 AM Revision 2156: iters.py: flatten(): If not an iterable, just return the value
- 10:32 AM Revision 2155: sql.py: put_table(): Pass in_pkeys and out_pkeys to run_query_into() by ref so they will be updated if the table names are changed
- 10:28 AM Revision 2154: sql.py: put_table(): Pass pkeys to run_query_into() by ref so it will be updated if the table name is changed
- 10:15 AM Revision 2153: sql.py: run_query_into(): If CREATE TABLE AS generates a DuplicateTableException, rename the table with a version # prepended
- 10:08 AM Revision 2152: sql.py: run_query_into(): Made into param a reference so that the function can change it, and renamed it to into_ref
- 09:36 AM Revision 2151: sql.py: run_query_into(): Made into param a reference so that the function can change it, and renamed it to into_ref
- 09:11 AM Revision 2150: sql.py: put_table(): If DuplicateKeyException: run_query_into() recoverably, so that DB errors such as DuplicateTableException will be parsed
- 09:07 AM Revision 2149: sql.py: Removed no-longer-needed try_insert()
- 09:05 AM Revision 2148: sql.py: Merged with_parsed_errors() into run_query() so all recoverable queries would automatically benefit from DB error message parsing. DbConn: Moved _add_cursor_info() to DbCursor.execute().
- 07:45 AM Revision 2147: sql.py: with_parsed_errors(): Raise DuplicateTableException for "relation already exists" errors instead of "table name specified more than once" errors
- 07:43 AM Revision 2146: sql.py: run_query_into(): Removed "DROP TABLE IF EXISTS" because sometimes when there are collisions in the temp table names, the code actually uses both "copies" of the temp table. Eventually, this situation will be resolved by adding a counter to the temp table name.
- 07:26 AM Revision 2145: sql.py: Cleaned up DbException's and subclasses' messages
- 07:26 AM Revision 2144: exc.py: ExceptionWithCause: Added cause_newline option to put the cause on its own line instead of on the message line
- 07:10 AM Revision 2143: sql.py: with_parsed_errors(): Also parse "table name specified more than once" errors as DuplicateTableExceptions
- 06:56 AM Revision 2142: sql.py: put_table(): Handle DuplicateKeyExceptions by running a select query on the unique constraint columns
- 06:14 AM Revision 2141: sql.py: mk_select(): Support tuples of tables, not just lists
- 05:29 AM Revision 2140: sql.py: with_parsed_errors(): Support table names that start with "_"
- 05:20 AM Revision 2139: sql.py: DbConn: Added with_savepoint(). with_savepoint(): Use new DbConn.with_savepoint().
- 04:13 AM Revision 2138: schemas/functions.sql: Added _toBool
- 04:12 AM Revision 2137: mappings/DwC2-VegBIEN.specimens.csv: establishmentMeans: Use _toBool on iscultivated, isnative
- 04:11 AM Revision 2136: schemas/functions.sql: Added _toBool
- 04:01 AM Revision 2135: schemas/functions.sql: Made trigger functions IMMUTABLE since they do not modify other tables
- 03:51 AM Revision 2134: sql.py: put_table(): Added support for putting just a window subset of the rows in the table. Removed "SELECT statement missing a WHERE, LIMIT, or OFFSET clause" warnings.
- 03:30 AM Revision 2133: sql.py: put_table(): Return the column where the pkeys are made available (the out_pkey) instead of taking it as an argument
- 03:20 AM Revision 2132: sql.py: put_table(): Get input pkeys corresponding to rows in insert and join together out_pkeys and in_pkeys into final pkeys table
- 01:04 AM Revision 2131: sql.py: put_table(): Fully support multiple in_tables, joined together using the main input table's pkey
- 01:02 AM Revision 2130: sql.py: mk_select(): joins: Fixed bug where USING-based joins did not have closing ")"
- 12:28 AM Revision 2129: db_xml.py: put_table(): Fixed bug where in_table was last in in_tables instead of first, causing it to be ignored by the current put_table() implementation, which only considers the first table name
- 12:17 AM Revision 2128: db_xml.py: put_table(): Fixed bug where pkeys_table returned by recursive call to put_table() needed to be prefixed with $ to be treated as an input column name rather than a literal value
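
Revisions 2153 and 2163 describe retrying CREATE TABLE AS under a versioned name when the table already exists. A minimal sketch of that retry loop follows, using SQLite so it can run standalone. The helper names, the `name#N` format, and the string match on the error message are all assumptions; the log does not show the exact name format sql.py uses, and sql.py parses PostgreSQL errors into DuplicateTableException instead.

```python
import re
import sqlite3


def next_version(name):
    """Return name with its version incremented (hypothetical "name#N" format)."""
    match = re.match(r'^(.*)#(\d+)$', name)
    if match:
        return '%s#%d' % (match.group(1), int(match.group(2)) + 1)
    return name + '#1'  # the unversioned name counts as version 0


def create_table_as(conn, name, select_query):
    """Create a table from a query, bumping the version until the name is free."""
    while True:
        try:
            conn.execute('CREATE TABLE "%s" AS %s' % (name, select_query))
            return name  # callers need the name that was actually used
        except sqlite3.OperationalError as e:
            if 'already exists' not in str(e):
                raise
            name = next_version(name)


conn = sqlite3.connect(':memory:')
print(create_table_as(conn, 'pkeys', 'SELECT 1 AS id'))  # pkeys
print(create_table_as(conn, 'pkeys', 'SELECT 2 AS id'))  # pkeys#1
```

Returning the final name plays the role that the into_ref parameter has in the log: the caller learns which table name ended up holding the results.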
05/09/2012
- 05:29 AM Revision 2127: sql.py: mk_select(): Support joins with USING, which can be used to merge multiple input cols into the same output col
- 04:42 AM Revision 2126: sql.py: mk_insert_select(): embeddable: Fixed bug where query that uses function was being sorted by its first column (the default mk_select() setting), when it should be left in its original order
- 04:36 AM Revision 2125: sql.py: put_table(): Take a dict mapping out to in cols instead of separate in and out cols lists
- 04:08 AM Revision 2124: sql.py: mk_select(): Joins: Reversed order of left_col and right_col in the joins dict as well, so the joined table's columns are the keys
- 04:05 AM Revision 2123: sql.py: mk_select(): Joins: Reversed order of left_col and right_col so the column of the table being joined is first, to match the form of a WHERE clause
- 03:56 AM Revision 2122: sql.py: mk_select(): Support joins
- 03:27 AM Revision 2121: sql.py: mk_select(): Accept a list of tables to join together (initial implementation just uses the first table)
- 02:26 AM Revision 2120: sql.py: mk_select(): Support ORDER BY clause. By default, order by the pkey, since PostgreSQL apparently doesn't do this automatically (and this was causing some staging table tests to fail).
- 02:04 AM Revision 2119: bin/map: In debug mode, print the row # and input row just like in error messages
- 01:51 AM Revision 2118: bin/map: verbose_errors also defaults to on in debug mode
- 01:39 AM Revision 2117: sql.py: add_row_num(): Make the row number column the primary key
- 12:36 AM Revision 2116: csv2db: Use new sql.cleanup_table() to map NULL-equivalents to NULL. Consider the empty string to be NULL.
- 12:35 AM Revision 2115: sql.py: Added cleanup_table()
- 12:33 AM Revision 2114: csvs.py: Added row filters
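
Revisions 2115 and 2116 add cleanup_table() to map NULL-equivalents to NULL, treating the empty string as NULL. The sketch below illustrates the idea against SQLite. The function name, its parameters, and the one-UPDATE-per-column approach are assumptions for illustration, not the actual sql.cleanup_table() implementation.

```python
import sqlite3


def cleanup_table_sketch(conn, table, cols, null_equivs=('',)):
    """Set each listed column to NULL wherever it holds a NULL-equivalent string."""
    placeholders = ', '.join('?' * len(null_equivs))
    for col in cols:
        # Identifiers are double-quoted; values are passed as bound parameters
        conn.execute(
            'UPDATE "%s" SET "%s" = NULL WHERE "%s" IN (%s)'
            % (table, col, col, placeholders),
            null_equivs)


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE staging (genus TEXT, species TEXT)')
conn.executemany('INSERT INTO staging VALUES (?, ?)',
                 [('Quercus', ''), ('', 'NA')])
cleanup_table_sketch(conn, 'staging', ['genus', 'species'], null_equivs=('', 'NA'))
print(conn.execute('SELECT genus, species FROM staging ORDER BY rowid').fetchall())
```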
05/07/2012
- 11:14 PM Revision 2113: db_xml.py: put_table(): Fixed bug where relational functions were not being treated as value nodes, and thus their containing child was treated as a child with a backwards pointer instead of a field
- 11:12 PM Revision 2112: xml_func.py: Added is_func*() and is_xml_func*() and use them where their definitions were used
- 10:40 PM Revision 2111: db_xml.py: Added value() and use it where xml_dom.first_elem() was used
- 10:12 PM Revision 2110: mappings/DwC2-VegBIEN.specimens.csv: *Latitude/*Longitude: Moved _toDouble directly after the output col name, so that it's run after any translation functions (which all return strings). *ElevationInMeters: Added _toDouble around all output cols.
- 09:56 PM Revision 2109: xpath.py: get(): Create attrs: Fixed bug where attrs were created with last_only on, which caused attrs to get created multiple times if there were multiple attrs of the same name but different values, because the last_only optimization would only check the last attr of that name
- 09:19 PM Revision 2108: mappings/DwC2-VegBIEN.specimens.csv: *Latitude/*Longitude: Use new _toDouble to convert strings to doubles (needed for by_col)
- 09:16 PM Revision 2107: schemas/functions.sql: Added _toDouble
- 09:16 PM Revision 2106: bin/map: When calling xml_func.process(), pass DB connection if available
- 09:15 PM Revision 2105: xml_func.py: process(): If DB with relational functions available (passed in via db param), call any non-local XML functions as relational funcs
- 09:09 PM Revision 2104: sql.py: put(): pkey param (now pkey_) defaults to table's pkey
- 08:30 PM Revision 2103: bin/map: by_col: In debug mode, print stripped XML tree that guides import
- 08:03 PM Revision 2102: vegbien_dest: Fixed bug where there was a missing line continuation char before schemas var
- 08:02 PM Revision 2101: sql.py: DbConn: Fixed bug where schemas db_config value needed to be split apart into strings. Fixed bug where current_setting() returned a value rather than an identifier, so it had to be used with set_config() instead of SET, and run after SET TRANSACTION ISOLATION LEVEL. Moved Input validation section before Database connections because it's used by Database connections.
- 07:29 PM Revision 2100: Regenerated vegbien.ERD exports
- 07:26 PM Revision 2099: vegbien.ERD.mwb: Changed lines to a configuration that MySQLWorkbench wouldn't keep resetting whenever the ERD was reopened
- 07:21 PM Revision 2098: vegbien_dest: Added "functions" to schemas
- 07:20 PM Revision 2097: sql.py: db_config: Added schemas param. DbConn: Use any schemas db_config value to set search_path.
- 06:58 PM Revision 2096: sql.py: add_row_num(): Name the column "_row_num" so that it doesn't conflict with any "row_num" column that's part of the table schema
- 06:50 PM Revision 2095: main Makefile: VegBIEN DB: functions schema: Renamed schemas/functions/clear to .../reset to reflect that it also resets the schema to what's in the dump file. schemas/functions/reset: Use now-available schemas/functions.sql to create the schema.
- 06:45 PM Revision 2094: Added autogen schemas/functions.sql
- 06:41 PM Revision 2093: schemas/vegbien.sql.make: Use new pg_dump_vegbien
- 06:41 PM Revision 2092: Added pg_dump_vegbien to dump a schema of the vegbien db
- 06:34 PM Revision 2091: main Makefile: VegBIEN DB: Added functions schema targets
- 06:09 PM Revision 2090: Makefile: $(confirm): Support a separate line outside of the highlighted line. Include the "Continue?" in the macro since all prompts include it.
- 05:55 PM Revision 2089: Makefile: VegBIEN DB: Display different warning message depending on whether entire DB or just current public schema is being deleted
- 05:38 PM Revision 2088: db_xml.py: put_table(): Recurse into forward pointers
05/05/2012
- 09:55 PM Revision 2087: sql.py: put_table(): Take multiple in_tables. Initial implementation just used the first in_table.
- 09:48 PM Revision 2086: sql.py: Added add_row_num(). put_table(): Add row_num to pkeys_table, so it can be joined with in_table's pkeys.
- 09:38 PM Revision 2085: sql.py: Added run_query_into() and use it in insert_select()
- 08:53 PM Revision 2084: sql.py: pkey(): Support escaped table names
- 07:32 PM Revision 2083: sql.py: mk_insert_select(): embeddable: Name the function alias "f" since it will just be wrapped in a nested SELECT, so the exact name doesn't matter (and won't be visible outside the nested SELECT anyway)
- 07:08 PM Revision 2082: db_xml.py: put_table(): Return the (table, col) where the pkeys are made available, now that this information is available from sql.put_table()
- 07:05 PM Revision 2081: sql.py: put_table(): Return just the name of the table where the pkeys are made available, since the column name in that table now equals the pkey name
- 06:58 PM Revision 2080: sql.py: mk_insert_select(): embeddable: Make the column returned by the function have the same name as the returning column
- 06:39 PM Revision 2079: db_xml.py: put_table() Use new sql.put_table()
- 06:39 PM Revision 2078: sql.py: Added put_table()
- 06:37 PM Revision 2077: sql.py: Added clean_name(). Use it where needed to make an escaped name appendable as a string.
- 05:53 PM Revision 2076: sql.py: Added with_parsed_errors() and use it in try_insert()
- 05:30 PM Revision 2075: sql.py: insert_select(): into != None: Fixed bug where cacheable was not passed through to DROP TABLE's run_query(), even though it was passed through to CREATE TABLE AS's run_query()
- 05:27 PM Revision 2074: db_xml.py: put_table(): Place pkeys in temp table
- 05:26 PM Revision 2073: sql.py: mk_insert_select(): Document that embeddable will cause the query to be fully cached, not just if it raises an exception. insert_select(): into != None: Pass recover and cacheable through to each run_query()
- 05:17 PM Revision 2072: sql.py: insert_select(): Support placing RETURNING values in temp table
- 04:40 PM Revision 2071: db_xml.py: put_table(): Support returning pkey from INSERT SELECT
- 04:38 PM Revision 2070: sql.py: mk_insert_select(): Support using an INSERT RETURNING statement as a nested SELECT
05/04/2012
- 07:15 PM Revision 2069: sql.py: mk_insert_select(): Removed unused params recover and cacheable
- 07:10 PM Revision 2068: sql.py: Added mogrify()
- 07:00 PM Revision 2067: db_xml.py: put_table(): Corrected @return doc
- 06:32 PM Revision 2066: sql.py: Added mk_insert_select() and use it in insert_select()
- 06:21 PM Revision 2065: db_xml.py: put_table(): Use new insert_select()
- 06:15 PM Revision 2064: sql.py: insert_select(): Changed order of cols and params arguments so select_query and params would be together
- 06:12 PM Revision 2063: sql.py: Added insert_select() and use it in insert()
- 04:55 PM Revision 2062: Calls to sql.esc_name*(): Removed preserve_case=True because it is now the default
- 04:51 PM Revision 2061: sql.py: esc_name_by_module(): Changed preserve_case to ignore_case, which defaults to False
- 04:49 PM Revision 2060: Calls to sql.esc_name*(): Removed preserve_case=True because it is now the default
- 04:47 PM Revision 2059: sql.py: esc_name_by_module(): preserve_case defaults to True
- 04:44 PM Revision 2058: sql.py: mk_select(): Escape all names used (table, column, cond, etc.)
- 04:33 PM Revision 2057: sql.py: esc_name_by_module(): If not enclosing name in quotes, call check_name() on it
- 04:30 PM Revision 2056: sql.py: mk_select(): Support literal values in the list of cols to select
- 03:22 PM Revision 2055: sql.py: mk_select(): Don't escape the table name, because it will either be check_name()d or it's already been escaped
- 03:11 PM Revision 2054: sql.py: Added mk_select(), and use it in select()
- 02:14 PM Revision 2053: bin/map: Always pass qual_name(table) to sql.select(). This is possible now that qual_name() can handle None schemas.
- 02:08 PM Revision 2052: db_xml.py: put_table(): Take separate in_table and in_schema names, instead of in_table and table_is_esc, because the in_schema is needed to scope the temp tables appropriately
- 02:04 PM Revision 2051: sql.py: qual_name(): If schema is None, don't prepend schema
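
Revision 2051 lets qual_name() accept a schema of None, which is what allows bin/map to always pass qual_name(table) in revision 2053. A minimal sketch of that behavior, assuming PostgreSQL-style identifier quoting; these two functions are illustrative stand-ins and do not reproduce the real sql.py signatures (for example, the ignore_case option is omitted).

```python
def esc_name(name):
    """Quote an identifier, doubling embedded quotes (PostgreSQL style)."""
    return '"' + name.replace('"', '""') + '"'


def qual_name(schema, table):
    """Return the escaped table name, prefixed by the schema only if one is given."""
    if schema is None:
        return esc_name(table)
    return esc_name(schema) + '.' + esc_name(table)


print(qual_name(None, 'specimens'))      # "specimens"
print(qual_name('public', 'specimens'))  # "public"."specimens"
```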
05/03/2012
- 06:59 PM Revision 2050: bin/map, sql.py: Turned SQL query caching back on because benchmarks of just the caching on vs. off reveal that it does reduce processing time significantly. However, there is a slowdown that was introduced between the time caching was added and the time the same XML tree was used for each node, which was giving the false indication that the slowdown was due to the caching.
- 06:44 PM Revision 2049: bin/map: Turn SQL query caching off by default
- 06:39 PM Revision 2048: bin/map: Added cache_sql env var to enable SQL query caching
- 06:39 PM Revision 2047: sql.py: Make caching DbConn enablable. Turn caching off by default because recent benchmarks (n=1000) were showing that it slows things down.
- 04:53 PM Revision 2046: bin/map: Added new verbose_errors mode, enabled in test mode and off otherwise, which controls whether the output row and tracebacks are included in error messages. Having this off in import mode will reduce the size of error logs so they don't fill up the vegbiendev hard disk as quickly.
- 04:51 PM Revision 2045: exc.py: print_ex(): Added detail option to turn off traceback
- 04:10 PM Revision 2044: bin/map: Turn parallel processing off by default. This should fix "Cannot allocate memory" errors in large imports.
05/01/2012
- 07:58 AM Revision 2043: bin/map: in_is_db: Don't cache the main SELECT query
- 07:56 AM Revision 2042: bin/map: by_col: Use the created template, which already has the column names in it, instead of mapping a sample row
- 07:50 AM Revision 2041: bin/map: Fixed bug where db_xml could not be imported twice, or it was treated as an undefined variable for some reason
- 07:45 AM Revision 2040: bin/map: map_table(): Make each column a db_xml.ColRef instead of a bare index, so that it will appear as the column name when converted to a string. This will provide better debugging info in the template tree and also avoid needing to create a separate sample row in by_col.
- 07:33 AM Revision 2039: db_xml.py: Added ColRef
- 06:33 AM Revision 2038: bin/map: Fixed bug where row count was off by one if all rows in the input were exhausted, because the row that raises StopIteration was counting as a row
- 06:13 AM Revision 2037: main Makefile: VegBIEN DB: mk_db: Use template1 because it has PROCEDURAL LANGUAGE plpgsql already installed and we aren't using an encoding other than UTF8
- 06:11 AM Revision 2036: Moved "CREATE PROCEDURAL LANGUAGE plpgsql" to main Makefile so that it would only run when the DB is created, not when the public schema is reinstalled. This is only relevant on PostgreSQL < 9.x, where the plpgsql language is not part of template0.
- 05:56 AM Revision 2035: Renamed parallel.py to parallelproc.py to avoid conflict with new system parallel module on vegbiendev
- 05:43 AM Revision 2034: Makefile: VegBIEN DB: public schema: Added schemas/rotate
- 05:34 AM Revision 2033: bin/map: Fixed bug in input rows processed count where the count would be off by 1, because the for loop would leave i at the index of the last row instead of one-past-the-last
- 04:44 AM Revision 2032: bin/map: Use the same XML tree for each row in DB outputs, to eliminate time spent creating the tree from the XPaths for each row
- 04:08 AM Revision 2031: bin/map: map_table(): Resolve each prefix into a separate mapping, which is collision-eliminated, instead of resolving values from multiple prefixes when each individual row is mapped
- 03:50 AM Revision 2030: bin/map: Moved collision-prevention code to map_rows() so it would only run if there were mappings, and so that it would run after any mappings preprocessing by map_table() that creates more collisions
- 03:45 AM Revision 2029: bin/map: Prevent collisions if multiple inputs map to the same output
- 02:02 AM Revision 2028: mappings/DwC1-DwC2.specimens.csv: Mapped collectorNumber and recordNumber to recordNumber with _alt so they wouldn't collide when every input column, even an empty one, is created in the XML tree
- 12:42 AM Revision 2027: bin/map: If out_is_db, in debug mode, print each row's XML tree and each value that it's putting
- 12:36 AM Revision 2026: bin/map: If out_is_db, in debug mode, print the template XML tree used to insert a sample row into the DB
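
The two row-count fixes in revisions 2033 and 2038 both come down to counting a row only after it has actually been fetched and processed. A minimal sketch of a loop that avoids both off-by-one cases follows; `process_rows()` and its `limit` parameter are hypothetical and are not taken from bin/map.

```python
def process_rows(rows, process, limit=None):
    """Process up to limit rows and return how many were actually processed."""
    row_ct = 0
    rows = iter(rows)
    while limit is None or row_ct < limit:
        try:
            row = next(rows)
        except StopIteration:
            break  # the failed fetch is not a row, so it is not counted
        process(row)
        row_ct += 1  # incremented after processing, so it is a count, not a last index
    return row_ct


seen = []
print(process_rows(['a', 'b', 'c'], seen.append))           # 3
print(process_rows(['a', 'b', 'c'], seen.append, limit=2))  # 2
```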
04/30/2012
- 11:57 PM Revision 2025: bin/map: map_table(): When translating mappings to column indexes, use appends to a new list instead of deletions from an existing list to simplify the algorithm
- 11:20 PM Revision 2024: union: Omit mappings that are mapped *to* in the input map, in addition to mappings that were overridden. This prevents multiple outputs being created for both the renamed and original mappings, causing duplicate output nodes when one XML tree is used for all rows.
- 11:18 PM Revision 2023: union: Omit mappings that are mapped *to* in the input map, in addition to mappings that were overridden. This prevents multiple outputs being created for both the renamed and original mappings, causing duplicate output nodes when one XML tree is used for all rows.
- 11:17 PM Revision 2022: input.Makefile: Maps building: Via maps cleanup: subtract: Include comment column so commented mappings are never removed
- 11:07 PM Revision 2021: subtract: Support "ragged rows" that have fewer columns than the specified column numbers
- 11:06 PM Revision 2020: util.py: list_subset(): Added default param to specify the value to use for invalid indexes (if any)
- 09:44 AM Revision 2019: mappings/VegX-VegBIEN.stems.csv: Mappings with multiple inputs for the same output: Use _alt, etc. to map the multiple inputs to different places in the XML tree, so that when using a pregenerated tree, the empty leaves for each input will not collide with each other
- 09:20 AM Revision 2018: mappings/VegX-VegBIEN.stems.csv: Changed XPath references (using "$") to XML function references using _ref where needed to make them work even on a pre-made XML tree used by all rows
- 09:13 AM Revision 2017: xml_func.py: Added _ref to retrieve a value from another XML node
- 06:12 AM Revision 2016: xml_func.py: Made all functions take a 2nd node param, which contains the func node itself
- 04:15 AM Revision 2015: bin/map: If outputting to a DB, also create output XML elements for NULL input values. This will help with the transition to using the same XML tree for all rows.
- 04:09 AM Revision 2014: xml_func.py: _label: return None on empty input
- 03:46 AM Revision 2013: mappings/VegX-VegBIEN.stems.csv: Added _collapse around subtrees that need to be removed if they are created around a NULL value
- 03:40 AM Revision 2012: xml_func.py: Added _collapse to collapse a subtree if the "value" element in it is NULL
- 01:44 AM Revision 2011: schemas/vegbien.sql: definedvalue: Made definedvalue nullable so that each row of a datasource can have a uniform structure in VegBIEN, and to support reusing the same XML DOM tree for each row
- 01:11 AM Revision 2010: xpath.py: Added is_xpath()
- 01:10 AM Revision 2009: xml_dom.py: set_value(): If value is None and node is Element, remove value node entirely instead of setting node's value to None
- 01:02 AM Revision 2008: xml_dom.py: Added value_node(). Use new value_node() in value() and set_value(). set_value(): If the node already has a value node, reuse it instead of appending a new value node.
- 12:35 AM Revision 2007: xpath.py: put_obj(): Return the id_attr_node using get_1() because it should only be one node
- 12:30 AM Revision 2006: xml_func.py: _simplifyPath: Also treat the elem as empty if the required node exists but is empty
- 12:04 AM Revision 2005: db_xml.py: put_table(): Added part of put() code that should be common to both functions
04/27/2012
- 06:16 PM Revision 2004: xpath.py: put_obj(): Return a tuple of the inserted node and the id attr node
- 06:13 PM Revision 2003: xpath.py: set_id(): When creating the id_path, use obj() (which deepcopy()s the entire path) because it prevents pointers w/o targets
- 06:05 PM Revision 2002: xpath.py: set_id(): When creating the id_path, deepcopy() the id_elem because its keys will change in the main copy
- 05:47 PM Revision 2001: xpath.py: set_id(): Return the path to the ID attr, which can be used to change the ID
- 05:25 PM Revision 2000: xpath.py: put_obj(): Return the inserted node so it can be used to change the inserted value
- 05:08 PM Revision 1999: main Makefile: Maps validation: Fixed bug where there would be infinite recursion with the Maps validation section before the Subdir forwarding section (it's unknown why this is necessary)
04/26/2012
- 07:12 PM Revision 1998: db_xml.py: put_table(): Added commit param to specify whether to commit after each query
- 06:55 PM Revision 1997: bin/map: in_is_db: by_col: Use new put_table() (defined but not implemented yet)
- 06:54 PM Revision 1996: db_xml.py: Added put_table() (without implementation)
- 06:52 PM Revision 1995: xml_func.py: strip(): Remove _ignore XML funcs completely instead of replacing them with their values
- 06:26 PM Revision 1994: bin/map: in_is_db: by_col: Prefix each input column name by "$"
- 06:11 PM Revision 1993: bin/map: in_is_db: by_col: Strip off XML functions
- 06:09 PM Revision 1992: xml_func.py: Added strip(). pop_value(): Support custom name of value param.
- 05:44 PM Revision 1991: bin/map: in_is_db: by_col: Create XML tree of sample row, with the input column names as the values. This tree will guide the sequencing and creation of the column-based queries.
- 05:43 PM Revision 1990: input.Makefile: use_staged env var: defaults to on if by_col is on
- 05:00 PM Revision 1989: bin/map: Only turn on by_col optimization if mapping to same DB, rather than requiring each place that checks by_col to also check whether mapping to same DB
04/24/2012
- 06:32 PM Revision 1988: input.Makefile: Testing: Don't abort tester if only staging test fails, in case staging table missing
- 06:25 PM Revision 1987: input.Makefile: Testing: When cleaning up test outputs, remove everything that doesn't end in .ref
- 06:11 PM Revision 1986: input.Makefile: Testing: Added test/import.%.staging.out test to test the staging tables. Sources: cat: Updated Usage comment to include the "inputs/<datasrc>/" prefix the user would need to add when running make.
- 05:33 PM Revision 1985: bin/map: Fixed bug where mapping to same DB wouldn't work because by-column optimization wasn't implemented yet, by turning it off by default and allowing it to be enabled with an env var
- 05:25 PM Revision 1984: bin/map: DB inputs: Use by-column optimization if mapping to same DB (with skeleton code for optimization's implementation)
- 05:12 PM Revision 1983: input.Makefile: Mapping: Use the staging tables instead of any flat files if use_staged is specified
- 05:10 PM Revision 1982: bin/map: Support custom schema name. Support input table/schema override via env vars, in case the map spreadsheet was written for a different input format.
- 05:01 PM Revision 1981: sql.py: qual_name(): Fixed bugs where esc_name() nested func couldn't have same name as outer func, and esc_name() needed to be invoked without the module name because it's in the same module. select(): Support already-escaped table names.
- 04:16 PM Revision 1980: main Makefile: $(psqlAsAdmin): Tell sudo to preserve env vars so PGOPTIONS is passed to psql
- 03:33 PM Revision 1979: root map: Fill in defaults for inputs from VegBIEN, as well as outputs to it
- 02:59 PM Revision 1978: disown_all: Updated to use main function, local vars, $self, etc. like other bash scripts run using "."
- 02:55 PM Revision 1977: vegbien_dest: Fixed bug where it would give a usage error if run from a makefile rule, because the BASH_LINENO would be 0, by also checking if ${BASH_ARGV[0]} is ${BASH_SOURCE[0]}
- 02:28 PM Revision 1976: postgres_vegbien: Fixed bug where interpreter did not match vegbien_dest's new required interpreter of /bin/bash
- 02:23 PM Revision 1975: vegbien_dest: Changed interpreter to /bin/bash. Removed comment that it requires var bien_password.
- 02:20 PM Revision 1974: postgres_vegbien: Removed no longer needed retrieval of bien_password
- 02:20 PM Revision 1973: vegbien_dest: Get bien_password by searching relative to $self, which we now have a way to get in a bash script (${BASH_SOURCE[0]}), rather than requiring the caller to set it. Provide usage error if run without initial ".".
- 02:12 PM Revision 1972: input.Makefile: Staging tables: import/install-%: Use new quiet option to determine whether to tee output to terminal. Don't use log option because that's always set to true except in test mode, which doesn't apply to installs.
- 02:12 PM Revision 1971: input.Makefile: Staging tables: import/install-%: Use new quiet option to determine whether to tee output to terminal. Don't use log option because that's always set to true except in test mode, which doesn't apply to installs.
- 01:56 PM Revision 1970: main Makefile: PostgreSQL: Edit /etc/phppgadmin/apache.conf to replace "deny from all" with "allow from all", instead of uncommenting an "allow from all" that may not be there
- 01:35 PM Revision 1969: input.Makefile: Sources: Fixed bug where cat was defined before $(tables), by moving Sources after Existing maps discovery and putting just $(inputFiles) and $(dbExport) from Sources at the beginning of Existing maps discovery
- 01:05 PM Revision 1968: sql.py: Made truncate(), tables(), empty_db() schema-aware. Added qual_name(). tables(): Added option to filter tables by a LIKE pattern.
- 12:34 PM Revision 1967: main Makefile: VegBIEN DB: Install public schema in a separate step, so that it can be dropped without dropping the entire DB (which also contains staging tables that shouldn't be dropped when there is a schema change). Added schemas/install, schemas/uninstall, implicit schemas/reinstall to manage the public schema separately from the rest of the DB. Moved Subdir forwarding to the bottom so overridden targets are not forwarded. README.TXT: Since `make reinstall_db` would drop the entire DB, tell user to run new `make schemas/reinstall` instead to reinstall (main) DB from schema.
- 12:30 PM Revision 1966: schemas/postgresql.Mac.conf: Set unix_socket_directory to the new dir it seems to be using, which is now /tmp
- 11:43 AM Revision 1965: csv2db: Fixed bug where extra columns were not truncated in INSERT mode. Replace empty column names with the column # to avoid errors with CSVs that have trailing ","s, etc.
- 11:41 AM Revision 1964: streams.py: StreamIter: Define readline() as a separate method so it can be overridden, and all calls to self.next() will use the overridden readline(). This fixes a bug in ProgressInputStream where incremental counts would not be displayed and it would end with "not all input read" if the StreamIter interface was used instead of readline().
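The readline() change in r1964 can be illustrated with a minimal sketch. The class bodies below are assumptions written for Python 3 (the log refers to self.next(), which suggests the original was Python 2), and CountingStream is a hypothetical stand-in for ProgressInputStream, not the actual streams.py code.

```python
import io

class StreamIter:
    """Sketch: iteration is routed through readline() so subclasses can override it."""
    def __init__(self, stream):
        self.stream = stream

    def readline(self):
        return self.stream.readline()

    def __iter__(self):
        return self

    def __next__(self):
        # Call self.readline() rather than the wrapped stream directly, so an
        # overridden readline() is also used when iterating.
        line = self.readline()
        if line == "":
            raise StopIteration
        return line

class CountingStream(StreamIter):
    """Hypothetical subclass that counts lines, whichever interface is used."""
    def __init__(self, stream):
        super().__init__(stream)
        self.line_count = 0

    def readline(self):
        line = super().readline()
        if line != "":
            self.line_count += 1
        return line

stream = CountingStream(io.StringIO("a\nb\n"))
lines = list(stream)  # uses the iterator interface, not readline() directly
```

Because __next__() delegates to readline(), the count stays correct whether the caller iterates or calls readline() itself.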
04/23/2012
- 09:57 PM Revision 1963: csv2db: Fall back to manually inserting each row (autodetecting the encoding for each field) if COPY FROM doesn't work
- 09:56 PM Revision 1962: streams.py: FilterStream: Inherit from StreamIter so that all descendants automatically have StreamIter functionality
- 09:42 PM Revision 1961: sql.py: insert(): Support using the default value for columns designated with the special value sql.default
- 09:21 PM Revision 1960: sql.py: insert(): Support rows that are just a list of values, with no columns. Support already-escaped table names.
- 08:54 PM Revision 1959: strings.py: Added contains_any()
- 08:54 PM Revision 1958: csvs.py: reader_and_header(): Use make_reader()
- 08:07 PM Revision 1957: Added reinstall_all to reinstall all inputs at once
- 08:06 PM Revision 1956: with_all: Documented that it must be run from the root svn directory
- 08:05 PM Revision 1955: input.Makefile: Staging tables: import/install-%: Only install staging table if input contains only CSV sources. Changed $(isXml) to $(isCsv) (negated) everywhere because rules almost always only run something if input contains only CSV sources, rather than if input contains XML sources.
- 07:21 PM Revision 1954: input.Makefile: Staging tables: import/install-%: Output load status to log file if log option is set
- 07:00 PM Revision 1953: Scripts that are meant to be run in the calling shell: Fixed bug where running the script inside another script would make the script think it was being run as a program, and abort with a usage error
- 06:56 PM Revision 1952: Scripts that are meant to be run in the calling shell: Fixed bug where running the script as a program (without initial ".") wouldn't be able to call return in something that was not a function. Converted all code to a <script_name>_main method so that return would work properly again. Converted all variables to local variables.
- 06:38 PM Revision 1951: env_password: return instead of exit if password not yet stored, in case user is running it from a shell without the initial "-" argument. (This would be the case if the user is just testing out the script, instead of using a command that env_password directs them to run.)
- 05:43 PM Revision 1950: env_password: Use ${BASH_SOURCE[0]} for $self and $self for $0. return instead of exit on usage error in case user is running it from a shell.
- 05:36 PM Revision 1949: stop_imports: Use ${BASH_SOURCE[0]} for $self and $self for $0
- 05:36 PM Revision 1948: import_all: Use new with_all. Use ${BASH_SOURCE[0]} for $self and $self for $0.
- 05:34 PM Revision 1947: Added with_all to run a make target on all inputs at once
- 05:05 PM Revision 1946: Made row #s 1-based to the user to match up with the staging table row #s
- 04:59 PM Revision 1945: bin/map: Fixed bug where limit passed to sql.select() was end instead of the # rows, causing extra rows to be fetched when start > 0. Documented that row #s start with 0.
- 04:19 PM Revision 1944: Removed no longer needed csv2ddl
- 04:19 PM Revision 1943: input.Makefile: Staging tables: import/install-%: Use new csv2db instead of csv2ddl/$(psqlAsBien), because it handles translating encodings properly
- 04:14 PM Revision 1942: Added csv2db to load a command's CSV output stream into a PostgreSQL table
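The limit fix in r1945 and the 1-based row numbers in r1946 come down to simple arithmetic, sketched below. Both function names are hypothetical, and the sketch assumes that end is an exclusive 0-based row index, which the log does not state.

```python
def limit_for_range(start, end):
    """Return the row count to pass as LIMIT for 0-based rows [start, end).

    Passing end itself as the limit would fetch `start` extra rows
    whenever start > 0.
    """
    if end is None:
        return None  # no upper bound requested
    return max(end - start, 0)

def user_row_num(row_index):
    """Convert an internal 0-based row index to the 1-based number shown to users."""
    return row_index + 1
```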
04/21/2012
- 09:32 PM Revision 1941: schemas/postgresql.Mac.conf: Set unix_socket_directory to the appropriate Mac OS X dir, since otherwise, the socket is apparently not created and `make reinstall_db` doesn't work
- 09:30 PM Revision 1940: main Makefile: VegBIEN DB: db: Set LC_COLLATE and LC_CTYPE explicitly, to make it easier to change them
- 09:29 PM Revision 1939: Added ProgressInputStream
- 09:28 PM Revision 1938: exc.py: print_ex(): Added plain option to leave out traceback
- 06:48 PM Revision 1937: main Makefile: VegBIEN DB: db: Use template0 to allow encodings other than UTF-8. Because template0 doesn't have plpgsql on PostgreSQL before 9.x, add "CREATE PROCEDURAL LANGUAGE plpgsql;" manually in schemas/vegbien.sql.make, and filter it back out on PostgreSQL after 9.x using db_dump_localize.
- 06:39 PM Revision 1936: PostgreSQL-MySQL.csv: Remove "CREATE PROCEDURAL LANGUAGE" statements
- 06:36 PM Revision 1935: Added db_dump_localize to translate a PostgreSQL DB dump for the local server's version
- 06:32 PM Revision 1934: Added db_dump_localize to translate a PostgreSQL DB dump for the local server's version
- 03:42 PM Revision 1933: vegbien_dest: Added option to override the prefix of the created vars
- 03:35 PM Revision 1932: schemas/vegbien.sql.make: Fixed bug where data sources' schemas were also exported by exporting only the public schema. Note that this also removes the "CREATE OR REPLACE PROCEDURAL LANGUAGE plpgsql" statement, so that it doesn't have to be filtered out with `grep -v`.
- 03:19 PM Revision 1931: input.Makefile: Use `$(catSrcs)|` instead of $(withCatSrcs) where possible
- 03:00 PM Revision 1930: sql.py: pkey(): Fixed bug where results were not being cached because the rows hadn't been explicitly fetched, by having DbConn.DbCursor.execute() fetch all rows if the rowcount is 0 and it's not an insert statement. DbConn.DbCursor: Made _is_insert an attribute rather than a method, which is set as soon as the query is known. Added consume_rows(). Moved Result retrieval section above Database connections because it's used by DbConn.
- 02:28 PM Revision 1929: sql.py: pkey(): Fixed bug where queries were not being cached. Use select() instead of run_query() so that caching is automatically turned on and table names are automatically escaped.
- 01:37 PM Revision 1928: streams.py: Added LineCountInputStream, which is faster than LineCountStream for input streams. Added InputStreamsOnlyException and raise it in all *InputStream classes' write() methods.
- 01:22 PM Revision 1927: sql.py: DbConn: For non-cacheable queries, use a plain cursor() instead of a DbCursor to avoid the overhead of saving the result and wrapping the cursor
04/20/2012
- 05:20 PM Revision 1926: Moved db_config_names from bin/map to sql.py so it can be used by other scripts as well
- 04:52 PM Revision 1925: csv2ddl: Also print a COPY FROM statement
- 04:47 PM Revision 1924: input.Makefile: Fixed bug where input type was considered to be different things if both $(inputFiles) and $(dbExport) are non-empty. Now, $(inputFiles) takes precedence so that the presence of any input files will cause a DB dump to be ignored. This ensures that a (slower) input DB is not used over a (faster) flat file.
- 04:21 PM Revision 1923: csvs.py: stream_info(): Added parse_header option. reader_and_header(): Use stream_info()'s new parse_header option.
- 03:53 PM Revision 1922: csv2ddl: Renamed schema name env var from datasrc to schema to reflect what it is, and to make the script general beyond importing inputs
- 03:32 PM Revision 1921: input.Makefile: Moved Installation, Staging tables after Existing maps discovery because they depend on it. Staging tables: Create a staging table for each table a map spreadsheet is available for. Put double quotes around the schema name so its case is preserved.
- 03:29 PM Revision 1920: Added csv2ddl to make a PostgreSQL CREATE TABLE statement from a CSV header
- 03:28 PM Revision 1919: sql.py: Input validation: Moved section after Database connections because some of its functions require a connection. Added esc_name_by_module() and esc_name_by_engine(), and use esc_name_by_module() in esc_name().
- 02:18 PM Revision 1918: input.Makefile: Installation: Create a schema for the datasource in VegBIEN as part of the installation process. This will be used to hold staging tables.
- 01:57 PM Revision 1917: input.Makefile: Changed install, uninstall to depend on src/install, src/uninstall targets, which in turn depend on db, rm_db. This will allow us to add additional install actions for all input types.
04/19/2012
- 07:17 PM Revision 1916: sql.py: DbConn: Cache the constructed CacheCursor itself, rather than the dict that's used to create it
- 07:06 PM Revision 1915: sql.py: pkey(): Changed to use the connection-wide caching mechanism rather than its own custom cache. DbConn.__getstate__(): Don't pickle the debug callback.
- 07:00 PM Revision 1914: sql.py: DbConn: Added is_cached(). run_query(): Use new DbConn.is_cached() to avoid creating a savepoint if the query is cached.
- 06:52 PM Revision 1913: sql.py: DbConn: Also cache cursor.description
- 06:50 PM Revision 1912: sql.py: DbConn: Cache query results as a dict subset of the cursor's key attributes, so that additional attributes can easily be cached by adding them to the subset list
- 06:48 PM Revision 1911: dicts.py: Added AttrsDictView
- 06:47 PM Revision 1910: util.py: NamedTuple.__iter__(): Removed unnecessary **attrs param
- 06:30 PM Revision 1909: sql.py: _query_lookup(): Fixed bug where params was cast to a tuple, even though it could also be a dict. index_cols(): Changed to use the connection-wide caching mechanism rather than its own custom cache.
- 06:28 PM Revision 1908: util.py: NamedTuple: Made it usable as a hashable dict (with string keys) by adding __iter__() and __getitem__()
- 06:27 PM Revision 1907: dicts.py: Added make_hashable()
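As r1909 notes, query params may be a dict as well as a sequence, so a cache key built from them has to be made hashable first. The following is one possible sketch of a make_hashable() helper; the actual dicts.py implementation is not shown in this log and may differ.

```python
def make_hashable(value):
    """Recursively convert dicts and lists into hashable equivalents (a sketch)."""
    if isinstance(value, dict):
        # frozenset ignores key order, so equal dicts produce equal keys
        return frozenset((k, make_hashable(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(make_hashable(v) for v in value)
    return value

# Hypothetical usage: a cache keyed on the unmogrified query plus its params
key = ("SELECT * FROM t WHERE id = %(id)s", make_hashable({"id": 1}))
cache = {key: "cached result"}
```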
04/17/2012
- 09:59 PM Revision 1906: sql.py: DbConn: Only cache exceptions for inserts since they are not idempotent, but an invalid insert will always be invalid. If a cached result is an exception, re-raise it in a separate method other than the constructor to ensure that the cursor object is still created, and that its query instance var is set.
- 09:11 PM Revision 1905: sql.py: insert(): Cache insert queries by default. This works because any DuplicateKeyException, etc. would be cached as well. This saves many inserts for rows that we already know are in the database.
- 09:06 PM Revision 1904: sql.py: DbConn.run_query(): Cache exceptions raised by queries as well
- 08:48 PM Revision 1903: sql.py: DbConn.run_query(): When debug logging, label queries with their cache status (hit/miss/non-cacheable)
- 08:25 PM Revision 1902: sql.py: DbConn.run_query(): Also debug-log queries that produce exceptions
- 08:18 PM Revision 1901: sql.py: DbConn: Allow creator to provide a log function to call on debug messages, instead of using stderr directly
- 08:01 PM Revision 1900: bin/map: Pass debug mode to DbConn so that SQL query debugging works again
- 07:49 PM Revision 1899: sql.py: DbConn: DbCursor: Fixed bug where caching was always turned on, by passing the cacheable setting to it from run_query(). Turned caching back on (uncommented it) since it's now working.
- 07:21 PM Revision 1898: bin/map: map_rows()/map_table(): Pass kw_args to process_rows() so rows_start can be specified when using them. DB inputs: Skip the pre-start rows in the SQL query itself, so that they don't need to be iterated over by the cursor in the main loop.
- 07:07 PM Revision 1897: bin/map: Fixed bug introduced in r1718 where the row # would not be incremented if i < start, causing a semi-infinite loop that only ended when the input rows were exhausted. process_rows(): Added optional rows_start parameter to use if the input rows already have the pre-start rows skipped.
- 05:49 PM Revision 1896: input.Makefile: Sources: cat: Changed Usage message to use "--silent" make option
- 05:45 PM Revision 1895: input.Makefile: Sources: cat: Added Usage message with instructions for removing echoed make commands
- 05:17 PM Revision 1894: run_*query(): Fixed bug where INSERTs, etc. were cached by making callers (such as select()) explicitly turn on caching. DbConn.run_query(): Fixed bug where cur.mogrify() was not supported under MySQL by making the cache key a tuple of the unmogrified query and its params instead of the mogrified string query. CacheCursor: Store attributes of the original cursor that we use, such as query and rowcount.
- 04:38 PM Revision 1893: sql.py: Made row() and value() cache the result by fetching all rows before returning the first row
- 04:37 PM Revision 1892: iters.py: Added func_iter() and consume_iter()
- 04:11 PM Revision 1891: sql.py: Cache the results of queries (when all rows are read)
- 03:48 PM Revision 1890: Proxy.py: Fixed infinite recursion bug by removing __setattr__() (which prevents the class and subclasses from storing instance variables using "self." syntax)
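The caching behavior described in r1894, r1904 and r1905 can be sketched roughly as follows. QueryCache is a hypothetical class, not the real DbConn, and it uses the standard sqlite3 module with positional params only so that the example is self-contained. The key is the unmogrified query plus its params, and exceptions are cached alongside results.

```python
import sqlite3

class QueryCache:
    """Hypothetical sketch of a (query, params)-keyed cache; not the real DbConn."""
    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.hits = 0

    def run_query(self, query, params=(), cacheable=False):
        key = (query, tuple(params))  # assumes positional params
        if cacheable and key in self.cache:
            self.hits += 1
            result = self.cache[key]
        else:
            try:
                # Fetch all rows so the complete result can be cached
                result = self.conn.execute(query, params).fetchall()
            except sqlite3.Error as e:
                result = e  # cache the exception too
            if cacheable:
                self.cache[key] = result
        if isinstance(result, Exception):
            raise result
        return result
```

With this arrangement, repeating an insert that has already failed with a duplicate-key error raises the cached exception without another round trip to the database.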