xml_func.py: strip(): Remove _ignore XML funcs completely instead of replacing them with their values
bin/map: in_is_db: by_col: Prefix each input column name by "$"
bin/map: in_is_db: by_col: Strip off XML functions
xml_func.py: Added strip(). pop_value(): Support custom name of value param.
bin/map: in_is_db: by_col: Create XML tree of sample row, with the input column names as the values. This tree will guide the sequencing and creation of the column-based queries.
input.Makefile: use_staged env var: defaults to on if by_col is on
bin/map: Only turn on by_col optimization if mapping to same DB, rather than requiring each place that checks by_col to also check whether mapping to same DB
input.Makefile: Testing: Don't abort tester if only staging test fails, in case staging table missing
input.Makefile: Testing: When cleaning up test outputs, remove everything that doesn't end in .ref
input.Makefile: Testing: Added test/import.%.staging.out test to test the staging tables. Sources: cat: Updated Usage comment to include the "inputs/<datasrc>/" prefix the user would need to add when running make.
bin/map: Fixed bug where mapping to same DB wouldn't work because by-column optimization wasn't implemented yet, by turning it off by default and allowing it to be enabled with an env var
bin/map: DB inputs: Use by-column optimization if mapping to same DB (with skeleton code for optimization's implementation)
input.Makefile: Mapping: Use the staging tables instead of any flat files if use_staged is specified
bin/map: Support custom schema name. Support input table/schema override via env vars, in case the map spreadsheet was written for a different input format.
sql.py: qual_name(): Fixed bugs where esc_name() nested func couldn't have same name as outer func, and esc_name() needed to be invoked without the module name because it's in the same module. select(): Support already-escaped table names.
main Makefile: $(psqlAsAdmin): Tell sudo to preserve env vars so PGOPTIONS is passed to psql
root map: Fill in defaults for inputs from VegBIEN, as well as outputs to it
disown_all: Updated to use main function, local vars, $self, etc. like other bash scripts run using "."
vegbien_dest: Fixed bug where it would give a usage error if run from a makefile rule, because the BASH_LINENO would be 0, by also checking if ${BASH_ARGV[0]} is ${BASH_SOURCE[0]}
postgres_vegbien: Fixed bug where interpreter did not match vegbien_dest's new required interpreter of /bin/bash
vegbien_dest: Changed interpreter to /bin/bash. Removed comment that it requires var bien_password.
postgres_vegbien: Removed no longer needed retrieval of bien_password
vegbien_dest: Get bien_password by searching relative to $self, which we now have a way to get in a bash script (${BASH_SOURCE[0]}), rather than requiring the caller to set it. Provide usage error if run without initial ".".
input.Makefile: Staging tables: import/install-%: Use new quiet option to determine whether to tee output to terminal. Don't use log option because that's always set to true except in test mode, which doesn't apply to installs.
main Makefile: PostgreSQL: Edit /etc/phppgadmin/apache.conf to replace "deny from all" with "allow from all", instead of uncommenting an "allow from all" that may not be there
input.Makefile: Sources: Fixed bug where cat was defined before $(tables), by moving Sources after Existing maps discovery and putting just $(inputFiles) and $(dbExport) from Sources at the beginning of Existing maps discovery
sql.py: Made truncate(), tables(), empty_db() schema-aware. Added qual_name(). tables(): Added option to filter tables by a LIKE pattern.
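A minimal sketch of what a schema-qualifying helper such as qual_name() might look like. The signatures below are assumptions for illustration, not the actual sql.py API; the quoting rule (wrap in double quotes, double any embedded quote) is standard PostgreSQL identifier quoting.

```python
def esc_name(name):
    """Quote an identifier PostgreSQL-style, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def qual_name(schema, table):
    """Return the escaped table name, prefixed by the escaped schema if given.

    Hypothetical signature: the real sql.py function may take other arguments.
    """
    if schema is None:
        return esc_name(table)
    return esc_name(schema) + '.' + esc_name(table)
```

Quoting the schema separately from the table keeps the case of each part, which matters when schema names are created with double quotes.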
main Makefile: VegBIEN DB: Install public schema in a separate step, so that it can be dropped without dropping the entire DB (which also contains staging tables that shouldn't be dropped when there is a schema change). Added schemas/install, schemas/uninstall, implicit schemas/reinstall to manage the public schema separately from the rest of the DB. Moved Subdir forwarding to the bottom so overridden targets are not forwarded. README.TXT: Since `make reinstall_db` would drop the entire DB, tell user to run new `make schemas/reinstall` instead to reinstall (main) DB from schema.
schemas/postgresql.Mac.conf: Set unix_socket_directory to the new dir it seems to be using, which is now /tmp
csv2db: Fixed bug where extra columns were not truncated in INSERT mode. Replace empty column names with the column # to avoid errors with CSVs that have trailing ","s, etc.
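The header cleanup and row truncation described above could be sketched roughly as follows. The function names are hypothetical, and using 1-based column numbers is an assumption; the commit message only says "the column #".

```python
def clean_header(header):
    """Replace empty column names with their column number (assumed 1-based)."""
    return [name if name.strip() else str(i)
            for i, name in enumerate(header, 1)]


def truncate_row(row, col_ct):
    """Drop any values beyond the number of header columns."""
    return row[:col_ct]
```

For example, a header of `a,,c,` (with a trailing comma) would become `a`, `2`, `c`, `4` under these assumptions.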
streams.py: StreamIter: Define readline() as a separate method so it can be overridden, and all calls to self.next() will use the overridden readline(). This fixes a bug in ProgressInputStream where incremental counts would not be displayed and it would end with "not all input read" if the StreamIter interface was used instead of readline().
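A minimal sketch of the pattern, written for Python 3 (where __next__() takes the place of next()). The class bodies and the CountingStream subclass are assumptions for illustration, not the actual streams.py code. The point is that iteration goes through self.readline(), so a subclass override is used on both code paths.

```python
import io


class StreamIter:
    """Iterate over a stream's lines by calling readline()."""

    def __init__(self, stream):
        self.stream = stream

    def readline(self):
        return self.stream.readline()

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()  # dispatches to any subclass override
        if line == '':
            raise StopIteration
        return line


class CountingStream(StreamIter):
    """Hypothetical subclass that counts lines as they are read."""

    def __init__(self, stream):
        super().__init__(stream)
        self.line_ct = 0

    def readline(self):
        line = super().readline()
        if line != '':
            self.line_ct += 1
        return line
```

Iterating with `for line in CountingStream(io.StringIO(text))` updates line_ct just as explicit readline() calls would.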
csv2db: Fall back to manually inserting each row (autodetecting the encoding for each field) if COPY FROM doesn't work
streams.py: FilterStream: Inherit from StreamIter so that all descendants automatically have StreamIter functionality
sql.py: insert(): Support using the default value for columns designated with the special value sql.default
sql.py: insert(): Support rows that are just a list of values, with no columns. Support already-escaped table names.
strings.py: Added contains_any()
csvs.py: reader_and_header(): Use make_reader()
Added reinstall_all to reinstall all inputs at once
with_all: Documented that it must be run from the root svn directory
input.Makefile: Staging tables: import/install-%: Only install staging table if input contains only CSV sources. Changed $(isXml) to $(isCsv) (negated) everywhere because rules almost always only run something if input contains only CSV sources, rather than if input contains XML sources.
input.Makefile: Staging tables: import/install-%: Output load status to log file if log option is set
Scripts that are meant to be run in the calling shell: Fixed bug where running the script inside another script would make the script think it was being run as a program, and abort with a usage error
Scripts that are meant to be run in the calling shell: Fixed bug where running the script as a program (without initial ".") wouldn't be able to call return in something that was not a function. Converted all code to a <script_name>_main method so that return would work properly again. Converted all variables to local variables.
env_password: return instead of exit if password not yet stored, in case user is running it from a shell without the initial "-" argument. (This would be the case if the user is just testing out the script, instead of using a command that env_password directs them to run.)
env_password: Use ${BASH_SOURCE[0]} for $self and $self for $0. return instead of exit on usage error in case user is running it from a shell.
stop_imports: Use ${BASH_SOURCE[0]} for $self and $self for $0
import_all: Use new with_all. Use ${BASH_SOURCE[0]} for $self and $self for $0.
Added with_all to run a make target on all inputs at once
Made row #s 1-based to the user to match up with the staging table row #s
bin/map: Fixed bug where limit passed to sql.select() was end instead of the # rows, causing extra rows to be fetched when start > 0. Documented that row #s start with 0.
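The arithmetic behind this fix can be sketched as below, assuming a 0-based, end-exclusive row range. The helper name is hypothetical and not part of bin/map or sql.py.

```python
def limit_and_offset(start, end=None):
    """Translate a 0-based [start, end) row range into (LIMIT, OFFSET) values.

    LIMIT is the number of rows to fetch, not the end row number.
    """
    limit = None if end is None else max(end - start, 0)
    return limit, start
```

With start=10 and end=15, passing end as the limit would fetch 15 rows from offset 10, whereas the row count is 5.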
Removed no longer needed csv2ddl
input.Makefile: Staging tables: import/install-%: Use new csv2db instead of csv2ddl/$(psqlAsBien), because it handles translating encodings properly
Added csv2db to load a command's CSV output stream into a PostgreSQL table
schemas/postgresql.Mac.conf: Set unix_socket_directory to the appropriate Mac OS X dir, since otherwise, the socket is apparently not created and `make reinstall_db` doesn't work
main Makefile: VegBIEN DB: db: Set LC_COLLATE and LC_CTYPE explicitly, to make it easier to change them
Added ProgressInputStream
exc.py: print_ex(): Added plain option to leave out traceback
main Makefile: VegBIEN DB: db: Use template0 to allow encodings other than UTF-8. Because template0 doesn't have plpgsql on PostgreSQL before 9.x, add "CREATE PROCEDURAL LANGUAGE plpgsql;" manually in schemas/vegbien.sql.make, and filter it back out on PostgreSQL after 9.x using db_dump_localize.
PostgreSQL-MySQL.csv: Remove "CREATE PROCEDURAL LANGUAGE" statements
Added db_dump_localize to translate a PostgreSQL DB dump for the local server's version
vegbien_dest: Added option to override the prefix of the created vars
schemas/vegbien.sql.make: Fixed bug where data sources' schemas were also exported, by exporting only the public schema. Note that this also removes the "CREATE OR REPLACE PROCEDURAL LANGUAGE plpgsql" statement, so that it doesn't have to be filtered out with `grep -v`.
input.Makefile: Use `$(catSrcs)|` instead of $(withCatSrcs) where possible
sql.py: pkey(): Fixed bug where results were not being cached because the rows hadn't been explicitly fetched, by having DbConn.DbCursor.execute() fetch all rows if the rowcount is 0 and it's not an insert statement. DbConn.DbCursor: Made _is_insert an attribute rather than a method, which is set as soon as the query is known. Added consume_rows(). Moved Result retrieval section above Database connections because it's used by DbConn.
sql.py: pkey(): Fixed bug where queries were not being cached. Use select() instead of run_query() so that caching is automatically turned on and table names are automatically escaped.
streams.py: Added LineCountInputStream, which is faster than LineCountStream for input streams. Added InputStreamsOnlyException and raise it in all *InputStream classes' write() methods.
sql.py: DbConn: For non-cacheable queries, use a plain cursor() instead of a DbCursor to avoid the overhead of saving the result and wrapping the cursor
Moved db_config_names from bin/map to sql.py so it can be used by other scripts as well
csv2ddl: Also print a COPY FROM statement
input.Makefile: Fixed bug where input type was considered to be different things if both $(inputFiles) and $(dbExport) are non-empty. Now, $(inputFiles) takes precedence so that the presence of any input files will cause a DB dump to be ignored. This ensures that a (slower) input DB is not used over a (faster) flat file.
csvs.py: stream_info(): Added parse_header option. reader_and_header(): Use stream_info()'s new parse_header option.
csv2ddl: Renamed schema name env var from datasrc to schema to reflect what it is, and to make the script general beyond importing inputs
input.Makefile: Moved Installation, Staging tables after Existing maps discovery because they depend on it. Staging tables: Create a staging table for each table a map spreadsheet is available for. Put double quotes around the schema name so its case is preserved.
Added csv2ddl to make a PostgreSQL CREATE TABLE statement from a CSV header
sql.py: Input validation: Moved section after Database connections because some of its functions require a connection. Added esc_name_by_module() and esc_name_by_engine(), and use esc_name_by_module() in esc_name().
input.Makefile: Installation: Create a schema for the datasource in VegBIEN as part of the installation process. This will be used to hold staging tables.
input.Makefile: Changed install, uninstall to depend on src/install, src/uninstall targets, which in turn depend on db, rm_db. This will allow us to add additional install actions for all input types.
sql.py: DbConn: Cache the constructed CacheCursor itself, rather than the dict that's used to create it
sql.py: pkey(): Changed to use the connection-wide caching mechanism rather than its own custom cache. DbConn.__getstate__(): Don't pickle the debug callback.
sql.py: DbConn: Added is_cached(). run_query(): Use new DbConn.is_cached() to avoid creating a savepoint if the query is cached.
sql.py: DbConn: Also cache cursor.description
sql.py: DbConn: Cache query results as a dict subset of the cursor's key attributes, so that additional attributes can easily be cached by adding them to the subset list
dicts.py: Added AttrsDictView
util.py: NamedTuple.__iter__(): Removed unnecessary **attrs param
sql.py: _query_lookup(): Fixed bug where params was cast to a tuple, even though it could also be a dict. index_cols(): Changed to use the connection-wide caching mechanism rather than its own custom cache.
util.py: NamedTuple: Made it usable as a hashable dict (with string keys) by adding __iter__() and __getitem__()
dicts.py: Added make_hashable()
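One plausible shape for such a helper is sketched below; the actual dicts.py implementation is not shown in this log, so the behavior here is an assumption. It recursively converts dicts and lists into hashable equivalents so that query parameters can serve as cache keys.

```python
def make_hashable(value):
    """Recursively convert dicts and lists into hashable equivalents."""
    if isinstance(value, dict):
        # A frozenset of (key, value) pairs ignores key order, like a dict
        return frozenset((k, make_hashable(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(make_hashable(v) for v in value)
    return value
```

A key such as `make_hashable((query, params))` then works whether params is a sequence or a dict.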
sql.py: DbConn: Only cache exceptions for inserts since they are not idempotent, but an invalid insert will always be invalid. If a cached result is an exception, re-raise it in a separate method other than the constructor to ensure that the cursor object is still created, and that its query instance var is set.
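The caching policy described above might be sketched like this. The class, its method names, and the tuple-only params handling are assumptions for illustration; DbConn's real interface is not reproduced here.

```python
class CachingRunner:
    """Cache query results, and for inserts also the exceptions they raise."""

    def __init__(self, execute):
        self.execute = execute  # hypothetical callable(query, params) -> result
        self.cache = {}

    def run(self, query, params=()):
        key = (query, tuple(params))
        if key in self.cache:
            return self._replay(self.cache[key])
        is_insert = query.lstrip().upper().startswith('INSERT')
        try:
            result = self.execute(query, params)
        except Exception as e:
            if is_insert:  # an invalid insert will always be invalid
                self.cache[key] = e
            raise
        self.cache[key] = result
        return result

    @staticmethod
    def _replay(cached):
        # Re-raise in a separate step, after the cache lookup has completed
        if isinstance(cached, Exception):
            raise cached
        return cached
```

A repeated insert that previously failed then raises the stored exception without another round trip to the database.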
sql.py: insert(): Cache insert queries by default. This works because any DuplicateKeyException, etc. would be cached as well. This saves many inserts for rows that we already know are in the database.
sql.py: DbConn.run_query(): Cache exceptions raised by queries as well
sql.py: DbConn.run_query(): When debug logging, label queries with their cache status (hit/miss/non-cacheable)
sql.py: DbConn.run_query(): Also debug-log queries that produce exceptions
sql.py: DbConn: Allow creator to provide a log function to call on debug messages, instead of using stderr directly
bin/map: Pass debug mode to DbConn so that SQL query debugging works again
sql.py: DbConn: DbCursor: Fixed bug where caching was always turned on, by passing the cacheable setting to it from run_query(). Turned caching back on (uncommented it) since it's now working.
bin/map: map_rows()/map_table(): Pass kw_args to process_rows() so rows_start can be specified when using them. DB inputs: Skip the pre-start rows in the SQL query itself, so that they don't need to be iterated over by the cursor in the main loop.
bin/map: Fixed bug introduced in r1718 where the row # would not be incremented if i < start, causing a semi-infinite loop that only ended when the input rows were exhausted. process_rows(): Added optional rows_start parameter to use if the input rows already have the pre-start rows skipped.
input.Makefile: Sources: cat: Changed Usage message to use "--silent" make option