Changes - BIEN 3 - NCEAS Projects

root @ 1955

#	Date	Author	Comment
1955	04/23/2012 08:05 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables: import/install-%: Only install a staging table if the input contains only CSV sources. Changed $(isXml) to $(isCsv) (negated) everywhere, because rules almost always run something only if the input contains only CSV sources, rather than if it contains XML sources.
1954	04/23/2012 07:21 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables: import/install-%: Output load status to log file if log option is set
1953	04/23/2012 07:00 PM	Aaron Marcuse-Kubitza	Scripts that are meant to be run in the calling shell: Fixed bug where running the script inside another script would make the script think it was being run as a program, and abort with a usage error
1952	04/23/2012 06:56 PM	Aaron Marcuse-Kubitza	Scripts that are meant to be run in the calling shell: Fixed bug where running the script as a program (without the initial ".") failed, because return can only be used in a function or a sourced script. Converted all code to a <script_name>_main function so that return works properly again. Converted all variables to local variables.
1951	04/23/2012 06:38 PM	Aaron Marcuse-Kubitza	env_password: return instead of exit if password not yet stored, in case user is running it from a shell without the initial "-" argument. (This would be the case if the user is just testing out the script, instead of using a command that env_password directs them to run.)
1950	04/23/2012 05:43 PM	Aaron Marcuse-Kubitza	env_password: Use ${BASH_SOURCE[0]} for $self and $self for $0. return instead of exit on usage error in case user is running it from a shell.
1949	04/23/2012 05:36 PM	Aaron Marcuse-Kubitza	stop_imports: Use ${BASH_SOURCE[0]} for $self and $self for $0
1948	04/23/2012 05:36 PM	Aaron Marcuse-Kubitza	import_all: Use new with_all. Use ${BASH_SOURCE[0]} for $self and $self for $0.
1947	04/23/2012 05:34 PM	Aaron Marcuse-Kubitza	Added with_all to run a make target on all inputs at once
1946	04/23/2012 05:05 PM	Aaron Marcuse-Kubitza	Made row #s 1-based to the user to match up with the staging table row #s
1945	04/23/2012 04:59 PM	Aaron Marcuse-Kubitza	bin/map: Fixed bug where limit passed to sql.select() was end instead of the # rows, causing extra rows to be fetched when start > 0. Documented that row #s start with 0.
1944	04/23/2012 04:19 PM	Aaron Marcuse-Kubitza	Removed no longer needed csv2ddl
1943	04/23/2012 04:19 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables: import/install-%: Use new csv2db instead of csv2ddl/$(psqlAsBien), because it handles translating encodings properly
1942	04/23/2012 04:14 PM	Aaron Marcuse-Kubitza	Added csv2db to load a command's CSV output stream into a PostgreSQL table
1941	04/21/2012 09:32 PM	Aaron Marcuse-Kubitza	schemas/postgresql.Mac.conf: Set unix_socket_directory to the appropriate Mac OS X dir, since otherwise, the socket is apparently not created and `make reinstall_db` doesn't work
1940	04/21/2012 09:30 PM	Aaron Marcuse-Kubitza	main Makefile: VegBIEN DB: db: Set LC_COLLATE and LC_CTYPE explicitly, to make it easier to change them
1939	04/21/2012 09:29 PM	Aaron Marcuse-Kubitza	Added ProgressInputStream
1938	04/21/2012 09:28 PM	Aaron Marcuse-Kubitza	exc.py: print_ex(): Added plain option to leave out traceback
1937	04/21/2012 06:48 PM	Aaron Marcuse-Kubitza	main Makefile: VegBIEN DB: db: Use template0 to allow encodings other than UTF-8. Because template0 doesn't have plpgsql on PostgreSQL before 9.0, add "CREATE PROCEDURAL LANGUAGE plpgsql;" manually in schemas/vegbien.sql.make, and filter it back out on PostgreSQL 9.0 and later using db_dump_localize.
1936	04/21/2012 06:39 PM	Aaron Marcuse-Kubitza	PostgreSQL-MySQL.csv: Remove "CREATE PROCEDURAL LANGUAGE" statements
1935	04/21/2012 06:36 PM	Aaron Marcuse-Kubitza	Added db_dump_localize to translate a PostgreSQL DB dump for the local server's version
1934	04/21/2012 06:32 PM	Aaron Marcuse-Kubitza	Added db_dump_localize to translate a PostgreSQL DB dump for the local server's version
1933	04/21/2012 03:42 PM	Aaron Marcuse-Kubitza	vegbien_dest: Added option to override the prefix of the created vars
1932	04/21/2012 03:35 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql.make: Fixed bug where data sources' schemas were also exported by exporting only the public schema. Note that this also removes the "CREATE OR REPLACE PROCEDURAL LANGUAGE plpgsql" statement, so that it doesn't have to be filtered out with `grep -v`.
1931	04/21/2012 03:19 PM	Aaron Marcuse-Kubitza	input.Makefile: Use `$(catSrcs)\|` instead of $(withCatSrcs) where possible
1930	04/21/2012 03:00 PM	Aaron Marcuse-Kubitza	sql.py: pkey(): Fixed bug where results were not being cached because the rows hadn't been explicitly fetched, by having DbConn.DbCursor.execute() fetch all rows if the rowcount is 0 and it's not an insert statement. DbConn.DbCursor: Made _is_insert an attribute rather than a method, which is set as soon as the query is known. Added consume_rows(). Moved Result retrieval section above Database connections because it's used by DbConn.
1929	04/21/2012 02:28 PM	Aaron Marcuse-Kubitza	sql.py: pkey(): Fixed bug where queries were not being cached. Use select() instead of run_query() so that caching is automatically turned on and table names are automatically escaped.
1928	04/21/2012 01:37 PM	Aaron Marcuse-Kubitza	streams.py: Added LineCountInputStream, which is faster than LineCountStream for input streams. Added InputStreamsOnlyException and raise it in all *InputStream classes' write() methods.
1927	04/21/2012 01:22 PM	Aaron Marcuse-Kubitza	sql.py: DbConn: For non-cacheable queries, use a plain cursor() instead of a DbCursor to avoid the overhead of saving the result and wrapping the cursor
1926	04/20/2012 05:20 PM	Aaron Marcuse-Kubitza	Moved db_config_names from bin/map to sql.py so it can be used by other scripts as well
1925	04/20/2012 04:52 PM	Aaron Marcuse-Kubitza	csv2ddl: Also print a COPY FROM statement
1924	04/20/2012 04:47 PM	Aaron Marcuse-Kubitza	input.Makefile: Fixed bug where the input type was considered to be two different things if both $(inputFiles) and $(dbExport) were non-empty. Now, $(inputFiles) takes precedence so that the presence of any input files will cause a DB dump to be ignored. This ensures that a (slower) input DB is not used over a (faster) flat file.
1923	04/20/2012 04:21 PM	Aaron Marcuse-Kubitza	csvs.py: stream_info(): Added parse_header option. reader_and_header(): Use stream_info()'s new parse_header option.
1922	04/20/2012 03:53 PM	Aaron Marcuse-Kubitza	csv2ddl: Renamed schema name env var from datasrc to schema to reflect what it is, and to make the script general beyond importing inputs
1921	04/20/2012 03:32 PM	Aaron Marcuse-Kubitza	input.Makefile: Moved Installation, Staging tables after Existing maps discovery because they depend on it. Staging tables: Create a staging table for each table a map spreadsheet is available for. Put double quotes around the schema name so its case is preserved.
1920	04/20/2012 03:29 PM	Aaron Marcuse-Kubitza	Added csv2ddl to make a PostgreSQL CREATE TABLE statement from a CSV header
1919	04/20/2012 03:28 PM	Aaron Marcuse-Kubitza	sql.py: Input validation: Moved section after Database connections because some of its functions require a connection. Added esc_name_by_module() and esc_name_by_engine(), and use esc_name_by_module() in esc_name().
1918	04/20/2012 02:18 PM	Aaron Marcuse-Kubitza	input.Makefile: Installation: Create a schema for the datasource in VegBIEN as part of the installation process. This will be used to hold staging tables.
1917	04/20/2012 01:57 PM	Aaron Marcuse-Kubitza	input.Makefile: Changed install, uninstall to depend on src/install, src/uninstall targets, which in turn depend on db, rm_db. This will allow us to add additional install actions for all input types.
1916	04/19/2012 07:17 PM	Aaron Marcuse-Kubitza	sql.py: DbConn: Cache the constructed CacheCursor itself, rather than the dict that's used to create it
1915	04/19/2012 07:06 PM	Aaron Marcuse-Kubitza	sql.py: pkey(): Changed to use the connection-wide caching mechanism rather than its own custom cache. DbConn.__getstate__(): Don't pickle the debug callback.
1914	04/19/2012 07:00 PM	Aaron Marcuse-Kubitza	sql.py: DbConn: Added is_cached(). run_query(): Use new DbConn.is_cached() to avoid creating a savepoint if the query is cached.
1913	04/19/2012 06:52 PM	Aaron Marcuse-Kubitza	sql.py: DbConn: Also cache cursor.description
1912	04/19/2012 06:50 PM	Aaron Marcuse-Kubitza	sql.py: DbConn: Cache query results as a dict subset of the cursor's key attributes, so that additional attributes can easily be cached by adding them to the subset list
1911	04/19/2012 06:48 PM	Aaron Marcuse-Kubitza	dicts.py: Added AttrsDictView
1910	04/19/2012 06:47 PM	Aaron Marcuse-Kubitza	util.py: NamedTuple.__iter__(): Removed unnecessary **attrs param
1909	04/19/2012 06:30 PM	Aaron Marcuse-Kubitza	sql.py: _query_lookup(): Fixed bug where params was cast to a tuple, even though it could also be a dict. index_cols(): Changed to use the connection-wide caching mechanism rather than its own custom cache.
1908	04/19/2012 06:28 PM	Aaron Marcuse-Kubitza	util.py: NamedTuple: Made it usable as a hashable dict (with string keys) by adding iter() and getitem()
1907	04/19/2012 06:27 PM	Aaron Marcuse-Kubitza	dicts.py: Added make_hashable()
1906	04/17/2012 09:59 PM	Aaron Marcuse-Kubitza	sql.py: DbConn: Only cache exceptions for inserts, since they are not idempotent, but an invalid insert will always be invalid. If a cached result is an exception, re-raise it in a separate method rather than in the constructor, to ensure that the cursor object is still created and that its query instance var is set.
1905	04/17/2012 09:11 PM	Aaron Marcuse-Kubitza	sql.py: insert(): Cache insert queries by default. This works because any DuplicateKeyException, etc. would be cached as well. This saves many inserts for rows that we already know are in the database.
1904	04/17/2012 09:06 PM	Aaron Marcuse-Kubitza	sql.py: DbConn.run_query(): Cache exceptions raised by queries as well
1903	04/17/2012 08:48 PM	Aaron Marcuse-Kubitza	sql.py: DbConn.run_query(): When debug logging, label queries with their cache status (hit/miss/non-cacheable)
1902	04/17/2012 08:25 PM	Aaron Marcuse-Kubitza	sql.py: DbConn.run_query(): Also debug-log queries that produce exceptions
1901	04/17/2012 08:18 PM	Aaron Marcuse-Kubitza	sql.py: DbConn: Allow creator to provide a log function to call on debug messages, instead of using stderr directly
1900	04/17/2012 08:01 PM	Aaron Marcuse-Kubitza	bin/map: Pass debug mode to DbConn so that SQL query debugging works again
1899	04/17/2012 07:49 PM	Aaron Marcuse-Kubitza	sql.py: DbConn: DbCursor: Fixed bug where caching was always turned on, by passing the cacheable setting to it from run_query(). Turned caching back on (uncommented it) since it's now working.
1898	04/17/2012 07:21 PM	Aaron Marcuse-Kubitza	bin/map: map_rows()/map_table(): Pass kw_args to process_rows() so rows_start can be specified when using them. DB inputs: Skip the pre-start rows in the SQL query itself, so that they don't need to be iterated over by the cursor in the main loop.
1897	04/17/2012 07:07 PM	Aaron Marcuse-Kubitza	bin/map: Fixed bug introduced in r1718 where the row # would not be incremented if i < start, causing a semi-infinite loop that only ended when the input rows were exhausted. process_rows(): Added optional rows_start parameter to use if the input rows already have the pre-start rows skipped.
1896	04/17/2012 05:49 PM	Aaron Marcuse-Kubitza	input.Makefile: Sources: cat: Changed Usage message to use "--silent" make option
1895	04/17/2012 05:45 PM	Aaron Marcuse-Kubitza	input.Makefile: Sources: cat: Added Usage message with instructions for removing echoed make commands
1894	04/17/2012 05:17 PM	Aaron Marcuse-Kubitza	run_*query(): Fixed bug where INSERTs, etc. were cached by making callers (such as select()) explicitly turn on caching. DbConn.run_query(): Fixed bug where cur.mogrify() was not supported under MySQL by making the cache key a tuple of the unmogrified query and its params instead of the mogrified string query. CacheCursor: Store attributes of the original cursor that we use, such as query and rowcount.
1893	04/17/2012 04:38 PM	Aaron Marcuse-Kubitza	sql.py: Made row() and value() cache the result by fetching all rows before returning the first row
1892	04/17/2012 04:37 PM	Aaron Marcuse-Kubitza	iters.py: Added func_iter() and consume_iter()
1891	04/17/2012 04:11 PM	Aaron Marcuse-Kubitza	sql.py: Cache the results of queries (when all rows are read)
1890	04/17/2012 03:48 PM	Aaron Marcuse-Kubitza	Proxy.py: Fixed infinite recursion bug by removing setattr() (which prevents the class and subclasses from storing instance variables using "self." syntax)
1889	04/16/2012 10:19 PM	Aaron Marcuse-Kubitza	sql.py: DbConn: Added run_query(). run_raw_query(): Use new DbConn.run_query().
1888	04/16/2012 10:18 PM	Aaron Marcuse-Kubitza	Added Proxy.py
1887	04/16/2012 09:32 PM	Aaron Marcuse-Kubitza	parallel.py: MultiProducerPool: Added code to create a shared Namespace object, commented out. Updated share() doc comment to reflect that it will writably share the values as well.
1886	04/16/2012 08:49 PM	Aaron Marcuse-Kubitza	bin/map: Share locals() with the pool at various times to try to get as many unpicklable values into the shared vars as possible
1885	04/16/2012 08:45 PM	Aaron Marcuse-Kubitza	dicts.py: Turned id_dict() factory function into IdDict class. parallel.py: MultiProducerPool: Added share_vars(). main_loop(): Only consider the program to be done if the queue is empty and there are no running tasks.
1884	04/16/2012 08:00 PM	Aaron Marcuse-Kubitza	collection.py: rmap(): Treat only built-in sequences specially instead of iterables. Pass whether the value is a leaf to the func. Added option to only recurse up to a certain # of levels.
1883	04/16/2012 07:10 PM	Aaron Marcuse-Kubitza	Added lists.py
1882	04/16/2012 04:40 PM	Aaron Marcuse-Kubitza	collection.py: rmap(): Fixed bugs: Made it recursive. Use iters.is_iterable() instead of isinstance(value, list) to work on all iterables. Use value and not nonexistent var list_.
1881	04/16/2012 04:38 PM	Aaron Marcuse-Kubitza	iters.py: Added is_iterable()
1880	04/16/2012 04:11 PM	Aaron Marcuse-Kubitza	parallel.py: prepickle(): Pickle all objects in vars_id_dict_ by ID, not just unpicklable ones. This ensures that a DB connection created in the main process will be shared with subprocesses by reference (id()) instead of by value, so that each process can take advantage of e.g. shared caches in the connection object. Note that this may require some synchronization.
1879	04/16/2012 04:06 PM	Aaron Marcuse-Kubitza	parallel.py: MultiProducerPool.main_loop(): Got rid of no longer correct doc comment
1878	04/16/2012 04:05 PM	Aaron Marcuse-Kubitza	bin/map: Share on_error with the pool
1877	04/16/2012 04:05 PM	Aaron Marcuse-Kubitza	parallel.py: MultiProducerPool: Pickle objects by ID if they're accessible to the main_loop process. This should allow e.g. DB connections and pools to be pickled, if they were defined in the main process.
1876	04/14/2012 09:31 PM	Aaron Marcuse-Kubitza	Added dicts.py with id_dict() and MergeDict
1875	04/14/2012 09:30 PM	Aaron Marcuse-Kubitza	Added collection.py with rmap()
1874	04/14/2012 07:38 PM	Aaron Marcuse-Kubitza	db_xml.py: put(): Moved pool.apply_async() from put_child() to put_(), and don't use lambdas because they can't be pickled
1873	04/14/2012 07:35 PM	Aaron Marcuse-Kubitza	parallel.py: MultiProducerPool.apply_async(): Prepickle all function args. Try pickling the args before the queue pickles them, to get better debugging output.
1872	04/14/2012 07:33 PM	Aaron Marcuse-Kubitza	sql.py: with_savepoint(): Use new rand.rand_int()
1871	04/14/2012 07:33 PM	Aaron Marcuse-Kubitza	rand.py: rand_int(): Fixed bug where newly created objects did not have unique IDs because they were on the stack. So, we have to use random.randint() anyway.
1870	04/14/2012 07:27 PM	Aaron Marcuse-Kubitza	Added rand.py
1869	04/14/2012 06:56 PM	Aaron Marcuse-Kubitza	sql.py: DbConn: Made it picklable by establishing a connection on demand
1868	04/14/2012 06:54 PM	Aaron Marcuse-Kubitza	bin/map: Also consume asynchronous tasks before closing the DB connection (this is where most if not all tasks will be consumed)
1867	04/14/2012 06:44 PM	Aaron Marcuse-Kubitza	Runnable.py: Made it picklable
1866	04/14/2012 06:44 PM	Aaron Marcuse-Kubitza	Added eval_.py
1865	04/14/2012 05:35 PM	Aaron Marcuse-Kubitza	Added Runnable
1864	04/14/2012 03:05 PM	Aaron Marcuse-Kubitza	db_xml.py: put(): Added parallel processing support for inserting children with fkeys to parent asynchronously
1863	04/14/2012 03:03 PM	Aaron Marcuse-Kubitza	parallel.py: Fixed bugs: Added self param to instance methods and inner classes where needed
1862	04/14/2012 02:32 PM	Aaron Marcuse-Kubitza	parallel.py: Changed to use multi-producer pool, which requires calling pool.main_loop()
1861	04/14/2012 01:04 PM	Aaron Marcuse-Kubitza	parallel.py: Pool: Added doc comment
1860	04/14/2012 01:03 PM	Aaron Marcuse-Kubitza	parallel.py: Pool: apply_async(): Return a result object like multiprocessing.Pool.apply_async()
1859	04/14/2012 12:53 PM	Aaron Marcuse-Kubitza	bin/map: Use new parallel.py for parallel processing
1858	04/14/2012 12:51 PM	Aaron Marcuse-Kubitza	Added parallel.py for parallel processing
1857	04/14/2012 12:37 PM	Aaron Marcuse-Kubitza	bin/map: Use dummy synchronous Pool implementation if not using parallel processing
1856	04/14/2012 12:18 PM	Aaron Marcuse-Kubitza	bin/map: Use multiprocessing instead of pp for parallel processing because it's easier to use (it uses the Python threading API and doesn't require providing all the functions a task calls). Allow the user to set the cpus option to use all system CPUs (needed because in test mode, the default is 0 CPUs to turn off parallel processing).
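
The fix in r1945, and the 1-based numbering in r1946, come down to converting a row range into a LIMIT/OFFSET pair. The sketch below is minimal and assumes 0-based rows in the half-open range [start, end). `limit_offset()` and `to_user_row()` are hypothetical helpers, not the actual bin/map or sql.select() code:

```python
from typing import Optional, Tuple


def limit_offset(start: int, end: Optional[int]) -> Tuple[int, Optional[int]]:
    """Hypothetical helper: (OFFSET, LIMIT) for 0-based rows [start, end).

    Passing `end` itself as the LIMIT (the bug fixed in r1945) fetches
    `start` extra rows whenever start > 0; the LIMIT must be end - start.
    """
    if start < 0:
        raise ValueError("start must be >= 0")
    if end is None:
        return start, None  # no upper bound: fetch all remaining rows
    if end < start:
        raise ValueError("end must be >= start")
    return start, end - start


def to_user_row(row_num: int) -> int:
    """Hypothetical helper: show an internal 0-based row # as 1-based."""
    return row_num + 1


offset, limit = limit_offset(10, 25)
print(f"SELECT * FROM t LIMIT {limit} OFFSET {offset}")  # rows 10..24
```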
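
The plpgsql handling in r1937 amounts to a line filter over the dump. The sketch below is minimal and assumes the server's major version is already known. `localize_dump()` is a hypothetical stand-in for db_dump_localize, whose actual implementation is not shown here:

```python
import re
from typing import Iterable, Iterator

# Matches the statement that r1937 adds to schemas/vegbien.sql.make.
PLPGSQL_RE = re.compile(r"^CREATE PROCEDURAL LANGUAGE plpgsql;\s*$")


def localize_dump(lines: Iterable[str], server_major: int) -> Iterator[str]:
    """Hypothetical filter: drop CREATE PROCEDURAL LANGUAGE plpgsql on 9.0+.

    plpgsql is installed by default starting with PostgreSQL 9.0, so
    creating it again there fails because the language already exists.
    """
    for line in lines:
        if server_major >= 9 and PLPGSQL_RE.match(line):
            continue  # skip the statement on servers that already have plpgsql
        yield line
```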
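
r1928 describes a stream wrapper that counts lines as they are read and refuses writes. The sketch below shows that idea and assumes a text-mode input stream. The class and exception names follow the commit message, but the method set shown here is an assumption:

```python
import csv
import io


class InputStreamsOnlyException(Exception):
    """Raised when something tries to write to an input-only stream."""


class LineCountInputStream:
    """Sketch: wrap a readable text stream and count lines as they are read."""

    def __init__(self, stream):
        self.stream = stream
        self.line_num = 0

    def readline(self) -> str:
        line = self.stream.readline()
        if line:  # "" signals the end of the stream
            self.line_num += 1
        return line

    def __iter__(self):
        # Call readline() until it returns the end-of-stream sentinel "".
        return iter(self.readline, "")

    def write(self, data):
        raise InputStreamsOnlyException()


stream = LineCountInputStream(io.StringIO("a,b\n1,2\n3,4\n"))
rows = list(csv.reader(stream))
print(stream.line_num)  # 3
```

Because line_num counts physical lines, a CSV record that contains a quoted embedded newline counts as more than one line.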
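
The sketch below shows csv2ddl (r1920, r1925) together with the case-preserving schema quoting from r1921. It assumes every column is created as text and uses the hypothetical schema name "ACAD". The real script's column typing and output format are not shown here:

```python
import csv
import io
from typing import TextIO, Tuple


def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier so that its case is preserved."""
    return '"' + name.replace('"', '""') + '"'


def csv2ddl(stream: TextIO, schema: str, table: str) -> Tuple[str, str]:
    """Hypothetical helper: CREATE TABLE and COPY statements from a CSV header.

    Assumption: every column is typed as text, enough to stage raw values.
    """
    header = next(csv.reader(stream))
    qualified = quote_ident(schema) + "." + quote_ident(table)
    cols = ",\n".join("    " + quote_ident(col) + " text" for col in header)
    create = "CREATE TABLE " + qualified + " (\n" + cols + "\n);"
    copy = "COPY " + qualified + " FROM STDIN WITH CSV HEADER;"
    return create, copy


create, copy = csv2ddl(io.StringIO("id,Name\n1,x\n"), "ACAD", "specimens")
print(create)
print(copy)
```

The COPY statement uses HEADER, so the full CSV stream, header included, can be piped to it unchanged.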
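
The caching changes from r1891 to r1909 combine several rules:

- key the cache on the unmogrified query plus its params (r1894);
- make dict params hashable (r1907, r1909);
- cache successful inserts (r1905);
- cache exceptions only for inserts (r1906).

The sketch below implements those rules, using sqlite3 as a stand-in for the PostgreSQL/MySQL connection. `CachingConn` and its parameters are hypothetical:

```python
import sqlite3


def make_hashable(value):
    """Recursively turn dicts and lists into tuples so they can be dict keys."""
    if isinstance(value, dict):
        # Assumes string keys, so sorting never has to compare the values.
        return tuple(sorted((k, make_hashable(v)) for k, v in value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(make_hashable(v) for v in value)
    return value


class CachingConn:
    """Hypothetical sketch of a connection-wide cache keyed on (query, params)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.cache = {}
        self.hits = 0

    def run_query(self, query, params=(), cacheable=False, is_insert=False):
        # Key on the unmogrified query plus its params, not a mogrified string.
        key = (query, make_hashable(params))
        if cacheable and key in self.cache:
            self.hits += 1
            result = self.cache[key]
            if isinstance(result, Exception):
                raise result  # replay a cached failure
            return result
        try:
            rows = self.conn.execute(query, params).fetchall()  # read all rows
        except sqlite3.Error as e:
            if cacheable and is_insert:
                self.cache[key] = e  # an invalid insert will always be invalid
            raise
        if cacheable:
            self.cache[key] = rows
        return rows


db = CachingConn(sqlite3.connect(":memory:"))
db.run_query("CREATE TABLE t (name TEXT UNIQUE)")
insert = "INSERT INTO t (name) VALUES (?)"
db.run_query(insert, ("a",), cacheable=True, is_insert=True)
db.run_query(insert, ("a",), cacheable=True, is_insert=True)  # cache hit: not re-run
```

Caching SELECT results this way is only safe while the underlying rows do not change during the run.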
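
r1890 fixed an infinite recursion in Proxy.py by removing its setattr() override. The original class is not shown here. The sketch below is an assumption: a minimal delegating proxy without __setattr__, whose subclasses can store their own state with plain self. assignments:

```python
class Proxy:
    """Sketch: forward attribute lookups that fail on the proxy to a wrapped object."""

    def __init__(self, inner):
        self.inner = inner  # plain assignment works: no __setattr__ override

    def __getattr__(self, name):
        # Only called when normal lookup fails. If "inner" itself is missing
        # (e.g. before __init__ runs), fail instead of recursing forever.
        if name == "inner":
            raise AttributeError(name)
        return getattr(self.inner, name)


class CountingList(Proxy):
    """Hypothetical subclass that keeps its own counter via self."""

    def __init__(self, inner):
        Proxy.__init__(self, inner)
        self.appends = 0

    def append(self, item):
        self.appends += 1
        self.inner.append(item)


lst = CountingList([1, 2])
lst.append(3)
print(lst.inner, lst.appends, lst.count(3))  # [1, 2, 3] 1 1
```

Implicit special-method lookups such as len(proxy) bypass __getattr__, so a proxy like this only forwards explicit attribute access.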
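
The sketch below shows rmap() as described in r1882 and r1884. The signature rmap(func, value, levels=None) and the func(value, is_leaf) calling convention are assumptions based on the commit messages:

```python
def rmap(func, value, levels=None):
    """Sketch: recursively map func(value, is_leaf) over nested lists/tuples.

    Only exact lists and tuples are recursed into; other values (strings,
    dicts, generators, namedtuples) are passed to func whole. levels limits
    the recursion depth; None means no limit.
    """
    is_seq = type(value) in (list, tuple)
    if is_seq and (levels is None or levels > 0):
        next_levels = None if levels is None else levels - 1
        return type(value)(rmap(func, v, next_levels) for v in value)
    return func(value, not is_seq)  # is_leaf is False only when levels ran out


def double_leaves(v, is_leaf):
    """Hypothetical example func: double leaves, pass unexpanded sequences through."""
    return v * 2 if is_leaf else v


print(rmap(double_leaves, [1, (2, 3), [4]]))  # [2, (4, 6), [8]]
print(rmap(double_leaves, [1, [4]], levels=1))  # [2, [4]]
```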
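
r1869 made DbConn picklable by connecting on demand, and r1915 stopped pickling its debug callback. The sketch below shows that pattern, with sqlite3 standing in for the real database driver. `LazyConn` and its attribute names are hypothetical:

```python
import pickle
import sqlite3


class LazyConn:
    """Hypothetical sketch: a picklable DB connection wrapper that connects on demand."""

    def __init__(self, db_path, log_debug=None):
        self.db_path = db_path
        self.log_debug = log_debug  # optional callback, e.g. a lambda
        self._conn = None

    @property
    def conn(self):
        if self._conn is None:  # first use in this process: connect now
            self._conn = sqlite3.connect(self.db_path)
        return self._conn

    def __getstate__(self):
        # Live connections and lambdas cannot be pickled, so drop both; the
        # unpickled copy reconnects lazily on its first query.
        state = self.__dict__.copy()
        state["_conn"] = None
        state["log_debug"] = None
        return state


db = LazyConn(":memory:", log_debug=lambda msg: None)
db.conn.execute("SELECT 1")
clone = pickle.loads(pickle.dumps(db))
```

An unpickled copy opens its own new connection, so with ":memory:" it sees an empty database rather than the original's data.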