/ - Changes - BIEN 3 - NCEAS Projects

root @ 2118

#	Date	Author	Comment
2118	05/09/2012 01:51 AM	Aaron Marcuse-Kubitza	bin/map: verbose_errors also defaults to on in debug mode
2117	05/09/2012 01:39 AM	Aaron Marcuse-Kubitza	sql.py: add_row_num(): Make the row number column the primary key
2116	05/09/2012 12:36 AM	Aaron Marcuse-Kubitza	csv2db: Use new sql.cleanup_table() to map NULL-equivalents to NULL. Consider the empty string to be NULL.
2115	05/09/2012 12:35 AM	Aaron Marcuse-Kubitza	sql.py: Added cleanup_table()
2114	05/09/2012 12:33 AM	Aaron Marcuse-Kubitza	csvs.py: Added row filters
2113	05/07/2012 11:14 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Fixed bug where relational functions were not being treated as value nodes, and thus their containing child was treated as a child with a backwards pointer instead of a field
2112	05/07/2012 11:12 PM	Aaron Marcuse-Kubitza	xml_func.py: Added is_func() and is_xml_func() and use them where their definitions were used
2111	05/07/2012 10:40 PM	Aaron Marcuse-Kubitza	db_xml.py: Added value() and use it where xml_dom.first_elem() was used
2110	05/07/2012 10:12 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Latitude/Longitude: Moved _toDouble directly after the output col name, so that it's run after any translation functions (which all return strings). *ElevationInMeters: Added _toDouble around all output cols.
2109	05/07/2012 09:56 PM	Aaron Marcuse-Kubitza	xpath.py: get(): Create attrs: Fixed bug where attrs were created with last_only on, which caused attrs to get created multiple times if there were multiple attrs of the same name but different values, because the last_only optimization would only check the last attr of that name
2108	05/07/2012 09:19 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Latitude/Longitude: Use new _toDouble to convert strings to doubles (needed for by_col)
2107	05/07/2012 09:16 PM	Aaron Marcuse-Kubitza	schemas/functions.sql: Added _toDouble
2106	05/07/2012 09:16 PM	Aaron Marcuse-Kubitza	bin/map: When calling xml_func.process(), pass DB connection if available
2105	05/07/2012 09:15 PM	Aaron Marcuse-Kubitza	xml_func.py: process(): If DB with relational functions available (passed in via db param), call any non-local XML functions as relational funcs
2104	05/07/2012 09:09 PM	Aaron Marcuse-Kubitza	sql.py: put(): pkey param (now pkey_) defaults to table's pkey
2103	05/07/2012 08:30 PM	Aaron Marcuse-Kubitza	bin/map: by_col: In debug mode, print stripped XML tree that guides import
2102	05/07/2012 08:03 PM	Aaron Marcuse-Kubitza	vegbien_dest: Fixed bug where there was a missing line continuation char before schemas var
2101	05/07/2012 08:02 PM	Aaron Marcuse-Kubitza	sql.py: DbConn: Fixed bug where schemas db_config value needed to be split apart into strings. Fixed bug where current_setting() returned a value rather than an identifier, so it had to be used with set_config() instead of SET, and run after SET TRANSACTION ISOLATION LEVEL. Moved Input validation section before Database connections because it's used by Database connections.
2100	05/07/2012 07:29 PM	Aaron Marcuse-Kubitza	Regenerated vegbien.ERD exports
2099	05/07/2012 07:26 PM	Aaron Marcuse-Kubitza	vegbien.ERD.mwb: Changed lines to a configuration that MySQLWorkbench wouldn't keep resetting whenever the ERD was reopened
2098	05/07/2012 07:21 PM	Aaron Marcuse-Kubitza	vegbien_dest: Added "functions" to schemas
2097	05/07/2012 07:20 PM	Aaron Marcuse-Kubitza	sql.py: db_config: Added schemas param. DbConn: Use any schemas db_config value to set search_path.
2096	05/07/2012 06:58 PM	Aaron Marcuse-Kubitza	sql.py: add_row_num(): Name the column "_row_num" so that it doesn't conflict with any "row_num" column that's part of the table schema
2095	05/07/2012 06:50 PM	Aaron Marcuse-Kubitza	main Makefile: VegBIEN DB: functions schema: Renamed schemas/functions/clear to .../reset to reflect that it also resets the schema to what's in the dump file. schemas/functions/reset: Use now-available schemas/functions.sql to create the schema.
2094	05/07/2012 06:45 PM	Aaron Marcuse-Kubitza	Added autogen schemas/functions.sql
2093	05/07/2012 06:41 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql.make: Use new pg_dump_vegbien
2092	05/07/2012 06:41 PM	Aaron Marcuse-Kubitza	Added pg_dump_vegbien to dump a schema of the vegbien db
2091	05/07/2012 06:34 PM	Aaron Marcuse-Kubitza	main Makefile: VegBIEN DB: Added functions schema targets
2090	05/07/2012 06:09 PM	Aaron Marcuse-Kubitza	Makefile: $(confirm): Support a separate line outside of the highlighted line. Include the "Continue?" in the macro since all prompts include it.
2089	05/07/2012 05:55 PM	Aaron Marcuse-Kubitza	Makefile: VegBIEN DB: Display different warning message depending on whether entire DB or just current public schema is being deleted
2088	05/07/2012 05:38 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Recurse into forward pointers
2087	05/05/2012 09:55 PM	Aaron Marcuse-Kubitza	sql.py: put_table(): Take multiple in_tables. Initial implementation just used the first in_table.
2086	05/05/2012 09:48 PM	Aaron Marcuse-Kubitza	sql.py: Added add_row_num(). put_table(): Add row_num to pkeys_table, so it can be joined with in_table's pkeys.
2085	05/05/2012 09:38 PM	Aaron Marcuse-Kubitza	sql.py: Added run_query_into() and use it in insert_select()
2084	05/05/2012 08:53 PM	Aaron Marcuse-Kubitza	sql.py: pkey(): Support escaped table names
2083	05/05/2012 07:32 PM	Aaron Marcuse-Kubitza	sql.py: mk_insert_select(): embeddable: Name the function alias "f" since it will just be wrapped in a nested SELECT, so the exact name doesn't matter (and won't be visible outside the nested SELECT anyway)
2082	05/05/2012 07:08 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Return the (table, col) where the pkeys are made available, now that this information is available from sql.put_table()
2081	05/05/2012 07:05 PM	Aaron Marcuse-Kubitza	sql.py: put_table(): Return just the name of the table where the pkeys are made available, since the column name in that table now equals the pkey name
2080	05/05/2012 06:58 PM	Aaron Marcuse-Kubitza	sql.py: mk_insert_select(): embeddable: Make the column returned by the function have the same name as the returning column
2079	05/05/2012 06:39 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Use new sql.put_table()
2078	05/05/2012 06:39 PM	Aaron Marcuse-Kubitza	sql.py: Added put_table()
2077	05/05/2012 06:37 PM	Aaron Marcuse-Kubitza	sql.py: Added clean_name(). Use it where needed to make an escaped name appendable as a string.
2076	05/05/2012 05:53 PM	Aaron Marcuse-Kubitza	sql.py: Added with_parsed_errors() and use it in try_insert()
2075	05/05/2012 05:30 PM	Aaron Marcuse-Kubitza	sql.py: insert_select(): into != None: Fixed bug where cacheable was not passed through to DROP TABLE's run_query(), even though it was passed through to CREATE TABLE AS's run_query()
2074	05/05/2012 05:27 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Place pkeys in temp table
2073	05/05/2012 05:26 PM	Aaron Marcuse-Kubitza	sql.py: mk_insert_select(): Document that embeddable will cause the query to be fully cached, not just if it raises an exception. insert_select(): into != None: Pass recover and cacheable through to each run_query()
2072	05/05/2012 05:17 PM	Aaron Marcuse-Kubitza	sql.py: insert_select(): Support placing RETURNING values in temp table
2071	05/05/2012 04:40 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Support returning pkey from INSERT SELECT
2070	05/05/2012 04:38 PM	Aaron Marcuse-Kubitza	sql.py: mk_insert_select(): Support using an INSERT RETURNING statement as a nested SELECT
2069	05/04/2012 07:15 PM	Aaron Marcuse-Kubitza	sql.py: mk_insert_select(): Removed unused params recover and cacheable
2068	05/04/2012 07:10 PM	Aaron Marcuse-Kubitza	sql.py: Added mogrify()
2067	05/04/2012 07:00 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Corrected @return doc
2066	05/04/2012 06:32 PM	Aaron Marcuse-Kubitza	sql.py: Added mk_insert_select() and use it in insert_select()
2065	05/04/2012 06:21 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Use new insert_select()
2064	05/04/2012 06:15 PM	Aaron Marcuse-Kubitza	sql.py: insert_select(): Changed order of cols and params arguments so select_query and params would be together
2063	05/04/2012 06:12 PM	Aaron Marcuse-Kubitza	sql.py: Added insert_select() and use it in insert()
2062	05/04/2012 04:55 PM	Aaron Marcuse-Kubitza	Calls to sql.esc_name*(): Removed preserve_case=True because it is now the default
2061	05/04/2012 04:51 PM	Aaron Marcuse-Kubitza	sql.py: esc_name_by_module(): Changed preserve_case to ignore_case, which defaults to False
2060	05/04/2012 04:49 PM	Aaron Marcuse-Kubitza	Calls to sql.esc_name*(): Removed preserve_case=True because it is now the default
2059	05/04/2012 04:47 PM	Aaron Marcuse-Kubitza	sql.py: esc_name_by_module(): preserve_case defaults to True
2058	05/04/2012 04:44 PM	Aaron Marcuse-Kubitza	sql.py: mk_select(): Escape all names used (table, column, cond, etc.)
2057	05/04/2012 04:33 PM	Aaron Marcuse-Kubitza	sql.py: esc_name_by_module(): If not enclosing name in quotes, call check_name() on it
2056	05/04/2012 04:30 PM	Aaron Marcuse-Kubitza	sql.py: mk_select(): Support literal values in the list of cols to select
2055	05/04/2012 03:22 PM	Aaron Marcuse-Kubitza	sql.py: mk_select(): Don't escape the table name, because it will either be check_name()d or it's already been escaped
2054	05/04/2012 03:11 PM	Aaron Marcuse-Kubitza	sql.py: Added mk_select(), and use it in select()
2053	05/04/2012 02:14 PM	Aaron Marcuse-Kubitza	bin/map: Always pass qual_name(table) to sql.select(). This is possible now that qual_name() can handle None schemas.
2052	05/04/2012 02:08 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Take separate in_table and in_schema names, instead of in_table and table_is_esc, because the in_schema is needed to scope the temp tables appropriately
2051	05/04/2012 02:04 PM	Aaron Marcuse-Kubitza	sql.py: qual_name(): If schema is None, don't prepend schema
2050	05/03/2012 06:59 PM	Aaron Marcuse-Kubitza	bin/map, sql.py: Turned SQL query caching back on because benchmarks of just the caching on vs. off reveal that it does reduce processing time significantly. However, there is a slowdown that was introduced between the time caching was added and the time the same XML tree was used for each node, which was giving the false indication that the slowdown was due to the caching.
2049	05/03/2012 06:44 PM	Aaron Marcuse-Kubitza	bin/map: Turn SQL query caching off by default
2048	05/03/2012 06:39 PM	Aaron Marcuse-Kubitza	bin/map: Added cache_sql env var to enable SQL query caching
2047	05/03/2012 06:39 PM	Aaron Marcuse-Kubitza	sql.py: Make caching DbConn enablable. Turn caching off by default because recent benchmarks (n=1000) were showing that it slows things down.
2046	05/03/2012 04:53 PM	Aaron Marcuse-Kubitza	bin/map: Added new verbose_errors mode, enabled in test mode and off otherwise, which controls whether the output row and tracebacks are included in error messages. Having this off in import mode will reduce the size of error logs so they don't fill up the vegbiendev hard disk as quickly.
2045	05/03/2012 04:51 PM	Aaron Marcuse-Kubitza	exc.py: print_ex(): Added detail option to turn off traceback
2044	05/03/2012 04:10 PM	Aaron Marcuse-Kubitza	bin/map: Turn parallel processing off by default. This should fix "Cannot allocate memory" errors in large imports.
2043	05/01/2012 07:58 AM	Aaron Marcuse-Kubitza	bin/map: in_is_db: Don't cache the main SELECT query
2042	05/01/2012 07:56 AM	Aaron Marcuse-Kubitza	bin/map: by_col: Use the created template, which already has the column names in it, instead of mapping a sample row
2041	05/01/2012 07:50 AM	Aaron Marcuse-Kubitza	bin/map: Fixed bug where db_xml could not be imported twice, or it was treated as an undefined variable for some reason
2040	05/01/2012 07:45 AM	Aaron Marcuse-Kubitza	bin/map: map_table(): Make each column a db_xml.ColRef instead of a bare index, so that it will appear as the column name when converted to a string. This will provide better debugging info in the template tree and also avoid needing to create a separate sample row in by_col.
2039	05/01/2012 07:33 AM	Aaron Marcuse-Kubitza	db_xml.py: Added ColRef
2038	05/01/2012 06:33 AM	Aaron Marcuse-Kubitza	bin/map: Fixed bug where row count was off by one if all rows in the input were exhausted, because the row that raises StopIteration was counting as a row
2037	05/01/2012 06:13 AM	Aaron Marcuse-Kubitza	main Makefile: VegBIEN DB: mk_db: Use template1 because it has PROCEDURAL LANGUAGE plpgsql already installed and we aren't using an encoding other than UTF8
2036	05/01/2012 06:11 AM	Aaron Marcuse-Kubitza	Moved "CREATE PROCEDURAL LANGUAGE plpgsql" to main Makefile so that it would only run when the DB is created, not when the public schema is reinstalled. This is only relevant on PostgreSQL < 9.x, where the plpgsql language is not part of template0.
2035	05/01/2012 05:56 AM	Aaron Marcuse-Kubitza	Renamed parallel.py to parallelproc.py to avoid conflict with new system parallel module on vegbiendev
2034	05/01/2012 05:43 AM	Aaron Marcuse-Kubitza	Makefile: VegBIEN DB: public schema: Added schemas/rotate
2033	05/01/2012 05:34 AM	Aaron Marcuse-Kubitza	bin/map: Fixed bug in input rows processed count where the count would be off by 1, because the for loop would leave i at the index of the last row instead of one-past-the-last
2032	05/01/2012 04:44 AM	Aaron Marcuse-Kubitza	bin/map: Use the same XML tree for each row in DB outputs, to eliminate time spent creating the tree from the XPaths for each row
2031	05/01/2012 04:08 AM	Aaron Marcuse-Kubitza	bin/map: map_table(): Resolve each prefix into a separate mapping, which is collision-eliminated, instead of resolving values from multiple prefixes when each individual row is mapped
2030	05/01/2012 03:50 AM	Aaron Marcuse-Kubitza	bin/map: Moved collision-prevention code to map_rows() so it would only run if there were mappings, and so that it would run after any mappings preprocessing by map_table() that creates more collisions
2029	05/01/2012 03:45 AM	Aaron Marcuse-Kubitza	bin/map: Prevent collisions if multiple inputs mapping to same output
2028	05/01/2012 02:02 AM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Mapped collectorNumber and recordNumber to recordNumber with _alt so they wouldn't collide when every input column, even empty ones, are created in the XML tree
2027	05/01/2012 12:42 AM	Aaron Marcuse-Kubitza	bin/map: If out_is_db, in debug mode, print each row's XML tree and each value that it's putting
2026	05/01/2012 12:36 AM	Aaron Marcuse-Kubitza	bin/map: If out_is_db, in debug mode, print the template XML tree used to insert a sample row into the DB
2025	04/30/2012 11:57 PM	Aaron Marcuse-Kubitza	bin/map: map_table(): When translating mappings to column indexes, use appends to a new list instead of deletions from an existing list to simplify the algorithm
2024	04/30/2012 11:20 PM	Aaron Marcuse-Kubitza	union: Omit mappings that are mapped to in the input map, in addition to mappings that were overridden. This prevents multiple outputs being created for both the renamed and original mappings, causing duplicate output nodes when one XML tree is used for all rows.
2023	04/30/2012 11:18 PM	Aaron Marcuse-Kubitza	union: Omit mappings that are mapped to in the input map, in addition to mappings that were overridden. This prevents multiple outputs being created for both the renamed and original mappings, causing duplicate output nodes when one XML tree is used for all rows.
2022	04/30/2012 11:17 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: Via maps cleanup: subtract: Include comment column so commented mappings are never removed
2021	04/30/2012 11:07 PM	Aaron Marcuse-Kubitza	subtract: Support "ragged rows" that have fewer columns than the specified column numbers
2020	04/30/2012 11:06 PM	Aaron Marcuse-Kubitza	util.py: list_subset(): Added default param to specify the value to use for invalid indexes (if any)
2019	04/30/2012 09:44 AM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: Mappings with multiple inputs for the same output: Use _alt, etc. to map the multiple inputs to different places in the XML tree, so that when using a pregenerated tree, the empty leaves for each input will not collide with each other
