# Date Author Comment
2082 05/05/2012 07:08 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Return the (table, col) where the pkeys are made available, now that this information is available from sql.put_table()

2081 05/05/2012 07:05 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Return just the name of the table where the pkeys are made available, since the column name in that table now equals the pkey name

2080 05/05/2012 06:58 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): embeddable: Make the column returned by the function have the same name as the returning column

2079 05/05/2012 06:39 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Use new sql.put_table()

2078 05/05/2012 06:39 PM Aaron Marcuse-Kubitza

sql.py: Added put_table()

2077 05/05/2012 06:37 PM Aaron Marcuse-Kubitza

sql.py: Added clean_name(). Use it where needed to make an escaped name appendable as a string.

2076 05/05/2012 05:53 PM Aaron Marcuse-Kubitza

sql.py: Added with_parsed_errors() and use it in try_insert()

2075 05/05/2012 05:30 PM Aaron Marcuse-Kubitza

sql.py: insert_select(): into != None: Fixed bug where cacheable was not passed through to DROP TABLE's run_query(), even though it was passed through to CREATE TABLE AS's run_query()

2074 05/05/2012 05:27 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Place pkeys in temp table

2073 05/05/2012 05:26 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): Document that embeddable will cause the query to be fully cached, not just if it raises an exception. insert_select(): into != None: Pass recover and cacheable through to each run_query()

2072 05/05/2012 05:17 PM Aaron Marcuse-Kubitza

sql.py: insert_select(): Support placing RETURNING values in temp table

2071 05/05/2012 04:40 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Support returning pkey from INSERT SELECT

2070 05/05/2012 04:38 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): Support using an INSERT RETURNING statement as a nested SELECT

2069 05/04/2012 07:15 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): Removed unused params recover and cacheable

2068 05/04/2012 07:10 PM Aaron Marcuse-Kubitza

sql.py: Added mogrify()

2067 05/04/2012 07:00 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Corrected @return doc

2066 05/04/2012 06:32 PM Aaron Marcuse-Kubitza

sql.py: Added mk_insert_select() and use it in insert_select()

2065 05/04/2012 06:21 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Use new insert_select()

2064 05/04/2012 06:15 PM Aaron Marcuse-Kubitza

sql.py: insert_select(): Changed order of cols and params arguments so select_query and params would be together

2063 05/04/2012 06:12 PM Aaron Marcuse-Kubitza

sql.py: Added insert_select() and use it in insert()

2062 05/04/2012 04:55 PM Aaron Marcuse-Kubitza

Calls to sql.esc_name*(): Removed preserve_case=True because it is now the default

2061 05/04/2012 04:51 PM Aaron Marcuse-Kubitza

sql.py: esc_name_by_module(): Changed preserve_case to ignore_case, which defaults to False

2060 05/04/2012 04:49 PM Aaron Marcuse-Kubitza

Calls to sql.esc_name*(): Removed preserve_case=True because it is now the default

2059 05/04/2012 04:47 PM Aaron Marcuse-Kubitza

sql.py: esc_name_by_module(): preserve_case defaults to True

2058 05/04/2012 04:44 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): Escape all names used (table, column, cond, etc.)

2057 05/04/2012 04:33 PM Aaron Marcuse-Kubitza

sql.py: esc_name_by_module(): If not enclosing name in quotes, call check_name() on it

2056 05/04/2012 04:30 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): Support literal values in the list of cols to select

2055 05/04/2012 03:22 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): Don't escape the table name, because it will either be check_name()d or it's already been escaped

2054 05/04/2012 03:11 PM Aaron Marcuse-Kubitza

sql.py: Added mk_select(), and use it in select()

2053 05/04/2012 02:14 PM Aaron Marcuse-Kubitza

bin/map: Always pass qual_name(table) to sql.select(). This is possible now that qual_name() can handle None schemas.

2052 05/04/2012 02:08 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Take separate in_table and in_schema names, instead of in_table and table_is_esc, because the in_schema is needed to scope the temp tables appropriately

2051 05/04/2012 02:04 PM Aaron Marcuse-Kubitza

sql.py: qual_name(): If schema is None, don't prepend schema
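A minimal sketch of the behavior described in revision 2051, assuming a hypothetical two-argument signature and omitting the name escaping that the real sql.py function may perform:

```python
def qual_name(schema, table):
    """Hypothetical sketch: qualify a table name with an optional schema.

    The signature and the lack of escaping are assumptions made for
    illustration; they are not taken from sql.py.
    """
    if schema is None:
        # No schema given: return the table name unqualified
        return table
    return schema + '.' + table
```

With this shape, a caller can always qualify the table name without first checking whether a schema was supplied.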

2050 05/03/2012 06:59 PM Aaron Marcuse-Kubitza

bin/map, sql.py: Turned SQL query caching back on because benchmarks that vary only caching on vs. off show that it reduces processing time significantly. The slowdown previously attributed to caching was actually introduced between the time caching was added and the time the same XML tree was used for each node.

2049 05/03/2012 06:44 PM Aaron Marcuse-Kubitza

bin/map: Turn SQL query caching off by default

2048 05/03/2012 06:39 PM Aaron Marcuse-Kubitza

bin/map: Added cache_sql env var to enable SQL query caching

2047 05/03/2012 06:39 PM Aaron Marcuse-Kubitza

sql.py: Make DbConn caching optional. Turn caching off by default because recent benchmarks (n=1000) showed that it slows things down.

2046 05/03/2012 04:53 PM Aaron Marcuse-Kubitza

bin/map: Added new verbose_errors mode, enabled in test mode and off otherwise, which controls whether the output row and tracebacks are included in error messages. Having this off in import mode will reduce the size of error logs so they don't fill up the vegbiendev hard disk as quickly.

2045 05/03/2012 04:51 PM Aaron Marcuse-Kubitza

exc.py: print_ex(): Added detail option to turn off traceback

2044 05/03/2012 04:10 PM Aaron Marcuse-Kubitza

bin/map: Turn parallel processing off by default. This should fix "Cannot allocate memory" errors in large imports.

2043 05/01/2012 07:58 AM Aaron Marcuse-Kubitza

bin/map: in_is_db: Don't cache the main SELECT query

2042 05/01/2012 07:56 AM Aaron Marcuse-Kubitza

bin/map: by_col: Use the created template, which already has the column names in it, instead of mapping a sample row

2041 05/01/2012 07:50 AM Aaron Marcuse-Kubitza

bin/map: Fixed bug where db_xml could not be imported twice, or it was treated as an undefined variable for some reason

2040 05/01/2012 07:45 AM Aaron Marcuse-Kubitza

bin/map: map_table(): Make each column a db_xml.ColRef instead of a bare index, so that it will appear as the column name when converted to a string. This will provide better debugging info in the template tree and also avoid needing to create a separate sample row in by_col.

2039 05/01/2012 07:33 AM Aaron Marcuse-Kubitza

db_xml.py: Added ColRef

2038 05/01/2012 06:33 AM Aaron Marcuse-Kubitza

bin/map: Fixed bug where row count was off by one if all rows in the input were exhausted, because the row that raises StopIteration was counting as a row

2037 05/01/2012 06:13 AM Aaron Marcuse-Kubitza

main Makefile: VegBIEN DB: mk_db: Use template1 because it has PROCEDURAL LANGUAGE plpgsql already installed and we aren't using an encoding other than UTF8

2036 05/01/2012 06:11 AM Aaron Marcuse-Kubitza

Moved "CREATE PROCEDURAL LANGUAGE plpgsql" to main Makefile so that it would only run when the DB is created, not when the public schema is reinstalled. This is only relevant on PostgreSQL < 9.x, where the plpgsql language is not part of template0.

2035 05/01/2012 05:56 AM Aaron Marcuse-Kubitza

Renamed parallel.py to parallelproc.py to avoid conflict with new system parallel module on vegbiendev

2034 05/01/2012 05:43 AM Aaron Marcuse-Kubitza

Makefile: VegBIEN DB: public schema: Added schemas/rotate

2033 05/01/2012 05:34 AM Aaron Marcuse-Kubitza

bin/map: Fixed bug in input rows processed count where the count would be off by 1, because the for loop would leave i at the index of the last row instead of one-past-the-last
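The off-by-one described in revision 2033 can be illustrated with a small sketch. The function names here are hypothetical and are not taken from bin/map; they only show why a loop index left at the last row undercounts by one:

```python
def count_rows_buggy(rows):
    """Return the loop index after iteration (illustrates the bug)."""
    i = 0
    for i, row in enumerate(rows):
        pass  # process the row
    # i is the index of the last row, not the number of rows processed
    return i

def count_rows_fixed(rows):
    """Return one past the index of the last row processed."""
    count = 0
    for i, row in enumerate(rows):
        count = i + 1  # one-past-the-last index equals the row count
    return count
```

For a three-row input, the first function reports 2 and the second reports 3.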

2032 05/01/2012 04:44 AM Aaron Marcuse-Kubitza

bin/map: Use the same XML tree for each row in DB outputs, to eliminate time spent creating the tree from the XPaths for each row

2031 05/01/2012 04:08 AM Aaron Marcuse-Kubitza

bin/map: map_table(): Resolve each prefix into a separate mapping, which is collision-eliminated, instead of resolving values from multiple prefixes when each individual row is mapped

2030 05/01/2012 03:50 AM Aaron Marcuse-Kubitza

bin/map: Moved collision-prevention code to map_rows() so it would only run if there were mappings, and so that it would run after any mappings preprocessing by map_table() that creates more collisions

2029 05/01/2012 03:45 AM Aaron Marcuse-Kubitza

bin/map: Prevent collisions if multiple inputs map to the same output

2028 05/01/2012 02:02 AM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Mapped collectorNumber and recordNumber to recordNumber with _alt so they wouldn't collide when every input column, even an empty one, is created in the XML tree

2027 05/01/2012 12:42 AM Aaron Marcuse-Kubitza

bin/map: If out_is_db, in debug mode, print each row's XML tree and each value that it's putting

2026 05/01/2012 12:36 AM Aaron Marcuse-Kubitza

bin/map: If out_is_db, in debug mode, print the template XML tree used to insert a sample row into the DB

2025 04/30/2012 11:57 PM Aaron Marcuse-Kubitza

bin/map: map_table(): When translating mappings to column indexes, use appends to a new list instead of deletions from an existing list to simplify the algorithm
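A rough sketch of the append-based approach mentioned in revision 2025. The function name, the (input column, output) pair layout, and the choice to drop mappings whose column is missing are all assumptions made for illustration, not details taken from bin/map:

```python
def mappings_to_col_idxs(mappings, col_names):
    """Translate (input column name, output) pairs to (column index, output).

    Hypothetical helper: builds a new list with appends rather than
    deleting unmatched entries from the existing list while iterating.
    """
    idx_of = {name: i for i, name in enumerate(col_names)}
    translated = []
    for in_col, out in mappings:
        if in_col in idx_of:
            translated.append((idx_of[in_col], out))
        # Assumption: mappings for columns absent from the input are dropped
    return translated
```

Building a new list avoids the index shifting that makes in-place deletion during iteration error-prone.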

2024 04/30/2012 11:20 PM Aaron Marcuse-Kubitza

union: Omit mappings that are mapped to in the input map, in addition to mappings that were overridden. This prevents outputs from being created for both the renamed and original mappings, which would cause duplicate output nodes when one XML tree is used for all rows.

2023 04/30/2012 11:18 PM Aaron Marcuse-Kubitza

union: Omit mappings that are mapped to in the input map, in addition to mappings that were overridden. This prevents outputs from being created for both the renamed and original mappings, which would cause duplicate output nodes when one XML tree is used for all rows.

2022 04/30/2012 11:17 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Via maps cleanup: subtract: Include comment column so commented mappings are never removed

2021 04/30/2012 11:07 PM Aaron Marcuse-Kubitza

subtract: Support "ragged rows" that have fewer columns than the specified column numbers

2020 04/30/2012 11:06 PM Aaron Marcuse-Kubitza

util.py: list_subset(): Added default param to specify the value to use for invalid indexes (if any)
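A minimal sketch of what a list_subset() with a default param might look like. The signature, the sentinel, and the choice to raise IndexError when no default is given are assumptions for illustration, not the actual util.py code:

```python
_NO_DEFAULT = object()  # sentinel so that None can be used as a real default

def list_subset(list_, idxs, default=_NO_DEFAULT):
    """Return the items of list_ at idxs, in the order given by idxs.

    Hypothetical sketch: an out-of-range index yields `default` if one
    was supplied, and raises IndexError otherwise (an assumption).
    """
    subset = []
    for idx in idxs:
        if 0 <= idx < len(list_):
            subset.append(list_[idx])
        elif default is not _NO_DEFAULT:
            subset.append(default)
        else:
            raise IndexError('list index out of range: %d' % idx)
    return subset
```

A default of this kind is what would let a caller pick fixed column numbers out of a "ragged" row that has fewer columns than expected.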

2019 04/30/2012 09:44 AM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: Mappings with multiple inputs for the same output: Use _alt, etc. to map the multiple inputs to different places in the XML tree, so that when using a pregenerated tree, the empty leaves for each input will not collide with each other

2018 04/30/2012 09:20 AM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: Changed XPath references (using "$") to XML function references using _ref where needed to make them work even on a pre-made XML tree used by all rows

2017 04/30/2012 09:13 AM Aaron Marcuse-Kubitza

xml_func.py: Added _ref to retrieve a value from another XML node

2016 04/30/2012 06:12 AM Aaron Marcuse-Kubitza

xml_func.py: Made all functions take a 2nd node param, which contains the func node itself

2015 04/30/2012 04:15 AM Aaron Marcuse-Kubitza

bin/map: If outputting to a DB, also create output XML elements for NULL input values. This will help with the transition to using the same XML tree for all rows.

2014 04/30/2012 04:09 AM Aaron Marcuse-Kubitza

xml_func.py: _label: return None on empty input

2013 04/30/2012 03:46 AM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: Added _collapse around subtrees that need to be removed if they are created around a NULL value

2012 04/30/2012 03:40 AM Aaron Marcuse-Kubitza

xml_func.py: Added _collapse to collapse a subtree if the "value" element in it is NULL

2011 04/30/2012 01:44 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: definedvalue: Made definedvalue nullable so that each row of a datasource can have a uniform structure in VegBIEN, and to support reusing the same XML DOM tree for each row

2010 04/30/2012 01:11 AM Aaron Marcuse-Kubitza

xpath.py: Added is_xpath()

2009 04/30/2012 01:10 AM Aaron Marcuse-Kubitza

xml_dom.py: set_value(): If value is None and node is Element, remove value node entirely instead of setting node's value to None

2008 04/30/2012 01:02 AM Aaron Marcuse-Kubitza

xml_dom.py: Added value_node(). Use new value_node() in value() and set_value(). set_value(): If the node already has a value node, reuse it instead of appending a new value node.

2007 04/30/2012 12:35 AM Aaron Marcuse-Kubitza

xpath.py: put_obj(): Return the id_attr_node using get_1() because it should only be one node

2006 04/30/2012 12:30 AM Aaron Marcuse-Kubitza

xml_func.py: _simplifyPath: Also treat the elem as empty if the required node exists but is empty

2005 04/30/2012 12:04 AM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Added part of put() code that should be common to both functions

2004 04/27/2012 06:16 PM Aaron Marcuse-Kubitza

xpath.py: put_obj(): Return a tuple of the inserted node and the id attr node

2003 04/27/2012 06:13 PM Aaron Marcuse-Kubitza

xpath.py: set_id(): When creating the id_path, use obj() (which deepcopy()s the entire path) because it prevents pointers w/o targets

2002 04/27/2012 06:05 PM Aaron Marcuse-Kubitza

xpath.py: set_id(): When creating the id_path, deepcopy() the id_elem because its keys will change in the main copy

2001 04/27/2012 05:47 PM Aaron Marcuse-Kubitza

xpath.py: set_id(): Return the path to the ID attr, which can be used to change the ID

2000 04/27/2012 05:25 PM Aaron Marcuse-Kubitza

xpath.py: put_obj(): Return the inserted node so it can be used to change the inserted value

1999 04/27/2012 05:08 PM Aaron Marcuse-Kubitza

main Makefile: Maps validation: Fixed bug where there would be infinite recursion with the Maps validation section before the Subdir forwarding section (it's unknown why this is necessary)

1998 04/26/2012 07:12 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Added commit param to specify whether to commit after each query

1997 04/26/2012 06:55 PM Aaron Marcuse-Kubitza

bin/map: in_is_db: by_col: Use new put_table() (defined but not implemented yet)

1996 04/26/2012 06:54 PM Aaron Marcuse-Kubitza

db_xml.py: Added put_table() (without implementation)

1995 04/26/2012 06:52 PM Aaron Marcuse-Kubitza

xml_func.py: strip(): Remove _ignore XML funcs completely instead of replacing them with their values

1994 04/26/2012 06:26 PM Aaron Marcuse-Kubitza

bin/map: in_is_db: by_col: Prefix each input column name by "$"

1993 04/26/2012 06:11 PM Aaron Marcuse-Kubitza

bin/map: in_is_db: by_col: Strip off XML functions

1992 04/26/2012 06:09 PM Aaron Marcuse-Kubitza

xml_func.py: Added strip(). pop_value(): Support custom name of value param.

1991 04/26/2012 05:44 PM Aaron Marcuse-Kubitza

bin/map: in_is_db: by_col: Create XML tree of sample row, with the input column names as the values. This tree will guide the sequencing and creation of the column-based queries.

1990 04/26/2012 05:43 PM Aaron Marcuse-Kubitza

input.Makefile: use_staged env var: defaults to on if by_col is on

1989 04/26/2012 05:00 PM Aaron Marcuse-Kubitza

bin/map: Only turn on by_col optimization if mapping to same DB, rather than requiring each place that checks by_col to also check whether mapping to same DB

1988 04/24/2012 06:32 PM Aaron Marcuse-Kubitza

input.Makefile: Testing: Don't abort tester if only staging test fails, in case staging table missing

1987 04/24/2012 06:25 PM Aaron Marcuse-Kubitza

input.Makefile: Testing: When cleaning up test outputs, remove everything that doesn't end in .ref

1986 04/24/2012 06:11 PM Aaron Marcuse-Kubitza

input.Makefile: Testing: Added test/import.%.staging.out test to test the staging tables. Sources: cat: Updated Usage comment to include the "inputs/<datasrc>/" prefix the user would need to add when running make.

1985 04/24/2012 05:33 PM Aaron Marcuse-Kubitza

bin/map: Fixed bug where mapping to same DB wouldn't work because by-column optimization wasn't implemented yet, by turning it off by default and allowing it to be enabled with an env var

1984 04/24/2012 05:25 PM Aaron Marcuse-Kubitza

bin/map: DB inputs: Use by-column optimization if mapping to same DB (with skeleton code for optimization's implementation)

1983 04/24/2012 05:12 PM Aaron Marcuse-Kubitza

input.Makefile: Mapping: Use the staging tables instead of any flat files if use_staged is specified