/ - Changes - BIEN 3 - NCEAS Projects

root @ 2033

#	Date	Author	Comment
2033	05/01/2012 05:34 AM	Aaron Marcuse-Kubitza	bin/map: Fixed bug in input rows processed count where the count would be off by 1, because the for loop would leave i at the index of the last row instead of one-past-the-last
2032	05/01/2012 04:44 AM	Aaron Marcuse-Kubitza	bin/map: Use the same XML tree for each row in DB outputs, to eliminate time spent creating the tree from the XPaths for each row
2031	05/01/2012 04:08 AM	Aaron Marcuse-Kubitza	bin/map: map_table(): Resolve each prefix into a separate mapping, which is collision-eliminated, instead of resolving values from multiple prefixes when each individual row is mapped
2030	05/01/2012 03:50 AM	Aaron Marcuse-Kubitza	bin/map: Moved collision-prevention code to map_rows() so it would only run if there were mappings, and so that it would run after any mappings preprocessing by map_table() that creates more collisions
2029	05/01/2012 03:45 AM	Aaron Marcuse-Kubitza	bin/map: Prevent collisions if multiple inputs map to the same output
2028	05/01/2012 02:02 AM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Mapped collectorNumber and recordNumber to recordNumber with _alt so they wouldn't collide when every input column, even an empty one, is created in the XML tree
2027	05/01/2012 12:42 AM	Aaron Marcuse-Kubitza	bin/map: If out_is_db, in debug mode, print each row's XML tree and each value that it's putting
2026	05/01/2012 12:36 AM	Aaron Marcuse-Kubitza	bin/map: If out_is_db, in debug mode, print the template XML tree used to insert a sample row into the DB
2025	04/30/2012 11:57 PM	Aaron Marcuse-Kubitza	bin/map: map_table(): When translating mappings to column indexes, use appends to a new list instead of deletions from an existing list to simplify the algorithm
2024	04/30/2012 11:20 PM	Aaron Marcuse-Kubitza	union: Omit mappings that are mapped to in the input map, in addition to mappings that were overridden. This prevents multiple outputs being created for both the renamed and original mappings, causing duplicate output nodes when one XML tree is used for all rows.
2023	04/30/2012 11:18 PM	Aaron Marcuse-Kubitza	union: Omit mappings that are mapped to in the input map, in addition to mappings that were overridden. This prevents multiple outputs being created for both the renamed and original mappings, causing duplicate output nodes when one XML tree is used for all rows.
2022	04/30/2012 11:17 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: Via maps cleanup: subtract: Include comment column so commented mappings are never removed
2021	04/30/2012 11:07 PM	Aaron Marcuse-Kubitza	subtract: Support "ragged rows" that have fewer columns than the specified column numbers
2020	04/30/2012 11:06 PM	Aaron Marcuse-Kubitza	util.py: list_subset(): Added default param to specify the value to use for invalid indexes (if any)
2019	04/30/2012 09:44 AM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: Mappings with multiple inputs for the same output: Use _alt, etc. to map the multiple inputs to different places in the XML tree, so that when using a pregenerated tree, the empty leaves for each input will not collide with each other
2018	04/30/2012 09:20 AM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: Changed XPath references (using "$") to XML function references using _ref where needed to make them work even on a pre-made XML tree used by all rows
2017	04/30/2012 09:13 AM	Aaron Marcuse-Kubitza	xml_func.py: Added _ref to retrieve a value from another XML node
2016	04/30/2012 06:12 AM	Aaron Marcuse-Kubitza	xml_func.py: Made all functions take a 2nd node param, which contains the func node itself
2015	04/30/2012 04:15 AM	Aaron Marcuse-Kubitza	bin/map: If outputting to a DB, also create output XML elements for NULL input values. This will help with the transition to using the same XML tree for all rows.
2014	04/30/2012 04:09 AM	Aaron Marcuse-Kubitza	xml_func.py: _label: return None on empty input
2013	04/30/2012 03:46 AM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: Added _collapse around subtrees that need to be removed if they are created around a NULL value
2012	04/30/2012 03:40 AM	Aaron Marcuse-Kubitza	xml_func.py: Added _collapse to collapse a subtree if the "value" element in it is NULL
2011	04/30/2012 01:44 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: definedvalue: Made definedvalue nullable so that each row of a datasource can have a uniform structure in VegBIEN, and to support reusing the same XML DOM tree for each row
2010	04/30/2012 01:11 AM	Aaron Marcuse-Kubitza	xpath.py: Added is_xpath()
2009	04/30/2012 01:10 AM	Aaron Marcuse-Kubitza	xml_dom.py: set_value(): If value is None and node is Element, remove value node entirely instead of setting node's value to None
2008	04/30/2012 01:02 AM	Aaron Marcuse-Kubitza	xml_dom.py: Added value_node(). Use new value_node() in value() and set_value(). set_value(): If the node already has a value node, reuse it instead of appending a new value node.
2007	04/30/2012 12:35 AM	Aaron Marcuse-Kubitza	xpath.py: put_obj(): Return the id_attr_node using get_1() because it should only be one node
2006	04/30/2012 12:30 AM	Aaron Marcuse-Kubitza	xml_func.py: _simplifyPath: Also treat the elem as empty if the required node exists but is empty
2005	04/30/2012 12:04 AM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Added part of put() code that should be common to both functions
2004	04/27/2012 06:16 PM	Aaron Marcuse-Kubitza	xpath.py: put_obj(): Return a tuple of the inserted node and the id attr node
2003	04/27/2012 06:13 PM	Aaron Marcuse-Kubitza	xpath.py: set_id(): When creating the id_path, use obj() (which deepcopy()s the entire path) because it prevents pointers w/o targets
2002	04/27/2012 06:05 PM	Aaron Marcuse-Kubitza	xpath.py: set_id(): When creating the id_path, deepcopy() the id_elem because its keys will change in the main copy
2001	04/27/2012 05:47 PM	Aaron Marcuse-Kubitza	xpath.py: set_id(): Return the path to the ID attr, which can be used to change the ID
2000	04/27/2012 05:25 PM	Aaron Marcuse-Kubitza	xpath.py: put_obj(): Return the inserted node so it can be used to change the inserted value
1999	04/27/2012 05:08 PM	Aaron Marcuse-Kubitza	main Makefile: Maps validation: Fixed bug where there would be infinite recursion with the Maps validation section before the Subdir forwarding section (it's unknown why this is necessary)
1998	04/26/2012 07:12 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Added commit param to specify whether to commit after each query
1997	04/26/2012 06:55 PM	Aaron Marcuse-Kubitza	bin/map: in_is_db: by_col: Use new put_table() (defined but not implemented yet)
1996	04/26/2012 06:54 PM	Aaron Marcuse-Kubitza	db_xml.py: Added put_table() (without implementation)
1995	04/26/2012 06:52 PM	Aaron Marcuse-Kubitza	xml_func.py: strip(): Remove _ignore XML funcs completely instead of replacing them with their values
1994	04/26/2012 06:26 PM	Aaron Marcuse-Kubitza	bin/map: in_is_db: by_col: Prefix each input column name by "$"
1993	04/26/2012 06:11 PM	Aaron Marcuse-Kubitza	bin/map: in_is_db: by_col: Strip off XML functions
1992	04/26/2012 06:09 PM	Aaron Marcuse-Kubitza	xml_func.py: Added strip(). pop_value(): Support custom name of value param.
1991	04/26/2012 05:44 PM	Aaron Marcuse-Kubitza	bin/map: in_is_db: by_col: Create XML tree of sample row, with the input column names as the values. This tree will guide the sequencing and creation of the column-based queries.
1990	04/26/2012 05:43 PM	Aaron Marcuse-Kubitza	input.Makefile: use_staged env var: defaults to on if by_col is on
1989	04/26/2012 05:00 PM	Aaron Marcuse-Kubitza	bin/map: Only turn on by_col optimization if mapping to same DB, rather than requiring each place that checks by_col to also check whether mapping to same DB
1988	04/24/2012 06:32 PM	Aaron Marcuse-Kubitza	input.Makefile: Testing: Don't abort tester if only staging test fails, in case staging table missing
1987	04/24/2012 06:25 PM	Aaron Marcuse-Kubitza	input.Makefile: Testing: When cleaning up test outputs, remove everything that doesn't end in .ref
1986	04/24/2012 06:11 PM	Aaron Marcuse-Kubitza	input.Makefile: Testing: Added test/import.%.staging.out test to test the staging tables. Sources: cat: Updated Usage comment to include the "inputs/<datasrc>/" prefix the user would need to add when running make.
1985	04/24/2012 05:33 PM	Aaron Marcuse-Kubitza	bin/map: Fixed bug where mapping to same DB wouldn't work because by-column optimization wasn't implemented yet, by turning it off by default and allowing it to be enabled with an env var
1984	04/24/2012 05:25 PM	Aaron Marcuse-Kubitza	bin/map: DB inputs: Use by-column optimization if mapping to same DB (with skeleton code for optimization's implementation)
1983	04/24/2012 05:12 PM	Aaron Marcuse-Kubitza	input.Makefile: Mapping: Use the staging tables instead of any flat files if use_staged is specified
1982	04/24/2012 05:10 PM	Aaron Marcuse-Kubitza	bin/map: Support custom schema name. Support input table/schema override via env vars, in case the map spreadsheet was written for a different input format.
1981	04/24/2012 05:01 PM	Aaron Marcuse-Kubitza	sql.py: qual_name(): Fixed bugs where esc_name() nested func couldn't have same name as outer func, and esc_name() needed to be invoked without the module name because it's in the same module. select(): Support already-escaped table names.
1980	04/24/2012 04:16 PM	Aaron Marcuse-Kubitza	main Makefile: $(psqlAsAdmin): Tell sudo to preserve env vars so PGOPTIONS is passed to psql
1979	04/24/2012 03:33 PM	Aaron Marcuse-Kubitza	root map: Fill in defaults for inputs from VegBIEN, as well as outputs to it
1978	04/24/2012 02:59 PM	Aaron Marcuse-Kubitza	disown_all: Updated to use main function, local vars, $self, etc. like other bash scripts run using "."
1977	04/24/2012 02:55 PM	Aaron Marcuse-Kubitza	vegbien_dest: Fixed bug where it would give a usage error if run from a makefile rule, because the BASH_LINENO would be 0, by also checking if ${BASH_ARGV[0]} is ${BASH_SOURCE[0]}
1976	04/24/2012 02:28 PM	Aaron Marcuse-Kubitza	postgres_vegbien: Fixed bug where interpreter did not match vegbien_dest's new required interpreter of /bin/bash
1975	04/24/2012 02:23 PM	Aaron Marcuse-Kubitza	vegbien_dest: Changed interpreter to /bin/bash. Removed comment that it requires var bien_password.
1974	04/24/2012 02:20 PM	Aaron Marcuse-Kubitza	postgres_vegbien: Removed no longer needed retrieval of bien_password
1973	04/24/2012 02:20 PM	Aaron Marcuse-Kubitza	vegbien_dest: Get bien_password by searching relative to $self, which we now have a way to get in a bash script (${BASH_SOURCE[0]}), rather than requiring the caller to set it. Provide usage error if run without initial ".".
1972	04/24/2012 02:12 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables: import/install-%: Use new quiet option to determine whether to tee output to terminal. Don't use log option because that's always set to true except in test mode, which doesn't apply to installs.
1971	04/24/2012 02:12 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables: import/install-%: Use new quiet option to determine whether to tee output to terminal. Don't use log option because that's always set to true except in test mode, which doesn't apply to installs.
1970	04/24/2012 01:56 PM	Aaron Marcuse-Kubitza	main Makefile: PostgreSQL: Edit /etc/phppgadmin/apache.conf to replace "deny from all" with "allow from all", instead of uncommenting an "allow from all" that may not be there
1969	04/24/2012 01:35 PM	Aaron Marcuse-Kubitza	input.Makefile: Sources: Fixed bug where cat was defined before $(tables), by moving Sources after Existing maps discovery and putting just $(inputFiles) and $(dbExport) from Sources at the beginning of Existing maps discovery
1968	04/24/2012 01:05 PM	Aaron Marcuse-Kubitza	sql.py: Made truncate(), tables(), empty_db() schema-aware. Added qual_name(). tables(): Added option to filter tables by a LIKE pattern.
1967	04/24/2012 12:34 PM	Aaron Marcuse-Kubitza	main Makefile: VegBIEN DB: Install public schema in a separate step, so that it can be dropped without dropping the entire DB (which also contains staging tables that shouldn't be dropped when there is a schema change). Added schemas/install, schemas/uninstall, implicit schemas/reinstall to manage the public schema separately from the rest of the DB. Moved Subdir forwarding to the bottom so overridden targets are not forwarded. README.TXT: Since `make reinstall_db` would drop the entire DB, tell user to run new `make schemas/reinstall` instead to reinstall (main) DB from schema.
1966	04/24/2012 12:30 PM	Aaron Marcuse-Kubitza	schemas/postgresql.Mac.conf: Set unix_socket_directory to the new dir it seems to be using, which is now /tmp
1965	04/24/2012 11:43 AM	Aaron Marcuse-Kubitza	csv2db: Fixed bug where extra columns were not truncated in INSERT mode. Replace empty column names with the column # to avoid errors with CSVs that have trailing ","s, etc.
1964	04/24/2012 11:41 AM	Aaron Marcuse-Kubitza	streams.py: StreamIter: Define readline() as a separate method so it can be overridden, and all calls to self.next() will use the overridden readline(). This fixes a bug in ProgressInputStream where incremental counts would not be displayed and it would end with "not all input read" if the StreamIter interface was used instead of readline().
1963	04/23/2012 09:57 PM	Aaron Marcuse-Kubitza	csv2db: Fall back to manually inserting each row (autodetecting the encoding for each field) if COPY FROM doesn't work
1962	04/23/2012 09:56 PM	Aaron Marcuse-Kubitza	streams.py: FilterStream: Inherit from StreamIter so that all descendants automatically have StreamIter functionality
1961	04/23/2012 09:42 PM	Aaron Marcuse-Kubitza	sql.py: insert(): Support using the default value for columns designated with the special value sql.default
1960	04/23/2012 09:21 PM	Aaron Marcuse-Kubitza	sql.py: insert(): Support rows that are just a list of values, with no columns. Support already-escaped table names.
1959	04/23/2012 08:54 PM	Aaron Marcuse-Kubitza	strings.py: Added contains_any()
1958	04/23/2012 08:54 PM	Aaron Marcuse-Kubitza	csvs.py: reader_and_header(): Use make_reader()
1957	04/23/2012 08:07 PM	Aaron Marcuse-Kubitza	Added reinstall_all to reinstall all inputs at once
1956	04/23/2012 08:06 PM	Aaron Marcuse-Kubitza	with_all: Documented that it must be run from the root svn directory
1955	04/23/2012 08:05 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables: import/install-%: Only install staging table if input contains only CSV sources. Changed $(isXml) to $(isCsv) (negated) everywhere because rules almost always only run something if input contains only CSV sources, rather than if input contains XML sources.
1954	04/23/2012 07:21 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables: import/install-%: Output load status to log file if log option is set
1953	04/23/2012 07:00 PM	Aaron Marcuse-Kubitza	Scripts that are meant to be run in the calling shell: Fixed bug where running the script inside another script would make the script think it was being run as a program, and abort with a usage error
1952	04/23/2012 06:56 PM	Aaron Marcuse-Kubitza	Scripts that are meant to be run in the calling shell: Fixed bug where running the script as a program (without initial ".") wouldn't be able to call return in something that was not a function. Converted all code to a <script_name>_main method so that return would work properly again. Converted all variables to local variables.
1951	04/23/2012 06:38 PM	Aaron Marcuse-Kubitza	env_password: return instead of exit if password not yet stored, in case user is running it from a shell without the initial "-" argument. (This would be the case if the user is just testing out the script, instead of using a command that env_password directs them to run.)
1950	04/23/2012 05:43 PM	Aaron Marcuse-Kubitza	env_password: Use ${BASH_SOURCE[0]} for $self and $self for $0. return instead of exit on usage error in case user is running it from a shell.
1949	04/23/2012 05:36 PM	Aaron Marcuse-Kubitza	stop_imports: Use ${BASH_SOURCE[0]} for $self and $self for $0
1948	04/23/2012 05:36 PM	Aaron Marcuse-Kubitza	import_all: Use new with_all. Use ${BASH_SOURCE[0]} for $self and $self for $0.
1947	04/23/2012 05:34 PM	Aaron Marcuse-Kubitza	Added with_all to run a make target on all inputs at once
1946	04/23/2012 05:05 PM	Aaron Marcuse-Kubitza	Made row #s 1-based to the user to match up with the staging table row #s
1945	04/23/2012 04:59 PM	Aaron Marcuse-Kubitza	bin/map: Fixed bug where limit passed to sql.select() was end instead of the # rows, causing extra rows to be fetched when start > 0. Documented that row #s start with 0.
1944	04/23/2012 04:19 PM	Aaron Marcuse-Kubitza	Removed no longer needed csv2ddl
1943	04/23/2012 04:19 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables: import/install-%: Use new csv2db instead of csv2ddl/$(psqlAsBien), because it handles translating encodings properly
1942	04/23/2012 04:14 PM	Aaron Marcuse-Kubitza	Added csv2db to load a command's CSV output stream into a PostgreSQL table
1941	04/21/2012 09:32 PM	Aaron Marcuse-Kubitza	schemas/postgresql.Mac.conf: Set unix_socket_directory to the appropriate Mac OS X dir, since otherwise, the socket is apparently not created and `make reinstall_db` doesn't work
1940	04/21/2012 09:30 PM	Aaron Marcuse-Kubitza	main Makefile: VegBIEN DB: db: Set LC_COLLATE and LC_CTYPE explicitly, to make it easier to change them
1939	04/21/2012 09:29 PM	Aaron Marcuse-Kubitza	Added ProgressInputStream
1938	04/21/2012 09:28 PM	Aaron Marcuse-Kubitza	exc.py: print_ex(): Added plain option to leave out traceback
1937	04/21/2012 06:48 PM	Aaron Marcuse-Kubitza	main Makefile: VegBIEN DB: db: Use template0 to allow encodings other than UTF-8. Because template0 doesn't have plpgsql on PostgreSQL before 9.x, add "CREATE PROCEDURAL LANGUAGE plpgsql;" manually in schemas/vegbien.sql.make, and filter it back out on PostgreSQL after 9.x using db_dump_localize.
1936	04/21/2012 06:39 PM	Aaron Marcuse-Kubitza	PostgreSQL-MySQL.csv: Remove "CREATE PROCEDURAL LANGUAGE" statements
1935	04/21/2012 06:36 PM	Aaron Marcuse-Kubitza	Added db_dump_localize to translate a PostgreSQL DB dump for the local server's version
1934	04/21/2012 06:32 PM	Aaron Marcuse-Kubitza	Added db_dump_localize to translate a PostgreSQL DB dump for the local server's version
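
Note on r2033: the off-by-one in the processed-rows count can be illustrated with a minimal Python sketch. The function names and the no-op process callback below are hypothetical and are not taken from bin/map; the sketch only shows why the loop index has to be incremented once after the loop to obtain a count.

```python
def rows_processed_buggy(rows, process=lambda row: None):
    """Return the loop index itself, which undercounts by 1."""
    i = 0
    for i, row in enumerate(rows):
        process(row)
    return i  # index of the last row, not the number of rows


def rows_processed_fixed(rows, process=lambda row: None):
    """Return one past the last index, i.e. the number of rows."""
    i = -1  # so that an empty input gives a count of 0
    for i, row in enumerate(rows):
        process(row)
    return i + 1


rows = ["a", "b", "c"]
print(rows_processed_buggy(rows))  # 2
print(rows_processed_fixed(rows))  # 3
```

Starting the index at -1 is one way to keep the count correct for empty input; the actual fix in bin/map may be written differently.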