Changes - BIEN 3 - NCEAS Projects

root @ 3596

#	Date	Author	Comment
3596	07/24/2012 09:33 AM	Aaron Marcuse-Kubitza	sql_gen.py: plpythonu_error_handler: Always raise PL/Python exceptions as data_exception so they go in the errors table, instead of aborting the iteration
3595	07/24/2012 09:16 AM	Aaron Marcuse-Kubitza	sql_gen.py: plpythonu_error_handler: Fixed bug where not all PL/Python exceptions start with "PL/Python: " (e.g. on PostgreSQL 9.1 on vegbiendev), so the PL/Python prefix must be optional. Refactored to put IF clause for non-PL/Python exception at end for a more logical ordering of the conditions.
3594	07/24/2012 08:41 AM	Aaron Marcuse-Kubitza	Added inputs/CVS/
3593	07/24/2012 08:40 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Added steps to place the relevant files under version control
3592	07/24/2012 08:31 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Accepting the test cases: Don't auto-accept the initial tests because there could be bugs in the initial mappings that would be revealed upon inspecting the test output
3591	07/24/2012 08:14 AM	Aaron Marcuse-Kubitza	sql_gen.py: plpythonu_error_handler: Added section comment before handler block, so that it's clear in the (very long) wrapper function definition what the block is doing
3590	07/24/2012 07:59 AM	Aaron Marcuse-Kubitza	input.Makefile: Documentation: import/steps.by_col.sql: Added -s to make to avoid echoing make commands to the log file
3589	07/24/2012 07:46 AM	Aaron Marcuse-Kubitza	README.TXT: Moved Reinstall all datasources at once to Schema changes and renamed it to Reinstall staging tables to reflect that it is only necessary when the staging table format is changed
3588	07/24/2012 07:43 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Updating vegbiendev: Added step to also install the staging tables on vegbiendev
3587	07/24/2012 07:42 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Moved Install the staging tables before Map each table's columns because the install can run in the background while you're mapping. It must, however, come after Auto-create the map spreadsheets because it uses the filenames of the created maps to determine which staging tables to create.
3586	07/24/2012 07:40 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Adding a new datasource: Changed <short_name> to <name> to match usage elsewhere. Documented that it may not contain spaces, and should be abbreviated.
3585	07/24/2012 07:33 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Added steps to update vegbiendev
3584	07/24/2012 07:31 AM	Aaron Marcuse-Kubitza	inputs/Makefile: Input data: Added upload target
3583	07/24/2012 07:21 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Added steps to accept the test cases and commit
3582	07/24/2012 07:18 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Added step to install the staging tables
3581	07/24/2012 07:18 AM	Aaron Marcuse-Kubitza	bin/map: in_is_xml: doc2rows(): "Root not found in input" warning: Changed "error" to "warning" to match the type of error condition signaled
3580	07/24/2012 07:15 AM	Aaron Marcuse-Kubitza	bin/map: map_rows(): out_is_db: Changed `id_node != None` assertion to a warning because this is a normal circumstance in the base case where there are no mappings
3579	07/24/2012 07:13 AM	Aaron Marcuse-Kubitza	input.Makefile: Testing: Added test/accept-all
3578	07/24/2012 07:11 AM	Aaron Marcuse-Kubitza	csv2db: COPY FROM: Fixed %-injection bug where column names' %s were not escaped prior to cursor.mogrify(), by changing the code to use inline db.esc_value() instead
3577	07/24/2012 06:37 AM	Aaron Marcuse-Kubitza	bin/map: in_is_xml: doc2rows(): "Root not found in input" error: Changed SystemExit to a warning because this is a normal circumstance in the base case where the input XML file contains no rows
3576	07/24/2012 06:12 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Documented how to map each table's columns
3575	07/24/2012 05:57 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Changed "Auto-create the src column spreadsheets" to "Auto-create map spreadsheets" and updated command to bootstrap all maps, including newly-autogeneratable via maps
3574	07/24/2012 05:50 AM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: maps/$(via).%.csv: Auto-create by copying the src map if it doesn't exist. Existing maps discovery: Look up via format in src maps' roots if no via map already exists.
3573	07/24/2012 05:46 AM	Aaron Marcuse-Kubitza	src_map: Fixed bug where non-header rows needed to be materialized with empty fields for each column in the header
3572	07/24/2012 04:27 AM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: Via maps cleanup: Match maps/$(via).%.csv with pattern instead of $(viaMaps) var so that a non-existing via map will have the recipe run, too. When auto-creating via maps is later added, this will be required.
3571	07/24/2012 04:07 AM	Aaron Marcuse-Kubitza	inputs//maps/src..csv: Regenerated using new src_map output format
3570	07/24/2012 04:06 AM	Aaron Marcuse-Kubitza	parallelproc.py: MultiProducerPool: Removed warning if not using parallel processing because this also gets generated when it's explicitly turned off, which is currently the case and clutters up stderr when testing
3569	07/24/2012 03:57 AM	Aaron Marcuse-Kubitza	src_map: Also add columns for the output mappings and comments, so that the src map can be directly copied for use as the via map (DwC.specimens.csv, etc.). The output mapping column name must be provided by the caller, which input.Makefile maps/src.%.csv provides using the new mappings roots.
3568	07/24/2012 03:52 AM	Aaron Marcuse-Kubitza	Added mappings/roots for use in creating src maps
3567	07/24/2012 03:41 AM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: maps/src.%.csv: Clean up by passing through `$(bin)/cols '*'` whenever it's changed. This ensures that the CSV dialect is always consistently Python's Excel dialect. (Note that this dialect actually uses \r\n as the line ending. The \n line endings were from src maps generated by a previous version of bin/src_map.)
3566	07/24/2012 03:28 AM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: maps/$(via).%.full.csv: Removed alternate rule when $(srcMap) doesn't exist, because this effect is actually achieved by the no-prereqs rule for maps/src.%.csv, which causes make to think it exists when matching pattern rules even if its recipe doesn't actually create it
3565	07/24/2012 03:23 AM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: maps/$(via).%.full.csv: Added alternate rule when $(srcMap) doesn't exist
3564	07/24/2012 03:21 AM	Aaron Marcuse-Kubitza	inputs/CTFS/maps/: Removed unneeded src.organisms.csv since there is a way to deal with it not existing in input.Makefile
3563	07/24/2012 03:18 AM	Aaron Marcuse-Kubitza	inputs/CTFS/maps/: Removed unneeded .VegX.plots.csv.last_cleanup
3562	07/24/2012 02:13 AM	Aaron Marcuse-Kubitza	inputs//maps/src..csv: Standardized line endings to \n
3561	07/24/2012 01:56 AM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: maps/$(via).%.full.csv: Added the src map as a prerequisite so it would be rebuilt when the src map changes. This is possible now that every datasource has at least an empty src map. (An empty src map is now treated the same way as a non-existing one.)
3560	07/24/2012 01:52 AM	Aaron Marcuse-Kubitza	inputs//maps/src..csv: Removed extraneous quotes around fields, which are added by Excel but not by Python
3559	07/24/2012 01:49 AM	Aaron Marcuse-Kubitza	inputs//maps/src..csv: Removed extraneous quotes around fields, which are added by Excel but not by Python
3558	07/24/2012 01:41 AM	Aaron Marcuse-Kubitza	inputs/CTFS: Added empty maps/src.organisms.csv so that every table of every datasource has a src map
3557	07/24/2012 12:18 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Documented how to populate the src/ subdir with input data
3556	07/23/2012 10:52 PM	Aaron Marcuse-Kubitza	Added inputs/CVS/
3555	07/23/2012 10:28 PM	Aaron Marcuse-Kubitza	sql_gen.py: plpythonu_error_handler: Translate specific Python exception types to PostgreSQL error codes (ValueError -> data_exception) instead of assuming everything is a data_exception. When removing the PL/Python prefix, preserve the Python exception class in a DETAIL message. Support non-PL/Python internal_errors by re-raising them.
3554	07/23/2012 10:25 PM	Aaron Marcuse-Kubitza	sql_gen.py: Added reraise_exc
3553	07/23/2012 10:21 PM	Aaron Marcuse-Kubitza	schemas/py_functions.sql: _date(): Raise (or pass through) ValueErrors directly instead of wrapping them in FormatExceptions, to simplify the code. This will also enable later translation of ValueErrors to data_exceptions. When year is required and missing, output a parsable 'null value in column year violates not-null constraint' error.
3552	07/23/2012 09:48 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): log_exc(): Handle infinite loops from repeated exceptions by removing all rows, instead of just aborting with a failed assertion
3551	07/23/2012 09:36 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): is_function: Fixed bug where special case for unrecoverable errors needed to avoid creating an empty output pkeys table because function mode defines the returned pkeys table separately
3550	07/23/2012 09:08 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): is_function: Factored defining the error handling wrapper function out of the main loop because it only needs to run once. Don't log "Trying to insert new rows" in function mode because it's inaccurate.
3549	07/23/2012 07:14 PM	Aaron Marcuse-Kubitza	sql_gen.py: Exceptions: Added suppress_exc and use it in ExcHandler.to_str()
3548	07/23/2012 06:53 PM	Aaron Marcuse-Kubitza	README.TXT: Backups: After a new import: Added step to delete previous imports so they won't bloat the full DB backup. (Note that these imports have already been backed up, and only the most recent import needs to be live in the DB.)
3547	07/23/2012 06:48 PM	Aaron Marcuse-Kubitza	README.TXT: Backups: Documented what to do after a new import
3546	07/23/2012 06:39 PM	Aaron Marcuse-Kubitza	backups/Makefile: Full DB: Added vegbien.backup/all to run both test and rotate
3545	07/23/2012 06:24 PM	Aaron Marcuse-Kubitza	README.TXT: Renamed Maintenance section to Backups for clarity
3544	07/23/2012 06:19 PM	Aaron Marcuse-Kubitza	backups/Makefile: .sql: When testing, turn it off so make won't skip `.sql: %` in favor of it
3543	07/23/2012 06:07 PM	Aaron Marcuse-Kubitza	backups/Makefile: Split %.backup and %.sql into separate targets for clarity
3542	07/23/2012 05:56 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated with stats from latest import. Note that this import adds data provider feedback for SQL functions as well as additional date processing using _date().
3541	07/20/2012 07:10 AM	Aaron Marcuse-Kubitza	schemas/py_functions.sql: _date(): Re-enabled now that exceptions thrown are properly handled. FormatException: Support raising parsable data_exceptions when provided with the value that was invalid. Date parsing mode: Return date as the value in FormatException so it can be filtered out automatically by column-based import.
3540	07/20/2012 07:06 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): is_function: Creating error handling wrapper function: Fixed bug where needed to cast NULL returned in error handler to appropriate type, because it's contained within a SELECT query which does not do implicit casts from type unknown
3539	07/20/2012 07:03 AM	Aaron Marcuse-Kubitza	sql_gen.py: Cast: Support types which are Code objects
3538	07/20/2012 06:05 AM	Aaron Marcuse-Kubitza	sql_io.py: func_wrapper_exception_handler(): Use new sql_gen.merge_not_null() to try to ensure that NULL values are not folded (which would cause the concatenated values not to match up with the concatenated column names). Note that this adds a dependency on the db object, which callers must now provide.
3537	07/20/2012 06:03 AM	Aaron Marcuse-Kubitza	sql_gen.py: Added merge_not_null()
3536	07/20/2012 06:03 AM	Aaron Marcuse-Kubitza	sql_gen.py: Added try_mk_not_null()
3535	07/20/2012 05:54 AM	Aaron Marcuse-Kubitza	sql_gen.py: Renamed ArrayJoin to ArrayMerge to avoid confusion with Join (a SQL construct)
3534	07/20/2012 05:46 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): is_function: Creating error handling wrapper function: Set srcs on row_var so that the column type and nullability info of row_var's columns can be retrieved for use with sql_gen.ensure_not_null()
3533	07/20/2012 05:38 AM	Aaron Marcuse-Kubitza	sql_gen.py: RowExcIgnore.to_str(): Compare self.row_var to global const row_var using == to allow caller to provide a copy of row_var with the underlying table set appropriately
3532	07/20/2012 05:35 AM	Aaron Marcuse-Kubitza	sql_gen.py: underlying_table(): Support derived tables and row vars by obtaining the underlying table from the srcs
3531	07/20/2012 05:25 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Setting pkeys of missing rows: Fixed bug where also needed to do this when is_function if an empty pkeys table was created (due to an error that could not be localized to a row)
3530	07/20/2012 05:16 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): After main loop: If is_literals, return immediately to avoid needing to test for is_literals in all the code that follows (which only applies to the normal case)
3529	07/20/2012 04:43 AM	Aaron Marcuse-Kubitza	sql_gen.py: RowExcIgnore: If a custom row_var is used, require it to already be defined. This also allows sql_io.ExcToErrorsTable to place the column var definition in the outer DECLARE, eliminating the extra DECLARE block.
3528	07/20/2012 04:30 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): is_function: Creating error handling wrapper function: Use new sql_gen.row_var
3527	07/20/2012 04:28 AM	Aaron Marcuse-Kubitza	sql_gen.py: RowExcIgnore: Created global constant for default row_var for callers to use
3526	07/20/2012 04:24 AM	Aaron Marcuse-Kubitza	sql_gen.py: RowExcIgnore.to_str(): Moved SQL comment explaining the use of an EXCEPTION block for each individual row to Python code to avoid cluttering the logged SQL code
3525	07/20/2012 04:19 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): is_function: Creating error handling wrapper function: Handle errors using new func_wrapper_exception_handler(), which saves any data_exceptions in the errors table in addition to handling PL/Python errors
3524	07/20/2012 04:13 AM	Aaron Marcuse-Kubitza	sql_io.py: Added func_wrapper_exception_handler()
3523	07/20/2012 04:10 AM	Aaron Marcuse-Kubitza	sql_gen.py: Added ArrayJoin
3522	07/20/2012 04:10 AM	Aaron Marcuse-Kubitza	sql_gen.py: Added Array and to_Array()
3521	07/20/2012 02:47 AM	Aaron Marcuse-Kubitza	sql_gen.py: Added List and inherit from it in Tuple
3520	07/20/2012 02:45 AM	Aaron Marcuse-Kubitza	sql_gen.py: Renamed Tuple to Row and List to Tuple to more accurately reflect the datatype generated by each class (a Tuple being merely a grouping of values)
3519	07/20/2012 02:43 AM	Aaron Marcuse-Kubitza	sql_gen.py: Moved Composite types to Literal values section as a subsection, since Composite types was really about just the input syntaxes for these types
3518	07/20/2012 02:32 AM	Aaron Marcuse-Kubitza	sql_gen.py: Replaced srcs_str() with cross_join_srcs() which more correctly combines the srcs of each column using a Cartesian product. Eventually, the entire tree of srcs will need to be preserved instead of flattened in order to properly attribute errors to a specific column or set of columns.
3517	07/20/2012 02:03 AM	Aaron Marcuse-Kubitza	sql_gen.py: srcs_str(): Fixed bug where needed to filter out columns with no srcs so that there aren't empty elements in the ","-separated list
3516	07/20/2012 02:00 AM	Aaron Marcuse-Kubitza	sql_gen.py: Added has_srcs()
3515	07/20/2012 01:44 AM	Aaron Marcuse-Kubitza	sql_gen.py: Added NestedExcHandler
3514	07/20/2012 01:44 AM	Aaron Marcuse-Kubitza	sql_gen.py: Added srcs_str()
3513	07/20/2012 01:43 AM	Aaron Marcuse-Kubitza	sql_gen.py: as_Col(): Support non-Code, non-string inputs by making them Literals
3512	07/20/2012 01:42 AM	Aaron Marcuse-Kubitza	sql_gen.py: Added is_col() and use it in is_table_col()
3511	07/19/2012 11:54 PM	Aaron Marcuse-Kubitza	sql_io.py: ExcToErrorsTable: Require users to explicitly specify an expression for the value that caused the error, instead of assuming that a variable named "value" already exists. This allows a value expression to be computed only if needed for error handling.
3510	07/19/2012 11:22 PM	Aaron Marcuse-Kubitza	sql_gen.py: Moved repr() from ExcHandler to BaseExcHandler
3509	07/19/2012 11:21 PM	Aaron Marcuse-Kubitza	sql_gen.py: Added BaseExcHandler and inherit from it in ExcHandlers
3508	07/19/2012 10:58 PM	Aaron Marcuse-Kubitza	sql_io.py: cast(): Determining if will be saving errors: Don't add extra check if isinstance(col, sql_gen.Col) because the special case for sql_gen.Literal handles supported non-columns
3507	07/19/2012 10:56 PM	Aaron Marcuse-Kubitza	sql_io.py: data_exception_handler(): Removed no longer needed db param
3506	07/19/2012 10:47 PM	Aaron Marcuse-Kubitza	sql_io.py: Added ExcToErrorsTable, which separates out the errors table inserting code from the exception handling code. data_exception_handler(): Refactored to use new sql_gen.data_exception_handler() and ExcToErrorsTable.
3505	07/19/2012 10:43 PM	Aaron Marcuse-Kubitza	sql_gen.py: Added data_exception_handler
3504	07/19/2012 10:08 PM	Aaron Marcuse-Kubitza	sql_io.py: data_exception_handler(): Refactored to use new sql_gen.ExcToWarning when not using an errors table
3503	07/19/2012 10:03 PM	Aaron Marcuse-Kubitza	sql_gen.py: Added ExcToWarning
3502	07/19/2012 10:02 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxondetermination: taxondetermination_taxonoccurrence_id_fkey(): Fixed bug where string containing a \-escape needed an "E" prefix
3501	07/19/2012 09:42 PM	Aaron Marcuse-Kubitza	sql_io.py: data_exception_handler(): Require the caller to provide a statement to return a default value in case of error, rather than assuming the caller can accept a return value of NULL
3500	07/19/2012 09:27 PM	Aaron Marcuse-Kubitza	sql_io.py: data_exception_handler(): Refactored to use new sql.define_func()
3499	07/19/2012 09:20 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): is_function: Calling function on input rows: Convert PL/Python exceptions (internal_errors) to data_exceptions using sql_gen.plpythonu_error_handler and an error handling wrapper function
3498	07/19/2012 09:10 PM	Aaron Marcuse-Kubitza	debug2redmine.csv: EXPLAIN comments: Fixed bug where needed to also match whitespace at beginning of line (indent)
3497	07/19/2012 09:07 PM	Aaron Marcuse-Kubitza	Use sql_gen.ReturnQuery where RETURN QUERY was previously manually prepended
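Several of the revisions above (3559, 3560, 3562 and 3567) concern CSV dialect consistency. Excel quotes every field, whereas Python's csv module defaults to the `excel` dialect, which quotes a field only when needed and ends each line with \r\n. Revision 3573 also pads non-header rows with empty fields to the header's width. The following is a minimal sketch of that kind of normalization, assuming it amounts to re-reading and re-writing every row with Python's default dialect. `normalize_csv` is a hypothetical name and is not the project's `bin/cols` or `bin/src_map`.

```python
import csv
import io


def normalize_csv(text):
    """Hypothetical helper: re-write CSV text in Python's default 'excel' dialect.

    Accepts any quoting style (e.g. Excel's quote-every-field output) and writes
    minimal quoting with CRLF line endings. Rows shorter than the header are
    padded with empty fields so every row has one field per header column.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ''
    width = len(rows[0])  # the first row is treated as the header
    out = io.StringIO()
    writer = csv.writer(out)  # default dialect is csv.excel
    for row in rows:
        writer.writerow(row + [''] * (width - len(row)))
    return out.getvalue()


print(repr(normalize_csv('"a","b","c"\n"x","y"\n')))
```

Because the output is already in the dialect the function writes, running it a second time on its own output returns the same text. This is the property that makes it safe to re-apply whenever a src map changes.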

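Revisions 3555, 3595 and 3596 change how `plpythonu_error_handler` classifies errors. It strips an optional "PL/Python: " prefix, then maps the Python exception class to a PostgreSQL condition, and it ultimately routes every PL/Python exception to data_exception. The following is a Python sketch of that classification logic, assuming messages have the form `[PL/Python: ]ClassName: message`. The real handler is SQL generated by sql_gen.py. `translate`, `EXC_TO_CONDITION` and the `always_data_exception` flag are hypothetical, as is the choice to treat unmapped classes as internal_error. The SQLSTATE codes 22000 (data_exception) and XX000 (internal_error) are PostgreSQL's standard ones.

```python
import re

# Hypothetical class-to-condition map; r3555 names ValueError -> data_exception.
EXC_TO_CONDITION = {'ValueError': 'data_exception'}

# Standard PostgreSQL SQLSTATE codes for the two conditions used here.
SQLSTATE = {'data_exception': '22000', 'internal_error': 'XX000'}

# Optional "PL/Python: " prefix (r3595), then "<ExceptionClass>: <message>".
PLPY_MSG = re.compile(r'^(?:PL/Python: )?([\w.]+): (.*)$', re.DOTALL)


def translate(msg, always_data_exception=False):
    """Classify a PL/Python error message.

    Returns (condition, sqlstate, message, detail). always_data_exception
    mimics r3596, which routes every PL/Python exception to data_exception.
    """
    m = PLPY_MSG.match(msg)
    if m is None:
        # Not shaped like a PL/Python exception: pass it through unchanged
        # as an internal_error (r3555 re-raises these).
        return ('internal_error', SQLSTATE['internal_error'], msg, None)
    exc_class, text = m.groups()
    if always_data_exception:
        condition = 'data_exception'
    else:
        # Assumption: classes without a mapping stay internal_errors.
        condition = EXC_TO_CONDITION.get(exc_class, 'internal_error')
    # Preserve the Python class name, as r3555 does with a DETAIL message.
    detail = 'Python exception class: ' + exc_class
    return (condition, SQLSTATE[condition], text, detail)


print(translate('PL/Python: ValueError: bad date'))
print(translate('ValueError: bad date'))  # prefix absent, as on PostgreSQL 9.1
```

The prefix test is only a heuristic. A non-PL/Python message that happens to look like `Word: text` would be classified as a Python exception.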
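Revision 3578 fixes a %-injection bug in csv2db. When parameters are passed, psycopg2's `cursor.mogrify()` interprets `%` in the query text as the start of a placeholder, so a column name containing `%s` is misread as a parameter slot. The commit fixes this by inlining values with `db.esc_value()`. A common alternative is to double every literal `%` in the SQL text before interpolation. The following is a hedged sketch of that alternative. Python's own `%` operator follows the same `%%` rule, so it stands in for the driver here. `quote_ident`, `escape_pct` and `copy_from_query` are hypothetical helpers, not csv2db's code.

```python
def quote_ident(name):
    """Hypothetical helper: double-quote a SQL identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def escape_pct(sql_fragment):
    """Double every % so %-style interpolation leaves it as a literal %."""
    return sql_fragment.replace('%', '%%')


def copy_from_query(table, cols):
    """Build a COPY ... FROM STDIN template that survives %-interpolation."""
    col_list = ', '.join(escape_pct(quote_ident(c)) for c in cols)
    return 'COPY ' + escape_pct(quote_ident(table)) + ' (' + col_list + ') FROM STDIN CSV'


template = copy_from_query('plots', ['id', 'pct%s_cover'])
# Unescaped, "pct%s_cover" would consume a parameter (or raise TypeError);
# escaped, the interpolation step turns %% back into the original %.
print(template % ())
```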