/ - Changes - BIEN 3 - NCEAS Projects

root @ 3688

#	Date	Author	Comment
3688	07/30/2012 06:04 PM	Aaron Marcuse-Kubitza	xml_func.py: process(): In row-based mode, when trying to evaluate function using DB, preserve unknown funcs because these might be built-in functions of db_xml.put(). The sql.DoesNotExistException should be raised again when db_xml.put() is run and it verifies whether the function is built-in or not (e.g. _simplifyPath is now built-in, for column-based support). See db_xml.put_special_funcs for built-in functions.
3687	07/30/2012 05:59 PM	Aaron Marcuse-Kubitza	db_xml.py: put(): Fixed bug where strings starting with "$" were interpreted as input columns in row-based mode (this should only apply to column-based mode). Explicitly store whether in row-based mode in is_literals var (similar to is_literals in sql_io.put_table()).
3686	07/30/2012 05:54 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): unrecoverable errors: Returning default value: is_literals: Remove column rename from default value so it doesn't get treated as a column by db_xml.put() (which is handled differently from a literal value)
3685	07/30/2012 03:53 PM	Aaron Marcuse-Kubitza	db_xml.py: put(): put_(): Removed no longer needed in_row_ct_ref param, which is only used by put_table(). Rewrapped function body.
3684	07/30/2012 03:46 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore(): literals: Only replace invalid literal with NULL or remove row if that column actually contains the invalid value in question. This handles the case where all columns are being ignore()d because the specific column couldn't be identified, and this was not the invalid column.
3683	07/30/2012 03:02 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: plot: Mapped note
3682	07/30/2012 02:32 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: plot: Added landform mapping
3681	07/30/2012 02:24 PM	Aaron Marcuse-Kubitza	schemas/vegbank.ERD.pdf: Auto-repaired with Adobe Reader so that the repair message doesn't pop up whenever it's opened
3680	07/30/2012 02:22 PM	Aaron Marcuse-Kubitza	schemas: Added vegbank.ERD.pdf so the VegBank ERD is easily accessible when mapping
3679	07/30/2012 01:51 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: project: Mapped sourceaccessioncode. This entailed adding a distinguishing suffix to the projectname input mapping.
3678	07/30/2012 01:31 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv, VegX-VegBIEN.stems.csv: Removed all manual mappings to datasource_id now that datasource_id is auto-populated, both on the VegBIEN output side and the DwC/VegX input side. This should greatly simplify many of the mappings!
3677	07/30/2012 12:11 PM	Aaron Marcuse-Kubitza	db_xml.py: put(): Don't suppress exceptions thrown by sql_io.put_table() by passing them to on_error(), because some exceptions indicate unrecoverable database connection problems such as a broken connection, which should abort the import
3676	07/30/2012 11:52 AM	Aaron Marcuse-Kubitza	db_xml.py: put(): Support datasets with no rows, where root.firstChild == None. Documented that to use an entire XML document, you need to pass root.firstChild rather than root.
3675	07/30/2012 11:31 AM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated with stats from latest import. Note that the import now includes CVS.
3674	07/30/2012 11:23 AM	Aaron Marcuse-Kubitza	README.TXT: Documented that the PostgreSQL server should be restarted after installing system updates that may affect it, to avoid spurious errors that crash the import but go away upon reimport
3673	07/27/2012 11:12 PM	Aaron Marcuse-Kubitza	Regenerated vegbien.ERD exports
3672	07/27/2012 11:10 PM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: Fixed lines
3671	07/27/2012 11:08 PM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: Synced with schema
3670	07/27/2012 10:51 PM	Aaron Marcuse-Kubitza	bin/map: Call sys.stdout.flush() after every call to sys.stdout.write() to avoid interleaved stdout/stderr output due to stdout buffering
3669	07/27/2012 10:48 PM	Aaron Marcuse-Kubitza	bin/map: Call sys.stdout.flush() after every call to sys.stdout.write() to avoid interleaved stdout/stderr output due to stdout buffering
3668	07/27/2012 10:13 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: *_unique_datasource UNIQUE INDEXes: Removed COALESCE from datasource_id and datasource_id IS NOT NULL filter, because datasource_id is now always NOT NULL
3667	07/27/2012 10:07 PM	Aaron Marcuse-Kubitza	schemas/filter_ERD.csv: Removed AUTO_INCREMENT because that is not added to any other tables
3666	07/27/2012 10:05 PM	Aaron Marcuse-Kubitza	Regenerated schemas/vegbien.my.sql
3665	07/27/2012 10:04 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: specimenreplicate: Inherit datasource_id from taxonoccurrence instead of defining it independently
3664	07/27/2012 09:56 PM	Aaron Marcuse-Kubitza	xml_func.py: Removed no longer needed local XML functions that have been translated to SQL functions
3663	07/27/2012 09:52 PM	Aaron Marcuse-Kubitza	input.Makefile: Testing: Removed VegBIEN.%.xml test because the import.%.xml test output includes the template tree that it's inserting, so there is no need to generate the XML tree in a separate test. This will also remove the need to maintain local XML functions that have already been translated to DB functions for the sole purpose of this automated test.
3662	07/27/2012 09:40 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Made datasource_id required on every table that has it, to trigger the automatic population of it by sql_io.put_table()'s col_defaults
3661	07/27/2012 09:38 PM	Aaron Marcuse-Kubitza	Moved importing of col_defaults from db_xml.put_table() to bin/map, so that it also happens in row-based mode. Note that this causes a DB entry for the datasource to always be created, even if the datasource has no mappings or no rows.
3660	07/27/2012 09:13 PM	Aaron Marcuse-Kubitza	Use new exc.reraise() where exc.raise_() was used, so that the stack trace is preserved when the exception is rethrown
3659	07/27/2012 09:11 PM	Aaron Marcuse-Kubitza	exc.py: reraise(): Take optional exception argument so it can be invoked in the same way as raise_(). Interestingly, this missing parameter does not produce the usual "...() takes no arguments (1 given)" error when the function is called inside an except block.
3658	07/27/2012 09:04 PM	Aaron Marcuse-Kubitza	exc.py: Added reraise()
3657	07/27/2012 09:02 PM	Aaron Marcuse-Kubitza	db_xml.py: put(): Inserting node: Wrap sql_io.put_table() call in catch-all exception handler that calls on_error_() (wrapper for error handler provided by caller) and returns None. This both adds additional debugging info to the exception (in on_error_()) and allows recovery from arbitrary exceptions that happen in sql_io.put_table(), so that an exception does not abort the import.
3656	07/27/2012 08:50 PM	Aaron Marcuse-Kubitza	exc.py: get_e_tracebacks_str(): Use the current system traceback if the exception doesn't contain its own traceback(s)
3655	07/27/2012 08:35 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: specimenreplicate: Added locationevent fkey, since fkeys are not inherited from parent tables
3654	07/27/2012 08:30 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added datasource_id fkey constraints to all tables that needed it
3653	07/27/2012 08:21 PM	Aaron Marcuse-Kubitza	bin/map: out_is_db: Use col_defaults in row-based mode as well
3652	07/27/2012 08:02 PM	Aaron Marcuse-Kubitza	db_xml.py: Renamed put_table_special_funcs to put_special_funcs because it is now used by put() as well
3651	07/27/2012 08:00 PM	Aaron Marcuse-Kubitza	db_xml.py: Moved put() before the functions that use it
3650	07/27/2012 07:58 PM	Aaron Marcuse-Kubitza	db_xml.py: Renamed _put_table_part() to put(), replacing the existing put() whose functionality it now performs
3649	07/27/2012 07:52 PM	Aaron Marcuse-Kubitza	db_xml.py: _put_table_part(): Reordered params to match put(), so that it can eventually be substituted for it
3648	07/27/2012 07:44 PM	Aaron Marcuse-Kubitza	db_xml.py: _put_table_part(): Allow being invoked directly by adding defaults for parameters
3647	07/27/2012 07:41 PM	Aaron Marcuse-Kubitza	db_xml.py: put(): Use _put_table_part(). This will ensure that all the put-related functionality is in one place, rather than duplicated.
3646	07/27/2012 07:30 PM	Aaron Marcuse-Kubitza	db_xml.py: _put_table_part(): Append the node to errors handled with on_error()
3645	07/27/2012 07:29 PM	Aaron Marcuse-Kubitza	sql_io.py: Added own SyntaxError class to replace built-in SyntaxError because it stringifies to only the first line
3644	07/27/2012 06:46 PM	Aaron Marcuse-Kubitza	input.Makefile: Testing: Removed $(via).%.xml tests because they require the via format (DwC/VegX) to be XML, but we want to flatten VegX into a DwC-like set of CSV column names
3643	07/27/2012 06:45 PM	Aaron Marcuse-Kubitza	Removed inputs/NY/test/VegX.specimens.xml.ref because NY is not mapped via VegX
3642	07/27/2012 06:31 PM	Aaron Marcuse-Kubitza	input.Makefile: Testing: Renamed import.*.out tests to end in .xml because they now contain XML import trees for validation, and this extension turns on XML syntax highlighting in a text editor
3641	07/27/2012 06:03 PM	Aaron Marcuse-Kubitza	bin/map: out_is_db: Output the put template to stdout so it will be validated in the automated testing
3640	07/27/2012 05:41 PM	Aaron Marcuse-Kubitza	xml_func.py: process(): If local XML function can't be found, just replace with last param instead of returning an error. This allows DB-only functions to be ignored in XML output mode.
3639	07/27/2012 05:32 PM	Aaron Marcuse-Kubitza	sql_gen.py: ColDict.__setitem__(): Fixed bug: a None value should not be replaced with the column default value if the column has no underlying table
3638	07/27/2012 05:27 PM	Aaron Marcuse-Kubitza	sql.py: DbConn.col_info(): If column does not exist, raise sql_gen.NoUnderlyingTableException
3637	07/27/2012 04:58 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): In log messages, use `.to_str(db)` instead of repr() where possible to use the SQL syntax of the DB driver
3636	07/27/2012 04:51 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore(): Replacing invalid value with NULL in nullable column: Corrected log message to "Replacing invalid value ... with NULL in column ..." because the rows with that value are not ignored in that case
3635	07/27/2012 04:47 PM	Aaron Marcuse-Kubitza	sql.py: run_query(): InvalidValueException: Parse any exception ending in "out of range", not just "field value out of range", in order to also handle errors where the timezone is out of range
3634	07/27/2012 04:35 PM	Aaron Marcuse-Kubitza	schemas/py_functions.sql: _dateRange*(): Made functions STRICT because they return NULL on NULL input
3633	07/26/2012 09:53 PM	Aaron Marcuse-Kubitza	sql_io.py: put(): Use a simple case of put_table(), which now supports everything put() needs. This will enable all row-based and column-based processing to be maintained in the same function, put_table(), and avoids the need to reimplement any column-based functionality (like SQL functions) in put().
3632	07/26/2012 09:51 PM	Aaron Marcuse-Kubitza	xml_dom.py: NodeTextEntryIter: Allow empty values through as None, and instead filter them out in TextEntryOnlyIter using new helper function non_empty(). This allows XML functions to decide for themselves whether empty values should be filtered out, because process() will now no longer automatically remove them. This will enable process() to work with SQL functions, which must not have empty values filtered out because this will remove required, but nullable, arguments.
3631	07/26/2012 09:45 PM	Aaron Marcuse-Kubitza	xml_func.py: Use conv_items() in every XML function that needs empty (NULL) entries removed, so that they are not dependent on what process() does to the items
3630	07/26/2012 09:43 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore(): Support invalid literals in addition to invalid column values. This also allows put_table() to fully support being called by put().
3629	07/26/2012 08:55 PM	Aaron Marcuse-Kubitza	xml_func.py: process(): In row-based mode, if function is not explicitly a relational function but does not exist as a local XML function, treat it as a relational function. This will help in merging sql_io.put() and put_table(), since put() did not support SQL functions but put_table() does, and this ensures that a SQL function is always used if the local XML function has been removed in favor of it.
3628	07/26/2012 08:37 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Removed into param to set a custom into table name because put_table() now has all the info it needs to generate this name automatically, and callers are no longer providing it
3627	07/26/2012 07:56 PM	Aaron Marcuse-Kubitza	bin/map: by_col: db_xml.put_table() call: Use new col_defaults param to automatically set datasource_id to the in_label (datasource name)
3626	07/26/2012 07:46 PM	Aaron Marcuse-Kubitza	xpath.py: path2xml(): Skip to tree created inside root, since that is how callers want to use the returned node
3625	07/26/2012 07:45 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Import col_defaults to translate nodes to pkeys
3624	07/26/2012 07:44 PM	Aaron Marcuse-Kubitza	db_xml.py: _put_table_part(): Support no in_table, for iterations with only literal values
3623	07/26/2012 07:27 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): is_literals: When ignoring all rows, return default value instead of always None
3622	07/26/2012 06:35 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Removed parent_ids_loc and next params since these are only used in the recursion
3621	07/26/2012 06:17 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Split into an outer function that sets up the database environment and subsets in_table, and a (recursive) inner function that imports the data
3620	07/26/2012 05:55 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Subsetting and partitioning in_table: Documented that it's OK to do this even if the table is already the right size, because it takes <1 sec
3619	07/26/2012 05:43 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Use is_function where caller-provided is_func was used, since is_function determines whether something is a function based on whether it actually exists as a SQL function instead of just whether its name starts with "_". Removed now-unneeded is_func param.
3618	07/26/2012 05:36 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Added col_defaults param and use it if there's a missing mapping for a NOT NULL column. This requires callers passing arguments by position to add an empty value for this parameter.
3617	07/26/2012 04:48 PM	Aaron Marcuse-Kubitza	bin/map: by_col: Only clear errors table if doing full re-import starting from row 0, not if restarting import at a later row
3616	07/26/2012 04:47 PM	Aaron Marcuse-Kubitza	input.Makefile: Import to VegBIEN: Fixed bug where `&>>` was used to append stdout and stderr to the log file, but is not supported on Mac OS X. Replaced with `&>` (overwrite instead of append) because log file is unique by date/time the import runs, so there won't be an existing log file that would be overwritten.
3615	07/26/2012 04:34 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added datasource_id to all tables with a sourceaccessioncode (and corresponding *_unique_datasource constraint on these columns) so they can be directly looked up using just the input table's own fkey to parent. This will enable loading hierarchical (plots) data without "breadcrumbs", a huge benefit! Also added sourceaccessioncode wherever there was a datasource_id, to standardize on these names as being the columns that link directly to the input table rows.
3614	07/26/2012 01:15 PM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Installing the staging tables: View the logs: Fixed bug in tail syntax to also work on Linux
3613	07/25/2012 11:04 PM	Aaron Marcuse-Kubitza	Added inputs/Madidi/ with empty mappings
3612	07/25/2012 11:01 PM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Populating the src/ subdir with input data: Added step to make sure each header in multiple part files for a table is EXACTLY the same
3611	07/25/2012 10:56 PM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Installing the staging tables: Added steps to deal with colliding column names in the flat file headers. Added command to view the logs.
3610	07/25/2012 10:53 PM	Aaron Marcuse-Kubitza	csv2db: log(): sys.stderr.write(): Run strings.to_raw_str() on message to handle Unicode chars
3609	07/25/2012 10:52 PM	Aaron Marcuse-Kubitza	csv2db: Run strings.to_unicode() on column names to handle Unicode chars
3608	07/25/2012 10:36 PM	Aaron Marcuse-Kubitza	csv2db: esc_name(): Use db.esc_name()
3607	07/25/2012 09:25 PM	Aaron Marcuse-Kubitza	Added inputs/BIEN2.datasources.xlsx (formerly bien_data_sources.xlsx in nimoy:/home/bien/raw_data/)
3606	07/25/2012 09:06 PM	Aaron Marcuse-Kubitza	exc.py: e_msg(): Added assertions to check that e.args is compatible with this function
3605	07/25/2012 08:59 PM	Aaron Marcuse-Kubitza	exc.py: Use new e_str() where its definition was used
3604	07/25/2012 08:54 PM	Aaron Marcuse-Kubitza	exc.py: Use new Unicode-safe e_msg() instead of strings.ustr() on exceptions
3603	07/25/2012 08:47 PM	Aaron Marcuse-Kubitza	exc.py: e_msg(): Run strings.ustr() on the returned string so it will be appendable to other Unicode strings
3602	07/25/2012 08:43 PM	Aaron Marcuse-Kubitza	exc.py: Added e_msg(), e_str() (from SQL py_functions._date())
3601	07/25/2012 02:06 PM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Adding fkey to parent: Fixed bug: the parent_ids_loc table should only be added to the list of tables not to truncate if it's a column, because it is sometimes just a pkey value when that iteration contained only literals
3600	07/25/2012 01:56 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated with stats from latest import
3599	07/25/2012 01:42 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Corrected date of last import
3598	07/24/2012 09:52 AM	Aaron Marcuse-Kubitza	sql_gen.py: plpythonu_error_handler: Fixed bug where PL/Python exceptions could not be filtered by strings after the first line, because only the "message" portion of the exception is available in SQLERRM
3597	07/24/2012 09:35 AM	Aaron Marcuse-Kubitza	schemas/py_functions.sql: _date(): YMD parsing: Fixed bug: the ValueError exception needed to be stored in a local var so its message could be parsed
3596	07/24/2012 09:33 AM	Aaron Marcuse-Kubitza	sql_gen.py: plpythonu_error_handler: Always raise PL/Python exceptions as data_exception so they go in the errors table, instead of aborting the iteration
3595	07/24/2012 09:16 AM	Aaron Marcuse-Kubitza	sql_gen.py: plpythonu_error_handler: Fixed bug where not all PL/Python exceptions start with "PL/Python: " (e.g. on PostgreSQL 9.1 on vegbiendev), so the PL/Python prefix must be optional. Refactored to put IF clause for non-PL/Python exception at end for a more logical ordering of the conditions.
3594	07/24/2012 08:41 AM	Aaron Marcuse-Kubitza	Added inputs/CVS/
3593	07/24/2012 08:40 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Added steps to place the relevant files under version control
3592	07/24/2012 08:31 AM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Accepting the test cases: Don't auto-accept the initial tests because there could be bugs in the initial mappings that would be revealed upon inspecting the test output
3591	07/24/2012 08:14 AM	Aaron Marcuse-Kubitza	sql_gen.py: plpythonu_error_handler: Added section comment before handler block, so that it's clear in the (very long) wrapper function definition what the block is doing
3590	07/24/2012 07:59 AM	Aaron Marcuse-Kubitza	input.Makefile: Documentation: import/steps.by_col.sql: Added -s to make to avoid echoing make commands to the log file
3589	07/24/2012 07:46 AM	Aaron Marcuse-Kubitza	README.TXT: Moved "Reinstall all datasources at once" to "Schema changes" and renamed it to "Reinstall staging tables" to reflect that it is only necessary when the staging table format is changed
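
Revisions 3658 to 3660 introduce exc.reraise() so that the stack trace is preserved when an exception is rethrown. The log does not show how exc.reraise() is implemented, so the following is only a minimal sketch of the general idea in Python 3, using a bare `raise` inside the except block rather than the project's own helper. The names fail(), rethrow_preserving_trace(), and format_trace() are hypothetical and exist only for this illustration.

```python
import traceback


def fail():
    # Hypothetical stand-in for a call that raises deep inside the import.
    raise ValueError("bad value")


def rethrow_preserving_trace():
    try:
        fail()
    except ValueError:
        # A bare `raise` rethrows the active exception together with its
        # original traceback, so the frame for fail() is not lost.
        raise


def format_trace(func):
    # Run func and return the formatted traceback of whatever it raises.
    try:
        func()
    except Exception:
        return traceback.format_exc()
    return ""


trace = format_trace(rethrow_preserving_trace)
print(trace)
```

The printed traceback still contains the frame for fail(), which is the property the revision comment for 3660 describes.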
