# Date Author Comment
3631 07/26/2012 09:45 PM Aaron Marcuse-Kubitza

xml_func.py: Use conv_items() in every XML function that needs empty (NULL) entries removed, so that they are not dependent on what process() does to the items
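
A minimal sketch of the idea in this change, assuming items are (name, value) pairs: each function filters out empty entries itself instead of relying on the caller having done so. The names drop_empty_items() and join_words() are hypothetical and are not the actual xml_func.py API.

```python
def drop_empty_items(items):
    """Return only the (name, value) pairs whose value is not None or ''."""
    return [(name, value) for name, value in items if value not in (None, '')]


def join_words(items):
    """Join the non-empty values with spaces.

    The filtering happens here, so the result does not depend on
    whether the caller already removed empty entries.
    """
    return ' '.join(value for _, value in drop_empty_items(items))
```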

3630 07/26/2012 09:43 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ignore(): Support invalid literals in addition to invalid column values. This also allows put_table() to fully support being called by put().

3629 07/26/2012 08:55 PM Aaron Marcuse-Kubitza

xml_func.py: process(): In row-based mode, if function is not explicitly a relational function but does not exist as a local XML function, treat it as a relational function. This will help in merging sql_io.put() and put_table(), since put() did not support SQL functions but put_table() does, and this ensures that a SQL function is always used if the local XML function has been removed in favor of it.
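
The fallback rule described here can be sketched as follows. This is an illustration only: the local_funcs registry, the resolve() helper, and the function names in it are assumptions, not the real xml_func.py code.

```python
# Hypothetical registry of functions that are implemented locally in Python.
local_funcs = {'_upper': lambda value: value.upper()}


def resolve(name, explicitly_relational=False):
    """Decide how a function name is handled in row-based mode.

    A function is treated as relational (run in SQL) if it was marked
    as such, or if no local implementation of that name exists.
    """
    if explicitly_relational or name not in local_funcs:
        return ('relational', name)
    return ('local', local_funcs[name])
```

With this rule, removing a local function in favor of its SQL counterpart needs no change at the call sites: the name simply resolves to the relational form.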

3628 07/26/2012 08:37 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Removed into param to set a custom into table name because put_table() now has all the info it needs to generate this name automatically, and callers are no longer providing it

3627 07/26/2012 07:56 PM Aaron Marcuse-Kubitza

bin/map: by_col: db_xml.put_table() call: Use new col_defaults param to automatically set datasource_id to the in_label (datasource name)

3626 07/26/2012 07:46 PM Aaron Marcuse-Kubitza

xpath.py: path2xml(): Skip to tree created inside root, since that is how callers want to use the returned node

3625 07/26/2012 07:45 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Import col_defaults to translate nodes to pkeys

3624 07/26/2012 07:44 PM Aaron Marcuse-Kubitza

db_xml.py: _put_table_part(): Support no in_table, for iterations with only literal values

3623 07/26/2012 07:27 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): is_literals: When ignoring all rows, return default value instead of always None

3622 07/26/2012 06:35 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Removed parent_ids_loc and next params since these are only used in the recursion

3621 07/26/2012 06:17 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Split into an outer function that sets up the database environment and subsets in_table, and a (recursive) inner function that imports the data

3620 07/26/2012 05:55 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Subsetting and partitioning in_table: Documented that it's OK to do this even if the table is already the right size, because it takes <1 sec

3619 07/26/2012 05:43 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Use is_function where caller-provided is_func was used, since is_function determines whether something is a function based on whether it actually exists as a SQL function instead of just whether its name starts with "_". Removed now-unneeded is_func param.

3618 07/26/2012 05:36 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Added col_defaults param and use it if there's a missing mapping for a NOT NULL column. This requires callers passing arguments by position to add an empty value for this parameter.
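
A rough sketch of how such a defaults parameter might behave, assuming the mapping and the defaults are plain dicts keyed by output column name. The fill_missing() helper and its arguments are hypothetical, not the put_table() signature.

```python
def fill_missing(mapping, not_null_cols, col_defaults):
    """Return a copy of mapping with defaults added for unmapped NOT NULL columns.

    Existing mappings are never overridden; a default is used only
    when a NOT NULL column has no mapping at all.
    """
    filled = dict(mapping)
    for col in not_null_cols:
        if col not in filled and col in col_defaults:
            filled[col] = col_defaults[col]
    return filled
```

For example, fill_missing({'name': 'col_a'}, ['name', 'datasource_id'], {'datasource_id': 'NY'}) adds only the datasource_id entry.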

3617 07/26/2012 04:48 PM Aaron Marcuse-Kubitza

bin/map: by_col: Only clear errors table if doing full re-import starting from row 0, not if restarting import at a later row

3616 07/26/2012 04:47 PM Aaron Marcuse-Kubitza

input.Makefile: Import to VegBIEN: Fixed bug where `&>>` was used to append stdout and stderr to the log file, but is not supported on Mac OS X. Replaced with `&>` (overwrite instead of append) because log file is unique by date/time the import runs, so there won't be an existing log file that would be overwritten.

3615 07/26/2012 04:34 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added datasource_id to all tables with a sourceaccessioncode (and corresponding *_unique_datasource constraint on these columns) so they can be directly looked up using just the input table's own fkey to parent. This will enable loading hierarchical (plots) data without "breadcrumbs", a huge benefit! Also added sourceaccessioncode wherever there was a datasource_id, to standardize on these names as being the columns that link directly to the input table rows.

3614 07/26/2012 01:15 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Installing the staging tables: View the logs: Fixed bug in tail syntax to also work on Linux

3613 07/25/2012 11:04 PM Aaron Marcuse-Kubitza

Added inputs/Madidi/ with empty mappings

3612 07/25/2012 11:01 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Populating the src/ subdir with input data: Added step to make sure each header in multiple part files for a table is EXACTLY the same

3611 07/25/2012 10:56 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Installing the staging tables: Added steps to deal with colliding column names in the flat file headers. Added command to view the logs.

3610 07/25/2012 10:53 PM Aaron Marcuse-Kubitza

csv2db: log(): sys.stderr.write(): Run strings.to_raw_str() on message to handle Unicode chars

3609 07/25/2012 10:52 PM Aaron Marcuse-Kubitza

csv2db: Run strings.to_unicode() on column names to handle Unicode chars

3608 07/25/2012 10:36 PM Aaron Marcuse-Kubitza

csv2db: esc_name(): Use db.esc_name()

3607 07/25/2012 09:25 PM Aaron Marcuse-Kubitza

Added inputs/BIEN2.datasources.xlsx (formerly bien_data_sources.xlsx in nimoy:/home/bien/raw_data/)

3606 07/25/2012 09:06 PM Aaron Marcuse-Kubitza

exc.py: e_msg(): Added assertions to check that e.args is compatible with this function

3605 07/25/2012 08:59 PM Aaron Marcuse-Kubitza

exc.py: Use new e_str() where its definition was used

3604 07/25/2012 08:54 PM Aaron Marcuse-Kubitza

exc.py: Use new Unicode-safe e_msg() instead of strings.ustr() on exceptions

3603 07/25/2012 08:47 PM Aaron Marcuse-Kubitza

exc.py: e_msg(): Run strings.ustr() on the returned string so it will be appendable to other Unicode strings

3602 07/25/2012 08:43 PM Aaron Marcuse-Kubitza

exc.py: Added e_msg(), e_str() (from SQL py_functions._date())

3601 07/25/2012 02:06 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Adding fkey to parent: Fixed bug where the parent_ids_loc table should only be added to the list of tables not to truncate if parent_ids_loc is a column, because it is sometimes just a pkey value when that iteration contained only literals

3600 07/25/2012 01:56 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

3599 07/25/2012 01:42 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Corrected date of last import

3598 07/24/2012 09:52 AM Aaron Marcuse-Kubitza

sql_gen.py: plpythonu_error_handler: Fixed bug where PL/Python exceptions could not be filtered by strings after the first line, because only the "message" portion of the exception is available in SQLERRM

3597 07/24/2012 09:35 AM Aaron Marcuse-Kubitza

schemas/py_functions.sql: _date(): YMD parsing: Fixed bug where exception for ValueError needed to be stored in local var so its message could be parsed

3596 07/24/2012 09:33 AM Aaron Marcuse-Kubitza

sql_gen.py: plpythonu_error_handler: Always raise PL/Python exceptions as data_exception so they go in the errors table, instead of aborting the iteration

3595 07/24/2012 09:16 AM Aaron Marcuse-Kubitza

sql_gen.py: plpythonu_error_handler: Fixed bug where not all PL/Python exceptions start with "PL/Python: " (e.g. on PostgreSQL 9.1 on vegbiendev), so the PL/Python prefix must be optional. Refactored to put IF clause for non-PL/Python exception at end for a more logical ordering of the conditions.
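
The optional-prefix matching can be illustrated in Python with a regular expression. This is a sketch only: the real handler is generated SQL, and the message layout assumed here ("PL/Python: ExcClass: message", with the prefix sometimes absent) is inferred from the description above.

```python
import re

# The "PL/Python: " prefix is optional; the exception class name is required.
_EXC_RE = re.compile(r'^(?:PL/Python: )?(\w+): (.*)$', re.DOTALL)


def split_exc(msg):
    """Split an error message into (exception_class, message).

    Returns (None, msg) if the message does not look like a Python
    exception, so that the caller can handle it last.
    """
    match = _EXC_RE.match(msg)
    if match is None:
        return (None, msg)
    return (match.group(1), match.group(2))
```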

3594 07/24/2012 08:41 AM Aaron Marcuse-Kubitza

Added inputs/CVS/

3593 07/24/2012 08:40 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Added steps to place the relevant files under version control

3592 07/24/2012 08:31 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Accepting the test cases: Don't auto-accept the initial tests because there could be bugs in the initial mappings that would be revealed upon inspecting the test output

3591 07/24/2012 08:14 AM Aaron Marcuse-Kubitza

sql_gen.py: plpythonu_error_handler: Added section comment before handler block, so that it's clear in the (very long) wrapper function definition what the block is doing

3590 07/24/2012 07:59 AM Aaron Marcuse-Kubitza

input.Makefile: Documentation: import/steps.by_col.sql: Added -s to make to avoid echoing make commands to the log file

3589 07/24/2012 07:46 AM Aaron Marcuse-Kubitza

README.TXT: Moved Reinstall all datasources at once to Schema changes and renamed it to Reinstall staging tables to reflect that it is only necessary when the staging table format is changed

3588 07/24/2012 07:43 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Updating vegbiendev: Added step to also install the staging tables on vegbiendev

3587 07/24/2012 07:42 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Moved Install the staging tables before Map each table's columns because the install can run in the background while you're mapping. It must, however, come after Auto-create the map spreadsheets because it uses the filenames of the created maps to determine which staging tables to create.

3586 07/24/2012 07:40 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Adding a new datasource: Changed <short_name> to <name> to match usage elsewhere. Documented that it may not contain spaces, and should be abbreviated.

3585 07/24/2012 07:33 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Added steps to update vegbiendev

3584 07/24/2012 07:31 AM Aaron Marcuse-Kubitza

inputs/Makefile: Input data: Added upload target

3583 07/24/2012 07:21 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Added steps to accept the test cases and commit

3582 07/24/2012 07:18 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Added step to install the staging tables

3581 07/24/2012 07:18 AM Aaron Marcuse-Kubitza

bin/map: in_is_xml: doc2rows(): "Root not found in input" warning: Changed "error" to "warning" to match the type of error condition signaled

3580 07/24/2012 07:15 AM Aaron Marcuse-Kubitza

bin/map: map_rows(): out_is_db: Changed `id_node != None` assertion to a warning because this is a normal circumstance in the base case where there are no mappings

3579 07/24/2012 07:13 AM Aaron Marcuse-Kubitza

input.Makefile: Testing: Added test/accept-all

3578 07/24/2012 07:11 AM Aaron Marcuse-Kubitza

csv2db: COPY FROM: Fixed %-injection bug where column names' %s were not escaped prior to cursor.mogrify(), by changing the code to use inline db.esc_value() instead
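
The class of bug can be reproduced with Python's own %-formatting, which uses the same placeholder style as psycopg2's cursor.mogrify(). The sketch below is illustrative: the query text and the column name are made up, and plain string formatting stands in for the database driver.

```python
def formats_ok(template, args):
    """Return True if %-style formatting of template with args succeeds."""
    try:
        template % args
    except (TypeError, ValueError):
        return False
    return True


col = 'cover_%'  # hypothetical column name containing a literal %
null_literal = "''"

# Unsafe: the % in the column name is parsed as the start of a placeholder.
unsafe = 'COPY t ("' + col + '") FROM STDIN NULL %s'

# One remedy: double every % in text that is spliced into the template.
escaped = 'COPY t ("' + col.replace('%', '%%') + '") FROM STDIN NULL %s'

# The remedy this commit describes: inline the already-escaped value,
# so that no template-formatting step remains.
inlined = 'COPY t ("' + col + '") FROM STDIN NULL ' + null_literal
```

Inlining avoids the need to remember the extra escaping step for every identifier that ends up in the query string.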

3577 07/24/2012 06:37 AM Aaron Marcuse-Kubitza

bin/map: in_is_xml: doc2rows(): "Root not found in input" error: Changed SystemExit to a warning because this is a normal circumstance in the base case where the input XML file contains no rows

3576 07/24/2012 06:12 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Documented how to map each table's columns

3575 07/24/2012 05:57 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Changed "Auto-create the src column spreadsheets" to "Auto-create map spreadsheets" and updated command to bootstrap all maps, including newly-autogeneratable via maps

3574 07/24/2012 05:50 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: maps/$(via).%.csv: Auto-create by copying the src map if it doesn't exist. Existing maps discovery: Look up via format in src maps' roots if no via map already exists.

3573 07/24/2012 05:46 AM Aaron Marcuse-Kubitza

src_map: Fixed bug where non-header rows needed to be materialized with empty fields for each column in the header
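
A minimal sketch of this kind of padding, using the standard csv module. The materialize_rows() helper is hypothetical and is not the src_map code itself.

```python
import csv
import io


def materialize_rows(reader):
    """Yield the header, then each row padded with '' to the header's width."""
    header = next(reader)
    yield header
    for row in reader:
        # A negative difference multiplies to an empty list, so rows that
        # are already full width (or longer) pass through unchanged.
        yield row + [''] * (len(header) - len(row))


rows = list(materialize_rows(csv.reader(io.StringIO('a,b,c\n1\n'))))
```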

3572 07/24/2012 04:27 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Via maps cleanup: Match maps/$(via).%.csv with pattern instead of $(viaMaps) var so that a non-existing via map will have the recipe run, too. When auto-creating via maps is later added, this will be required.

3571 07/24/2012 04:07 AM Aaron Marcuse-Kubitza

inputs/*/maps/src.*.csv: Regenerated using new src_map output format

3570 07/24/2012 04:06 AM Aaron Marcuse-Kubitza

parallelproc.py: MultiProducerPool: Removed warning if not using parallel processing because this also gets generated when it's explicitly turned off, which is currently the case and clutters up stderr when testing

3569 07/24/2012 03:57 AM Aaron Marcuse-Kubitza

src_map: Also add columns for the output mappings and comments, so that the src map can be directly copied for use as the via map (DwC.specimens.csv, etc.). The output mapping column name must be provided by the caller, which input.Makefile maps/src.%.csv provides using the new mappings roots.

3568 07/24/2012 03:52 AM Aaron Marcuse-Kubitza

Added mappings/roots for use in creating src maps

3567 07/24/2012 03:41 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: maps/src.%.csv: Clean up by passing through `$(bin)/cols '*'` whenever it's changed. This ensures that the CSV dialect is always consistently Python's Excel dialect. (Note that this dialect actually uses \r\n as the line ending. The \n line endings were from src maps generated by a previous version of bin/src_map.)
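
The line-ending note can be checked directly against the standard library: csv.excel, the default dialect of csv.writer, sets lineterminator to "\r\n". The to_excel_csv() helper below is only an illustration and is not part of bin/cols.

```python
import csv
import io


def to_excel_csv(rows):
    """Serialize rows using Python's Excel dialect and return the text."""
    buf = io.StringIO(newline='')  # newline='' disables newline translation
    csv.writer(buf, dialect='excel').writerows(rows)
    return buf.getvalue()
```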

3566 07/24/2012 03:28 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: maps/$(via).%.full.csv: Removed alternate rule when $(srcMap) doesn't exist, because this effect is actually achieved by the no-prereqs rule for maps/src.%.csv, which causes make to think it exists when matching pattern rules even if its recipe doesn't actually create it

3565 07/24/2012 03:23 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: maps/$(via).%.full.csv: Added alternate rule when $(srcMap) doesn't exist

3564 07/24/2012 03:21 AM Aaron Marcuse-Kubitza

inputs/CTFS/maps/: Removed unneeded src.organisms.csv since there is a way to deal with it not existing in input.Makefile

3563 07/24/2012 03:18 AM Aaron Marcuse-Kubitza

inputs/CTFS/maps/: Removed unneeded .VegX.plots.csv.last_cleanup

3562 07/24/2012 02:13 AM Aaron Marcuse-Kubitza

inputs/*/maps/src.*.csv: Standardized line endings to \n

3561 07/24/2012 01:56 AM Aaron Marcuse-Kubitza

input.Makefile: Maps building: maps/$(via).%.full.csv: Added the src map as a prerequisite so it would be rebuilt when the src map changes. This is possible now that every datasource has at least an empty src map. (An empty src map is now treated the same way as a non-existing one.)

3560 07/24/2012 01:52 AM Aaron Marcuse-Kubitza

inputs/*/maps/src.*.csv: Removed extraneous quotes around fields, which are added by Excel but not by Python

3559 07/24/2012 01:49 AM Aaron Marcuse-Kubitza

inputs/*/maps/src.*.csv: Removed extraneous quotes around fields, which are added by Excel but not by Python

3558 07/24/2012 01:41 AM Aaron Marcuse-Kubitza

inputs/CTFS: Added empty maps/src.organisms.csv so that every table of every datasource has a src map

3557 07/24/2012 12:18 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Documented how to populate the src/ subdir with input data

3556 07/23/2012 10:52 PM Aaron Marcuse-Kubitza

Added inputs/CVS/

3555 07/23/2012 10:28 PM Aaron Marcuse-Kubitza

sql_gen.py: plpythonu_error_handler: Translate specific Python exception types to PostgreSQL error codes (ValueError -> data_exception) instead of assuming everything is a data_exception. When removing the PL/Python prefix, preserve the Python exception class in a DETAIL message. Support non-PL/Python internal_errors by re-raising them.
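
The translation step might look roughly like this when written in Python rather than in the generated handler SQL. The mapping table, the translate() helper, and the wording of the detail text are assumptions; data_exception and internal_error are standard PostgreSQL condition names.

```python
# Hypothetical mapping from Python exception class names to PostgreSQL
# condition names. Only the ValueError entry comes from the commit message.
CONDITION_BY_EXC = {'ValueError': 'data_exception'}


def translate(exc_class, message):
    """Build the fields of the error to raise for a Python exception.

    Unmapped exception classes stay internal_error; the class name is
    preserved in the detail text so that it is not lost with the prefix.
    """
    return {
        'condition': CONDITION_BY_EXC.get(exc_class, 'internal_error'),
        'message': message,
        'detail': 'Python exception class: ' + exc_class,
    }
```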

3554 07/23/2012 10:25 PM Aaron Marcuse-Kubitza

sql_gen.py: Added reraise_exc

3553 07/23/2012 10:21 PM Aaron Marcuse-Kubitza

schemas/py_functions.sql: _date(): Raise (or pass through) ValueErrors directly instead of wrapping them in FormatExceptions, to simplify the code. This will also enable later translation of ValueErrors to data_exceptions. When year is required and missing, output a parsable 'null value in column year violates not-null constraint' error.

3552 07/23/2012 09:48 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): log_exc(): Handle infinite loops from repeated exceptions by removing all rows, instead of just aborting with a failed assertion

3551 07/23/2012 09:36 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): is_function: Fixed bug where special case for unrecoverable errors needed to avoid creating an empty output pkeys table because function mode defines the returned pkeys table separately

3550 07/23/2012 09:08 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): is_function: Factored defining the error handling wrapper function out of the main loop because it only needs to run once. Don't log "Trying to insert new rows" in function mode because it's inaccurate.

3549 07/23/2012 07:14 PM Aaron Marcuse-Kubitza

sql_gen.py: Exceptions: Added suppress_exc and use it in ExcHandler.to_str()

3548 07/23/2012 06:53 PM Aaron Marcuse-Kubitza

README.TXT: Backups: After a new import: Added step to delete previous imports so they won't bloat the full DB backup. (Note that these imports have already been backed up, and only the most recent import needs to be live in the DB.)

3547 07/23/2012 06:48 PM Aaron Marcuse-Kubitza

README.TXT: Backups: Documented what to do after a new import

3546 07/23/2012 06:39 PM Aaron Marcuse-Kubitza

backups/Makefile: Full DB: Added vegbien.backup/all to run both test and rotate

3545 07/23/2012 06:24 PM Aaron Marcuse-Kubitza

README.TXT: Renamed Maintenance section to Backups for clarity

3544 07/23/2012 06:19 PM Aaron Marcuse-Kubitza

backups/Makefile: .sql: When testing, turn it off so make won't skip `.sql: %` in favor of it

3543 07/23/2012 06:07 PM Aaron Marcuse-Kubitza

backups/Makefile: Split %.backup and %.sql into separate targets for clarity

3542 07/23/2012 05:56 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import. Note that this import adds data provider feedback for SQL functions as well as additional date processing using _date().

3541 07/20/2012 07:10 AM Aaron Marcuse-Kubitza

schemas/py_functions.sql: _date(): Re-enabled now that exceptions thrown are properly handled. FormatException: Support raising parsable data_exceptions when provided with the value that was invalid. Date parsing mode: Return date as the value in FormatException so it can be filtered out automatically by column-based import.

3540 07/20/2012 07:06 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): is_function: Creating error handling wrapper function: Fixed bug where the NULL returned in the error handler needed to be cast to the appropriate type, because it's contained within a SELECT query, which does not do implicit casts from type unknown

3539 07/20/2012 07:03 AM Aaron Marcuse-Kubitza

sql_gen.py: Cast: Support types which are Code objects

3538 07/20/2012 06:05 AM Aaron Marcuse-Kubitza

sql_io.py: func_wrapper_exception_handler(): Use new sql_gen.merge_not_null() to try to ensure that NULL values are not folded (which would cause the concatenated values not to match up with the concatenated column names). Note that this adds a dependency on the db object, which callers must now provide.
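
The folding problem can be shown with a small Python analogy. This is not the SQL that sql_gen.merge_not_null() generates; both helpers below and the 'NULL' placeholder are hypothetical.

```python
def merge_folding_nulls(values, sep=', '):
    """Join values, silently dropping None entries (the problematic behavior)."""
    return sep.join(v for v in values if v is not None)


def merge_keeping_nulls(values, sep=', ', null_repr='NULL'):
    """Join values, substituting a placeholder so that positions are preserved."""
    return sep.join(null_repr if v is None else v for v in values)


cols = ['genus', 'species', 'author']
vals = ['Quercus', None, 'L.']
```

Here merge_folding_nulls(vals) gives 'Quercus, L.', which has two entries against three column names, while merge_keeping_nulls(vals) gives 'Quercus, NULL, L.', which still lines up with cols.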

3537 07/20/2012 06:03 AM Aaron Marcuse-Kubitza

sql_gen.py: Added merge_not_null()

3536 07/20/2012 06:03 AM Aaron Marcuse-Kubitza

sql_gen.py: Added try_mk_not_null()

3535 07/20/2012 05:54 AM Aaron Marcuse-Kubitza

sql_gen.py: Renamed ArrayJoin to ArrayMerge to avoid confusion with Join (a SQL construct)

3534 07/20/2012 05:46 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): is_function: Creating error handling wrapper function: Set srcs on row_var so that the column type and nullability info of row_var's columns can be retrieved for use with sql_gen.ensure_not_null()

3533 07/20/2012 05:38 AM Aaron Marcuse-Kubitza

sql_gen.py: RowExcIgnore.to_str(): Compare self.row_var to global const row_var using == to allow caller to provide a copy of row_var with the underlying table set appropriately

3532 07/20/2012 05:35 AM Aaron Marcuse-Kubitza

sql_gen.py: underlying_table(): Support derived tables and row vars by obtaining the underlying table from the srcs