/trunk/lib - Changes - BIEN 3 - NCEAS Projects

root/trunk/lib @ 14620

svn:ignore: *.pyc tnrs.url

#	Date	Author	Comment
14620	08/28/2014 07:57 PM	Aaron Marcuse-Kubitza	bugfix: lib/csvs.py: JsonReader: need to pass col_order to row_dict_to_list_reader
14618	08/28/2014 07:12 PM	Aaron Marcuse-Kubitza	bugfix: lib/tnrs.py: JSON output: need to stringify arrays so they match what is output in TSV-export mode
14617	08/28/2014 07:10 PM	Aaron Marcuse-Kubitza	lib/csvs.py: JsonReader: added support for values that are arrays
14616	08/28/2014 07:05 PM	Aaron Marcuse-Kubitza	lib/csvs.py: MultiFilter: inherit from WrapReader instead of Filter to avoid needing to define a no-op filter_() function
14615	08/28/2014 06:49 PM	Aaron Marcuse-Kubitza	bugfix: lib/csvs.py: row_dict_to_list_reader: need to override next() directly instead of just using Filter, because Filter doesn't support returning multiple rows for one input row (in this case, prepending a header row). this caused the 1st data row to be missing.
14614	08/28/2014 06:47 PM	Aaron Marcuse-Kubitza	lib/csvs.py: Filter: inherit from WrapReader, which separates out the CSV-reader API code
14613	08/28/2014 06:43 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added WrapReader
14612	08/28/2014 06:43 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added Reader
14600	08/28/2014 03:10 AM	Aaron Marcuse-Kubitza	lib/csvs.py: JsonReader: factored out row-dict-to-list into new row_dict_to_list_reader so that JSON-specific preprocessing is kept separate from the row format translation
14599	08/27/2014 03:17 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added MultiFilter, which enables applying multiple filters by nesting
14598	08/26/2014 07:57 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: single_tnrs_request(): JSON mode: implemented output of JSON data
14597	08/26/2014 07:53 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: single_tnrs_request(): factored out wrapping in TnrsOutputStream, since this is done for both modes
14596	08/26/2014 07:47 PM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: JSON mode: TSV export columns: need to translate these to JSON column names before they can be used with the JSON data
14595	08/26/2014 07:44 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added JsonReader, which reads parsed JSON data as row tuples
14594	08/26/2014 07:43 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added row_dict_to_list(), which translates a CSV dict-based row to a list-based one
14593	08/26/2014 07:43 PM	Aaron Marcuse-Kubitza	lib/csvs.py: RowNumFilter: added support for filtering the header row as well
14592	08/26/2014 07:42 PM	Aaron Marcuse-Kubitza	lib/csvs.py: ColInsertFilter: added support for filtering the header row as well
14591	08/26/2014 05:12 PM	Aaron Marcuse-Kubitza	lib/csvs.py: InputRewriter: documented that this is also a stream (in addition to inheriting from StreamFilter)
14590	08/26/2014 05:11 PM	Aaron Marcuse-Kubitza	bugfix: lib/csvs.py: InputRewriter: accept a reader, as would be expected, instead of a custom stream whose lines are tuples
14589	08/26/2014 05:08 PM	Aaron Marcuse-Kubitza	fix: lib/sql_io.py: append_csv(): use new csvs.ProgressInputFilter instead of streams.ProgressInputStream(csvs.StreamFilter(__)), so that the input to csvs.InputRewriter is a reader, not a stream. this avoids the need for csvs.InputRewriter to accept a stream whose lines are tuples, instead of the expected reader.
14586	08/26/2014 04:49 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added ProgressInputFilter, analogous to streams.ProgressInputStream
14585	08/26/2014 04:46 PM	Aaron Marcuse-Kubitza	lib/sql_io.py: added commented-out debug statement used to troubleshoot copy_expert() errors
14584	08/26/2014 04:45 PM	Aaron Marcuse-Kubitza	lib/dicts.py: added pair_keys(), pair_values()
14583	08/26/2014 04:15 PM	Aaron Marcuse-Kubitza	bugfix: lib/streams.py: CaptureStream: end_idx must also be > start_idx
14578	08/25/2014 10:17 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: single_tnrs_request(): use_tnrs_export=False: need to obtain export columns
14577	08/25/2014 10:16 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added header(stream)
14576	08/25/2014 10:16 PM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: single_tnrs_request(): need to `assert name_ct >= 1`, because with no names, TNRS hangs indefinitely
14545	08/21/2014 12:40 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/archives.sh: compress(): don't include dir prefix in zip archive
14544	08/21/2014 12:40 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: cd(): use echo_run instead of a manual echo_cmd call
14543	08/21/2014 12:35 PM	Aaron Marcuse-Kubitza	fix: lib/sh/util.sh: cd(): indent after running cd rather than before
14542	08/21/2014 12:32 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: cd(): support rebasing path vars for the new dir
14541	08/21/2014 11:51 AM	Aaron Marcuse-Kubitza	bugfix: lib/sh/archives.sh: compress(): need to use zip's path syntax to avoid the file in the archive being named "-"
14540	08/21/2014 08:56 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: added option to avoid using TNRS's TSV export feature, which currently returns incorrect selected matches (vegpath.org/issues/943). this has been implemented up through the GWT/JSON decoding.
14539	08/21/2014 08:50 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: added gwt_decode()
14538	08/21/2014 08:49 AM	Aaron Marcuse-Kubitza	lib/strings.py: added unesc_quotes() and helper functions
14537	08/21/2014 08:49 AM	Aaron Marcuse-Kubitza	lib/strings.py: added json_decode()
14534	08/20/2014 11:12 PM	Aaron Marcuse-Kubitza	lib/runscripts/extract.run: export_(): also compress created file
14533	08/20/2014 11:11 PM	Aaron Marcuse-Kubitza	lib/sh/archives.sh: added compress(), expand(), which handle compression of individual files
14511	08/19/2014 08:37 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: documentation about output of the retrieve step: added that this is also unusable because the array does not contain all the columns and contains no column names
14470	08/14/2014 03:25 PM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: retrieval_request_template: source_sorting (Constrain by Source): corrected explanation to reflect that the behavior is actually the same in both modes, since only one match is ever marked as selected, and that match should always come first
14412	08/04/2014 05:09 AM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: str2varname(): need to lowercase str because on case-insensitive filesystems, paths sometimes canonicalize to a different capitalization than the original
14411	08/04/2014 05:00 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added lowercase()
14410	08/03/2014 09:54 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: die(): need stub since this is invoked before it's defined
14409	08/03/2014 09:12 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: setup_log_fd(): don't change $log_fd to stdlog until stdlog is set up, to avoid "Bad file descriptor" errors
14407	08/02/2014 07:13 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: func_override(), copy_func(): added echo_func to facilitate debugging
14406	08/02/2014 07:12 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: stubs: log++ alias also needs to be moved to stub section
14405	08/01/2014 06:31 PM	Aaron Marcuse-Kubitza	bugfix: lib/Firefox_bookmarks.reformat.csv: URLs: match only the uppercase tags used by Firefox, not any lowercase tags added by the user
14404	08/01/2014 05:42 PM	Aaron Marcuse-Kubitza	fix: lib/Firefox_bookmarks.reformat.csv: page's self-description: updated comment to match regexp
14403	08/01/2014 05:39 PM	Aaron Marcuse-Kubitza	bugfix: lib/Firefox_bookmarks.reformat.csv: page's self-description: updated "page's self-description: " prefix to remove
14348	07/26/2014 05:28 PM	Aaron Marcuse-Kubitza	lib/sh/sync.sh: db_snapshot(): before backing up, trim bloated temp files (eg. from rolled back changes)
14074	07/15/2014 05:44 PM	Aaron Marcuse-Kubitza	bugfix: lib/sql_io.py: put_table(): handle_MissingCastException(): when updating join_cols, don't add new entry for join_cols[out_col], only update existing one. this fixes #902 (import bug), and with #902 fixed, #887 (disk space leak) should no longer occur.
14005	07/14/2014 09:06 AM	Aaron Marcuse-Kubitza	bugfix: lib/Firefox_bookmarks.reformat.csv: updated for new Firefox bookmarks format, which indents the <DD> tag
13860	06/25/2014 07:54 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: dirty: documented that this actually used to be on in the web app (see r9910, 2013-6-18), but does not appear to be needed (the source_sorting bug alluded to in r9910 is not fixed by enabling the dirty setting)
13859	06/25/2014 07:46 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: requests: also debug-print request URL
13858	06/25/2014 07:44 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: Download: include the same debug info as do_request()
13857	06/25/2014 07:41 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: do_request(): also debug-print request headers
13856	06/25/2014 07:39 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: download_request_template: dirty: documented why this must be off
13855	06/25/2014 07:36 PM	Aaron Marcuse-Kubitza	bugfix: lib/tnrs.py: download_request_template: fixed bug where multiple names were being marked as Selected, because dirty was incorrectly set to true unlike in the web app
13849	06/25/2014 02:38 PM	Aaron Marcuse-Kubitza	fix: lib/phpPgAdmin.login.php.diff: use relative file path rather than the path the file was at when the patch was created
13848	06/25/2014 02:34 PM	Aaron Marcuse-Kubitza	/Makefile, lib/phpPgAdmin.login.php.diff: public_ user: added auto-filled password so that users would not be confused as to what to type in the password field
13833	06/24/2014 03:27 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: source_sorting (Constrain by Source): documented the different behavior for this in each match mode (all-matches and best-match)
13821	06/19/2014 01:57 AM	Aaron Marcuse-Kubitza	fix: lib/runscripts/extract.run: export_(): explicitly prevent files from becoming web-accessible, to protect against an incorrect umask in the calling process
13636	06/05/2014 04:30 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: max_names: raised back up to 500 now that a workaround for the Internal Server Errors is in place (https://github.com/iPlantCollaborativeOpenSource/TNRS/issues/7)
13630	06/04/2014 03:01 PM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: max_names: lowered to 50 because the dev TNRS server is now always crashing with an Internal Server Error when scrubbing 500 names at a time (https://github.com/iPlantCollaborativeOpenSource/TNRS/issues/7)
13597	06/02/2014 04:24 PM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: Constrain by Source: turn it on so that the download settings reflect what TNRS actually used, while this is broken
13596	06/02/2014 06:19 AM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: max_names: reduced back to 500 because even 5000 crashes the dev TNRS server
13595	06/02/2014 05:52 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: max_names: reduced to 5000 because 100,000 causes an internal server error
13591	06/02/2014 04:50 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: switched to downloading all matches per name, as is needed to implement #917. note that this will break the parts of the schema that use the tnrs table, until Brad's match-picking algorithm can be implemented, but this tradeoff is necessary to be able to begin scrubbing sooner (Martha; wiki.vegpath.org/2014-05-29_conference_call#TNRS)
13562	05/30/2014 07:50 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: max_names: increased to 100000 because the dev server can handle more names (no simultaneous users), as decided in the conference call (wiki.vegpath.org/2014-05-29_conference_call#TNRS)
13559	05/30/2014 07:37 AM	Aaron Marcuse-Kubitza	fix: lib/PostgreSQL-MySQL.csv: need to replace "double precision" with "double" to work with MySQL Workbench 5.2.47
13548	05/29/2014 11:53 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: commented out the value of max_names that is not active, for clarity
13544	05/27/2014 11:12 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: sources: updated to list/sort order in issue #917
13500	05/21/2014 01:23 AM	Aaron Marcuse-Kubitza	fix: lib/PostgreSQL-MySQL.csv: also remove left-behind lines such as `$$);`
13473	05/17/2014 06:22 PM	Aaron Marcuse-Kubitza	bugfix: lib/runscripts/util.run: $is_runscript: unexport so don't pass it to invoked scripts
13465	05/17/2014 02:15 PM	Aaron Marcuse-Kubitza	bugfix: lib/runscripts/util.run: run_args_cmd(): don't prepend main to args if no args, because for a non-runscript, all args will be passed to main(), leading `main` to be doubled
13464	05/17/2014 01:30 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: use the TNRS dev server (with private URL in tnrs.url) instead of the live server, since that contains datasources that we need
13463	05/17/2014 01:29 PM	Aaron Marcuse-Kubitza	lib/streams.py: added file_get_contents()
13462	05/17/2014 01:14 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: configure the server separately from the base URL
13461	05/17/2014 01:12 PM	Aaron Marcuse-Kubitza	lib/: svn:ignore tnrs.url so the TNRS dev server URL does not become public
13436	05/12/2014 07:06 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: retrieval_request_template: taxonomic_constraint, source_sorting: documented their meaning and why they need to be on/off
13413	05/07/2014 05:18 PM	Aaron Marcuse-Kubitza	lib/runscripts/import.run: added install() target
13409	05/07/2014 03:29 PM	Aaron Marcuse-Kubitza	lib/runscripts/in_datasrc_dir.run: use new local.run
13408	05/07/2014 03:25 PM	Aaron Marcuse-Kubitza	added lib/runscripts/local.run
13392	05/02/2014 10:57 PM	Aaron Marcuse-Kubitza	fix: lib/util.py: dict_subset(): raise an error if collections.OrderedDict isn't available, because some callers may depend on this. note that using dict instead of OrderedDict may be the cause of the joining on the wrong columns bug (issue #902).
13373	05/01/2014 01:37 PM	Aaron Marcuse-Kubitza	bugfix: lib/runscripts/validations.pg.sql.run: updated to reflect that validations.sql is now located inside a subdir, not the datasrc dir
13372	05/01/2014 01:29 PM	Aaron Marcuse-Kubitza	fix: lib/runscripts/file.pg.sql.run: removed include of in_datasrc_dir.run, because this location does not apply to all .sql export scripts
13367	05/01/2014 04:09 AM	Aaron Marcuse-Kubitza	lib/runscripts/validations.pg.sql.run: export_(): make the export idempotent for easier re-runnability
13365	05/01/2014 03:14 AM	Aaron Marcuse-Kubitza	bugfix: lib/sh/db.sh: pg_dump(): need use_pg to import $pg_database before checking for existence of $database
13364	05/01/2014 03:11 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: import_vars: documented that it's idempotent
13361	04/30/2014 06:58 PM	Aaron Marcuse-Kubitza	bugfix: lib/util.py: use OrderedDict from collections rather than ordereddict to work with Mac OS X 10.8 Mountain Lion (http://vegpath.org/links/#OrderedDict)
13354	04/29/2014 11:36 PM	Aaron Marcuse-Kubitza	bugfix: benign_does_not_exist_error(): removed ignore_e=3, because this exit status is also used for other errors
13353	04/29/2014 11:35 PM	Aaron Marcuse-Kubitza	fix: lib/sh/db.sh: benign_does_not_exist_error(): use benign_error=1, which is now supported properly by stderr_matches()
13352	04/29/2014 11:34 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: stderr_matches(): support $benign_error properly, by handling exit status logging in this func instead
13351	04/29/2014 11:03 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/db.sh: pg_schema_exists(): also need to benignify "does not exist" error if returns false
13350	04/29/2014 10:42 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: stderr_matches(): need to separately display errors that were incorrectly suppressed due to $benign_error
13349	04/29/2014 10:36 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: is_err(): rethrow must be inverted (rethrow->false if error)
13348	04/29/2014 10:32 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added is_err()
13347	04/29/2014 09:53 PM	Aaron Marcuse-Kubitza	lib/sh/local.sh: public_schema_exists(): moved to lib/sh/db.sh since this no longer depends on BIEN-specific configurations
13346	04/29/2014 09:42 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/db.sh: public_schema_exists(): don't hide the function call tree so it's clear which function is running the psql commands
13345	04/29/2014 09:40 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/db.sh: public_schema_exists(): don't hide the function call tree so it's clear which function is running the psql commands
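
Note on r14615 (row_dict_to_list_reader): the sketch below illustrates why a one-row-in, one-row-out filter cannot prepend a header row without losing the first data row, and how overriding the reader's next method directly avoids that. It is a minimal illustration written for Python 3, not the code in lib/csvs.py. The class names WrapReader, Filter and the col_order parameter are taken from the commit messages, but their signatures and bodies here are assumptions, and RowDictToListReader and buggy_rows are hypothetical names.

```python
class WrapReader:
    """Hypothetical stand-in: wraps any iterable of rows as an iterator."""
    def __init__(self, reader):
        self.reader = iter(reader)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.reader)


class Filter(WrapReader):
    """Hypothetical stand-in: maps exactly one input row to one output row."""
    def __init__(self, filter_, reader):
        super().__init__(reader)
        self.filter_ = filter_

    def __next__(self):
        return self.filter_(super().__next__())


def buggy_rows(rows, col_order):
    """Prepends the header via Filter; the first data row is consumed and lost."""
    state = {"header_sent": False}

    def filter_(row):
        if not state["header_sent"]:
            state["header_sent"] = True
            return list(col_order)  # replaces the first data row
        return [row[col] for col in col_order]

    return list(Filter(filter_, rows))


class RowDictToListReader(WrapReader):
    """Turns dict rows into list rows, emitting a header row first."""
    def __init__(self, reader, col_order=None):
        super().__init__(reader)
        self.col_order = col_order
        self.header_sent = False
        self.pending = None  # data row held back while the header is emitted

    def __next__(self):
        if self.pending is not None:
            row, self.pending = self.pending, None
        else:
            row = super().__next__()
        if self.col_order is None:
            # Assumption: fall back to the first row's key order
            self.col_order = list(row.keys())
        if not self.header_sent:
            self.header_sent = True
            self.pending = row  # keep the row for the next call
            return list(self.col_order)
        return [row[col] for col in self.col_order]


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
fixed = list(RowDictToListReader(rows, col_order=["b", "a"]))
buggy = buggy_rows(rows, ["b", "a"])
```

In this sketch, buggy contains the header followed by only the second data row, while fixed contains the header followed by both data rows in the requested column order.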
