/trunk/lib/tnrs.py - Changes - BIEN 3 - NCEAS Projects

root/trunk/lib/tnrs.py @ 14901

#	Date	Author	Comment
14803	10/07/2014 12:01 AM	Aaron Marcuse-Kubitza	bugfix: lib/tnrs.py: encode_map: also need to encode + because TNRS removes it from the morphospecies (vegpath.org/wiki/CVS_validation#Bobs-revised-document > issue #4)
14622	08/28/2014 08:13 PM	Aaron Marcuse-Kubitza	lib/tnrs.py single_tnrs_request(), bin/tnrs_client: use_tnrs_export: default to False because this mode uses incorrect selected matches (vegpath.org/issues/943), and the JSON mode that fixes this is now available
14618	08/28/2014 07:12 PM	Aaron Marcuse-Kubitza	bugfix: lib/tnrs.py: JSON output: need to stringify arrays so they match what is output in TSV-export mode
14598	08/26/2014 07:57 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: single_tnrs_request(): JSON mode: implemented output of JSON data
14597	08/26/2014 07:53 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: single_tnrs_request(): factored out wrapping in TnrsOutputStream, since this is done for both modes
14596	08/26/2014 07:47 PM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: JSON mode: TSV export columns: need to translate these to JSON column names before they can be used with the JSON data
14578	08/25/2014 10:17 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: single_tnrs_request(): use_tnrs_export=False: need to obtain export columns
14576	08/25/2014 10:16 PM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: single_tnrs_request(): need to `assert name_ct >= 1`, because with no names, TNRS hangs indefinitely
14540	08/21/2014 08:56 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: added option to avoid using TNRS's TSV export feature, which currently returns incorrect selected matches (vegpath.org/issues/943). this has been implemented up through the GWT/JSON decoding.
14539	08/21/2014 08:50 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: added gwt_decode()
14511	08/19/2014 08:37 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: documentation about output of the retrieve step: added that this is also unusable because the array does not contain all the columns and contains no column names
14470	08/14/2014 03:25 PM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: retrieval_request_template: source_sorting (Constrain by Source): corrected explanation to reflect that the behavior is actually the same in both modes, since only one match is ever marked as selected, and that match should always come first
13860	06/25/2014 07:54 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: dirty: documented that this actually used to be on in the web app (see r9910, 2013-6-18), but does not appear to be needed (the source_sorting bug alluded to in r9910 is not fixed by enabling the dirty setting)
13859	06/25/2014 07:46 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: requests: also debug-print request URL
13858	06/25/2014 07:44 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: Download: include the same debug info as do_request()
13857	06/25/2014 07:41 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: do_request(): also debug-print request headers
13856	06/25/2014 07:39 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: download_request_template: dirty: documented why this must be off
13855	06/25/2014 07:36 PM	Aaron Marcuse-Kubitza	bugfix: lib/tnrs.py: download_request_template: fixed bug where multiple names were being marked as Selected, because dirty was incorrectly set to true unlike in the web app
13833	06/24/2014 03:27 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: source_sorting (Constrain by Source): documented the different behavior for this in each match mode (all-matches and best-match)
13636	06/05/2014 04:30 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: max_names: raised back up to 500 now that a workaround for the Internal Server Errors is in place (https://github.com/iPlantCollaborativeOpenSource/TNRS/issues/7)
13630	06/04/2014 03:01 PM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: max_names: lowered to 50 because the dev TNRS server is now always crashing with an Internal Server Error when scrubbing 500 names at a time (https://github.com/iPlantCollaborativeOpenSource/TNRS/issues/7)
13597	06/02/2014 04:24 PM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: Constrain by Source: turn it on so that the download settings reflect what TNRS actually used, while this is broken
13596	06/02/2014 06:19 AM	Aaron Marcuse-Kubitza	fix: lib/tnrs.py: max_names: reduced back to 500 because even 5000 crashes the dev TNRS server
13595	06/02/2014 05:52 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: max_names: reduced to 5000 because 100,000 causes an internal server error
13591	06/02/2014 04:50 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: switched to downloading all matches per name, as is needed to implement #917. note that this will break the parts of the schema that use the tnrs table, until Brad's match-picking algorithm can be implemented, but this tradeoff is necessary to be able to begin scrubbing sooner (Martha; wiki.vegpath.org/2014-05-29_conference_call#TNRS)
13562	05/30/2014 07:50 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: max_names: increased to 100000 because the dev server can handle more names (no simultaneous users), as decided in the conference call (wiki.vegpath.org/2014-05-29_conference_call#TNRS)
13548	05/29/2014 11:53 AM	Aaron Marcuse-Kubitza	lib/tnrs.py: commented out the value of max_names that is not active, for clarity
13544	05/27/2014 11:12 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: sources: updated to list/sort order in issue #917
13464	05/17/2014 01:30 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: use the TNRS dev server (with private URL in tnrs.url) instead of the live server, since that contains datasources that we need
13462	05/17/2014 01:14 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: configure the server separately from the base URL
13436	05/12/2014 07:06 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: retrieval_request_template: taxonomic_constraint, source_sorting: documented their meaning and why they need to be on/off
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
9912	06/18/2013 05:55 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: HTTP requests: rewrapped lines
9911	06/18/2013 05:53 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: updated HTTP requests to match current web app
9910	06/18/2013 05:51 PM	Aaron Marcuse-Kubitza	bugfix: lib/tnrs.py: download_request_template: changed dirty to true (to match the current web app), which is apparently needed to apply the source_sorting setting to the downloaded TSV in addition to the GUI results
9909	06/18/2013 05:29 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: retrieval_request_template: turned source_sorting back off, because it causes any match from the first source to always be used, even if it has a lower match score than the match from the other source. (Brad confirms that this should be off.) I think we had this on originally to ensure that only Tropicos results were used when available, rather than USDA when it was a better match. * note that due to a bug in the web app, this change will not actually be effective, because the source_sorting option is only applied to the GUI results, not the downloaded TSV. *
9904	06/18/2013 02:21 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: submission_request_template: include GCC in addition to Tropicos, because it provides more synonyms than Tropicos for Asteraceae, and the accepted names still match the Tropicos backbone (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/2013-06-13_conference_call#include-GCC-when-running-TNRS)
9525	05/23/2013 02:53 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: single_tnrs_request(): added support for a cumulative profiler using the cumulative_profiler kw param
9520	05/23/2013 02:32 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: repeated_tnrs_request(): renamed to tnrs_request() since this is the function that should usually be used, to ensure that debugging information is output in the case of an error. (the TNRS request must be made again to output this information.)
9519	05/23/2013 02:30 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: tnrs_request(): renamed to single_tnrs_request() to distinguish it from repeated_tnrs_request()
5786	10/25/2012 03:45 PM	Aaron Marcuse-Kubitza	tnrs.py: retrieval_request_template: Turn on taxonomic_constraint (to match family before genus) and source_sorting (to always return any result from the first source before returning results from any other sources, regardless of match %)
5691	10/22/2012 08:22 PM	Aaron Marcuse-Kubitza	tnrs.py: submission_request_template: Use just Tropicos as the name source, as Brad says "GCC is for only one family (Asteraceae)" and USDA's "taxonomy is of lower quality and sometimes conflicts with Tropicos"
5171	10/02/2012 09:54 PM	Aaron Marcuse-Kubitza	tnrs.py: encode_map: Added hidden minus sign, which TNRS removes
5169	10/02/2012 09:25 PM	Aaron Marcuse-Kubitza	tnrs.py: encode_map: Added × (times), which TNRS replaces with x
5168	10/02/2012 09:18 PM	Aaron Marcuse-Kubitza	tnrs.py: encode_map: Added " and ', which TNRS removes when at the beginning or end
5167	10/02/2012 09:12 PM	Aaron Marcuse-Kubitza	tnrs.py: encode_map: Documented why each character needs to be encoded
5166	10/02/2012 09:04 PM	Aaron Marcuse-Kubitza	tnrs.py: encode_map: Removed '&', which is actually not a special character for TNRS (although ';' is)
5165	10/02/2012 09:02 PM	Aaron Marcuse-Kubitza	tnrs.py: encode_map: Added '_', which TNRS replaces with space
5160	10/02/2012 06:50 PM	Aaron Marcuse-Kubitza	tnrs.py: repeated_tnrs_request(): Also retry request in debug mode if an HTTPError is thrown, so that debugging info can also be obtained if there is a bug in the TNRS client
5154	10/01/2012 09:36 PM	Aaron Marcuse-Kubitza	tnrs.py: encode(): Also prepend special padding string to empty and whitespace-only strings because these names are otherwise ignored by TNRS (no response row)
5151	10/01/2012 08:58 PM	Aaron Marcuse-Kubitza	tnrs.py: tnrs_request(): Rewrapped lines (became >80 chars after adding profiling)
5150	10/01/2012 08:52 PM	Aaron Marcuse-Kubitza	tnrs.py: tnrs_request(): Use new encode() and TnrsOutputStream to escape TNRS-invalid characters
5149	10/01/2012 08:51 PM	Aaron Marcuse-Kubitza	tnrs.py: Added encode(), decode(), decode_for_tsv(), and TnrsOutputStream to handle escaping TNRS-invalid characters
5144	10/01/2012 05:47 PM	Aaron Marcuse-Kubitza	tnrs.py: gwt_encode(): Escape special characters in the string instead of removing them, so that TNRS receives the original name rather than a modified version. This will help make the submitted names match up with the returned Name_submitted.
5127	09/28/2012 02:31 PM	Aaron Marcuse-Kubitza	tnrs.py: tnrs_request(): Added comment that names containing only whitespace characters are ignored by TNRS and do not receive a response row. Our tnrs_db and reimport pipeline handles the necessary re-matching-up by just creating taxonpaths for each Name_submitted, and then letting the data import process on the following import attach to the prepopulated taxonpaths.
5125	09/28/2012 02:02 PM	Aaron Marcuse-Kubitza	tnrs.py: max_pause: Changed to 30 min because TNRS sometimes freezes for ~10 min. The freezing usually happens while the data is being uploaded rather than when it's being retrieved, so that the max_pause would not apply, but to be on the safe side, requests should not time out unnecessarily.
5121	09/28/2012 01:15 PM	Aaron Marcuse-Kubitza	TNRS-related programs: Use "names" instead of "taxons" for variable names because what's being submitted are actually verbatim taxonomic names, not official references to specific taxa
5120	09/28/2012 01:08 PM	Aaron Marcuse-Kubitza	tnrs.py: tnrs_request(): Profile the TNRS request
5119	09/28/2012 12:58 PM	Aaron Marcuse-Kubitza	tnrs.py: tnrs_request(): Fixed bug where initial_headers needed to be copied instead of just assigned to headers, because initial_headers is a global constant and should not be changed when the Cookie header is added
5108	09/28/2012 10:54 AM	Aaron Marcuse-Kubitza	tnrs.py: repeated_tnrs_request(): Just retry the request once with debug turned on, to avoid cluttering the log output with the verbose debug info of multiple failed requests if the error is not resolved on retry
5107	09/28/2012 10:47 AM	Aaron Marcuse-Kubitza	tnrs.py: tnrs_request(): repeated_tnrs_request(): Print all suppressed exceptions to stderr
5106	09/28/2012 10:41 AM	Aaron Marcuse-Kubitza	tnrs.py: tnrs_request(): parse_response(): Include both the response headers and the response body in the InvalidResponse message
5101	09/28/2012 09:51 AM	Aaron Marcuse-Kubitza	tnrs_db: Moved lower max_taxons limit to tnrs.py because it's really required to avoid crashing the TNRS server and should apply to all callers
5091	09/28/2012 08:30 AM	Aaron Marcuse-Kubitza	repeated_tnrs_request(): When retrying after an invalid response, output protocol info for debugging
5088	09/28/2012 08:16 AM	Aaron Marcuse-Kubitza	tnrs.py: Added repeated_tnrs_request() to retry a TNRS request which returned an invalid response
5083	09/28/2012 07:43 AM	Aaron Marcuse-Kubitza	tnrs.py: parse_response(): Raise custom InvalidResponse exception instead of SystemExit, so callers can catch the exception and respond to it
5006	09/26/2012 06:45 PM	Aaron Marcuse-Kubitza	tnrs.py: tnrs_request(): Return the CSV stream directly instead of reading it into a string
5005	09/26/2012 06:42 PM	Aaron Marcuse-Kubitza	tnrs.py: tnrs_request(): Moved CSV-download-specific functionality from do_request() to the Download section
5003	09/25/2012 11:13 PM	Aaron Marcuse-Kubitza	tnrs.py: tnrs_request(): Return the response instead of printing it to stdout
4990	09/25/2012 07:43 PM	Aaron Marcuse-Kubitza	Added tnrs.py
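
Several entries above (r5083, r5088, r5107, r5108, r5160) describe a retry pattern: a failed request is repeated exactly once with debugging turned on, so that protocol details are logged without flooding the log on repeated failures. The following is only a minimal sketch of that idea, not the code in lib/tnrs.py. The function name `request_with_debug_retry`, the `single_request` callable, and its `debug` keyword are hypothetical, and the sketch is written for Python 3, which may differ from the original module.

```python
import sys


class InvalidResponse(Exception):
    """Stand-in for the custom exception mentioned in r5083 (hypothetical body)."""


def request_with_debug_retry(single_request, names, retry_on=(InvalidResponse,)):
    """Call single_request(names); on failure, retry once with debug=True.

    single_request is an assumed callable taking (names, debug=...).
    It is not the actual signature used in lib/tnrs.py.
    """
    try:
        return single_request(names, debug=False)
    except retry_on as e:
        # Report the suppressed exception on stderr (compare r5107).
        sys.stderr.write("Retrying once in debug mode after: %r\n" % (e,))
        # Retry a single time only (compare r5108); a second failure propagates.
        return single_request(names, debug=True)
```

A caller that also wants to retry on HTTP errors (as r5160 describes) could pass a wider tuple, for example `retry_on=(InvalidResponse, urllib.error.HTTPError)`, assuming `urllib.error` is imported.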
