# Date Author Comment
14576 08/25/2014 10:16 PM Aaron Marcuse-Kubitza

fix: lib/tnrs.py: single_tnrs_request(): need to `assert name_ct >= 1`, because with no names, TNRS hangs indefinitely
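
The guard described in r14576 could look roughly like the sketch below. The helper name `check_name_ct()` is hypothetical; the actual body of single_tnrs_request() is not shown in this log.

```python
def check_name_ct(names):
    """Fail fast on an empty batch instead of sending a request.

    Hypothetical helper: per the commit message, TNRS hangs indefinitely
    when it is given no names, so the check must happen before any request.
    """
    name_ct = len(names)
    assert name_ct >= 1, 'TNRS request needs at least one name'
    return name_ct
```

Note that `assert` statements are skipped when Python runs with `-O`, so a caller that relies on this guard in optimized mode would need an explicit exception instead.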

14540 08/21/2014 08:56 AM Aaron Marcuse-Kubitza

lib/tnrs.py: added option to avoid using TNRS's TSV export feature, which currently returns incorrect selected matches (vegpath.org/issues/943). this has been implemented up through the GWT/JSON decoding.

14539 08/21/2014 08:50 AM Aaron Marcuse-Kubitza

lib/tnrs.py: added gwt_decode()

14511 08/19/2014 08:37 AM Aaron Marcuse-Kubitza

lib/tnrs.py: documentation about output of the retrieve step: added that this is also unusable because the array does not contain all the columns and contains no column names

14470 08/14/2014 03:25 PM Aaron Marcuse-Kubitza

fix: lib/tnrs.py: retrieval_request_template: source_sorting (Constrain by Source): corrected explanation to reflect that the behavior is actually the same in both modes, since only one match is ever marked as selected, and that match should always come first

13860 06/25/2014 07:54 PM Aaron Marcuse-Kubitza

lib/tnrs.py: dirty: documented that this actually used to be on in the web app (see r9910, 2013-6-18), but does not appear to be needed (the source_sorting bug alluded to in r9910 is not fixed by enabling the dirty setting)

13859 06/25/2014 07:46 PM Aaron Marcuse-Kubitza

lib/tnrs.py: requests: also debug-print request URL

13858 06/25/2014 07:44 PM Aaron Marcuse-Kubitza

lib/tnrs.py: Download: include the same debug info as do_request()

13857 06/25/2014 07:41 PM Aaron Marcuse-Kubitza

lib/tnrs.py: do_request(): also debug-print request headers

13856 06/25/2014 07:39 PM Aaron Marcuse-Kubitza

lib/tnrs.py: download_request_template: dirty: documented why this must be off

13855 06/25/2014 07:36 PM Aaron Marcuse-Kubitza

bugfix: lib/tnrs.py: download_request_template: fixed bug where multiple names were being marked as Selected, because dirty was incorrectly set to true unlike in the web app

13833 06/24/2014 03:27 PM Aaron Marcuse-Kubitza

lib/tnrs.py: source_sorting (Constrain by Source): documented the different behavior for this in each match mode (all-matches and best-match)

13636 06/05/2014 04:30 AM Aaron Marcuse-Kubitza

lib/tnrs.py: max_names: raised back up to 500 now that a workaround for the Internal Server Errors is in place (https://github.com/iPlantCollaborativeOpenSource/TNRS/issues/7)
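
For callers with more names than the limit, one possible approach is to split the input into batches of at most max_names. This is only a sketch: the `batches()` helper is hypothetical, and the log does not say whether tnrs.py batches names itself or leaves that to callers.

```python
max_names = 500  # value stated in r13636

def batches(names, size=max_names):
    """Yield consecutive slices of names, each with at most `size` items.

    Hypothetical helper, not part of tnrs.py as far as this log shows.
    """
    for start in range(0, len(names), size):
        yield names[start:start + size]
```

Each yielded batch could then be passed to a separate TNRS request.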

13630 06/04/2014 03:01 PM Aaron Marcuse-Kubitza

fix: lib/tnrs.py: max_names: lowered to 50 because the dev TNRS server is now always crashing with an Internal Server Error when scrubbing 500 names at a time (https://github.com/iPlantCollaborativeOpenSource/TNRS/issues/7)

13597 06/02/2014 04:24 PM Aaron Marcuse-Kubitza

fix: lib/tnrs.py: Constrain by Source: turn it on so that the download settings reflect what TNRS actually used, while this is broken

13596 06/02/2014 06:19 AM Aaron Marcuse-Kubitza

fix: lib/tnrs.py: max_names: reduced back to 500 because even 5000 crashes the dev TNRS server

13595 06/02/2014 05:52 AM Aaron Marcuse-Kubitza

lib/tnrs.py: max_names: reduced to 5000 because 100,000 causes an internal server error

13591 06/02/2014 04:50 AM Aaron Marcuse-Kubitza

lib/tnrs.py: switched to downloading all matches per name, as is needed to implement #917. note that this will break the parts of the schema that use the tnrs table, until Brad's match-picking algorithm can be implemented, but this tradeoff is necessary to be able to begin scrubbing sooner (Martha; wiki.vegpath.org/2014-05-29_conference_call#TNRS)

13562 05/30/2014 07:50 AM Aaron Marcuse-Kubitza

lib/tnrs.py: max_names: increased to 100000 because the dev server can handle more names (no simultaneous users), as decided in the conference call (wiki.vegpath.org/2014-05-29_conference_call#TNRS)

13548 05/29/2014 11:53 AM Aaron Marcuse-Kubitza

lib/tnrs.py: commented out the value of max_names that is not active, for clarity

13544 05/27/2014 11:12 PM Aaron Marcuse-Kubitza

lib/tnrs.py: sources: updated to list/sort order in issue #917

13464 05/17/2014 01:30 PM Aaron Marcuse-Kubitza

lib/tnrs.py: use the TNRS dev server (with private URL in tnrs.url) instead of the live server, since that contains datasources that we need

13462 05/17/2014 01:14 PM Aaron Marcuse-Kubitza

lib/tnrs.py: configure the server separately from the base URL

13436 05/12/2014 07:06 PM Aaron Marcuse-Kubitza

lib/tnrs.py: retrieval_request_template: taxonomic_constraint, source_sorting: documented their meaning and why they need to be on/off

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

9912 06/18/2013 05:55 PM Aaron Marcuse-Kubitza

lib/tnrs.py: HTTP requests: rewrapped lines

9911 06/18/2013 05:53 PM Aaron Marcuse-Kubitza

lib/tnrs.py: updated HTTP requests to match current web app

9910 06/18/2013 05:51 PM Aaron Marcuse-Kubitza

bugfix: lib/tnrs.py: download_request_template: changed dirty to true (to match the current web app), which is apparently needed to apply the source_sorting setting to the downloaded TSV in addition to the GUI results

9909 06/18/2013 05:29 PM Aaron Marcuse-Kubitza

lib/tnrs.py: retrieval_request_template: turned source_sorting back off, because it causes any match from the first source to always be used, even if it has a lower match score than the match from the other source. (Brad confirms that this should be off.) I think we had this on originally to ensure that only Tropicos results were used when available, rather than USDA when it was a better match. * note that due to a bug in the web app, this change will not actually be effective, because the source_sorting option is only applied to the GUI results, not the downloaded TSV. *

9904 06/18/2013 02:21 PM Aaron Marcuse-Kubitza

lib/tnrs.py: submission_request_template: include GCC in addition to Tropicos, because it provides more synonyms than Tropicos for Asteraceae, and the accepted names still match the Tropicos backbone (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/2013-06-13_conference_call#include-GCC-when-running-TNRS)

9525 05/23/2013 02:53 PM Aaron Marcuse-Kubitza

lib/tnrs.py: single_tnrs_request(): added support for a cumulative profiler using the cumulative_profiler kw param

9520 05/23/2013 02:32 PM Aaron Marcuse-Kubitza

lib/tnrs.py: repeated_tnrs_request(): renamed to tnrs_request() since this is the function that should usually be used, to ensure that debugging information is output in the case of an error. (the TNRS request must be made again to output this information.)

9519 05/23/2013 02:30 PM Aaron Marcuse-Kubitza

lib/tnrs.py: tnrs_request(): renamed to single_tnrs_request() to distinguish it from repeated_tnrs_request()

5786 10/25/2012 03:45 PM Aaron Marcuse-Kubitza

tnrs.py: retrieval_request_template: Turn on taxonomic_constraint (to match family before genus) and source_sorting (to always return any result from the first source before returning results from any other sources, regardless of match %)

5691 10/22/2012 08:22 PM Aaron Marcuse-Kubitza

tnrs.py: submission_request_template: Use just Tropicos as the name source, as Brad says "GCC is for only one family (Asteraceae)" and USDA's "taxonomy is of lower quality and sometimes conflicts with Tropicos"

5171 10/02/2012 09:54 PM Aaron Marcuse-Kubitza

tnrs.py: encode_map: Added hidden minus sign, which TNRS removes

5169 10/02/2012 09:25 PM Aaron Marcuse-Kubitza

tnrs.py: encode_map: Added × (times), which TNRS replaces with x

5168 10/02/2012 09:18 PM Aaron Marcuse-Kubitza

tnrs.py: encode_map: Added " and ', which TNRS removes when at the beginning or end

5167 10/02/2012 09:12 PM Aaron Marcuse-Kubitza

tnrs.py: encode_map: Documented why each character needs to be encoded

5166 10/02/2012 09:04 PM Aaron Marcuse-Kubitza

tnrs.py: encode_map: Removed '&', which is actually not a special character for TNRS (although ';' is)

5165 10/02/2012 09:02 PM Aaron Marcuse-Kubitza

tnrs.py: encode_map: Added '_', which TNRS replaces with space
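
The encode_map revisions above list characters that TNRS removes or replaces. The sketch below shows one way such characters could be escaped before submission and restored afterwards. The `ZZ....ZZ` placeholder format and the function bodies are assumptions made for illustration; the log does not describe the escape format that tnrs.py actually uses.

```python
import re

# Characters the log says TNRS alters: '_' (replaced with space), ';' (special),
# '"' and "'" (removed at the beginning or end), and the times sign (replaced with x)
special_chars = '_;"\'\u00d7'

def encode(name):
    """Replace each special character with a letters-and-digits placeholder.

    The ZZ<4 hex digits>ZZ format is hypothetical.
    """
    return ''.join(
        'ZZ%04XZZ' % ord(c) if c in special_chars else c for c in name)

def decode(text):
    """Reverse encode() by turning each placeholder back into its character."""
    return re.sub(r'ZZ([0-9A-F]{4})ZZ',
                  lambda m: chr(int(m.group(1), 16)), text)
```

A name that already contains text shaped like a placeholder would be decoded incorrectly, so a real implementation would need a placeholder that cannot occur in the input.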

5160 10/02/2012 06:50 PM Aaron Marcuse-Kubitza

tnrs.py: repeated_tnrs_request(): Also retry request in debug mode if an HTTPError is thrown, so that debugging info can also be obtained if there is a bug in the TNRS client

5154 10/01/2012 09:36 PM Aaron Marcuse-Kubitza

tnrs.py: encode(): Also prepend special padding string to empty and whitespace-only strings because these names are otherwise ignored by TNRS (no response row)

5151 10/01/2012 08:58 PM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): Rewrapped lines (became >80 chars after adding profiling)

5150 10/01/2012 08:52 PM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): Use new encode() and TnrsOutputStream to escape TNRS-invalid characters

5149 10/01/2012 08:51 PM Aaron Marcuse-Kubitza

tnrs.py: Added encode(), decode(), decode_for_tsv(), and TnrsOutputStream to handle escaping TNRS-invalid characters

5144 10/01/2012 05:47 PM Aaron Marcuse-Kubitza

tnrs.py: gwt_encode(): Escape special characters in the string instead of removing them, so that TNRS receives the original name rather than a modified version. This will help make the submitted names match up with the returned Name_submitted.

5127 09/28/2012 02:31 PM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): Added comment that names containing only whitespace characters are ignored by TNRS and do not receive a response row. Our tnrs_db and reimport pipeline handles the necessary re-matching-up by just creating taxonpaths for each Name_submitted, and then letting the data import process on the following import attach to the prepopulated taxonpaths.

5125 09/28/2012 02:02 PM Aaron Marcuse-Kubitza

tnrs.py: max_pause: Changed to 30 min because TNRS sometimes freezes for ~10 min. The freezing usually happens while the data is being uploaded rather than when it's being retrieved, so that the max_pause would not apply, but to be on the safe side, requests should not time out unnecessarily.

5121 09/28/2012 01:15 PM Aaron Marcuse-Kubitza

TNRS-related programs: Use "names" instead of "taxons" for variable names because what's being submitted are actually verbatim taxonomic names, not official references to specific taxa

5120 09/28/2012 01:08 PM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): Profile the TNRS request

5119 09/28/2012 12:58 PM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): Fixed bug where initial_headers needed to be copied instead of just assigned to headers, because initial_headers is a global constant and should not be changed when the Cookie header is added
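
The aliasing problem fixed in r5119 can be illustrated as follows. The header values and the `make_headers()` helper are hypothetical; only the name initial_headers and the Cookie header come from the commit message.

```python
# Stand-in for the module-level constant; the real contents are not shown in this log
initial_headers = {'Accept': '*/*'}

def make_headers(cookie):
    """Build per-request headers without mutating the shared constant."""
    # `headers = initial_headers` would only create a second name for the same
    # dict, so adding the Cookie header would also change the global constant.
    headers = initial_headers.copy()
    headers['Cookie'] = cookie
    return headers
```

A shallow copy is enough here because the values are immutable strings.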

5108 09/28/2012 10:54 AM Aaron Marcuse-Kubitza

tnrs.py: repeated_tnrs_request(): Just retry the request once with debug turned on, to avoid cluttering the log output with the verbose debug info of multiple failed requests if the error is not resolved on retry
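
A retry-once-with-debug wrapper in the spirit of r5108 might be sketched like this. The function name, the `do_request` callable, and its `debug` keyword are assumptions; the real signature of repeated_tnrs_request() is not shown in this log.

```python
import sys

class InvalidResponse(Exception):
    """Stand-in for the exception that tnrs.py raises on a bad response."""

def request_with_debug_retry(do_request, names):
    """Try the request quietly; on failure, retry exactly once with debug on.

    Hypothetical wrapper. `do_request(names, debug=...)` is an assumed callable.
    """
    try:
        return do_request(names, debug=False)
    except InvalidResponse as e:
        # Report the suppressed exception, then repeat the request so that
        # the verbose protocol info is printed only for this one retry
        sys.stderr.write('Retrying with debug on after: %s\n' % e)
        return do_request(names, debug=True)  # a second failure propagates
```

Retrying only once keeps the verbose debug output to a single failed exchange when the error persists.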

5107 09/28/2012 10:47 AM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): repeated_tnrs_request(): Print all suppressed exceptions to stderr

5106 09/28/2012 10:41 AM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): parse_response(): Include both the response headers and the response body in the InvalidResponse message

5101 09/28/2012 09:51 AM Aaron Marcuse-Kubitza

tnrs_db: Moved lower max_taxons limit to tnrs.py because it's really required to avoid crashing the TNRS server and should apply to all callers

5091 09/28/2012 08:30 AM Aaron Marcuse-Kubitza

repeated_tnrs_request(): When retrying after an invalid response, output protocol info for debugging

5088 09/28/2012 08:16 AM Aaron Marcuse-Kubitza

tnrs.py: Added repeated_tnrs_request() to retry a TNRS request which returned an invalid response

5083 09/28/2012 07:43 AM Aaron Marcuse-Kubitza

tnrs.py: parse_response(): Raise custom InvalidResponse exception instead of SystemExit, so callers can catch the exception and respond to it
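
The change in r5083 lets callers handle a bad response instead of having the process exit. A minimal sketch follows, assuming a hypothetical parse_response() signature that matches the response text against a regular expression; the real parameters are not shown in this log.

```python
import re

class InvalidResponse(Exception):
    """Raised when a server response does not have the expected form."""

def parse_response(name, pattern, text):
    """Return the groups captured by pattern, or raise InvalidResponse.

    Assumed signature for illustration only.
    """
    match = re.search(pattern, text)
    if match is None:
        # Unlike SystemExit, this can be caught by the caller and retried
        raise InvalidResponse('Invalid %s response:\n%s' % (name, text))
    return match.groups()
```

A caller can then wrap the call in `try`/`except InvalidResponse` and decide whether to retry or report the error.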

5006 09/26/2012 06:45 PM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): Return the CSV stream directly instead of reading it into a string

5005 09/26/2012 06:42 PM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): Moved CSV-download-specific functionality from do_request() to the Download section

5003 09/25/2012 11:13 PM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): Return the response instead of printing it to stdout

4990 09/25/2012 07:43 PM Aaron Marcuse-Kubitza

Added tnrs.py