Name                Size     Revision  Age              Author                 Comment
_archive                     1598      almost 13 years  Aaron Marcuse-Kubitza  Moved _archive/tapir2flatClient/trunk/client/ t...
analysis                     3076      over 12 years    Aaron Marcuse-Kubitza  Added top-level analysis dir for range modeling
backups                      4751      over 12 years    Aaron Marcuse-Kubitza  backups/Makefile: Backups: Full DB: Specify the...
bin                          5153      about 12 years   Aaron Marcuse-Kubitza  tnrs_db: pause: Increased to 30 min because if ...
config                       272       about 13 years   Aaron Marcuse-Kubitza  Moved bien_password to new config dir
inputs                       5138      about 12 years   Aaron Marcuse-Kubitza  mappings/VegCore-VegBIEN.csv: non-TNRS taxonpat...
lib                          5154      about 12 years   Aaron Marcuse-Kubitza  tnrs.py: encode(): Also prepend special padding...
mappings                     5138      about 12 years   Aaron Marcuse-Kubitza  mappings/VegCore-VegBIEN.csv: non-TNRS taxonpat...
schemas                      5141      about 12 years   Aaron Marcuse-Kubitza  schemas/vegbien.sql: placepath.canon_placepath_...
to_do                        4524      over 12 years    Aaron Marcuse-Kubitza  to_do/timeline.doc: Updated to reflect addition...
validation                   4523      over 12 years    Aaron Marcuse-Kubitza  Added validation/
Makefile            9.99 KB  4752      over 12 years    Aaron Marcuse-Kubitza  root Makefile: PostgreSQL: postgres-Linux: Adde...
README.TXT          11.3 KB  5040      over 12 years    Aaron Marcuse-Kubitza  README.TXT: Data import: Starting column-based ...
map                 1.28 KB  4981      over 12 years    Aaron Marcuse-Kubitza  root map: Fixed custom public schema override t...
new_terms.csv       30.4 KB  4887      over 12 years    Aaron Marcuse-Kubitza  Regenerated root unmapped_terms.csv, new_terms.csv
unmapped_terms.csv  5.8 KB   4887      over 12 years    Aaron Marcuse-Kubitza  Regenerated root unmapped_terms.csv, new_terms.csv

Latest revisions

# Date Author Comment
5154 10/01/2012 09:36 PM Aaron Marcuse-Kubitza

tnrs.py: encode(): Also prepend special padding string to empty and whitespace-only strings because these names are otherwise ignored by TNRS (no response row)

5153 10/01/2012 09:15 PM Aaron Marcuse-Kubitza

tnrs_db: pause: Increased to 30 min because if no new names are available in TNRS.tnrs, there is no need to check every minute for new names (which clutters up the log file output). The pause feature is designed to allow tnrs_db to run in parallel with the import process, and process new names as they are made available, which only happens once for each partition of each datasource.

5152 10/01/2012 09:11 PM Aaron Marcuse-Kubitza

tnrs_db: Fixed bug where the new filtering out of already-scrubbed names caused names to be skipped: the loop advanced by the number of rows found, but those rows were also no longer returned by the query, so only every other set of rows was processed

5151 10/01/2012 08:58 PM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): Rewrapped lines (became >80 chars after adding profiling)

5150 10/01/2012 08:52 PM Aaron Marcuse-Kubitza

tnrs.py: tnrs_request(): Use new encode() and TnrsOutputStream to escape TNRS-invalid characters

5149 10/01/2012 08:51 PM Aaron Marcuse-Kubitza

tnrs.py: Added encode(), decode(), decode_for_tsv(), and TnrsOutputStream to handle escaping TNRS-invalid characters

5148 10/01/2012 08:48 PM Aaron Marcuse-Kubitza

strings.py: Added regexp_repl_esc()

5147 10/01/2012 08:47 PM Aaron Marcuse-Kubitza

strings.py: Added replace_all() and replace_all_re(), as well as flip_map() for use with maps for these functions

5146 10/01/2012 08:46 PM Aaron Marcuse-Kubitza

csvs.py: Added tsv_encode_map for use in creating TSVs parsed by TsvReader

5145 10/01/2012 06:42 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader: Also interpret '\t' as a tab, to provide a mechanism for encoding embedded tabs
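Revisions 5145 and 5146 describe encoding embedded tabs as the two-character sequence '\t' so that they survive a tab-separated file. The sketch below illustrates that round trip. It is not the code in csvs.py: the names TSV_ENCODE_MAP, encode_field(), decode_field(), write_row() and read_row() are hypothetical, and the real TsvReader and tsv_encode_map may be structured differently.

```python
# Hypothetical sketch of escaping embedded tabs in TSV fields.
# None of these names come from csvs.py; they are assumptions for illustration.

# Each pair is (raw character, escaped form written to the file).
TSV_ENCODE_MAP = [('\t', '\\t')]


def encode_field(value):
    """Replace each raw tab with the two characters backslash + 't'."""
    for raw, escaped in TSV_ENCODE_MAP:
        value = value.replace(raw, escaped)
    return value


def decode_field(value):
    """Reverse encode_field(): turn backslash + 't' back into a raw tab."""
    for raw, escaped in TSV_ENCODE_MAP:
        value = value.replace(escaped, raw)
    return value


def write_row(fields):
    """Join encoded fields with real tabs, which now only act as separators."""
    return '\t'.join(encode_field(f) for f in fields)


def read_row(line):
    """Split on real tabs first, then decode the escapes inside each field."""
    return [decode_field(f) for f in line.rstrip('\n').split('\t')]


row = ['Quercus\talba', 'Fagaceae']  # first field contains an embedded tab
line = write_row(row)
assert read_row(line) == row
```

This sketch does not escape backslashes themselves, so a field that already contains a literal backslash followed by 't' would be decoded as a tab. Whether the actual implementation handles that case is not stated in the log.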
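The bug fixed in revision 5152 is a common pitfall when paging through a query whose result set shrinks as rows are processed. The simulation below reproduces the described symptom with an in-memory list instead of a database. All names (fetch_unscrubbed(), process_buggy(), process_fixed()) are hypothetical, and the fix shown (always reading from offset 0) is only one plausible remedy; the log does not say how tnrs_db was actually changed.

```python
# Hypothetical simulation of the skipped-rows bug; not the tnrs_db code.

def fetch_unscrubbed(names, scrubbed, offset, limit):
    """Stand-in for a query that returns only not-yet-scrubbed names."""
    pending = [n for n in names if n not in scrubbed]
    return pending[offset:offset + limit]


def process_buggy(names, batch_size):
    """Advance the offset even though processed rows leave the result set."""
    scrubbed = set()
    offset = 0
    while True:
        batch = fetch_unscrubbed(names, scrubbed, offset, batch_size)
        if not batch:
            break
        scrubbed.update(batch)  # these rows no longer match the filter
        offset += len(batch)    # so this skips the next unprocessed batch
    return sorted(scrubbed)


def process_fixed(names, batch_size):
    """Always read from offset 0; the filter itself moves the window."""
    scrubbed = set()
    while True:
        batch = fetch_unscrubbed(names, scrubbed, 0, batch_size)
        if not batch:
            break
        scrubbed.update(batch)
    return sorted(scrubbed)


names = ['n%d' % i for i in range(8)]
print(process_buggy(names, 2))  # every other batch is missing
print(process_fixed(names, 2))  # all names are processed
```

With eight names and a batch size of 2, the buggy loop handles the first batch, then asks for offset 2 of a result set that has already lost those two rows, so it lands on the third batch of the original list and never sees the second.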
