| Name | Size | Revision | Age | Author | Comment |
|---|---|---|---|---|---|
| _archive | | 1598 | almost 13 years | Aaron Marcuse-Kubitza | Moved _archive/tapir2flatClient/trunk/client/ t... |
| analysis | | 3076 | over 12 years | Aaron Marcuse-Kubitza | Added top-level analysis dir for range modeling |
| backups | | 4751 | over 12 years | Aaron Marcuse-Kubitza | backups/Makefile: Backups: Full DB: Specify the... |
| bin | | 4927 | over 12 years | Aaron Marcuse-Kubitza | csv2db: COPY FROM mode: Removed no longer neede... |
| config | | 272 | about 13 years | Aaron Marcuse-Kubitza | Moved bien_password to new config dir |
| inputs | | 4923 | over 12 years | Aaron Marcuse-Kubitza | inputs/VegBank/: Added taxonimportance/ |
| lib | | 4917 | over 12 years | Aaron Marcuse-Kubitza | streams.py: Line iteration: Added read_all() |
| mappings | | 4922 | over 12 years | Aaron Marcuse-Kubitza | mappings/VegCore.csv: Added and mapped aggregat... |
| schemas | | 4863 | over 12 years | Aaron Marcuse-Kubitza | schemas/functions.sql: Added _in_to_m() |
| to_do | | 4524 | over 12 years | Aaron Marcuse-Kubitza | to_do/timeline.doc: Updated to reflect addition... |
| validation | | 4523 | over 12 years | Aaron Marcuse-Kubitza | Added validation/ |
| Makefile | 9.99 KB | 4752 | over 12 years | Aaron Marcuse-Kubitza | root Makefile: PostgreSQL: postgres-Linux: Adde... |
| README.TXT | 11.1 KB | 4793 | over 12 years | Aaron Marcuse-Kubitza | README.TXT: Data import: Added note that `make ... |
| map | 1.22 KB | 3475 | over 12 years | Aaron Marcuse-Kubitza | root map: Run bin/map with a nice increment of ... |
| new_terms.csv | 30.4 KB | 4887 | over 12 years | Aaron Marcuse-Kubitza | Regenerated root unmapped_terms.csv, new_terms.csv |
| unmapped_terms.csv | 5.8 KB | 4887 | over 12 years | Aaron Marcuse-Kubitza | Regenerated root unmapped_terms.csv, new_terms.csv |

Latest revisions

# Date Author Comment
4927 09/21/2012 03:57 PM Aaron Marcuse-Kubitza

csv2db: COPY FROM mode: Removed no longer needed explicit column list, now that the initial table has the exact width of the CSV (the row_num is added later)

4926 09/21/2012 03:55 PM Aaron Marcuse-Kubitza

csv2db: Add any row_num column after creating the table, so it does not interfere with row widths when using COPY FROM without explicit column names

4925 09/21/2012 03:48 PM Aaron Marcuse-Kubitza

csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.

4924 09/21/2012 03:44 PM Aaron Marcuse-Kubitza

csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
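The ragged-row fix described in revisions 4924 and 4925 can be illustrated with a short sketch. This is not the csv2db code: the function names `normalize_rows` and `with_row_num` are hypothetical, and padding short rows with empty strings is an assumption (the commit message only mentions truncation). The point it shows is that when the row number is appended last, each row can be cut to exactly the number of column names, whether or not a row_num is present.

```python
import csv
import io

def normalize_rows(reader, width):
    """Yield each row padded or truncated to exactly `width` fields.

    Hypothetical helper; padding with '' is an assumption, not taken
    from the csv2db source.
    """
    for row in reader:
        row = row[:width]                     # drop extra trailing fields
        row = row + [''] * (width - len(row)) # pad short rows
        yield row

def with_row_num(rows, start=1):
    """Append a row number as the LAST field, so it never shifts data columns."""
    for num, row in enumerate(rows, start):
        yield row + [str(num)]

# A small CSV with one short row and one long row
sample = "a,b,c\n1,2,3\n4,5\n6,7,8,9\n"
reader = csv.reader(io.StringIO(sample))
header = next(reader)

fixed = list(normalize_rows(reader, len(header)))
numbered = list(with_row_num(fixed))
```

Because the width check uses only `len(header)`, a table without a row number (such as the `*.src` tables mentioned above) can skip `with_row_num` entirely and still get rows of consistent width.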

4923 09/21/2012 03:28 PM Aaron Marcuse-Kubitza

inputs/VegBank/: Added taxonimportance/

4922 09/21/2012 03:20 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Added and mapped aggregateOccurrenceID

4921 09/21/2012 03:12 PM Aaron Marcuse-Kubitza

mappings/VegCore.csv: taxonOccurrenceID: Re-sourced to VegBank taxonobservation and DwC occurrenceID, because this is where the VegBIEN table name came from

4920 09/21/2012 02:57 PM Aaron Marcuse-Kubitza

tnrs_client: Support parsing multiple taxons at once, by specifying each as a command-line argument. Increased the max_pause to 10 min to support large batches. Limited the batch size to 5000 names, using the limit at <http://tnrs.iplantcollaborative.org/TNRSapp.html>. Note that when using xargs to pass many names, xargs will by default split its arguments into chunks of 5000. You can change this using the -n option.

4919 09/21/2012 02:29 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

4918 09/21/2012 01:20 PM Aaron Marcuse-Kubitza

Added tnrs_client. Note that obtaining an actual CSV requires four (!) steps: submit, retrieve, prepare download, and download. The output of the retrieve step is unusable because the array has different lengths depending on the taxonomic ranks present in the provided taxon name. This initial version runs one name at a time, but could later be expanded to batch process because TNRS can run multiple names at once.
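The 5000-name batch limit mentioned in revision 4920 amounts to splitting a name list into fixed-size chunks before submission. A minimal sketch of that chunking follows. The helper `batches` and the constant `MAX_BATCH` are hypothetical names, not taken from tnrs_client, and no TNRS request is made here.

```python
MAX_BATCH = 5000  # batch limit cited in revision 4920

def batches(names, size=MAX_BATCH):
    """Yield successive lists of at most `size` names (hypothetical helper)."""
    names = list(names)
    for i in range(0, len(names), size):
        yield names[i:i + size]

# Example: 12001 placeholder names split into three batches
names = ["name%d" % i for i in range(12001)]
chunks = list(batches(names))
sizes = [len(c) for c in chunks]
```

Each chunk could then be passed to one client invocation, which is roughly what chunking the argument list with xargs achieves from the shell.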
