Project

General

Profile

Statistics
| Revision:
Name Size Revision Age Author Comment
  _archive 1598 over 12 years Aaron Marcuse-Kubitza Moved _archive/tapir2flatClient/trunk/client/ t...
  analysis 3076 over 12 years Aaron Marcuse-Kubitza Added top-level analysis dir for range modeling
  backups 9496 over 11 years Aaron Marcuse-Kubitza added backups/*.md5
  bin 9520 over 11 years Aaron Marcuse-Kubitza lib/tnrs.py: repeated_tnrs_request(): renamed t...
  config 7801 over 11 years Aaron Marcuse-Kubitza root Makefile: VegBIEN DB: mk_db: Added command...
  exports 8798 over 11 years Aaron Marcuse-Kubitza exports/: svn:ignore *.csv
  inputs 9518 over 11 years Aaron Marcuse-Kubitza bin/tnrs_db: removed no longer used $wait flag ...
  lib 9520 over 11 years Aaron Marcuse-Kubitza lib/tnrs.py: repeated_tnrs_request(): renamed t...
  mappings 9459 over 11 years Aaron Marcuse-Kubitza bugfix: mappings/VegCore-VegBIEN.csv: place.geo...
  planning 9403 over 11 years Aaron Marcuse-Kubitza added planning/workflow/validation/GeoDistKM.sq...
  schemas 9493 over 11 years Aaron Marcuse-Kubitza inputs/.TNRS/schema.sql, data.sql: updated for ...
  web 9389 over 11 years Aaron Marcuse-Kubitza web/links/index.htm: updated to Firefox bookmar...
.htaccess 326 Bytes 8771 over 11 years Aaron Marcuse-Kubitza /.htaccess: use canonical URL without symlinks
Makefile 12.6 KB 8844 over 11 years Aaron Marcuse-Kubitza bugfix: /Makefile: moved schemas/install from i...
README.TXT 22.7 KB 9499 over 11 years Aaron Marcuse-Kubitza README.TXT: Full database import: added warning...
fix_perms 97 Bytes 7560 almost 12 years Aaron Marcuse-Kubitza Added root fix_perms
map 1001 Bytes 6949 almost 12 years Aaron Marcuse-Kubitza vegbien_dest: Changed default $prefix to "", so...
new_terms.csv 38.1 KB 7222 almost 12 years Aaron Marcuse-Kubitza new_terms.csv: Regenerated
run 450 Bytes 9074 over 11 years Aaron Marcuse-Kubitza *{.sh,run}: removed extra space between functio...
unmapped_terms.csv 13.1 KB 7201 almost 12 years Aaron Marcuse-Kubitza **/new_terms.csv, **/unmapped_terms.csv: Regene...

Latest revisions

# Date Author Comment
9520 05/23/2013 02:32 PM Aaron Marcuse-Kubitza

lib/tnrs.py: repeated_tnrs_request(): renamed to tnrs_request() since this is the function that should usually be used, to ensure that debugging information is output in the case of an error. (the TNRS request must be made again to output this information.)

9519 05/23/2013 02:30 PM Aaron Marcuse-Kubitza

lib/tnrs.py: tnrs_request(): renamed to single_tnrs_request() to distinguish it from repeated_tnrs_request()

9518 05/23/2013 02:25 PM Aaron Marcuse-Kubitza

bin/tnrs_db: removed no longer used $wait flag (which caused tnrs_db to wait max_pause for new rows to be added), because tnrs_db is now invoked automatically after each import by the import_scrub target (in inputs/input.Makefile) and does not need to run as a daemon. note that when scrub is invoked, it is possible that a previous datasource's import has already scrubbed the names for this import, because tnrs_db runs until all rows in tnrs_input_name are scrubbed....

9517 05/23/2013 02:14 PM Aaron Marcuse-Kubitza

bin/tnrs_db: removed no longer needed explicit population of the Time_submitted, which is now done automatically by the tnrs table. however, this requires starting the transaction before submitting data, so Time_submitted is correctly set to the submission time rather than the insertion time. the setting of the correct time can be tested by inserting `time.sleep(n_sec)` after the TNRS request and checking that the Time_submitted is close to the time tnrs_db was run instead of n_sec seconds later.

9516 05/23/2013 02:09 PM Aaron Marcuse-Kubitza

bin/tnrs_db: start transaction before submitting data, so Time_submitted is correctly set to the submission time rather than the insertion time. these may differ by several minutes if TNRS is slow. the setting of the correct time can be tested by inserting `time.sleep(n_sec)` after the TNRS request, removing the explicit setting of Time_submitted, and checking that the Time_submitted is close to the time tnrs_db was run instead of n_sec seconds later.

9515 05/23/2013 02:05 PM Aaron Marcuse-Kubitza

bugfix: bin/tnrs_db: wrap just the TNRS request and the storing of the response data in a function (undoing part of r9514), because the transaction start time for Time_submitted should not be until the TNRS request is actually made (it often takes several minutes to materialize the next set of input names on a full DB)

9514 05/23/2013 01:56 PM Aaron Marcuse-Kubitza

bin/tnrs_db: Iterate over unscrubbed verbatim taxonlabels: put loop body in a function (which returns whether or not the loop should continue), so that the loop body can easily be wrapped in a transaction using sql.with_savepoint()

9513 05/23/2013 01:19 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs.Time_submitted: set default to now() (the timestamp of the start of the current transaction, http://www.postgresql.org/docs/9.1/static/functions-datetime.html) so that it would automatically be populated when rows are added. note that because the start of the current transaction instead of the exact time at insertion is used, all rows inserted in the same transaction (e.g. as part of the same batch) will have the same value for this, linking them together.

9512 05/23/2013 01:10 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_populate_derived_fields(): renamed to tnrs_populate_fields() so it can be used to populate other fields as well

9511 05/23/2013 01:07 PM Aaron Marcuse-Kubitza

bin/tnrs_db: removed no longer needed explicit appending of derived cols, and instead use append_csv()'s new support for importing CSVs whose columns are a subset of the full table

View all revisions | View revisions

Also available in: Atom