backups/*retention_policy*: added explanations
backups/*retention_policy*: on jupiter: backups further back: removed "if disk space permits" because this is already labeled "optionally"
backups/*retention_policy*: changed to require retaining *.backup of the last 2 successful imports on all machines
backups/*retention_policy*: allow keeping *.backup of the last 2 successful imports on all machines, not just jupiter
: renamed 2TB drive's BIEN3 partition to BIEN3.**SAVE since one might not see the SAVE file in it
/"**DO_NOT_DELETE": renamed to shorter SAVE
added backups/*retention_policies*/ with retention policy files for each partition
backups/README.TXT: renamed to retention_policy to match the naming convention of the retention policy files in the various partitions
/README.TXT: to back up the local machine's hard drive: also exclude *-files indicating the (differing) retention statuses of the partitions involved
lib/tnrs.py single_tnrs_request(), bin/tnrs_client: use_tnrs_export: default to False because this mode uses incorrect selected matches (vegpath.org/issues/943), and the JSON mode that fixes this is now available
bin/tnrs_db: tnrs.tnrs_request() call: explicitly set use_tnrs_export=True so that this continues to work if the default value is changed
bugfix: lib/csvs.py: JsonReader: need to pass col_order to row_dict_to_list_reader
config/VirtualBox_VMs/vegbiendev/README.TXT: ~/Documents/BIEN/vegbiendev.2014-2-2_1-07-32PT.+VirtualBox_changes/: renamed to vegbiendev.2014-2-2_1-07-32PT.VirtualBox/ to make clear that this is the VirtualBox version of vegbiendev
bugfix: lib/tnrs.py: JSON output: need to stringify arrays so they match what is output in TSV-export mode
lib/csvs.py: JsonReader: added support for values that are arrays
lib/csvs.py: MultiFilter: inherit from WrapReader instead of Filter to avoid needing to define a no-op filter_() function
bugfix: lib/csvs.py: row_dict_to_list_reader: need to override next() directly instead of just using Filter, because Filter doesn't support returning multiple rows for one input row (in this case, prepending a header row). this caused the 1st data row to be missing.
lib/csvs.py: Filter: inherit from WrapReader, which separates out the CSV-reader API code
lib/csvs.py: added WrapReader
lib/csvs.py: added Reader
schemas/public_.sql: views that use view_full_occurrence_individual_view: use the view_full_occurrence_individual table instead, now that this is materialized.
planning/meetings/BIEN conference call availability.xlsx: updated
/README.TXT: to back up the local machine's hard drive: renamed backup partition to BIEN3 to make clear what the backup drive contains
fix: /README.TXT: to back up the local machine's hard drive: updated location of `screen` for added commands
/README.TXT: added trailing / on dirs to make clear that they're dirs
config/VirtualBox_VMs/vegbiendev/README.TXT: added instructions to configure the VM to support VirtualBox
config/VirtualBox_VMs/vegbiendev/README.TXT: added instructions to retrieve the contents of the VM, with the VirtualBox changes added
config/VirtualBox_VMs/vegbiendev/README.TXT: to retrieve the original contents of the backup from the VM: added steps to restore the correct VM snapshot
config/VirtualBox_VMs/vegbiendev/README.TXT: also generate list of all the files whose permissions were changed since the backup, but which are extracted with their changed permissions instead of their original ones in the backup
config/VirtualBox_VMs/vegbiendev/README.TXT: added instructions to retrieve the original contents of the backup from the VM
fix: /README.TXT: to back up vegbiendev: also back up /home/aaronmk/bien/ (instead of just symlinking to the local copy), since this can be done space-efficiently with hardlinks. this ensures that the vegbiendev backup will not be modified when the local copy of bien/ is.
lib/csvs.py: JsonReader: factored out row-dict-to-list into new row_dict_to_list_reader so that JSON-specific preprocessing is kept separate from the row format translation
lib/csvs.py: added MultiFilter, which enables applying multiple filters by nesting
lib/tnrs.py: single_tnrs_request(): JSON mode: implemented output of JSON data
lib/tnrs.py: single_tnrs_request(): factored out wrapping in TnrsOutputStream, since this is done for both modes
fix: lib/tnrs.py: JSON mode: TSV export columns: need to translate these to JSON column names before they can be used with the JSON data
lib/csvs.py: added JsonReader, which reads parsed JSON data as row tuples
lib/csvs.py: added row_dict_to_list(), which translates a CSV dict-based row to a list-based one
lib/csvs.py: RowNumFilter: added support for filtering the header row as well
lib/csvs.py: ColInsertFilter: added support for filtering the header row as well
lib/csvs.py: InputRewriter: documented that this is also a stream (in addition to inheriting from StreamFilter)
bugfix: lib/csvs.py: InputRewriter: accept a reader, as would be expected, instead of a custom stream whose lines are tuples
fix: lib/sql_io.py: append_csv(): use new csvs.ProgressInputFilter instead of streams.ProgressInputStream(csvs.StreamFilter(__)), so that the input to csvs.InputRewriter is a reader, not a stream. this avoids the need for csvs.InputRewriter to accept a stream whose lines are tuples, instead of the expected reader.
bugfix: inputs/input.Makefile: %/install: $(exportHeader) must come before postprocess because postprocess renames columns
exports/: svn:ignore: added *.gz
lib/csvs.py: added ProgressInputFilter, analogous to streams.ProgressInputStream
lib/sql_io.py: added commented-out debug statement used to troubleshoot copy_expert() errors
lib/dicts.py: added pair_keys(), pair_values()
bugfix: lib/streams.py: CaptureStream: end_idx must also be > start_idx
bugfix: inputs/input.Makefile: $(import_install_): need `set -o pipefail` to enable errexit
/README.TXT: to back up files not in Time Machine: don't need to review diff because command is unidirectional
fix: /README.TXT: to back up the local machine's hard drive: "repeat until only minimal changes" should refer to the first sync command
inputs/.geoscrub/geoscrub_output/run: documented postprocess() rm=1 runtime (6 min)
lib/tnrs.py: single_tnrs_request(): use_tnrs_export=False: need to obtain export columns
lib/csvs.py: added header(stream)
fix: lib/tnrs.py: single_tnrs_request(): need to `assert name_ct >= 1`, because with no names, TNRS hangs indefinitely
bin/tnrs_client: added env var to configure use_tnrs_export
/README.TXT: to back up vegbiendev: use inplace=1 to speed up stopping and resuming the transfer
fix: /README.TXT: to back up the local machine's hard drive: removed --extended-attributes (after initial sync) because rsync apparently has to visit every file for this
fix: /README.TXT: to back up the local machine's hard drive: also need --extended-attributes
/README.TXT: to back up the local machine's hard drive: removed --delete-before now that that partition has been expanded
fix: /README.TXT: to back up vegbiendev: exclude /var/lib/mysql.bak,postgresql.bak because the local machine doesn't need 2 copies of this information
/README.TXT: to back up vegbiendev: removed no longer needed exclude of Dropbox subdir backup
fix: /README.TXT: to back up vegbiendev: also need to do steps under Maintenance > "to synchronize vegbiendev, jupiter, and your local machine" because /home/aaronmk/bien is not synced here
bugfix: /README.TXT: to back up vegbiendev: need `overwrite=1`
/README.TXT: to back up the version history: don't also need this on vegbiendev because it's already on jupiter and the local machine
bugfix: /README.TXT: to back up vegbiendev: need to include Postgres config files
/README.TXT: to back up the local machine's hard drive: don't back up temp files: added /.fseventsd/
fix: /README.TXT: to back up the local machine's hard drive: initial runtime: use range instead because some of the later runtime might have been from the same files
/README.TXT: to back up the local machine's hard drive: updated initial runtime to include additional transferred files (17 h)
fix: /README.TXT: to back up the local machine's hard drive: need to use --delete-before because the backup partition is near capacity
/README.TXT: to back up the local machine's hard drive: don't back up temp files such as /private/var/vm/*
fix: /README.TXT: to back up the local machine's hard drive: back up most Dropbox/Postgres files before stopping processes, to minimize downtime
bugfix: /README.TXT: to back up the local machine's hard drive: can't use ~ with --exclude
fix: inputs/.geoscrub/geoscrub_output/postprocess.sql: map_geovalidity(): unscrubbable names should actually be geo*in*valid, not geovalid=NULL, according to Brad
/README.TXT: to back up the local machine's hard drive: back up the non-Dropbox, non-Postgres files separately to minimize the Dropbox and Postgres downtime
/README.TXT: to back up the vegbiendev databases: don't need to review diff for these as it's always unidirectional
/README.TXT: added instructions to back up vegbiendev
fix: /README.TXT: to back up the local machine's hard drive: also need to repeat backup command until only minimal changes
/README.TXT: to back up the local machine's hard drive: added step to stop Postgres
bugfix: /README.TXT: to back up the local machine's hard drive: also need to stop Dropbox
/README.TXT: to back up the local machine's settings: added step to remove .DS_Store
fix: /README.TXT: to back up the local machine's settings: Dropbox: should not run with `del=`, because the backup should be an exact replica
backups/TNRS.*: removed no longer needed old TNRS backups, which are part of the respective full-database backups in any case
added config/phpMyAdmin/ symlink to schemas/VegCore/phpMyAdmin/
bugfix: lib/sh/archives.sh: compress(): don't include dir prefix in zip archive
lib/sh/util.sh: cd(): use echo_run instead of a manual echo_cmd call
fix: lib/sh/util.sh: cd(): indent after running cd rather than before
lib/sh/util.sh: cd(): support rebasing path vars for the new dir
bugfix: lib/sh/archives.sh: compress(): need to use zip's path syntax to avoid the file in the archive being named "-"
lib/tnrs.py: added option to avoid using TNRS's TSV export feature, which currently returns incorrect selected matches (vegpath.org/issues/943). this has been implemented up through the GWT/JSON decoding.
lib/tnrs.py: added gwt_decode()
lib/strings.py: added unesc_quotes() and helper functions
lib/strings.py: added json_decode()
/README.TXT: To re-run geoscrubbing: updated runtimes
exports/*_GBIF.csv.run: documented compress_() runtime (20 min-1 h)
lib/runscripts/extract.run: export_(): also compress created file
lib/sh/archives.sh: added compress(), expand(), which handle compression of individual files