/README.TXT: Full database import: added steps to figure out which datasource tables were not successfully imported due to disk space errors
fix: /README.TXT: Full database import: moved verification of exit statuses before verification of DB contents because there is no point in verifying the DB if the datasources didn't finish importing
/README.TXT: Full database import: disk space: documented that the entire disk again gets used long after the beginning of the import, when only a few datasources are running (ie. it definitely seems to be a recent bug in Postgres, and not a latent problem)
/README.TXT: Maintenance: added task to regularly re-run full-database import so that bugs in it don't pile up. it needs to be kept in working order so that it works when it's needed.
/README.TXT: Full database import: added steps to manually reimport the applicable datasources if there are errors due to exceeding available disk space
/README.TXT: Full database import: removed extra `ssh -t vegbiendev.nceas.ucsb.edu` before "upload logs", because the previous steps also occur on vegbiendev
/README.TXT: Notes on system stability: added recommendation to maintain a snapshot copy of the VM as it was at the last successful import, for fallback use if a system upgrade breaks anything. system upgrades on the snapshot VM should be disabled completely, and because this will also disable security fixes, the snapshot VM should be disconnected from the internet and all networking interfaces. (this is an unfortunate consequence of modern OSes being written in non-memory-safe languages such as C and C++.)
/README.TXT: Full database import: disk space: documented that a higher high-water mark actually occurs later in the import, so that the disk usage issue actually remains a problem after the very beginning
fix: /README.TXT: Full database import: disk space: increased the minimum free space recommendation to 1 TB, because analysis of the disk usage during the beginning of the import shows that actually close to the entire amount is being used. however, this problem is normally undetectable unless the disk space is specifically checked, because it only manifests itself if the available disk space is exceeded completely.
/README.TXT: Full database import: documented that the beginning of the import should be scheduled at a time when the DB will not be needed for other uses, because vegbiendev will be slow for the first few hours of the import due to the import using all the available cores
/README.TXT: Full database import: documented that CPU load warning e-mails can safely be ignored. they happen because the parallel imports use all the available cores.
fix: lib/common.Makefile: $(nice): use an increment of +10 instead of +5 because +5 still leaves the shell sluggish
lib/common.Makefile: added $(nice) and use it everywhere its definition is used
/README.TXT: Full database import: exiting `screen`: clarify that you must use `exit`, as Ctrl+D gets disabled to prevent accidental exits
/README.TXT: Full database import: added step to restart Postgres to free up any disk space used by temp tables from the last import (this is apparently not automatically reclaimed)
/Makefile: postgres_restart-Linux: documented that the manual running of the command is needed because for some reason, pg_ctl does not work when run inside make
fix: /Makefile: postgres_restart-Linux: added pause after telling the user the command to run
/Makefile: $(postgresReload-*): use postgres_restart for the postgres-restarting step
bugfix: /Makefile: postgres_restart: added separate Linux version that deals with Linux-specific issues (as in $(postgresReload-Linux))
/Makefile: added postgres_restart, since this is often invoked separately from the entire postgres_reload target
/README.TXT: Full database import: disk space: increased minimum requirement to 500GB (~200GB extra), as the import may use significant additional space for temp tables
/README.TXT: Full database import: documented that env vars set before invoking `screen` will be inherited by it, so these steps will work even if they come before `screen`
backups/TNRS.backup.md5: updated
/README.TXT: Full database import: added steps to set a custom version, if the auto-assigned one would cause a collision with the last import
/README.TXT: Full database import: `unset version`: documented that this is needed because it may have been set in the outer shell
fix: lib/sql_io.py: put_table(): don't warn if can't create pkey, because this just indicates that a set-returning function was used. this should get rid of the last of the confusing benign warnings in the test output.
fix: lib/sql.py: flatten(): don't warn if can't create pkey, because this just indicates that a set-returning function was used
lib/sql.py: run_query_into() added add_pkey_warn param to support turning off "could not create unique index" warnings, which are sometimes benign (eg. when using set-returning functions with column-based import)
/README.TXT: Full database import: disk space: updated schema size (315GB)
/README.TXT: Full database import: removed `up` on jupiter because this is done as part of "do steps under Maintenance > "to synchronize vegbiendev, ..."
/README.TXT: Full database import: moved "do steps under Maintenance > "to synchronize vegbiendev, ..." outside of "On local machine" because these steps don't only take place on the local machine
/README.TXT: use `up` instead of `svn up --force` for consistency
fix: /README.TXT: always use `up` instead of `svn up` since this includes --force
/README.TXT: Full database import: removed unneeded `ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk` at beginning since this is performed again the first time it's needed
fix: /README.TXT: Full database import: removed erroneous line that resulted from a search-and-replace of connection commands in r12396. (it used to read "Follow the steps under Connecting to vegbiendev above, using jupiter instead". this step is now performed on the line below it.)
bin/make_analytical_db: removed remake_diff_tables() because this is now done for each datasource in inputs/input.Makefile
bugfix: schemas/vegbien.sql: schemas/vegbien.sql(): need to util.use_schema(schema_anchor) before initializing vars that use own-schema functions
inputs/input.Makefile: validate: redirect the output to the log, as for other import-related operations
inputs/input.Makefile: import: validate at the end of the import
inputs/input.Makefile: added new-style aggregating validations (`validate` target)
bin/make_analytical_db: removed no longer needed "${public}_validations" schema qualifier, now that it is in the search_path
fix: bin/vegbien_dest: added public_validations
added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt, which is useful to see what fields will be available when we switch to the new GBIF export format
lib/sh/util.sh: removed end_try_subshell, which now does the same thing as end_try
fix: lib/sh/archives.sh: unzip(): support -p option, which pipes extracted data to stdout
added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt.run
added lib/runscripts/extract_header.run
fix: lib/sh/make.sh: direct the user to use begin_target instead of set_make_vars (set_make_vars is now used by begin_target)
fix: lib/runscripts/util.run: to_top_file(): handle $_remake properly, without requiring deferred_check_target_exists to set to_file()'s flags
bugfix: lib/sh/util.sh: die(): usage: documented that if msg uses $(...), save_e is needed
bugfix: lib/sh/util.sh: already_exists_msg(): need to save_e, because new $(mk_hint) call resets $?
lib/sh/util.sh: die(): always errexit even if $e = 0, because die always indicates an error
lib/sh/util.sh: added rethrow!(), which always errexits, even if $e = 0
lib/sh/util.sh: rethrow(): also work in situations where $e is not set
lib/sh/util.sh: rethrow: made it a function since there is now no need for it to be an alias
lib/sh/util.sh: rethrow: removed `test "$e" != 0` since errexit only does anything if $e != 0
lib/sh/util.sh: removed separate rethrow_exit*, rethrow_subshell*, since they now do the same thing as rethrow*
lib/sh/util.sh: rethrow*!: use new errexit, which works in functions and subshells
lib/sh/util.sh: added errexit(), used in place of (exit "$1") because a bug in bash prevents subshells from triggering errexit
lib/sh/util.sh: added bool!()
fix: lib/sh/util.sh: redir(): need to indent before invoking an external command (not just in command__exec(), but for all redir() calls)
lib/sh/make.sh: with_rm(): documented that it only works inside a runscript target that starts w/ begin_target
*{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly progagated to applicable invoked targets. (previously, $rm was progagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)
lib/sh/make.sh: self_make(): renamed to with_rm() for clarity, since this is used only to progagate $rm, and does not also invoke a command with the same name as the current function, as the name might suggest
schemas/vegbien.sql: updated _specimens_01_count_of_total_records_specimens_in_source_db
validation/aggregating/specimens/qualitative_validations_specimens.sql: use taxonoccurrence instead of location as the table that all specimens should have, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
lib/runscripts/util.run: support conventional main() method as well as `all` target
fix: inputs/*/*/map.csv: remapped occurrenceID-mapped fields to dataProviderRecordID when these were not globally unique DwC occurrenceIDs (http://rs.tdwg.org/dwc/terms/#occurrenceID)
fix: inputs/CTFS/AggregateObservation/map.csv: field mapped to occurrenceID: remapped to aggregateOrganismObservationID because these are not specimen occurrences
fix: mappings/VegCore-VegBIEN.csv: taxonoccurrence.sourceaccessioncode: need to populate from aggregateOrganismObservationID when only that is available
bugfix: inputs/NY/Ecatalog_all/map.csv: can't use CatalogNumber as pkey because it's not unique and not always populated. this fixes the NY NULL accessionNumbers bug (wiki.vegpath.org/Aggregating_validations_status#bugs).
/README.TXT: moved "to back up e-mails" and "to back up the version history" before settings backup so that the local backup of these is up to date when everything gets backed up
inputs/XAL/Specimen/header.csv: updated
/README.TXT: to synchronize vegbiendev, jupiter, and your local machine: backups/TNRS.backup: do this before the general sync so that any reverse sync that's needed won't include it
/README.TXT: to synchronize vegbiendev, jupiter, and your local machine: backups/TNRS.backup: use bin/sync_upload now that this works for rsync-ignored files
bugfix: lib/sh/sync.sh: don't unintentionally rsync-ignore explicitly-specified files
lib/sh/util.sh: filesystem: added is_*(), could_be_*()
lib/sh/util.sh: added contains_match()
lib/sh/util.sh: added ends_with()
fix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: run `up` on all machines, not just jupiter, because all must be up-to-date to avoid extraneous diffs
bugfix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: `svn up` on jupiter: need to use up alias because that adds --force
bugfix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: added `svn up` on jupiter: needs to be in main dir (~/bien), not ~/Dropbox/svn/
/README.TXT: to synchronize vegbiendev, jupiter, and your local machine: added `svn up` on jupiter to avoid extraneous diffs when rsyncing
planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
planning/workflow/bien3_architecture.pptx: stage I: clarified that the database input is intended to be a normalized input, and its corresonding output is intended to be denormalized
validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: stage I: clarified that the database input is intended to be a normalized input, and its corresonding output is intended to be denormalized
bugfix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: _specimens_16_list_distinct_specimen_descriptions: should use DISTINCT
validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_16_list_distinct_specimen_descriptions
validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_15_list_distinct_locality_descriptions
validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_09_list_of_unique_verbatim_author_taxa_with_genus
validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_08_count_of_unique_verbatim_author_taxa_with_genus
validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_05_list_of_verbatim_species_excluding_author
validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_04_count_of_unique_verbatim_species_without_author
validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_03_list_of_verbatim_families
validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_02_count_of_unique_verbatim_families
schemas/vegbien.ERD.mwb: regenerated exports
schemas/vegbien.sql: public_validations: added _specimens_01_count_of_total_records_specimens_in_source_db
validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_01_count_of_total_records_specimens_in_source_db
validation/aggregating/specimens/qualitative_validations_specimens.sql: added config statements for datasource and query planner