/trunk - Changes - BIEN 3 - NCEAS Projects

root/trunk @ 13051

svn:ignore: extern

#	Date	Author	Comment
13051	04/03/2014 03:40 PM	Aaron Marcuse-Kubitza	planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
13050	04/03/2014 03:39 PM	Aaron Marcuse-Kubitza	planning/workflow/bien3_architecture.pptx: updated to reflect decisions made in the 2014-04-03 conference call (wiki.vegpath.org/2014-04-03_conference_call#import-process-2)
13049	04/03/2014 08:53 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_14_count_of_all_invalid_verbatim_lat_long
13048	04/03/2014 08:35 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_12_distinct_collector_name_collect_num_date_w_count
13047	04/03/2014 08:04 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: fixed whitespace
13046	04/03/2014 07:32 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: removed trailing whitespace
13045	04/03/2014 07:31 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_13_count_of_all_verbatim_and_decimal_lat_long
13044	04/02/2014 05:55 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_11_list_of_three_standard_political_divisions
13043	04/02/2014 05:36 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: *_of_species_binomials: switched back to the old queries that use the split-apart ranks instead of the concatenated taxon name. note that these will not work on all specimens datasources, but now that #6,7 were selected to use the concatenated taxon name, this isn't a problem.
13042	04/02/2014 05:21 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_species_binomials: renamed columns to species_binomial to reflect reverted query name
13041	04/02/2014 05:16 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _of_verbatim_species_excluding_author: renamed to _species_binomials for clarity
13040	04/02/2014 05:14 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _specimens_04_count_of_unique_verbatim_species_with_author, _specimens_05_list_of_unique_verbatim_species_with_author: switched back to original names because #6,7 now do the same thing as #4,5, so we should include the differing result set of #4,5 for datasources that provide it
13039	04/02/2014 05:01 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_10_count_number_of_records_by_institution
13038	04/02/2014 04:38 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: use taxon_name*_with_author everywhere instead of custom column names, for consistency
13037	04/02/2014 04:09 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _of_verbatim_subspecific_taxa_without_author, etc.: renamed to _with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
13036	04/02/2014 04:03 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_06_count_of_unique_verb_subsp_taxa_without_author, _specimens_07_list_of_verbatim_subspecific_taxa_without_author
13035	04/02/2014 03:54 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _verbatim_species_without_author, etc.: renamed to _with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
13034	04/02/2014 03:14 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: removed extra ; at ends of queries
13033	04/02/2014 03:13 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
13032	04/02/2014 03:05 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
13031	04/02/2014 11:17 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: added high-water mark of 1.8 TB @11:15:05
13030	04/02/2014 10:56 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added steps to figure out which datasource tables were not successfully imported due to disk space errors
13029	04/02/2014 10:45 AM	Aaron Marcuse-Kubitza	fix: /README.TXT: Full database import: moved verification of exit statuses before verification of DB contents because there is no point in verifying the DB if the datasources didn't finish importing
13028	04/02/2014 09:01 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: documented that the entire disk again gets used long after the beginning of the import, when only a few datasources are running (ie. it definitely seems to be a recent bug in Postgres, and not a latent problem)
13027	04/01/2014 05:40 PM	Aaron Marcuse-Kubitza	/README.TXT: Maintenance: added task to regularly re-run full-database import so that bugs in it don't pile up. it needs to be kept in working order so that it works when it's needed.
13026	04/01/2014 04:24 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added steps to manually reimport the applicable datasources if there are errors due to exceeding available disk space
13025	04/01/2014 04:13 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: removed extra `ssh -t vegbiendev.nceas.ucsb.edu` before "upload logs", because the previous steps also occur on vegbiendev
13024	04/01/2014 04:04 PM	Aaron Marcuse-Kubitza	/README.TXT: Notes on system stability: added recommendation to maintain a snapshot copy of the VM as it was at the last successful import, for fallback use if a system upgrade breaks anything. system upgrades on the snapshot VM should be disabled completely, and because this will also disable security fixes, the snapshot VM should be disconnected from the internet and all networking interfaces. (this is an unfortunate consequence of modern OSes being written in non-memory-safe languages such as C and C++.)
13023	04/01/2014 03:43 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: documented that a higher high-water mark actually occurs later in the import, so that the disk usage issue actually remains a problem after the very beginning
13022	04/01/2014 03:37 PM	Aaron Marcuse-Kubitza	fix: /README.TXT: Full database import: disk space: increased the minimum free space recommendation to 1 TB, because analysis of the disk usage during the beginning of the import shows that actually close to the entire amount is being used. however, this problem is normally undetectable unless the disk space is specifically checked, because it only manifests itself if the available disk space is exceeded completely.
13021	04/01/2014 02:04 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: documented that the beginning of the import should be scheduled at a time when the DB will not be needed for other uses, because vegbiendev will be slow for the first few hours of the import due to the import using all the available cores
13020	04/01/2014 01:36 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: documented that CPU load warning e-mails can safely be ignored. they happen because the parallel imports use all the available cores.
13019	04/01/2014 01:31 PM	Aaron Marcuse-Kubitza	fix: lib/common.Makefile: $(nice): use an increment of +10 instead of +5 because +5 still leaves the shell sluggish
13018	04/01/2014 01:29 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: added $(nice) and use it everywhere its definition is used
13017	04/01/2014 01:14 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: exiting `screen`: clarify that you must use `exit`, as Ctrl+D gets disabled to prevent accidental exits
13016	04/01/2014 12:47 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added step to restart Postgres to free up any disk space used by temp tables from the last import (this is apparently not automatically reclaimed)
13015	04/01/2014 12:45 PM	Aaron Marcuse-Kubitza	/Makefile: postgres_restart-Linux: documented that the manual running of the command is needed because for some reason, pg_ctl does not work when run inside make
13014	04/01/2014 12:43 PM	Aaron Marcuse-Kubitza	fix: /Makefile: postgres_restart-Linux: added pause after telling the user the command to run
13013	04/01/2014 12:42 PM	Aaron Marcuse-Kubitza	/Makefile: $(postgresReload-*): use postgres_restart for the postgres-restarting step
13012	04/01/2014 12:30 PM	Aaron Marcuse-Kubitza	bugfix: /Makefile: postgres_restart: added separate Linux version that deals with Linux-specific issues (as in $(postgresReload-Linux))
13011	04/01/2014 12:15 PM	Aaron Marcuse-Kubitza	/Makefile: added postgres_restart, since this is often invoked separately from the entire postgres_reload target
13010	04/01/2014 11:40 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: increased minimum requirement to 500GB (~200GB extra), as the import may use significant additional space for temp tables
13009	04/01/2014 11:37 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: documented that env vars set before invoking `screen` will be inherited by it, so these steps will work even if they come before `screen`
13008	04/01/2014 11:26 AM	Aaron Marcuse-Kubitza	backups/TNRS.backup.md5: updated
13007	04/01/2014 11:23 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added steps to set a custom version, if the auto-assigned one would cause a collision with the last import
13006	04/01/2014 11:08 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: `unset version`: documented that this is needed because it may have been set in the outer shell
13005	03/30/2014 07:54 PM	Aaron Marcuse-Kubitza	fix: lib/sql_io.py: put_table(): don't warn if can't create pkey, because this just indicates that a set-returning function was used. this should get rid of the last of the confusing benign warnings in the test output.
13004	03/30/2014 07:53 PM	Aaron Marcuse-Kubitza	fix: lib/sql.py: flatten(): don't warn if can't create pkey, because this just indicates that a set-returning function was used
13003	03/30/2014 07:52 PM	Aaron Marcuse-Kubitza	lib/sql.py: run_query_into() added add_pkey_warn param to support turning off "could not create unique index" warnings, which are sometimes benign (eg. when using set-returning functions with column-based import)
13002	03/30/2014 06:52 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: updated schema size (315GB)
13001	03/30/2014 06:45 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: removed `up` on jupiter because this is done as part of "do steps under Maintenance > "to synchronize vegbiendev, ..."
13000	03/30/2014 06:44 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: moved "do steps under Maintenance > "to synchronize vegbiendev, ..." outside of "On local machine" because these steps don't only take place on the local machine
12999	03/30/2014 06:41 PM	Aaron Marcuse-Kubitza	/README.TXT: use `up` instead of `svn up --force` for consistency
12998	03/30/2014 06:40 PM	Aaron Marcuse-Kubitza	fix: /README.TXT: always use `up` instead of `svn up` since this includes --force
12997	03/30/2014 06:39 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: removed unneeded `ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk` at beginning since this is performed again the first time it's needed
12996	03/30/2014 06:38 PM	Aaron Marcuse-Kubitza	fix: /README.TXT: Full database import: removed erroneous line that resulted from a search-and-replace of connection commands in r12396. (it used to read "Follow the steps under Connecting to vegbiendev above, using jupiter instead". this step is now performed on the line below it.)
12995	03/30/2014 06:31 PM	Aaron Marcuse-Kubitza	bin/make_analytical_db: removed remake_diff_tables() because this is now done for each datasource in inputs/input.Makefile
12994	03/30/2014 06:28 PM	Aaron Marcuse-Kubitza	bugfix: schemas/vegbien.sql: schemas/vegbien.sql(): need to util.use_schema(schema_anchor) before initializing vars that use own-schema functions
12993	03/30/2014 06:12 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: validate: redirect the output to the log, as for other import-related operations
12992	03/30/2014 06:08 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: import: validate at the end of the import
12991	03/30/2014 06:02 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: added new-style aggregating validations (`validate` target)
12990	03/30/2014 06:02 PM	Aaron Marcuse-Kubitza	bin/make_analytical_db: removed no longer needed "${public}_validations" schema qualifier, now that it is in the search_path
12989	03/30/2014 06:00 PM	Aaron Marcuse-Kubitza	fix: bin/vegbien_dest: added public_validations
12988	03/30/2014 05:41 PM	Aaron Marcuse-Kubitza	added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt, which is useful to see what fields will be available when we switch to the new GBIF export format
12987	03/30/2014 05:39 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: removed end_try_subshell, which now does the same thing as end_try
12986	03/30/2014 05:38 PM	Aaron Marcuse-Kubitza	fix: lib/sh/archives.sh: unzip(): support -p option, which pipes extracted data to stdout
12985	03/30/2014 05:11 PM	Aaron Marcuse-Kubitza	added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt.run
12984	03/30/2014 05:11 PM	Aaron Marcuse-Kubitza	added lib/runscripts/extract_header.run
12983	03/30/2014 05:09 PM	Aaron Marcuse-Kubitza	fix: lib/sh/make.sh: direct the user to use begin_target instead of set_make_vars (set_make_vars is now used by begin_target)
12982	03/30/2014 05:06 PM	Aaron Marcuse-Kubitza	fix: lib/runscripts/util.run: to_top_file(): handle $_remake properly, without requiring deferred_check_target_exists to set to_file()'s flags
12981	03/30/2014 05:03 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: die(): usage: documented that if msg uses $(...), save_e is needed
12980	03/30/2014 04:59 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: already_exists_msg(): need to save_e, because new $(mk_hint) call resets $?
12979	03/30/2014 04:55 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: die(): always errexit even if $e = 0, because die always indicates an error
12978	03/30/2014 04:53 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added rethrow!(), which always errexits, even if $e = 0
12977	03/30/2014 04:53 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: rethrow(): also work in situations where $e is not set
12976	03/30/2014 04:50 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: rethrow: made it a function since there is now no need for it to be an alias
12975	03/30/2014 04:47 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: rethrow: removed `test "$e" != 0` since errexit only does anything if $e != 0
12974	03/30/2014 04:45 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: removed separate rethrow_exit, rethrow_subshell, since they now do the same thing as rethrow*
12973	03/30/2014 04:42 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: rethrow!: use new errexit, which works in functions and* subshells
12972	03/30/2014 04:38 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added errexit(), used in place of (exit "$1") because a bug in bash prevents subshells from triggering errexit
12971	03/30/2014 04:18 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added bool!()
12970	03/30/2014 03:08 PM	Aaron Marcuse-Kubitza	fix: lib/sh/util.sh: redir(): need to indent before invoking an external command (not just in command__exec(), but for all redir() calls)
12969	03/29/2014 04:10 AM	Aaron Marcuse-Kubitza	lib/sh/make.sh: with_rm(): documented that it only works inside a runscript target that starts w/ begin_target
12968	03/29/2014 04:06 AM	Aaron Marcuse-Kubitza	*{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly propagated to applicable invoked targets. (previously, $rm was propagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)
12967	03/29/2014 03:58 AM	Aaron Marcuse-Kubitza	lib/sh/make.sh: self_make(): renamed to with_rm() for clarity, since this is used only to propagate $rm, and does not also invoke a command with the same name as the current function, as the name might suggest
12966	03/28/2014 07:17 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: updated _specimens_01_count_of_total_records_specimens_in_source_db
12965	03/28/2014 07:10 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: use taxonoccurrence instead of location as the table that all specimens should have, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
12964	03/28/2014 07:03 AM	Aaron Marcuse-Kubitza	lib/runscripts/util.run: support conventional main() method as well as `all` target
12963	03/28/2014 02:39 AM	Aaron Marcuse-Kubitza	fix: inputs///map.csv: remapped occurrenceID-mapped fields to dataProviderRecordID when these were not globally unique DwC occurrenceIDs (http://rs.tdwg.org/dwc/terms/#occurrenceID)
12962	03/28/2014 02:34 AM	Aaron Marcuse-Kubitza	fix: inputs/CTFS/AggregateObservation/map.csv: field mapped to occurrenceID: remapped to aggregateOrganismObservationID because these are not specimen occurrences
12961	03/28/2014 02:32 AM	Aaron Marcuse-Kubitza	fix: mappings/VegCore-VegBIEN.csv: taxonoccurrence.sourceaccessioncode: need to populate from aggregateOrganismObservationID when only that is available
12960	03/28/2014 02:03 AM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/Ecatalog_all/map.csv: can't use CatalogNumber as pkey because it's not unique and not always populated. this fixes the NY NULL accessionNumbers bug (wiki.vegpath.org/Aggregating_validations_status#bugs).
12959	03/28/2014 01:31 AM	Aaron Marcuse-Kubitza	/README.TXT: moved "to back up e-mails" and "to back up the version history" before settings backup so that the local backup of these is up to date when everything gets backed up
12958	03/28/2014 01:29 AM	Aaron Marcuse-Kubitza	inputs/XAL/Specimen/header.csv: updated
12957	03/28/2014 12:45 AM	Aaron Marcuse-Kubitza	/README.TXT: to synchronize vegbiendev, jupiter, and your local machine: backups/TNRS.backup: do this before the general sync so that any reverse sync that's needed won't include it
12956	03/28/2014 12:44 AM	Aaron Marcuse-Kubitza	/README.TXT: to synchronize vegbiendev, jupiter, and your local machine: backups/TNRS.backup: use bin/sync_upload now that this works for rsync-ignored files
12955	03/28/2014 12:36 AM	Aaron Marcuse-Kubitza	bugfix: lib/sh/sync.sh: don't unintentionally rsync-ignore explicitly-specified files
12954	03/28/2014 12:32 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: filesystem: added is_(), could_be_()
12953	03/28/2014 12:31 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added contains_match()
12952	03/28/2014 12:31 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added ends_with()
