/trunk/README.TXT - Changes - BIEN 3 - NCEAS Projects

root/trunk/README.TXT @ 13422

#	Date	Author	Comment
13422	05/09/2014 01:42 AM	Aaron Marcuse-Kubitza	fix: /README.TXT: Full database import: clear any limit set in .profile: moved to inside screen because it must happen within screen to avoid affecting the outer shell
13421	05/09/2014 01:40 AM	Aaron Marcuse-Kubitza	fix: /README.TXT: Full database import: added step to clear any limit set in .profile (applicable to local machine)
13341	04/29/2014 09:33 PM	Aaron Marcuse-Kubitza	fix: /README.TXT: Mac backup: exclude ~/VirtualBox VMs/Ubuntu/Ubuntu.vdi, to avoid it being re-uploaded twice each time, due to an rsync verification error (https://projects.nceas.ucsb.edu/nceas/issues/907)
13337	04/29/2014 04:42 PM	Aaron Marcuse-Kubitza	/README.TXT: changed "then rerun with l=1 ..." to "then review diff, and rerun with `l=1` prepended" to ensure that user reviews diff before syncing
13336	04/29/2014 04:40 PM	Aaron Marcuse-Kubitza	/README.TXT: to synchronize a Mac's settings with my testing machine's: removed separate step to upload just the VirtualBox VMs, because that is now part of the main upload
13335	04/29/2014 04:40 PM	Aaron Marcuse-Kubitza	fix: /README.TXT: to synchronize a Mac's settings with my testing machine's: need to sync VirtualBox VMs with inplace=1 because they are very large files
13333	04/29/2014 03:26 PM	Aaron Marcuse-Kubitza	/README.TXT: to back up the version history: back up first on the local machine, because often only the svnsync command gets run, and that way it will get backed up immediately to Dropbox (and hourly to Time Machine), while vegbiendev only gets backed up daily to tape
13332	04/29/2014 03:23 PM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: to back up the version history: use absolute path for vegbiendev commands because the Ubuntu 14.04 version of rsync doesn't expand ~ properly
13331	04/29/2014 02:36 PM	Aaron Marcuse-Kubitza	/README.TXT: to back up the version history: use $HOME to make paths platform-independent
13284	04/22/2014 08:14 PM	Aaron Marcuse-Kubitza	/README.TXT: use `sudo -u ... -i` instead of `sudo su - ...` to avoid using two commands to accomplish the login
13119	04/10/2014 03:13 PM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: Full database import: to import just a subset of the datasources: array env var needs to be set after opening the `screen` shell because array vars are apparently not inherited by the `screen` shell
13118	04/10/2014 02:42 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: to import just a subset of the datasources: added step to set custom import name
13117	04/10/2014 02:41 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added instructions for importing just a subset of the datasources
13031	04/02/2014 11:17 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: added high-water mark of 1.8 TB @11:15:05
13030	04/02/2014 10:56 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added steps to figure out which datasource tables were not successfully imported due to disk space errors
13029	04/02/2014 10:45 AM	Aaron Marcuse-Kubitza	fix: /README.TXT: Full database import: moved verification of exit statuses before verification of DB contents because there is no point in verifying the DB if the datasources didn't finish importing
13028	04/02/2014 09:01 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: documented that the entire disk again gets used long after the beginning of the import, when only a few datasources are running (ie. it definitely seems to be a recent bug in Postgres, and not a latent problem)
13027	04/01/2014 05:40 PM	Aaron Marcuse-Kubitza	/README.TXT: Maintenance: added task to regularly re-run full-database import so that bugs in it don't pile up. it needs to be kept in working order so that it works when it's needed.
13026	04/01/2014 04:24 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added steps to manually reimport the applicable datasources if there are errors due to exceeding available disk space
13025	04/01/2014 04:13 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: removed extra `ssh -t vegbiendev.nceas.ucsb.edu` before "upload logs", because the previous steps also occur on vegbiendev
13024	04/01/2014 04:04 PM	Aaron Marcuse-Kubitza	/README.TXT: Notes on system stability: added recommendation to maintain a snapshot copy of the VM as it was at the last successful import, for fallback use if a system upgrade breaks anything. system upgrades on the snapshot VM should be disabled completely, and because this will also disable security fixes, the snapshot VM should be disconnected from the internet and all networking interfaces. (this is an unfortunate consequence of modern OSes being written in non-memory-safe languages such as C and C++.)
13023	04/01/2014 03:43 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: documented that a higher high-water mark actually occurs later in the import, so that the disk usage issue actually remains a problem after the very beginning
13022	04/01/2014 03:37 PM	Aaron Marcuse-Kubitza	fix: /README.TXT: Full database import: disk space: increased the minimum free space recommendation to 1 TB, because analysis of the disk usage during the beginning of the import shows that actually close to the entire amount is being used. however, this problem is normally undetectable unless the disk space is specifically checked, because it only manifests itself if the available disk space is exceeded completely.
13021	04/01/2014 02:04 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: documented that the beginning of the import should be scheduled at a time when the DB will not be needed for other uses, because vegbiendev will be slow for the first few hours of the import due to the import using all the available cores
13020	04/01/2014 01:36 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: documented that CPU load warning e-mails can safely be ignored. they happen because the parallel imports use all the available cores.
13017	04/01/2014 01:14 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: exiting `screen`: clarify that you must use `exit`, as Ctrl+D gets disabled to prevent accidental exits
13016	04/01/2014 12:47 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added step to restart Postgres to free up any disk space used by temp tables from the last import (this is apparently not automatically reclaimed)
13010	04/01/2014 11:40 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: increased minimum requirement to 500GB (~200GB extra), as the import may use significant additional space for temp tables
13009	04/01/2014 11:37 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: documented that env vars set before invoking `screen` will be inherited by it, so these steps will work even if they come before `screen`
13007	04/01/2014 11:23 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added steps to set a custom version, if the auto-assigned one would cause a collision with the last import
13006	04/01/2014 11:08 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: `unset version`: documented that this is needed because it may have been set in the outer shell
13002	03/30/2014 06:52 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: updated schema size (315GB)
13001	03/30/2014 06:45 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: removed `up` on jupiter because this is done as part of "do steps under Maintenance > "to synchronize vegbiendev, ..."
13000	03/30/2014 06:44 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: moved "do steps under Maintenance > "to synchronize vegbiendev, ..." outside of "On local machine" because these steps don't only take place on the local machine
12999	03/30/2014 06:41 PM	Aaron Marcuse-Kubitza	/README.TXT: use `up` instead of `svn up --force` for consistency
12998	03/30/2014 06:40 PM	Aaron Marcuse-Kubitza	fix: /README.TXT: always use `up` instead of `svn up` since this includes --force
12997	03/30/2014 06:39 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: removed unneeded `ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk` at beginning since this is performed again the first time it's needed
12996	03/30/2014 06:38 PM	Aaron Marcuse-Kubitza	fix: /README.TXT: Full database import: removed erroneous line that resulted from a search-and-replace of connection commands in r12396. (it used to read "Follow the steps under Connecting to vegbiendev above, using jupiter instead". this step is now performed on the line below it.)
12959	03/28/2014 01:31 AM	Aaron Marcuse-Kubitza	/README.TXT: moved "to back up e-mails" and "to back up the version history" before settings backup so that the local backup of these is up to date when everything gets backed up
12957	03/28/2014 12:45 AM	Aaron Marcuse-Kubitza	/README.TXT: to synchronize vegbiendev, jupiter, and your local machine: backups/TNRS.backup: do this before the general sync so that any reverse sync that's needed won't include it
12956	03/28/2014 12:44 AM	Aaron Marcuse-Kubitza	/README.TXT: to synchronize vegbiendev, jupiter, and your local machine: backups/TNRS.backup: use bin/sync_upload now that this works for rsync-ignored files
12951	03/27/2014 11:13 PM	Aaron Marcuse-Kubitza	fix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: run `up` on all machines, not just jupiter, because all must be up-to-date to avoid extraneous diffs
12950	03/27/2014 11:11 PM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: `svn up` on jupiter: need to use up alias because that adds --force
12949	03/27/2014 11:10 PM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: added `svn up` on jupiter: needs to be in main dir (~/bien), not ~/Dropbox/svn/
12948	03/27/2014 11:08 PM	Aaron Marcuse-Kubitza	/README.TXT: to synchronize vegbiendev, jupiter, and your local machine: added `svn up` on jupiter to avoid extraneous diffs when rsyncing
12929	03/27/2014 04:43 AM	Aaron Marcuse-Kubitza	/README.TXT: Schema changes: manually apply schema changes to the live public schema: moved under "update mappings and staging table column names" because this is a necessary part of that step
12928	03/27/2014 04:43 AM	Aaron Marcuse-Kubitza	/README.TXT: Schema changes: manually apply schema changes to the live public schema: moved under "update mappings and staging table column names" because this is a necessary part of that step
12927	03/27/2014 04:40 AM	Aaron Marcuse-Kubitza	/README.TXT: Schema changes: changed "update staging table column names" to "update mappings and staging table column names"
12887	03/24/2014 05:45 PM	Aaron Marcuse-Kubitza	/README.TXT: `make inputs/{NVS,SALVIAS,TEAM}/test`: updated runtime (1 min)
12883	03/24/2014 05:04 PM	Aaron Marcuse-Kubitza	/README.TXT: calls to `inputs/run postprocess`: direct user to refer to inputs/run for this, so the runtime doesn't have to be updated in multiple places
12881	03/24/2014 05:01 PM	Aaron Marcuse-Kubitza	/README.TXT: Schema changes: added steps to update staging table column names on the local machine and vegbiendev
12877	03/24/2014 01:21 AM	Aaron Marcuse-Kubitza	/README.TXT: Maintenance: VegCore data dictionary: `make inputs/{NVS,SALVIAS,TEAM}/test`: recorded runtime (30 s)
12876	03/24/2014 01:17 AM	Aaron Marcuse-Kubitza	/README.TXT: Maintenance: VegCore data dictionary: `make inputs/{NVS,SALVIAS,TEAM}/test`: prepended `time` to enable obtaining the runtime
12875	03/24/2014 01:11 AM	Aaron Marcuse-Kubitza	/README.TXT: Maintenance: VegCore data dictionary: `inputs/run postprocess`: updated runtime (20 min)
12752	03/18/2014 05:34 AM	Aaron Marcuse-Kubitza	inputs/run: postprocess(): documented runtime (30 min)
12741	03/18/2014 02:59 AM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: Maintenance: VegCore data dictionary: apply new data dict mappings: need to use postprocess rather than import runscript target, so that the command also works on an svn checkout without the flat files (the flat files are not needed for the staging table renaming)
12718	03/14/2014 09:09 PM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: Maintenance: VegCore data dictionary: apply new data dict mappings: need to use import rather than mappings runscript target, to rename the staging tables
12717	03/14/2014 09:06 PM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: Maintenance: VegCore data dictionary: also need to apply new data dict mappings on vegbiendev
12716	03/14/2014 08:19 PM	Aaron Marcuse-Kubitza	fix: /README.TXT: Maintenance: VegCore data dictionary: added steps to apply the new data dictionary mappings to the datasource mappings and staging tables
12548	02/28/2014 10:28 PM	Aaron Marcuse-Kubitza	/Makefile: added separate phppgadmin-Linux target to avoid needing to run the entire postgres-Linux target whenever http://vegbiendev.nceas.ucsb.edu/phppgadmin/ goes down (after some system updates)
12426	02/25/2014 07:56 AM	Aaron Marcuse-Kubitza	/README.TXT: use full hostname for jupiter so the commands work outside of the NCEAS network as well
12396	02/23/2014 11:57 PM	Aaron Marcuse-Kubitza	fix: /README.TXT: use exact ssh command needed to connect to vegbiendev/jupiter (eg. `ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk`) instead of vaguely referring to "on vegbiendev"/"on jupiter"
12395	02/23/2014 11:28 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: screen: run `unset TMOUT` first because it is most important, now that the remote servers have a TMOUT set for extra security
12381	02/23/2014 06:00 PM	Aaron Marcuse-Kubitza	/README.TXT: to back up the version history: added steps to sync git to the local machine
12227	02/15/2014 04:08 AM	Aaron Marcuse-Kubitza	/README.TXT: Schema changes: clarified that the staging tables should only be reinstalled if needed
12226	02/15/2014 04:08 AM	Aaron Marcuse-Kubitza	/README.TXT: put ... around all uppercased text, for consistency
12148	02/08/2014 11:17 PM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: Full database import: Check that source contains [# datasources] rows up through XAL: added alternative verification method when this is not the case (some datasources may be near the end depending on import order)
12128	02/07/2014 08:37 AM	Aaron Marcuse-Kubitza	/README.TXT: to back up the version history: added back `git svn fetch` so we keep the git export up-to-date, too
12127	02/07/2014 08:29 AM	Aaron Marcuse-Kubitza	/README.TXT: to back up the version history: added runtimes (1.5 h for the initial svnsync)
12126	02/07/2014 08:24 AM	Aaron Marcuse-Kubitza	/README.TXT: to back up the version history: added trailing /s to dirs
12125	02/07/2014 08:24 AM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: to back up the version history: fixed svn_repo/ path
12115	02/07/2014 06:52 AM	Aaron Marcuse-Kubitza	/README.TXT: to back up the version history: use svnsync instead of `git svn fetch`, so that the backup is in a format that can be directly reimported into an svn repo
12027	02/02/2014 10:27 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: `make test by_col=1`: documented runtime (20 min)
12026	02/02/2014 10:27 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: `. bin/import_all`: documented how to view progress
12023	02/02/2014 06:22 PM	Aaron Marcuse-Kubitza	/README.TXT: to back up the version history: added steps to sync the git repository to jupiter and the local machine
12022	02/02/2014 05:44 PM	Aaron Marcuse-Kubitza	/README.TXT: added steps to back up the version history
12011	01/25/2014 09:18 PM	Aaron Marcuse-Kubitza	/README.TXT: Notes on running programs: added warning that you should always start with a clean shell to avoid spurious bugs
11985	01/21/2014 07:23 PM	Aaron Marcuse-Kubitza	/README.TXT: Testing: added pointer to development machine specs
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
11967	01/18/2014 10:51 PM	Aaron Marcuse-Kubitza	/README.TXT: added note that shell scripts should always be read-only, so that editing them while an import is in progress will not crash the import (see http://vegpath.org/links/#**%20modifying%20a%20running%20shell%20script)
11940	01/09/2014 12:31 AM	Aaron Marcuse-Kubitza	/README.TXT: to synchronize a Mac's settings with my testing machine's: added step to remove the downloaded Spam folder, because spam e-mails often contain viruses that would trigger clamscan
11915	12/16/2013 05:46 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: documented that you should always start with a clean shell, which does not have changes to the env vars. (there have been inexplicable bugs that went away after closing and reopening the terminal window.) note that running `exec bash` is not sufficient to reset the env vars.
11897	12/11/2013 07:53 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: backups: added step to download backup to local machine
11892	12/10/2013 07:36 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: In PostgreSQL: documented that the tables to check are located in the r# schema, not public
11866	12/09/2013 02:27 PM	Aaron Marcuse-Kubitza	/README.TXT: Datasource setup: added steps to backup e-mails
11800	12/03/2013 06:27 AM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: Full database import: To restart an aborted import for a specific table: run the two commands in errexit mode so that the datasource does not incorrectly have the temp suffix removed if the import command exited with an error
11795	11/27/2013 11:16 PM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: Full database import: To restart an aborted import for a specific table: added command to remove the temp suffix from the source table entry, which is not automatic for importing a specific table (only for importing the entire datasource, at the end of which the datasource is considered completely imported and ready to overwrite any previous import)
11787	11/26/2013 11:10 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: documented that `make schemas/reinstall` requires sudo access
11728	11/21/2013 04:59 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: verifying import: In PostgreSQL: don't include current values of the datasource counts, etc., because these may change and should always be re-checked at wiki.vegpath.org/VegBIEN_contents
11686	11/18/2013 05:05 AM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: to backup files not in Time Machine: PostgreSQL: need to run with `overwrite=1` so removed files are also deleted
11685	11/18/2013 05:02 AM	Aaron Marcuse-Kubitza	/README.TXT: to backup files not in Time Machine: PostgreSQL: only stop PostgreSQL after all files have been copied, to minimize the time that the PostgreSQL server is down (the final copy just copies concurrent changes)
11684	11/18/2013 05:02 AM	Aaron Marcuse-Kubitza	/README.TXT: to backup files not in Time Machine: PostgreSQL: only stop PostgreSQL after all files have been copied, to minimize the time that the PostgreSQL server is down (the final copy just copies concurrent changes)
11683	11/18/2013 04:59 AM	Aaron Marcuse-Kubitza	/README.TXT: updated to PostgreSQL 9.3
11573	11/05/2013 10:31 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: after import: record the import times in inputs/import.stats.xls: documented that this should be run on the local machine, because it needs the Mac filename ordering
11570	11/05/2013 08:54 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: after import: removed step to install analytical_stem on nimoy because the import mechanism is not set up to do this (we don't generate CSV exports of the full analytical_stem table because they take up a lot of space and are not currently used for anything)
11569	11/05/2013 08:32 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: after import: In PostgreSQL: added step to check that analytical_stem contains the expected # of rows
11568	11/05/2013 08:16 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: after import: In PostgreSQL: added specific instructions for determining which/how many datasources are expected to be included in the provider_count and source tables
11516	10/31/2013 12:50 AM	Aaron Marcuse-Kubitza	/README.TXT: for each task, documented which machine it's run on. for tasks run on vegbiendev, added pointer to "Connecting to vegbiendev" steps.
11515	10/31/2013 12:19 AM	Aaron Marcuse-Kubitza	/README.TXT: added instructions for connecting to vegbiendev
11263	10/13/2013 12:02 AM	Aaron Marcuse-Kubitza	/README.TXT: Single datasource import: added pointer to instructions to remake the analytical DB (also required after single datasource import)
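
Note on r13009 and r13119: the first records that env vars set before invoking `screen` are inherited by it, while the second records that an array var has to be set after opening the `screen` shell. This matches documented bash behavior: exported scalar variables are passed to child processes through the environment, but bash cannot export arrays. The following is a minimal sketch of that difference, not the README's actual steps. It uses a nested `bash -c` as a stand-in for the shell that `screen` starts, and the values `custom_name` and `NVS SALVIAS TEAM` and the variable name `datasources` are hypothetical, chosen for illustration only.

```shell
# Parent bash: export one scalar and one array, then ask a child bash what it sees.
result=$(bash -c '
  export version=custom_name              # scalar: passed through the environment
  export datasources=(NVS SALVIAS TEAM)   # array: flagged for export, but bash cannot pass it on
  # Child shell (stand-in for the shell started by screen; an assumption of this sketch):
  bash -c "echo scalar=\${version-unset} array=\${datasources-unset}"
')
echo "$result"
```

If the child reports the array as unset, the array has to be assigned again inside the new shell, which is what r13119 changed the instructions to do.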
