/trunk/README.TXT - BIEN 3 - NCEAS Projects

root/trunk/README.TXT @ 14883

       Installation:
       	open a terminal window
       	Check out svn:
       		sudo apt-get --yes install subversion # not preinstalled on Ubuntu
       		svn co https://code.nceas.ucsb.edu/code/projects/bien/trunk bien
       	cd bien/
       	Install:
       		**WARNING**: This will delete the public schema of your VegBIEN DB!
       		make install
       		# at "reload PATH" (if displayed), do what it says
       		# at "Are you sure you want to continue connecting", type "yes" and
       			press Enter
       		# at "aaronmk@jupiter's password", enter the applicable password
       		# at "[sudo] password for user", enter your password and press Enter
       		# at "Modifying postgresql.conf and pg_hba.conf", type y and press Enter
       		# at "kernel.shmmax [...] Press ENTER to continue":
       			# open a new window
       			# run what it says
       			# press Ctrl-D
       			# return to the previous window
       			# press Enter
       		# at "restart PostgreSQL manually ... Press ENTER to continue":
       			# open a new window
       			# run what it says
       			# press Ctrl-D
       			# return to the previous window
       			# press Enter
       		# at "This will delete the current public schema of your VegBIEN DB",
       			type y and press Enter
       		# at "If asked for MySQL root password", copy the password to the
       			clipboard and press Enter
       		# at "Web server to reconfigure automatically", select apache2 and click
       			Ok
       		# at "Configure database for phpmyadmin with dbconfig-common?", click
       			Yes
       		# at "Password of the database's administrative user", paste the
       			password and click Ok
       		# at "MySQL application password for phpmyadmin", just click Ok
       		# at "An error occurred while installing the database", click Ok
       		# at "Next step for database installation", select ignore and click Ok
       		# at "aaronmk@jupiter's password", enter the applicable password
       	Uninstall: make uninstall
       		**WARNING**: This will delete your entire VegBIEN DB!
       		This includes all archived imports and staging tables.
       Connecting to vegbiendev:
       	ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       	cd /home/bien # should happen automatically at login
       Single datasource refresh:
       	ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       	# -> Maintenance > to back up the vegbiendev databases
       	# place updated extract in inputs/$datasrc/_src/
       	# place extracted flat file(s) in the appropriate table subdirs
       	rm=1 inputs/<datasrc>/run # reload staging tables
       	make inputs/<datasrc>/reimport_scrub by_col=1 &
       		# this works whether or not datasource is already imported
       	tail -150 inputs/<datasrc>/*/logs/public.log.sql # view progress
       	# -> Full database import > To re-run geoscrubbing
       	# -> Full database import > To remake analytical DB
       	# -> Full database import > To back up DB
       	# -> Maintenance > to back up the vegbiendev databases
       Datasource removal:
       	ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       	$ make inputs/$datasrc/rm # runtime: see
       		# http://vegpath.org/wiki/Individual_datasource_refresh#datasource-removal-runtimes
       Notes on system stability:
       	**WARNING**: when shutting down the VM, always first stop Postgres:
       		sudo service postgresql stop
       		this prevents the OS from SIGKILLing Postgres, which sometimes causes
       		database corruption
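The ordering rule above can be wrapped in a small helper so that the two steps are never reversed. This is only a sketch: the names _run, safe_shutdown, and DRY_RUN are hypothetical and not part of the BIEN scripts. It assumes the `sudo service postgresql stop` command shown above and the standard `shutdown -h now`.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN is set.
_run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: stop Postgres first, then halt the VM.
safe_shutdown() {
    _run sudo service postgresql stop || return 1 # do not halt if the stop failed
    _run sudo shutdown -h now
}

# usage: preview the commands without running them
# DRY_RUN=1 safe_shutdown
```

Returning early when the stop fails keeps the VM up, so the problem can be inspected before Postgres is at risk of being SIGKILLed.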
       Notes on running programs:
       	**WARNING**: always start with a clean shell, to avoid spurious bugs. the
       		shell should not have changes to the env vars. (there have been bugs
       		that went away after closing and reopening the terminal window.) note
       		that running `exec bash` is not sufficient to *reset* the env vars.
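One possible way to get a shell without inherited env var changes, short of closing and reopening the terminal window, is `env -i`. This is a sketch under assumptions: the helper name clean_env is hypothetical, and it has not been checked against the BIEN scripts, which may expect additional variables from .profile.

```shell
# Hypothetical helper: run a command with only a minimal, known environment.
# env -i drops every inherited variable; HOME, TERM, and PATH are re-added explicitly.
clean_env() {
    env -i HOME="$HOME" TERM="${TERM:-dumb}" PATH=/usr/bin:/bin "$@"
}

# usage: start a fresh login shell, which re-reads .profile from scratch
# clean_env bash -l
```

Unlike `exec bash`, which passes the current (possibly modified) environment on to the new shell, this starts from an empty environment.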
       Notes on editing files:
       	**WARNING**: shell scripts should always be read-only, so that editing them
       		while an import is in progress will not crash the import (see
       		http://vegpath.org/links/#**%20modifying%20a%20running%20shell%20script)
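A minimal sketch of making scripts read-only, assuming plain `chmod` is sufficient. The helper name lock_scripts and the example paths are for illustration only.

```shell
# Hypothetical helper: remove write permission (for everyone) from the given files.
lock_scripts() {
    chmod a-w -- "$@"
}

# usage (paths are placeholders for illustration):
# lock_scripts inputs/<datasrc>/run bin/import_all
```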
       Full database import:
       	**WARNING**: You must perform *every single* step listed below, to avoid
       		breaking column-based import
       	**WARNING**: always start with a clean shell, as described above under
       		"Notes on running programs"
       	**IMPORTANT**: the beginning of the import should be scheduled at a time
       		when the DB will not be needed for other uses. this is necessary because
       		vegbiendev will be slow for the first few hours of the import, due to
       		the import using all the available cores.
       	do steps under Maintenance > "to synchronize vegbiendev, jupiter, and
       		your local machine"
       	On local machine:
       		make inputs/upload
       		make inputs/upload live=1
       		make test by_col=1 # runtime: 1 h ("53m7.383s") @starscream
       			if you encounter errors, they are most likely related to the
       				PostgreSQL error parsing in /lib/sql.py parse_exception()
       			See note under Testing below
       	ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       	Ensure there are no local modifications: svn st
       	up
       	make inputs/download
       	make inputs/download live=1
       	For each newly-uploaded datasource above: make inputs/<datasrc>/reinstall
       	Update the auxiliary schemas: make schemas/reinstall
       		**WARNING**: requires sudo access!
       		The public schema will be installed separately by the import process
       	Delete imports before the last so they won't bloat the full DB backup:
       		make backups/vegbien.<version>.backup/remove
       		To keep a previous import other than the public schema:
       			export dump_opts='--exclude-schema=public --exclude-schema=<version>'
       			# env var will be inherited by `screen` shell
       	restart Postgres to free up any disk space used by temp tables from the last
       		import (this is apparently not automatically reclaimed):
       		make postgres_restart
       	Make sure there is at least 1 TB of disk space on /: df -h
       		although the import schema itself is only 315 GB, Postgres uses
       			significant temporary space at the beginning of the import.
       			the total disk usage oscillates between 1.2 TB and the entire disk
       			for the first day (for import started @12:55:09, high-water marks of
       			1.7 TB @14:00:25, 1.8 TB @15:38:32; then next day w/ 2 datasources
       			running: entire disk for 4 min @05:35:44, 1.8 TB @11:15:05).
       		To free up space, remove backups that have been archived on jupiter:
       			List backups/ to view older backups
       			Check their MD5 sums using the steps under On jupiter below
       			Remove these backups
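The `df -h` check above can also be scripted so that it fails loudly instead of relying on reading the output. A sketch only: the helper name has_free_kb is hypothetical, and 1 TB is taken here as 1024^3 KiB, which is an assumption (the text does not say TB vs. TiB).

```shell
# Hypothetical helper: succeed only if <path> has at least <min_kb> KiB available.
# df -P gives the portable one-line-per-filesystem format; -k reports 1024-byte blocks.
has_free_kb() {
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$2" ]
}

# usage: warn before starting the import
# has_free_kb / 1073741824 || echo 'less than 1 TB free on /' >&2
```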
       	for full import:
       		screen
       		Press ENTER
       	$0 # nested shell to prevent errexit from closing the window
       	the following must happen within screen to avoid affecting the outer shell:
       	unset TMOUT # TMOUT causes shell to exit even with background processes
       	set -o ignoreeof # prevent Ctrl+D from exiting shell to keep attached jobs
       	on local machine:
       		unset n # clear any limit set in .profile (unless desired)
       		unset log # allow logging output to go to log files
       	unset version # clear any version from last import, etc.
       	if no commits have been made since the last import (e.g. if retrying an
       		import), set a custom version that differs from the auto-assigned one
       		(would otherwise cause a collision with the last import):
       		svn info
       		extract the svn revision after "Revision:"
       		export version=r[revision]_2 # +suffix to distinguish from last import
       			# env var will be inherited by `screen` shell
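The revision extraction above can be done in one step rather than by copying the number by hand. A sketch, assuming the usual `Revision: <number>` line in `svn info` output; the helper name version_from_svn_info is hypothetical.

```shell
# Hypothetical helper: read `svn info` output on stdin and print r<revision><suffix>.
# Only the line whose first field is exactly "Revision:" is used, so
# "Last Changed Rev:" is ignored.
version_from_svn_info() {
    awk -v suffix="$1" '$1 == "Revision:" {print "r" $2 suffix}'
}

# usage, from the working copy:
# export version=$(svn info | version_from_svn_info _2)
```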
       	to import just a subset of the datasources:
       		declare -ax inputs; inputs=(inputs/{src,...}/) # no () in declare on Mac
       			# array vars *not* inherited by `screen` shell
       		export version=custom_import_name
       	Start column-based import: . bin/import_all
       		To use row-based import: . bin/import_all by_col=
       		To stop all running imports: . bin/stop_imports
       		**WARNING**: Do NOT run import_all in the background, or the jobs it
       			creates won't be owned by your shell.
       		Note that import_all will take up to an hour to import the NCBI backbone
       			and other metadata before returning control to the shell.
       		To view progress:
       			tail inputs/{.[^as.],}*/*/logs/$version.log.sql
       	note: at the beginning of the import, the system may send out CPU load
       		warning e-mails. these can safely be ignored. (they happen because the
       		parallel imports use all the available cores.)
       	for test import, turn off DB backup (also turns off analytical DB creation):
       		kill % # cancel after_import()
       	Wait (4 days) for the import to finish
       	**WARNING**: do *not* run backups/pg_snapshot while the import is running,
       		due to continuously-changing files
       	**WARNING**: do *not* run backups/pg_snapshot until the previous import has
       		been replaced, to avoid running into disk space limits
       	To recover from a closed terminal window: screen -r
       	To restart an aborted import for a specific table:
       		export version=<version>
       		(set -o errexit; make inputs/<datasrc>/<table>/import_scrub by_col=1 continue=1; make inputs/<datasrc>/publish) &
       		bin/after_import $! & # $! can also be obtained from `jobs -l`
       	Get $version: echo $version
       	Set $version in all vegbiendev terminals: export version=<version>
       	When there are no more running jobs, exit `screen`: exit # not Ctrl+D
       	upload logs: make inputs/upload live=1
       	On local machine: make inputs/download-logs live=1
       	check for disk space errors:
       		grep --files-with-matches -F 'No space left on device' inputs/{.[^as.],}*/*/logs/$version.log.sql
       		if there are any matches:
       			manually reimport these datasources using the steps under
       				Single datasource refresh
       			bin/after_import &
       			wait for the import to finish
       	tail inputs/{.[^as.],}*/*/logs/$version.log.sql
       	In the output, search for "Command exited with non-zero status"
       	For inputs that have this, fix the associated bug(s)
       	If many inputs have errors, discard the current (partial) import:
       		make schemas/$version/uninstall
       	Otherwise, continue
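Instead of searching the `tail` output by eye, the failing logs can be listed directly. A sketch in the same style as the disk space check above; the helper name failed_logs is hypothetical.

```shell
# Hypothetical helper: print the names of the given log files that contain the
# failure marker. grep -l lists matching files only; -F treats the pattern as a
# fixed string rather than a regex.
failed_logs() {
    grep -l -F 'Command exited with non-zero status' -- "$@"
}

# usage, with the same glob as the tail command above:
# failed_logs inputs/{.[^as.],}*/*/logs/$version.log.sql
```

The number of files printed gives a quick sense of whether only a few inputs failed or whether the import should be discarded.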
       	In PostgreSQL:
       		Go to wiki.vegpath.org/VegBIEN_contents
       		Get the # observations
       		Get the # datasources
       		Get the # datasources with observations
       		in the r# schema:
       		Check that analytical_stem contains [# observations] rows
       		Check that source contains [# datasources] rows up through XAL. If this
       			is not the case, manually check the entries in source against the
       			datasources list on the wiki page (some datasources may be near the
       			end depending on import order).
       		Check that provider_count contains [# datasources with observations]
       			rows with dataset="(total)" (at the top when the table is unsorted)
       	Check that TNRS ran successfully:
       		tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
       		If the log ends in an AssertionError
       			"assert sql.table_col_names(db, table) == header":
       			Figure out which TNRS CSV columns have changed
       			On local machine:
       				Make the changes in the DB's TNRS and public schemas
       				rm=1 inputs/.TNRS/schema.sql.run export_
       				make schemas/remake
       				inputs/test_taxonomic_names/test_scrub # re-run TNRS
       				rm=1 inputs/.TNRS/data.sql.run export_
       				Commit
       			ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       				If dropping a column, save the dependent views
       				Make the same changes in the live TNRS.tnrs table on vegbiendev
       				If dropping a column, recreate the dependent views
       				Restart the TNRS client: make scrub by_col=1 &
       	Publish the new import:
       		**WARNING**: Before proceeding, be sure you have done *every single*
       			verification step listed above. Otherwise, a previous valid import
       			could incorrectly be overwritten with a broken one.
       		make schemas/$version/publish # runtime: 1 min ("real 1m10.451s")
       	unset version
       	make backups/upload live=1
       	on local machine:
       		make backups/vegbien.$version.backup/download live=1
       			# download backup to local machine
       	ssh aaronmk@jupiter.nceas.ucsb.edu
       		cd /data/dev/aaronmk/bien/backups
       		For each newly-archived backup:
       			make -s <backup>.md5/test
       			Check that "OK" is printed next to the filename
       	If desired, record the import times in inputs/import.stats.xls:
       		On local machine:
       		Open inputs/import.stats.xls
       		click the "current" tab
       		If the rightmost import is within 5 columns of column IV:
       			Copy the current tab to <leftmost-date>~<rightmost-date>
       			Remove the previous imports from the current tab because they are
       				now in the copied tab instead
       		Insert a copy of the leftmost "By column" column group before it
       		export version=<version>
       		bin/import_date inputs/{.[^as.],}*/*/logs/$version.log.sql
       		Update the import date in the upper-right corner
       		bin/import_times inputs/{.[^as.],}*/*/logs/$version.log.sql
       		Paste the output over the # Rows/Time columns, making sure that the
       			row counts match up with the previous import's row counts
       		If the row counts do not match up, insert or reorder rows as needed
       			until they do. Get the datasource names from the log file footers:
       			tail inputs/{.[^as.],}*/*/logs/$version.log.sql
       		update the Postprocessing times:
       			analytical DB remake time:
       				from the end of inputs/analytical_db/logs/make_analytical_db.log.sql,
       					search upwards for "_individual_view_modify" followed by a
       					line of -'s
       				enter as: =[ms]/1000/3600/24
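The formula above converts milliseconds to fractional days (spreadsheets generally store durations as fractions of a day). The same arithmetic can be checked on the command line before pasting. A sketch; the helper name ms_to_days is hypothetical.

```shell
# Hypothetical helper: convert milliseconds to fractional days (ms / 1000 / 3600 / 24).
ms_to_days() {
    awk -v ms="$1" 'BEGIN {printf "%.4f\n", ms / 1000 / 3600 / 24}'
}

# usage: one full day of milliseconds
# ms_to_days 86400000
```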
       		Commit: svn ci -m 'inputs/import.stats.xls: updated import times'
       	Running individual steps separately:
       	To run TNRS:
       		To use an import other than public: export version=<version>
       		to rescrub all names:
       			make inputs/.TNRS/reinstall
       			re-create public-schema views that were cascadingly deleted
       		make scrub &
       		To view progress:
       			tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
       	To re-run geoscrubbing:
       		$ screen
       		# press Enter
       		$ unset TMOUT # TMOUT causes shell to exit even with background processes
       		# to use an import other than public: $ export version=<version>
       		$ bin/psql_verbose_vegbien <<<'SELECT geoscrub_input_view_modify();' &
       			# runtime: 6 min ("6:02.30") @r14827 @vegbiendev
       		# wait until done
       		$ rm=1 exports/geoscrub_input.csv.run
       			# runtime: 1 min ("1m2.962s") @r14827 @vegbiendev
       		$ $0 # subshell to avoid closing screen on errexit
       		$ rm=1 inputs/.geoscrub/geoscrub_output/geoscrub.csv.run &
       			# runtime: 1.5 h ("84m55.408s") @r14827 @vegbiendev
       		# wait until done
       		$ rm=1 inputs/.geoscrub/geoscrub_output/run &
       			# runtime: 12 min ("11m35.693s") @r14827 @vegbiendev
       		# wait until done
       		# re-create public-schema views that were cascadingly deleted (currently
       			plot.**, view_full_occurrence_individual_view, geoscrub_input_new)
       		# press Ctrl+D
       		# remake the analytical DB (below)
       	To remake analytical DB:
       		To use an import other than public: export version=<version>
       		bin/make_analytical_db & # runtime: 13 h ("12:43:57elapsed")
       		To view progress:
       			tail -150 inputs/analytical_db/logs/make_analytical_db.log.sql
       	To back up DB (staging tables and last import):
       		To use an import *other than public*: export version=<version>
       		make backups/TNRS.backup-remake &
       		dump_opts=--exclude-schema=public make backups/vegbien.$version.backup/test &
       			If after renaming to public, instead set dump_opts='' and replace
       			$version with the appropriate revision
       		make backups/upload live=1
       Datasource setup:
       	On local machine:
       	Example steps for a datasource: wiki.vegpath.org/Import_process_for_Madidi
       	umask ug=rwx,o= # prevent files from becoming web-accessible
       	Add a new datasource: make inputs/<datasrc>/add
       		<datasrc> may not contain spaces, and should be abbreviated.
       		If the datasource is a herbarium, <datasrc> should be the herbarium code
       			as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
       	For a new-style datasource (one containing a ./run runscript):
       		"cp" -f inputs/.NCBI/{Makefile,run,table.run} inputs/<datasrc>/
       	For MySQL inputs (exports and live DB connections):
       		For .sql exports:
       			Place the original .sql file in _src/ (*not* in _MySQL/)
       			Follow the steps starting with Install the staging tables below.
       				This is for an initial sync to get the file onto vegbiendev.
       			ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       				Create a database for the MySQL export in phpMyAdmin
       				Give the bien user all database-specific privileges *except*
       					UPDATE, DELETE, ALTER, DROP. This prevents bugs in the
       					import scripts from accidentally deleting data.
       				bin/mysql_bien database <inputs/<datasrc>/_src/export.sql &
       		mkdir inputs/<datasrc>/_MySQL/
       		cp -p lib/MySQL.{data,schema}.sql.make inputs/<datasrc>/_MySQL/
       		Edit _MySQL/*.make for the DB connection
       			For a .sql export, use server=vegbiendev and --user=bien
       		Skip the Add input data for each table section
       	For MS Access databases:
       		Place the .mdb or .accdb file in _src/
       		Download and install Bullzip's MS Access to PostgreSQL from
       			http://bullzip.com/download.php > Access To PostgreSQL > Download
       		Use Access To PostgreSQL to export the database:
       			Export just the tables/indexes to inputs/<datasrc>/<file>.schema.sql
       				using the settings in the associated .ini file where available
       			Export just the data to inputs/<datasrc>/<file>.data.sql using the
       				settings in the associated .ini file where available
       		In <file>.schema.sql, make the following changes:
       			Replace text "^CREATE DATABASE .*?;$" with "/*$0*/"
       			Replace text "BOOLEAN" with "/*BOOLEAN*/INTEGER"
       			Replace text "DOUBLE PRECISION NULL" with "DOUBLE PRECISION"
       		Skip the Add input data for each table section
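The three <file>.schema.sql replacements listed above for MS Access exports could be applied with `sed` instead of a text editor. This is a sketch, not the project's own tooling: the helper name fix_access_schema is hypothetical, and it assumes Unix line endings (with CRLF line endings the `;$` anchor would not match).

```shell
# Hypothetical helper: apply the three schema edits to stdin, writing to stdout.
# In the first expression, & stands for the whole matched line (the "$0" above).
fix_access_schema() {
    sed \
        -e 's|^CREATE DATABASE .*;$|/*&*/|' \
        -e 's|BOOLEAN|/*BOOLEAN*/INTEGER|g' \
        -e 's|DOUBLE PRECISION NULL|DOUBLE PRECISION|g'
}

# usage:
# fix_access_schema <file.schema.sql >file.schema.sql.new
```

Run it only once per file: a second pass would match the BOOLEAN inside the comment and substitute it again.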
       	Add input data for each table present in the datasource:
       		For .sql exports, you must use the name of the table in the DB export
       		For CSV files, you can use any name. It's recommended to use a table
       			name from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>
       		Note that if this table will be joined together with another table, its
       			name must end in ".src"
       		make inputs/<datasrc>/<table>/add
       			Important: DO NOT just create an empty directory named <table>!
       				This command also creates necessary subdirs, such as logs/.
       		If the table is in a .sql export: make inputs/<datasrc>/<table>/install
       			Otherwise, place the CSV(s) for the table in
       			inputs/<datasrc>/<table>/ OR place a query joining other tables
       			together in inputs/<datasrc>/<table>/create.sql
       		Important: When exporting relational databases to CSVs, you MUST ensure
       			that embedded quotes are escaped by doubling them, *not* by
       			preceding them with a "\" as is the default in phpMyAdmin
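To illustrate the required escaping: a field containing `say "hi"` must be written as `"say ""hi"""`, not `"say \"hi\""`. A minimal sketch of the doubling rule; the helper name csv_quote is hypothetical.

```shell
# Hypothetical helper: quote one field for CSV, doubling any embedded double quotes.
csv_quote() {
    printf '"%s"\n' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# usage:
# csv_quote 'say "hi"'
```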
       		If there are multiple part files for a table, and the header is repeated
       			in each part, make sure each header is EXACTLY the same.
       			(If the headers are not the same, the CSV concatenation script
       			assumes the part files don't have individual headers and treats the
       			subsequent headers as data rows.)
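The header requirement above can be checked before installing the staging tables. A sketch that compares the first line of each part file byte for byte; the helper name headers_match is hypothetical.

```shell
# Hypothetical helper: succeed only if every given file has the same first line.
headers_match() {
    first=$(head -n 1 "$1")
    for f in "$@"; do
        [ "$(head -n 1 "$f")" = "$first" ] || { echo "header differs: $f" >&2; return 1; }
    done
}

# usage:
# headers_match inputs/<datasrc>/<table>/*.csv
```

Because the comparison is exact, differences in whitespace, quoting, or line endings (CRLF vs. LF) are also reported.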
       		Add <table> to inputs/<datasrc>/import_order.txt before other tables
       			that depend on it
       		For a new-style datasource:
       			"cp" -f inputs/.NCBI/nodes/run inputs/<datasrc>/<table>/
       			inputs/<datasrc>/<table>/run
       	Install the staging tables:
       		make inputs/<datasrc>/reinstall quiet=1 &
       		For a MySQL .sql export:
       			At prompt "[you]@vegbiendev's password:", enter your password
       			At prompt "Enter password:", enter the value in config/bien_password
       		To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
       		View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
       			tail provides a header line with the filename
       			+1 starts at the first line, to show the whole file
       		For every file with an error 'column "..." specified more than once':
       			Add a header override file "+header.<ext>" in <table>/:
       				Note: The leading "+" should sort it before the flat files.
       					"_" unfortunately sorts *after* capital letters in ASCII.
       				Create a text file containing the header line of the flat files
       				Add an ! at the beginning of the line
       					This signals cat_csv that this is a header override.
       				For empty names, use their 0-based column # (by convention)
       				For duplicate names, add a distinguishing suffix
       				For long names that collided, rename them to <= 63 chars long
       				Do NOT make readability changes in this step; that is what the
       					map spreadsheets (below) are for.
       				Save
       		If you made any changes, re-run the install command above
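The starting point for a header override file can be generated from the existing flat file and then edited by hand. A sketch only: the helper name make_header_override is hypothetical, and taking <ext> from the flat file's own extension is an assumption.

```shell
# Hypothetical helper: write <table_dir>/+header.<ext> containing "!" followed by
# the first line of <flat_file>. Empty, duplicate, and over-long names still
# need to be fixed by hand afterwards.
make_header_override() {
    flat_file=$1
    table_dir=$2
    ext=${flat_file##*.}
    printf '!%s\n' "$(head -n 1 "$flat_file")" >"$table_dir/+header.$ext"
}

# usage:
# make_header_override inputs/<datasrc>/<table>/<file>.csv inputs/<datasrc>/<table>
```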
       	Auto-create the map spreadsheets: make inputs/<datasrc>/
       	Map each table's columns:
       		In each <table>/ subdir, for each "via map" map.csv:
       			Open the map in a spreadsheet editor
       			Open the "core map" /mappings/Veg+-VegBIEN.csv
       			In each row of the via map, set the right column to a value from the
       				left column of the core map
       			Save
       		Regenerate the derived maps: make inputs/<datasrc>/
       	Accept the test cases:
       		For a new-style datasource:
       			inputs/<datasrc>/run
       			svn di inputs/<datasrc>/*/test.xml.ref
       			If you get errors, follow the steps for old-style datasources below
       		For an old-style datasource:
       			make inputs/<datasrc>/test
       			When prompted to "Accept new test output", enter y and press ENTER
       			If you instead get errors, do one of the following for each one:
       			-	If the error was due to a bug, fix it
       			-	Add a SQL function that filters or transforms the invalid data
       			-	Make an empty mapping for the columns that produced the error.
       				Put something in the Comments column of the map spreadsheet to
       				prevent the automatic mapper from auto-removing the mapping.
       			When accepting tests, it's helpful to use WinMerge
       				(see WinMerge setup below for configuration)
       		make inputs/<datasrc>/test by_col=1
       			If you get errors this time, this always indicates a bug, usually in
       				the VegBIEN unique constraints or column-based import itself
       	Add newly-created files: make inputs/<datasrc>/add
       	Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
       	Update vegbiendev:
       		ssh aaronmk@jupiter.nceas.ucsb.edu
       			up
       		On local machine:
       			./fix_perms
       			make inputs/upload
       			make inputs/upload live=1
       		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       			up
       			make inputs/download
       			make inputs/download live=1
       			Follow the steps under Install the staging tables above
       Maintenance:
       	on a live machine, you should put the following in your .profile:
       --
       # make svn files web-accessible. this does not affect unversioned files, because
       # these get the right permissions on the local machine instead.
       umask ug=rwx,o=rx
       unset TMOUT # TMOUT causes screen to exit even with background processes
       --
       	if http://vegbiendev.nceas.ucsb.edu/phppgadmin/ goes down:
       		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       			make phppgadmin-Linux
       	regularly, re-run full-database import so that bugs in it don't pile up.
       		it needs to be kept in working order so that it works when it's needed.
       	to back up the vegbiendev databases:
       		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       		back up MySQL: # usually few changes, so do this first
       			backups/mysql_snapshot
       			l=1        overwrite=1 inplace=1 local_dir=/ remote_url="$USER@jupiter:/data/dev/aaronmk/Documents/BIEN/" subpath=/var/lib/mysql.bak/ sudo -E env PATH="$PATH" bin/sync_upload
       			on local machine:
       			l=1 swap=1 overwrite=1 inplace=1 local_dir=~ sync_remote_subdir=                          subpath=~/Documents/BIEN/var/lib/mysql.bak/                          bin/sync_upload
       		back up Postgres:
       			backups/pg_snapshot
       	to synchronize vegbiendev, jupiter, and your local machine:
       		**WARNING**: pay careful attention to all files that will be deleted or
       			overwritten!
       		install put if needed:
       			download https://uutils.googlecode.com/svn/trunk/bin/put to ~/bin/ and `chmod +x` it
       		when changes are made on vegbiendev:
       			avoid extraneous diffs when rsyncing:
       				on local machine:
       					up; ./fix_perms
       				ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       					up; ./fix_perms
       				ssh aaronmk@jupiter.nceas.ucsb.edu
       					up; ./fix_perms
       			ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       				upload:
       				overwrite=1 bin/sync_upload --size-only
       					then review diff, and rerun with `l=1` prepended
       			on your machine:
       				download:
       				overwrite=1 swap=1 src=. dest='aaronmk@jupiter.nceas.ucsb.edu:~/bien' put --exclude=.svn web/BIEN3/TWiki
       					then review diff, and rerun with `l=1` prepended
       				swap=1 bin/sync_upload backups/TNRS.backup
       					then review diff, and rerun with `l=1` prepended
       				overwrite=1 swap=1 bin/sync_upload --size-only
       					then review diff, and rerun with `l=1` prepended
       				overwrite=1 sync_remote_url=~/Dropbox/svn/ bin/sync_upload --existing --size-only # just update mtimes/perms
       					then review diff, and rerun with `l=1` prepended
       	to back up e-mails:
       		on local machine:
       		/Applications/gmvault-v1.8.1-beta/bin/gmvault sync --multiple-db-owner --type quick aaronmk.nceas@gmail.com
       		open Thunderbird
       		click the All Mail folder for each account and wait for it to download the e-mails in it
       	to back up the version history:
       		# back up first on the local machine, because often only the svnsync
       			command gets run, and that way it will get backed up immediately to
       			Dropbox (and hourly to Time Machine), while vegbiendev only gets
       			backed up daily to tape
       		on local machine:
       		svnsync sync file://"$HOME"/Dropbox/docs/BIEN/svn_repo/ # initial runtime: 1.5 h ("08:21:38" - "06:45:26") @vegbiendev
       		(cd ~/Dropbox/docs/BIEN/git/; git svn fetch)
       		# use absolute path for vegbiendev commands because the Ubuntu 14.04
       			version of rsync doesn't expand ~ properly
       		overwrite=1        src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:/data/dev/aaronmk/' put Dropbox/docs/BIEN/svn_repo/ # runtime: 1 min ("1:05.08")
       			then review diff, and rerun with `l=1` prepended
       		overwrite=1        src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:/data/dev/aaronmk/' put Dropbox/docs/BIEN/git/
       			then review diff, and rerun with `l=1` prepended
       	to back up vegbiendev:
       		do steps under Maintenance > "to synchronize vegbiendev, jupiter, and
       			your local machine"
       		on local machine:
       			l=1 overwrite=1 inplace=1 src=root@vegbiendev.nceas.ucsb.edu:/ dest=~/Documents/BIEN/vegbiendev/ sudo -E put --exclude=/var/lib/mysql.bak --exclude=/var/lib/postgresql.bak --exclude='/var/lib/postgresql/9.3/main/*/' --exclude=/home/aaronmk/bien
       			# enable --link-dest to work:
       				chmod -R o+r ~/bien/.svn/; find ~/bien/.svn -type d -exec chmod o+rx {} \; # match perms
       				l=1 overwrite=1 del= src='aaronmk@vegbiendev.nceas.ucsb.edu:~/bien/' dest=~/bien/ put --existing --size-only .svn/pristine/ # match times and perms
       			l=1 overwrite=1 inplace=1 src=aaronmk@vegbiendev.nceas.ucsb.edu:/ dest=~/Documents/BIEN/vegbiendev/ sudo -E put --link-dest="$HOME"/Documents/BIEN/svn/ --no-owner --no-group home/aaronmk/bien/
       				# --no-owner --no-group: needed to allow --link-dest to work
       				# --link-dest: relative to dest, not currdir, so need abs path
       	to back up the local machine's settings:
       		do step when changes are made on vegbiendev > on your machine, download
       		ssh aaronmk@jupiter.nceas.ucsb.edu
       			(cd ~/Dropbox/svn/; up)
       		on your machine:
       			sudo find / -name .DS_Store -print -delete
       			rm ~/'Library/Thunderbird/Profiles/9oo8rcyn.default/ImapMail/imap.googlemail.com/[Gmail].sbd/Spam'
       				# remove the downloaded Spam folder, because spam e-mails often contain viruses that would trigger clamscan
       			overwrite=1           sync_local_dir=~/Dropbox/svn/ sync_remote_subdir=Dropbox/svn/ bin/sync_upload --size-only # just update mtimes
       				then review diff, and rerun with `l=1` prepended
       			overwrite=1 inplace=1 sync_local_dir=~/             sync_remote_subdir=             bin/sync_upload ~/"VirtualBox VMs/**" # need inplace=1 because they are very large files
       				then review diff, and rerun with `l=1` prepended
       			overwrite=1           sync_local_dir=~/             sync_remote_subdir= sudo -E     bin/sync_upload --exclude="/Library/Saved Application State/" --exclude="/.Trash/" --exclude="/bin/" --exclude="/bin/pg_ctl" --exclude="/bin/unzip" --exclude="/Dropbox/home/" --exclude="/.profile" --exclude="/.shrc" --exclude="/.bashrc" --exclude="/software/**/.svn/"
       				# sudo -E: needed for Documents/BIEN/vegbiendev*/
       				then review diff, and rerun with `l=1` prepended
       			pause Dropbox: system tray > Dropbox icon > gear icon > Pause Syncing
       				this prevents Dropbox from trying to capture filesystem
       				events while syncing
       			overwrite=1           sync_local_dir=~/             sync_remote_url=~/Dropbox/home/ bin/sync_upload --exclude="/Library/Saved Application State/" --exclude="/.Trash/" --exclude="/.dropbox/" --exclude="/Documents/BIEN/" --exclude="/Dropbox/" --exclude=/gmvault-db/ --exclude="/software/" --exclude="/VirtualBox VMs/**.sav" --exclude="/VirtualBox VMs/**.vdi" --exclude="/VirtualBox VMs/**.vmdk"
       				then review diff, and rerun with `l=1` prepended
       			resume Dropbox: system tray > Dropbox icon > gear icon > Resume Syncing
       	to back up files not in Time Machine:
       		**IMPORTANT**: use the 2 TB external hard drive instead of the Time
       			Machine drive, because the Time Machine drive does not have
       			~/Documents/BIEN/ in a location where it can be hardlinked against
       		On local machine:
       		on first run, create parent dirs:
       			sudo mkdir -p '/Volumes/BIEN3.**SAVE**/Users/aaronmk/Documents/BIEN/'
       			sudo mkdir -p '/Volumes/BIEN3.**SAVE**/usr/local/var/postgres/'
       			l=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put --existing
       		l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put --include='/vegbiendev**' --exclude='**' Users/aaronmk/Documents/BIEN/
       			# this cannot be backed up by Time Machine because Time Machine dereferences hard links:
       			#  `sudo find /Volumes/Time\ Machine\ Backups/Backups.backupdb/ ! -type d -links +1`
       			#  returns no files when there is a single timestamped backup, but
       			#  `sudo find / ! -type d -links +1` does
       		l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put usr/local/var/postgres/
       			# this cannot be backed up by Time Machine because it prevents the backup process from ending
       		launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist # stop the PostgreSQL server
       		l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put usr/local/var/postgres/
       		launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist # start the PostgreSQL server
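       		The hard-link check quoted in the comments above can be tried on a
       			scratch directory first. A minimal sketch, assuming a POSIX shell
       			with mktemp available; the function name hardlinked_files and the
       			scratch file names are illustrative, not part of this repository:

```shell
# List non-directory files under a directory that have more than one hard link
# (the same test as `find ... ! -type d -links +1` above).
hardlinked_files() {
    find "$1" ! -type d -links +1
}

demo=$(mktemp -d)           # scratch directory (hypothetical demo data)
echo data > "$demo/a"
ln "$demo/a" "$demo/b"      # second name for the same inode
echo other > "$demo/c"      # single link, so it is not reported
hardlinked_files "$demo" | sort
rm -rf "$demo"
```

       		Only a and b are listed. A backup that dereferences hard links
       			would store them as two independent files, so the same check on
       			the backup copy would list nothing.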
       	to back up the local machine's hard drive:
       		turn on and connect the 2 TB external hard drive
       		screen
       		# --exclude='/\**': exclude *-files indicating the (differing) retention
       		#  statuses of the partitions involved
       		pause Dropbox: system tray > Dropbox icon > gear icon > Pause Syncing
       			otherwise, the backup of ~/.dropbox will be corrupted
       		launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist # stop the PostgreSQL server
       		l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put --exclude='/\**' --exclude=/.fseventsd/ --exclude=/private/var/vm/
       			# no --extended-attributes: rsync has to visit every file for this
       			# runtime: 10 min (~600); initial runtime: 4-13 h ("2422.84"+"12379.91" .. "45813.19"+"747.96")
       		launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist # start the PostgreSQL server
       		resume Dropbox: system tray > Dropbox icon > gear icon > Resume Syncing
       	to restore from Time Machine:
       		# restart holding Alt
       		# select Time Machine Backups
       		# restore the last Time Machine backup to Macintosh HD
       		# restart holding Alt
       		# select Macintosh HD
       		$ screen
       		$ l=1 swap=1 src=/ dest=/Volumes/Time\ Machine\ Backups/ sudo -E put usr/local/var/postgres/ # runtime: 1 h ("4020.61")
       		$ make postgres_restart
       	VegCore data dictionary:
       		Regularly, or whenever the VegCore data dictionary page
       			(https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore)
       			is changed, regenerate mappings/VegCore.csv:
       			On local machine:
       			make mappings/VegCore.htm-remake; make mappings/
       			apply new data dict mappings to datasource mappings/staging tables:
       				inputs/run postprocess # runtime: see inputs/run
       				time yes|make inputs/{NVS,SALVIAS,TEAM}/test # old-style import; runtime: 1 min ("0m59.692s") @starscream
       			svn di mappings/VegCore.tables.redmine
       			If there are changes, update the data dictionary's Tables section
       			When moving terms, check that no terms were lost: svn di
       			svn ci -m 'mappings/VegCore.htm: regenerated from wiki'
       			ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       				perform the steps under "apply new data dict mappings to
       					datasource mappings/staging tables" above
       	Important: Whenever you install a system update that affects PostgreSQL or
       		any of its dependencies, such as libc, you should restart the PostgreSQL
       		server. Otherwise, you may get strange errors like "the database system
       		is in recovery mode" which go away upon reimport, or you may not be able
       		to access the database as the postgres superuser. This applies to both
       		Linux and Mac OS X.
       Backups:
       	Archived imports:
       		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       		Back up: make backups/<version>.backup &
       			Note: To back up the last import, you must archive it first:
       				make schemas/rotate
       		Test: make -s backups/<version>.backup/test &
       		Restore: make backups/<version>.backup/restore &
       		Remove: make backups/<version>.backup/remove
       		Download: make backups/<version>.backup/download
       	TNRS cache:
       		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       		Back up: make backups/TNRS.backup-remake &
       			runtime: 3 min ("real 2m48.859s")
       		Restore:
       			yes|make inputs/.TNRS/uninstall
       			make backups/TNRS.backup/restore &
       				runtime: 5.5 min ("real 5m35.829s")
       			yes|make schemas/public/reinstall
       				Must come after TNRS restore to recreate tnrs_input_name view
       	Full DB:
       		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       		Back up: make backups/vegbien.<version>.backup &
       		Test: make -s backups/vegbien.<version>.backup/test &
       		Restore: make backups/vegbien.<version>.backup/restore &
       		Download: make backups/vegbien.<version>.backup/download
       	Import logs:
       		On local machine:
       		Download: make inputs/download-logs live=1
       Datasource refreshing:
       	VegBank:
       		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       		make inputs/VegBank/vegbank.sql-remake
       		make inputs/VegBank/reinstall quiet=1 &
       Schema changes:
       	On local machine:
       	When changing the analytical views, run sync_analytical_..._to_view()
       		to update the corresponding table
       	Remember to update the following files with any renamings:
       		schemas/filter_ERD.csv
       		mappings/VegCore-VegBIEN.csv
       		mappings/verify.*.sql
       	Regenerate schema from installed DB: make schemas/remake
       	Reinstall DB from schema: make schemas/public/reinstall schemas/reinstall
       		**WARNING**: This will delete the public schema of your VegBIEN DB!
       	If needed, reinstall staging tables:
       		On local machine:
       			sudo -E -u postgres psql <<<'ALTER DATABASE vegbien RENAME TO vegbien_prev'
       			make db
       			. bin/reinstall_all
       			Fix any bugs and retry until no errors
       			make schemas/public/install
       				This must be run *after* the datasources are installed, because
       				views in public depend on some of the datasources
       			sudo -E -u postgres psql <<<'DROP DATABASE vegbien_prev'
       		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       			repeat the above steps
       			**WARNING**: Do not run this until reinstall_all runs successfully
       			on the local machine, or the live DB may be unrestorable!
       	update mappings and staging table column names:
       		on local machine:
       			inputs/run postprocess # runtime: see inputs/run
       			time yes|make inputs/{NVS,SALVIAS,TEAM}/test # old-style import; runtime: 1 min ("0m59.692s") @starscream
       		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       			manually apply schema changes to the live public schema
       			do steps under "on local machine" above
       	Sync ERD with vegbien.sql schema:
       		Run make schemas/vegbien.my.sql
       		Open schemas/vegbien.ERD.mwb in MySQLWorkbench
       		Go to File > Export > Synchronize With SQL CREATE Script...
       		For Input File, select schemas/vegbien.my.sql
       		Click Continue
       		In the changes list, select each table with an arrow next to it
       		Click Update Model
       		Click Continue
       		Note: The generated SQL script will be empty because we are syncing in
       			the opposite direction
       		Click Execute
       		Reposition any lines that have been reset
       		Add any new tables by dragging them from the Catalog in the left sidebar
       			to the diagram
       		Remove any deleted tables by right-clicking the table's diagram element,
       			selecting Delete '<table name>', and clicking Delete
       		Save
       		If desired, update the graphical ERD exports (see below)
       	Update graphical ERD exports:
       		Go to File > Export > Export as PNG...
       		Select schemas/vegbien.ERD.png and click Save
       		Go to File > Export > Export as SVG...
       		Select schemas/vegbien.ERD.svg and click Save
       		Go to File > Export > Export as Single Page PDF...
       		Select schemas/vegbien.ERD.1_pg.pdf and click Save
       		Go to File > Print...
       		In the lower left corner, click PDF > Save as PDF...
       		Set the Title and Author to ""
       		Select schemas/vegbien.ERD.pdf and click Save
       		Commit: svn ci -m "schemas/vegbien.ERD.mwb: Regenerated exports"
       	Refactoring tips:
       		To rename a table:
       			In vegbien.sql, do the following:
       				Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
       					This is necessary because the table name is *everywhere*
       				Search for <new>
       				Manually change back any replacements inside comments
       		To rename a column:
       			Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
       			Recreate any foreign key for the column, removing CONSTRAINT <name>
       				This resets the foreign key name using the new column name
       	Creating a poster of the ERD:
       		Determine the poster size:
       			Measure the line height (from the bottom of one line to the bottom
       				of another): 16.3cm/24 lines = 0.679cm
       			Measure the height of the ERD: 35.4cm*2 = 70.8cm
       			Zoom in as far as possible
       			Measure the height of a capital letter: 3.5mm
       			Measure the line height: 8.5mm
       			Calculate the text's fraction of the line height: 3.5mm/8.5mm = 0.41
       			Calculate the text height: 0.679cm*0.41 = 0.28cm
       			Calculate the text height's fraction of the ERD height:
       				0.28cm/70.8cm = 0.0040
       			Measure the text height on the *VegBank* ERD poster: 5.5mm = 0.55cm
       			Calculate the VegBIEN poster height to make the text the same size:
       				0.55cm/0.0040 = 137.5cm H; *1in/2.54cm = 54.1in H
       			The ERD aspect ratio is 11 in W x (2*8.5in H) = 11x17 portrait
       			Calculate the VegBIEN poster width: 54.1in H*11W/17H = 35.0in W
       			The minimum VegBIEN poster size is 35x54in portrait
       		Determine the cost:
       			The FedEx Kinkos near NCEAS (1030 State St, Santa Barbara, CA 93101)
       				charges the following for posters:
       				base: $7.25/sq ft
       				lamination: $3/sq ft
       				mounting on a board: $8/sq ft
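       			As a rough worked example, the rates above can be applied to the
       				35x54in minimum size from the previous step. This sketch
       				assumes awk is available and that each rate is charged on the
       				full poster area; whether the options add on top of the base
       				price is not stated here. The function name poster_cost is
       				illustrative:

```shell
# Cost of one option for a poster, given its size in inches and a rate in
# dollars per square foot (144 sq in = 1 sq ft).
# usage: poster_cost <width_in> <height_in> <rate_per_sq_ft>
poster_cost() {
    awk -v w="$1" -v h="$2" -v rate="$3" \
        'BEGIN { printf "%.2f\n", w * h / 144 * rate }'
}

poster_cost 35 54 7.25   # base
poster_cost 35 54 3      # lamination
poster_cost 35 54 8      # mounting on a board
```

       			For 35x54in (13.125 sq ft), the base rate comes to about $95.16
       				and mounting to $105.00.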
       Testing:
       	On a development machine, you should put the following in your .profile:
       		umask ug=rwx,o= # prevent files from becoming web-accessible
       		export log= n=2
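       		To check what this umask does, the following sketch creates a file
       			and a directory under it and prints their permission bits. It
       			assumes a POSIX shell with mktemp; the function name
       			perms_under_umask is illustrative:

```shell
# Print the permission strings that a new file and a new directory get under
# the given umask. The subshell keeps the umask change from leaking out.
perms_under_umask() {
    (
        umask "$1"
        dir=$(mktemp -d) || exit 1
        touch "$dir/file"
        mkdir "$dir/subdir"
        # cut drops any trailing ACL/SELinux marker that ls may append
        ls -ld "$dir/file" "$dir/subdir" | cut -c1-10
        rm -rf "$dir"
    )
}

perms_under_umask ug=rwx,o=
```

       		With ug=rwx,o= the new file should show -rw-rw----, that is, no
       			permissions at all for "other" users such as the web server.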
       	For development machine specs, see /planning/resources/dev_machine.specs/
       	On local machine:
       	Mapping process: make test
       		Including column-based import: make test by_col=1
       			If the row-based and column-based imports produce different inserted
       			row counts, this usually means that a table is underconstrained
       			(the unique indexes don't cover all possible rows).
       			This can occur if you didn't use COALESCE(field, null_value) around
       			a nullable field in a unique index. See sql_gen.null_sentinels for
       			the appropriate null value to use.
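       			A sketch of what such an index could look like, printed from the
       				shell so it can be reviewed before being piped to psql. The
       				table, column and index names and the '\N' sentinel are
       				hypothetical placeholders; take the real sentinel from
       				sql_gen.null_sentinels:

```shell
# Print a CREATE UNIQUE INDEX statement that wraps one nullable column in
# COALESCE(), so that rows with NULL in that column collide with each other
# instead of always being treated as distinct.
# usage: coalesced_unique_index_sql <table> <key_col> <nullable_col> <sentinel>
# (all four arguments are placeholders for illustration)
coalesced_unique_index_sql() {
    table="$1" key_col="$2" nullable_col="$3" sentinel="$4"
    cat <<EOF
CREATE UNIQUE INDEX ${table}_unique ON ${table}
    (${key_col}, (COALESCE(${nullable_col}, ${sentinel})));
EOF
}

coalesced_unique_index_sql example_table parent_id verbatim_name "'\N'"
```

       			Without the COALESCE(), PostgreSQL treats NULLs in a unique index
       				as distinct from each other, so two rows that differ only by
       				having NULL in that column never conflict.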
       	Map spreadsheet generation: make remake
       	Missing mappings: make missing_mappings
       	Everything (for most complete coverage): make test-all
       Debugging:
       	"Binary chop" debugging:
       		(This is primarily useful for regressions that occurred in a previous
       		revision, which was committed without running all the tests)
       		up -r <rev>; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
       	.htaccess:
       		mod_rewrite:
       			**IMPORTANT**: whenever you change the DirectorySlash setting for a
       				directory, you *must* clear your browser's cache to ensure that
       				a cached redirect is not used. This is because RewriteRule
       				redirects are temporary by default (302), but DirectorySlash
       				redirects are permanent (301), and browsers cache permanent
       				redirects.
       				for Firefox:
       					press Cmd+Shift+Delete
       					check only Cache
       					press Enter or click Clear Now
       WinMerge setup:
       	In a Windows VM:
       	Install WinMerge from <http://winmerge.org/>
       	Open WinMerge
       	Go to Edit > Options and click Compare in the left sidebar
       	Enable "Moved block detection", as described at
       		<http://manual.winmerge.org/Configuration.html#d0e5892>.
       	Set Whitespace to Ignore change, as described at
       		<http://manual.winmerge.org/Configuration.html#d0e5758>.
       Documentation:
       	To generate a Redmine-formatted list of steps for column-based import:
       		On local machine:
       		make schemas/public/reinstall
       		make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
       	To import and scrub just the test taxonomic names:
       		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
       		inputs/test_taxonomic_names/test_scrub
       General:
       	To see a program's description, read its top-of-file comment
       	To see a program's usage, run it without arguments
       	To remake a directory: make <dir>/remake
       	To remake a file: make <file>-remake
