Installation:
	Check out svn:
		sudo apt-get --yes install subversion # not preinstalled on Ubuntu
		svn co https://code.nceas.ucsb.edu/code/projects/bien/trunk bien
	cd bien/
	Install:
		**WARNING**: This will delete the public schema of your VegBIEN DB!
		make install
		# at "reload PATH" (if displayed), do what it says
		# at "Are you sure you want to continue connecting", type "yes" and
			press Enter
		# at "[sudo] password for user", enter your password and press Enter
		# at "Modifying postgresql.conf and pg_hba.conf", type y and press Enter
		# at "kernel.shmmax [...] Press ENTER to continue":
			# open a new window
			# run what it says
			# press Ctrl-D
			# return to the previous window
			# press Enter
		# at "restart PostgreSQL manually ... Press ENTER to continue":
			# open a new window
			# run what it says
			# press Ctrl-D
			# return to the previous window
			# press Enter
		# at "This will delete the current public schema of your VegBIEN DB",
			type y and press Enter
	Uninstall: make uninstall
		**WARNING**: This will delete your entire VegBIEN DB!
		This includes all archived imports and staging tables.

Connecting to vegbiendev:
	ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
	cd /home/bien # should happen automatically at login
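	optionally, add an alias to ~/.ssh/config to shorten this (a sketch;
		requires OpenSSH >= 7.6 for RemoteCommand, and the alias name is
		arbitrary):
		--
		Host vegbiendev
			HostName vegbiendev.nceas.ucsb.edu
			RequestTTY force
			RemoteCommand exec sudo -u aaronmk -i
		--
		# then connect with just: ssh vegbiendev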

Single datasource refresh:
	ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
	# -> Maintenance > "to back up the vegbiendev databases"
	# place updated extract in inputs/ACAD/_src/
	# place extracted flat file(s) in the appropriate table subdirs
	rm=1 inputs/<datasrc>/run # reload staging tables
	make inputs/<datasrc>/reimport_scrub by_col=1 &
		# this works whether or not the datasource is already imported
	tail -150 inputs/<datasrc>/*/logs/public.log.sql # view progress
	# -> Full database import > "To re-run geoscrubbing"
	# -> Full database import > "To remake analytical DB"
	# -> Maintenance > "to back up the vegbiendev databases"

Notes on system stability:
	**WARNING**: when shutting down the VM, always first stop Postgres:
		sudo service postgresql stop
		this prevents the OS from SIGKILLing Postgres, which sometimes causes
		database corruption

Notes on running programs:
	**WARNING**: always start with a clean shell, to avoid spurious bugs. the
		shell should not have changes to the env vars. (there have been bugs
		that went away after closing and reopening the terminal window.) note
		that running `exec bash` is not sufficient to *reset* the env vars.
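	one way to get a shell with a clean env (a sketch; `env -i` clears the
		environment and `bash -l` rebuilds it from the login files; preserve
		additional vars as needed):
		env -i HOME="$HOME" TERM="$TERM" USER="$USER" bash -l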

Notes on editing files:
	**WARNING**: shell scripts should always be read-only, so that editing them
		while an import is in progress will not crash the import (see
		http://vegpath.org/links/#**%20modifying%20a%20running%20shell%20script)
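	to edit a read-only script safely, edit a copy and rename it into place
		(a sketch using a hypothetical script; `mv` within the same dir is a
		rename, so a running shell keeps reading the old inode):
		cp bin/import_all bin/import_all.new
		"$EDITOR" bin/import_all.new
		chmod a-w bin/import_all.new
		mv bin/import_all.new bin/import_all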

Full database import:
	**WARNING**: You must perform *every single* step listed below, to avoid
		breaking column-based import
	**WARNING**: always start with a clean shell, as described above under
		"Notes on running programs"
	**IMPORTANT**: the beginning of the import should be scheduled at a time
		when the DB will not be needed for other uses. this is necessary because
		vegbiendev will be slow for the first few hours of the import, due to
		the import using all the available cores.
	do steps under Maintenance > "to synchronize vegbiendev, jupiter, and
		your local machine"
	On local machine:
		make inputs/upload
		make inputs/upload live=1
		make test by_col=1 # runtime: 1 h ("53m7.383s") @starscream
			if you encounter errors, they are most likely related to the
				PostgreSQL error parsing in /lib/sql.py parse_exception()
			See note under Testing below
	ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
	Ensure there are no local modifications: svn st
	up
	make inputs/download
	make inputs/download live=1
	For each newly-uploaded datasource above: make inputs/<datasrc>/reinstall
	Update the auxiliary schemas: make schemas/reinstall
		**WARNING**: requires sudo access!
		The public schema will be installed separately by the import process
	Delete imports before the last so they won't bloat the full DB backup:
		make backups/vegbien.<version>.backup/remove
		To keep a previous import other than the public schema:
			export dump_opts='--exclude-schema=public --exclude-schema=<version>'
			# env var will be inherited by `screen` shell
	restart Postgres to free up any disk space used by temp tables from the last
		import (this is apparently not automatically reclaimed):
		make postgres_restart
	Make sure there is at least 1 TB of disk space on /: df -h
		although the import schema itself is only 315 GB, Postgres uses
			significant temporary space at the beginning of the import.
			the total disk usage oscillates between 1.2 TB and the entire disk
			for the first day (for an import started @12:55:09: high-water
			marks of 1.7 TB @14:00:25 and 1.8 TB @15:38:32; the next day, with
			2 datasources running: entire disk for 4 min @05:35:44, 1.8 TB
			@11:15:05).
		To free up space, remove backups that have been archived on jupiter:
			List backups/ to view older backups
			Check their MD5 sums using the steps under "On jupiter" below
			Remove these backups
	for the full import:
		screen
		Press ENTER
	$0 # nested shell to prevent errexit from closing the window
	the following must happen within screen to avoid affecting the outer shell:
	unset TMOUT # TMOUT causes shell to exit even with background processes
	set -o ignoreeof # prevent Ctrl+D from exiting shell to keep attached jobs
	on local machine:
		unset n # clear any limit set in .profile (unless desired)
		unset log # allow logging output to go to log files
	unset version # clear any version from last import, etc.
	if no commits have been made since the last import (e.g. if retrying an
		import), set a custom version that differs from the auto-assigned one
		(which would otherwise cause a collision with the last import):
		svn info
		extract the svn revision after "Revision:"
		export version=r[revision]_2 # +suffix to distinguish from last import
			# env var will be inherited by `screen` shell
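		the extraction can also be done in one step (a sketch; uses only
			POSIX sed):
			export version=r$(svn info | sed -n 's/^Revision: //p')_2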
	to import just a subset of the datasources:
		declare -ax inputs; inputs=(inputs/{src,...}/) # no () in declare on Mac
			# array vars *not* inherited by `screen` shell
		export version=custom_import_name
	Start column-based import: . bin/import_all
		To use row-based import: . bin/import_all by_col=
		To stop all running imports: . bin/stop_imports
		**WARNING**: Do NOT run import_all in the background, or the jobs it
			creates won't be owned by your shell.
		Note that import_all will take up to an hour to import the NCBI backbone
			and other metadata before returning control to the shell.
		To view progress:
			tail inputs/{.,}??*/*/logs/$version.log.sql
	note: at the beginning of the import, the system may send out CPU load
		warning e-mails. these can safely be ignored. (they happen because the
		parallel imports use all the available cores.)
	for a test import, turn off the DB backup (this also turns off analytical
		DB creation):
		kill % # cancel after_import()
	Wait (4 days) for the import to finish
	**WARNING**: do *not* run backups/pg_snapshot while the import is running,
		due to continuously-changing files
	**WARNING**: do *not* run backups/pg_snapshot until the previous import has
		been replaced, to avoid running into disk space limits
	To recover from a closed terminal window: screen -r
	To restart an aborted import for a specific table:
		export version=<version>
		(set -o errexit; make inputs/<datasrc>/<table>/import_scrub by_col=1 continue=1; make inputs/<datasrc>/publish) &
		bin/after_import $! & # $! can also be obtained from `jobs -l`
	Get $version: echo $version
	Set $version in all vegbiendev terminals: export version=<version>
	When there are no more running jobs, exit `screen`: exit # not Ctrl+D
	upload logs: make inputs/upload live=1
	On local machine: make inputs/download-logs live=1
	check for disk space errors:
		grep --files-with-matches -F 'No space left on device' inputs/{.,}??*/*/logs/$version.log.sql
		if there are any matches:
			manually reimport these datasources using the steps under
				"Single datasource refresh" above
			bin/after_import &
			wait for the import to finish
	tail inputs/{.,}??*/*/logs/$version.log.sql
	In the output, search for "Command exited with non-zero status"
	For inputs that have this, fix the associated bug(s)
	If many inputs have errors, discard the current (partial) import:
		make schemas/$version/uninstall
	Otherwise, continue
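	to locate the failing logs without paging through each one, a grep like
		the disk space check above also works (a sketch):
		grep --files-with-matches -F 'Command exited with non-zero status' inputs/{.,}??*/*/logs/$version.log.sql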
	In PostgreSQL:
		Go to wiki.vegpath.org/VegBIEN_contents
		Get the # observations
		Get the # datasources
		Get the # datasources with observations
		in the r# schema:
		Check that analytical_stem contains [# observations] rows
		Check that source contains [# datasources] rows up through XAL. If this
			is not the case, manually check the entries in source against the
			datasources list on the wiki page (some datasources may be near the
			end depending on import order).
		Check that provider_count contains [# datasources with observations]
			rows with dataset="(total)" (at the top when the table is unsorted)
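		the counts can also be queried from the shell (a sketch, assuming a
			hypothetical import schema named r14089):
			bin/psql_verbose_vegbien <<<'SELECT count(*) FROM "r14089".analytical_stem;'
			bin/psql_verbose_vegbien <<<'SELECT count(*) FROM "r14089".source;'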
	Check that TNRS ran successfully:
		tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
		If the log ends in an AssertionError
			"assert sql.table_col_names(db, table) == header":
			Figure out which TNRS CSV columns have changed
			On local machine:
				Make the changes in the DB's TNRS and public schemas
				rm=1 inputs/.TNRS/schema.sql.run export_
				make schemas/remake
				inputs/test_taxonomic_names/test_scrub # re-run TNRS
				rm=1 inputs/.TNRS/data.sql.run export_
				Commit
			ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
				If dropping a column, save the dependent views
				Make the same changes in the live TNRS.tnrs table on vegbiendev
				If dropping a column, recreate the dependent views
				Restart the TNRS client: make scrub by_col=1 &
	Publish the new import:
		**WARNING**: Before proceeding, be sure you have done *every single*
			verification step listed above. Otherwise, a previous valid import
			could incorrectly be overwritten with a broken one.
		make schemas/$version/publish # runtime: 1 min ("real 1m10.451s")
	unset version
	make backups/upload live=1
	on local machine:
		make backups/vegbien.$version.backup/download live=1
			# download backup to local machine
	ssh aaronmk@jupiter.nceas.ucsb.edu
		cd /data/dev/aaronmk/bien/backups
		For each newly-archived backup:
			make -s <backup>.md5/test
			Check that "OK" is printed next to the filename
	If desired, record the import times in inputs/import.stats.xls:
		On local machine:
		Open inputs/import.stats.xls
		If the rightmost import is within 5 columns of column IV:
			Copy the current tab to <leftmost-date>~<rightmost-date>
			Remove the previous imports from the current tab because they are
				now in the copied tab instead
		Insert a copy of the leftmost "By column" column group before it
		export version=<version>
		bin/import_date inputs/{.,}??*/*/logs/$version.log.sql
		Update the import date in the upper-right corner
		bin/import_times inputs/{.,}??*/*/logs/$version.log.sql
		Paste the output over the # Rows/Time columns, making sure that the
			row counts match up with the previous import's row counts
		If the row counts do not match up, insert or reorder rows as needed
			until they do. Get the datasource names from the log file footers:
			tail inputs/{.,}??*/*/logs/$version.log.sql
		Commit: svn ci -m 'inputs/import.stats.xls: updated import times'
	Running individual steps separately:
	To run TNRS:
		To use an import other than public: export version=<version>
		to rescrub all names:
			make inputs/.TNRS/reinstall
			re-create public-schema views that were cascadingly deleted
		make scrub &
		To view progress:
			tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
	To re-run geoscrubbing:
		$ screen
		# press Enter
		# to use an import other than public: $ export version=<version>
		$ bin/psql_verbose_vegbien <<<'SELECT geoscrub_input_view_modify();' &
			# runtime: 8 min ("7:40.54") @r14089 @vegbiendev
		# wait until done
		$ rm=1 exports/geoscrub_input.csv.run
			# runtime: 25 s ("0m24.936s") @r14089 @vegbiendev
		$ rm=1 inputs/.geoscrub/geoscrub_output/geoscrub.csv.run &
			# runtime: 2.5 h
		# wait until done
		$ rm=1 inputs/.geoscrub/run &
			# runtime: 15 min ("16m34.052s") @r14089 @vegbiendev
		# wait until done
		# re-create public-schema views that were cascadingly deleted
		# press Ctrl+D
		# remake the analytical DB (below)
	To remake analytical DB:
		To use an import other than public: export version=<version>
		bin/make_analytical_db & # runtime: 13 h ("12:43:57elapsed")
		To view progress:
			tail -150 inputs/analytical_db/logs/make_analytical_db.log.sql
	To back up DB (staging tables and last import):
		To use an import *other than public*: export version=<version>
		make backups/TNRS.backup-remake &
		dump_opts=--exclude-schema=public make backups/vegbien.$version.backup/test &
			If running this after renaming the schema to public, instead set
			dump_opts='' and replace $version with the appropriate revision
		make backups/upload live=1

Datasource setup:
	On local machine:
	Example steps for a datasource: wiki.vegpath.org/Import_process_for_Madidi
	umask ug=rwx,o= # prevent files from becoming web-accessible
	Add a new datasource: make inputs/<datasrc>/add
		<datasrc> may not contain spaces, and should be abbreviated.
		If the datasource is a herbarium, <datasrc> should be the herbarium code
			as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
	For a new-style datasource (one containing a ./run runscript):
		"cp" -f inputs/.NCBI/{Makefile,run,table.run} inputs/<datasrc>/
	For MySQL inputs (exports and live DB connections):
		For .sql exports:
			Place the original .sql file in _src/ (*not* in _MySQL/)
			Follow the steps starting with "Install the staging tables" below.
				This is for an initial sync to get the file onto vegbiendev.
			ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
				Create a database for the MySQL export in phpMyAdmin
				Give the bien user all database-specific privileges *except*
					UPDATE, DELETE, ALTER, DROP. This prevents bugs in the
					import scripts from accidentally deleting data.
				bin/mysql_bien database <inputs/<datasrc>/_src/export.sql &
		mkdir inputs/<datasrc>/_MySQL/
		cp -p lib/MySQL.{data,schema}.sql.make inputs/<datasrc>/_MySQL/
		Edit _MySQL/*.make for the DB connection
			For a .sql export, use server=vegbiendev and --user=bien
		Skip the "Add input data for each table" section
	For MS Access databases:
		Place the .mdb or .accdb file in _src/
		Download and install Bullzip's MS Access to PostgreSQL from
			http://bullzip.com/download.php > Access To PostgreSQL > Download
		Use Access To PostgreSQL to export the database:
			Export just the tables/indexes to inputs/<datasrc>/<file>.schema.sql
				using the settings in the associated .ini file where available
			Export just the data to inputs/<datasrc>/<file>.data.sql using the
				settings in the associated .ini file where available
		In <file>.schema.sql, make the following changes:
			Replace text "BOOLEAN" with "/*BOOLEAN*/INTEGER"
			Replace text "DOUBLE PRECISION NULL" with "DOUBLE PRECISION"
		Skip the "Add input data for each table" section
	Add input data for each table present in the datasource:
		For .sql exports, you must use the name of the table in the DB export
		For CSV files, you can use any name. It's recommended to use a table
			name from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>
		Note that if this table will be joined together with another table, its
			name must end in ".src"
		make inputs/<datasrc>/<table>/add
			Important: DO NOT just create an empty directory named <table>!
				This command also creates necessary subdirs, such as logs/.
		If the table is in a .sql export: make inputs/<datasrc>/<table>/install
			Otherwise, place the CSV(s) for the table in
			inputs/<datasrc>/<table>/ OR place a query joining other tables
			together in inputs/<datasrc>/<table>/create.sql
		Important: When exporting relational databases to CSVs, you MUST ensure
			that embedded quotes are escaped by doubling them, *not* by
			preceding them with a "\" as is the default in phpMyAdmin
			(see the example after this list)
		If there are multiple part files for a table, and the header is repeated
			in each part, make sure each header is EXACTLY the same.
			(If the headers are not the same, the CSV concatenation script
			assumes the part files don't have individual headers and treats the
			subsequent headers as data rows.)
		Add <table> to inputs/<datasrc>/import_order.txt before other tables
			that depend on it
		For a new-style datasource:
			"cp" -f inputs/.NCBI/nodes/run inputs/<datasrc>/<table>/
			inputs/<datasrc>/<table>/run
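		example of correct vs. incorrect quote escaping in a CSV row
			(hypothetical data):
			"He said ""hello""",42    # correct: embedded quotes doubled
			"He said \"hello\"",42    # wrong: backslash-escaped (phpMyAdmin's default)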
	Install the staging tables:
		make inputs/<datasrc>/reinstall quiet=1 &
		For a MySQL .sql export:
			At the prompt "[you]@vegbiendev's password:", enter your password
			At the prompt "Enter password:", enter the value in config/bien_password
		To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
		View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
			tail provides a header line with the filename
			+1 starts at the first line, to show the whole file
		For every file with an error 'column "..." specified more than once':
			Add a header override file "+header.<ext>" in <table>/ (see the
				example after this list):
				Note: The leading "+" should sort it before the flat files.
					"_" unfortunately sorts *after* capital letters in ASCII.
				Create a text file containing the header line of the flat files
				Add a ! at the beginning of the line
					This signals cat_csv that this is a header override.
				For empty names, use their 0-based column # (by convention)
				For duplicate names, add a distinguishing suffix
				For long names that collided, rename them to <= 63 chars long
				Do NOT make readability changes in this step; that is what the
					map spreadsheets (below) are for.
				Save
		If you made any changes, re-run the install command above
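		a hypothetical +header.csv, for part files whose raw header is
			"id,name,name," (a duplicate name and a trailing empty name):
			!id,name,name_2,3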
367
	Auto-create the map spreadsheets: make inputs/<datasrc>/
368
	Map each table's columns:
369
		In each <table>/ subdir, for each "via map" map.csv:
370
			Open the map in a spreadsheet editor
371
			Open the "core map" /mappings/Veg+-VegBIEN.csv
372
			In each row of the via map, set the right column to a value from the
373
				left column of the core map
374
			Save
375
		Regenerate the derived maps: make inputs/<datasrc>/
376
	Accept the test cases:
377
		For a new-style datasource:
378
			inputs/<datasrc>/run
379
			svn di inputs/<datasrc>/*/test.xml.ref
380
			If you get errors, follow the steps for old-style datasources below
381
		For an old-style datasource:
382
			make inputs/<datasrc>/test
383
			When prompted to "Accept new test output", enter y and press ENTER
384
			If you instead get errors, do one of the following for each one:
385
			-	If the error was due to a bug, fix it
386
			-	Add a SQL function that filters or transforms the invalid data
387
			-	Make an empty mapping for the columns that produced the error.
388
				Put something in the Comments column of the map spreadsheet to
389
				prevent the automatic mapper from auto-removing the mapping.
390
			When accepting tests, it's helpful to use WinMerge
391
				(see WinMerge setup below for configuration)
392
		make inputs/<datasrc>/test by_col=1
393
			If you get errors this time, this always indicates a bug, usually in
394
				the VegBIEN unique constraints or column-based import itself
395
	Add newly-created files: make inputs/<datasrc>/add
396
	Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
397
	Update vegbiendev:
398
		ssh aaronmk@jupiter.nceas.ucsb.edu
399
			up
400
		On local machine:
401
			./fix_perms
402
			make inputs/upload
403
			make inputs/upload live=1
404
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
405
			up
406
			make inputs/download
407
			make inputs/download live=1
408
			Follow the steps under Install the staging tables above
409

Maintenance:
	on a live machine, you should put the following in your .profile:
--
# make svn files web-accessible. this does not affect unversioned files, because
# these get the right permissions on the local machine instead.
umask ug=rwx,o=rx

unset TMOUT # TMOUT causes screen to exit even with background processes
--
	if http://vegbiendev.nceas.ucsb.edu/phppgadmin/ goes down:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
			make phppgadmin-Linux
	regularly, re-run the full-database import so that bugs in it don't pile up.
		it needs to be kept in working order so that it works when it's needed.
	to back up the vegbiendev databases:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
		back up MySQL: # usually few changes, so do this first
			backups/mysql_snapshot
			l=1        overwrite=1 inplace=1 local_dir=/ remote_url="$USER@jupiter:/data/dev/aaronmk/Documents/BIEN/" subpath=/var/lib/mysql.bak/ sudo -E env PATH="$PATH" bin/sync_upload
			on local machine:
			l=1 swap=1 overwrite=1 inplace=1 local_dir=~ sync_remote_subdir=                          subpath=~/Documents/BIEN/var/lib/mysql.bak/                          bin/sync_upload
		back up Postgres:
			backups/pg_snapshot
	to synchronize vegbiendev, jupiter, and your local machine:
		**WARNING**: pay careful attention to all files that will be deleted or
			overwritten!
		install put if needed:
			download https://uutils.googlecode.com/svn/trunk/bin/put to ~/bin/ and `chmod +x` it
		when changes are made on vegbiendev:
			avoid extraneous diffs when rsyncing:
				on local machine:
					up; ./fix_perms
				ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
					up; ./fix_perms
				ssh aaronmk@jupiter.nceas.ucsb.edu
					up; ./fix_perms
			ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
				upload:
				overwrite=1 bin/sync_upload --size-only
					then review diff, and rerun with `l=1` prepended
			on your machine:
				download:
				overwrite=1 swap=1 src=. dest='aaronmk@jupiter.nceas.ucsb.edu:~/bien' put --exclude=.svn web/BIEN3/TWiki
					then review diff, and rerun with `l=1` prepended
				swap=1 bin/sync_upload backups/TNRS.backup
					then review diff, and rerun with `l=1` prepended
				overwrite=1 swap=1 bin/sync_upload --size-only
					then review diff, and rerun with `l=1` prepended
				overwrite=1 sync_remote_url=~/Dropbox/svn/ bin/sync_upload --existing --size-only # just update mtimes/perms
					then review diff, and rerun with `l=1` prepended
	to back up e-mails:
		on local machine:
		/Applications/gmvault-v1.8.1-beta/bin/gmvault sync --multiple-db-owner --type quick aaronmk.nceas@gmail.com
		open Thunderbird
		click the All Mail folder for each account and wait for it to download the e-mails in it
	to back up the version history:
		# back up first on the local machine, because often only the svnsync
			command gets run, and that way it will get backed up immediately to
			Dropbox (and hourly to Time Machine), while vegbiendev only gets
			backed up daily to tape
		on local machine:
		svnsync sync file://"$HOME"/Dropbox/docs/BIEN/svn_repo/ # initial runtime: 1.5 h ("08:21:38" - "06:45:26") @vegbiendev
		(cd ~/Dropbox/docs/BIEN/git/; git svn fetch)
		# use absolute path for vegbiendev commands because the Ubuntu 14.04
			version of rsync doesn't expand ~ properly
		overwrite=1        src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:/data/dev/aaronmk/' put Dropbox/docs/BIEN/svn_repo/ # runtime: 1 min ("1:05.08")
			then review diff, and rerun with `l=1` prepended
		overwrite=1        src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:/data/dev/aaronmk/' put Dropbox/docs/BIEN/git/
			then review diff, and rerun with `l=1` prepended
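		on the first run, the local mirror must be created and initialized
			(a sketch using standard svnadmin/svnsync commands; the source URL
			is the one from Installation, and svnsync needs a
			pre-revprop-change hook that exits 0):
			svnadmin create ~/Dropbox/docs/BIEN/svn_repo/
			echo '#!/bin/sh' >~/Dropbox/docs/BIEN/svn_repo/hooks/pre-revprop-change
			chmod +x ~/Dropbox/docs/BIEN/svn_repo/hooks/pre-revprop-change
			svnsync init file://"$HOME"/Dropbox/docs/BIEN/svn_repo/ https://code.nceas.ucsb.edu/code/projects/bien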
	to back up vegbiendev:
		do steps under Maintenance > "to synchronize vegbiendev, jupiter, and
			your local machine"
		on local machine:
			l=1 overwrite=1 inplace=1 src=root@vegbiendev.nceas.ucsb.edu:/ dest=~/Documents/BIEN/vegbiendev/ sudo -E put --exclude=/var/lib/mysql.bak --exclude=/var/lib/postgresql.bak --exclude='/var/lib/postgresql/9.3/main/*/' --exclude=/home/aaronmk/bien
			# enable --link-dest to work:
				chmod -R o+r ~/bien/.svn/; find ~/bien/.svn -type d -exec chmod o+rx {} \; # match perms
				l=1 overwrite=1 del= src='aaronmk@vegbiendev.nceas.ucsb.edu:~/bien/' dest=~/bien/ put --existing --size-only .svn/pristine/ # match times and perms
			l=1 overwrite=1 inplace=1 src=aaronmk@vegbiendev.nceas.ucsb.edu:/ dest=~/Documents/BIEN/vegbiendev/ sudo -E put --link-dest="$HOME"/Documents/BIEN/svn/ --no-owner --no-group home/aaronmk/bien/
				# --no-owner --no-group: needed to allow --link-dest to work
				# --link-dest: relative to dest, not currdir, so need abs path
	to back up the local machine's settings:
		do the steps under "when changes are made on vegbiendev" > "on your
			machine" > download
		ssh aaronmk@jupiter.nceas.ucsb.edu
			(cd ~/Dropbox/svn/; up)
		on your machine:
			sudo find / -name .DS_Store -print -delete
			rm ~/'Library/Thunderbird/Profiles/9oo8rcyn.default/ImapMail/imap.googlemail.com/[Gmail].sbd/Spam'
				# remove the downloaded Spam folder, because spam e-mails often contain viruses that would trigger clamscan
			overwrite=1           sync_local_dir=~/Dropbox/svn/ sync_remote_subdir=Dropbox/svn/ bin/sync_upload --size-only # just update mtimes
				then review diff, and rerun with `l=1` prepended
			overwrite=1 inplace=1 sync_local_dir=~/             sync_remote_subdir=             bin/sync_upload ~/"VirtualBox VMs/**" # need inplace=1 because they are very large files
				then review diff, and rerun with `l=1` prepended
			overwrite=1           sync_local_dir=~/             sync_remote_subdir= sudo -E     bin/sync_upload --exclude="/Library/Saved Application State/" --exclude="/.Trash/" --exclude="/bin/" --exclude="/bin/pg_ctl" --exclude="/bin/unzip" --exclude="/Dropbox/home/" --exclude="/.profile" --exclude="/.shrc" --exclude="/.bashrc" --exclude="/software/**/.svn/"
				# sudo -E: needed for Documents/BIEN/vegbiendev*/
				then review diff, and rerun with `l=1` prepended
			pause Dropbox: system tray > Dropbox icon > gear icon > Pause Syncing
				this prevents Dropbox from trying to capture filesystem
				events while syncing
			overwrite=1           sync_local_dir=~/             sync_remote_url=~/Dropbox/home/ bin/sync_upload --exclude="/Library/Saved Application State/" --exclude="/.Trash/" --exclude="/.dropbox/" --exclude="/Documents/BIEN/" --exclude="/Dropbox/" --exclude=/gmvault-db/ --exclude="/software/" --exclude="/VirtualBox VMs/**.sav" --exclude="/VirtualBox VMs/**.vdi" --exclude="/VirtualBox VMs/**.vmdk"
				then review diff, and rerun with `l=1` prepended
			resume Dropbox: system tray > Dropbox icon > gear icon > Resume Syncing
	to back up files not in Time Machine:
		**IMPORTANT**: use the 2 TB external hard drive instead of the Time
			Machine drive, because the Time Machine drive does not have
			~/Documents/BIEN/ in a location where it can be hardlinked against
		On local machine:
		on the first run, create the parent dirs:
			sudo mkdir -p '/Volumes/BIEN3.**SAVE**/Users/aaronmk/Documents/BIEN/'
			sudo mkdir -p '/Volumes/BIEN3.**SAVE**/usr/local/var/postgres/'
			l=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put --existing
		l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put --include='/vegbiendev**' --exclude='**' Users/aaronmk/Documents/BIEN/
			# this cannot be backed up by Time Machine because it dereferences hard links:
			#  `sudo find /Volumes/Time\ Machine\ Backups/Backups.backupdb/ ! -type d -links +1`
			#  returns no files when there is a single timestamped backup, but
			#  `sudo find / ! -type d -links +1` does
		l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put usr/local/var/postgres/
			# this cannot be backed up by Time Machine because it prevents the backup process from ending
		launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist # stop the PostgreSQL server
		l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put usr/local/var/postgres/
		launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist # start the PostgreSQL server
	to back up the local machine's hard drive:
		turn on and connect the 2 TB external hard drive
		screen
		# --exclude='/\**': exclude *-files indicating the (differing) retention
		#  statuses of the partitions involved
		pause Dropbox: system tray > Dropbox icon > gear icon > Pause Syncing
			otherwise, the backup of ~/.dropbox will be corrupted
		launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist # stop the PostgreSQL server
		l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put --exclude='/\**' --exclude=/.fseventsd/ --exclude=/private/var/vm/
			# no --extended-attributes: rsync has to visit every file for this
			# runtime: 10 min (~600); initial runtime: 4-13 h ("2422.84"+"12379.91" .. "45813.19"+"747.96")
		launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist # start the PostgreSQL server
		resume Dropbox: system tray > Dropbox icon > gear icon > Resume Syncing
	to restore from Time Machine:
		# restart holding Alt
		# select Time Machine Backups
		# restore the last Time Machine backup to Macintosh HD
		# restart holding Alt
		# select Macintosh HD
		$ screen
		$ l=1 swap=1 src=/ dest=/Volumes/Time\ Machine\ Backups/ sudo -E put usr/local/var/postgres/ # runtime: 1 h ("4020.61")
		$ make postgres_restart
	VegCore data dictionary:
		Regularly, or whenever the VegCore data dictionary page
			(https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore)
			is changed, regenerate mappings/VegCore.csv:
			On local machine:
			make mappings/VegCore.htm-remake; make mappings/
			apply new data dict mappings to datasource mappings/staging tables:
				inputs/run postprocess # runtime: see inputs/run
				time yes|make inputs/{NVS,SALVIAS,TEAM}/test # old-style import; runtime: 1 min ("0m59.692s") @starscream
			svn di mappings/VegCore.tables.redmine
			If there are changes, update the data dictionary's Tables section
			When moving terms, check that no terms were lost: svn di
			svn ci -m 'mappings/VegCore.htm: regenerated from wiki'
			ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
				perform the steps under "apply new data dict mappings to
					datasource mappings/staging tables" above
	Important: Whenever you install a system update that affects PostgreSQL or
		any of its dependencies, such as libc, you should restart the PostgreSQL
		server. Otherwise, you may get strange errors like "the database system
		is in recovery mode" which go away upon reimport, or you may not be able
		to access the database as the postgres superuser. This applies to both
		Linux and Mac OS X.

Backups:
	Archived imports:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
		Back up: make backups/<version>.backup &
			Note: To back up the last import, you must archive it first:
				make schemas/rotate
		Test: make -s backups/<version>.backup/test &
		Restore: make backups/<version>.backup/restore &
		Remove: make backups/<version>.backup/remove
		Download: make backups/<version>.backup/download
	TNRS cache:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
		Back up: make backups/TNRS.backup-remake &
			runtime: 3 min ("real 2m48.859s")
		Restore:
			yes|make inputs/.TNRS/uninstall
			make backups/TNRS.backup/restore &
				runtime: 5.5 min ("real 5m35.829s")
			yes|make schemas/public/reinstall
				Must come after the TNRS restore, to recreate the
				tnrs_input_name view
	Full DB:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
		Back up: make backups/vegbien.<version>.backup &
		Test: make -s backups/vegbien.<version>.backup/test &
		Restore: make backups/vegbien.<version>.backup/restore &
		Download: make backups/vegbien.<version>.backup/download
	Import logs:
		On local machine:
		Download: make inputs/download-logs live=1

Datasource refreshing:
	VegBank:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
		make inputs/VegBank/vegbank.sql-remake
		make inputs/VegBank/reinstall quiet=1 &

Schema changes:
	On local machine:
	When changing the analytical views, run sync_analytical_..._to_view()
		to update the corresponding table
	Remember to update the following files with any renamings:
		schemas/filter_ERD.csv
		mappings/VegCore-VegBIEN.csv
		mappings/verify.*.sql
	Regenerate schema from installed DB: make schemas/remake
	Reinstall DB from schema: make schemas/public/reinstall schemas/reinstall
		**WARNING**: This will delete the public schema of your VegBIEN DB!
	If needed, reinstall staging tables:
		On local machine:
			sudo -E -u postgres psql <<<'ALTER DATABASE vegbien RENAME TO vegbien_prev'
			make db
			. bin/reinstall_all
			Fix any bugs and retry until no errors
			make schemas/public/install
				This must be run *after* the datasources are installed, because
				views in public depend on some of the datasources
			sudo -E -u postgres psql <<<'DROP DATABASE vegbien_prev'
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
			repeat the above steps
			**WARNING**: Do not run this until reinstall_all runs successfully
				on the local machine, or the live DB may be unrestorable!
	update mappings and staging table column names:
		on local machine:
			inputs/run postprocess # runtime: see inputs/run
			time yes|make inputs/{NVS,SALVIAS,TEAM}/test # old-style import; runtime: 1 min ("0m59.692s") @starscream
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
			manually apply schema changes to the live public schema
			do the steps under "on local machine" above
	Sync ERD with vegbien.sql schema:
		Run make schemas/vegbien.my.sql
		Open schemas/vegbien.ERD.mwb in MySQLWorkbench
		Go to File > Export > Synchronize With SQL CREATE Script...
		For Input File, select schemas/vegbien.my.sql
		Click Continue
		In the changes list, select each table with an arrow next to it
		Click Update Model
		Click Continue
		Note: The generated SQL script will be empty because we are syncing in
			the opposite direction
		Click Execute
		Reposition any lines that have been reset
		Add any new tables by dragging them from the Catalog in the left sidebar
			to the diagram
		Remove any deleted tables by right-clicking the table's diagram element,
			selecting Delete '<table name>', and clicking Delete
		Save
		If desired, update the graphical ERD exports (see below)
	Update graphical ERD exports:
		Go to File > Export > Export as PNG...
		Select schemas/vegbien.ERD.png and click Save
		Go to File > Export > Export as SVG...
		Select schemas/vegbien.ERD.svg and click Save
		Go to File > Export > Export as Single Page PDF...
		Select schemas/vegbien.ERD.1_pg.pdf and click Save
		Go to File > Print...
		In the lower left corner, click PDF > Save as PDF...
		Set the Title and Author to ""
		Select schemas/vegbien.ERD.pdf and click Save
		Commit: svn ci -m "schemas/vegbien.ERD.mwb: Regenerated exports"
	Refactoring tips:
		To rename a table:
			In vegbien.sql, do the following:
				Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
					This is necessary because the table name is *everywhere*
				Search for <new>
				Manually change back any replacements inside comments
		To rename a column:
			Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
			Recreate any foreign key for the column, removing CONSTRAINT <name>
				This resets the foreign key name using the new column name
				(see the sketch below)
	Creating a poster of the ERD:
686
		Determine the poster size:
687
			Measure the line height (from the bottom of one line to the bottom
688
				of another): 16.3cm/24 lines = 0.679cm
689
			Measure the height of the ERD: 35.4cm*2 = 70.8cm
690
			Zoom in as far as possible
691
			Measure the height of a capital letter: 3.5mm
692
			Measure the line height: 8.5mm
693
			Calculate the text's fraction of the line height: 3.5mm/8.5mm = 0.41
694
			Calculate the text height: 0.679cm*0.41 = 0.28cm
695
			Calculate the text height's fraction of the ERD height:
696
				0.28cm/70.8cm = 0.0040
697
			Measure the text height on the *VegBank* ERD poster: 5.5mm = 0.55cm
698
			Calculate the VegBIEN poster height to make the text the same size:
699
				0.55cm/0.0040 = 137.5cm H; *1in/2.54cm = 54.1in H
700
			The ERD aspect ratio is 11 in W x (2*8.5in H) = 11x17 portrait
701
			Calculate the VegBIEN poster width: 54.1in H*11W/17H = 35.0in W
702
			The minimum VegBIEN poster size is 35x54in portrait
703
		Determine the cost:
704
			The FedEx Kinkos near NCEAS (1030 State St, Santa Barbara, CA 93101)
705
				charges the following for posters:
706
				base: $7.25/sq ft
707
				lamination: $3/sq ft
708
				mounting on a board: $8/sq ft
709

Testing:
	On a development machine, you should put the following in your .profile:
		umask ug=rwx,o= # prevent files from becoming web-accessible
		export log= n=2
	For development machine specs, see /planning/resources/dev_machine.specs/
	On local machine:
	Mapping process: make test
		Including column-based import: make test by_col=1
			If the row-based and column-based imports produce different inserted
			row counts, this usually means that a table is underconstrained
			(the unique indexes don't cover all possible rows).
			This can occur if you didn't use COALESCE(field, null_value) around
			a nullable field in a unique index. See sql_gen.null_sentinels for
			the appropriate null value to use.
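			a sketch of the COALESCE pattern (hypothetical table/column names
				and sentinel values; take the actual sentinels from
				sql_gen.null_sentinels):
				CREATE UNIQUE INDEX location_unique ON location
					((COALESCE(placename, '\N')), (COALESCE(elevation_m, 'NaN'::double precision)));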
	Map spreadsheet generation: make remake
	Missing mappings: make missing_mappings
	Everything (for most complete coverage): make test-all

Debugging:
	"Binary chop" debugging:
		(This is primarily useful for regressions that occurred in a previous
		revision, which was committed without running all the tests)
		up -r <rev>; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
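		for example, one iteration of the chop (hypothetical revisions, with
			r14000 known good and r14089 known bad; 14044 is the midpoint):
			up -r 14044; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
			# if the test passes, chop r14045..r14089 next; else r14001..r14044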
	.htaccess:
		mod_rewrite:
			**IMPORTANT**: whenever you change the DirectorySlash setting for a
				directory, you *must* clear your browser's cache to ensure that
				a cached redirect is not used. this is because RewriteRule
				redirects are (by default) temporary, but DirectorySlash
				redirects are permanent.
				for Firefox:
					press Cmd+Shift+Delete
					check only Cache
					press Enter or click Clear Now

WinMerge setup:
	In a Windows VM:
	Install WinMerge from <http://winmerge.org/>
	Open WinMerge
	Go to Edit > Options and click Compare in the left sidebar
	Enable "Moved block detection", as described at
		<http://manual.winmerge.org/Configuration.html#d0e5892>.
	Set Whitespace to Ignore change, as described at
		<http://manual.winmerge.org/Configuration.html#d0e5758>.

Documentation:
	To generate a Redmine-formatted list of steps for column-based import:
		On local machine:
		make schemas/public/reinstall
		make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
	To import and scrub just the test taxonomic names:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
		inputs/test_taxonomic_names/test_scrub

General:
	To see a program's description, read its top-of-file comment
	To see a program's usage, run it without arguments
	To remake a directory: make <dir>/remake
	To remake a file: make <file>-remake
(6-6/11)