Installation:
	Check out svn: svn co https://code.nceas.ucsb.edu/code/projects/bien
	cd bien/
	Install: make install
		**WARNING**: This will delete the public schema of your VegBIEN DB!
	Uninstall: make uninstall
		**WARNING**: This will delete your entire VegBIEN DB!
		This includes all archived imports and staging tables.

Connecting to vegbiendev:
	ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
	cd /home/bien/svn # should happen automatically at login

Notes on running programs:
	**WARNING**: always start with a clean shell, to avoid spurious bugs. the
		shell's env vars should not have been modified. (there have been bugs
		that went away after closing and reopening the terminal window.) note
		that running `exec bash` is *not* sufficient to reset the env vars,
		because the new shell inherits the old shell's environment.
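	a sketch of one way to get a clean environment without reopening the
		terminal window (assumes env -i is available; env -i clears the
		environment, and the login shell re-reads your profile):
		env -i HOME="$HOME" TERM="$TERM" bash -l
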
Notes on editing files:
	**WARNING**: shell scripts should always be read-only, so that editing them
		while an import is in progress will not crash the import (see
		http://vegpath.org/links/#**%20modifying%20a%20running%20shell%20script)
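	a sketch of one way to toggle this with plain chmod (the script path is
		illustrative):
		chmod a-w bin/my_script # lock before running an import
		chmod u+w bin/my_script # unlock when it is safe to edit again
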
Single datasource import:
	ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
	(Re)import and scrub: make inputs/<datasrc>/reimport_scrub by_col=1 &
	(Re)import only: make inputs/<datasrc>/reimport by_col=1 &
	Note that these commands also work if the datasource is not yet imported
	Remake analytical DB: see Full database import > To remake analytical DB
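	worked example for the ACAD datasource (assuming its import logs land in
		the usual logs/ subdirs):
		make inputs/ACAD/reimport_scrub by_col=1 &
		tail -f inputs/ACAD/*/logs/*.log.sql # view progress
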
Full database import:
	**WARNING**: You must perform *every single* step listed below, to avoid
		breaking column-based import
	**WARNING**: always start with a clean shell, as described above under
		"Notes on running programs"
	**IMPORTANT**: the beginning of the import should be scheduled at a time
		when the DB will not be needed for other uses. this is necessary because
		vegbiendev will be slow for the first few hours of the import, due to
		the import using all the available cores.
	do steps under Maintenance > "to synchronize vegbiendev, jupiter, and
		your local machine"
	On local machine:
		make inputs/upload
		make inputs/upload live=1
		make test by_col=1 # runtime: 20 min ("4m46.108s" + ("21:50:43" - "21:37:09")) @starscream
			if you encounter errors, they are most likely related to the
				PostgreSQL error parsing in /lib/sql.py parse_exception()
			See note under Testing below
	ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
	Ensure there are no local modifications: svn st
	up
	make inputs/download
	make inputs/download live=1
	For each newly-uploaded datasource above: make inputs/<datasrc>/reinstall
	Update the auxiliary schemas: make schemas/reinstall
		**WARNING**: requires sudo access!
		The public schema will be installed separately by the import process
	Delete imports before the last one, so they won't bloat the full DB backup:
		make backups/vegbien.<version>.backup/remove
		To keep a previous import other than the public schema:
			export dump_opts='--exclude-schema=public --exclude-schema=<version>'
			# env var will be inherited by `screen` shell
	restart Postgres to free up any disk space used by temp tables from the last
		import (this is apparently not automatically reclaimed):
		make postgres_restart
	Make sure there is at least 500GB of disk space on /: df -h
		The import schema is 315GB, and may use significant additional space for
			temp tables
		To free up space, remove backups that have been archived on jupiter:
			List backups/ to view older backups
			Check their MD5 sums using the steps under On jupiter below
			Remove these backups
	unset version # clear any version from the last import, etc.
	if no commits have been made since the last import (e.g. if retrying an
		import), set a custom version that differs from the auto-assigned one
		(which would otherwise collide with the last import):
		svn info
		extract the svn revision after "Revision:"
		export version=r[revision]_2 # +suffix to distinguish from last import
			# env var will be inherited by `screen` shell
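		worked example, assuming `svn info` reported "Revision: 12345" (the
			number is illustrative):
			export version=r12345_2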
	screen
	Press ENTER
	unset TMOUT # TMOUT causes screen to exit even with background processes
	set -o ignoreeof # prevent Ctrl+D from exiting `screen` to keep attached jobs
	Start column-based import: . bin/import_all
		To use row-based import: . bin/import_all by_col=
		To stop all running imports: . bin/stop_imports
		**WARNING**: Do NOT run import_all in the background, or the jobs it
			creates won't be owned by your shell.
		Note that import_all will take up to an hour to import the NCBI backbone
			and other metadata before returning control to the shell.
		To view progress:
			tail inputs/{.,}*/*/logs/$version.log.sql
	note: at the beginning of the import, the system may send out CPU load
		warning e-mails. these can safely be ignored. (they happen because the
		parallel imports use all the available cores.)
	Wait (4 days) for the import to finish
	To recover from a closed terminal window: screen -r
	To restart an aborted import for a specific table:
		export version=<version>
		(set -o errexit; make inputs/<datasrc>/<table>/import_scrub by_col=1 continue=1; make inputs/<datasrc>/publish) &
		bin/after_import $! & # $! can also be obtained from `jobs -l`
	Get $version: echo $version
	Set $version in all vegbiendev terminals: export version=<version>
	When there are no more running jobs, exit `screen`: exit # not Ctrl+D
	ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
		Upload logs: make inputs/upload live=1
	On local machine: make inputs/download-logs live=1
	In PostgreSQL:
		Go to wiki.vegpath.org/VegBIEN_contents
		Get the # observations
		Get the # datasources
		Get the # datasources with observations
		in the r# schema:
		Check that analytical_stem contains [# observations] rows
		Check that source contains [# datasources] rows up through XAL. If this
			is not the case, manually check the entries in source against the
			datasources list on the wiki page (some datasources may be near the
			end depending on import order).
		Check that provider_count contains [# datasources with observations]
			rows with dataset="(total)" (at the top when the table is unsorted)
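		a sketch of these checks in SQL, assuming the import schema is named
			r12345 (the schema name is illustrative):
			SELECT count(*) FROM r12345.analytical_stem;
			SELECT count(*) FROM r12345.source;
			SELECT count(*) FROM r12345.provider_count
				WHERE dataset = '(total)';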
	Check that TNRS ran successfully:
		tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
		If the log ends in an AssertionError
			"assert sql.table_col_names(db, table) == header":
			Figure out which TNRS CSV columns have changed
			On local machine:
				Make the changes in the DB's TNRS and public schemas
				rm=1 inputs/.TNRS/schema.sql.run export_
				make schemas/remake
				inputs/test_taxonomic_names/test_scrub # re-run TNRS
				rm=1 inputs/.TNRS/data.sql.run export_
				Commit
			ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
				If dropping a column, save the dependent views
				Make the same changes in the live TNRS.tnrs table on vegbiendev
				If dropping a column, recreate the dependent views
				Restart the TNRS client: make scrub by_col=1 &
	tail inputs/{.,}*/*/logs/$version.log.sql
	In the output, search for "Command exited with non-zero status"
	For inputs that have this, fix the associated bug(s)
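	one way to list just the logs that contain this error (plain grep; -l
		prints only the matching filenames):
		grep -l 'Command exited with non-zero status' inputs/{.,}*/*/logs/$version.log.sql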
	If many inputs have errors, discard the current (partial) import:
		make schemas/$version/uninstall
	Otherwise, continue
	Publish the new import:
		**WARNING**: Before proceeding, be sure you have done *every single*
			verification step listed above. Otherwise, a previous valid import
			could incorrectly be overwritten with a broken one.
		make schemas/$version/publish # runtime: 1 min ("real 1m10.451s")
	unset version
	make backups/upload live=1
	on local machine:
		make backups/vegbien.$version.backup/download live=1
			# download backup to local machine
	ssh aaronmk@jupiter.nceas.ucsb.edu
		cd /data/dev/aaronmk/bien/backups
		For each newly-archived backup:
			make -s <backup>.md5/test
			Check that "OK" is printed next to the filename
	If desired, record the import times in inputs/import.stats.xls:
		On local machine:
		Open inputs/import.stats.xls
		If the rightmost import is within 5 columns of column IV:
			Copy the current tab to <leftmost-date>~<rightmost-date>
			Remove the previous imports from the current tab because they are
				now in the copied tab instead
		Insert a copy of the leftmost "By column" column group before it
		export version=<version>
		bin/import_date inputs/{.,}*/*/logs/$version.log.sql
		Update the import date in the upper-right corner
		bin/import_times inputs/{.,}*/*/logs/$version.log.sql
		Paste the output over the # Rows/Time columns, making sure that the
			row counts match up with the previous import's row counts
		If the row counts do not match up, insert or reorder rows as needed
			until they do. Get the datasource names from the log file footers:
			tail inputs/{.,}*/*/logs/$version.log.sql
		Commit: svn ci -m 'inputs/import.stats.xls: updated import times'
	Running individual steps separately:
	To run TNRS:
		To use an import other than public: export version=<version>
		make scrub &
		To view progress:
			tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
	To remake analytical DB:
		To use an import other than public: export version=<version>
		bin/make_analytical_db & # runtime: 13 h ("12:43:57elapsed")
		To view progress:
			tail -150 inputs/analytical_db/logs/make_analytical_db.log.sql
	To back up DB (staging tables and last import):
		To use an import *other than public*: export version=<version>
		make backups/TNRS.backup-remake &
		dump_opts=--exclude-schema=public make backups/vegbien.$version.backup/test &
			If the import has already been renamed to public, instead set
			dump_opts='' and replace $version with the appropriate revision
		make backups/upload live=1

Datasource setup:
	On local machine:
	Example steps for a datasource: wiki.vegpath.org/Import_process_for_Madidi
	umask ug=rwx,o= # prevent files from becoming web-accessible
	Add a new datasource: make inputs/<datasrc>/add
		<datasrc> may not contain spaces, and should be abbreviated.
		If the datasource is a herbarium, <datasrc> should be the herbarium code
			as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
	For a new-style datasource (one containing a ./run runscript):
		"cp" -f inputs/.NCBI/{Makefile,run,table.run} inputs/<datasrc>/
	For MySQL inputs (exports and live DB connections):
		For .sql exports:
			Place the original .sql file in _src/ (*not* in _MySQL/)
			Follow the steps starting with Install the staging tables below.
				This is for an initial sync to get the file onto vegbiendev.
			ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
				Create a database for the MySQL export in phpMyAdmin
				Give the bien user all database-specific privileges *except*
					UPDATE, DELETE, ALTER, DROP. This prevents bugs in the
					import scripts from accidentally deleting data.
				bin/mysql_bien database <inputs/<datasrc>/_src/export.sql &
		mkdir inputs/<datasrc>/_MySQL/
		cp -p lib/MySQL.{data,schema}.sql.make inputs/<datasrc>/_MySQL/
		Edit _MySQL/*.make for the DB connection
			For a .sql export, use server=vegbiendev and --user=bien
		Skip the Add input data for each table section
	For MS Access databases:
		Place the .mdb or .accdb file in _src/
		Download and install Access To PostgreSQL from
			http://www.bullzip.com/download.php
		Use Access To PostgreSQL to export the database:
			Export just the tables/indexes to inputs/<datasrc>/<file>.schema.sql
			Export just the data to inputs/<datasrc>/<file>.data.sql
		In <file>.schema.sql, make the following changes:
			Replace text "BOOLEAN" with "/*BOOLEAN*/INTEGER"
			Replace text "DOUBLE PRECISION NULL" with "DOUBLE PRECISION"
		Skip the Add input data for each table section
	Add input data for each table present in the datasource:
		For .sql exports, you must use the name of the table in the DB export
		For CSV files, you can use any name. It's recommended to use a table
			name from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>
		Note that if this table will be joined together with another table, its
			name must end in ".src"
		make inputs/<datasrc>/<table>/add
			Important: DO NOT just create an empty directory named <table>!
				This command also creates necessary subdirs, such as logs/.
		If the table is in a .sql export: make inputs/<datasrc>/<table>/install
			Otherwise, place the CSV(s) for the table in
			inputs/<datasrc>/<table>/ OR place a query joining other tables
			together in inputs/<datasrc>/<table>/create.sql
		Important: When exporting relational databases to CSVs, you MUST ensure
			that embedded quotes are escaped by doubling them, *not* by
			preceding them with a "\" as is the default in phpMyAdmin
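			example CSV field containing an embedded quote (the value is
				illustrative):
				correct:   "noted as ""juvenile"" on the sheet"
				incorrect: "noted as \"juvenile\" on the sheet"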
		If there are multiple part files for a table, and the header is repeated
			in each part, make sure each header is EXACTLY the same.
			(If the headers are not the same, the CSV concatenation script
			assumes the part files don't have individual headers and treats the
			subsequent headers as data rows.)
		Add <table> to inputs/<datasrc>/import_order.txt before other tables
			that depend on it
		For a new-style datasource:
			"cp" -f inputs/.NCBI/nodes/run inputs/<datasrc>/<table>/
			inputs/<datasrc>/<table>/run
	Install the staging tables:
		make inputs/<datasrc>/reinstall quiet=1 &
		For a MySQL .sql export:
			At prompt "[you]@vegbiendev's password:", enter your password
			At prompt "Enter password:", enter the value in config/bien_password
		To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
		View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
			tail provides a header line with the filename
			+1 starts at the first line, to show the whole file
		For every file with an error 'column "..." specified more than once':
			Add a header override file "+header.<ext>" in <table>/ (a sketch
				follows these steps):
				Note: The leading "+" should sort it before the flat files.
					"_" unfortunately sorts *after* capital letters in ASCII.
				Create a text file containing the header line of the flat files
				Add an ! at the beginning of the line
					This signals cat_csv that this is a header override.
				For empty names, use their 0-based column # (by convention)
				For duplicate names, add a distinguishing suffix
				For long names that collided, rename them to <= 63 chars long
				Do NOT make readability changes in this step; that is what the
					map spreadsheets (below) are for.
				Save
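			a sketch of a possible +header.csv, assuming the flat files'
				header was "id,name,,name" (an empty name in 0-based column 2
				and a duplicate "name"; all names are illustrative):
				!id,name,2,name_2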
		If you made any changes, re-run the install command above
	Auto-create the map spreadsheets: make inputs/<datasrc>/
	Map each table's columns:
		In each <table>/ subdir, for each "via map" map.csv:
			Open the map in a spreadsheet editor
			Open the "core map" /mappings/Veg+-VegBIEN.csv
			In each row of the via map, set the right column to a value from the
				left column of the core map
			Save
		Regenerate the derived maps: make inputs/<datasrc>/
	Accept the test cases:
		For a new-style datasource:
			inputs/<datasrc>/run
			svn di inputs/<datasrc>/*/test.xml.ref
			If you get errors, follow the steps for old-style datasources below
		For an old-style datasource:
			make inputs/<datasrc>/test
			When prompted to "Accept new test output", enter y and press ENTER
			If you instead get errors, do one of the following for each one:
			-	If the error was due to a bug, fix it
			-	Add a SQL function that filters or transforms the invalid data
			-	Make an empty mapping for the columns that produced the error.
				Put something in the Comments column of the map spreadsheet to
				prevent the automatic mapper from auto-removing the mapping.
			When accepting tests, it's helpful to use WinMerge
				(see WinMerge setup below for configuration)
		make inputs/<datasrc>/test by_col=1
			If you get errors this time, this always indicates a bug, usually in
				the VegBIEN unique constraints or column-based import itself
	Add newly-created files: make inputs/<datasrc>/add
	Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
	Update vegbiendev:
		ssh aaronmk@jupiter.nceas.ucsb.edu
			up
		On local machine:
			./fix_perms
			make inputs/upload
			make inputs/upload live=1
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
			up
			make inputs/download
			make inputs/download live=1
			Follow the steps under Install the staging tables above

Maintenance:
	on a live machine, you should put the following in your .profile:
--
# make svn files web-accessible. this does not affect unversioned files, because
# these get the right permissions on the local machine instead.
umask ug=rwx,o=rx

unset TMOUT # TMOUT causes screen to exit even with background processes
--
	if http://vegbiendev.nceas.ucsb.edu/phppgadmin/ goes down:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
			make phppgadmin-Linux
	to synchronize vegbiendev, jupiter, and your local machine:
		**WARNING**: pay careful attention to all files that will be deleted or
			overwritten!
		install put if needed:
			download https://uutils.googlecode.com/svn/trunk/bin/put to ~/bin/ and `chmod +x` it
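			a sketch of these steps from the shell (assumes curl is
				installed):
				mkdir -p ~/bin
				curl -o ~/bin/put https://uutils.googlecode.com/svn/trunk/bin/put
				chmod +x ~/bin/put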
		when changes are made on vegbiendev:
			avoid extraneous diffs when rsyncing:
				on all machines:
				up
				./fix_perms
			ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
				upload:
				overwrite=1 bin/sync_upload --size-only
					then rerun with l=1 ...
			on your machine:
				download:
				overwrite=1 swap=1 src=. dest='aaronmk@jupiter.nceas.ucsb.edu:~/bien' put --exclude=.svn inputs/VegBIEN/TWiki
					then rerun with l=1 ...
				swap=1 bin/sync_upload backups/TNRS.backup
					then rerun with l=1 ...
				overwrite=1 swap=1 bin/sync_upload --size-only
					then rerun with l=1 ...
				overwrite=1 sync_remote_url=~/Dropbox/svn/ bin/sync_upload --existing --size-only # just update mtimes/perms
					then rerun with l=1 ...
	to back up e-mails:
		on local machine:
		/Applications/gmvault-v1.8.1-beta/bin/gmvault sync --multiple-db-owner --type quick aaronmk.nceas@gmail.com
		/Applications/gmvault-v1.8.1-beta/bin/gmvault sync --multiple-db-owner --type quick aaronmk@nceas.ucsb.edu
		open Thunderbird
		click the All Mail folder for each account and wait for it to download the e-mails in it
	to back up the version history:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
		svnsync sync file:///home/aaronmk/Dropbox/docs/BIEN/svn_repo/ # initial runtime: 1.5 h ("08:21:38" - "06:45:26") @vegbiendev
		(cd ~/Dropbox/docs/BIEN/git/; git svn fetch)
		overwrite=1        src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:~' put Dropbox/docs/BIEN/svn_repo/ # runtime: 1 min ("1:05.08")
			then rerun with l=1 ...
		overwrite=1        src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:~' put Dropbox/docs/BIEN/git/
			then rerun with l=1 ...
		on local machine:
		overwrite=1 swap=1 src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:~' put Dropbox/docs/BIEN/svn_repo/ # runtime: 30 s ("36.19")
			then rerun with l=1 ...
		overwrite=1 swap=1 src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:~' put Dropbox/docs/BIEN/git/
			then rerun with l=1 ...
	to synchronize a Mac's settings with my testing machine's:
		download:
			**WARNING**: this will overwrite all your user's settings!
			on your machine:
			overwrite=1 swap=1 sync_local_dir=~/Library/ sync_remote_subdir=Library/ bin/sync_upload --exclude="/Saved Application State"
				then rerun with l=1 ...
		upload:
			do the step under "when changes are made on vegbiendev" > "on your
				machine" > download above
			ssh aaronmk@jupiter.nceas.ucsb.edu
				(cd ~/Dropbox/svn/; up)
			on your machine:
				rm ~/'Library/Thunderbird/Profiles/9oo8rcyn.default/ImapMail/imap.googlemail.com/[Gmail].sbd/Spam'
					# remove the downloaded Spam folder, because spam e-mails often contain viruses that would trigger clamscan
				overwrite=1 del= sync_local_dir=~/Dropbox/svn/ sync_remote_subdir=Dropbox/svn/ bin/sync_upload --size-only # just update mtimes
					then rerun with l=1 ...
				overwrite=1      sync_local_dir=~              sync_remote_subdir=             bin/sync_upload --exclude="/Library/Saved Application State" --exclude="/.Trash" --exclude="/bin" --exclude="/bin/pg_ctl" --exclude="/bin/unzip" --exclude="/Dropbox/home" --exclude="/.profile" --exclude="/.shrc" --exclude="/.bashrc"
					then rerun with l=1 ...
				overwrite=1      sync_local_dir=~              sync_remote_url=~/Dropbox/home  bin/sync_upload --exclude="/Library/Saved Application State" --exclude="/.Trash" --exclude="/.dropbox" --exclude="/Documents/BIEN" --exclude="/Dropbox" --exclude="/software" --exclude="/VirtualBox VMs/**.sav" --exclude="/VirtualBox VMs/**.vdi" --exclude="/VirtualBox VMs/**.vmdk"
					then rerun with l=1 ...
		upload just the VirtualBox VMs:
			on your machine:
				overwrite=1      sync_local_dir=~              sync_remote_subdir=             bin/sync_upload ~/"VirtualBox VMs/**"
					then rerun with l=1 ...
	to back up files not in Time Machine:
		On local machine:
		overwrite=1 src=/ dest=/Volumes/Time\ Machine\ Backups/ sudo -E put Library/PostgreSQL/9.3/data/
			then rerun with l=1 ...
		pg_ctl. stop # stop the PostgreSQL server
		overwrite=1 src=/ dest=/Volumes/Time\ Machine\ Backups/ sudo -E put Library/PostgreSQL/9.3/data/
			then rerun with l=1 ...
		pg_ctl. start # start the PostgreSQL server
	VegCore data dictionary:
		Regularly, or whenever the VegCore data dictionary page
			(https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore)
			is changed, regenerate mappings/VegCore.csv:
			On local machine:
			make mappings/VegCore.htm-remake; make mappings/
			apply new data dict mappings to datasource mappings/staging tables:
				inputs/run postprocess # runtime: see inputs/run
				time yes|make inputs/{NVS,SALVIAS,TEAM}/test # old-style import; runtime: 1 min ("0m59.692s") @starscream
			svn di mappings/VegCore.tables.redmine
			If there are changes, update the data dictionary's Tables section
			When moving terms, check that no terms were lost: svn di
			svn ci -m 'mappings/VegCore.htm: regenerated from wiki'
			ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
				perform the steps under "apply new data dict mappings to
					datasource mappings/staging tables" above
	Important: Whenever you install a system update that affects PostgreSQL or
		any of its dependencies, such as libc, you should restart the PostgreSQL
		server. Otherwise, you may get strange errors like "the database system
		is in recovery mode" which go away upon reimport, or you may not be able
		to access the database as the postgres superuser. This applies to both
		Linux and Mac OS X.

Backups:
	Archived imports:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
		Back up: make backups/<version>.backup &
			Note: To back up the last import, you must archive it first:
				make schemas/rotate
		Test: make -s backups/<version>.backup/test &
		Restore: make backups/<version>.backup/restore &
		Remove: make backups/<version>.backup/remove
		Download: make backups/<version>.backup/download
	TNRS cache:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
		Back up: make backups/TNRS.backup-remake &
			runtime: 3 min ("real 2m48.859s")
		Restore:
			yes|make inputs/.TNRS/uninstall
			make backups/TNRS.backup/restore &
				runtime: 5.5 min ("real 5m35.829s")
			yes|make schemas/public/reinstall
				Must come after TNRS restore to recreate tnrs_input_name view
	Full DB:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
		Back up: make backups/vegbien.<version>.backup &
		Test: make -s backups/vegbien.<version>.backup/test &
		Restore: make backups/vegbien.<version>.backup/restore &
		Download: make backups/vegbien.<version>.backup/download
	Import logs:
		On local machine:
		Download: make inputs/download-logs live=1

Datasource refreshing:
	VegBank:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
		make inputs/VegBank/vegbank.sql-remake
		make inputs/VegBank/reinstall quiet=1 &

Schema changes:
	On local machine:
	When changing the analytical views, run sync_analytical_..._to_view()
		to update the corresponding table
	Remember to update the following files with any renamings:
		schemas/filter_ERD.csv
		mappings/VegCore-VegBIEN.csv
		mappings/verify.*.sql
	Regenerate schema from installed DB: make schemas/remake
	Reinstall DB from schema: make schemas/public/reinstall schemas/reinstall
		**WARNING**: This will delete the public schema of your VegBIEN DB!
	If needed, reinstall staging tables:
		On local machine:
			sudo -E -u postgres psql <<<'ALTER DATABASE vegbien RENAME TO vegbien_prev'
			make db
			. bin/reinstall_all
			Fix any bugs and retry until no errors
			make schemas/public/install
				This must be run *after* the datasources are installed, because
				views in public depend on some of the datasources
			sudo -E -u postgres psql <<<'DROP DATABASE vegbien_prev'
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
			repeat the above steps
			**WARNING**: Do not run this until reinstall_all runs successfully
			on the local machine, or the live DB may be unrestorable!
	update mappings and staging table column names:
		on local machine:
			inputs/run postprocess # runtime: see inputs/run
			time yes|make inputs/{NVS,SALVIAS,TEAM}/test # old-style import; runtime: 1 min ("0m59.692s") @starscream
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
			manually apply schema changes to the live public schema
			do steps under "on local machine" above
	Sync ERD with vegbien.sql schema:
		Run make schemas/vegbien.my.sql
		Open schemas/vegbien.ERD.mwb in MySQLWorkbench
		Go to File > Export > Synchronize With SQL CREATE Script...
		For Input File, select schemas/vegbien.my.sql
		Click Continue
		In the changes list, select each table with an arrow next to it
		Click Update Model
		Click Continue
		Note: The generated SQL script will be empty because we are syncing in
			the opposite direction
		Click Execute
		Reposition any lines that have been reset
		Add any new tables by dragging them from the Catalog in the left sidebar
			to the diagram
		Remove any deleted tables by right-clicking the table's diagram element,
			selecting Delete '<table name>', and clicking Delete
		Save
		If desired, update the graphical ERD exports (see below)
	Update graphical ERD exports:
		Go to File > Export > Export as PNG...
		Select schemas/vegbien.ERD.png and click Save
		Go to File > Export > Export as SVG...
		Select schemas/vegbien.ERD.svg and click Save
		Go to File > Export > Export as Single Page PDF...
		Select schemas/vegbien.ERD.1_pg.pdf and click Save
		Go to File > Print...
		In the lower left corner, click PDF > Save as PDF...
		Set the Title and Author to ""
		Select schemas/vegbien.ERD.pdf and click Save
		Commit: svn ci -m "schemas/vegbien.ERD.mwb: Regenerated exports"
	Refactoring tips:
		To rename a table:
			In vegbien.sql, do the following:
				Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
					This is necessary because the table name is *everywhere*
				Search for <new>
				Manually change back any replacements inside comments
		To rename a column:
			Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
			Recreate any foreign key for the column, removing CONSTRAINT <name>
				This resets the foreign key name using the new column name
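			a sketch with illustrative names (renaming specimen.collector_id
				to collector_code, with a foreign key to collector; the
				constraint name follows PostgreSQL's default naming):
				ALTER TABLE specimen RENAME collector_id TO collector_code;
				ALTER TABLE specimen DROP CONSTRAINT specimen_collector_id_fkey;
				ALTER TABLE specimen ADD FOREIGN KEY (collector_code)
					REFERENCES collector (collector_code);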
	Creating a poster of the ERD:
		Determine the poster size:
			Measure the line height (from the bottom of one line to the bottom
				of another): 16.3cm/24 lines = 0.679cm
			Measure the height of the ERD: 35.4cm*2 = 70.8cm
			Zoom in as far as possible
			Measure the height of a capital letter: 3.5mm
			Measure the line height: 8.5mm
			Calculate the text's fraction of the line height: 3.5mm/8.5mm = 0.41
			Calculate the text height: 0.679cm*0.41 = 0.28cm
			Calculate the text height's fraction of the ERD height:
				0.28cm/70.8cm = 0.0040
			Measure the text height on the *VegBank* ERD poster: 5.5mm = 0.55cm
			Calculate the VegBIEN poster height to make the text the same size:
				0.55cm/0.0040 = 137.5cm H; *1in/2.54cm = 54.1in H
			The ERD aspect ratio is 11in W x (2*8.5in H) = 11x17 portrait
			Calculate the VegBIEN poster width: 54.1in H*11W/17H = 35.0in W
			The minimum VegBIEN poster size is 35x54in portrait
		Determine the cost:
			The FedEx Kinkos near NCEAS (1030 State St, Santa Barbara, CA 93101)
				charges the following for posters:
				base: $7.25/sq ft
				lamination: $3/sq ft
				mounting on a board: $8/sq ft

Testing:
	On a development machine, you should put the following in your .profile:
		umask ug=rwx,o= # prevent files from becoming web-accessible
		export log= n=2
	For development machine specs, see /planning/resources/dev_machine.specs/
	On local machine:
	Mapping process: make test
		Including column-based import: make test by_col=1
			If the row-based and column-based imports produce different inserted
			row counts, this usually means that a table is underconstrained
			(the unique indexes don't cover all possible rows).
			This can occur if you didn't use COALESCE(field, null_value) around
			a nullable field in a unique index. See sql_gen.null_sentinels for
			the appropriate null value to use, and the sketch below.
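			a sketch of such an index in SQL, with illustrative table/column
				names and sentinel value (the real sentinels are defined in
				sql_gen.null_sentinels):
				CREATE UNIQUE INDEX taxon_unique ON taxon
					(COALESCE(rank, '\N'), COALESCE(taxon_name, '\N'));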
	Map spreadsheet generation: make remake
	Missing mappings: make missing_mappings
	Everything (for most complete coverage): make test-all

Debugging:
	"Binary chop" debugging:
		(This is primarily useful for regressions that occurred in a previous
		revision, which was committed without running all the tests)
		up -r <rev>; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
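		worked example of one bisection step, assuming the test passed at
			r10000 and fails at r10100 (revision numbers are illustrative):
			up -r 10050; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
			if the test passes, the regression is in r10050..r10100;
				otherwise it is in r10000..r10050. repeat, halving the range.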
	.htaccess:
		mod_rewrite:
			**IMPORTANT**: whenever you change the DirectorySlash setting for a
				directory, you *must* clear your browser's cache to ensure that
				a cached redirect is not used. this is because RewriteRule
				redirects are (by default) temporary, but DirectorySlash
				redirects are permanent.
				for Firefox:
					press Cmd+Shift+Delete
					check only Cache
					press Enter or click Clear Now

WinMerge setup:
	In a Windows VM:
	Install WinMerge from <http://winmerge.org/>
	Open WinMerge
	Go to Edit > Options and click Compare in the left sidebar
	Enable "Moved block detection", as described at
		<http://manual.winmerge.org/Configuration.html#d0e5892>.
	Set Whitespace to Ignore change, as described at
		<http://manual.winmerge.org/Configuration.html#d0e5758>.

Documentation:
	To generate a Redmine-formatted list of steps for column-based import:
		On local machine:
		make schemas/public/reinstall
		make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
	To import and scrub just the test taxonomic names:
		ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
		inputs/test_taxonomic_names/test_scrub

General:
	To see a program's description, read its top-of-file comment
	To see a program's usage, run it without arguments
	To remake a directory: make <dir>/remake
	To remake a file: make <file>-remake