Installation:
	Check out svn: svn co https://code.nceas.ucsb.edu/code/projects/bien
	cd bien/
	Install: make install
		WARNING: This will delete the current public schema of your VegBIEN DB!
	Uninstall: make uninstall
		WARNING: This will delete your entire VegBIEN DB!
		This includes all archived imports and staging tables.

    
Maintenance:
	To synchronize vegbiendev, jupiter, and your local machine:
		Install put if needed:
			Download https://uutils.googlecode.com/svn/trunk/bin/put to ~/bin/ and `chmod +x` it
		When changes are made on vegbiendev:
			On vegbiendev, upload:
				env overwrite=1             src=. dest='aaronmk@jupiter:~/bien' put --exclude=.svn --exclude=install.log.sql --exclude='*.backup*' --exclude='/backups/analytical_aggregate.*.csv' --exclude='inputs/GBIF/**.data.sql' --exclude='bin/dotlockfile'
					Then rerun with env l=1 ...
				env overwrite=1 del=        src=. dest='aaronmk@jupiter:~/bien' put --exclude=.svn --exclude=install.log.sql --exclude='inputs/GBIF/**.data.sql'
					Then rerun with env l=1 ...
			On your machine, download:
				env overwrite=1 del= swap=1 src=. dest='aaronmk@jupiter:~/bien' put --exclude=.svn --exclude=install.log.sql --exclude='*.backup' --exclude='/backups/analytical_aggregate.*.csv' --exclude='inputs/GBIF/**.data.sql' --exclude='bin/dotlockfile'
					Then rerun with env l=1 ...
	To synchronize a Mac's settings with my testing machine's:
		Download:
			WARNING: This will overwrite all your user's settings!
			env overwrite=1 swap=1 src=~/Library/ dest='aaronmk@jupiter:~/Library/' put --exclude="/Saved Application State/**"
				Then rerun with env l=1 ...
		Upload:
			env overwrite=1        src=~/Library/ dest='aaronmk@jupiter:~/Library/' put --exclude="/Saved Application State/**"
				Then rerun with env l=1 ...
	VegCore data dictionary:
		Regularly, or whenever the VegCore data dictionary page
			(https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore)
			is changed, regenerate mappings/VegCore.csv:
			make mappings/VegCore.htm-remake; make mappings/
			svn di mappings/VegCore.tables.redmine
			If there are changes, update the data dictionary's Tables section
			When moving terms, check that no terms were lost: svn di
			svn ci -m "mappings/VegCore.htm: Regenerated from wiki"
	Important: Whenever you install a system update that affects PostgreSQL or
		any of its dependencies, such as libc, you should restart the PostgreSQL
		server. Otherwise, you may get strange errors like "the database system
		is in recovery mode" which go away upon reimport, or you may not be able
		to access the database as the postgres superuser. This applies to both
		Linux and Mac OS X.

    
Single datasource import:
	(Re)import and scrub: make inputs/<datasrc>/reimport_scrub by_col=1
	(Re)import only: make inputs/<datasrc>/reimport by_col=1
	(Re)scrub: make inputs/<datasrc>/rescrub by_col=1
	Note that these commands also work if the datasource is not yet imported

    
Full database import:
	On jupiter: svn up
	On local machine:
		./fix_perms
		make inputs/upload
		make test by_col=1
			See note under Testing below
	On vegbiendev:
	Ensure there are no local modifications: svn st
	svn up
	make inputs/download
	For each newly-uploaded datasource above: make inputs/<datasrc>/reinstall
	Update the auxiliary schemas: make schemas/reinstall
		The public schema will be installed separately by the import process
	Delete all imports older than the last one so they won't bloat the full
		DB backup:
		make backups/vegbien.<version>.backup/remove
		To keep a previous import other than the public schema:
			export dump_opts='--exclude-schema=public --exclude-schema=<version>'
	Make sure there is at least 150GB of free disk space on /: df -h
		The import schema is 100GB, and may use additional space for temp tables
		To free up space, remove backups that have been archived on jupiter:
			List backups/ to view older backups
			Check their MD5 sums using the steps under "On jupiter" below
			Remove these backups
	unset version
	screen
	Press ENTER
	Start column-based import: . bin/import_all by_col=1
		To use row-based import: . bin/import_all
		To stop all running imports: . bin/stop_imports
		WARNING: Do NOT run import_all in the background, or the jobs it creates
			won't be owned by your shell.
		Note that import_all will take up to an hour to import the NCBI backbone
			and other metadata before returning control to the shell.
	Wait (overnight) for the import to finish
	To recover from a closed terminal window: screen -r
	When there are no more running jobs, exit the screen
	Get $version: echo $version
	Set $version in all vegbiendev terminals: export version=<version>
	Upload logs (run on vegbiendev): make inputs/upload
	On local machine: make inputs/download-logs
	In PostgreSQL:
		Check that the provider_count and source tables contain entries for all
			inputs
	tail inputs/{.,}*/*/logs/$version.log.sql
	In the output, search for "Command exited with non-zero status"
	For inputs that have this, fix the associated bug(s)
	If many inputs have errors, discard the current (partial) import:
		make schemas/$version/uninstall
	Otherwise, continue
	make schemas/$version/publish
	unset version
	backups/fix_perms
	make backups/upload
	On jupiter:
		cd /data/dev/aaronmk/bien/backups
		For each newly-archived backup:
			make -s <backup>.md5/test
			Check that "OK" is printed next to the filename
	On nimoy:
		cd /home/bien/svn/
		svn up
		export version=<version>
		make backups/analytical_stem.$version.csv/download
		In the bien_web DB:
			Create the analytical_stem_<version> table using its schema
				in schemas/vegbien.my.sql
		make -s backups/analytical_stem.$version.csv.md5/test
		Check that "OK" is printed next to the filename
		env table=analytical_stem_$version bin/publish_analytical_db \
			backups/analytical_stem.$version.csv
	If desired, record the import times in inputs/import.stats.xls:
		Open inputs/import.stats.xls
		If the rightmost import is within 5 columns of column IV:
			Copy the current tab to <leftmost-date>~<rightmost-date>
			Remove the previous imports from the current tab because they are
				now in the copied tab instead
		Insert a copy of the leftmost "By column" column group before it
		export version=<version>
		bin/import_date inputs/{.,}*/*/logs/$version.log.sql
		Update the import date in the upper-right corner
		bin/import_times inputs/{.,}*/*/logs/$version.log.sql
		Paste the output over the # Rows/Time columns, making sure that the
			row counts match up with the previous import's row counts
		If the row counts do not match up, insert or reorder rows as needed
			until they do. Get the datasource names from the log file footers:
			tail inputs/{.,}*/*/logs/$version.log.sql
		Commit: svn ci -m "inputs/import.stats.xls: Updated import times"
	To run TNRS: make scrub by_col=1 &
		export version=<version>
		To view progress:
			tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
	To remake analytical DB: bin/make_analytical_db &
		export version=<version>
		To view progress:
			tail -100 inputs/analytical_db/logs/make_analytical_db.log.sql
	To back up DB (staging tables and last import):
		export version=<version>
		If before renaming to public: export dump_opts=--exclude-schema=public
		make backups/vegbien.$version.backup/test &
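Two of the manual checks in the steps above (free disk space before the import, and the log search after it) can be scripted. The following is a minimal sketch, not part of the repository: the function names free_gb, enough_space, and failed_logs are hypothetical, and only the 150GB threshold and the "Command exited with non-zero status" marker come from the steps above.

```shell
# Hypothetical helpers (not part of the bien repository).

# free_gb DIR: print the free space on DIR's filesystem, in whole GB
free_gb() {
    # -P forces one POSIX-format line per filesystem; -k reports 1024-byte blocks
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# enough_space FREE_GB [NEEDED_GB]: succeed if FREE_GB >= NEEDED_GB (default 150)
enough_space() {
    [ "$1" -ge "${2:-150}" ]
}

# failed_logs FILE...: print the names of the logs containing the failure marker
failed_logs() {
    grep -l 'Command exited with non-zero status' "$@"
}
```

For example, run `enough_space "$(free_gb /)" || echo "free up space first"` before starting the import, and in bash run `failed_logs inputs/{.,}*/*/logs/$version.log.sql` afterward. The `{.,}*` pattern relies on bash brace expansion to include hidden datasource directories.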

    
Backups:
	Archived imports:
		Back up: make backups/<version>.backup &
			Note: To back up the last import, you must archive it first:
				make schemas/rotate
		Test: make -s backups/<version>.backup/test &
		Restore: make backups/<version>.backup/restore &
		Remove: make backups/<version>.backup/remove
		Download: make backups/download
	TNRS cache:
		Back up: make backups/TNRS.backup-remake &
		Restore:
			yes|make inputs/.TNRS/uninstall
			make backups/TNRS.backup/restore &
			yes|make schemas/public/reinstall
				Must come after TNRS restore to recreate tnrs_input_name view
	Full DB:
		Back up: make backups/vegbien.<version>.backup &
		Test: make -s backups/vegbien.<version>.backup/test &
		Restore: make backups/vegbien.<version>.backup/restore &
		Download: make backups/download
	Import logs:
		Download: make inputs/download-logs

    
Datasource setup:
	umask -S ug=rwx,o= # prevent files from becoming web-accessible
	Add a new datasource: make inputs/<datasrc>/add
		<datasrc> may not contain spaces, and should be abbreviated.
		If the datasource is a herbarium, <datasrc> should be the herbarium code
			as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
	For MySQL inputs (exports and live DB connections):
		For .sql exports:
			Place the original .sql file in _src/ (*not* in _MySQL/)
			Follow the steps starting with "Install the staging tables" below.
				This is for an initial sync to get the file onto vegbiendev.
			On vegbiendev:
				Create a database for the MySQL export in phpMyAdmin
				bin/mysql_bien database <inputs/<datasrc>/_src/export.sql &
		mkdir inputs/<datasrc>/_MySQL/
		cp -p lib/MySQL.{data,schema}.sql.make inputs/<datasrc>/_MySQL/
		Edit _MySQL/*.make for the DB connection
			For a .sql export, use server=vegbiendev and --user=bien
		Skip the "Add input data for each table" section
	For MS Access databases:
		Place the .mdb or .accdb file in _src/
		Download and install Access To PostgreSQL from
			http://www.bullzip.com/download.php
		Use Access To PostgreSQL to export the database:
			Export just the tables/indexes to inputs/<datasrc>/<file>.schema.sql
			Export just the data to inputs/<datasrc>/<file>.data.sql
		In <file>.schema.sql, make the following changes:
			Replace text "BOOLEAN" with "/*BOOLEAN*/INTEGER"
			Replace text "DOUBLE PRECISION NULL" with "DOUBLE PRECISION"
		Skip the "Add input data for each table" section
	Add input data for each table present in the datasource:
		For .sql exports, you must use the name of the table in the DB export
		For CSV files, you can use any name. It's recommended to use a table
			name from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>
		Note that if this table will be joined together with another table, its
			name must end in ".src"
		make inputs/<datasrc>/<table>/add
			Important: DO NOT just create an empty directory named <table>!
				This command also creates necessary subdirs, such as logs/.
		If the table is in a .sql export: make inputs/<datasrc>/<table>/install
			Otherwise, place the CSV(s) for the table in
			inputs/<datasrc>/<table>/ OR place a query joining other tables
			together in inputs/<datasrc>/<table>/create.sql
		Important: When exporting relational databases to CSVs, you MUST ensure
			that embedded quotes are escaped by doubling them, *not* by
			preceding them with a "\" as is the default in phpMyAdmin
		If there are multiple part files for a table, and the header is repeated
			in each part, make sure each header is EXACTLY the same.
			(If the headers are not the same, the CSV concatenation script
			assumes the part files don't have individual headers and treats the
			subsequent headers as data rows.)
		Add <table> to inputs/<datasrc>/import_order.txt before other tables
			that depend on it
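The quote-escaping requirement above can be spot-checked before placing a CSV in the table directory. This is a heuristic sketch only: the function name backslash_quoted_lines is hypothetical, and a correctly escaped field whose content ends in a backslash is also reported, so treat hits as lines to inspect rather than as definite errors.

```shell
# Hypothetical helper (not part of the bien repository).

# backslash_quoted_lines FILE: print "lineno:line" for each line that contains
# a backslash followed by a double quote, i.e. a likely \" escape
backslash_quoted_lines() {
    # -F matches the two characters \" literally; -n prefixes the line number
    grep -n -F '\"' "$1"
}
```

A line such as `1,"say ""hi"""` (doubled quotes) passes silently, while `2,"say \"hi\""` (phpMyAdmin's default escaping) is printed with its line number.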
	Install the staging tables:
		make inputs/<datasrc>/reinstall quiet=1 &
		For a MySQL .sql export:
			At prompt "[you]@vegbiendev's password:", enter your password
			At prompt "Enter password:", enter the value in config/bien_password
		To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
		View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
			tail provides a header line with the filename
			+1 starts at the first line, to show the whole file
		For every file with an error 'column "..." specified more than once':
			Add a header override file "+header.<ext>" in <table>/:
				Note: The leading "+" should sort it before the flat files.
					"_" unfortunately sorts *after* capital letters in ASCII.
				Create a text file containing the header line of the flat files
				Add an ! at the beginning of the line
					This signals cat_csv that this is a header override.
				For empty names, use their 0-based column # (by convention)
				For duplicate names, add a distinguishing suffix
				For long names that collided, rename them to <= 63 chars long
				Do NOT make readability changes in this step; that is what the
					map spreadsheets (below) are for.
				Save
		If you made any changes, re-run the install command above
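The header override steps above can be started from the command line. This is a minimal sketch under stated assumptions: the function names header_override and duplicate_columns are hypothetical, and the duplicate check splits on every comma, so it assumes the header contains no quoted commas.

```shell
# Hypothetical helpers (not part of the bien repository).

# header_override FLAT_FILE: print FLAT_FILE's header line with a leading "!"
header_override() {
    printf '!'
    head -n 1 "$1"
}

# duplicate_columns FLAT_FILE: print each column name that occurs more than once
duplicate_columns() {
    head -n 1 "$1" | tr ',' '\n' | LC_ALL=C sort | uniq -d
}
```

Redirect the output of header_override to "+header.<ext>" in the <table>/ directory, then edit the names that duplicate_columns reports. The empty-name and 63-character rules still have to be applied by hand.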
	Auto-create the map spreadsheets: make inputs/<datasrc>/
	Map each table's columns:
		In each <table>/ subdir, for each "via map" map.csv:
			Open the map in a spreadsheet editor
			Open the "core map" /mappings/Veg+-VegBIEN.csv
			In each row of the via map, set the right column to a value from the
				left column of the core map
			Save
		Regenerate the derived maps: make inputs/<datasrc>/
	Accept the test cases:
		make inputs/<datasrc>/test
			When prompted to "Accept new test output", enter y and press ENTER
			If you instead get errors, do one of the following for each one:
			-	If the error was due to a bug, fix it
			-	Add a SQL function that filters or transforms the invalid data
			-	Make an empty mapping for the columns that produced the error.
				Put something in the Comments column of the map spreadsheet to
				prevent the automatic mapper from auto-removing the mapping.
			When accepting tests, it's helpful to use WinMerge
				(see WinMerge setup below for configuration)
		make inputs/<datasrc>/test by_col=1
			If you get errors this time, this always indicates a bug, usually in
				the VegBIEN unique constraints or column-based import itself
	Add newly-created files: make inputs/<datasrc>/add
	Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
	Update vegbiendev:
		On jupiter: svn up
		On local machine:
			./fix_perms
			make inputs/upload
		On vegbiendev:
			svn up
			make inputs/download
			Follow the steps under "Install the staging tables" above

    
Datasource refreshing:
	VegBank:
		make inputs/VegBank/vegbank.sql-remake
		make inputs/VegBank/reinstall quiet=1 &

    
Schema changes:
	When changing the analytical views, run sync_analytical_..._to_view()
		to update the corresponding table
	Remember to update the following files with any renamings:
		schemas/filter_ERD.csv
		mappings/VegCore-VegBIEN.csv
		mappings/verify.*.sql
	Regenerate schema from installed DB: make schemas/remake
	Reinstall DB from schema: make schemas/public/reinstall schemas/reinstall
		WARNING: This will delete the current public schema of your VegBIEN DB!
	Reinstall staging tables: . bin/reinstall_all
	Sync ERD with vegbien.sql schema:
		Run make schemas/vegbien.my.sql
		Open schemas/vegbien.ERD.mwb in MySQLWorkbench
		Go to File > Export > Synchronize With SQL CREATE Script...
		For Input File, select schemas/vegbien.my.sql
		Click Continue
		In the changes list, select each table with an arrow next to it
		Click Update Model
		Click Continue
		Note: The generated SQL script will be empty because we are syncing in
			the opposite direction
		Click Execute
		Reposition any lines that have been reset
		Add any new tables by dragging them from the Catalog in the left sidebar
			to the diagram
		Remove any deleted tables by right-clicking the table's diagram element,
			selecting Delete '<table name>', and clicking Delete
		Save
		If desired, update the graphical ERD exports (see below)
	Update graphical ERD exports:
		Go to File > Export > Export as PNG...
		Select schemas/vegbien.ERD.png and click Save
		Go to File > Export > Export as SVG...
		Select schemas/vegbien.ERD.svg and click Save
		Go to File > Export > Export as Single Page PDF...
		Select schemas/vegbien.ERD.1_pg.pdf and click Save
		Go to File > Print...
		In the lower left corner, click PDF > Save as PDF...
		Set the Title and Author to ""
		Select schemas/vegbien.ERD.pdf and click Save
		Commit: svn ci -m "schemas/vegbien.ERD.mwb: Regenerated exports"
	Refactoring tips:
		To rename a table:
			In vegbien.sql, do the following:
				Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
					This is necessary because the table name is *everywhere*
				Search for <new>
				Manually change back any replacements inside comments
		To rename a column:
			Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
			Recreate any foreign key for the column, removing CONSTRAINT <name>
				This resets the foreign key name using the new column name
	Creating a poster of the ERD:
		Determine the poster size:
			Measure the line height (from the bottom of one line to the bottom
				of another): 16.3cm/24 lines = 0.679cm
			Measure the height of the ERD: 35.4cm*2 = 70.8cm
			Zoom in as far as possible
			Measure the height of a capital letter: 3.5mm
			Measure the line height: 8.5mm
			Calculate the text's fraction of the line height: 3.5mm/8.5mm = 0.41
			Calculate the text height: 0.679cm*0.41 = 0.28cm
			Calculate the text height's fraction of the ERD height:
				0.28cm/70.8cm = 0.0040
			Measure the text height on the *VegBank* ERD poster: 5.5mm = 0.55cm
			Calculate the VegBIEN poster height to make the text the same size:
				0.55cm/0.0040 = 137.5cm H; *1in/2.54cm = 54.1in H
			The ERD aspect ratio is 11 in W x (2*8.5in H) = 11x17 portrait
			Calculate the VegBIEN poster width: 54.1in H*11W/17H = 35.0in W
			The minimum VegBIEN poster size is 35x54in portrait
		Determine the cost:
			The FedEx Kinko's near NCEAS (1030 State St, Santa Barbara, CA 93101)
				charges the following for posters:
				base: $7.25/sq ft
				lamination: $3/sq ft
				mounting on a board: $8/sq ft
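The last steps of the poster size calculation above can be rechecked with awk. This is a sketch only: the function name poster_size is hypothetical, and it takes the already-rounded intermediate values from the steps above (0.0040 and 0.55cm) as inputs rather than recomputing them from the raw measurements.

```shell
# Hypothetical helper (not part of the bien repository).

# poster_size TEXT_FRACTION TARGET_TEXT_CM: print the poster height and width
# in inches for an 11x17 portrait ERD
poster_size() {
    awk -v frac="$1" -v text_cm="$2" 'BEGIN {
        height_in = text_cm / frac / 2.54   # cm -> in
        width_in = height_in * 11 / 17      # 11 W x 17 H aspect ratio
        printf "%.1f in H x %.1f in W\n", height_in, width_in
    }'
}
```

Running `poster_size 0.0040 0.55` prints `54.1 in H x 35.0 in W`, matching the figures above.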

    
Testing:
	On a development machine, you should put the following in your .profile:
		umask -S ug=rwx,o= # prevent files from becoming web-accessible
		export log= n=2
	Mapping process: make test
		Including column-based import: make test by_col=1
			If the row-based and column-based imports produce different inserted
			row counts, this usually means that a table is underconstrained
			(the unique indexes don't cover all possible rows).
			This can occur if you didn't use COALESCE(field, null_value) around
			a nullable field in a unique index. See sql_gen.null_sentinels for
			the appropriate null value to use.
	Map spreadsheet generation: make remake
	Missing mappings: make missing_mappings
	Everything (for most complete coverage): make test-all
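To see what the umask setting in the .profile lines above does, the following sketch creates a file and a directory under a given mask and prints their permission strings. The function name perms_under_umask is hypothetical. The octal mask 007 used in the example is the numeric equivalent of ug=rwx,o=.

```shell
# Hypothetical helper (not part of the bien repository).

# perms_under_umask MASK: print the permissions that a newly created file and
# directory receive under MASK. The body runs in a subshell, so the caller's
# own umask is left unchanged.
perms_under_umask() (
    umask "$1"
    tmp=$(mktemp -d) || exit 1
    touch "$tmp/file"
    mkdir "$tmp/dir"
    ls -ld "$tmp/file" | cut -c1-10
    ls -ld "$tmp/dir" | cut -c1-10
    rm -r "$tmp"
)
```

Running `perms_under_umask 007` prints `-rw-rw----` and then `drwxrwx---`: the "other" class, which includes the web server user on a typical setup, gets no access.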

    
Debugging:
	"Binary chop" debugging:
		(This is primarily useful for regressions that occurred in a previous
		revision, which was committed without running all the tests)
		svn up -r <rev>; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
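The binary chop itself (choosing which <rev> to try next) can be automated. This is a minimal sketch, not a script from the repository: the function name first_bad_rev is hypothetical, and it assumes that GOOD is a revision where the test passes, BAD is a later one where it fails, and the test command exits 0 on success.

```shell
# Hypothetical helper (not part of the bien repository).

# first_bad_rev GOOD BAD TEST_CMD [ARGS...]: print the first revision in
# (GOOD, BAD] for which "TEST_CMD ARGS... REV" fails
first_bad_rev() (
    good=$1 bad=$2
    shift 2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        # Send the test's own output to stderr so stdout carries only the answer
        if "$@" "$mid" >&2; then
            good=$mid
        else
            bad=$mid
        fi
    done
    echo "$bad"
)
```

To use it, wrap the command line above in a function that takes the revision as its argument and pass that function's name as TEST_CMD. Each iteration halves the range, so about log2(BAD - GOOD) reinstalls are needed.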

    
WinMerge setup:
	Install WinMerge from <http://winmerge.org/>
	Open WinMerge
	Go to Edit > Options and click Compare in the left sidebar
	Enable "Moved block detection", as described at
		<http://manual.winmerge.org/Configuration.html#d0e5892>.
	Set Whitespace to Ignore change, as described at
		<http://manual.winmerge.org/Configuration.html#d0e5758>.

    
Documentation:
	To generate a Redmine-formatted list of steps for column-based import:
		make schemas/public/reinstall
		make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
	To import and scrub just the test taxonomic names:
		inputs/test_taxonomic_names/test_scrub

    
General:
	To see a program's description, read its top-of-file comment
	To see a program's usage, run it without arguments
	To remake a directory: make <dir>/remake
	To remake a file: make <file>-remake