Installation:
	Check out svn: svn co https://code.nceas.ucsb.edu/code/projects/bien
	cd bien/
	Install: make install
		WARNING: This will delete the current public schema of your VegBIEN DB!
	Uninstall: make uninstall
		WARNING: This will delete your entire VegBIEN DB!
		This includes all archived imports and staging tables.

Maintenance:
	to synchronize vegbiendev, jupiter, and your local machine:
		install put if needed:
			download https://uutils.googlecode.com/svn/trunk/bin/put to ~/bin/ and `chmod +x` it
		when changes are made on vegbiendev:
			on vegbiendev, upload:
				env overwrite=1             src=. dest='aaronmk@jupiter:~/bien' put --exclude=.svn --exclude=install.log.sql --exclude='*.backup*' --exclude='/backups/analytical_aggregate.*.csv' --exclude='inputs/GBIF/**.data.sql' --exclude='bin/dotlockfile'
					then rerun with env l=1 ...
				env overwrite=1 del=        src=. dest='aaronmk@jupiter:~/bien' put --exclude=.svn --exclude=install.log.sql --exclude='inputs/GBIF/**.data.sql'
					then rerun with env l=1 ...
			on your machine, download:
				env overwrite=1 del= swap=1 src=. dest='aaronmk@jupiter:~/bien' put --exclude=.svn --exclude=install.log.sql --exclude='*.backup' --exclude='/backups/analytical_aggregate.*.csv' --exclude='inputs/GBIF/**.data.sql' --exclude='bin/dotlockfile'
					then rerun with env l=1 ...
	to synchronize a Mac's settings with my testing machine's:
		download:
			WARNING: this will overwrite all your user's settings!
			env overwrite=1 swap=1 src=~/Library/ dest='aaronmk@jupiter:~/Library/' put --exclude="/Saved Application State/**"
				then rerun with env l=1 ...
		upload:
			env overwrite=1        src=~/Library/ dest='aaronmk@jupiter:~/Library/' put --exclude="/Saved Application State/**"
				then rerun with env l=1 ...
	VegCore data dictionary:
		Regularly, or whenever the VegCore data dictionary page
			(https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore)
			is changed, regenerate mappings/VegCore.csv:
			make mappings/VegCore.htm-remake; make mappings/
			svn di mappings/VegCore.tables.redmine
			If there are changes, update the data dictionary's Tables section
			When moving terms, check that no terms were lost: svn di
			svn ci -m "mappings/VegCore.htm: Regenerated from wiki"
	Important: Whenever you install a system update that affects PostgreSQL or
		any of its dependencies, such as libc, you should restart the PostgreSQL
		server. Otherwise, you may get strange errors like "the database system
		is in recovery mode" which go away upon reimport, or you may not be able
		to access the database as the postgres superuser. This applies to both
		Linux and Mac OS X.

Single datasource import:
	(Re)import and scrub: make inputs/<datasrc>/reimport_scrub by_col=1
	(Re)import only: make inputs/<datasrc>/reimport by_col=1
	(Re)scrub: make inputs/<datasrc>/rescrub by_col=1
	Note that these commands also work if the datasource is not yet imported

Full database import:
	On jupiter: svn up
	On local machine:
		./fix_perms
		make inputs/upload
		make test by_col=1
			See note under Testing below
	On vegbiendev:
	Ensure there are no local modifications: svn st
	svn up
	make inputs/download
	For each newly-uploaded datasource above: make inputs/<datasrc>/reinstall
	Update the auxiliary schemas: make schemas/reinstall
		The public schema will be installed separately by the import process
	Delete imports before the last so they won't bloat the full DB backup:
		make backups/vegbien.<version>.backup/remove
		To keep a previous import other than the public schema:
			export dump_opts='--exclude-schema=public --exclude-schema=<version>'
	Make sure there is at least 150GB of disk space on /: df -h
		The import schema is 100GB, and may use additional space for temp tables
		To free up space, remove backups that have been archived on jupiter:
			List backups/ to view older backups
			Check their MD5 sums using the steps under On jupiter below
			Remove these backups
	unset version
	screen
	Press ENTER
	Start column-based import: . bin/import_all by_col=1
		To use row-based import: . bin/import_all
		To stop all running imports: . bin/stop_imports
		WARNING: Do NOT run import_all in the background, or the jobs it creates
			won't be owned by your shell.
		Note that import_all will take up to an hour to import the NCBI backbone
			and other metadata before returning control to the shell.
	Wait (overnight) for the import to finish
	To recover from a closed terminal window: screen -r
	When there are no more running jobs, exit the screen
	Get $version: echo $version
	Set $version in all vegbiendev terminals: export version=<version>
	Upload logs (run on vegbiendev): make inputs/upload
	On local machine: make inputs/download-logs
	In PostgreSQL:
		Check that the provider_count and source tables contain entries for all
			inputs
	tail inputs/{.,}*/*/logs/$version.log.sql
	In the output, search for "Command exited with non-zero status"
	For inputs that have this, fix the associated bug(s)
	If many inputs have errors, discard the current (partial) import:
		make schemas/$version/uninstall
	Otherwise, continue
	make schemas/$version/publish
	unset version
	backups/fix_perms
	make backups/upload
	On jupiter:
		cd /data/dev/aaronmk/bien/backups
		For each newly-archived backup:
			make -s <backup>.md5/test
			Check that "OK" is printed next to the filename
	On nimoy:
		cd /home/bien/svn/
		svn up
		export version=<version>
		make backups/analytical_stem.$version.csv/download
		In the bien_web DB:
			Create the analytical_stem_<version> table using its schema
				in schemas/vegbien.my.sql
		make -s backups/analytical_stem.$version.csv.md5/test
		Check that "OK" is printed next to the filename
		env table=analytical_stem_$version bin/publish_analytical_db \
			backups/analytical_stem.$version.csv
	If desired, record the import times in inputs/import.stats.xls:
		Open inputs/import.stats.xls
		If the rightmost import is within 5 columns of column IV:
			Copy the current tab to <leftmost-date>~<rightmost-date>
			Remove the previous imports from the current tab because they are
				now in the copied tab instead
		Insert a copy of the leftmost "By column" column group before it
		export version=<version>
		bin/import_date inputs/{.,}*/*/logs/$version.log.sql
		Update the import date in the upper-right corner
		bin/import_times inputs/{.,}*/*/logs/$version.log.sql
		Paste the output over the # Rows/Time columns, making sure that the
			row counts match up with the previous import's row counts
		If the row counts do not match up, insert or reorder rows as needed
			until they do. Get the datasource names from the log file footers:
			tail inputs/{.,}*/*/logs/$version.log.sql
		Commit: svn ci -m "inputs/import.stats.xls: Updated import times"
	To run TNRS: make scrub by_col=1 &
		export version=<version>
		To view progress:
			tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
	To remake analytical DB: bin/make_analytical_db &
		export version=<version>
		To view progress:
			tail -100 inputs/analytical_db/logs/make_analytical_db.log.sql
	To back up DB (staging tables and last import):
		export version=<version>
		If before renaming to public: export dump_opts=--exclude-schema=public
		make backups/vegbien.$version.backup/test &
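
The log check in the steps above (tail the logs, then search for "Command exited with non-zero status") can also be done with grep, which lists only the logs that contain the marker. This is a minimal sketch, not part of the project's scripts; the function name failed_logs is hypothetical.

```shell
#!/bin/sh
# Print the names of the log files that contain the failure marker.
# failed_logs is a hypothetical helper name, not an existing project script.
# Usage: failed_logs <log file>...
failed_logs() {
    # -l prints only the names of files with at least one matching line
    grep -l "Command exited with non-zero status" "$@"
}
```

Assuming $version is set and the shell supports brace expansion (e.g. bash), it could be called as: failed_logs inputs/{.,}*/*/logs/$version.log.sql
Note that grep exits with a non-zero status when no log contains the marker.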

Backups:
	Archived imports:
		Back up: make backups/<version>.backup &
			Note: To back up the last import, you must archive it first:
				make schemas/rotate
		Test: make -s backups/<version>.backup/test &
		Restore: make backups/<version>.backup/restore &
		Remove: make backups/<version>.backup/remove
		Download: make backups/download
	TNRS cache:
		Back up: make backups/TNRS.backup-remake &
		Restore:
			yes|make inputs/.TNRS/uninstall
			make backups/TNRS.backup/restore &
			yes|make schemas/public/reinstall
				Must come after TNRS restore to recreate tnrs_input_name view
	Full DB:
		Back up: make backups/vegbien.<version>.backup &
		Test: make -s backups/vegbien.<version>.backup/test &
		Restore: make backups/vegbien.<version>.backup/restore &
		Download: make backups/download
	Import logs:
		Download: make inputs/download-logs

Datasource setup:
	Add a new datasource: make inputs/<datasrc>/add
		<datasrc> may not contain spaces, and should be abbreviated.
		If the datasource is a herbarium, <datasrc> should be the herbarium code
			as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
	For MySQL inputs (exports and live DB connections):
		For .sql exports:
			Place the original .sql file in _src/ (*not* in _MySQL/)
			Follow the steps starting with Install the staging tables below.
				This is for an initial sync to get the file onto vegbiendev.
			On vegbiendev:
				Create a database for the MySQL export in phpMyAdmin
				bin/mysql_bien database <inputs/<datasrc>/_src/export.sql &
		mkdir inputs/<datasrc>/_MySQL/
		cp -p lib/MySQL.{data,schema}.sql.make inputs/<datasrc>/_MySQL/
		Edit _MySQL/*.make for the DB connection
			For a .sql export, use server=vegbiendev and --user=bien
		Skip the Add input data for each table section
	For MS Access databases:
		Place the .mdb or .accdb file in _src/
		Download and install Access To PostgreSQL from
			http://www.bullzip.com/download.php
		Use Access To PostgreSQL to export the database:
			Export just the tables/indexes to inputs/<datasrc>/<file>.schema.sql
			Export just the data to inputs/<datasrc>/<file>.data.sql
		In <file>.schema.sql, make the following changes:
			Replace text "BOOLEAN" with "/*BOOLEAN*/INTEGER"
			Replace text "DOUBLE PRECISION NULL" with "DOUBLE PRECISION"
		Skip the Add input data for each table section
	Add input data for each table present in the datasource:
		For .sql exports, you must use the name of the table in the DB export
		For CSV files, you can use any name. It's recommended to use a table
			name from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>
		Note that if this table will be joined together with another table, its
			name must end in ".src"
		make inputs/<datasrc>/<table>/add
			Important: DO NOT just create an empty directory named <table>!
				This command also creates necessary subdirs, such as logs/.
		If the table is in a .sql export: make inputs/<datasrc>/<table>/install
			Otherwise, place the CSV(s) for the table in
			inputs/<datasrc>/<table>/ OR place a query joining other tables
			together in inputs/<datasrc>/<table>/create.sql
		Important: When exporting relational databases to CSVs, you MUST ensure
			that embedded quotes are escaped by doubling them, *not* by
			preceding them with a "\" as is the default in phpMyAdmin
		If there are multiple part files for a table, and the header is repeated
			in each part, make sure each header is EXACTLY the same.
			(If the headers are not the same, the CSV concatenation script
			assumes the part files don't have individual headers and treats the
			subsequent headers as data rows.)
		Add <table> to inputs/<datasrc>/import_order.txt before other tables
			that depend on it
	Install the staging tables:
		make inputs/<datasrc>/reinstall quiet=1 &
		For a MySQL .sql export:
			At prompt "[you]@vegbiendev's password:", enter your password
			At prompt "Enter password:", enter the value in config/bien_password
		To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
		View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
			tail provides a header line with the filename
			+1 starts at the first line, to show the whole file
		For every file with an error 'column "..." specified more than once':
			Add a header override file "+header.<ext>" in <table>/:
				Note: The leading "+" should sort it before the flat files.
					"_" unfortunately sorts *after* capital letters in ASCII.
				Create a text file containing the header line of the flat files
				Add an ! at the beginning of the line
					This signals cat_csv that this is a header override.
				For empty names, use their 0-based column # (by convention)
				For duplicate names, add a distinguishing suffix
				For long names that collided, rename them to <= 63 chars long
				Do NOT make readability changes in this step; that is what the
					map spreadsheets (below) are for.
				Save
		If you made any changes, re-run the install command above
	Auto-create the map spreadsheets: make inputs/<datasrc>/
	Map each table's columns:
		In each <table>/ subdir, for each "via map" map.csv:
			Open the map in a spreadsheet editor
			Open the "core map" /mappings/Veg+-VegBIEN.csv
			In each row of the via map, set the right column to a value from the
				left column of the core map
			Save
		Regenerate the derived maps: make inputs/<datasrc>/
	Accept the test cases:
		make inputs/<datasrc>/test
			When prompted to "Accept new test output", enter y and press ENTER
			If you instead get errors, do one of the following for each one:
			-	If the error was due to a bug, fix it
			-	Add a SQL function that filters or transforms the invalid data
			-	Make an empty mapping for the columns that produced the error.
				Put something in the Comments column of the map spreadsheet to
				prevent the automatic mapper from auto-removing the mapping.
			When accepting tests, it's helpful to use WinMerge
				(see WinMerge setup below for configuration)
		make inputs/<datasrc>/test by_col=1
			If you get errors this time, this always indicates a bug, usually in
				the VegBIEN unique constraints or column-based import itself
	Add newly-created files: make inputs/<datasrc>/add
	Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
	Update vegbiendev:
		On jupiter: svn up
		On local machine:
			./fix_perms
			make inputs/upload
		On vegbiendev:
			svn up
			make inputs/download
			Follow the steps under Install the staging tables above
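
The header override step under Install the staging tables (copy the header line of the flat files and put an ! in front of it) can be started from the shell before editing the column names by hand. This is a minimal sketch under assumptions: make_header_override is a hypothetical helper, and it assumes the header is the first line of the flat file.

```shell
#!/bin/sh
# Write a header override file from the first line of a flat file.
# make_header_override is a hypothetical helper, not a project script.
# Usage: make_header_override <flat file> <override file>
make_header_override() {
    # The leading "!" is what marks the line as a header override
    printf '!%s\n' "$(head -n 1 "$1")" > "$2"
}
```

For a CSV input the override file would be named "+header.csv" in <table>/. The empty, duplicate, and overlong column names still have to be fixed in that file afterwards, as described in the steps.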

Datasource refreshing:
	VegBank:
		make inputs/VegBank/vegbank.sql-remake
		make inputs/VegBank/reinstall quiet=1 &

Schema changes:
	When changing the analytical views, run sync_analytical_..._to_view()
		to update the corresponding table
	Remember to update the following files with any renamings:
		schemas/filter_ERD.csv
		mappings/VegCore-VegBIEN.csv
		mappings/verify.*.sql
	Regenerate schema from installed DB: make schemas/remake
	Reinstall DB from schema: make schemas/public/reinstall schemas/reinstall
		WARNING: This will delete the current public schema of your VegBIEN DB!
	Reinstall staging tables: . bin/reinstall_all
	Sync ERD with vegbien.sql schema:
		Run make schemas/vegbien.my.sql
		Open schemas/vegbien.ERD.mwb in MySQLWorkbench
		Go to File > Export > Synchronize With SQL CREATE Script...
		For Input File, select schemas/vegbien.my.sql
		Click Continue
		In the changes list, select each table with an arrow next to it
		Click Update Model
		Click Continue
		Note: The generated SQL script will be empty because we are syncing in
			the opposite direction
		Click Execute
		Reposition any lines that have been reset
		Add any new tables by dragging them from the Catalog in the left sidebar
			to the diagram
		Remove any deleted tables by right-clicking the table's diagram element,
			selecting Delete '<table name>', and clicking Delete
		Save
		If desired, update the graphical ERD exports (see below)
	Update graphical ERD exports:
		Go to File > Export > Export as PNG...
		Select schemas/vegbien.ERD.png and click Save
		Go to File > Export > Export as SVG...
		Select schemas/vegbien.ERD.svg and click Save
		Go to File > Export > Export as Single Page PDF...
		Select schemas/vegbien.ERD.1_pg.pdf and click Save
		Go to File > Print...
		In the lower left corner, click PDF > Save as PDF...
		Set the Title and Author to ""
		Select schemas/vegbien.ERD.pdf and click Save
		Commit: svn ci -m "schemas/vegbien.ERD.mwb: Regenerated exports"
	Refactoring tips:
		To rename a table:
			In vegbien.sql, do the following:
				Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
					This is necessary because the table name is *everywhere*
				Search for <new>
				Manually change back any replacements inside comments
		To rename a column:
			Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
			Recreate any foreign key for the column, removing CONSTRAINT <name>
				This resets the foreign key name using the new column name
	Creating a poster of the ERD:
		Determine the poster size:
			Measure the line height (from the bottom of one line to the bottom
				of another): 16.3cm/24 lines = 0.679cm
			Measure the height of the ERD: 35.4cm*2 = 70.8cm
			Zoom in as far as possible
			Measure the height of a capital letter: 3.5mm
			Measure the line height: 8.5mm
			Calculate the text's fraction of the line height: 3.5mm/8.5mm = 0.41
			Calculate the text height: 0.679cm*0.41 = 0.28cm
			Calculate the text height's fraction of the ERD height:
				0.28cm/70.8cm = 0.0040
			Measure the text height on the *VegBank* ERD poster: 5.5mm = 0.55cm
			Calculate the VegBIEN poster height to make the text the same size:
				0.55cm/0.0040 = 137.5cm H; *1in/2.54cm = 54.1in H
			The ERD aspect ratio is 11 in W x (2*8.5in H) = 11x17 portrait
			Calculate the VegBIEN poster width: 54.1in H*11W/17H = 35.0in W
			The minimum VegBIEN poster size is 35x54in portrait
		Determine the cost:
			The FedEx Kinkos near NCEAS (1030 State St, Santa Barbara, CA 93101)
				charges the following for posters:
				base: $7.25/sq ft
				lamination: $3/sq ft
				mounting on a board: $8/sq ft
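
The poster-size arithmetic above can be rechecked with awk. This is a sketch that reuses the measurements listed in the steps; the function name poster_size is hypothetical. It carries full precision through every step instead of rounding the intermediate values, so its result comes out slightly larger than the rounded 35x54in figure.

```shell
#!/bin/sh
# Recompute the poster size from the measurements listed in the steps.
# poster_size is a hypothetical helper name. Prints "<width in> <height in>".
poster_size() {
    awk 'BEGIN {
        line_h    = 16.3 / 24          # cm per line on the printed ERD
        erd_h     = 35.4 * 2           # cm, height of the printed ERD
        text_frac = 3.5 / 8.5          # capital-letter height / line height
        text_h    = line_h * text_frac # cm, text height on the printed ERD
        ratio     = text_h / erd_h     # text height as a fraction of ERD height
        target    = 0.55               # cm, text height on the VegBank poster
        h_in      = target / ratio / 2.54  # poster height in inches
        w_in      = h_in * 11 / 17         # 11x17 portrait aspect ratio
        printf "%.1f %.1f\n", w_in, h_in
    }'
}
poster_size
```

Multiplying the two dimensions and dividing by 144 gives the area in square feet, which can then be multiplied by the per-square-foot prices listed under Determine the cost.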

Testing:
	On a development machine, you should put the following in your .profile:
		umask -S ug=rwx,o= # prevent files from becoming web-accessible
		export log= n=2
	Mapping process: make test
		Including column-based import: make test by_col=1
			If the row-based and column-based imports produce different inserted
			row counts, this usually means that a table is underconstrained
			(the unique indexes don't cover all possible rows).
			This can occur if you didn't use COALESCE(field, null_value) around
			a nullable field in a unique index. See sql_gen.null_sentinels for
			the appropriate null value to use.
	Map spreadsheet generation: make remake
	Missing mappings: make missing_mappings
	Everything (for most complete coverage): make test-all

Debugging:
	"Binary chop" debugging:
		(This is primarily useful for regressions that occurred in a previous
		revision, which was committed without running all the tests)
		svn up -r <rev>; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
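
The "binary chop" above can be automated as a binary search over revision numbers. This is a minimal sketch, not an existing project script: bisect_revs, test_rev, and the $failed_test variable are hypothetical names, and it assumes every revision before the regression passes and every revision from it onward fails.

```shell
#!/bin/sh
# test_rev <rev>: succeed if the test passes at that revision.
# Runs the same commands as the step above, but stops at the first failure.
# $failed_test is a hypothetical variable holding the <failed-test> name.
test_rev() {
    svn up -r "$1" &&
    make inputs/.TNRS/reinstall &&
    make schemas/public/reinstall &&
    make "$failed_test.xml"
}

# bisect_revs <known-good rev> <known-bad rev>
# Prints the first revision at which test_rev fails.
bisect_revs() {
    good=$1
    bad=$2
    # Halve the interval until good and bad are adjacent revisions
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if test_rev "$mid"; then
            good=$mid
        else
            bad=$mid
        fi
    done
    echo "$bad"
}
```

Each step halves the range of candidate revisions, so about log2(bad - good) checkouts are needed rather than one per revision.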

WinMerge setup:
	Install WinMerge from <http://winmerge.org/>
	Open WinMerge
	Go to Edit > Options and click Compare in the left sidebar
	Enable "Moved block detection", as described at
		<http://manual.winmerge.org/Configuration.html#d0e5892>.
	Set Whitespace to Ignore change, as described at
		<http://manual.winmerge.org/Configuration.html#d0e5758>.

Documentation:
	To generate a Redmine-formatted list of steps for column-based import:
		make schemas/public/reinstall
		make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
	To import and scrub just the test taxonomic names:
		inputs/test_taxonomic_names/test_scrub

General:
	To see a program's description, read its top-of-file comment
	To see a program's usage, run it without arguments
	To remake a directory: make <dir>/remake
	To remake a file: make <file>-remake