Installation:
    Check out svn: svn co https://code.nceas.ucsb.edu/code/projects/bien
    cd bien/
    Install: make install
        **WARNING**: This will delete the public schema of your VegBIEN DB!
    Uninstall: make uninstall
        **WARNING**: This will delete your entire VegBIEN DB!
        This includes all archived imports and staging tables.

Connecting to vegbiendev:
    ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
    cd /home/bien # should happen automatically at login

Notes on system stability:
    **WARNING**: system upgrades can break key parts of the full-database
        import, causing errors such as disk space overruns. for this reason, it
        is recommended to maintain a snapshot copy of the VM as it was at the
        last successful import, for fallback use if a system upgrade breaks
        anything. system upgrades on the snapshot VM should be disabled
        completely, and because this will also disable security fixes, the
        snapshot VM should be disconnected from the internet and all networking
        interfaces. (this is an unfortunate consequence of modern OSes being
        written in non-memory-safe languages such as C and C++.)

Notes on running programs:
    **WARNING**: always start with a clean shell, to avoid spurious bugs. the
        shell should not have changes to the env vars. (there have been bugs
        that went away after closing and reopening the terminal window.) note
        that running `exec bash` is not sufficient to *reset* the env vars.
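A minimal sketch of why `exec bash` does not help, assuming bash and the standard `env` utility: a new bash inherits every exported variable, whereas `env -i` starts it with only the variables named on the command line. The variable `stale_setting` is hypothetical, and passing through just HOME and PATH is an illustrative choice, not a project convention.

```shell
# Simulate a leftover setting from earlier work in this terminal.
export stale_setting=1

# A child bash (and likewise `exec bash`) inherits the exported variable.
inherited=$(bash -c 'echo "${stale_setting:-unset}"')

# env -i clears the environment; only the variables listed here are passed on.
clean=$(env -i HOME="${HOME:-/}" PATH="$PATH" \
    bash --noprofile --norc -c 'echo "${stale_setting:-unset}"')

echo "inherited shell: $inherited"
echo "clean shell: $clean"
```

For interactive work, opening a new terminal window remains the simplest way to get a clean shell.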

Notes on editing files:
    **WARNING**: shell scripts should always be read-only, so that editing them
        while an import is in progress will not crash the import (see
        http://vegpath.org/links/#**%20modifying%20a%20running%20shell%20script)
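A minimal sketch of making a script read-only, run in a scratch directory so that no real file is touched. The file name `example_script` is a placeholder; which scripts to protect in the working copy is left to the reader.

```shell
# Work in a scratch directory so nothing real is modified.
scratch=$(mktemp -d)
printf '#!/bin/sh\necho importing\n' > "$scratch/example_script"
chmod 755 "$scratch/example_script"

# Remove write permission for everyone, so that editors refuse (or at least
# warn about) saving over the file while a shell may still be reading it.
chmod a-w "$scratch/example_script"

# Show the resulting permission bits.
perms=$(ls -l "$scratch/example_script" | cut -c1-10)
echo "$perms"

rm -rf "$scratch"
```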

Single datasource import:
    ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
    (Re)import and scrub: make inputs/<datasrc>/reimport_scrub by_col=1 &
    (Re)import only: make inputs/<datasrc>/reimport by_col=1 &
    Note that these commands also work if the datasource is not yet imported
    Remake analytical DB: see Full database import > To remake analytical DB

Full database import:
    **WARNING**: You must perform *every single* step listed below, to avoid
        breaking column-based import
    **WARNING**: always start with a clean shell, as described above under
        "Notes on running programs"
    **IMPORTANT**: the beginning of the import should be scheduled at a time
        when the DB will not be needed for other uses. this is necessary because
        vegbiendev will be slow for the first few hours of the import, due to
        the import using all the available cores.
    do steps under Maintenance > "to synchronize vegbiendev, jupiter, and
        your local machine"
    On local machine:
        make inputs/upload
        make inputs/upload live=1
        make test by_col=1 # runtime: 20 min ("4m46.108s" + ("21:50:43" - "21:37:09")) @starscream
            if you encounter errors, they are most likely related to the
                PostgreSQL error parsing in /lib/sql.py parse_exception()
            See note under Testing below
    ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
    Ensure there are no local modifications: svn st
    up
    make inputs/download
    make inputs/download live=1
    For each newly-uploaded datasource above: make inputs/<datasrc>/reinstall
    Update the auxiliary schemas: make schemas/reinstall
        **WARNING**: requires sudo access!
        The public schema will be installed separately by the import process
    Delete imports before the last so they won't bloat the full DB backup:
        make backups/vegbien.<version>.backup/remove
        To keep a previous import other than the public schema:
            export dump_opts='--exclude-schema=public --exclude-schema=<version>'
            # env var will be inherited by `screen` shell
    restart Postgres to free up any disk space used by temp tables from the last
        import (this is apparently not automatically reclaimed):
        make postgres_restart
    Make sure there is at least 1 TB of disk space on /: df -h
        **WARNING**: sometimes, this amount of available space is insufficient
            and the entire disk space gets used up, crashing the import. if this
            occurs, the problem will often be fixed just by rerunning the
            import. (the high-water mark varies by import.)
        although the import schema itself is only 315 GB, Postgres uses
            significant temporary space at the beginning of the import.
            the total disk usage oscillates between 1.2 TB and the entire disk
            for the first day (for import started @12:55:09, high-water marks of
            1.7 TB @14:00:25, 1.8 TB @15:38:32; then next day w/ 2 datasources
            running: entire disk for 4 min @05:35:44, 1.8 TB @11:15:05).
        To free up space, remove backups that have been archived on jupiter:
            List backups/ to view older backups
            Check their MD5 sums using the steps under On jupiter below
            Remove these backups
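The free-space check can also be scripted rather than read off `df -h`. This is a sketch under stated assumptions: the helper names `avail_kib` and `has_space` are made up for illustration, and "1 TB" is taken to mean what `df -h` displays as 1.0T (1024^3 KiB).

```shell
# Print the available space on the filesystem holding $1, in KiB.
# -P forces the POSIX single-line format; -k reports 1024-byte blocks.
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if at least $2 KiB are available on the filesystem holding $1.
has_space() {
    [ "$(avail_kib "$1")" -ge "$2" ]
}

required_kib=$((1024 * 1024 * 1024))  # 1 TiB, the minimum suggested above
if has_space / "$required_kib"; then
    echo "at least 1 TiB available on /"
else
    echo "less than 1 TiB available on /: free up space before importing"
fi
```

Passing this check does not guarantee success, because the high-water mark varies by import.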
    for full import:
        screen
        Press ENTER
    for small import, use above, or the following:
        $0 # nested shell to contain the env changes
    the following must happen within screen to avoid affecting the outer shell:
    unset TMOUT # TMOUT causes shell to exit even with background processes
    set -o ignoreeof # prevent Ctrl+D from exiting shell to keep attached jobs
    on local machine:
        unset n # clear any limit set in .profile (unless desired)
        unset log # allow logging output to go to log files
    unset version # clear any version from last import, etc.
    if no commits have been made since the last import (e.g. if retrying an
        import), set a custom version that differs from the auto-assigned one
        (would otherwise cause a collision with the last import):
        svn info
        extract the svn revision after "Revision:"
        export version=r[revision]_2 # +suffix to distinguish from last import
            # env var will be inherited by `screen` shell
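The revision extraction above can be sketched as a small filter. The function name `version_from_svn_info` and the sample revision 1234 are made up for illustration, and the canned text stands in for real `svn info` output (English labels assumed).

```shell
# Read `svn info` output on stdin and print a version of the form
# r<revision>_<suffix>, matching the r[revision]_2 pattern above.
version_from_svn_info() {
    suffix=$1
    sed -n 's/^Revision: \([0-9][0-9]*\)$/r\1_'"$suffix"'/p'
}

# Canned input standing in for real `svn info` output; 1234 is made up.
sample_info='Path: .
Revision: 1234
Node Kind: directory'

version=$(printf '%s\n' "$sample_info" | version_from_svn_info 2)
export version
echo "$version"
```

In a real working copy the equivalent would be `version=$(svn info | version_from_svn_info 2)`.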
    to import just a subset of the datasources:
        declare -ax inputs; inputs=(inputs/{src,...}/) # no () in declare on Mac
            # array vars *not* inherited by `screen` shell
        export version=custom_import_name
    Start column-based import: . bin/import_all
        To use row-based import: . bin/import_all by_col=
        To stop all running imports: . bin/stop_imports
        **WARNING**: Do NOT run import_all in the background, or the jobs it
            creates won't be owned by your shell.
        Note that import_all will take up to an hour to import the NCBI backbone
            and other metadata before returning control to the shell.
        To view progress:
            tail inputs/{.,}*/*/logs/$version.log.sql
    note: at the beginning of the import, the system may send out CPU load
        warning e-mails. these can safely be ignored. (they happen because the
        parallel imports use all the available cores.)
    for test import, turn off DB backup (also turns off analytical DB creation):
        kill % # cancel after_import()
    Wait (4 days) for the import to finish
    To recover from a closed terminal window: screen -r
    To restart an aborted import for a specific table:
        export version=<version>
        (set -o errexit; make inputs/<datasrc>/<table>/import_scrub by_col=1 continue=1; make inputs/<datasrc>/publish) &
        bin/after_import $! & # $! can also be obtained from `jobs -l`
    Get $version: echo $version
    Set $version in all vegbiendev terminals: export version=<version>
    When there are no more running jobs, exit `screen`: exit # not Ctrl+D
    upload logs: make inputs/upload live=1
    On local machine: make inputs/download-logs live=1
    check for disk space errors:
        grep --files-with-matches -F 'No space left on device' inputs/{.,}*/*/logs/$version.log.sql
        if there are any matches:
            manually reimport these datasources using the steps under
                Single datasource import
            bin/after_import &
            wait for the import to finish
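As an illustration of the disk-space check, the sketch below runs the same grep against a scratch tree that mimics the inputs/<datasrc>/<table>/logs/ layout and reduces the matching paths to datasource names. The datasource names (A, B), the table name, the version and the log lines are all fabricated placeholders, and the plain `inputs/*` glob used here does not cover hidden datasources the way `{.,}*` does in bash.

```shell
# Build a scratch tree shaped like inputs/<datasrc>/<table>/logs/<version>.log.sql.
scratch=$(mktemp -d)
version=r1234
mkdir -p "$scratch/inputs/A/t/logs" "$scratch/inputs/B/t/logs"
echo 'ERROR: No space left on device' > "$scratch/inputs/A/t/logs/$version.log.sql"
echo 'import finished' > "$scratch/inputs/B/t/logs/$version.log.sql"

# List the logs containing the error, then keep only the datasource name
# (the second path component), printing each datasource once.
failed=$(cd "$scratch" &&
    grep --files-with-matches -F 'No space left on device' \
        inputs/*/*/logs/"$version".log.sql |
    cut -d/ -f2 | sort -u)

echo "datasources to reimport: $failed"
rm -rf "$scratch"
```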
    tail inputs/{.,}*/*/logs/$version.log.sql
    In the output, search for "Command exited with non-zero status"
    For inputs that have this, fix the associated bug(s)
    If many inputs have errors, discard the current (partial) import:
        make schemas/$version/uninstall
    Otherwise, continue
    In PostgreSQL:
        Go to wiki.vegpath.org/VegBIEN_contents
        Get the # observations
        Get the # datasources
        Get the # datasources with observations
        in the r# schema:
        Check that analytical_stem contains [# observations] rows
        Check that source contains [# datasources] rows up through XAL. If this
            is not the case, manually check the entries in source against the
            datasources list on the wiki page (some datasources may be near the
            end depending on import order).
        Check that provider_count contains [# datasources with observations]
            rows with dataset="(total)" (at the top when the table is unsorted)
    Check that TNRS ran successfully:
        tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
        If the log ends in an AssertionError
            "assert sql.table_col_names(db, table) == header":
            Figure out which TNRS CSV columns have changed
            On local machine:
                Make the changes in the DB's TNRS and public schemas
                rm=1 inputs/.TNRS/schema.sql.run export_
                make schemas/remake
                inputs/test_taxonomic_names/test_scrub # re-run TNRS
                rm=1 inputs/.TNRS/data.sql.run export_
                Commit
            ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
                If dropping a column, save the dependent views
                Make the same changes in the live TNRS.tnrs table on vegbiendev
                If dropping a column, recreate the dependent views
                Restart the TNRS client: make scrub by_col=1 &
    Publish the new import:
        **WARNING**: Before proceeding, be sure you have done *every single*
            verification step listed above. Otherwise, a previous valid import
            could incorrectly be overwritten with a broken one.
        make schemas/$version/publish # runtime: 1 min ("real 1m10.451s")
    unset version
    make backups/upload live=1
    on local machine:
        make backups/vegbien.$version.backup/download live=1
            # download backup to local machine
    ssh aaronmk@jupiter.nceas.ucsb.edu
        cd /data/dev/aaronmk/bien/backups
        For each newly-archived backup:
            make -s <backup>.md5/test
            Check that "OK" is printed next to the filename
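For reference, this is what a passing MD5 check looks like with GNU coreutils' `md5sum`. That the `<backup>.md5/test` make target wraps a similar call is an assumption; the file name `example.backup` is a placeholder.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Stand-in for a backup file; the name and contents are placeholders.
echo 'backup contents' > example.backup

# Record the checksum, as would be done when the backup is created.
md5sum example.backup > example.backup.md5

# Verify it; md5sum prints "<filename>: OK" when the checksum matches.
result=$(md5sum -c example.backup.md5)
echo "$result"

cd / && rm -rf "$scratch"
```

A mismatch is reported as FAILED instead of OK, and `md5sum -c` then exits with a non-zero status.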
    If desired, record the import times in inputs/import.stats.xls:
        On local machine:
        Open inputs/import.stats.xls
        If the rightmost import is within 5 columns of column IV:
            Copy the current tab to <leftmost-date>~<rightmost-date>
            Remove the previous imports from the current tab because they are
                now in the copied tab instead
        Insert a copy of the leftmost "By column" column group before it
        export version=<version>
        bin/import_date inputs/{.,}*/*/logs/$version.log.sql
        Update the import date in the upper-right corner
        bin/import_times inputs/{.,}*/*/logs/$version.log.sql
        Paste the output over the # Rows/Time columns, making sure that the
            row counts match up with the previous import's row counts
        If the row counts do not match up, insert or reorder rows as needed
            until they do. Get the datasource names from the log file footers:
            tail inputs/{.,}*/*/logs/$version.log.sql
        Commit: svn ci -m 'inputs/import.stats.xls: updated import times'
    Running individual steps separately:
        To run TNRS:
            To use an import other than public: export version=<version>
            to rescrub all names:
                make inputs/.TNRS/reinstall
                re-create public-schema views that were cascadingly deleted
            make scrub &
            To view progress:
                tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
        To remake analytical DB:
            To use an import other than public: export version=<version>
            bin/make_analytical_db & # runtime: 13 h ("12:43:57elapsed")
            To view progress:
                tail -150 inputs/analytical_db/logs/make_analytical_db.log.sql
        To back up DB (staging tables and last import):
            To use an import *other than public*: export version=<version>
            make backups/TNRS.backup-remake &
            dump_opts=--exclude-schema=public make backups/vegbien.$version.backup/test &
                If after renaming to public, instead set dump_opts='' and replace
                $version with the appropriate revision
            make backups/upload live=1

Datasource setup:
    On local machine:
    Example steps for a datasource: wiki.vegpath.org/Import_process_for_Madidi
    umask ug=rwx,o= # prevent files from becoming web-accessible
    Add a new datasource: make inputs/<datasrc>/add
        <datasrc> may not contain spaces, and should be abbreviated.
        If the datasource is a herbarium, <datasrc> should be the herbarium code
            as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
    For a new-style datasource (one containing a ./run runscript):
        "cp" -f inputs/.NCBI/{Makefile,run,table.run} inputs/<datasrc>/
    For MySQL inputs (exports and live DB connections):
        For .sql exports:
            Place the original .sql file in _src/ (*not* in _MySQL/)
            Follow the steps starting with Install the staging tables below.
                This is for an initial sync to get the file onto vegbiendev.
            ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
                Create a database for the MySQL export in phpMyAdmin
                Give the bien user all database-specific privileges *except*
                    UPDATE, DELETE, ALTER, DROP. This prevents bugs in the
                    import scripts from accidentally deleting data.
                bin/mysql_bien database <inputs/<datasrc>/_src/export.sql &
        mkdir inputs/<datasrc>/_MySQL/
        cp -p lib/MySQL.{data,schema}.sql.make inputs/<datasrc>/_MySQL/
        Edit _MySQL/*.make for the DB connection
            For a .sql export, use server=vegbiendev and --user=bien
        Skip the Add input data for each table section
    For MS Access databases:
        Place the .mdb or .accdb file in _src/
        Download and install Access To PostgreSQL from
            http://www.bullzip.com/download.php
        Use Access To PostgreSQL to export the database:
            Export just the tables/indexes to inputs/<datasrc>/<file>.schema.sql
            Export just the data to inputs/<datasrc>/<file>.data.sql
        In <file>.schema.sql, make the following changes:
            Replace text "BOOLEAN" with "/*BOOLEAN*/INTEGER"
            Replace text "DOUBLE PRECISION NULL" with "DOUBLE PRECISION"
        Skip the Add input data for each table section
    Add input data for each table present in the datasource:
        For .sql exports, you must use the name of the table in the DB export
        For CSV files, you can use any name. It's recommended to use a table
            name from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>
        Note that if this table will be joined together with another table, its
            name must end in ".src"
        make inputs/<datasrc>/<table>/add
            Important: DO NOT just create an empty directory named <table>!
                This command also creates necessary subdirs, such as logs/.
        If the table is in a .sql export: make inputs/<datasrc>/<table>/install
            Otherwise, place the CSV(s) for the table in
            inputs/<datasrc>/<table>/ OR place a query joining other tables
            together in inputs/<datasrc>/<table>/create.sql
        Important: When exporting relational databases to CSVs, you MUST ensure
            that embedded quotes are escaped by doubling them, *not* by
            preceding them with a "\" as is the default in phpMyAdmin
        If there are multiple part files for a table, and the header is repeated
            in each part, make sure each header is EXACTLY the same.
            (If the headers are not the same, the CSV concatenation script
            assumes the part files don't have individual headers and treats the
            subsequent headers as data rows.)
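A quick way to check the "EXACTLY the same" requirement before installing, sketched with made-up part files. The helper name `same_headers` is hypothetical and is not part of the project's scripts.

```shell
# Succeed if every file given has exactly the same first line.
same_headers() {
    [ "$(for f in "$@"; do head -n 1 "$f"; done | LC_ALL=C sort -u | wc -l)" -eq 1 ]
}

scratch=$(mktemp -d)
printf 'id,name\n1,a\n' > "$scratch/part1.csv"
printf 'id,name\n2,b\n' > "$scratch/part2.csv"
printf 'id,Name\n3,c\n' > "$scratch/part3.csv"   # header differs only in case

same_headers "$scratch/part1.csv" "$scratch/part2.csv" &&
    echo "parts 1-2: headers match"
same_headers "$scratch"/part*.csv ||
    echo "parts 1-3: headers differ"
```

The comparison is byte-for-byte, so differences in case or whitespace count as different headers.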
        Add <table> to inputs/<datasrc>/import_order.txt before other tables
            that depend on it
        For a new-style datasource:
            "cp" -f inputs/.NCBI/nodes/run inputs/<datasrc>/<table>/
            inputs/<datasrc>/<table>/run
    Install the staging tables:
        make inputs/<datasrc>/reinstall quiet=1 &
        For a MySQL .sql export:
            At prompt "[you]@vegbiendev's password:", enter your password
            At prompt "Enter password:", enter the value in config/bien_password
        To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
        View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
            tail provides a header line with the filename
            +1 starts at the first line, to show the whole file
        For every file with an error 'column "..." specified more than once':
            Add a header override file "+header.<ext>" in <table>/:
                Note: The leading "+" should sort it before the flat files.
                    "_" unfortunately sorts *after* capital letters in ASCII.
                Create a text file containing the header line of the flat files
                Add an ! at the beginning of the line
                    This signals cat_csv that this is a header override.
                For empty names, use their 0-based column # (by convention)
                For duplicate names, add a distinguishing suffix
                For long names that collided, rename them to <= 63 chars long
                Do NOT make readability changes in this step; that is what the
                    map spreadsheets (below) are for.
                Save
        If you made any changes, re-run the install command above
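A sketch of the header override steps on a made-up CSV. The file `data.csv`, its columns and the `_2` suffix are hypothetical; the edited header is written out directly here to stand in for the manual editing step.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# A flat file whose header has an empty name (column 1) and a duplicate name.
printf 'id,,name,name\n1,x,a,b\n' > data.csv

# Start the override from the existing header, prefixed with "!" so that it
# is recognized as a header override.
printf '!%s\n' "$(head -n 1 data.csv)" > +header.csv     # !id,,name,name

# Stand-in for manual editing: the empty name becomes its 0-based column
# number, and the duplicate name gets a distinguishing suffix.
printf '!id,1,name,name_2\n' > +header.csv
override=$(cat +header.csv)
echo "$override"

# In ASCII order "+" sorts before letters, while "_" sorts after capitals.
order=$(printf '%s\n' A.csv _header.csv +header.csv | LC_ALL=C sort | tr '\n' ' ')
echo "$order"

cd / && rm -rf "$scratch"
```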
    Auto-create the map spreadsheets: make inputs/<datasrc>/
    Map each table's columns:
        In each <table>/ subdir, for each "via map" map.csv:
            Open the map in a spreadsheet editor
            Open the "core map" /mappings/Veg+-VegBIEN.csv
            In each row of the via map, set the right column to a value from the
                left column of the core map
            Save
        Regenerate the derived maps: make inputs/<datasrc>/
    Accept the test cases:
        For a new-style datasource:
            inputs/<datasrc>/run
            svn di inputs/<datasrc>/*/test.xml.ref
            If you get errors, follow the steps for old-style datasources below
        For an old-style datasource:
            make inputs/<datasrc>/test
                When prompted to "Accept new test output", enter y and press ENTER
                If you instead get errors, do one of the following for each one:
                -   If the error was due to a bug, fix it
                -   Add a SQL function that filters or transforms the invalid data
                -   Make an empty mapping for the columns that produced the error.
                    Put something in the Comments column of the map spreadsheet to
                    prevent the automatic mapper from auto-removing the mapping.
                When accepting tests, it's helpful to use WinMerge
                    (see WinMerge setup below for configuration)
            make inputs/<datasrc>/test by_col=1
                If you get errors this time, this always indicates a bug, usually in
                    the VegBIEN unique constraints or column-based import itself
    Add newly-created files: make inputs/<datasrc>/add
    Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
    Update vegbiendev:
        ssh aaronmk@jupiter.nceas.ucsb.edu
            up
        On local machine:
            ./fix_perms
            make inputs/upload
            make inputs/upload live=1
        ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
            up
            make inputs/download
            make inputs/download live=1
            Follow the steps under Install the staging tables above

Maintenance:
    on a live machine, you should put the following in your .profile:
--
# make svn files web-accessible. this does not affect unversioned files, because
# these get the right permissions on the local machine instead.
umask ug=rwx,o=rx

unset TMOUT # TMOUT causes screen to exit even with background processes
--
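To see what the two umask settings used in this document do, the sketch below creates one file under each and prints its permission bits. The file names are placeholders; each umask is set inside a command substitution, which runs in a subshell, so the current shell's umask is unaffected.

```shell
scratch=$(mktemp -d)

# umask ug=rwx,o= (used on the local machine): no access for others.
private=$(umask ug=rwx,o=; touch "$scratch/private"; ls -l "$scratch/private" | cut -c1-10)

# umask ug=rwx,o=rx (used on a live machine): others may read.
public=$(umask ug=rwx,o=rx; touch "$scratch/public"; ls -l "$scratch/public" | cut -c1-10)

echo "umask ug=rwx,o=   -> $private"
echo "umask ug=rwx,o=rx -> $public"
rm -rf "$scratch"
```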
    if http://vegbiendev.nceas.ucsb.edu/phppgadmin/ goes down:
        ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
            make phppgadmin-Linux
    regularly, re-run full-database import so that bugs in it don't pile up.
        it needs to be kept in working order so that it works when it's needed.
    to back up the vegbiendev databases:
        ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
        back up MySQL: # usually few changes, so do this first
            live= backups/mysql_snapshot
                then review diff, and rerun without `live=`
            l=1 overwrite=1 inplace=1 local_dir=/ remote_url="$USER@jupiter:/data/dev/aaronmk/Documents/BIEN/" subpath=/var/lib/mysql.bak/ sudo -E env PATH="$PATH" bin/sync_upload
            on local machine:
                l=1 swap=1 overwrite=1 inplace=1 local_dir=~ sync_remote_subdir= subpath=~/Documents/BIEN/var/lib/mysql.bak/ bin/sync_upload
        back up Postgres:
            live= backups/pg_snapshot
                then review diff, and rerun without `live=`
    to synchronize vegbiendev, jupiter, and your local machine:
        **WARNING**: pay careful attention to all files that will be deleted or
            overwritten!
        install put if needed:
            download https://uutils.googlecode.com/svn/trunk/bin/put to ~/bin/ and `chmod +x` it
        when changes are made on vegbiendev:
            avoid extraneous diffs when rsyncing:
                on all machines:
                    up
                    ./fix_perms
            ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
                upload:
                overwrite=1 bin/sync_upload --size-only
                    then review diff, and rerun with `l=1` prepended
            on your machine:
                download:
                overwrite=1 swap=1 src=. dest='aaronmk@jupiter.nceas.ucsb.edu:~/bien' put --exclude=.svn inputs/VegBIEN/TWiki
                    then review diff, and rerun with `l=1` prepended
                swap=1 bin/sync_upload backups/TNRS.backup
                    then review diff, and rerun with `l=1` prepended
                overwrite=1 swap=1 bin/sync_upload --size-only
                    then review diff, and rerun with `l=1` prepended
                overwrite=1 sync_remote_url=~/Dropbox/svn/ bin/sync_upload --existing --size-only # just update mtimes/perms
                    then review diff, and rerun with `l=1` prepended
    to back up e-mails:
        on local machine:
        /Applications/gmvault-v1.8.1-beta/bin/gmvault sync --multiple-db-owner --type quick aaronmk.nceas@gmail.com
        /Applications/gmvault-v1.8.1-beta/bin/gmvault sync --multiple-db-owner --type quick aaronmk@nceas.ucsb.edu
        open Thunderbird
        click the All Mail folder for each account and wait for it to download the e-mails in it
    to back up the version history:
        # back up first on the local machine, because often only the svnsync
            command gets run, and that way it will get backed up immediately to
            Dropbox (and hourly to Time Machine), while vegbiendev only gets
            backed up daily to tape
        on local machine:
            svnsync sync file://"$HOME"/Dropbox/docs/BIEN/svn_repo/ # initial runtime: 1.5 h ("08:21:38" - "06:45:26") @vegbiendev
            (cd ~/Dropbox/docs/BIEN/git/; git svn fetch)
            overwrite=1 src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:/data/dev/aaronmk/' put Dropbox/docs/BIEN/svn_repo/ # runtime: 1 min ("1:05.08")
                then review diff, and rerun with `l=1` prepended
            overwrite=1 src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:/data/dev/aaronmk/' put Dropbox/docs/BIEN/git/
                then review diff, and rerun with `l=1` prepended
        ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
            # use absolute path for vegbiendev commands because the Ubuntu 14.04
                version of rsync doesn't expand ~ properly
            overwrite=1 swap=1 src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:/data/dev/aaronmk/' put Dropbox/docs/BIEN/svn_repo/ # runtime: 30 s ("36.19")
                then review diff, and rerun with `l=1` prepended
            overwrite=1 swap=1 src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:/data/dev/aaronmk/' put Dropbox/docs/BIEN/git/
                then review diff, and rerun with `l=1` prepended
    to synchronize a Mac's settings with my testing machine's:
        download:
            **WARNING**: this will overwrite all your user's settings!
            on your machine:
            overwrite=1 swap=1 sync_local_dir=~/Library/ sync_remote_subdir=Library/ bin/sync_upload --exclude="/Saved Application State"
                then review diff, and rerun with `l=1` prepended
        upload:
            do step when changes are made on vegbiendev > on your machine, download
            ssh aaronmk@jupiter.nceas.ucsb.edu
                (cd ~/Dropbox/svn/; up)
            on your machine:
                rm ~/'Library/Thunderbird/Profiles/9oo8rcyn.default/ImapMail/imap.googlemail.com/[Gmail].sbd/Spam'
                    # remove the downloaded Spam folder, because spam e-mails often contain viruses that would trigger clamscan
                overwrite=1 del= sync_local_dir=~/Dropbox/svn/ sync_remote_subdir=Dropbox/svn/ bin/sync_upload --size-only # just update mtimes
                    then review diff, and rerun with `l=1` prepended
                overwrite=1 inplace=1 sync_local_dir=~ sync_remote_subdir= bin/sync_upload ~/"VirtualBox VMs/**" # need inplace=1 because they are very large files
                    then review diff, and rerun with `l=1` prepended
                overwrite=1 sync_local_dir=~ sync_remote_subdir= bin/sync_upload --exclude="/Library/Saved Application State" --exclude="/Library/Thunderbird/Profiles/9oo8rcyn.default/global-messages-db.sqlite" --exclude="/.Trash" --exclude="/bin" --exclude="/bin/pg_ctl" --exclude="/bin/unzip" --exclude="/Dropbox/home" --exclude="/.profile" --exclude="/.shrc" --exclude="/.bashrc" --exclude="/VirtualBox VMs/Ubuntu/Ubuntu.vdi"
                    then review diff, and rerun with `l=1` prepended
                stop Dropbox: system tray > Dropbox icon > gear icon > Quit Dropbox
                    this prevents Dropbox from trying to capture filesystem
                        events while syncing
                overwrite=1 sync_local_dir=~ sync_remote_url=~/Dropbox/home bin/sync_upload --exclude="/Library/Saved Application State" --exclude="/Library/Thunderbird/Profiles/9oo8rcyn.default/global-messages-db.sqlite" --exclude="/.Trash" --exclude="/.dropbox" --exclude="/Documents/BIEN" --exclude="/Dropbox" --exclude="/software" --exclude="/VirtualBox VMs/**.sav" --exclude="/VirtualBox VMs/**.vdi" --exclude="/VirtualBox VMs/**.vmdk"
                    then review diff, and rerun with `l=1` prepended
                start Dropbox: /Applications > double-click Dropbox.app
    to back up files not in Time Machine:
        On local machine:
        overwrite=1 src=/ dest=/Volumes/Time\ Machine\ Backups/ sudo -E put Library/PostgreSQL/9.3/data/
            then review diff, and rerun with `l=1` prepended
        pg_ctl. stop # stop the PostgreSQL server
        overwrite=1 src=/ dest=/Volumes/Time\ Machine\ Backups/ sudo -E put Library/PostgreSQL/9.3/data/
            then review diff, and rerun with `l=1` prepended
        pg_ctl. start # start the PostgreSQL server
    VegCore data dictionary:
        Regularly, or whenever the VegCore data dictionary page
            (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore)
            is changed, regenerate mappings/VegCore.csv:
            On local machine:
            make mappings/VegCore.htm-remake; make mappings/
            apply new data dict mappings to datasource mappings/staging tables:
                inputs/run postprocess # runtime: see inputs/run
                time yes|make inputs/{NVS,SALVIAS,TEAM}/test # old-style import; runtime: 1 min ("0m59.692s") @starscream
            svn di mappings/VegCore.tables.redmine
                If there are changes, update the data dictionary's Tables section
            When moving terms, check that no terms were lost: svn di
            svn ci -m 'mappings/VegCore.htm: regenerated from wiki'
            ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
                perform the steps under "apply new data dict mappings to
                    datasource mappings/staging tables" above
    Important: Whenever you install a system update that affects PostgreSQL or
        any of its dependencies, such as libc, you should restart the PostgreSQL
        server. Otherwise, you may get strange errors like "the database system
        is in recovery mode" which go away upon reimport, or you may not be able
        to access the database as the postgres superuser. This applies to both
        Linux and Mac OS X.

Backups:
    Archived imports:
        ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
        Back up: make backups/<version>.backup &
            Note: To back up the last import, you must archive it first:
                make schemas/rotate
        Test: make -s backups/<version>.backup/test &
        Restore: make backups/<version>.backup/restore &
        Remove: make backups/<version>.backup/remove
        Download: make backups/<version>.backup/download
    TNRS cache:
        ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
        Back up: make backups/TNRS.backup-remake &
            runtime: 3 min ("real 2m48.859s")
        Restore:
            yes|make inputs/.TNRS/uninstall
            make backups/TNRS.backup/restore &
                runtime: 5.5 min ("real 5m35.829s")
            yes|make schemas/public/reinstall
                Must come after TNRS restore to recreate tnrs_input_name view
    Full DB:
        ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
        Back up: make backups/vegbien.<version>.backup &
        Test: make -s backups/vegbien.<version>.backup/test &
        Restore: make backups/vegbien.<version>.backup/restore &
        Download: make backups/vegbien.<version>.backup/download
    Import logs:
        On local machine:
        Download: make inputs/download-logs live=1

Datasource refreshing:
    VegBank:
        ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
        make inputs/VegBank/vegbank.sql-remake
        make inputs/VegBank/reinstall quiet=1 &
533
|
|
534
|
Schema changes:
|
535
|
On local machine:
|
536
|
When changing the analytical views, run sync_analytical_..._to_view()
|
537
|
to update the corresponding table
|
538
|
Remember to update the following files with any renamings:
|
539
|
schemas/filter_ERD.csv
|
540
|
mappings/VegCore-VegBIEN.csv
|
541
|
mappings/verify.*.sql
|
542
|
Regenerate schema from installed DB: make schemas/remake
|
543
|
Reinstall DB from schema: make schemas/public/reinstall schemas/reinstall
|
544
|
**WARNING**: This will delete the public schema of your VegBIEN DB!
|
    If needed, reinstall staging tables:
        On local machine:
            sudo -E -u postgres psql <<<'ALTER DATABASE vegbien RENAME TO vegbien_prev'
            make db
            . bin/reinstall_all
            Fix any bugs and retry until no errors
            make schemas/public/install
                This must be run *after* the datasources are installed, because
                views in public depend on some of the datasources
            sudo -E -u postgres psql <<<'DROP DATABASE vegbien_prev'
        ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
            repeat the above steps
                **WARNING**: Do not run this until reinstall_all runs successfully
                on the local machine, or the live DB may be unrestorable!
    update mappings and staging table column names:
        on local machine:
            inputs/run postprocess # runtime: see inputs/run
            time yes|make inputs/{NVS,SALVIAS,TEAM}/test # old-style import; runtime: 1 min ("0m59.692s") @starscream
        ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
            manually apply schema changes to the live public schema
            do steps under "on local machine" above
    Sync ERD with vegbien.sql schema:
        Run make schemas/vegbien.my.sql
        Open schemas/vegbien.ERD.mwb in MySQLWorkbench
        Go to File > Export > Synchronize With SQL CREATE Script...
        For Input File, select schemas/vegbien.my.sql
        Click Continue
        In the changes list, select each table with an arrow next to it
        Click Update Model
        Click Continue
        Note: The generated SQL script will be empty because we are syncing in
            the opposite direction
        Click Execute
        Reposition any lines that have been reset
        Add any new tables by dragging them from the Catalog in the left sidebar
            to the diagram
        Remove any deleted tables by right-clicking the table's diagram element,
            selecting Delete '<table name>', and clicking Delete
        Save
        If desired, update the graphical ERD exports (see below)
    Update graphical ERD exports:
        Go to File > Export > Export as PNG...
        Select schemas/vegbien.ERD.png and click Save
        Go to File > Export > Export as SVG...
        Select schemas/vegbien.ERD.svg and click Save
        Go to File > Export > Export as Single Page PDF...
        Select schemas/vegbien.ERD.1_pg.pdf and click Save
        Go to File > Print...
        In the lower left corner, click PDF > Save as PDF...
        Set the Title and Author to ""
        Select schemas/vegbien.ERD.pdf and click Save
        Commit: svn ci -m "schemas/vegbien.ERD.mwb: Regenerated exports"
    Refactoring tips:
        To rename a table:
            In vegbien.sql, do the following:
                Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
                    This is necessary because the table name is *everywhere*
                Search for <new>
                Manually change back any replacements inside comments
        To rename a column:
            Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
            Recreate any foreign key for the column, removing CONSTRAINT <name>
                This resets the foreign key name using the new column name
    Creating a poster of the ERD:
        Determine the poster size:
            Measure the line height (from the bottom of one line to the bottom
                of another): 16.3cm/24 lines = 0.679cm
            Measure the height of the ERD: 35.4cm*2 = 70.8cm
            Zoom in as far as possible
            Measure the height of a capital letter: 3.5mm
            Measure the line height: 8.5mm
            Calculate the text's fraction of the line height: 3.5mm/8.5mm = 0.41
            Calculate the text height: 0.679cm*0.41 = 0.28cm
            Calculate the text height's fraction of the ERD height:
                0.28cm/70.8cm = 0.0040
            Measure the text height on the *VegBank* ERD poster: 5.5mm = 0.55cm
            Calculate the VegBIEN poster height to make the text the same size:
                0.55cm/0.0040 = 137.5cm H; *1in/2.54cm = 54.1in H
            The ERD aspect ratio is 11 in W x (2*8.5in H) = 11x17 portrait
            Calculate the VegBIEN poster width: 54.1in H*11W/17H = 35.0in W
            The minimum VegBIEN poster size is 35x54in portrait
        Determine the cost:
            The FedEx Kinkos near NCEAS (1030 State St, Santa Barbara, CA 93101)
                charges the following for posters:
                base: $7.25/sq ft
                lamination: $3/sq ft
                mounting on a board: $8/sq ft
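The poster arithmetic above can be rechecked with a short script, which is handy if any of the measurements change. This is a sketch, not part of the project: the function names poster_size and poster_cost are hypothetical, each intermediate value is rounded the same way the notes round it, and the cost assumes that the price scales with the exact area and that the per-square-foot options simply add up.

```shell
# poster_size: redo the size calculation, rounding at each step like the notes
poster_size() {
    awk 'function r(x, d) { return sprintf("%." d "f", x) + 0 }
    BEGIN {
        line_cm = r(16.3 / 24, 3)             # on-screen line height: 0.679cm
        erd_cm  = 35.4 * 2                    # ERD height: 70.8cm
        frac    = r(3.5 / 8.5, 2)             # capital letter / line height: 0.41
        text_cm = r(line_cm * frac, 2)        # text height: 0.28cm
        ratio   = r(text_cm / erd_cm, 4)      # text height / ERD height: 0.0040
        h_in    = r(0.55 / ratio / 2.54, 1)   # poster height in inches
        w_in    = r(h_in * 11 / 17, 1)        # poster width at 11x17 aspect ratio
        printf "%.1f x %.1f in\n", w_in, h_in
    }'
}

# poster_cost <width_in> <height_in>: cost in dollars at the listed rates
# (assumption: billed by exact area, options are additive)
poster_cost() {
    awk -v w="$1" -v h="$2" 'BEGIN {
        sqft = w * h / 144
        printf "base $%.2f, all options $%.2f\n", sqft * 7.25, sqft * (7.25 + 3 + 8)
    }'
}

poster_size         # prints: 35.0 x 54.1 in
poster_cost 35 54   # prints: base $95.16, all options $239.53
```

A 35x54in poster is 13.125 sq ft, so under these assumptions the base print is about $95 and a laminated, mounted poster about $240.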

Testing:
    On a development machine, you should put the following in your .profile:
        umask ug=rwx,o= # prevent files from becoming web-accessible
        export log= n=2
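To see what the umask line above does, the following sketch creates a file and a directory under that umask and prints their permission strings. The function name show_umask_effect is hypothetical, and the work happens in a temporary directory inside a subshell, so the calling shell's umask is left unchanged:

```shell
# show_umask_effect: create a file and a directory under the .profile umask
# and print the first 10 characters of their ls -l permission strings
show_umask_effect() (
    umask ug=rwx,o=
    tmp=$(mktemp -d) || exit 1
    touch "$tmp/file"
    mkdir "$tmp/dir"
    ls -ld "$tmp/file" | cut -c1-10
    ls -ld "$tmp/dir" | cut -c1-10
    rm -rf "$tmp"
)

show_umask_effect
# prints:
# -rw-rw----
# drwxrwx---
```

The owner and group keep full access while "other" (which would include a web server running as a different user outside the group) gets none.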
    For development machine specs, see /planning/resources/dev_machine.specs/
    On local machine:
        Mapping process: make test
            Including column-based import: make test by_col=1
                If the row-based and column-based imports produce different inserted
                row counts, this usually means that a table is underconstrained
                (the unique indexes don't cover all possible rows).
                This can occur if you didn't use COALESCE(field, null_value) around
                a nullable field in a unique index. See sql_gen.null_sentinels for
                the appropriate null value to use.
        Map spreadsheet generation: make remake
        Missing mappings: make missing_mappings
        Everything (for most complete coverage): make test-all

Debugging:
    "Binary chop" debugging:
        (This is primarily useful for regressions that occurred in a previous
        revision, which was committed without running all the tests)
        up -r <rev>; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
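The binary chop above can be automated by halving the revision range until the first failing revision is found. This is a minimal sketch rather than an existing project script: the function names find_first_bad and check_rev and the failed_test variable are hypothetical, and reading "up" as "svn up" is an assumption.

```shell
# find_first_bad <good_rev> <bad_rev> <check_cmd>
# Binary-chops the revision range. <check_cmd> is run as "<check_cmd> <rev>"
# and must succeed for good revisions and fail for bad ones.
# Assumes <good_rev> passes and <bad_rev> fails.
find_first_bad() (
    good=$1 bad=$2 check=$3
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        # send the check's own output to stderr so stdout is only the result
        if "$check" "$mid" >&2; then good=$mid; else bad=$mid; fi
    done
    echo "$bad"
)

# Hypothetical check built from the command line above. failed_test must be
# set by the caller; "up" is assumed to mean "svn up". The exit status is
# that of the final make, as in the original ";"-separated command.
check_rev() {
    svn up -r "$1"
    make inputs/.TNRS/reinstall
    make schemas/public/reinstall
    make "$failed_test.xml"
}
```

With a known-good revision G and a known-bad revision B, running `find_first_bad G B check_rev` needs only about log2(B-G) rebuilds instead of one per revision.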
    .htaccess:
        mod_rewrite:
            **IMPORTANT**: whenever you change the DirectorySlash setting for a
            directory, you *must* clear your browser's cache to ensure that
            a cached redirect is not used. this is because RewriteRule
            redirects are (by default) temporary, but DirectorySlash
            redirects are permanent.
                for Firefox:
                    press Cmd+Shift+Delete
                    check only Cache
                    press Enter or click Clear Now

WinMerge setup:
    In a Windows VM:
        Install WinMerge from <http://winmerge.org/>
        Open WinMerge
        Go to Edit > Options and click Compare in the left sidebar
        Enable "Moved block detection", as described at
            <http://manual.winmerge.org/Configuration.html#d0e5892>.
        Set Whitespace to Ignore change, as described at
            <http://manual.winmerge.org/Configuration.html#d0e5758>.

Documentation:
    To generate a Redmine-formatted list of steps for column-based import:
        On local machine:
            make schemas/public/reinstall
            make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
    To import and scrub just the test taxonomic names:
        ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
        inputs/test_taxonomic_names/test_scrub

General:
    To see a program's description, read its top-of-file comment
    To see a program's usage, run it without arguments
    To remake a directory: make <dir>/remake
    To remake a file: make <file>-remake