Installation:
	Check out svn: svn co https://code.nceas.ucsb.edu/code/projects/bien
	cd bien/
	Install: make install
		WARNING: This will delete the current public schema of your VegBIEN DB!
	Uninstall: make uninstall
		WARNING: This will delete your entire VegBIEN DB!
		This includes all archived imports and staging tables.

Maintenance:
	VegCore data dictionary:
		Regularly, or whenever the VegCore data dictionary page
			(https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore)
			is changed, regenerate mappings/VegCore.csv:
		make mappings/VegCore.htm-remake; make mappings/
		svn ci -m "mappings/VegCore.csv: Regenerated from wiki"
	Important: Whenever you install a system update that affects PostgreSQL or
		any of its dependencies, such as libc, you should restart the PostgreSQL
		server. Otherwise, you may get strange errors like "the database system
		is in recovery mode" which go away upon reimport, or you may not be able
		to access the database as the postgres superuser. This applies to both
		Linux and Mac OS X.

Single datasource import:
	(Re)import and scrub: make inputs/<datasrc>/reimport_scrub
	(Re)import only: make inputs/<datasrc>/reimport
	(Re)scrub: make inputs/<datasrc>/rescrub
	Note that these commands also work if the datasource is not yet imported

Full database import:
	On local machine:
		make inputs/upload
		make test by_col=1
			See note under Testing below
	On vegbiendev:
		Ensure there are no local modifications: svn st
		svn up
		For each newly-uploaded datasource above: make inputs/<datasrc>/reinstall
		Update the auxiliary schemas: make schemas/reinstall
			The public schema will be installed separately by the import process
		Delete imports older than the last one so they won't bloat the full DB backup:
			make backups/vegbien.<version>.backup/remove
			To keep a previous import other than the public schema:
				export dump_opts='--exclude-schema=public --exclude-schema=<version>'
		Make sure there is at least 150GB of disk space on /: df -h
			The import schema is 100GB, and may use additional space for temp tables
			To free up space, remove backups that have been archived on jupiter:
				List backups/ to view older backups
				Check their MD5 sums using the steps under On jupiter below
				Remove these backups
		unset version
		Start column-based import: . bin/import_all by_col=1
			To use row-based import: . bin/import_all
			To stop all running imports: . bin/stop_imports
			WARNING: Do NOT run import_all in the background, or the jobs it creates
				won't be owned by your shell.
			Note that import_all will take several hours to import the NCBI backbone
				and TNRS names before returning control to the shell.
		Wait (overnight) for the import to finish
		On local machine: make inputs/download-logs
		In PostgreSQL:
			Check that the provider_count and source tables contain entries for all
				inputs
			Check that unscrubbed_taxondetermination_view returns no rows
			Check that there are taxondeterminations whose source_id is
				source_by_shortname('TNRS')
		tail inputs/{.,}*/*/logs/$version.log.sql
			In the output, search for "Command exited with non-zero status"
			For inputs that have this, fix the associated bug(s)
		If many inputs have errors, discard the current (partial) import:
			make schemas/$version/uninstall
		Otherwise, continue
		make schemas/$version/publish
		unset version
		sudo backups/fix_perms
		make backups/upload
	On jupiter:
		cd /data/dev/aaronmk/bien/backups
		For each newly-archived backup:
			make -s <backup>.md5/test
			Check that "OK" is printed next to the filename
	On nimoy:
		cd /home/bien/svn/
		svn up
		export version=<version>
		make backups/analytical_aggregate.$version.csv/download
		make -s backups/analytical_aggregate.$version.csv.md5/test
		Check that "OK" is printed next to the filename
		In the bien_web DB:
			Create the analytical_aggregate_<version> table using its schema
				in schemas/vegbien.my.sql
		env table=analytical_aggregate_$version bin/publish_analytical_db \
			backups/analytical_aggregate.$version.csv
	If desired, record the import times in inputs/import.stats.xls:
		Open inputs/import.stats.xls
		Insert a copy of the leftmost "By column" column group before it
		bin/import_date inputs/{.,}*/*/logs/$version.log.sql
		Update the import date in the upper-right corner
		bin/import_times inputs/{.,}*/*/logs/$version.log.sql
		Paste the output over the # Rows/Time columns, making sure that the
			row counts match up with the previous import's row counts
		If the row counts do not match up, insert or reorder rows as needed
			until they do
		Commit: svn ci -m "inputs/import.stats.xls: Updated import times"
	To scrub unscrubbed taxondeterminations: make scrub by_col=1 &
		To view progress:
			tail -f inputs/.TNRS/public.unscrubbed_taxondetermination_view/logs/$version.log.sql
	To remake analytical DB: bin/make_analytical_db &
		To view progress:
			tail -f inputs/analytical_db/logs/make_analytical_db.log.sql
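
The 150GB free-space check under On vegbiendev can be scripted before starting import_all. A minimal sketch, assuming a POSIX df (-P for one line per filesystem, -k for 1024-byte blocks); has_free_gb is a hypothetical helper, not one of the scripts in bin/.

```shell
# Hypothetical helper (not in bin/): succeed if the filesystem holding PATH
# (default /) has at least MIN_GB GiB available, the unit df -h shows as "G".
has_free_gb() {
	min_gb=$1
	path=${2:-/}
	# With -P -k, df prints a header line, then one line per filesystem with
	# the available space (in 1024-byte blocks) in column 4.
	avail_kb=$(df -P -k "$path" | awk 'NR == 2 { print $4 }')
	[ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}
```

For example, on vegbiendev before running import_all: has_free_gb 150 / || echo "less than 150GB free on /" >&2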
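
Instead of reading every log printed by the tail step, the search for "Command exited with non-zero status" can be done with grep, which lists only the logs that contain it. A sketch; failed_import_logs is hypothetical, and it assumes the message appears verbatim in a failed input's log, as that step implies.

```shell
# Hypothetical helper: print the path of each log file that contains the
# import failure message, one per line. -F matches the text literally;
# -s silences errors for missing files (e.g. an unmatched glob).
failed_import_logs() {
	grep -l -s -F 'Command exited with non-zero status' "$@"
}
```

On vegbiendev, with $version still set: failed_import_logs inputs/{.,}*/*/logs/$version.log.sql (the {.,} brace expansion requires bash).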

Backups:
	Archived imports:
		Back up: make backups/<version>.backup &
			Note: To back up the last import, you must archive it first:
				make schemas/rotate
		Test: make -s backups/<version>.backup/test &
		Restore: make backups/<version>.backup/restore &
		Remove: make backups/<version>.backup/remove
		Download: make backups/download
	TNRS cache:
		Back up: make backups/TNRS.backup-remake &
		Restore:
			yes|make inputs/.TNRS/uninstall
			make backups/TNRS.backup/restore &
			yes|make schemas/public/reinstall
				Must come after TNRS restore to recreate tnrs_input_name view
	Full DB:
		Back up: make backups/vegbien.<version>.backup &
		Test: make -s backups/vegbien.<version>.backup/test &
		Restore: make backups/vegbien.<version>.backup/restore &
		Download: make backups/download
	Import logs:
		Download: make inputs/download-logs
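
The "Check that "OK" is printed next to the filename" steps (On jupiter and On nimoy under Full database import) compare a backup with its recorded MD5 sum. This README does not show how the .md5/test targets are implemented; the sketch below assumes they behave like GNU coreutils md5sum -c, which prints "<filename>: OK" on a match and exits non-zero otherwise. Mac OS X ships md5 rather than md5sum, so it would need adapting there. verify_md5 is a hypothetical name.

```shell
# Hypothetical helper: check BACKUP against BACKUP.md5, a checksum file in
# the "<hash>  <filename>" format that md5sum writes.
verify_md5() {
	(
		# md5sum -c resolves the file names listed in the .md5 file relative
		# to the current directory, so run it next to the backup.
		cd "$(dirname "$1")" || exit 1
		md5sum -c -- "$(basename "$1").md5"
	)
}

# To record the sum of a new backup (run in backups/):
#   md5sum vegbien.<version>.backup > vegbien.<version>.backup.md5
```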

Datasource setup:
	Add a new datasource: make inputs/<datasrc>/add
		<datasrc> may not contain spaces and should be abbreviated.
		If the datasource is a herbarium, <datasrc> should be the herbarium code
			as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
	For MySQL inputs (exports and live DB connections):
		For .sql exports:
			Place the original .sql file in _src/ (*not* in _MySQL/)
			Create a database for the MySQL export in phpMyAdmin
			mysql -p <database> <inputs/<datasrc>/_src/export.sql
		mkdir inputs/<datasrc>/_MySQL/
		cp -p lib/MySQL.{data,schema}.sql.make inputs/<datasrc>/_MySQL/
		Edit _MySQL/*.make for the DB connection
			For a .sql export, use your local MySQL DB
		Install the export according to Install the staging tables below
	Add input data for each table present in the datasource:
		For .sql exports, you must use the name of the table in the DB export
		For CSV files, you can use any name. It's recommended to use a table
			name from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>
		Note that if this table will be joined together with another table, its
			name must end in ".src"
		make inputs/<datasrc>/<table>/add
			Important: DO NOT just create an empty directory named <table>!
				This command also creates necessary subdirs, such as logs/.
		If the table is in a .sql export: make inputs/<datasrc>/<table>/install
		Otherwise, place the CSV(s) for the table in
			inputs/<datasrc>/<table>/ OR place a query joining other tables
			together in inputs/<datasrc>/<table>/create.sql
		Important: When exporting relational databases to CSVs, you MUST ensure
			that embedded quotes are escaped by doubling them, *not* by
			preceding them with a "\", as is the default in phpMyAdmin
		If there are multiple part files for a table, and the header is repeated
			in each part, make sure each header is EXACTLY the same.
			(If the headers are not the same, the CSV concatenation script
			assumes the part files don't have individual headers and treats the
			subsequent headers as data rows.)
		Add <table> to inputs/<datasrc>/import_order.txt before other tables
			that depend on it
	Install the staging tables:
		make inputs/<datasrc>/reinstall quiet=1 &
		To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
		View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
			tail provides a header line with the filename
			+1 starts at the first line, to show the whole file
		For every file with an error 'column "..." specified more than once':
			Add a header override file "+header.<ext>" in <table>/:
				Note: The leading "+" should sort it before the flat files.
					"_" unfortunately sorts *after* capital letters in ASCII.
				Create a text file containing the header line of the flat files
				Add an ! at the beginning of the line
					This signals cat_csv that this is a header override.
				For empty names, use their 0-based column # (by convention)
				For duplicate names, add a distinguishing suffix
				For long names that collided, rename them to <= 63 chars long
				Do NOT make readability changes in this step; that is what the
					map spreadsheets (below) are for.
				Save
		If you made any changes, re-run the install command above
	Auto-create the map spreadsheets: make inputs/<datasrc>/
	Map each table's columns:
		In each <table>/ subdir, for each "via map" map.csv:
			Open the map in a spreadsheet editor
			Open the "core map" /mappings/Veg+-VegBIEN.csv
			In each row of the via map, set the right column to a value from the
				left column of the core map
			Save
		Regenerate the derived maps: make inputs/<datasrc>/
	Accept the test cases:
		make inputs/<datasrc>/test
			When prompted to "Accept new test output", enter y and press ENTER
			If you instead get errors, do one of the following for each one:
			-	If the error was due to a bug, fix it
			-	Add a SQL function that filters or transforms the invalid data
			-	Make an empty mapping for the columns that produced the error.
				Put something in the Comments column of the map spreadsheet to
				prevent the automatic mapper from auto-removing the mapping.
			When accepting tests, it's helpful to use WinMerge
				(see WinMerge setup below for configuration)
		make inputs/<datasrc>/test by_col=1
			Errors at this stage always indicate a bug, usually in the VegBIEN
				unique constraints or in the column-based import itself
	Add newly-created files: make inputs/<datasrc>/add
	Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
	Update vegbiendev:
		On vegbiendev: svn up
		On local machine: make inputs/upload
		On vegbiendev:
			Follow the steps under Install the staging tables above
			make inputs/<datasrc>/test
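
Two of the CSV export pitfalls under Add input data (quotes escaped with "\" instead of doubled, and repeated headers that differ between part files) can be caught before installing the staging tables. A heuristic sketch; check_csv_parts is hypothetical and not part of the repository's scripts.

```shell
# Hypothetical pre-install check for the CSV part files of one table.
# Prints a warning for each problem found; returns non-zero if any.
check_csv_parts() {
	status=0
	first=
	for f in "$@"; do
		header=$(head -n 1 "$f")
		if [ -z "$first" ]; then
			first=$f
			first_header=$header
		elif [ "$header" != "$first_header" ]; then
			# Only meaningful when every part file repeats the header.
			echo "$f: header differs from $first" >&2
			status=1
		fi
		# A backslash right before a double quote suggests phpMyAdmin-style
		# \" escaping instead of doubled quotes (""). Heuristic only: a field
		# that legitimately ends in a backslash also matches.
		if grep -q '\\"' "$f"; then
			echo "$f: found \\\" (escape embedded quotes by doubling them)" >&2
			status=1
		fi
	done
	return "$status"
}
```

Run it on a table's part files before adding a header override, e.g. check_csv_parts inputs/<datasrc>/<table>/*.csv; otherwise the override's leading ! is reported as a differing header.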
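
The import_order.txt rule (list a table before the tables that depend on it) is a topological sort, which the standard POSIX tsort utility performs. If you know what each table depends on, a sketch like the following can draft the order; import_order_from_deps and the table names are hypothetical, and this is not part of the import tooling.

```shell
# Hypothetical helper: read "<table> <table it depends on>" pairs on stdin
# and print each table once, every table before the tables that depend on it.
# tsort expects "<comes first> <comes later>" pairs, so the columns are swapped.
import_order_from_deps() {
	awk '{ print $2, $1 }' | tsort
}

# Made-up example: a table Specimen whose create.sql joins Occurrence.src
# and Taxon.src, so both must be listed before Specimen.
printf '%s\n' 'Specimen Occurrence.src' 'Specimen Taxon.src' |
	import_order_from_deps
```

When several orders are valid, tsort may print any of them, so treat the output as a starting point for import_order.txt.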
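
The header override steps under Install the staging tables can be started from a part file's header, with the empty, duplicate, and over-long names listed for hand-editing. A sketch; make_header_override is hypothetical, and it splits names naively on commas, so quoted names containing commas are not handled.

```shell
# Hypothetical helper: create <table>/+header.<ext> from the header of the
# given part file, then list the column names that need fixing by hand.
make_header_override() {
	part=$1
	dir=$(dirname "$part")
	ext=${part##*.}
	header=$(head -n 1 "$part")
	# The leading ! marks the line as a header override for cat_csv.
	printf '!%s\n' "$header" > "$dir/+header.$ext"
	# Report problem names with 0-based column numbers. LC_ALL=C makes awk
	# count bytes, matching PostgreSQL's 63-byte identifier limit.
	printf '%s\n' "$header" | tr -d '\r' | LC_ALL=C awk -F, '{
		for (i = 1; i <= NF; i++) {
			col = i - 1
			if ($i == "") print col ": empty name"
			else if (seen[$i]++) print col ": duplicate name " $i
			if (length($i) > 63) print col ": longer than 63 chars"
		}
	}'
}
```

For example, make_header_override inputs/<datasrc>/<table>/<part>.csv, then edit the reported names in +header.csv. The sort-order note can be checked in the C locale: printf '%s\n' part1.csv Part2.csv _header.csv +header.csv | LC_ALL=C sort lists +header.csv first and _header.csv after Part2.csv.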

Datasource refreshing:
	VegBank:
		make inputs/VegBank/vegbank.sql-remake
		make inputs/VegBank/reinstall quiet=1 &

Schema changes:
	Remember to update the following files with any renamings:
		schemas/filter_ERD.csv
		mappings/VegCore-VegBIEN.csv
		mappings/verify.*.sql
	Regenerate schema from installed DB: make schemas/remake
	Reinstall DB from schema: make schemas/public/reinstall schemas/reinstall
		WARNING: This will delete the current public schema of your VegBIEN DB!
	Reinstall staging tables: . bin/reinstall_all
	Sync ERD with vegbien.sql schema:
		Run make schemas/vegbien.my.sql
		Open schemas/vegbien.ERD.mwb in MySQLWorkbench
		Go to File > Export > Synchronize With SQL CREATE Script...
		For Input File, select schemas/vegbien.my.sql
		Click Continue
		In the changes list, select each table with an arrow next to it
		Click Update Model
		Click Continue
			Note: The generated SQL script will be empty because we are syncing in
				the opposite direction
		Click Execute
		Reposition any lines that have been reset
		Add any new tables by dragging them from the Catalog in the left sidebar
			to the diagram
		Remove any deleted tables by right-clicking the table's diagram element,
			selecting Delete '<table name>', and clicking Delete
		Save
		If desired, update the graphical ERD exports (see below)
	Update graphical ERD exports:
		Go to File > Export > Export as PNG...
		Select schemas/vegbien.ERD.png and click Save
		Go to File > Export > Export as SVG...
		Select schemas/vegbien.ERD.svg and click Save
		Go to File > Export > Export as Single Page PDF...
		Select schemas/vegbien.ERD.1_pg.pdf and click Save
		Go to File > Print...
		In the lower left corner, click PDF > Save as PDF...
		Set the Title and Author to ""
		Select schemas/vegbien.ERD.pdf and click Save
		Commit: svn ci -m "schemas/vegbien.ERD.mwb: Regenerated exports"
	Refactoring tips:
		To rename a table:
			In vegbien.sql, do the following:
				Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
					This is necessary because the table name is *everywhere*
				Search for <new>
				Manually change back any replacements inside comments
		To rename a column:
			Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
			Recreate any foreign key for the column, removing CONSTRAINT <name>
				This resets the foreign key name using the new column name
	Creating a poster of the ERD:
		Determine the poster size:
			Measure the line height (from the bottom of one line to the bottom
				of another): 16.3cm/24 lines = 0.679cm
			Measure the height of the ERD: 35.4cm*2 = 70.8cm
			Zoom in as far as possible
			Measure the height of a capital letter: 3.5mm
			Measure the line height: 8.5mm
			Calculate the text's fraction of the line height: 3.5mm/8.5mm = 0.41
			Calculate the text height: 0.679cm*0.41 = 0.28cm
			Calculate the text height's fraction of the ERD height:
				0.28cm/70.8cm = 0.0040
			Measure the text height on the *VegBank* ERD poster: 5.5mm = 0.55cm
			Calculate the VegBIEN poster height to make the text the same size:
				0.55cm/0.0040 = 137.5cm H; *1in/2.54cm = 54.1in H
			The ERD aspect ratio is 11in W x (2*8.5in H) = 11x17 portrait
			Calculate the VegBIEN poster width: 54.1in H*11W/17H = 35.0in W
			The minimum VegBIEN poster size is 35x54in portrait
		Determine the cost:
			The FedEx Kinko's near NCEAS (1030 State St, Santa Barbara, CA 93101)
				charges the following for posters:
				base: $7.25/sq ft
				lamination: $3/sq ft
				mounting on a board: $8/sq ft
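
The poster-size steps chain several ratios, so they are easy to redo with awk when the measurements change. A sketch: poster_size and poster_cost are hypothetical helpers, the inputs are the measurements listed above, and the cost uses the per-square-foot prices quoted above, which may be out of date.

```shell
# Redo the poster-size arithmetic above without intermediate rounding.
poster_size() {
	awk 'BEGIN {
		line_h    = 16.3 / 24            # cm per line on the printout
		erd_h     = 35.4 * 2             # cm, height of the printed ERD
		text_frac = 3.5 / 8.5            # capital-letter height / line height
		text_h    = line_h * text_frac   # cm, text height on the printout
		frac      = text_h / erd_h       # text height / ERD height
		target    = 0.55                 # cm, text height on the VegBank poster
		h_in = target / frac / 2.54      # poster height in inches
		w_in = h_in * 11 / 17            # keep the 11x17 portrait aspect ratio
		printf "%.1f x %.1f in\n", w_in, h_in
	}'
}

# Print cost of a W x H inch poster; pass 1 as the 3rd/4th argument to add
# lamination/mounting (prices per square foot as quoted above).
poster_cost() {
	awk -v w="$1" -v h="$2" -v lam="${3:-0}" -v mount="${4:-0}" 'BEGIN {
		sq_ft = w * h / 144              # 144 square inches per square foot
		rate = 7.25 + 3 * lam + 8 * mount
		printf "%.2f\n", sq_ft * rate
	}'
}
```

Without the intermediate rounding, poster_size prints 35.5 x 54.8 in, slightly larger than the rounded 35x54in above, so round up when ordering. poster_cost 35 54 1 1 prints 239.53 (13.125 sq ft at $18.25/sq ft).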

Testing:
	On a development machine, you should put the following in your .profile:
		export log= n=2
	Mapping process: make test
		Including column-based import: make test by_col=1
			If the row-based and column-based imports produce different inserted
			row counts, this usually means that a table is underconstrained
			(the unique indexes don't cover all possible rows).
			This can occur if you didn't use COALESCE(field, null_value) around
			a nullable field in a unique index. See sql_gen.null_sentinels for
			the appropriate null value to use.
	Map spreadsheet generation: make remake
	Missing mappings: make missing_mappings
	Everything (for most complete coverage): make test-all

Debugging:
	"Binary chop" debugging:
		(This is primarily useful for regressions that occurred in a previous
		revision, which was committed without running all the tests)
		svn up -r <rev>; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
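
"Binary chop" debugging is a binary search over revisions: test the middle revision, then keep the half that still contains the change from passing to failing. A sketch of that loop, assuming revision numbers GOOD < BAD where the test passes at GOOD and fails at BAD; bisect_revs and check_rev are hypothetical names.

```shell
# Hypothetical helper: given GOOD < BAD and a CHECK command that exits 0 when
# the test passes at a revision, print the first failing revision.
# Assumes that once the failure appears, it persists in later revisions.
bisect_revs() {
	good=$1 bad=$2
	shift 2
	while [ $((bad - good)) -gt 1 ]; do
		mid=$(( (good + bad) / 2 ))
		if "$@" "$mid"; then
			good=$mid   # still passing: the regression is after mid
		else
			bad=$mid    # failing: the regression is at or before mid
		fi
	done
	echo "$bad"
}

# A CHECK command could wrap the steps above, assuming the test target
# exits non-zero when the test fails:
#   check_rev() {
#   	svn up -r "$1" && make inputs/.TNRS/reinstall &&
#   		make schemas/public/reinstall && make <failed-test>.xml
#   }
#   bisect_revs <good-rev> <bad-rev> check_rev
```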

WinMerge setup:
	Install WinMerge from <http://winmerge.org/>
	Open WinMerge
	Go to Edit > Options and click Compare in the left sidebar
	Enable "Moved block detection", as described at
		<http://manual.winmerge.org/Configuration.html#d0e5892>.
	Set Whitespace to Ignore change, as described at
		<http://manual.winmerge.org/Configuration.html#d0e5758>.

Documentation:
	To generate a Redmine-formatted list of steps for column-based import:
		make schemas/public/reinstall
		make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
	To import and scrub just the test taxonomic names:
		inputs/test_taxonomic_names/test_scrub

General:
	To see a program's description, read its top-of-file comment
	To see a program's usage, run it without arguments
	To remake a directory: make <dir>/remake
	To remake a file: make <file>-remake