Installation:
    Install: make install
        WARNING: This will delete the current public schema of your VegBIEN DB!
    Uninstall: make uninstall
        WARNING: This will delete your entire VegBIEN DB!
        This includes all archived imports and staging tables.

Maintenance:
    Important: Whenever you install a system update that affects PostgreSQL or
        any of its dependencies, such as libc, you should restart the PostgreSQL
        server. Otherwise, you may get strange errors like "the database system
        is in recovery mode" which go away upon reimport.

Data import:
    On local machine:
        make test by_col=1
            See note under Testing below
    On vegbiendev:
        svn up
        make inputs/upload
        For each newly-uploaded datasource: make inputs/<datasrc>/reinstall
        Update the schemas: make schemas/reinstall
            WARNING: This will delete the current public schema of your VegBIEN DB!
            To save it: make schemas/rotate
                Important: This must be done *after* running make_analytical_db on a
                    previous import
        Start column-based import: . bin/import_all by_col=1
            To use row-based import: . bin/import_all
            To stop all running imports: . bin/stop_imports
            WARNING: Do NOT run import_all in the background, or the jobs it creates
                won't be owned by your shell.
            Note that import_all will take several hours to import the NCBI backbone
                and TNRS names before returning control to the shell.
        Wait (overnight) for the import to finish
        tail inputs/{.,}*/*/logs/*.r<revision>[.-]*log.sql
        Check that every input's log ends in "Encountered 0 error(s)"
        If many do not, fix the bug and discard the current (partial) import:
            make schemas/public/reinstall
        Otherwise, continue
        Determine the import name:
            bin/import_name inputs/{.,}*/*/logs/*.r<revision>[.-]*log.sql
        Archive the last import: make schemas/rename/public.<import_name>
            Important: This must be done *after* running make_analytical_db
        Delete previous imports so they won't bloat the full DB backup:
            make backups/public.<version>.backup/remove
        make backups/TNRS.backup-remake &
        make backups/public.<version>.backup/test &
        make backups/vegbien.<version>.backup/test &
    On local machine:
        make inputs/download-logs
        make backups/download
    If desired, record the import times in inputs/import.stats.xls:
        Open inputs/import.stats.xls
        Insert a copy of the leftmost Column-based column group before it
        Update the import date in the upper-right corner
        ./bin/import_times inputs/{.,}*/*/logs/*.r<revision>[.-]*log.sql
        Paste the output over the # Rows/Time columns, making sure that the
            row counts match up with the previous import's row counts
        If the row counts do not match up, insert or reorder rows as needed
            until they do
        Commit: svn ci -m "inputs/import.stats.xls: Updated import times"
    To remake analytical DB: bin/make_analytical_db &
        To view progress:
            tail -f inputs/analytical_db/logs/make_analytical_db.log.sql
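
The log check in the import steps above can be scripted rather than read by eye. The following is a minimal sketch, not part of the VegBIEN tree: check_import_logs is a hypothetical helper name, and it assumes the "Encountered 0 error(s)" message appears within the last 10 lines of a finished log (what a plain tail would display).

```shell
# Hypothetical helper (not part of VegBIEN): print every log whose last
# 10 lines lack the success message, and return nonzero if any are found.
# Assumption: a finished log reports its error count within those lines.
check_import_logs() {
    failed=0
    for log in "$@"; do
        if ! tail -n 10 "$log" | grep -qF 'Encountered 0 error(s)'; then
            echo "$log"    # this input needs a closer look
            failed=1
        fi
    done
    return "$failed"
}
```

It would be called with the same glob as the tail command above, with <revision> filled in. In bash, {.,}* expands to .* and *, so that glob also covers hidden datasource directories such as inputs/.TNRS.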

Backups:
    Archived imports:
        Back up: make backups/public.<date>.backup &
            Note: To back up the last import, you must archive it first (above)
        Test: make backups/public.<date>.backup/test &
        Restore: make backups/public.<date>.backup/restore &
        Remove: make backups/public.<date>.backup/remove
        Download: make backups/download
    TNRS cache:
        Back up: make backups/TNRS.backup-remake &
        Restore:
            yes|make inputs/.TNRS/uninstall
            make backups/TNRS.backup/restore &
            yes|make schemas/public/reinstall
                Must come after TNRS restore to recreate tnrs_input_name view
    Full DB:
        Back up, test, and rotate: make backups/vegbien.backup/all &
        Back up and rotate: make backups/vegbien.backup/rotate &
        Test: make backups/vegbien.<date>.backup/test &
        Restore: make backups/vegbien.<date>.backup/restore &
        Download: make backups/download
    Import logs:
        Download: make inputs/download-logs

Datasource setup:
    Add a new datasource: make inputs/<datasrc>/add
        <datasrc> may not contain spaces, and should be abbreviated.
        If the datasource is a herbarium, <datasrc> should be the herbarium code
            as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
    Install any MySQL export:
        Create database in phpMyAdmin
        mysql -p database <export.sql
    Add input data for each table present in the datasource:
        Choose a table name from
            <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>,
            or use a custom name
            Note that if this table will be joined together with another table, its
                name must end in ".src"
        make inputs/<datasrc>/<table>/add
            Important: DO NOT just create an empty directory named <table>!
                This command also creates necessary subdirs, such as logs/.
        Place the CSV for the table in inputs/<datasrc>/<table>/
            OR place a query joining other tables together in
            inputs/<datasrc>/<table>/create.sql
        Important: When exporting relational databases to CSVs, you MUST ensure
            that embedded quotes are escaped by doubling them, *not* by
            preceding them with a "\" as is the default in phpMyAdmin
        If there are multiple part files for a table, and the header is repeated
            in each part, make sure each header is EXACTLY the same.
            (If the headers are not the same, the CSV concatenation script
            assumes the part files don't have individual headers and treats the
            subsequent headers as data rows.)
        Add <table> to inputs/<datasrc>/import_order.txt before other tables
            that depend on it
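
Two of the CSV requirements above, doubled quotes and identical part-file headers, can be checked mechanically before installing. The sketch below is an illustration only: check_part_headers and find_backslash_quotes are hypothetical helpers, not VegBIEN scripts, and they assume that each part file's header is its first line. The quote check is a plain text search, so a field that legitimately ends in a backslash will also be reported.

```shell
# Hypothetical helper (not part of VegBIEN): succeed only if every given
# part file has exactly the same first line as the first part file.
check_part_headers() {
    first_file=$1
    expected=$(head -n 1 "$first_file")
    status=0
    for part in "$@"; do
        if [ "$(head -n 1 "$part")" != "$expected" ]; then
            echo "header mismatch: $part (expected the header of $first_file)"
            status=1
        fi
    done
    return "$status"
}

# Hypothetical helper (not part of VegBIEN): print, with line numbers, the
# lines of a CSV that contain a backslash-escaped quote (\").
# Succeeds when such lines are found, i.e. when the export needs redoing.
find_backslash_quotes() {
    grep -n '\\"' "$1"
}
```

A typical run would be check_part_headers inputs/<datasrc>/<table>/*.csv, followed by find_backslash_quotes on each file, with the placeholders filled in.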
    Install the staging tables:
        make inputs/<datasrc>/reinstall quiet=1 &
        To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
        View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
            tail provides a header line with the filename
            +1 starts at the first line, to show the whole file
        For every file with an error 'column "..." specified more than once':
            Add a header override file "+header.<ext>" in <table>/:
                Note: The leading "+" should sort it before the flat files.
                    "_" unfortunately sorts *after* capital letters in ASCII.
                Create a text file containing the header line of the flat files
                Add an ! at the beginning of the line
                    This signals cat_csv that this is a header override.
                For empty names, use their 0-based column # (by convention)
                For duplicate names, add a distinguishing suffix
                For long names that collided, rename them to <= 63 chars long
                Do NOT make readability changes in this step; that is what the
                    map spreadsheets (below) are for.
                Save
        If you made any changes, re-run the install command above
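
The first two header override steps above can be started from the command line. This is a rough sketch under stated assumptions: make_header_override is a hypothetical helper, not a VegBIEN script; it assumes the flat file's header is its first line and that the override takes the same extension as the flat file. It only copies the header and prepends the "!"; fixing empty, duplicate, or over-long names remains a manual edit.

```shell
# Hypothetical helper (not part of VegBIEN): write "+header.<ext>" next to
# a flat file, containing that file's header line prefixed with "!".
make_header_override() {
    flat_file=$1
    dir=$(dirname "$flat_file")
    name=$(basename "$flat_file")
    ext=${name##*.}                   # text after the last "." in the name
    printf '!%s\n' "$(head -n 1 "$flat_file")" > "$dir/+header.$ext"
}
```

The note about sort order can be confirmed with a byte-order sort: printf '%s\n' _header.csv Data.csv +header.csv | LC_ALL=C sort lists +header.csv first and _header.csv last, because "+" precedes the capital letters in ASCII and "_" follows them.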
    Auto-create the map spreadsheets: make inputs/<datasrc>/
    Map each table's columns:
        In each <table>/ subdir, for each "via map" map.csv:
            Open the map in a spreadsheet editor
            Open the "core map" /mappings/Veg+-VegBIEN.csv
            In each row of the via map, set the right column to a value from the
                left column of the core map
            Save
        Regenerate the derived maps: make inputs/<datasrc>/
    Accept the test cases:
        make inputs/<datasrc>/test
            When prompted to "Accept new test output", enter y and press ENTER
            If you instead get errors, do one of the following for each one:
            -   If the error was due to a bug, fix it
            -   Add a SQL function that filters or transforms the invalid data
            -   Make an empty mapping for the columns that produced the error.
                Put something in the Comments column of the map spreadsheet to
                prevent the automatic mapper from auto-removing the mapping.
            When accepting tests, it's helpful to use WinMerge
                (see WinMerge setup below for configuration)
        make inputs/<datasrc>/test by_col=1
            If you get errors this time, this always indicates a bug, usually in
                either the unique constraints or column-based import itself
    Add newly-created files: make inputs/<datasrc>/add
    Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
    Update vegbiendev:
        On vegbiendev: svn up
        On local machine: make inputs/upload
        On vegbiendev:
            Follow the steps under Install the staging tables above
            make inputs/<datasrc>/test

Schema changes:
    Remember to update the following files with any renamings:
        schemas/filter_ERD.csv
        mappings/VegCore-VegBIEN.csv
        mappings/verify.*.sql
    Regenerate schema from installed DB: make schemas/remake
    Reinstall DB from schema: make schemas/reinstall
        WARNING: This will delete the current public schema of your VegBIEN DB!
    Reinstall staging tables: . bin/reinstall_all
    Sync ERD with vegbien.sql schema:
        Run make schemas/vegbien.my.sql
        Open schemas/vegbien.ERD.mwb in MySQLWorkbench
        Go to File > Export > Synchronize With SQL CREATE Script...
        For Input File, select schemas/vegbien.my.sql
        Click Continue
        Click in the changes list and press Ctrl+A or Apple+A to select all
        Click Update Model
        Click Continue
        Note: The generated SQL script will be empty because we are syncing in
            the opposite direction
        Click Execute
        Reposition any lines that have been reset
        Add any new tables by dragging them from the Catalog in the left sidebar
            to the diagram
        Remove any deleted tables by right-clicking the table's diagram element,
            selecting Delete '<table name>', and clicking Delete
        Save
        If desired, update the graphical ERD exports (see below)
    Update graphical ERD exports:
        Go to File > Export > Export as PNG...
        Select schemas/vegbien.ERD.png and click Save
        Go to File > Export > Export as SVG...
        Select schemas/vegbien.ERD.svg and click Save
        Go to File > Export > Export as Single Page PDF...
        Select schemas/vegbien.ERD.1_pg.pdf and click Save
        Go to File > Print...
        In the lower left corner, click PDF > Save as PDF...
        Set the Title and Author to ""
        Select schemas/vegbien.ERD.pdf and click Save
    Refactoring tips:
        To rename a table:
            In vegbien.sql, do the following:
                Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
                    This is necessary because the table name is *everywhere*
                Search for <new>
                Manually change back any replacements inside comments
        To rename a column:
            Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
            Recreate any foreign key for the column, removing CONSTRAINT <name>
                This resets the foreign key name using the new column name

Testing:
    Mapping process: make test
        Including column-based import: make test by_col=1
            If the row-based and column-based imports produce different inserted
            row counts, this usually means that a table is underconstrained
            (the unique indexes don't cover all possible rows).
            This can occur if you didn't use COALESCE(field, null_value) around
            a nullable field in a unique index. See sql_gen.null_sentinels for
            the appropriate null value to use.
    Map spreadsheet generation: make remake
    Missing mappings: make missing_mappings
    Everything (for most complete coverage): make test-all

WinMerge setup:
    Install WinMerge from <http://winmerge.org/>
    Open WinMerge
    Go to Edit > Options and click Compare in the left sidebar
    Enable "Moved block detection", as described at
        <http://manual.winmerge.org/Configuration.html#d0e5892>.
    Set Whitespace to Ignore change, as described at
        <http://manual.winmerge.org/Configuration.html#d0e5758>.

Documentation:
    To generate a Redmine-formatted list of steps for column-based import:
        make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
    To import and scrub just the test taxonomic names:
        inputs/test_taxonomic_names/test_scrub

General:
    To see a program's description, read its top-of-file comment
    To see a program's usage, run it without arguments
    To remake a directory: make <dir>/remake
    To remake a file: make <file>-remake
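
Reading a program's top-of-file comment does not require opening it in an editor. The following is a small sketch, assuming the program is a script whose description is a run of "#" comment lines at the top, optionally after a "#!" line; show_description is a hypothetical helper name, not a VegBIEN command.

```shell
# Hypothetical helper (not part of VegBIEN): print the leading "#" comment
# block of a script, without the comment markers, and stop at the first
# line that is not a comment.
show_description() {
    awk 'NR == 1 && /^#!/ { next }   # skip the interpreter line
         /^#/ { sub(/^# ?/, ""); print; next }
         { exit }' "$1"
}
```

It takes the path of the script as its only argument, with any real program path substituted.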