1 |
702
|
aaronmk
|
Installation:
|
2 |
14747
|
aaronmk
|
open a terminal window
|
3 |
13764
|
aaronmk
|
Check out svn:
|
4 |
14742
|
aaronmk
|
sudo apt-get --yes install subversion # not preinstalled on Ubuntu
|
5 |
13764
|
aaronmk
|
svn co https://code.nceas.ucsb.edu/code/projects/bien/trunk bien
|
6 |
8458
|
aaronmk
|
cd bien/
|
7 |
14699
|
aaronmk
|
Install:
|
8 |
12226
|
aaronmk
|
**WARNING**: This will delete the public schema of your VegBIEN DB!
|
9 |
14902
|
aaronmk
|
# to install a complete DB with all the datasources:
|
10 |
|
|
$ make install all=1
|
11 |
|
|
# or, to install a blank DB:
|
12 |
|
|
$ make install
|
13 |
14743
|
aaronmk
|
# at "reload PATH" (if displayed), do what it says
|
14 |
14746
|
aaronmk
|
# at "Are you sure you want to continue connecting", type "yes" and
|
15 |
|
|
press Enter
|
16 |
14756
|
aaronmk
|
# at "aaronmk@jupiter's password", enter the applicable password
|
17 |
14699
|
aaronmk
|
# at "[sudo] password for user", enter your password and press Enter
|
18 |
|
|
# at "Modifying postgresql.conf and pg_hba.conf", type y and press Enter
|
19 |
|
|
# at "kernel.shmmax [...] Press ENTER to continue":
|
20 |
|
|
# open a new window
|
21 |
|
|
# run what it says
|
22 |
|
|
# press Ctrl-D
|
23 |
|
|
# return to the previous window
|
24 |
|
|
# press Enter
|
25 |
|
|
# at "restart PostgreSQL manually ... Press ENTER to continue":
|
26 |
|
|
# open a new window
|
27 |
|
|
# run what it says
|
28 |
|
|
# press Ctrl-D
|
29 |
|
|
# return to the previous window
|
30 |
|
|
# press Enter
|
31 |
|
|
# at "This will delete the current public schema of your VegBIEN DB",
|
32 |
|
|
type y and press Enter
|
33 |
14757
|
aaronmk
|
# at "If asked for MySQL root password", copy the password to the
|
34 |
|
|
clipboard and press Enter
|
35 |
|
|
# at "Web server to reconfigure automatically", select apache2 and click
|
36 |
|
|
Ok
|
37 |
|
|
# at "Configure database for phpmyadmin with dbconfig-common?", click
|
38 |
|
|
Yes
|
39 |
|
|
# at "Password of the database's administrative user", paste the
|
40 |
|
|
password and click Ok
|
41 |
|
|
# at "MySQL application password for phpmyadmin", just click Ok
|
42 |
|
|
# at "An error occurred while installing the database", click Ok
|
43 |
|
|
# at "Next step for database installation", select ignore and click Ok
|
44 |
14758
|
aaronmk
|
# at "aaronmk@jupiter's password", enter the applicable password
|
45 |
8458
|
aaronmk
|
Uninstall: make uninstall
|
46 |
12226
|
aaronmk
|
**WARNING**: This will delete your entire VegBIEN DB!
|
47 |
8458
|
aaronmk
|
This includes all archived imports and staging tables.
|
48 |
554
|
aaronmk
|
|
49 |
11515
|
aaronmk
|
Connecting to vegbiendev:
|
50 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
51 |
13763
|
aaronmk
|
cd /home/bien # should happen automatically at login
|
52 |
11515
|
aaronmk
|
|
53 |
14651
|
aaronmk
|
Single datasource refresh:
|
54 |
14653
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
55 |
14651
|
aaronmk
|
# -> Maintenance > to back up the vegbiendev databases
|
56 |
14787
|
aaronmk
|
# place updated extract in inputs/$datasrc/_src/
|
57 |
14654
|
aaronmk
|
# place extracted flat file(s) in the appropriate table subdirs
|
58 |
14651
|
aaronmk
|
rm=1 inputs/<datasrc>/run # reload staging tables
|
59 |
|
|
make inputs/<datasrc>/reimport_scrub by_col=1 &
|
60 |
|
|
# this works whether or not datasource is already imported
|
61 |
14652
|
aaronmk
|
tail -150 inputs/<datasrc>/*/logs/public.log.sql # view progress
|
62 |
14651
|
aaronmk
|
# -> Full database import > To re-run geoscrubbing
|
63 |
|
|
# -> Full database import > To remake analytical DB
|
64 |
14791
|
aaronmk
|
# -> Full database import > To back up DB
|
65 |
14651
|
aaronmk
|
# -> Maintenance > to back up the vegbiendev databases
|
66 |
|
|
|
67 |
14788
|
aaronmk
|
Datasource removal:
|
68 |
|
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
69 |
14868
|
aaronmk
|
$ make inputs/$datasrc/rm # runtime: see
|
70 |
|
|
# http://vegpath.org/wiki/Individual_datasource_refresh#datasource-removal-runtimes
|
71 |
14788
|
aaronmk
|
|
72 |
13024
|
aaronmk
|
Notes on system stability:
|
73 |
14091
|
aaronmk
|
**WARNING**: when shutting down the VM, always first stop Postgres:
|
74 |
|
|
sudo service postgresql stop
|
75 |
|
|
this prevents the OS from SIGKILLing Postgres, which sometimes causes
|
76 |
|
|
database corruption
|
77 |
13024
|
aaronmk
|
|
78 |
12011
|
aaronmk
|
Notes on running programs:
|
79 |
|
|
**WARNING**: always start with a clean shell, to avoid spurious bugs. the
|
80 |
|
|
shell should not have changes to the env vars. (there have been bugs
|
81 |
|
|
that went away after closing and reopening the terminal window.) note
|
82 |
|
|
that running `exec bash` is not sufficient to *reset* the env vars.
|
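When reopening the terminal is inconvenient, one way to approximate a clean shell is to launch the command under `env -i`, which starts from an empty environment. This is only a sketch and not something the README prescribes: the helper name `clean_run` and the choice to pass through just `HOME`, `TERM`, and a fixed `PATH` are assumptions.

```shell
#!/usr/bin/env bash
# clean_run (hypothetical helper): run a command with an emptied environment.
# Only HOME, TERM and a conservative PATH are passed through (an assumption;
# add whatever your login shell normally provides).
clean_run() {
    env -i HOME="${HOME:-/}" TERM="${TERM:-dumb}" \
        PATH="/usr/local/bin:/usr/bin:/bin" "$@"
}

# Example: open a fresh login shell that does not inherit stray variables.
#   clean_run bash -l
```

Closing and reopening the terminal window remains the approach the notes above describe; the helper is only useful for confirming whether a bug depends on inherited variables.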
83 |
|
|
|
84 |
11967
|
aaronmk
|
Notes on editing files:
|
85 |
|
|
**WARNING**: shell scripts should always be read-only, so that editing them
|
86 |
|
|
while an import is in progress will not crash the import (see
|
87 |
|
|
http://vegpath.org/links/#**%20modifying%20a%20running%20shell%20script)
|
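The read-only rule above can be applied with plain `chmod`. The following is a minimal sketch; the helper names `lock_scripts` and `unlock_scripts` are hypothetical, and the files to lock must be passed explicitly because this README does not say how the scripts are identified.

```shell
#!/usr/bin/env bash
# lock_scripts (hypothetical helper): remove write permission from the given
# files so an editor cannot modify them in place while they are running.
lock_scripts() {
    chmod a-w -- "$@"
}

# unlock_scripts (hypothetical helper): restore owner/group write permission
# before a deliberate edit.
unlock_scripts() {
    chmod ug+w -- "$@"
}

# Example:
#   lock_scripts bin/import_all
```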
88 |
|
|
|
89 |
7287
|
aaronmk
|
Full database import:
|
90 |
12226
|
aaronmk
|
**WARNING**: You must perform *every single* step listed below, to avoid
|
91 |
9499
|
aaronmk
|
breaking column-based import
|
92 |
12011
|
aaronmk
|
**WARNING**: always start with a clean shell, as described above under
|
93 |
|
|
"Notes on running programs"
|
94 |
13021
|
aaronmk
|
**IMPORTANT**: the beginning of the import should be scheduled at a time
|
95 |
|
|
when the DB will not be needed for other uses. this is necessary because
|
96 |
|
|
vegbiendev will be slow for the first few hours of the import, due to
|
97 |
|
|
the import using all the available cores.
|
98 |
13000
|
aaronmk
|
do steps under Maintenance > "to synchronize vegbiendev, jupiter, and
|
99 |
|
|
your local machine"
|
100 |
8458
|
aaronmk
|
On local machine:
|
101 |
|
|
make inputs/upload
|
102 |
10025
|
aaronmk
|
make inputs/upload live=1
|
103 |
14077
|
aaronmk
|
make test by_col=1 # runtime: 1 h ("53m7.383s") @starscream
|
104 |
10549
|
aaronmk
|
if you encounter errors, they are most likely related to the
|
105 |
|
|
PostgreSQL error parsing in /lib/sql.py parse_exception()
|
106 |
8458
|
aaronmk
|
See note under Testing below
|
107 |
14887
|
aaronmk
|
-> Maintenance > to back up the vegbiendev databases > back up Postgres
|
108 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
109 |
8458
|
aaronmk
|
Ensure there are no local modifications: svn st
|
110 |
12998
|
aaronmk
|
up
|
111 |
8458
|
aaronmk
|
make inputs/download
|
112 |
10025
|
aaronmk
|
make inputs/download live=1
|
113 |
8458
|
aaronmk
|
For each newly-uploaded datasource above: make inputs/<datasrc>/reinstall
|
114 |
|
|
Update the auxiliary schemas: make schemas/reinstall
|
115 |
12226
|
aaronmk
|
**WARNING**: requires sudo access!
|
116 |
8458
|
aaronmk
|
The public schema will be installed separately by the import process
|
117 |
|
|
Delete imports before the last so they won't bloat the full DB backup:
|
118 |
|
|
make backups/vegbien.<version>.backup/remove
|
119 |
|
|
To keep a previous import other than the public schema:
|
120 |
|
|
export dump_opts='--exclude-schema=public --exclude-schema=<version>'
|
121 |
13009
|
aaronmk
|
# env var will be inherited by `screen` shell
|
122 |
13016
|
aaronmk
|
restart Postgres to free up any disk space used by temp tables from the last
|
123 |
|
|
import (this is apparently not automatically reclaimed):
|
124 |
|
|
make postgres_restart
|
125 |
13022
|
aaronmk
|
Make sure there is at least 1 TB of disk space on /: df -h
|
126 |
|
|
although the import schema itself is only 315 GB, Postgres uses
|
127 |
13023
|
aaronmk
|
significant temporary space at the beginning of the import.
|
128 |
13028
|
aaronmk
|
the total disk usage oscillates between 1.2 TB and the entire disk
|
129 |
|
|
for the first day (for import started @12:55:09, high-water marks of
|
130 |
13031
|
aaronmk
|
1.7 TB @14:00:25, 1.8 TB @15:38:32; then next day w/ 2 datasources
|
131 |
|
|
running: entire disk for 4 min @05:35:44, 1.8 TB @11:15:05).
|
132 |
8458
|
aaronmk
|
To free up space, remove backups that have been archived on jupiter:
|
133 |
|
|
List backups/ to view older backups
|
134 |
|
|
Check their MD5 sums using the steps under On jupiter below
|
135 |
|
|
Remove these backups
|
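The 1 TB free-space check above can be scripted instead of read off `df -h` by eye. This is a sketch under stated assumptions: the helper name `has_free_space` is hypothetical, 1 TB is taken as 1024^3 KB, and the filesystem name reported by `df` is assumed to contain no spaces.

```shell
#!/usr/bin/env bash
# has_free_space (hypothetical helper): succeed if the filesystem holding
# PATH has at least MIN_KB kilobytes available, according to `df -Pk`.
has_free_space() {
    local min_kb=$1 path=${2:-/}
    local avail_kb
    # With -P, line 2 of the output is the data row; column 4 is "Available".
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$min_kb" ]
}

# 1 TB expressed in 1024-byte blocks (taking 1 TB = 1024^3 KB).
one_tb_kb=$((1024 * 1024 * 1024))

# Example:
#   has_free_space "$one_tb_kb" / || echo "less than 1 TB free on /" >&2
```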
136 |
14895
|
aaronmk
|
unlock hardlinked files:
|
137 |
|
|
$ chmod ug+w inputs/{.[^as.],}*/*/{map.csv,new_terms.csv,unmapped_terms.csv}
|
138 |
13423
|
aaronmk
|
for full import:
|
139 |
|
|
screen
|
140 |
|
|
Press ENTER
|
141 |
14893
|
aaronmk
|
unset TMOUT # TMOUT causes shell to exit even with background processes
|
142 |
14087
|
aaronmk
|
$0 # nested shell to prevent errexit from closing the window
|
143 |
13422
|
aaronmk
|
the following must happen within screen to avoid affecting the outer shell:
|
144 |
13428
|
aaronmk
|
unset TMOUT # TMOUT causes shell to exit even with background processes
|
145 |
|
|
set -o ignoreeof # prevent Ctrl+D from exiting shell to keep attached jobs
|
146 |
13426
|
aaronmk
|
on local machine:
|
147 |
|
|
unset n # clear any limit set in .profile (unless desired)
|
148 |
|
|
unset log # allow logging output to go to log files
|
149 |
13424
|
aaronmk
|
unset version # clear any version from last import, etc.
|
150 |
|
|
if no commits have been made since the last import (e.g. if retrying an
|
151 |
|
|
import), set a custom version that differs from the auto-assigned one
|
152 |
|
|
(would otherwise cause a collision with the last import):
|
153 |
|
|
svn info
|
154 |
|
|
extract the svn revision after "Revision:"
|
155 |
|
|
export version=r[revision]_2 # +suffix to distinguish from last import
|
156 |
|
|
# env var will be inherited by `screen` shell
|
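The manual steps above (run `svn info`, read the number after "Revision:", append a suffix) can be combined into one pipeline. This is a sketch, assuming `svn info` prints an English "Revision:" line; the helper name `retry_version` is hypothetical.

```shell
#!/usr/bin/env bash
# retry_version (hypothetical helper): read `svn info` output on stdin and
# print a version string of the form r<revision>_<suffix> (suffix defaults to 2).
retry_version() {
    local suffix=${1:-2}
    sed -n "s/^Revision: \([0-9][0-9]*\)\$/r\1_${suffix}/p"
}

# Example, run from the working copy:
#   export version="$(svn info | retry_version)"
```

Pass a different suffix (for example `retry_version 3`) when a second retry would otherwise collide with the first.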
157 |
13119
|
aaronmk
|
to import just a subset of the datasources:
|
158 |
13427
|
aaronmk
|
declare -ax inputs; inputs=(inputs/{src,...}/) # no () in declare on Mac
|
159 |
13119
|
aaronmk
|
# array vars *not* inherited by `screen` shell
|
160 |
|
|
export version=custom_import_name
|
161 |
10579
|
aaronmk
|
Start column-based import: . bin/import_all
|
162 |
|
|
To use row-based import: . bin/import_all by_col=
|
163 |
8458
|
aaronmk
|
To stop all running imports: . bin/stop_imports
|
164 |
12226
|
aaronmk
|
**WARNING**: Do NOT run import_all in the background, or the jobs it
|
165 |
|
|
creates won't be owned by your shell.
|
166 |
8458
|
aaronmk
|
Note that import_all will take up to an hour to import the NCBI backbone
|
167 |
|
|
and other metadata before returning control to the shell.
|
168 |
12026
|
aaronmk
|
To view progress:
|
169 |
14882
|
aaronmk
|
tail inputs/{.[^as.],}*/*/logs/$version.log.sql
|
170 |
13020
|
aaronmk
|
note: at the beginning of the import, the system may send out CPU load
|
171 |
|
|
warning e-mails. these can safely be ignored. (they happen because the
|
172 |
|
|
parallel imports use all the available cores.)
|
173 |
13425
|
aaronmk
|
for test import, turn off DB backup (also turns off analytical DB creation):
|
174 |
13429
|
aaronmk
|
kill % # cancel after_import()
|
175 |
10850
|
aaronmk
|
Wait (4 days) for the import to finish
|
176 |
14197
|
aaronmk
|
**WARNING**: do *not* run backups/pg_snapshot while the import is running,
|
177 |
|
|
due to continuously-changing files
|
178 |
|
|
**WARNING**: do *not* run backups/pg_snapshot until the previous import has
|
179 |
|
|
been replaced, to avoid running into disk space limits
|
180 |
8458
|
aaronmk
|
To recover from a closed terminal window: screen -r
|
181 |
10583
|
aaronmk
|
To restart an aborted import for a specific table:
|
182 |
|
|
export version=<version>
|
183 |
11800
|
aaronmk
|
(set -o errexit; make inputs/<datasrc>/<table>/import_scrub by_col=1 continue=1; make inputs/<datasrc>/publish) &
|
184 |
10588
|
aaronmk
|
bin/after_import $! & # $! can also be obtained from `jobs -l`
|
185 |
8458
|
aaronmk
|
Get $version: echo $version
|
186 |
|
|
Set $version in all vegbiendev terminals: export version=<version>
|
187 |
13017
|
aaronmk
|
When there are no more running jobs, exit `screen`: exit # not Ctrl+D
|
188 |
13025
|
aaronmk
|
upload logs: make inputs/upload live=1
|
189 |
10025
|
aaronmk
|
On local machine: make inputs/download-logs live=1
|
190 |
13030
|
aaronmk
|
check for disk space errors:
|
191 |
14882
|
aaronmk
|
grep --files-with-matches -F 'No space left on device' inputs/{.[^as.],}*/*/logs/$version.log.sql
|
192 |
13030
|
aaronmk
|
if there are any matches:
|
193 |
|
|
manually reimport these datasources using the steps under
|
194 |
|
|
Single datasource refresh
|
195 |
|
|
bin/after_import &
|
196 |
|
|
wait for the import to finish
|
197 |
14882
|
aaronmk
|
tail inputs/{.[^as.],}*/*/logs/$version.log.sql
|
198 |
13029
|
aaronmk
|
In the output, search for "Command exited with non-zero status"
|
199 |
|
|
For inputs that have this, fix the associated bug(s)
|
200 |
|
|
If many inputs have errors, discard the current (partial) import:
|
201 |
|
|
make schemas/$version/uninstall
|
202 |
|
|
Otherwise, continue
|
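The two log checks above (disk space errors and non-zero exit statuses) can be run in a single pass. The following is only a sketch; the helper name `failed_logs` is hypothetical, and it reports file names rather than the matching lines.

```shell
#!/usr/bin/env bash
# failed_logs (hypothetical helper): print the names of the given log files
# that contain either of the two error markers named in the steps above.
failed_logs() {
    grep --files-with-matches -F \
        -e 'No space left on device' \
        -e 'Command exited with non-zero status' \
        -- "$@"
}

# Example, using the same glob as the steps above:
#   failed_logs inputs/{.[^as.],}*/*/logs/$version.log.sql
```

An empty result (exit status 1 from `grep`) means neither marker was found in any of the logs.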
203 |
8458
|
aaronmk
|
In PostgreSQL:
|
204 |
11568
|
aaronmk
|
Go to wiki.vegpath.org/VegBIEN_contents
|
205 |
11728
|
aaronmk
|
Get the # observations
|
206 |
|
|
Get the # datasources
|
207 |
|
|
Get the # datasources with observations
|
208 |
11892
|
aaronmk
|
in the r# schema:
|
209 |
11569
|
aaronmk
|
Check that analytical_stem contains [# observations] rows
|
210 |
12148
|
aaronmk
|
Check that source contains [# datasources] rows up through XAL. If this
|
211 |
|
|
is not the case, manually check the entries in source against the
|
212 |
|
|
datasources list on the wiki page (some datasources may be near the
|
213 |
|
|
end depending on import order).
|
214 |
11568
|
aaronmk
|
Check that provider_count contains [# datasources with observations]
|
215 |
|
|
rows with dataset="(total)" (at the top when the table is unsorted)
|
216 |
9492
|
aaronmk
|
Check that TNRS ran successfully:
|
217 |
|
|
tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
|
218 |
|
|
If the log ends in an AssertionError
|
219 |
|
|
"assert sql.table_col_names(db, table) == header":
|
220 |
|
|
Figure out which TNRS CSV columns have changed
|
221 |
|
|
On local machine:
|
222 |
10784
|
aaronmk
|
Make the changes in the DB's TNRS and public schemas
|
223 |
|
|
rm=1 inputs/.TNRS/schema.sql.run export_
|
224 |
9492
|
aaronmk
|
make schemas/remake
|
225 |
10785
|
aaronmk
|
inputs/test_taxonomic_names/test_scrub # re-run TNRS
|
226 |
10784
|
aaronmk
|
rm=1 inputs/.TNRS/data.sql.run export_
|
227 |
9492
|
aaronmk
|
Commit
|
228 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
229 |
9492
|
aaronmk
|
If dropping a column, save the dependent views
|
230 |
|
|
Make the same changes in the live TNRS.tnrs table on vegbiendev
|
231 |
|
|
If dropping a column, recreate the dependent views
|
232 |
|
|
Restart the TNRS client: make scrub by_col=1 &
|
233 |
9498
|
aaronmk
|
Publish the new import:
|
234 |
12226
|
aaronmk
|
**WARNING**: Before proceeding, be sure you have done *every single*
|
235 |
9498
|
aaronmk
|
verification step listed above. Otherwise, a previous valid import
|
236 |
|
|
could incorrectly be overwritten with a broken one.
|
237 |
10864
|
aaronmk
|
make schemas/$version/publish # runtime: 1 min ("real 1m10.451s")
|
238 |
8458
|
aaronmk
|
unset version
|
239 |
10027
|
aaronmk
|
make backups/upload live=1
|
240 |
11897
|
aaronmk
|
on local machine:
|
241 |
|
|
make backups/vegbien.$version.backup/download live=1
|
242 |
|
|
# download backup to local machine
|
243 |
12396
|
aaronmk
|
ssh aaronmk@jupiter.nceas.ucsb.edu
|
244 |
8458
|
aaronmk
|
cd /data/dev/aaronmk/bien/backups
|
245 |
|
|
For each newly-archived backup:
|
246 |
|
|
make -s <backup>.md5/test
|
247 |
|
|
Check that "OK" is printed next to the filename
|
248 |
|
|
If desired, record the import times in inputs/import.stats.xls:
|
249 |
11573
|
aaronmk
|
On local machine:
|
250 |
8458
|
aaronmk
|
Open inputs/import.stats.xls
|
251 |
14878
|
aaronmk
|
click the "current" tab
|
252 |
8458
|
aaronmk
|
If the rightmost import is within 5 columns of column IV:
|
253 |
|
|
Copy the current tab to <leftmost-date>~<rightmost-date>
|
254 |
|
|
Remove the previous imports from the current tab because they are
|
255 |
|
|
now in the copied tab instead
|
256 |
|
|
Insert a copy of the leftmost "By column" column group before it
|
257 |
|
|
export version=<version>
|
258 |
14882
|
aaronmk
|
bin/import_date inputs/{.[^as.],}*/*/logs/$version.log.sql
|
259 |
8458
|
aaronmk
|
Update the import date in the upper-right corner
|
260 |
14882
|
aaronmk
|
bin/import_times inputs/{.[^as.],}*/*/logs/$version.log.sql
|
261 |
8458
|
aaronmk
|
Paste the output over the # Rows/Time columns, making sure that the
|
262 |
|
|
row counts match up with the previous import's row counts
|
263 |
|
|
If the row counts do not match up, insert or reorder rows as needed
|
264 |
|
|
until they do. Get the datasource names from the log file footers:
|
265 |
14882
|
aaronmk
|
tail inputs/{.[^as.],}*/*/logs/$version.log.sql
|
266 |
14883
|
aaronmk
|
update the Postprocessing times:
|
267 |
|
|
analytical DB remake time:
|
268 |
|
|
from the end of inputs/analytical_db/logs/make_analytical_db.log.sql,
|
269 |
|
|
search upwards for "_individual_view_modify" followed by a
|
270 |
|
|
line of -'s
|
271 |
|
|
enter as: =[ms]/1000/3600/24
|
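The spreadsheet formula above converts a millisecond count from the log into fractional days. The same arithmetic can be checked on the command line; this is a sketch, and the helper name `ms_to_days` is hypothetical.

```shell
#!/usr/bin/env bash
# ms_to_days (hypothetical helper): convert a duration in milliseconds to
# fractional days, mirroring the spreadsheet formula =[ms]/1000/3600/24.
ms_to_days() {
    awk -v ms="$1" 'BEGIN { printf "%.6f\n", ms / 1000 / 3600 / 24 }'
}

# Example:
#   ms_to_days 86400000   # one full day of milliseconds
```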
272 |
11573
|
aaronmk
|
Commit: svn ci -m 'inputs/import.stats.xls: updated import times'
|
273 |
10885
|
aaronmk
|
Running individual steps separately:
|
274 |
9497
|
aaronmk
|
To run TNRS:
|
275 |
9996
|
aaronmk
|
To use an import other than public: export version=<version>
|
276 |
13594
|
aaronmk
|
to rescrub all names:
|
277 |
|
|
make inputs/.TNRS/reinstall
|
278 |
|
|
re-create public-schema views that were cascadingly deleted
|
279 |
9995
|
aaronmk
|
make scrub &
|
280 |
8458
|
aaronmk
|
To view progress:
|
281 |
|
|
tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
|
282 |
14447
|
aaronmk
|
To re-run geoscrubbing:
|
283 |
|
|
$ screen
|
284 |
14871
|
aaronmk
|
# press Enter
|
285 |
14802
|
aaronmk
|
$ unset TMOUT # TMOUT causes shell to exit even with background processes
|
286 |
14530
|
aaronmk
|
# to use an import other than public: $ export version=<version>
|
287 |
|
|
$ bin/psql_verbose_vegbien <<<'SELECT geoscrub_input_view_modify();' &
|
288 |
14877
|
aaronmk
|
# runtime: 6 min ("6:02.30") @r14827 @vegbiendev
|
289 |
14531
|
aaronmk
|
# wait until done
|
290 |
14536
|
aaronmk
|
$ rm=1 exports/geoscrub_input.csv.run
|
291 |
14877
|
aaronmk
|
# runtime: 1 min ("1m2.962s") @r14827 @vegbiendev
|
292 |
14792
|
aaronmk
|
$ $0 # subshell to avoid closing screen on errexit
|
293 |
14447
|
aaronmk
|
$ rm=1 inputs/.geoscrub/geoscrub_output/geoscrub.csv.run &
|
294 |
14877
|
aaronmk
|
# runtime: 1.5 h ("84m55.408s") @r14827 @vegbiendev
|
295 |
14447
|
aaronmk
|
# wait until done
|
296 |
14798
|
aaronmk
|
$ rm=1 inputs/.geoscrub/geoscrub_output/run &
|
297 |
14877
|
aaronmk
|
# runtime: 12 min ("11m35.693s") @r14827 @vegbiendev
|
298 |
14447
|
aaronmk
|
# wait until done
|
299 |
14824
|
aaronmk
|
# re-create public-schema views that were cascadingly deleted (currently
|
300 |
|
|
plot.**, view_full_occurrence_individual_view, geoscrub_input_new)
|
301 |
14447
|
aaronmk
|
# press Ctrl+D
|
302 |
|
|
# remake the analytical DB (below)
|
303 |
9497
|
aaronmk
|
To remake analytical DB:
|
304 |
9996
|
aaronmk
|
To use an import other than public: export version=<version>
|
305 |
11089
|
aaronmk
|
bin/make_analytical_db & # runtime: 13 h ("12:43:57elapsed")
|
306 |
8458
|
aaronmk
|
To view progress:
|
307 |
10600
|
aaronmk
|
tail -150 inputs/analytical_db/logs/make_analytical_db.log.sql
|
308 |
8458
|
aaronmk
|
To back up DB (staging tables and last import):
|
309 |
10578
|
aaronmk
|
To use an import *other than public*: export version=<version>
|
310 |
10743
|
aaronmk
|
make backups/TNRS.backup-remake &
|
311 |
10577
|
aaronmk
|
dump_opts=--exclude-schema=public make backups/vegbien.$version.backup/test &
|
312 |
10578
|
aaronmk
|
If after renaming to public, instead set dump_opts='' and replace
|
313 |
|
|
$version with the appropriate revision
|
314 |
10744
|
aaronmk
|
make backups/upload live=1
|
315 |
3381
|
aaronmk
|
|
316 |
1773
|
aaronmk
|
Datasource setup:
|
317 |
11516
|
aaronmk
|
On local machine:
|
318 |
11090
|
aaronmk
|
Example steps for a datasource: wiki.vegpath.org/Import_process_for_Madidi
|
319 |
8469
|
aaronmk
|
umask ug=rwx,o= # prevent files from becoming web-accessible
|
320 |
8458
|
aaronmk
|
Add a new datasource: make inputs/<datasrc>/add
|
321 |
|
|
<datasrc> may not contain spaces, and should be abbreviated.
|
322 |
|
|
If the datasource is a herbarium, <datasrc> should be the herbarium code
|
323 |
|
|
as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
|
324 |
11018
|
aaronmk
|
For a new-style datasource (one containing a ./run runscript):
|
325 |
11019
|
aaronmk
|
"cp" -f inputs/.NCBI/{Makefile,run,table.run} inputs/<datasrc>/
|
326 |
8458
|
aaronmk
|
For MySQL inputs (exports and live DB connections):
|
327 |
|
|
For .sql exports:
|
328 |
|
|
Place the original .sql file in _src/ (*not* in _MySQL/)
|
329 |
|
|
Follow the steps starting with Install the staging tables below.
|
330 |
|
|
This is for an initial sync to get the file onto vegbiendev.
|
331 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
332 |
8458
|
aaronmk
|
Create a database for the MySQL export in phpMyAdmin
|
333 |
9494
|
aaronmk
|
Give the bien user all database-specific privileges *except*
|
334 |
|
|
UPDATE, DELETE, ALTER, DROP. This prevents bugs in the
|
335 |
|
|
import scripts from accidentally deleting data.
|
336 |
8458
|
aaronmk
|
bin/mysql_bien database <inputs/<datasrc>/_src/export.sql &
|
337 |
|
|
mkdir inputs/<datasrc>/_MySQL/
|
338 |
|
|
cp -p lib/MySQL.{data,schema}.sql.make inputs/<datasrc>/_MySQL/
|
339 |
|
|
Edit _MySQL/*.make for the DB connection
|
340 |
|
|
For a .sql export, use server=vegbiendev and --user=bien
|
341 |
|
|
Skip the Add input data for each table section
|
342 |
|
|
For MS Access databases:
|
343 |
|
|
Place the .mdb or .accdb file in _src/
|
344 |
14661
|
aaronmk
|
Download and install Bullzip's MS Access to PostgreSQL from
|
345 |
|
|
http://bullzip.com/download.php > Access To PostgreSQL > Download
|
346 |
8458
|
aaronmk
|
Use Access To PostgreSQL to export the database:
|
347 |
|
|
Export just the tables/indexes to inputs/<datasrc>/<file>.schema.sql
|
348 |
14662
|
aaronmk
|
using the settings in the associated .ini file where available
|
349 |
|
|
Export just the data to inputs/<datasrc>/<file>.data.sql using the
|
350 |
|
|
settings in the associated .ini file where available
|
351 |
8458
|
aaronmk
|
In <file>.schema.sql, make the following changes:
|
352 |
14813
|
aaronmk
|
Replace text "^CREATE DATABASE .*?;$" with "/*$0*/"
|
353 |
8458
|
aaronmk
|
Replace text "BOOLEAN" with "/*BOOLEAN*/INTEGER"
|
354 |
|
|
Replace text "DOUBLE PRECISION NULL" with "DOUBLE PRECISION"
|
355 |
|
|
Skip the Add input data for each table section
|
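The three text replacements listed above for `<file>.schema.sql` can be applied with `sed` rather than by hand in an editor. This is a sketch under stated assumptions: the helper name `fix_access_schema` is hypothetical, the file is assumed to use Unix line endings, and each `CREATE DATABASE` statement is assumed to sit on a single line.

```shell
#!/usr/bin/env bash
# fix_access_schema (hypothetical helper): apply the three replacements
# listed above to a schema read from stdin, writing the result to stdout.
fix_access_schema() {
    sed \
        -e 's|^CREATE DATABASE .*;$|/*&*/|' \
        -e 's|BOOLEAN|/*BOOLEAN*/INTEGER|g' \
        -e 's|DOUBLE PRECISION NULL|DOUBLE PRECISION|g'
}

# Example (review the output before replacing the original file):
#   fix_access_schema < file.schema.sql > file.schema.sql.new
```

The filter is not idempotent: running it twice would rewrite the `BOOLEAN` inside the comment again, so apply it once to the freshly exported file.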
356 |
|
|
Add input data for each table present in the datasource:
|
357 |
|
|
For .sql exports, you must use the name of the table in the DB export
|
358 |
|
|
For CSV files, you can use any name. It's recommended to use a table
|
359 |
|
|
name from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>
|
360 |
|
|
Note that if this table will be joined together with another table, its
|
361 |
|
|
name must end in ".src"
|
362 |
|
|
make inputs/<datasrc>/<table>/add
|
363 |
|
|
Important: DO NOT just create an empty directory named <table>!
|
364 |
|
|
This command also creates necessary subdirs, such as logs/.
|
365 |
|
|
If the table is in a .sql export: make inputs/<datasrc>/<table>/install
|
366 |
|
|
Otherwise, place the CSV(s) for the table in
|
367 |
|
|
inputs/<datasrc>/<table>/ OR place a query joining other tables
|
368 |
|
|
together in inputs/<datasrc>/<table>/create.sql
|
369 |
|
|
Important: When exporting relational databases to CSVs, you MUST ensure
|
370 |
|
|
that embedded quotes are escaped by doubling them, *not* by
|
371 |
|
|
preceding them with a "\" as is the default in phpMyAdmin
|
372 |
|
|
If there are multiple part files for a table, and the header is repeated
|
373 |
|
|
in each part, make sure each header is EXACTLY the same.
|
374 |
8466
|
aaronmk
|
(If the headers are not the same, the CSV concatenation script
|
375 |
|
|
assumes the part files don't have individual headers and treats the
|
376 |
|
|
subsequent headers as data rows.)
|
377 |
8458
|
aaronmk
|
Add <table> to inputs/<datasrc>/import_order.txt before other tables
|
378 |
|
|
that depend on it
|
379 |
11018
|
aaronmk
|
For a new-style datasource:
|
380 |
|
|
"cp" -f inputs/.NCBI/nodes/run inputs/<datasrc>/<table>/
|
381 |
|
|
inputs/<datasrc>/<table>/run
|
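The quote-escaping requirement above (embedded quotes doubled, not preceded by a backslash) is best met by changing the export settings. If a CSV has already been exported with backslash escapes, a naive repair might look like the following sketch. The helper name `double_csv_quotes` is hypothetical, and the filter would mishandle a field that legitimately ends in a backslash.

```shell
#!/usr/bin/env bash
# double_csv_quotes (hypothetical helper): rewrite backslash-escaped quotes
# (\") as doubled quotes ("") in CSV text read from stdin.
double_csv_quotes() {
    sed 's/\\"/""/g'
}

# Example:
#   double_csv_quotes < export.csv > export.fixed.csv
```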
382 |
8458
|
aaronmk
|
Install the staging tables:
|
383 |
|
|
make inputs/<datasrc>/reinstall quiet=1 &
|
384 |
|
|
For a MySQL .sql export:
|
385 |
|
|
At prompt "[you]@vegbiendev's password:", enter your password
|
386 |
|
|
At prompt "Enter password:", enter the value in config/bien_password
|
387 |
|
|
To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
|
388 |
|
|
View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
|
389 |
|
|
tail provides a header line with the filename
|
390 |
|
|
+1 starts at the first line, to show the whole file
|
391 |
|
|
For every file with an error 'column "..." specified more than once':
|
392 |
|
|
Add a header override file "+header.<ext>" in <table>/:
|
393 |
|
|
Note: The leading "+" should sort it before the flat files.
|
394 |
|
|
"_" unfortunately sorts *after* capital letters in ASCII.
|
395 |
|
|
Create a text file containing the header line of the flat files
|
396 |
|
|
Add an ! at the beginning of the line
|
397 |
|
|
This signals cat_csv that this is a header override.
|
398 |
|
|
For empty names, use their 0-based column # (by convention)
|
399 |
|
|
For duplicate names, add a distinguishing suffix
|
400 |
|
|
For long names that collided, rename them to <= 63 chars long
|
401 |
|
|
Do NOT make readability changes in this step; that is what the
|
402 |
|
|
map spreadsheets (below) are for.
|
403 |
|
|
Save
|
404 |
|
|
If you made any changes, re-run the install command above
|
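The first two steps of the header override procedure above (copy the header line, prefix it with `!`) can be generated from an existing flat file. This is a sketch; the helper name `make_header_override` is hypothetical, and renaming the empty, duplicate, or overlong columns is still a manual edit afterwards.

```shell
#!/usr/bin/env bash
# make_header_override (hypothetical helper): print the first line of a flat
# file with "!" prepended, the marker that signals a header override.
make_header_override() {
    head -n 1 -- "$1" | sed 's/^/!/'
}

# Example (the file names are placeholders):
#   make_header_override part1.csv > +header.csv
```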
405 |
|
|
Auto-create the map spreadsheets: make inputs/<datasrc>/
|
406 |
|
|
Map each table's columns:
|
407 |
|
|
In each <table>/ subdir, for each "via map" map.csv:
|
408 |
|
|
Open the map in a spreadsheet editor
|
409 |
|
|
Open the "core map" /mappings/Veg+-VegBIEN.csv
|
410 |
|
|
In each row of the via map, set the right column to a value from the
|
411 |
|
|
left column of the core map
|
412 |
|
|
Save
|
413 |
|
|
Regenerate the derived maps: make inputs/<datasrc>/
|
414 |
|
|
Accept the test cases:
|
415 |
11018
|
aaronmk
|
For a new-style datasource:
|
416 |
|
|
inputs/<datasrc>/run
|
417 |
|
|
svn di inputs/<datasrc>/*/test.xml.ref
|
418 |
|
|
If you get errors, follow the steps for old-style datasources below
|
419 |
|
|
For an old-style datasource:
|
420 |
|
|
make inputs/<datasrc>/test
|
421 |
8458
|
aaronmk
|
When prompted to "Accept new test output", enter y and press ENTER
|
422 |
|
|
If you instead get errors, do one of the following for each one:
|
423 |
|
|
- If the error was due to a bug, fix it
|
424 |
|
|
- Add a SQL function that filters or transforms the invalid data
|
425 |
|
|
- Make an empty mapping for the columns that produced the error.
|
426 |
|
|
Put something in the Comments column of the map spreadsheet to
|
427 |
|
|
prevent the automatic mapper from auto-removing the mapping.
|
428 |
|
|
When accepting tests, it's helpful to use WinMerge
|
429 |
|
|
(see WinMerge setup below for configuration)
|
430 |
|
|
make inputs/<datasrc>/test by_col=1
|
431 |
|
|
If you get errors this time, this always indicates a bug, usually in
|
432 |
|
|
the VegBIEN unique constraints or column-based import itself
|
433 |
|
|
Add newly-created files: make inputs/<datasrc>/add
|
434 |
|
|
Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
|
435 |
|
|
Update vegbiendev:
|
436 |
12396
|
aaronmk
|
ssh aaronmk@jupiter.nceas.ucsb.edu
|
437 |
12998
|
aaronmk
|
up
|
438 |
8458
|
aaronmk
|
On local machine:
|
439 |
|
|
./fix_perms
|
440 |
|
|
make inputs/upload
|
441 |
10025
|
aaronmk
|
make inputs/upload live=1
|
442 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
443 |
12998
|
aaronmk
|
up
|
444 |
8458
|
aaronmk
|
make inputs/download
|
445 |
10025
|
aaronmk
|
make inputs/download live=1
|
446 |
8458
|
aaronmk
|
Follow the steps under Install the staging tables above
|
447 |
1773
|
aaronmk
|
|
448 |
10884
|
aaronmk
|
Maintenance:
|
449 |
|
|
on a live machine, you should put the following in your .profile:
|
450 |
|
|
--
|
451 |
|
|
# make svn files web-accessible. this does not affect unversioned files, because
|
452 |
|
|
# these get the right permissions on the local machine instead.
|
453 |
|
|
umask ug=rwx,o=rx
|
454 |
|
|
|
455 |
|
|
unset TMOUT # TMOUT causes screen to exit even with background processes
|
456 |
|
|
--
|
457 |
|
|
if http://vegbiendev.nceas.ucsb.edu/phppgadmin/ goes down:
|
458 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
459 |
12548
|
aaronmk
|
make phppgadmin-Linux
|
460 |
13027
|
aaronmk
|
regularly, re-run full-database import so that bugs in it don't pile up.
|
461 |
|
|
it needs to be kept in working order so that it works when it's needed.
|
462 |
13466
|
aaronmk
|
to back up the vegbiendev databases:
|
463 |
|
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
464 |
|
|
back up MySQL: # usually few changes, so do this first
|
465 |
14554
|
aaronmk
|
backups/mysql_snapshot
|
466 |
13466
|
aaronmk
|
l=1 overwrite=1 inplace=1 local_dir=/ remote_url="$USER@jupiter:/data/dev/aaronmk/Documents/BIEN/" subpath=/var/lib/mysql.bak/ sudo -E env PATH="$PATH" bin/sync_upload
|
467 |
|
|
on local machine:
|
468 |
|
|
l=1 swap=1 overwrite=1 inplace=1 local_dir=~ sync_remote_subdir= subpath=~/Documents/BIEN/var/lib/mysql.bak/ bin/sync_upload
|
469 |
|
|
back up Postgres:
|
470 |
14892
|
aaronmk
|
$ screen
|
471 |
|
|
# press Enter
|
472 |
|
|
$ unset TMOUT # TMOUT causes shell to exit even with background processes
|
473 |
14894
|
aaronmk
|
$ backups/pg_snapshot # runtime when queries have been run: 1 h ("64m11.586s")
|
474 |
10884
|
aaronmk
|
to synchronize vegbiendev, jupiter, and your local machine:
|
475 |
12226
|
aaronmk
|
**WARNING**: pay careful attention to all files that will be deleted or
|
476 |
10884
|
aaronmk
|
overwritten!
|
477 |
|
|
install put if needed:
|
478 |
|
|
download https://uutils.googlecode.com/svn/trunk/bin/put to ~/bin/ and `chmod +x` it
|
479 |
|
|
when changes are made on vegbiendev:
|
480 |
12951
|
aaronmk
|
avoid extraneous diffs when rsyncing:
|
481 |
14670
|
aaronmk
|
on local machine:
|
482 |
|
|
up; ./fix_perms
|
483 |
|
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
484 |
|
|
up; ./fix_perms
|
485 |
|
|
ssh aaronmk@jupiter.nceas.ucsb.edu
|
486 |
|
|
up; ./fix_perms
|
487 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
488 |
12396
|
aaronmk
|
upload:
|
489 |
14888
|
aaronmk
|
del=1 bin/sync_upload --size-only
|
490 |
13337
|
aaronmk
|
then review diff, and rerun with `l=1` prepended
|
491 |
12396
|
aaronmk
|
on your machine:
|
492 |
|
|
download:
|
493 |
14889
|
aaronmk
|
overwrite=1 swap=1 src=. dest='aaronmk@jupiter.nceas.ucsb.edu:~/bien' put --exclude=.svn web/BIEN3/TWiki; ./fix_perms
|
494 |
13337
|
aaronmk
|
then review diff, and rerun with `l=1` prepended
|
495 |
12957
|
aaronmk
|
swap=1 bin/sync_upload backups/TNRS.backup
|
496 |
13337
|
aaronmk
|
then review diff, and rerun with `l=1` prepended
|
497 |
14888
|
aaronmk
|
del=1 swap=1 bin/sync_upload --size-only
|
498 |
13337
|
aaronmk
|
then review diff, and rerun with `l=1` prepended
|
499 |
14891
|
aaronmk
|
sync_remote_url=~/Dropbox/svn/ bin/sync_upload --existing --size-only --no-perms
|
500 |
14890
|
aaronmk
|
# --size-only: just update mtimes
|
501 |
|
|
# --no-perms: don't transfer the hardlink lock status
|
502 |
14891
|
aaronmk
|
# no overwrite=1: preserve uncommitted changes
|
503 |
13337
|
aaronmk
|
then review diff, and rerun with `l=1` prepended
|
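The "review diff, then rerun with `l=1` prepended" pattern used throughout these steps can be sketched as a small wrapper. This is only an illustration: `review_then_live` is a hypothetical helper that is not part of the repository, and it assumes the wrapped command does a dry run unless `l=1` is set in its environment.

```shell
# review_then_live CMD [ARGS...]: hypothetical helper, not part of the repository.
# Runs CMD once as written (assumed to be a dry run that prints the diff),
# asks for confirmation, then reruns it with l=1 prepended.
review_then_live() {
    "$@" || return              # dry run: review the diff it prints
    printf 'Rerun with l=1? [y/N] ' >&2
    read -r answer || answer=n
    case "$answer" in
        y|Y) l=1 "$@" ;;        # live run
        *)   echo 'Live run skipped' >&2; return 1 ;;
    esac
}
```

To pass other settings such as `del=1`, put them after `env`, e.g. `review_then_live env del=1 bin/sync_upload --size-only`, so that both runs see them.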
504 |
12959
|
aaronmk
|
to back up e-mails:
|
505 |
|
|
on local machine:
|
506 |
|
|
/Applications/gmvault-v1.8.1-beta/bin/gmvault sync --multiple-db-owner --type quick aaronmk.nceas@gmail.com
|
507 |
|
|
open Thunderbird
|
508 |
|
|
click the All Mail folder for each account and wait for it to download the e-mails in it
|
509 |
|
|
to back up the version history:
|
510 |
13333
|
aaronmk
|
# back up first on the local machine, because often only the svnsync
|
511 |
|
|
command gets run, and that way it will get backed up immediately to
|
512 |
|
|
Dropbox (and hourly to Time Machine), while vegbiendev only gets
|
513 |
|
|
backed up daily to tape
|
514 |
|
|
on local machine:
|
515 |
13331
|
aaronmk
|
svnsync sync file://"$HOME"/Dropbox/docs/BIEN/svn_repo/ # initial runtime: 1.5 h ("08:21:38" - "06:45:26") @vegbiendev
|
516 |
12959
|
aaronmk
|
(cd ~/Dropbox/docs/BIEN/git/; git svn fetch)
|
517 |
14565
|
aaronmk
|
# use absolute path for vegbiendev commands because the Ubuntu 14.04
|
518 |
|
|
version of rsync doesn't expand ~ properly
|
519 |
13332
|
aaronmk
|
overwrite=1 src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:/data/dev/aaronmk/' put Dropbox/docs/BIEN/svn_repo/ # runtime: 1 min ("1:05.08")
|
520 |
13337
|
aaronmk
|
then review diff, and rerun with `l=1` prepended
|
521 |
13332
|
aaronmk
|
overwrite=1 src=~ dest='aaronmk@jupiter.nceas.ucsb.edu:/data/dev/aaronmk/' put Dropbox/docs/BIEN/git/
|
522 |
13337
|
aaronmk
|
then review diff, and rerun with `l=1` prepended
|
523 |
14553
|
aaronmk
|
to back up vegbiendev:
|
524 |
14568
|
aaronmk
|
do steps under Maintenance > "to synchronize vegbiendev, jupiter, and
|
525 |
|
|
your local machine"
|
526 |
14553
|
aaronmk
|
on local machine:
|
527 |
14912
|
aaronmk
|
# **IMPORTANT**: can't use inplace=1 optimization because this
|
528 |
|
|
# messes up the shared hardlinks used by
|
529 |
|
|
# ~/Documents/BIEN/vegbiendev.2014-2-2_1-07-32PT*/
|
530 |
|
|
l=1 overwrite=1 src=root@vegbiendev.nceas.ucsb.edu:/ dest=~/Documents/BIEN/vegbiendev/ sudo -E put --exclude=/var/lib/mysql.bak --exclude=/var/lib/postgresql.bak --exclude='/var/lib/postgresql/9.3/main/*/' --exclude=/home/aaronmk/bien
|
531 |
14601
|
aaronmk
|
# enable --link-dest to work:
|
532 |
|
|
chmod -R o+r ~/bien/.svn/; find ~/bien/.svn -type d -exec chmod o+rx {} \; # match perms
|
533 |
|
|
l=1 overwrite=1 del= src='aaronmk@vegbiendev.nceas.ucsb.edu:~/bien/' dest=~/bien/ put --existing --size-only .svn/pristine/ # match times and perms
|
534 |
14912
|
aaronmk
|
l=1 overwrite=1 src=aaronmk@vegbiendev.nceas.ucsb.edu:/ dest=~/Documents/BIEN/vegbiendev/ sudo -E put --link-dest="$HOME"/Documents/BIEN/svn/ --no-owner --no-group home/aaronmk/bien/
|
535 |
14601
|
aaronmk
|
# --no-owner --no-group: needed to allow --link-dest to work
|
536 |
|
|
# --link-dest: relative to dest, not currdir, so need abs path
|
537 |
14911
|
aaronmk
|
./fix_perms # lock hardlinked files
|
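The `--link-dest` notes above can be illustrated with plain rsync. This is a minimal sketch, assuming `put` passes these options through to rsync; `snapshot_with_links` and its arguments are hypothetical names used only for illustration.

```shell
# snapshot_with_links SRC PREV NEW: hypothetical helper illustrating --link-dest.
# Copies SRC into NEW, hardlinking files that are unchanged relative to PREV.
snapshot_with_links() {
    src=$1 prev=$2 new=$3
    # Resolve PREV to an absolute path: rsync interprets a relative
    # --link-dest relative to the destination directory, not the current one
    prev_abs=$(cd "$prev" && pwd) || return
    # --no-owner --no-group: do not let differing ownership defeat the
    # "unchanged" comparison that decides whether a file gets hardlinked
    rsync -a --no-owner --no-group --link-dest="$prev_abs" "$src"/ "$new"/
}
```

After a run, `[ NEW/file -ef PREV/file ]` succeeds for every file that was hardlinked rather than copied.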
538 |
14399
|
aaronmk
|
to back up the local machine's settings:
|
539 |
|
|
do the step under "when changes are made on vegbiendev" > "on your machine" > "download"
|
540 |
|
|
ssh aaronmk@jupiter.nceas.ucsb.edu
|
541 |
|
|
(cd ~/Dropbox/svn/; up)
|
542 |
|
|
on your machine:
|
543 |
14549
|
aaronmk
|
sudo find / -name .DS_Store -print -delete
|
544 |
14399
|
aaronmk
|
rm ~/'Library/Thunderbird/Profiles/9oo8rcyn.default/ImapMail/imap.googlemail.com/[Gmail].sbd/Spam'
|
545 |
|
|
# remove the downloaded Spam folder, because spam e-mails often contain viruses that would trigger clamscan
|
546 |
14548
|
aaronmk
|
overwrite=1 sync_local_dir=~/Dropbox/svn/ sync_remote_subdir=Dropbox/svn/ bin/sync_upload --size-only # just update mtimes
|
547 |
14399
|
aaronmk
|
then review diff, and rerun with `l=1` prepended
|
548 |
14738
|
aaronmk
|
overwrite=1 inplace=1 sync_local_dir=~/ sync_remote_subdir= bin/sync_upload ~/"VirtualBox VMs/**" # need inplace=1 because they are very large files
|
549 |
14399
|
aaronmk
|
then review diff, and rerun with `l=1` prepended
|
550 |
14738
|
aaronmk
|
overwrite=1 sync_local_dir=~/ sync_remote_subdir= sudo -E bin/sync_upload --exclude="/Library/Saved Application State/" --exclude="/.Trash/" --exclude="/bin/" --exclude="/bin/pg_ctl" --exclude="/bin/unzip" --exclude="/Dropbox/home/" --exclude="/.profile" --exclude="/.shrc" --exclude="/.bashrc" --exclude="/software/**/.svn/"
|
551 |
14693
|
aaronmk
|
# sudo -E: needed for Documents/BIEN/vegbiendev*/
|
552 |
14399
|
aaronmk
|
then review diff, and rerun with `l=1` prepended
|
553 |
14692
|
aaronmk
|
pause Dropbox: system tray > Dropbox icon > gear icon > Pause Syncing
|
554 |
14399
|
aaronmk
|
this prevents Dropbox from trying to capture filesystem
|
555 |
|
|
events while syncing
|
556 |
14738
|
aaronmk
|
overwrite=1 sync_local_dir=~/ sync_remote_url=~/Dropbox/home/ bin/sync_upload --exclude="/Library/Saved Application State/" --exclude="/.Trash/" --exclude="/.dropbox/" --exclude="/Documents/BIEN/" --exclude="/Dropbox/" --exclude=/gmvault-db/ --exclude="/software/" --exclude="/VirtualBox VMs/**.sav" --exclude="/VirtualBox VMs/**.vdi" --exclude="/VirtualBox VMs/**.vmdk"
|
557 |
14399
|
aaronmk
|
then review diff, and rerun with `l=1` prepended
|
558 |
14692
|
aaronmk
|
resume Dropbox: system tray > Dropbox icon > gear icon > Resume Syncing
|
559 |
10884
|
aaronmk
|
to back up files not in Time Machine:
|
560 |
14667
|
aaronmk
|
**IMPORTANT**: need to use 2 TB external hard drive instead of Time
|
561 |
|
|
Machine drive because Time Machine drive does not have
|
562 |
|
|
~/Documents/BIEN/ in a location where it can be hardlinked against
|
563 |
11516
|
aaronmk
|
On local machine:
|
564 |
14656
|
aaronmk
|
on first run, create parent dirs:
|
565 |
14667
|
aaronmk
|
sudo mkdir -p '/Volumes/BIEN3.**SAVE**/Users/aaronmk/Documents/BIEN/'
|
566 |
|
|
sudo mkdir -p '/Volumes/BIEN3.**SAVE**/usr/local/var/postgres/'
|
567 |
|
|
l=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put --existing
|
568 |
|
|
l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put --include='/vegbiendev**' --exclude='**' Users/aaronmk/Documents/BIEN/
|
569 |
14660
|
aaronmk
|
# this cannot be backed up by Time Machine because Time Machine dereferences hard links:
|
570 |
|
|
# `sudo find /Volumes/Time\ Machine\ Backups/Backups.backupdb/ ! -type d -links +1`
|
571 |
|
|
# returns no files when there is a single timestamped backup, but
|
572 |
|
|
# `sudo find / ! -type d -links +1` does
|
573 |
14667
|
aaronmk
|
l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put usr/local/var/postgres/
|
574 |
14655
|
aaronmk
|
# this cannot be backed up by Time Machine because it prevents the backup process from ending
|
575 |
14916
|
aaronmk
|
brew services stop postgresql # if this doesn't work, run `rm /usr/local/var/postgres/postmaster.pid` and retry
|
576 |
14667
|
aaronmk
|
l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put usr/local/var/postgres/
|
577 |
14916
|
aaronmk
|
brew services start postgresql # if this doesn't work, run `rm /usr/local/var/postgres/postmaster.pid` and retry
|
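The hard-link check quoted in the comments of this section (`find ... ! -type d -links +1`) can be wrapped in a small function to compare a source tree against its backup. This is a sketch only; `count_hardlinked` is a hypothetical name, and file names containing newlines would be miscounted.

```shell
# count_hardlinked DIR: print how many non-directory entries under DIR
# have more than one hard link
count_hardlinked() {
    find "$1" ! -type d -links +1 | wc -l | tr -d ' '
}
```

If the count for the backup copy is 0 while the count for the original is not, the hard links were not preserved.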
578 |
14400
|
aaronmk
|
to back up the local machine's hard drive:
|
579 |
|
|
turn on and connect the 2 TB external hard drive
|
580 |
14917
|
aaronmk
|
# open "/Applications/Utilities/Disk Utility.app"
|
581 |
|
|
# resize the BIEN3.**SAVE** partition so there is enough space
|
582 |
14608
|
aaronmk
|
screen
|
583 |
14623
|
aaronmk
|
# --exclude='/\**': exclude *-files indicating the (differing) retention
|
584 |
|
|
# statuses of the partitions involved
|
585 |
14919
|
aaronmk
|
# don't need to pause Dropbox, since it will be reinitialized for the
|
586 |
|
|
# new partition UUID anyway
|
587 |
14920
|
aaronmk
|
# **IMPORTANT**: turn off Time Machine. This prevents Time Machine from
|
588 |
|
|
# running when the backup is booted, which messes up the Time Machine
|
589 |
|
|
# backup because the partition UUID is different.
|
590 |
14915
|
aaronmk
|
brew services stop postgresql # if this doesn't work, run `rm /usr/local/var/postgres/postmaster.pid` and retry
|
591 |
14627
|
aaronmk
|
l=1 overwrite=1 src=/ dest='/Volumes/BIEN3.**SAVE**/' sudo -E put --exclude='/\**' --exclude=/.fseventsd/ --exclude=/private/var/vm/
|
592 |
14696
|
aaronmk
|
# no --extended-attributes: rsync has to visit every file for this
|
593 |
|
|
# runtime: 10 min (~600); initial runtime: 4-13 h ("2422.84"+"12379.91" .. "45813.19"+"747.96")
|
594 |
14915
|
aaronmk
|
brew services start postgresql # if this doesn't work, run `rm /usr/local/var/postgres/postmaster.pid` and retry
|
595 |
14920
|
aaronmk
|
# turn on Time Machine
|
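The stop, back up, restart sequence above can be sketched as a wrapper that restarts PostgreSQL even when the backup command fails. `with_postgres_stopped`, `STOP_CMD`, and `START_CMD` are hypothetical names; the defaults are the `brew services` commands from the steps above.

```shell
# with_postgres_stopped CMD [ARGS...]: hypothetical wrapper.
# STOP_CMD and START_CMD can be overridden; they default to the commands
# used in the steps above.
with_postgres_stopped() {
    ${STOP_CMD:-brew services stop postgresql} || return
    "$@"                        # the backup command
    status=$?
    # restart regardless of whether the backup succeeded
    ${START_CMD:-brew services start postgresql} || return
    return "$status"            # report the backup's exit status
}
```

Pass the backup command and its arguments after the function name; the function's exit status is that of the backup command unless the restart itself fails.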
596 |
14401
|
aaronmk
|
to restore from Time Machine:
|
597 |
|
|
# restart holding Alt
|
598 |
|
|
# select Time Machine Backups
|
599 |
|
|
# restore the last Time Machine backup to Macintosh HD
|
600 |
|
|
# restart holding Alt
|
601 |
|
|
# select Macintosh HD
|
602 |
|
|
$ screen
|
603 |
14607
|
aaronmk
|
$ l=1 swap=1 src=/ dest=/Volumes/Time\ Machine\ Backups/ sudo -E put usr/local/var/postgres/ # runtime: 1 h ("4020.61")
|
604 |
14401
|
aaronmk
|
$ make postgres_restart
|
605 |
10884
|
aaronmk
|
VegCore data dictionary:
|
606 |
|
|
Regularly, or whenever the VegCore data dictionary page
|
607 |
|
|
(https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore)
|
608 |
|
|
is changed, regenerate mappings/VegCore.csv:
|
609 |
11516
|
aaronmk
|
On local machine:
|
610 |
10884
|
aaronmk
|
make mappings/VegCore.htm-remake; make mappings/
|
611 |
12716
|
aaronmk
|
apply new data dict mappings to datasource mappings/staging tables:
|
612 |
12883
|
aaronmk
|
inputs/run postprocess # runtime: see inputs/run
|
613 |
12887
|
aaronmk
|
time yes|make inputs/{NVS,SALVIAS,TEAM}/test # old-style import; runtime: 1 min ("0m59.692s") @starscream
|
614 |
10884
|
aaronmk
|
svn di mappings/VegCore.tables.redmine
|
615 |
|
|
If there are changes, update the data dictionary's Tables section
|
616 |
|
|
When moving terms, check that no terms were lost: svn di
|
617 |
|
|
svn ci -m 'mappings/VegCore.htm: regenerated from wiki'
|
618 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
619 |
12717
|
aaronmk
|
perform the steps under "apply new data dict mappings to
|
620 |
|
|
datasource mappings/staging tables" above
|
621 |
10884
|
aaronmk
|
Important: Whenever you install a system update that affects PostgreSQL or
|
622 |
|
|
any of its dependencies, such as libc, you should restart the PostgreSQL
|
623 |
|
|
server. Otherwise, you may get strange errors like "the database system
|
624 |
|
|
is in recovery mode" which go away upon reimport, or you may not be able
|
625 |
|
|
to access the database as the postgres superuser. This applies to both
|
626 |
|
|
Linux and Mac OS X.
|
627 |
|
|
|
628 |
|
|
Backups:
|
629 |
|
|
Archived imports:
|
630 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
631 |
10884
|
aaronmk
|
Back up: make backups/<version>.backup &
|
632 |
|
|
Note: To back up the last import, you must archive it first:
|
633 |
|
|
make schemas/rotate
|
634 |
|
|
Test: make -s backups/<version>.backup/test &
|
635 |
|
|
Restore: make backups/<version>.backup/restore &
|
636 |
|
|
Remove: make backups/<version>.backup/remove
|
637 |
|
|
Download: make backups/<version>.backup/download
|
638 |
|
|
TNRS cache:
|
639 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
640 |
10884
|
aaronmk
|
Back up: make backups/TNRS.backup-remake &
|
641 |
|
|
runtime: 3 min ("real 2m48.859s")
|
642 |
|
|
Restore:
|
643 |
|
|
yes|make inputs/.TNRS/uninstall
|
644 |
|
|
make backups/TNRS.backup/restore &
|
645 |
|
|
runtime: 5.5 min ("real 5m35.829s")
|
646 |
|
|
yes|make schemas/public/reinstall
|
647 |
|
|
Must come after TNRS restore to recreate tnrs_input_name view
|
648 |
|
|
Full DB:
|
649 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
650 |
10884
|
aaronmk
|
Back up: make backups/vegbien.<version>.backup &
|
651 |
|
|
Test: make -s backups/vegbien.<version>.backup/test &
|
652 |
|
|
Restore: make backups/vegbien.<version>.backup/restore &
|
653 |
|
|
Download: make backups/vegbien.<version>.backup/download
|
654 |
|
|
Import logs:
|
655 |
11516
|
aaronmk
|
On local machine:
|
656 |
10884
|
aaronmk
|
Download: make inputs/download-logs live=1
|
657 |
|
|
|
658 |
6484
|
aaronmk
|
Datasource refreshing:
|
659 |
8458
|
aaronmk
|
VegBank:
|
660 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
661 |
8458
|
aaronmk
|
make inputs/VegBank/vegbank.sql-remake
|
662 |
|
|
make inputs/VegBank/reinstall quiet=1 &
|
663 |
6484
|
aaronmk
|
|
664 |
702
|
aaronmk
|
Schema changes:
|
665 |
11516
|
aaronmk
|
On local machine:
|
666 |
8458
|
aaronmk
|
When changing the analytical views, run sync_analytical_..._to_view()
|
667 |
|
|
to update the corresponding table
|
668 |
|
|
Remember to update the following files with any renamings:
|
669 |
|
|
schemas/filter_ERD.csv
|
670 |
|
|
mappings/VegCore-VegBIEN.csv
|
671 |
|
|
mappings/verify.*.sql
|
672 |
|
|
Regenerate schema from installed DB: make schemas/remake
|
673 |
|
|
Reinstall DB from schema: make schemas/public/reinstall schemas/reinstall
|
674 |
12226
|
aaronmk
|
**WARNING**: This will delete the public schema of your VegBIEN DB!
|
675 |
12227
|
aaronmk
|
If needed, reinstall staging tables:
|
676 |
8837
|
aaronmk
|
On local machine:
|
677 |
8840
|
aaronmk
|
sudo -E -u postgres psql <<<'ALTER DATABASE vegbien RENAME TO vegbien_prev'
|
678 |
8845
|
aaronmk
|
make db
|
679 |
8837
|
aaronmk
|
. bin/reinstall_all
|
680 |
|
|
Fix any bugs and retry until no errors
|
681 |
8846
|
aaronmk
|
make schemas/public/install
|
682 |
|
|
This must be run *after* the datasources are installed, because
|
683 |
|
|
views in public depend on some of the datasources
|
684 |
8842
|
aaronmk
|
sudo -E -u postgres psql <<<'DROP DATABASE vegbien_prev'
|
685 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
686 |
12396
|
aaronmk
|
repeat the above steps
|
687 |
12226
|
aaronmk
|
**WARNING**: Do not run this until reinstall_all runs successfully
|
688 |
|
|
on the local machine, or the live DB may be unrestorable!
|
689 |
12927
|
aaronmk
|
update mappings and staging table column names:
|
690 |
12881
|
aaronmk
|
on local machine:
|
691 |
12883
|
aaronmk
|
inputs/run postprocess # runtime: see inputs/run
|
692 |
12887
|
aaronmk
|
time yes|make inputs/{NVS,SALVIAS,TEAM}/test # old-style import; runtime: 1 min ("0m59.692s") @starscream
|
693 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
694 |
12928
|
aaronmk
|
manually apply schema changes to the live public schema
|
695 |
12929
|
aaronmk
|
do steps under "on local machine" above
|
696 |
8458
|
aaronmk
|
Sync ERD with vegbien.sql schema:
|
697 |
|
|
Run make schemas/vegbien.my.sql
|
698 |
|
|
Open schemas/vegbien.ERD.mwb in MySQLWorkbench
|
699 |
|
|
Go to File > Export > Synchronize With SQL CREATE Script...
|
700 |
|
|
For Input File, select schemas/vegbien.my.sql
|
701 |
|
|
Click Continue
|
702 |
|
|
In the changes list, select each table with an arrow next to it
|
703 |
|
|
Click Update Model
|
704 |
|
|
Click Continue
|
705 |
|
|
Note: The generated SQL script will be empty because we are syncing in
|
706 |
|
|
the opposite direction
|
707 |
|
|
Click Execute
|
708 |
|
|
Reposition any lines that have been reset
|
709 |
|
|
Add any new tables by dragging them from the Catalog in the left sidebar
|
710 |
|
|
to the diagram
|
711 |
|
|
Remove any deleted tables by right-clicking the table's diagram element,
|
712 |
|
|
selecting Delete '<table name>', and clicking Delete
|
713 |
|
|
Save
|
714 |
|
|
If desired, update the graphical ERD exports (see below)
|
715 |
|
|
Update graphical ERD exports:
|
716 |
|
|
Go to File > Export > Export as PNG...
|
717 |
|
|
Select schemas/vegbien.ERD.png and click Save
|
718 |
|
|
Go to File > Export > Export as SVG...
|
719 |
|
|
Select schemas/vegbien.ERD.svg and click Save
|
720 |
|
|
Go to File > Export > Export as Single Page PDF...
|
721 |
|
|
Select schemas/vegbien.ERD.1_pg.pdf and click Save
|
722 |
|
|
Go to File > Print...
|
723 |
|
|
In the lower left corner, click PDF > Save as PDF...
|
724 |
|
|
Set the Title and Author to ""
|
725 |
|
|
Select schemas/vegbien.ERD.pdf and click Save
|
726 |
|
|
Commit: svn ci -m "schemas/vegbien.ERD.mwb: Regenerated exports"
|
727 |
|
|
Refactoring tips:
|
728 |
|
|
To rename a table:
|
729 |
|
|
In vegbien.sql, do the following:
|
730 |
|
|
Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
|
731 |
|
|
This is necessary because the table name is *everywhere*
|
732 |
|
|
Search for <new>
|
733 |
|
|
Manually change back any replacements inside comments
|
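The table-rename replacement above can be tried on the command line before applying it in an editor. This is a sketch, assuming perl is installed; `rename_ident` is a hypothetical name. The lookarounds are split into `(?:(?<=_)|\b)` and `(?:(?=_)|\b)` because some regexp engines reject a lookbehind whose alternatives have different widths; the intended match is the same.

```shell
# rename_ident OLD NEW: filter stdin, replacing OLD with NEW wherever OLD is
# delimited by an underscore or a word boundary on each side
rename_ident() {
    OLD=$1 NEW=$2 perl -pe 's/(?:(?<=_)|\b)\Q$ENV{OLD}\E(?:(?=_)|\b)/$ENV{NEW}/g'
}
```

For example, `rename_ident old new <vegbien.sql` would rewrite `old_id` and `my_old` but leave `fold` and `older` alone. Replacements inside comments still have to be reverted by hand, as noted above.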
734 |
|
|
To rename a column:
|
735 |
|
|
Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
|
736 |
|
|
Recreate any foreign key for the column, removing CONSTRAINT <name>
|
737 |
|
|
This resets the foreign key name using the new column name
|
738 |
|
|
Creating a poster of the ERD:
|
739 |
|
|
Determine the poster size:
|
740 |
|
|
Measure the line height (from the bottom of one line to the bottom
|
741 |
|
|
of another): 16.3cm/24 lines = 0.679cm
|
742 |
|
|
Measure the height of the ERD: 35.4cm*2 = 70.8cm
|
743 |
|
|
Zoom in as far as possible
|
744 |
|
|
Measure the height of a capital letter: 3.5mm
|
745 |
|
|
Measure the line height: 8.5mm
|
746 |
|
|
Calculate the text's fraction of the line height: 3.5mm/8.5mm = 0.41
|
747 |
|
|
Calculate the text height: 0.679cm*0.41 = 0.28cm
|
748 |
|
|
Calculate the text height's fraction of the ERD height:
|
749 |
|
|
0.28cm/70.8cm = 0.0040
|
750 |
|
|
Measure the text height on the *VegBank* ERD poster: 5.5mm = 0.55cm
|
751 |
|
|
Calculate the VegBIEN poster height to make the text the same size:
|
752 |
|
|
0.55cm/0.0040 = 137.5cm H; *1in/2.54cm = 54.1in H
|
753 |
|
|
The ERD aspect ratio is 11 in W x (2*8.5in H) = 11x17 portrait
|
754 |
|
|
Calculate the VegBIEN poster width: 54.1in H*11W/17H = 35.0in W
|
755 |
|
|
The minimum VegBIEN poster size is 35x54in portrait
|
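The poster-size arithmetic above can be reproduced with awk. This sketch uses only the measurements listed above; `poster_size` is a hypothetical name. It carries intermediate values at full precision instead of rounding each step, so the result comes out slightly larger than the hand-rounded figures.

```shell
# poster_size: recompute the VegBIEN poster dimensions from the measurements above
poster_size() {
    awk 'BEGIN {
        line_h    = 16.3 / 24     # cm per line on the printed ERD
        erd_h     = 35.4 * 2      # cm, height of the printed ERD
        text_frac = 3.5 / 8.5     # capital-letter height / line height
        text_h    = line_h * text_frac
        ratio     = text_h / erd_h    # text height as a fraction of ERD height
        target    = 0.55          # cm, text height on the VegBank poster
        h_in      = target / ratio / 2.54
        w_in      = h_in * 11 / 17    # 11x17 portrait aspect ratio
        printf "%.1f x %.1f in\n", w_in, h_in
    }'
}
```

Without intermediate rounding this prints `35.5 x 54.8 in`, compared with 35.0 x 54.1 in from the rounded steps.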
756 |
|
|
Determine the cost:
|
757 |
|
|
The FedEx Kinkos near NCEAS (1030 State St, Santa Barbara, CA 93101)
|
758 |
|
|
charges the following for posters:
|
759 |
|
|
base: $7.25/sq ft
|
760 |
|
|
lamination: $3/sq ft
|
761 |
|
|
mounting on a board: $8/sq ft
|
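The per-square-foot rates above can be turned into a rough estimate. This is a sketch under stated assumptions: that the charges are additive, billed on exact area, and have no minimums or rounding to standard sizes, none of which the price list confirms. `poster_cost` is a hypothetical name.

```shell
# poster_cost WIDTH_IN HEIGHT_IN: estimate poster cost from the rates above
poster_cost() {
    awk -v w="$1" -v h="$2" 'BEGIN {
        area = w * h / 144                      # sq in -> sq ft
        printf "area: %.3f sq ft\n", area
        printf "base: $%.2f\n", area * 7.25
        printf "laminated: $%.2f\n", area * (7.25 + 3)
        printf "laminated and mounted: $%.2f\n", area * (7.25 + 3 + 8)
    }'
}
```

For the 35x54in minimum size, `poster_cost 35 54` reports an area of 13.125 sq ft and a base cost of $95.16.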
762 |
203
|
aaronmk
|
|
763 |
1459
|
aaronmk
|
Testing:
|
764 |
8458
|
aaronmk
|
On a development machine, you should put the following in your .profile:
|
765 |
8469
|
aaronmk
|
umask ug=rwx,o= # prevent files from becoming web-accessible
|
766 |
8458
|
aaronmk
|
export log= n=2
|
767 |
11985
|
aaronmk
|
For development machine specs, see /planning/resources/dev_machine.specs/
|
768 |
11516
|
aaronmk
|
On local machine:
|
769 |
8458
|
aaronmk
|
Mapping process: make test
|
770 |
|
|
Including column-based import: make test by_col=1
|
771 |
|
|
If the row-based and column-based imports produce different inserted
|
772 |
|
|
row counts, this usually means that a table is underconstrained
|
773 |
|
|
(the unique indexes don't cover all possible rows).
|
774 |
|
|
This can occur if you didn't use COALESCE(field, null_value) around
|
775 |
|
|
a nullable field in a unique index. See sql_gen.null_sentinels for
|
776 |
|
|
the appropriate null value to use.
|
777 |
|
|
Map spreadsheet generation: make remake
|
778 |
|
|
Missing mappings: make missing_mappings
|
779 |
|
|
Everything (for most complete coverage): make test-all
|
780 |
702
|
aaronmk
|
|
781 |
7183
|
aaronmk
|
Debugging:
|
782 |
8458
|
aaronmk
|
"Binary chop" debugging:
|
783 |
|
|
(This is primarily useful for regressions that occurred in a previous
|
784 |
|
|
revision, which was committed without running all the tests)
|
785 |
12998
|
aaronmk
|
up -r <rev>; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
|
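The "binary chop" itself can be sketched as a loop over revision numbers. `bisect_revs` is a hypothetical helper, not part of the repository; it assumes the test passes at the first revision given, fails at the second, and that the regression happened exactly once in between.

```shell
# bisect_revs GOOD BAD CMD [ARGS...]: print the first revision in (GOOD, BAD]
# for which "CMD ARGS... REV" fails. CMD must not write to stdout.
bisect_revs() {
    good=$1 bad=$2
    shift 2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if "$@" "$mid"; then
            good=$mid           # regression is after mid
        else
            bad=$mid            # regression is at or before mid
        fi
    done
    echo "$bad"
}
```

Here CMD would be a function that checks out the given revision and runs the command line above for it, with its output redirected to a log file.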
786 |
8470
|
aaronmk
|
.htaccess:
|
787 |
|
|
mod_rewrite:
|
788 |
12226
|
aaronmk
|
**IMPORTANT**: whenever you change the DirectorySlash setting for a
|
789 |
8471
|
aaronmk
|
directory, you *must* clear your browser's cache to ensure that
|
790 |
|
|
a cached redirect is not used. This is because RewriteRule
|
791 |
|
|
redirects are (by default) temporary, but DirectorySlash
|
792 |
|
|
redirects are permanent.
|
793 |
8470
|
aaronmk
|
for Firefox:
|
794 |
|
|
press Cmd+Shift+Delete
|
795 |
|
|
check only Cache
|
796 |
|
|
press Enter or click Clear Now
|
797 |
7183
|
aaronmk
|
|
798 |
3783
|
aaronmk
|
WinMerge setup:
|
799 |
11516
|
aaronmk
|
In a Windows VM:
|
800 |
8458
|
aaronmk
|
Install WinMerge from <http://winmerge.org/>
|
801 |
|
|
Open WinMerge
|
802 |
|
|
Go to Edit > Options and click Compare in the left sidebar
|
803 |
|
|
Enable "Moved block detection", as described at
|
804 |
|
|
<http://manual.winmerge.org/Configuration.html#d0e5892>.
|
805 |
|
|
Set Whitespace to Ignore change, as described at
|
806 |
|
|
<http://manual.winmerge.org/Configuration.html#d0e5758>.
|
807 |
3783
|
aaronmk
|
|
808 |
3133
|
aaronmk
|
Documentation:
|
809 |
8458
|
aaronmk
|
To generate a Redmine-formatted list of steps for column-based import:
|
810 |
11516
|
aaronmk
|
On local machine:
|
811 |
8458
|
aaronmk
|
make schemas/public/reinstall
|
812 |
|
|
make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
|
813 |
|
|
To import and scrub just the test taxonomic names:
|
814 |
13284
|
aaronmk
|
ssh -t vegbiendev.nceas.ucsb.edu exec sudo -u aaronmk -i
|
815 |
8458
|
aaronmk
|
inputs/test_taxonomic_names/test_scrub
|
816 |
3133
|
aaronmk
|
|
817 |
702
|
aaronmk
|
General:
|
818 |
8458
|
aaronmk
|
To see a program's description, read its top-of-file comment
|
819 |
|
|
To see a program's usage, run it without arguments
|
820 |
|
|
To remake a directory: make <dir>/remake
|
821 |
|
|
To remake a file: make <file>-remake
|