Revision 8458
Added by Aaron Marcuse-Kubitza over 11 years ago
README.TXT | ||
---|---|---|
1 | 1 |
Installation: |
2 |
Check out svn: svn co https://code.nceas.ucsb.edu/code/projects/bien
|
|
3 |
cd bien/
|
|
4 |
Install: make install
|
|
5 |
WARNING: This will delete the current public schema of your VegBIEN DB!
|
|
6 |
Uninstall: make uninstall
|
|
7 |
WARNING: This will delete your entire VegBIEN DB!
|
|
8 |
This includes all archived imports and staging tables.
|
|
2 |
Check out svn: svn co https://code.nceas.ucsb.edu/code/projects/bien
|
|
3 |
cd bien/
|
|
4 |
Install: make install
|
|
5 |
WARNING: This will delete the current public schema of your VegBIEN DB!
|
|
6 |
Uninstall: make uninstall
|
|
7 |
WARNING: This will delete your entire VegBIEN DB!
|
|
8 |
This includes all archived imports and staging tables.
|
|
9 | 9 |
|
10 | 10 |
Maintenance: |
11 |
VegCore data dictionary:
|
|
12 |
Regularly, or whenever the VegCore data dictionary page
|
|
13 |
(https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore)
|
|
14 |
is changed, regenerate mappings/VegCore.csv:
|
|
15 |
make mappings/VegCore.htm-remake; make mappings/
|
|
16 |
svn di mappings/VegCore.tables.redmine
|
|
17 |
If there are changes, update the data dictionary's Tables section
|
|
18 |
When moving terms, check that no terms were lost: svn di
|
|
19 |
svn ci -m "mappings/VegCore.htm: Regenerated from wiki"
|
|
20 |
Important: Whenever you install a system update that affects PostgreSQL or
|
|
21 |
any of its dependencies, such as libc, you should restart the PostgreSQL
|
|
22 |
server. Otherwise, you may get strange errors like "the database system
|
|
23 |
is in recovery mode" which go away upon reimport, or you may not be able
|
|
24 |
to access the database as the postgres superuser. This applies to both
|
|
25 |
Linux and Mac OS X.
|
|
11 |
VegCore data dictionary:
|
|
12 |
Regularly, or whenever the VegCore data dictionary page
|
|
13 |
(https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore)
|
|
14 |
is changed, regenerate mappings/VegCore.csv:
|
|
15 |
make mappings/VegCore.htm-remake; make mappings/
|
|
16 |
svn di mappings/VegCore.tables.redmine
|
|
17 |
If there are changes, update the data dictionary's Tables section
|
|
18 |
When moving terms, check that no terms were lost: svn di
|
|
19 |
svn ci -m "mappings/VegCore.htm: Regenerated from wiki"
|
|
20 |
Important: Whenever you install a system update that affects PostgreSQL or
|
|
21 |
any of its dependencies, such as libc, you should restart the PostgreSQL
|
|
22 |
server. Otherwise, you may get strange errors like "the database system
|
|
23 |
is in recovery mode" which go away upon reimport, or you may not be able
|
|
24 |
to access the database as the postgres superuser. This applies to both
|
|
25 |
Linux and Mac OS X.
|
|
26 | 26 |
|
27 | 27 |
Single datasource import: |
28 |
(Re)import and scrub: make inputs/<datasrc>/reimport_scrub by_col=1
|
|
29 |
(Re)import only: make inputs/<datasrc>/reimport by_col=1
|
|
30 |
(Re)scrub: make inputs/<datasrc>/rescrub by_col=1
|
|
31 |
Note that these commands also work if the datasource is not yet imported
|
|
28 |
(Re)import and scrub: make inputs/<datasrc>/reimport_scrub by_col=1
|
|
29 |
(Re)import only: make inputs/<datasrc>/reimport by_col=1
|
|
30 |
(Re)scrub: make inputs/<datasrc>/rescrub by_col=1
|
|
31 |
Note that these commands also work if the datasource is not yet imported
|
|
32 | 32 |
|
33 | 33 |
Full database import: |
34 |
On jupiter: svn up
|
|
35 |
On local machine:
|
|
36 |
./fix_perms
|
|
37 |
make inputs/upload
|
|
38 |
make test by_col=1
|
|
39 |
See note under Testing below
|
|
40 |
On vegbiendev:
|
|
41 |
Ensure there are no local modifications: svn st
|
|
42 |
svn up
|
|
43 |
make inputs/download
|
|
44 |
For each newly-uploaded datasource above: make inputs/<datasrc>/reinstall
|
|
45 |
Update the auxiliary schemas: make schemas/reinstall
|
|
46 |
The public schema will be installed separately by the import process
|
|
47 |
Delete imports before the last so they won't bloat the full DB backup:
|
|
48 |
make backups/vegbien.<version>.backup/remove
|
|
49 |
To keep a previous import other than the public schema:
|
|
50 |
export dump_opts='--exclude-schema=public --exclude-schema=<version>'
|
|
51 |
Make sure there is at least 150GB of disk space on /: df -h
|
|
52 |
The import schema is 100GB, and may use additional space for temp tables
|
|
53 |
To free up space, remove backups that have been archived on jupiter:
|
|
54 |
List backups/ to view older backups
|
|
55 |
Check their MD5 sums using the steps under On jupiter below
|
|
56 |
Remove these backups
|
|
57 |
unset version
|
|
58 |
screen
|
|
59 |
Press ENTER
|
|
60 |
Start column-based import: . bin/import_all by_col=1
|
|
61 |
To use row-based import: . bin/import_all
|
|
62 |
To stop all running imports: . bin/stop_imports
|
|
63 |
WARNING: Do NOT run import_all in the background, or the jobs it creates
|
|
64 |
won't be owned by your shell.
|
|
65 |
Note that import_all will take up to an hour to import the NCBI backbone
|
|
66 |
and other metadata before returning control to the shell.
|
|
67 |
Wait (overnight) for the import to finish
|
|
68 |
To recover from a closed terminal window: screen -r
|
|
69 |
When there are no more running jobs, exit the screen
|
|
70 |
Get $version: echo $version
|
|
71 |
Set $version in all vegbiendev terminals: export version=<version>
|
|
72 |
Upload logs (run on vegbiendev): make inputs/upload
|
|
73 |
On local machine: make inputs/download-logs
|
|
74 |
In PostgreSQL:
|
|
75 |
Check that the provider_count and source tables contain entries for all
|
|
76 |
inputs
|
|
77 |
tail inputs/{.,}*/*/logs/$version.log.sql
|
|
78 |
In the output, search for "Command exited with non-zero status"
|
|
79 |
For inputs that have this, fix the associated bug(s)
|
|
80 |
If many inputs have errors, discard the current (partial) import:
|
|
81 |
make schemas/$version/uninstall
|
|
82 |
Otherwise, continue
|
|
83 |
make schemas/$version/publish
|
|
84 |
unset version
|
|
85 |
backups/fix_perms
|
|
86 |
make backups/upload
|
|
87 |
On jupiter:
|
|
88 |
cd /data/dev/aaronmk/bien/backups
|
|
89 |
For each newly-archived backup:
|
|
90 |
make -s <backup>.md5/test
|
|
91 |
Check that "OK" is printed next to the filename
|
|
92 |
On nimoy:
|
|
93 |
cd /home/bien/svn/
|
|
94 |
svn up
|
|
95 |
export version=<version>
|
|
96 |
make backups/analytical_stem.$version.csv/download
|
|
97 |
In the bien_web DB:
|
|
98 |
Create the analytical_stem_<version> table using its schema
|
|
99 |
in schemas/vegbien.my.sql
|
|
100 |
make -s backups/analytical_stem.$version.csv.md5/test
|
|
101 |
Check that "OK" is printed next to the filename
|
|
102 |
env table=analytical_stem_$version bin/publish_analytical_db \
|
|
103 |
backups/analytical_stem.$version.csv
|
|
104 |
If desired, record the import times in inputs/import.stats.xls:
|
|
105 |
Open inputs/import.stats.xls
|
|
106 |
If the rightmost import is within 5 columns of column IV:
|
|
107 |
Copy the current tab to <leftmost-date>~<rightmost-date>
|
|
108 |
Remove the previous imports from the current tab because they are
|
|
109 |
now in the copied tab instead
|
|
110 |
Insert a copy of the leftmost "By column" column group before it
|
|
111 |
export version=<version>
|
|
112 |
bin/import_date inputs/{.,}*/*/logs/$version.log.sql
|
|
113 |
Update the import date in the upper-right corner
|
|
114 |
bin/import_times inputs/{.,}*/*/logs/$version.log.sql
|
|
115 |
Paste the output over the # Rows/Time columns, making sure that the
|
|
116 |
row counts match up with the previous import's row counts
|
|
117 |
If the row counts do not match up, insert or reorder rows as needed
|
|
118 |
until they do. Get the datasource names from the log file footers:
|
|
119 |
tail inputs/{.,}*/*/logs/$version.log.sql
|
|
120 |
Commit: svn ci -m "inputs/import.stats.xls: Updated import times"
|
|
121 |
To run TNRS: make scrub by_col=1 &
|
|
122 |
export version=<version>
|
|
123 |
To view progress:
|
|
124 |
tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
|
|
125 |
To remake analytical DB: bin/make_analytical_db &
|
|
126 |
export version=<version>
|
|
127 |
To view progress:
|
|
128 |
tail -100 inputs/analytical_db/logs/make_analytical_db.log.sql
|
|
129 |
To back up DB (staging tables and last import):
|
|
130 |
export version=<version>
|
|
131 |
If before renaming to public: export dump_opts=--exclude-schema=public
|
|
132 |
make backups/vegbien.$version.backup/test &
|
|
34 |
On jupiter: svn up
|
|
35 |
On local machine:
|
|
36 |
./fix_perms
|
|
37 |
make inputs/upload
|
|
38 |
make test by_col=1
|
|
39 |
See note under Testing below
|
|
40 |
On vegbiendev:
|
|
41 |
Ensure there are no local modifications: svn st
|
|
42 |
svn up
|
|
43 |
make inputs/download
|
|
44 |
For each newly-uploaded datasource above: make inputs/<datasrc>/reinstall
|
|
45 |
Update the auxiliary schemas: make schemas/reinstall
|
|
46 |
The public schema will be installed separately by the import process
|
|
47 |
Delete imports before the last so they won't bloat the full DB backup:
|
|
48 |
make backups/vegbien.<version>.backup/remove
|
|
49 |
To keep a previous import other than the public schema:
|
|
50 |
export dump_opts='--exclude-schema=public --exclude-schema=<version>'
|
|
51 |
Make sure there is at least 150GB of disk space on /: df -h
|
|
52 |
The import schema is 100GB, and may use additional space for temp tables
|
|
53 |
To free up space, remove backups that have been archived on jupiter:
|
|
54 |
List backups/ to view older backups
|
|
55 |
Check their MD5 sums using the steps under On jupiter below
|
|
56 |
Remove these backups
|
|
57 |
unset version
|
|
58 |
screen
|
|
59 |
Press ENTER
|
|
60 |
Start column-based import: . bin/import_all by_col=1
|
|
61 |
To use row-based import: . bin/import_all
|
|
62 |
To stop all running imports: . bin/stop_imports
|
|
63 |
WARNING: Do NOT run import_all in the background, or the jobs it creates
|
|
64 |
won't be owned by your shell.
|
|
65 |
Note that import_all will take up to an hour to import the NCBI backbone
|
|
66 |
and other metadata before returning control to the shell.
|
|
67 |
Wait (overnight) for the import to finish
|
|
68 |
To recover from a closed terminal window: screen -r
|
|
69 |
When there are no more running jobs, exit the screen
|
|
70 |
Get $version: echo $version
|
|
71 |
Set $version in all vegbiendev terminals: export version=<version>
|
|
72 |
Upload logs (run on vegbiendev): make inputs/upload
|
|
73 |
On local machine: make inputs/download-logs
|
|
74 |
In PostgreSQL:
|
|
75 |
Check that the provider_count and source tables contain entries for all
|
|
76 |
inputs
|
|
77 |
tail inputs/{.,}*/*/logs/$version.log.sql
|
|
78 |
In the output, search for "Command exited with non-zero status"
|
|
79 |
For inputs that have this, fix the associated bug(s)
|
|
80 |
If many inputs have errors, discard the current (partial) import:
|
|
81 |
make schemas/$version/uninstall
|
|
82 |
Otherwise, continue
|
|
83 |
make schemas/$version/publish
|
|
84 |
unset version
|
|
85 |
backups/fix_perms
|
|
86 |
make backups/upload
|
|
87 |
On jupiter:
|
|
88 |
cd /data/dev/aaronmk/bien/backups
|
|
89 |
For each newly-archived backup:
|
|
90 |
make -s <backup>.md5/test
|
|
91 |
Check that "OK" is printed next to the filename
|
|
92 |
On nimoy:
|
|
93 |
cd /home/bien/svn/
|
|
94 |
svn up
|
|
95 |
export version=<version>
|
|
96 |
make backups/analytical_stem.$version.csv/download
|
|
97 |
In the bien_web DB:
|
|
98 |
Create the analytical_stem_<version> table using its schema
|
|
99 |
in schemas/vegbien.my.sql
|
|
100 |
make -s backups/analytical_stem.$version.csv.md5/test
|
|
101 |
Check that "OK" is printed next to the filename
|
|
102 |
env table=analytical_stem_$version bin/publish_analytical_db \
|
|
103 |
backups/analytical_stem.$version.csv
|
|
104 |
If desired, record the import times in inputs/import.stats.xls:
|
|
105 |
Open inputs/import.stats.xls
|
|
106 |
If the rightmost import is within 5 columns of column IV:
|
|
107 |
Copy the current tab to <leftmost-date>~<rightmost-date>
|
|
108 |
Remove the previous imports from the current tab because they are
|
|
109 |
now in the copied tab instead
|
|
110 |
Insert a copy of the leftmost "By column" column group before it
|
|
111 |
export version=<version>
|
|
112 |
bin/import_date inputs/{.,}*/*/logs/$version.log.sql
|
|
113 |
Update the import date in the upper-right corner
|
|
114 |
bin/import_times inputs/{.,}*/*/logs/$version.log.sql
|
|
115 |
Paste the output over the # Rows/Time columns, making sure that the
|
|
116 |
row counts match up with the previous import's row counts
|
|
117 |
If the row counts do not match up, insert or reorder rows as needed
|
|
118 |
until they do. Get the datasource names from the log file footers:
|
|
119 |
tail inputs/{.,}*/*/logs/$version.log.sql
|
|
120 |
Commit: svn ci -m "inputs/import.stats.xls: Updated import times"
|
|
121 |
To run TNRS: make scrub by_col=1 &
|
|
122 |
export version=<version>
|
|
123 |
To view progress:
|
|
124 |
tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
|
|
125 |
To remake analytical DB: bin/make_analytical_db &
|
|
126 |
export version=<version>
|
|
127 |
To view progress:
|
|
128 |
tail -100 inputs/analytical_db/logs/make_analytical_db.log.sql
|
|
129 |
To back up DB (staging tables and last import):
|
|
130 |
export version=<version>
|
|
131 |
If before renaming to public: export dump_opts=--exclude-schema=public
|
|
132 |
make backups/vegbien.$version.backup/test &
|
|
133 | 133 |
|
134 | 134 |
Backups: |
135 |
Archived imports:
|
|
136 |
Back up: make backups/<version>.backup &
|
|
137 |
Note: To back up the last import, you must archive it first:
|
|
138 |
make schemas/rotate
|
|
139 |
Test: make -s backups/<version>.backup/test &
|
|
140 |
Restore: make backups/<version>.backup/restore &
|
|
141 |
Remove: make backups/<version>.backup/remove
|
|
142 |
Download: make backups/download
|
|
143 |
TNRS cache:
|
|
144 |
Back up: make backups/TNRS.backup-remake &
|
|
145 |
Restore:
|
|
146 |
yes|make inputs/.TNRS/uninstall
|
|
147 |
make backups/TNRS.backup/restore &
|
|
148 |
yes|make schemas/public/reinstall
|
|
149 |
Must come after TNRS restore to recreate tnrs_input_name view
|
|
150 |
Full DB:
|
|
151 |
Back up: make backups/vegbien.<version>.backup &
|
|
152 |
Test: make -s backups/vegbien.<version>.backup/test &
|
|
153 |
Restore: make backups/vegbien.<version>.backup/restore &
|
|
154 |
Download: make backups/download
|
|
155 |
Import logs:
|
|
156 |
Download: make inputs/download-logs
|
|
135 |
Archived imports:
|
|
136 |
Back up: make backups/<version>.backup &
|
|
137 |
Note: To back up the last import, you must archive it first:
|
|
138 |
make schemas/rotate
|
|
139 |
Test: make -s backups/<version>.backup/test &
|
|
140 |
Restore: make backups/<version>.backup/restore &
|
|
141 |
Remove: make backups/<version>.backup/remove
|
|
142 |
Download: make backups/download
|
|
143 |
TNRS cache:
|
|
144 |
Back up: make backups/TNRS.backup-remake &
|
|
145 |
Restore:
|
|
146 |
yes|make inputs/.TNRS/uninstall
|
|
147 |
make backups/TNRS.backup/restore &
|
|
148 |
yes|make schemas/public/reinstall
|
|
149 |
Must come after TNRS restore to recreate tnrs_input_name view
|
|
150 |
Full DB:
|
|
151 |
Back up: make backups/vegbien.<version>.backup &
|
|
152 |
Test: make -s backups/vegbien.<version>.backup/test &
|
|
153 |
Restore: make backups/vegbien.<version>.backup/restore &
|
|
154 |
Download: make backups/download
|
|
155 |
Import logs:
|
|
156 |
Download: make inputs/download-logs
|
|
157 | 157 |
|
158 | 158 |
Datasource setup: |
159 |
Add a new datasource: make inputs/<datasrc>/add
|
|
160 |
<datasrc> may not contain spaces, and should be abbreviated.
|
|
161 |
If the datasource is a herbarium, <datasrc> should be the herbarium code
|
|
162 |
as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
|
|
163 |
For MySQL inputs (exports and live DB connections):
|
|
164 |
For .sql exports:
|
|
165 |
Place the original .sql file in _src/ (*not* in _MySQL/)
|
|
166 |
Follow the steps starting with Install the staging tables below.
|
|
167 |
This is for an initial sync to get the file onto vegbiendev.
|
|
168 |
On vegbiendev:
|
|
169 |
Create a database for the MySQL export in phpMyAdmin
|
|
170 |
bin/mysql_bien database <inputs/<datasrc>/_src/export.sql &
|
|
171 |
mkdir inputs/<datasrc>/_MySQL/
|
|
172 |
cp -p lib/MySQL.{data,schema}.sql.make inputs/<datasrc>/_MySQL/
|
|
173 |
Edit _MySQL/*.make for the DB connection
|
|
174 |
For a .sql export, use server=vegbiendev and --user=bien
|
|
175 |
Skip the Add input data for each table section
|
|
176 |
For MS Access databases:
|
|
177 |
Place the .mdb or .accdb file in _src/
|
|
178 |
Download and install Access To PostgreSQL from
|
|
179 |
http://www.bullzip.com/download.php
|
|
180 |
Use Access To PostgreSQL to export the database:
|
|
181 |
Export just the tables/indexes to inputs/<datasrc>/<file>.schema.sql
|
|
182 |
Export just the data to inputs/<datasrc>/<file>.data.sql
|
|
183 |
In <file>.schema.sql, make the following changes:
|
|
184 |
Replace text "BOOLEAN" with "/*BOOLEAN*/INTEGER"
|
|
185 |
Replace text "DOUBLE PRECISION NULL" with "DOUBLE PRECISION"
|
|
186 |
Skip the Add input data for each table section
|
|
187 |
Add input data for each table present in the datasource:
|
|
188 |
For .sql exports, you must use the name of the table in the DB export
|
|
189 |
For CSV files, you can use any name. It's recommended to use a table
|
|
190 |
name from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>
|
|
191 |
Note that if this table will be joined together with another table, its
|
|
192 |
name must end in ".src"
|
|
193 |
make inputs/<datasrc>/<table>/add
|
|
194 |
Important: DO NOT just create an empty directory named <table>!
|
|
195 |
This command also creates necessary subdirs, such as logs/.
|
|
196 |
If the table is in a .sql export: make inputs/<datasrc>/<table>/install
|
|
197 |
Otherwise, place the CSV(s) for the table in
|
|
198 |
inputs/<datasrc>/<table>/ OR place a query joining other tables
|
|
199 |
together in inputs/<datasrc>/<table>/create.sql
|
|
200 |
Important: When exporting relational databases to CSVs, you MUST ensure
|
|
201 |
that embedded quotes are escaped by doubling them, *not* by
|
|
202 |
preceding them with a "\" as is the default in phpMyAdmin
|
|
203 |
If there are multiple part files for a table, and the header is repeated
|
|
204 |
in each part, make sure each header is EXACTLY the same.
|
|
205 |
(If the headers are not the same, the CSV concatenation script
|
|
206 |
assumes the part files don't have individual headers and treats the
|
|
207 |
subsequent headers as data rows.)
|
|
208 |
Add <table> to inputs/<datasrc>/import_order.txt before other tables
|
|
209 |
that depend on it
|
|
210 |
Install the staging tables:
|
|
211 |
make inputs/<datasrc>/reinstall quiet=1 &
|
|
212 |
For a MySQL .sql export:
|
|
213 |
At prompt "[you]@vegbiendev's password:", enter your password
|
|
214 |
At prompt "Enter password:", enter the value in config/bien_password
|
|
215 |
To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
|
|
216 |
View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
|
|
217 |
tail provides a header line with the filename
|
|
218 |
+1 starts at the first line, to show the whole file
|
|
219 |
For every file with an error 'column "..." specified more than once':
|
|
220 |
Add a header override file "+header.<ext>" in <table>/:
|
|
221 |
Note: The leading "+" should sort it before the flat files.
|
|
222 |
"_" unfortunately sorts *after* capital letters in ASCII.
|
|
223 |
Create a text file containing the header line of the flat files
|
|
224 |
Add an ! at the beginning of the line
|
|
225 |
This signals cat_csv that this is a header override.
|
|
226 |
For empty names, use their 0-based column # (by convention)
|
|
227 |
For duplicate names, add a distinguishing suffix
|
|
228 |
For long names that collided, rename them to <= 63 chars long
|
|
229 |
Do NOT make readability changes in this step; that is what the
|
|
230 |
map spreadsheets (below) are for.
|
|
231 |
Save
|
|
232 |
If you made any changes, re-run the install command above
|
|
233 |
Auto-create the map spreadsheets: make inputs/<datasrc>/
|
|
234 |
Map each table's columns:
|
|
235 |
In each <table>/ subdir, for each "via map" map.csv:
|
|
236 |
Open the map in a spreadsheet editor
|
|
237 |
Open the "core map" /mappings/Veg+-VegBIEN.csv
|
|
238 |
In each row of the via map, set the right column to a value from the
|
|
239 |
left column of the core map
|
|
240 |
Save
|
|
241 |
Regenerate the derived maps: make inputs/<datasrc>/
|
|
242 |
Accept the test cases:
|
|
243 |
make inputs/<datasrc>/test
|
|
244 |
When prompted to "Accept new test output", enter y and press ENTER
|
|
245 |
If you instead get errors, do one of the following for each one:
|
|
246 |
- If the error was due to a bug, fix it
|
|
247 |
- Add a SQL function that filters or transforms the invalid data
|
|
248 |
- Make an empty mapping for the columns that produced the error.
|
|
249 |
Put something in the Comments column of the map spreadsheet to
|
|
250 |
prevent the automatic mapper from auto-removing the mapping.
|
|
251 |
When accepting tests, it's helpful to use WinMerge
|
|
252 |
(see WinMerge setup below for configuration)
|
|
253 |
make inputs/<datasrc>/test by_col=1
|
|
254 |
If you get errors this time, this always indicates a bug, usually in
|
|
255 |
the VegBIEN unique constraints or column-based import itself
|
|
256 |
Add newly-created files: make inputs/<datasrc>/add
|
|
257 |
Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
|
|
258 |
Update vegbiendev:
|
|
259 |
On jupiter: svn up
|
|
260 |
On local machine:
|
|
261 |
./fix_perms
|
|
262 |
make inputs/upload
|
|
263 |
On vegbiendev:
|
|
264 |
svn up
|
|
265 |
make inputs/download
|
|
266 |
Follow the steps under Install the staging tables above
|
|
159 |
Add a new datasource: make inputs/<datasrc>/add
|
|
160 |
<datasrc> may not contain spaces, and should be abbreviated.
|
|
161 |
If the datasource is a herbarium, <datasrc> should be the herbarium code
|
|
162 |
as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
|
|
163 |
For MySQL inputs (exports and live DB connections):
|
|
164 |
For .sql exports:
|
|
165 |
Place the original .sql file in _src/ (*not* in _MySQL/)
|
|
166 |
Follow the steps starting with Install the staging tables below.
|
|
167 |
This is for an initial sync to get the file onto vegbiendev.
|
|
168 |
On vegbiendev:
|
|
169 |
Create a database for the MySQL export in phpMyAdmin
|
|
170 |
bin/mysql_bien database <inputs/<datasrc>/_src/export.sql &
|
|
171 |
mkdir inputs/<datasrc>/_MySQL/
|
|
172 |
cp -p lib/MySQL.{data,schema}.sql.make inputs/<datasrc>/_MySQL/
|
|
173 |
Edit _MySQL/*.make for the DB connection
|
|
174 |
For a .sql export, use server=vegbiendev and --user=bien
|
|
175 |
Skip the Add input data for each table section
|
|
176 |
For MS Access databases:
|
|
177 |
Place the .mdb or .accdb file in _src/
|
|
178 |
Download and install Access To PostgreSQL from
|
|
179 |
http://www.bullzip.com/download.php
|
|
180 |
Use Access To PostgreSQL to export the database:
|
|
181 |
Export just the tables/indexes to inputs/<datasrc>/<file>.schema.sql
|
|
182 |
Export just the data to inputs/<datasrc>/<file>.data.sql
|
|
183 |
In <file>.schema.sql, make the following changes:
|
|
184 |
Replace text "BOOLEAN" with "/*BOOLEAN*/INTEGER"
|
|
185 |
Replace text "DOUBLE PRECISION NULL" with "DOUBLE PRECISION"
|
|
186 |
Skip the Add input data for each table section
|
|
187 |
Add input data for each table present in the datasource:
|
|
188 |
For .sql exports, you must use the name of the table in the DB export
|
|
189 |
For CSV files, you can use any name. It's recommended to use a table
|
|
190 |
name from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>
|
|
191 |
Note that if this table will be joined together with another table, its
|
|
192 |
name must end in ".src"
|
|
193 |
make inputs/<datasrc>/<table>/add
|
|
194 |
Important: DO NOT just create an empty directory named <table>!
|
|
195 |
This command also creates necessary subdirs, such as logs/.
|
|
196 |
If the table is in a .sql export: make inputs/<datasrc>/<table>/install
|
|
197 |
Otherwise, place the CSV(s) for the table in
|
|
198 |
inputs/<datasrc>/<table>/ OR place a query joining other tables
|
|
199 |
together in inputs/<datasrc>/<table>/create.sql
|
|
200 |
Important: When exporting relational databases to CSVs, you MUST ensure
|
|
201 |
that embedded quotes are escaped by doubling them, *not* by
|
|
202 |
preceding them with a "\" as is the default in phpMyAdmin
|
|
203 |
If there are multiple part files for a table, and the header is repeated
|
|
204 |
in each part, make sure each header is EXACTLY the same.
|
|
205 |
(If the headers are not the same, the CSV concatenation script
|
|
206 |
assumes the part files don't have individual headers and treats the
|
|
207 |
subsequent headers as data rows.)
|
|
208 |
Add <table> to inputs/<datasrc>/import_order.txt before other tables
|
|
209 |
that depend on it
|
|
210 |
Install the staging tables:
|
|
211 |
make inputs/<datasrc>/reinstall quiet=1 &
|
|
212 |
For a MySQL .sql export:
|
|
213 |
At prompt "[you]@vegbiendev's password:", enter your password
|
|
214 |
At prompt "Enter password:", enter the value in config/bien_password
|
|
215 |
To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
|
|
216 |
View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
|
|
217 |
tail provides a header line with the filename
|
|
218 |
+1 starts at the first line, to show the whole file
|
|
219 |
For every file with an error 'column "..." specified more than once':
|
|
220 |
Add a header override file "+header.<ext>" in <table>/:
|
|
221 |
Note: The leading "+" should sort it before the flat files.
|
|
222 |
"_" unfortunately sorts *after* capital letters in ASCII.
|
|
223 |
Create a text file containing the header line of the flat files
|
|
224 |
Add an ! at the beginning of the line
|
|
225 |
This signals cat_csv that this is a header override.
|
|
226 |
For empty names, use their 0-based column # (by convention)
|
|
227 |
For duplicate names, add a distinguishing suffix
|
|
228 |
For long names that collided, rename them to <= 63 chars long
|
|
229 |
Do NOT make readability changes in this step; that is what the
|
|
230 |
map spreadsheets (below) are for.
|
|
231 |
Save
|
|
232 |
If you made any changes, re-run the install command above
|
|
233 |
Auto-create the map spreadsheets: make inputs/<datasrc>/
|
|
234 |
Map each table's columns:
|
|
235 |
In each <table>/ subdir, for each "via map" map.csv:
|
|
236 |
Open the map in a spreadsheet editor
|
|
237 |
Open the "core map" /mappings/Veg+-VegBIEN.csv
|
|
238 |
In each row of the via map, set the right column to a value from the
|
|
239 |
left column of the core map
|
|
240 |
Save
|
|
241 |
Regenerate the derived maps: make inputs/<datasrc>/
|
|
242 |
Accept the test cases:
|
|
243 |
make inputs/<datasrc>/test
|
|
244 |
When prompted to "Accept new test output", enter y and press ENTER
|
|
245 |
If you instead get errors, do one of the following for each one:
|
|
246 |
- If the error was due to a bug, fix it
|
|
247 |
- Add a SQL function that filters or transforms the invalid data
|
|
248 |
- Make an empty mapping for the columns that produced the error.
|
|
249 |
Put something in the Comments column of the map spreadsheet to
|
|
250 |
prevent the automatic mapper from auto-removing the mapping.
|
|
251 |
When accepting tests, it's helpful to use WinMerge
|
|
252 |
(see WinMerge setup below for configuration)
|
|
253 |
make inputs/<datasrc>/test by_col=1
|
|
254 |
If you get errors this time, this always indicates a bug, usually in
|
|
255 |
the VegBIEN unique constraints or column-based import itself
|
|
256 |
Add newly-created files: make inputs/<datasrc>/add
|
|
257 |
Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
|
|
258 |
Update vegbiendev:
|
|
259 |
On jupiter: svn up
|
|
260 |
On local machine:
|
|
261 |
./fix_perms
|
|
262 |
make inputs/upload
|
|
263 |
On vegbiendev:
|
|
264 |
svn up
|
|
265 |
make inputs/download
|
|
266 |
Follow the steps under Install the staging tables above
|
|
267 | 267 |
|
268 | 268 |
Datasource refreshing: |
269 |
VegBank:
|
|
270 |
make inputs/VegBank/vegbank.sql-remake
|
|
271 |
make inputs/VegBank/reinstall quiet=1 &
|
|
269 |
VegBank:
|
|
270 |
make inputs/VegBank/vegbank.sql-remake
|
|
271 |
make inputs/VegBank/reinstall quiet=1 &
|
|
272 | 272 |
|
273 | 273 |
Schema changes: |
274 |
When changing the analytical views, run sync_analytical_..._to_view()
|
|
275 |
to update the corresponding table
|
|
276 |
Remember to update the following files with any renamings:
|
|
277 |
schemas/filter_ERD.csv
|
|
278 |
mappings/VegCore-VegBIEN.csv
|
|
279 |
mappings/verify.*.sql
|
|
280 |
Regenerate schema from installed DB: make schemas/remake
|
|
281 |
Reinstall DB from schema: make schemas/public/reinstall schemas/reinstall
|
|
282 |
WARNING: This will delete the current public schema of your VegBIEN DB!
|
|
283 |
Reinstall staging tables: . bin/reinstall_all
|
|
284 |
Sync ERD with vegbien.sql schema:
|
|
285 |
Run make schemas/vegbien.my.sql
|
|
286 |
Open schemas/vegbien.ERD.mwb in MySQLWorkbench
|
|
287 |
Go to File > Export > Synchronize With SQL CREATE Script...
|
|
288 |
For Input File, select schemas/vegbien.my.sql
|
|
289 |
Click Continue
|
|
290 |
In the changes list, select each table with an arrow next to it
|
|
291 |
Click Update Model
|
|
292 |
Click Continue
|
|
293 |
Note: The generated SQL script will be empty because we are syncing in
|
|
294 |
the opposite direction
|
|
295 |
Click Execute
|
|
296 |
Reposition any lines that have been reset
|
|
297 |
Add any new tables by dragging them from the Catalog in the left sidebar
|
|
298 |
to the diagram
|
|
299 |
Remove any deleted tables by right-clicking the table's diagram element,
|
|
300 |
selecting Delete '<table name>', and clicking Delete
|
|
301 |
Save
|
|
302 |
If desired, update the graphical ERD exports (see below)
|
|
303 |
Update graphical ERD exports:
|
|
304 |
Go to File > Export > Export as PNG...
|
|
305 |
Select schemas/vegbien.ERD.png and click Save
|
|
306 |
Go to File > Export > Export as SVG...
|
|
307 |
Select schemas/vegbien.ERD.svg and click Save
|
|
308 |
Go to File > Export > Export as Single Page PDF...
|
|
309 |
Select schemas/vegbien.ERD.1_pg.pdf and click Save
|
|
310 |
Go to File > Print...
|
|
311 |
In the lower left corner, click PDF > Save as PDF...
|
|
312 |
Set the Title and Author to ""
|
|
313 |
Select schemas/vegbien.ERD.pdf and click Save
|
|
314 |
Commit: svn ci -m "schemas/vegbien.ERD.mwb: Regenerated exports"
|
|
315 |
Refactoring tips:
|
|
316 |
To rename a table:
|
|
317 |
In vegbien.sql, do the following:
|
|
318 |
Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
|
|
319 |
This is necessary because the table name is *everywhere*
|
|
320 |
Search for <new>
|
|
321 |
Manually change back any replacements inside comments
|
|
322 |
To rename a column:
|
|
323 |
Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
|
|
324 |
Recreate any foreign key for the column, removing CONSTRAINT <name>
|
|
325 |
This resets the foreign key name using the new column name
|
|
326 |
Creating a poster of the ERD:
|
|
327 |
Determine the poster size:
|
|
328 |
Measure the line height (from the bottom of one line to the bottom
|
|
329 |
of another): 16.3cm/24 lines = 0.679cm
|
|
330 |
Measure the height of the ERD: 35.4cm*2 = 70.8cm
|
|
331 |
Zoom in as far as possible
|
|
332 |
Measure the height of a capital letter: 3.5mm
|
|
333 |
Measure the line height: 8.5mm
|
|
334 |
Calculate the text's fraction of the line height: 3.5mm/8.5mm = 0.41
|
|
335 |
Calculate the text height: 0.679cm*0.41 = 0.28cm
|
|
336 |
Calculate the text height's fraction of the ERD height:
|
|
337 |
0.28cm/70.8cm = 0.0040
|
|
338 |
Measure the text height on the *VegBank* ERD poster: 5.5mm = 0.55cm
|
|
339 |
Calculate the VegBIEN poster height to make the text the same size:
|
|
340 |
0.55cm/0.0040 = 137.5cm H; *1in/2.54cm = 54.1in H
|
|
341 |
The ERD aspect ratio is 11 in W x (2*8.5in H) = 11x17 portrait
|
|
342 |
Calculate the VegBIEN poster width: 54.1in H*11W/17H = 35.0in W
|
|
343 |
The minimum VegBIEN poster size is 35x54in portrait
|
|
344 |
Determine the cost:
|
|
345 |
The FedEx Kinkos near NCEAS (1030 State St, Santa Barbara, CA 93101)
|
|
346 |
charges the following for posters:
|
|
347 |
base: $7.25/sq ft
|
|
348 |
lamination: $3/sq ft
|
|
349 |
mounting on a board: $8/sq ft
|
|
274 |
When changing the analytical views, run sync_analytical_..._to_view()
|
|
275 |
to update the corresponding table
|
|
276 |
Remember to update the following files with any renamings:
|
|
277 |
schemas/filter_ERD.csv
|
|
278 |
mappings/VegCore-VegBIEN.csv
|
|
279 |
mappings/verify.*.sql
|
|
280 |
Regenerate schema from installed DB: make schemas/remake
|
|
281 |
Reinstall DB from schema: make schemas/public/reinstall schemas/reinstall
|
|
282 |
WARNING: This will delete the current public schema of your VegBIEN DB!
|
|
283 |
Reinstall staging tables: . bin/reinstall_all
|
|
284 |
Sync ERD with vegbien.sql schema:
|
|
285 |
Run make schemas/vegbien.my.sql
|
|
286 |
Open schemas/vegbien.ERD.mwb in MySQLWorkbench
|
|
287 |
Go to File > Export > Synchronize With SQL CREATE Script...
|
|
288 |
For Input File, select schemas/vegbien.my.sql
|
|
289 |
Click Continue
|
|
290 |
In the changes list, select each table with an arrow next to it
|
|
291 |
Click Update Model
|
|
292 |
Click Continue
|
|
293 |
Note: The generated SQL script will be empty because we are syncing in
|
|
294 |
the opposite direction
|
|
295 |
Click Execute
|
|
296 |
Reposition any lines that have been reset
|
|
297 |
Add any new tables by dragging them from the Catalog in the left sidebar
|
|
298 |
to the diagram
|
|
299 |
Remove any deleted tables by right-clicking the table's diagram element,
|
|
300 |
selecting Delete '<table name>', and clicking Delete
|
|
301 |
Save
|
|
302 |
If desired, update the graphical ERD exports (see below)
|
|
303 |
Update graphical ERD exports:
|
|
304 |
Go to File > Export > Export as PNG...
|
|
305 |
Select schemas/vegbien.ERD.png and click Save
|
|
306 |
Go to File > Export > Export as SVG...
|
|
307 |
Select schemas/vegbien.ERD.svg and click Save
|
|
308 |
Go to File > Export > Export as Single Page PDF...
|
|
309 |
Select schemas/vegbien.ERD.1_pg.pdf and click Save
|
|
310 |
Go to File > Print...
|
|
311 |
In the lower left corner, click PDF > Save as PDF...
|
|
312 |
Set the Title and Author to ""
|
|
313 |
Select schemas/vegbien.ERD.pdf and click Save
|
|
314 |
Commit: svn ci -m "schemas/vegbien.ERD.mwb: Regenerated exports"
|
|
315 |
Refactoring tips:
|
|
316 |
To rename a table:
|
|
317 |
In vegbien.sql, do the following:
|
|
318 |
Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
|
|
319 |
This is necessary because the table name is *everywhere*
|
|
320 |
Search for <new>
|
|
321 |
Manually change back any replacements inside comments
|
|
322 |
To rename a column:
|
|
323 |
Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
|
|
324 |
Recreate any foreign key for the column, removing CONSTRAINT <name>
|
|
325 |
This resets the foreign key name using the new column name
|
|
326 |
Creating a poster of the ERD:
|
|
327 |
Determine the poster size:
|
|
328 |
Measure the line height (from the bottom of one line to the bottom
|
|
329 |
of another): 16.3cm/24 lines = 0.679cm
|
|
330 |
Measure the height of the ERD: 35.4cm*2 = 70.8cm
|
|
331 |
Zoom in as far as possible
|
|
332 |
Measure the height of a capital letter: 3.5mm
|
|
333 |
Measure the line height: 8.5mm
|
|
334 |
Calculate the text's fraction of the line height: 3.5mm/8.5mm = 0.41
|
|
335 |
Calculate the text height: 0.679cm*0.41 = 0.28cm
|
|
336 |
Calculate the text height's fraction of the ERD height:
|
|
337 |
0.28cm/70.8cm = 0.0040
|
|
338 |
Measure the text height on the *VegBank* ERD poster: 5.5mm = 0.55cm
|
|
339 |
Calculate the VegBIEN poster height to make the text the same size:
|
|
340 |
0.55cm/0.0040 = 137.5cm H; *1in/2.54cm = 54.1in H
|
|
341 |
The ERD aspect ratio is 11 in W x (2*8.5in H) = 11x17 portrait
|
|
342 |
Calculate the VegBIEN poster width: 54.1in H*11W/17H = 35.0in W
|
|
343 |
The minimum VegBIEN poster size is 35x54in portrait
|
|
344 |
Determine the cost:
|
|
345 |
The FedEx Kinkos near NCEAS (1030 State St, Santa Barbara, CA 93101)
|
|
346 |
charges the following for posters:
|
|
347 |
base: $7.25/sq ft
|
|
348 |
lamination: $3/sq ft
|
|
349 |
mounting on a board: $8/sq ft
|
|
350 | 350 |
|
351 | 351 |
Testing: |
352 |
On a development machine, you should put the following in your .profile:
|
|
353 |
export log= n=2
|
|
354 |
Mapping process: make test
|
|
355 |
Including column-based import: make test by_col=1
|
|
356 |
If the row-based and column-based imports produce different inserted
|
|
357 |
row counts, this usually means that a table is underconstrained
|
|
358 |
(the unique indexes don't cover all possible rows).
|
|
359 |
This can occur if you didn't use COALESCE(field, null_value) around
|
|
360 |
a nullable field in a unique index. See sql_gen.null_sentinels for
|
|
361 |
the appropriate null value to use.
|
|
362 |
Map spreadsheet generation: make remake
|
|
363 |
Missing mappings: make missing_mappings
|
|
364 |
Everything (for most complete coverage): make test-all
|
|
352 |
On a development machine, you should put the following in your .profile:
|
|
353 |
export log= n=2
|
|
354 |
Mapping process: make test
|
|
355 |
Including column-based import: make test by_col=1
|
|
356 |
If the row-based and column-based imports produce different inserted
|
|
357 |
row counts, this usually means that a table is underconstrained
|
|
358 |
(the unique indexes don't cover all possible rows).
|
|
359 |
This can occur if you didn't use COALESCE(field, null_value) around
|
|
360 |
a nullable field in a unique index. See sql_gen.null_sentinels for
|
|
361 |
the appropriate null value to use.
|
|
362 |
Map spreadsheet generation: make remake
|
|
363 |
Missing mappings: make missing_mappings
|
|
364 |
Everything (for most complete coverage): make test-all
|
|
365 | 365 |
|
366 | 366 |
Debugging: |
367 |
"Binary chop" debugging:
|
|
368 |
(This is primarily useful for regressions that occurred in a previous
|
|
369 |
revision, which was committed without running all the tests)
|
|
370 |
svn up -r <rev>; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
|
|
367 |
"Binary chop" debugging:
|
|
368 |
(This is primarily useful for regressions that occurred in a previous
|
|
369 |
revision, which was committed without running all the tests)
|
|
370 |
svn up -r <rev>; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
|
|
371 | 371 |
|
372 | 372 |
WinMerge setup: |
373 |
Install WinMerge from <http://winmerge.org/>
|
|
374 |
Open WinMerge
|
|
375 |
Go to Edit > Options and click Compare in the left sidebar
|
|
376 |
Enable "Moved block detection", as described at
|
|
377 |
<http://manual.winmerge.org/Configuration.html#d0e5892>.
|
|
378 |
Set Whitespace to Ignore change, as described at
|
|
379 |
<http://manual.winmerge.org/Configuration.html#d0e5758>.
|
|
373 |
Install WinMerge from <http://winmerge.org/>
|
|
374 |
Open WinMerge
|
|
375 |
Go to Edit > Options and click Compare in the left sidebar
|
|
376 |
Enable "Moved block detection", as described at
|
|
377 |
<http://manual.winmerge.org/Configuration.html#d0e5892>.
|
|
378 |
Set Whitespace to Ignore change, as described at
|
|
379 |
<http://manual.winmerge.org/Configuration.html#d0e5758>.
|
|
380 | 380 |
|
381 | 381 |
Documentation: |
382 |
To generate a Redmine-formatted list of steps for column-based import:
|
|
383 |
make schemas/public/reinstall
|
|
384 |
make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
|
|
385 |
To import and scrub just the test taxonomic names:
|
|
386 |
inputs/test_taxonomic_names/test_scrub
|
|
382 |
To generate a Redmine-formatted list of steps for column-based import:
|
|
383 |
make schemas/public/reinstall
|
|
384 |
make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
|
|
385 |
To import and scrub just the test taxonomic names:
|
|
386 |
inputs/test_taxonomic_names/test_scrub
|
|
387 | 387 |
|
388 | 388 |
General: |
389 |
To see a program's description, read its top-of-file comment |
|
390 |
To see a program's usage, run it without arguments |
|
391 |
To remake a directory: make <dir>/remake |
|
392 |
To remake a file: make <file>-remake |
|
389 |
To see a program's description, read its top-of-file comment |
|
390 |
To see a program's usage, run it without arguments |
|
391 |
To remake a directory: make <dir>/remake |
|
392 |
To remake a file: make <file>-remake |
Also available in: Unified diff
README.TXT: converted spaces to tabs for easier readability in a whitespace-highlighting text editor, and so that pressing backspace deletes an indent rather than just a character (creating a misaligned indent)