Project

General

Profile

1
Installation:
2
    Check out svn: svn co https://code.nceas.ucsb.edu/code/projects/bien
3
    cd bien/
4
    Install: make install
5
        WARNING: This will delete the current public schema of your VegBIEN DB!
6
    Uninstall: make uninstall
7
        WARNING: This will delete your entire VegBIEN DB!
8
        This includes all archived imports and staging tables.
9

    
10
Maintenance:
11
    VegCore data dictionary:
12
        Regularly, or whenever the VegCore data dictionary page
13
            (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore)
14
            is changed, regenerate mappings/VegCore.csv:
15
            make mappings/VegCore.htm-remake; make mappings/
16
            When moving terms, check that no terms were lost: svn di
17
            svn ci -m "mappings/VegCore.htm: Regenerated from wiki"
18
    Important: Whenever you install a system update that affects PostgreSQL or
19
        any of its dependencies, such as libc, you should restart the PostgreSQL
20
        server. Otherwise, you may get strange errors like "the database system
21
        is in recovery mode" which go away upon reimport, or you may not be able
22
        to access the database as the postgres superuser. This applies to both
23
        Linux and Mac OS X.
24

    
25
Single datasource import:
26
    (Re)import and scrub: make inputs/<datasrc>/reimport_scrub
27
    (Re)import only: make inputs/<datasrc>/reimport
28
    (Re)scrub: make inputs/<datasrc>/rescrub
29
    Note that these commands also work if the datasource is not yet imported
30

    
31
Full database import:
32
    On jupiter: svn up
33
    On local machine:
34
        ./fix_perms
35
        make inputs/upload
36
        make test by_col=1
37
            See note under Testing below
38
    On vegbiendev:
39
    Ensure there are no local modifications: svn st
40
    svn up
41
    make inputs/download
42
    For each newly-uploaded datasource above: make inputs/<datasrc>/reinstall
43
    Update the auxiliary schemas: make schemas/reinstall
44
        The public schema will be installed separately by the import process
45
    Delete imports before the last so they won't bloat the full DB backup:
46
        make backups/vegbien.<version>.backup/remove
47
        To keep a previous import other than the public schema:
48
            export dump_opts='--exclude-schema=public --exclude-schema=<version>'
49
    Make sure there is at least 150GB of disk space on /: df -h
50
        The import schema is 100GB, and may use additional space for temp tables
51
        To free up space, remove backups that have been archived on jupiter:
52
            List backups/ to view older backups
53
            Check their MD5 sums using the steps under On jupiter below
54
            Remove these backups
55
    unset version
56
    Start column-based import: . bin/import_all by_col=1
57
        To use row-based import: . bin/import_all
58
        To stop all running imports: . bin/stop_imports
59
        WARNING: Do NOT run import_all in the background, or the jobs it creates
60
            won't be owned by your shell.
61
        Note that import_all will take up to an hour to import the NCBI backbone
62
            and other metadata before returning control to the shell.
63
    Wait (overnight) for the import to finish
64
    Get $version: echo $version
65
    Set $version in all vegbiendev terminals: export version=<version>
66
    Upload logs (run on vegbiendev): make inputs/upload
67
    On local machine: make inputs/download-logs
68
    In PostgreSQL:
69
        Check that the provider_count and source tables contain entries for all
70
            inputs
71
        Check that unscrubbed_taxondetermination_view returns no rows
72
            (takes 90 s with LIMIT 1)
73
        Check that there are taxondeterminations whose source_id is
74
            source_by_shortname('TNRS')
75
    tail inputs/{.,}*/*/logs/$version.log.sql
76
    In the output, search for "Command exited with non-zero status"
77
    For inputs that have this, fix the associated bug(s)
78
    If many inputs have errors, discard the current (partial) import:
79
        make schemas/$version/uninstall
80
    Otherwise, continue
81
    make schemas/$version/publish
82
    unset version
83
    backups/fix_perms
84
    make backups/upload
85
    On jupiter:
86
        cd /data/dev/aaronmk/bien/backups
87
        For each newly-archived backup:
88
            make -s <backup>.md5/test
89
            Check that "OK" is printed next to the filename
90
    On nimoy:
91
        cd /home/bien/svn/
92
        svn up
93
        export version=<version>
94
        make backups/analytical_aggregate.$version.csv/download
95
        In the bien_web DB:
96
            Create the analytical_aggregate_<version> table using its schema
97
                in schemas/vegbien.my.sql
98
        make -s backups/analytical_aggregate.$version.csv.md5/test
99
        Check that "OK" is printed next to the filename
100
        env table=analytical_aggregate_$version bin/publish_analytical_db \
101
            backups/analytical_aggregate.$version.csv
102
    If desired, record the import times in inputs/import.stats.xls:
103
        Open inputs/import.stats.xls
104
        Insert a copy of the leftmost "By column" column group before it
105
        export version=<version>
106
        bin/import_date inputs/{.,}*/*/logs/$version.log.sql
107
        Update the import date in the upper-right corner
108
        bin/import_times inputs/{.,}*/*/logs/$version.log.sql
109
        Paste the output over the # Rows/Time columns, making sure that the
110
            row counts match up with the previous import's row counts
111
        If the row counts do not match up, insert or reorder rows as needed
112
            until they do
113
        Commit: svn ci -m "inputs/import.stats.xls: Updated import times"
114
    To run TNRS: make scrub by_col=1 &
115
        To view progress:
116
            tail -100 inputs/.TNRS/tnrs/logs/tnrs.make.log.sql
117
    To remake analytical DB: bin/make_analytical_db &
118
        To view progress:
119
            tail -100 inputs/analytical_db/logs/make_analytical_db.log.sql
120
    To back up DB (staging tables and last import):
121
        export version=<version>
122
        If before renaming to public: export dump_opts=--exclude-schema=public
123
        make backups/vegbien.$version.backup/test &
124

    
125
Backups:
126
    Archived imports:
127
        Back up: make backups/<version>.backup &
128
            Note: To back up the last import, you must archive it first:
129
                make schemas/rotate
130
        Test: make -s backups/<version>.backup/test &
131
        Restore: make backups/<version>.backup/restore &
132
        Remove: make backups/<version>.backup/remove
133
        Download: make backups/download
134
    TNRS cache:
135
        Back up: make backups/TNRS.backup-remake &
136
        Restore:
137
            yes|make inputs/.TNRS/uninstall
138
            make backups/TNRS.backup/restore &
139
            yes|make schemas/public/reinstall
140
                Must come after TNRS restore to recreate tnrs_input_name view
141
    Full DB:
142
        Back up: make backups/vegbien.<version>.backup &
143
        Test: make -s backups/vegbien.<version>.backup/test &
144
        Restore: make backups/vegbien.<version>.backup/restore &
145
        Download: make backups/download
146
    Import logs:
147
        Download: make inputs/download-logs
148

    
149
Datasource setup:
150
    Add a new datasource: make inputs/<datasrc>/add
151
        <datasrc> may not contain spaces, and should be abbreviated.
152
        If the datasource is a herbarium, <datasrc> should be the herbarium code
153
            as defined by the Index Herbariorum <http://sweetgum.nybg.org/ih/>
154
    For MySQL inputs (exports and live DB connections):
155
        For .sql exports:
156
            Place the original .sql file in _src/ (*not* in _MySQL/)
157
            Follow the steps starting with Install the staging tables below.
158
                This is for an initial sync to get the file onto vegbiendev.
159
            On vegbiendev:
160
                Create a database for the MySQL export in phpMyAdmin
161
                bin/mysql_bien database <inputs/<datasrc>/_src/export.sql &
162
        mkdir inputs/<datasrc>/_MySQL/
163
        cp -p lib/MySQL.{data,schema}.sql.make inputs/<datasrc>/_MySQL/
164
        Edit _MySQL/*.make for the DB connection
165
            For a .sql export, use server=vegbiendev and --user=bien
166
        Skip the Add input data for each table section
167
    For MS Access databases:
168
        Place the .mdb or .accdb file in _src/
169
        Download and install Access To PostgreSQL from
170
            http://www.bullzip.com/download.php
171
        Use Access To PostgreSQL to export the database:
172
            Export just the tables/indexes to inputs/<datasrc>/<file>.schema.sql
173
            Export just the data to inputs/<datasrc>/<file>.data.sql
174
        In <file>.schema.sql, make the following changes:
175
            Replace text "BOOLEAN" with "/*BOOLEAN*/INTEGER"
176
            Replace text "DOUBLE PRECISION NULL" with "DOUBLE PRECISION"
177
        Skip the Add input data for each table section
178
    Add input data for each table present in the datasource:
179
        For .sql exports, you must use the name of the table in the DB export
180
        For CSV files, you can use any name. It's recommended to use a table
181
            name from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>
182
        Note that if this table will be joined together with another table, its
183
            name must end in ".src"
184
        make inputs/<datasrc>/<table>/add
185
            Important: DO NOT just create an empty directory named <table>!
186
                This command also creates necessary subdirs, such as logs/.
187
        If the table is in a .sql export: make inputs/<datasrc>/<table>/install
188
            Otherwise, place the CSV(s) for the table in
189
            inputs/<datasrc>/<table>/ OR place a query joining other tables
190
            together in inputs/<datasrc>/<table>/create.sql
191
        Important: When exporting relational databases to CSVs, you MUST ensure
192
            that embedded quotes are escaped by doubling them, *not* by
193
            preceding them with a "\" as is the default in phpMyAdmin
194
        If there are multiple part files for a table, and the header is repeated
195
            in each part, make sure each header is EXACTLY the same.
196
             (If the headers are not the same, the CSV concatenation script
197
             assumes the part files don't have individual headers and treats the
198
             subsequent headers as data rows.)
199
        Add <table> to inputs/<datasrc>/import_order.txt before other tables
200
            that depend on it
201
    Install the staging tables:
202
        make inputs/<datasrc>/reinstall quiet=1 &
203
            For a MySQL .sql export, this must be done on vegbiendev
204
        To view progress: tail -f inputs/<datasrc>/<table>/logs/install.log.sql
205
        View the logs: tail -n +1 inputs/<datasrc>/*/logs/install.log.sql
206
            tail provides a header line with the filename
207
            +1 starts at the first line, to show the whole file
208
        For every file with an error 'column "..." specified more than once':
209
            Add a header override file "+header.<ext>" in <table>/:
210
                Note: The leading "+" should sort it before the flat files.
211
                    "_" unfortunately sorts *after* capital letters in ASCII.
212
                Create a text file containing the header line of the flat files
213
                Add an ! at the beginning of the line
214
                    This signals cat_csv that this is a header override.
215
                For empty names, use their 0-based column # (by convention)
216
                For duplicate names, add a distinguishing suffix
217
                For long names that collided, rename them to <= 63 chars long
218
                Do NOT make readability changes in this step; that is what the
219
                    map spreadsheets (below) are for.
220
                Save
221
        If you made any changes, re-run the install command above
222
    Auto-create the map spreadsheets: make inputs/<datasrc>/
223
    Map each table's columns:
224
        In each <table>/ subdir, for each "via map" map.csv:
225
            Open the map in a spreadsheet editor
226
            Open the "core map" /mappings/Veg+-VegBIEN.csv
227
            In each row of the via map, set the right column to a value from the
228
                left column of the core map
229
            Save
230
        Regenerate the derived maps: make inputs/<datasrc>/
231
    Accept the test cases:
232
        make inputs/<datasrc>/test
233
            When prompted to "Accept new test output", enter y and press ENTER
234
            If you instead get errors, do one of the following for each one:
235
            -   If the error was due to a bug, fix it
236
            -   Add a SQL function that filters or transforms the invalid data
237
            -   Make an empty mapping for the columns that produced the error.
238
                Put something in the Comments column of the map spreadsheet to
239
                prevent the automatic mapper from auto-removing the mapping.
240
            When accepting tests, it's helpful to use WinMerge
241
                (see WinMerge setup below for configuration)
242
        make inputs/<datasrc>/test by_col=1
243
            If you get errors this time, this always indicates a bug, usually in
244
                the VegBIEN unique constraints or column-based import itself
245
    Add newly-created files: make inputs/<datasrc>/add
246
    Commit: svn ci -m "Added inputs/<datasrc>/" inputs/<datasrc>/
247
    Update vegbiendev:
248
        On jupiter: svn up
249
        On local machine:
250
            ./fix_perms
251
            make inputs/upload
252
        On vegbiendev:
253
            svn up
254
            make inputs/download
255
            Follow the steps under Install the staging tables above
256

    
257
Datasource refreshing:
258
    VegBank:
259
        make inputs/VegBank/vegbank.sql-remake
260
        make inputs/VegBank/reinstall quiet=1 &
261

    
262
Schema changes:
263
    When changing the analytical views, run sync_analytical_..._to_view()
264
        to update the corresponding table
265
    Remember to update the following files with any renamings:
266
        schemas/filter_ERD.csv
267
        mappings/VegCore-VegBIEN.csv
268
        mappings/verify.*.sql
269
    Regenerate schema from installed DB: make schemas/remake
270
    Reinstall DB from schema: make schemas/public/reinstall schemas/reinstall
271
        WARNING: This will delete the current public schema of your VegBIEN DB!
272
    Reinstall staging tables: . bin/reinstall_all
273
    Sync ERD with vegbien.sql schema:
274
        Run make schemas/vegbien.my.sql
275
        Open schemas/vegbien.ERD.mwb in MySQLWorkbench
276
        Go to File > Export > Synchronize With SQL CREATE Script...
277
        For Input File, select schemas/vegbien.my.sql
278
        Click Continue
279
        In the changes list, select each table with an arrow next to it
280
        Click Update Model
281
        Click Continue
282
        Note: The generated SQL script will be empty because we are syncing in
283
            the opposite direction
284
        Click Execute
285
        Reposition any lines that have been reset
286
        Add any new tables by dragging them from the Catalog in the left sidebar
287
            to the diagram
288
        Remove any deleted tables by right-clicking the table's diagram element,
289
            selecting Delete '<table name>', and clicking Delete
290
        Save
291
        If desired, update the graphical ERD exports (see below)
292
    Update graphical ERD exports:
293
        Go to File > Export > Export as PNG...
294
        Select schemas/vegbien.ERD.png and click Save
295
        Go to File > Export > Export as SVG...
296
        Select schemas/vegbien.ERD.svg and click Save
297
        Go to File > Export > Export as Single Page PDF...
298
        Select schemas/vegbien.ERD.1_pg.pdf and click Save
299
        Go to File > Print...
300
        In the lower left corner, click PDF > Save as PDF...
301
        Set the Title and Author to ""
302
        Select schemas/vegbien.ERD.pdf and click Save
303
        Commit: svn ci -m "schemas/vegbien.ERD.mwb: Regenerated exports"
304
    Refactoring tips:
305
        To rename a table:
306
            In vegbien.sql, do the following:
307
                Replace regexp (?<=_|\b)<old>(?=_|\b) with <new>
308
                    This is necessary because the table name is *everywhere*
309
                Search for <new>
310
                Manually change back any replacements inside comments
311
        To rename a column:
312
            Rename the column: ALTER TABLE <table> RENAME <old> TO <new>;
313
            Recreate any foreign key for the column, removing CONSTRAINT <name>
314
                This resets the foreign key name using the new column name
315
    Creating a poster of the ERD:
316
        Determine the poster size:
317
            Measure the line height (from the bottom of one line to the bottom
318
                of another): 16.3cm/24 lines = 0.679cm
319
            Measure the height of the ERD: 35.4cm*2 = 70.8cm
320
            Zoom in as far as possible
321
            Measure the height of a capital letter: 3.5mm
322
            Measure the line height: 8.5mm
323
            Calculate the text's fraction of the line height: 3.5mm/8.5mm = 0.41
324
            Calculate the text height: 0.679cm*0.41 = 0.28cm
325
            Calculate the text height's fraction of the ERD height:
326
                0.28cm/70.8cm = 0.0040
327
            Measure the text height on the *VegBank* ERD poster: 5.5mm = 0.55cm
328
            Calculate the VegBIEN poster height to make the text the same size:
329
                0.55cm/0.0040 = 137.5cm H; *1in/2.54cm = 54.1in H
330
            The ERD aspect ratio is 11 in W x (2*8.5in H) = 11x17 portrait
331
            Calculate the VegBIEN poster width: 54.1in H*11W/17H = 35.0in W
332
            The minimum VegBIEN poster size is 35x54in portrait
333
        Determine the cost:
334
            The FedEx Kinkos near NCEAS (1030 State St, Santa Barbara, CA 93101)
335
                charges the following for posters:
336
                base: $7.25/sq ft
337
                lamination: $3/sq ft
338
                mounting on a board: $8/sq ft
339

    
340
Testing:
341
    On a development machine, you should put the following in your .profile:
342
        export log= n=2
343
    Mapping process: make test
344
        Including column-based import: make test by_col=1
345
            If the row-based and column-based imports produce different inserted
346
            row counts, this usually means that a table is underconstrained
347
            (the unique indexes don't cover all possible rows).
348
            This can occur if you didn't use COALESCE(field, null_value) around
349
            a nullable field in a unique index. See sql_gen.null_sentinels for
350
            the appropriate null value to use.
351
    Map spreadsheet generation: make remake
352
    Missing mappings: make missing_mappings
353
    Everything (for most complete coverage): make test-all
354

    
355
Debugging:
356
    "Binary chop" debugging:
357
        (This is primarily useful for regressions that occurred in a previous
358
        revision, which was committed without running all the tests)
359
        svn up -r <rev>; make inputs/.TNRS/reinstall; make schemas/public/reinstall; make <failed-test>.xml
360

    
361
WinMerge setup:
362
    Install WinMerge from <http://winmerge.org/>
363
    Open WinMerge
364
    Go to Edit > Options and click Compare in the left sidebar
365
    Enable "Moved block detection", as described at
366
        <http://manual.winmerge.org/Configuration.html#d0e5892>.
367
    Set Whitespace to Ignore change, as described at
368
        <http://manual.winmerge.org/Configuration.html#d0e5758>.
369

    
370
Documentation:
371
    To generate a Redmine-formatted list of steps for column-based import:
372
        make schemas/public/reinstall
373
        make inputs/ACAD/Specimen/logs/steps.by_col.log.sql
374
    To import and scrub just the test taxonomic names:
375
        inputs/test_taxonomic_names/test_scrub
376

    
377
General:
378
    To see a program's description, read its top-of-file comment
379
    To see a program's usage, run it without arguments
380
    To remake a directory: make <dir>/remake
381
    To remake a file: make <file>-remake
(2-2/6)