- Table of contents
- Import steps
- export .sql file from live DB connection
- translate DB engines' SQL to PostgreSQL
- standardize CSVs to PostgreSQL-compatible format
- install staging tables
- map to VegCore
- rename staging table columns according to the VegCore mapping
- datasource-specific postprocessing
- planned: datasource-general postprocessing
- planned: create occurrence_all view which joins the datasource's tables together
- planned: create occurrence_VegCore view which includes just the VegCore columns in occurrence_all
- planned: create public.occurrence_VegCore view which appends each datasource's occurrence_VegCore view
- normalize
Import steps¶
- General import steps
- Specific import steps
- Adding ACAD--a Darwin Core datasource
- Adding Cyrille traits--a traits datasource
- Adding Madidi--a flat-file plots datasource
- Mapping a new table in VegBank--a SQL plots datasource
- Refreshing ACAD--a Darwin Core datasource
- Refreshing CVS--an MS Access plots datasource
- Refreshing VegBank--a SQL plots datasource
- Underlined steps remain to be done
- The normalize step is the most complicated. We are trying to replace the XML normalization algorithm with a much simpler insert-ignore.
export .sql file from live DB connection ¶
- README sections: Datasource setup > MySQL inputs > .sql exports
- examples:
inputs/SALVIAS/_MySQL/salvias_plots.{schema,data}.sql.make
- templates:
lib/MySQL.{schema,data}.sql.make
- scripts:
bin/my2pg_export
;bin/mysql_bien
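- a minimal sketch of the kind of export the .sql.make files drive, assuming mysqldump is on the PATH and using placeholder host/user/database names; the real bin/my2pg_export and bin/mysql_bien wrappers add their own connection handling and options:

```python
#!/usr/bin/env python3
"""Sketch: dump a live MySQL DB into separate schema and data .sql files."""
import subprocess

HOST, USER, DB = "db.example.org", "bien", "salvias_plots"  # placeholders

def dump(extra_args, out_path):
    # --no-data -> schema only; --no-create-info -> data only; -p prompts for a password
    cmd = ["mysqldump", "-h", HOST, "-u", USER, "-p"] + extra_args + [DB]
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)

dump(["--no-data"], DB + ".schema.sql")       # analogous to *.schema.sql.make
dump(["--no-create-info"], DB + ".data.sql")  # analogous to *.data.sql.make
```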
translate DB engines' SQL to PostgreSQL ¶
- README sections: Datasource setup > Install the staging tables
- make targets:
inputs/input.Makefile
>`%.sql: _MySQL/%.sql`
,`%.sql: %.sql.make`
- scripts:
bin/my2pg
;bin/my2pg.data
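- the translation is essentially textual rewriting; below is an illustrative sketch of a few common MySQL-to-PostgreSQL rewrites (identifier quoting, type names, table options), not the actual rule set of bin/my2pg / bin/my2pg.data:

```python
import re

def my2pg_sketch(sql):
    """Illustrative subset of MySQL -> PostgreSQL SQL rewrites."""
    sql = sql.replace("`", '"')                          # backticks -> standard identifier quoting
    sql = re.sub(r"\bENGINE=\w+", "", sql, flags=re.I)   # drop MySQL table options
    sql = re.sub(r"\bAUTO_INCREMENT\b", "", sql, flags=re.I)
    sql = re.sub(r"\bUNSIGNED\b", "", sql, flags=re.I)
    sql = re.sub(r"\btinyint\(\d+\)", "smallint", sql, flags=re.I)
    sql = re.sub(r"\bint\(\d+\)", "integer", sql, flags=re.I)
    return sql

print(my2pg_sketch(
    "CREATE TABLE `plot` (`plot_id` int(11) UNSIGNED AUTO_INCREMENT) ENGINE=InnoDB;"))
```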
standardize CSVs to PostgreSQL-compatible format ¶
- Excel dialect; no ragged rows
- README sections: Datasource setup > Install the staging tables
- make targets:
inputs/input.Makefile
>`%/install: _always`
containing $(import_install_)
- scripts:
bin/csv2db
- utils:
lib/csvs.py
>reader_and_header()
,InputRewriter
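- a rough standalone approximation of this step: rewrite an arbitrary-dialect CSV into the Excel dialect, padding every row to the header width (in the codebase this is done streaming by reader_and_header() and InputRewriter in lib/csvs.py, wired up by bin/csv2db):

```python
import csv, sys

def standardize_csv(in_path, out_path):
    """Rewrite a CSV in Excel dialect, with every row padded/trimmed to the header width."""
    with open(in_path, newline="") as f:
        dialect = csv.Sniffer().sniff(f.read(64 * 1024))  # guess the input dialect
        f.seek(0)
        rows = csv.reader(f, dialect)
        header = next(rows)
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out, dialect=csv.excel)
            writer.writerow(header)
            for row in rows:
                # no ragged rows: pad short rows, trim extra cells
                writer.writerow((row + [""] * len(header))[:len(header)])

if __name__ == "__main__":
    standardize_csv(sys.argv[1], sys.argv[2])
```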
install staging tables ¶
- README sections: Datasource setup > Install the staging tables
- run each SQL script (idempotent)
- make targets:
inputs/input.Makefile
>sql/install
- scripts:
inputs/*/*.sql
- create tables from CSV headers (idempotent)
- make targets:
inputs/input.Makefile
>`%/install: _always`
containing $(import_install_)
- scripts:
bin/csv2db
- utils:
lib/sql_io.py
>import_csv()
- COPY FROM each standardized CSV (idempotent)
- utils:
lib/sql_io.py
>append_csv()
,cleanup_table()
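- roughly what import_csv()/append_csv() amount to, assuming psycopg2 as the driver and all-text staging columns (both assumptions; lib/sql_io.py uses the project's own DB layer and adds cleanup via cleanup_table()):

```python
import csv
import psycopg2  # assumed driver, for illustration only

def install_staging_table(conn, schema, table, csv_path):
    """Create an all-text staging table from the CSV header, then COPY the rows in.
    Re-runnable: the table is created only if missing and truncated before each load."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    cols = ", ".join('"%s" text' % c for c in header)
    with conn, conn.cursor() as cur:
        cur.execute('CREATE TABLE IF NOT EXISTS "%s"."%s" (%s)' % (schema, table, cols))
        cur.execute('TRUNCATE "%s"."%s"' % (schema, table))
        with open(csv_path, newline="") as f:
            cur.copy_expert(
                'COPY "%s"."%s" FROM STDIN WITH (FORMAT csv, HEADER true)'
                % (schema, table), f)
```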
map to VegCore ¶
- README sections: Datasource setup > Map each table's columns; Maintenance > VegCore data dictionary
- make targets:
inputs/input.Makefile
>%/.map.csv.last_cleanup
;mappings/Makefile
> targets containing VegCore.htm
- scripts:
bin/canon
;bin/translate_ci
;bin/redmine_synonyms
- mappings:
inputs/*/*/map.csv
;mappings/VegCore.thesaurus.csv
;mappings/VegCore.vocab.csv
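- map.csv's exact layout isn't shown here; assuming its first two columns pair a source column with its VegCore term, and that VegCore.vocab.csv lists the accepted terms in its first column, a rough consistency check in the spirit of bin/canon / bin/translate_ci (which, going by its name, matches case-insensitively) would be:

```python
import csv

def unmapped_terms(map_csv, vocab_csv):
    """Return the VegCore terms used in map.csv that are absent from the vocabulary
    (case-insensitive comparison)."""
    with open(vocab_csv, newline="") as f:
        vocab = {row[0].strip().lower() for row in csv.reader(f) if row}
    unmapped = []
    with open(map_csv, newline="") as f:
        for row in list(csv.reader(f))[1:]:  # skip the header row
            if len(row) >= 2 and row[1] and row[1].strip().lower() not in vocab:
                unmapped.append(row[1])
    return unmapped
```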
rename staging table columns according to the VegCore mapping ¶
- also needed for the datasource-general postprocessing step
- scripts:
lib/runscripts/table.run
>map_table()
- mappings:
inputs/*/*/map.csv
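- map_table() in lib/runscripts/table.run performs the rename in the database; a sketch of the same idea, under the same map.csv assumption as above (source column in the first field, VegCore term in the second), that just emits the ALTER TABLE statements:

```python
import csv

def rename_sql(schema, table, map_csv):
    """Emit ALTER TABLE ... RENAME COLUMN statements derived from a map.csv."""
    stmts = []
    with open(map_csv, newline="") as f:
        for row in list(csv.reader(f))[1:]:  # skip the header row
            if len(row) >= 2 and row[1] and row[1] != row[0]:
                stmts.append('ALTER TABLE "%s"."%s" RENAME COLUMN "%s" TO "%s";'
                             % (schema, table, row[0], row[1]))
    return "\n".join(stmts)
```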
datasource-specific postprocessing ¶
- make targets:
inputs/input.Makefile
>%.sql/run
,postprocess
,%/install
($(selfMake) $*/postprocess.sql/run)
- scripts:
inputs/*/*/postprocess.sql
;inputs/*/*/import
;inputs/*/import
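- in essence, the %.sql/run and postprocess targets run the datasource's postprocess.sql against its staging schema; a sketch using plain psql (DB name and file path are placeholders; the real targets go through $(selfMake) and the project's wrappers):

```python
import subprocess

def run_postprocess(db, sql_path):
    """Run one postprocess.sql in a single transaction, stopping on the first error.
    Safe to re-run as long as the script itself is written idempotently."""
    subprocess.run(
        ["psql", "--single-transaction", "-v", "ON_ERROR_STOP=1",
         "-d", db, "-f", sql_path],
        check=True)

# run_postprocess("vegbien", "inputs/<datasource>/<table>/postprocess.sql")
```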
planned: datasource-general postprocessing ¶
- needed for adding derived columns
- scripts:
schemas/VegCore/mk_derived
;lib/import.sh
>mk_derived()
;inputs/*/*/import
>mk_derived
- populate ID fields
- populate analytical derived fields
- run TNRS and left join TNRS fields
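- a sketch of the kind of statement mk_derived() would generate: add a derived column if missing, then populate it from existing VegCore fields. The column names below (scientificName, genus, specificEpithet) are an illustrative guess, not taken from schemas/VegCore/mk_derived:

```python
def mk_derived_sql(schema, table):
    """Illustrative derived-column population (ADD COLUMN IF NOT EXISTS needs
    PostgreSQL 9.6+)."""
    qualified = '"%s"."%s"' % (schema, table)
    return "\n".join([
        'ALTER TABLE %s ADD COLUMN IF NOT EXISTS "scientificName" text;' % qualified,
        'UPDATE %s SET "scientificName" = concat_ws(\' \', "genus", "specificEpithet")'
        ' WHERE "scientificName" IS NULL;' % qualified,
    ])
```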
planned: create occurrence_all view which joins the datasource's tables together ¶
- scripts:
inputs/*/occurrence_all/import
- the column names must all be globally unique for this to work (see the sketch below)
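- a sketch of the view this import script would create, assuming the datasource's tables can be chained with LEFT JOIN ... USING on shared key columns; the example schema, table, and key names are hypothetical, not taken from any datasource's actual schema:

```python
def occurrence_all_sql(schema, tables_and_keys):
    """Build CREATE VIEW occurrence_all as a left-join chain over the datasource's
    tables. tables_and_keys: [(first_table, None), (table, join_column), ...].
    SELECT * only works if the non-key column names are globally unique."""
    first = tables_and_keys[0][0]
    parts = ['"%s"."%s"' % (schema, first)]
    for table, key in tables_and_keys[1:]:
        parts.append('LEFT JOIN "%s"."%s" USING ("%s")' % (schema, table, key))
    return ('CREATE OR REPLACE VIEW "%s".occurrence_all AS\nSELECT *\nFROM %s;'
            % (schema, "\n".join(parts)))

# hypothetical example: plot-level table joined to its organism observations
print(occurrence_all_sql("datasource", [("plots", None), ("organisms", "plotID")]))
```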
planned: create occurrence_VegCore view which includes just the VegCore columns in occurrence_all ¶
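- a sketch of that per-datasource view; the schema name and VegCore column list are placeholders:

```python
def occurrence_vegcore_sql(schema, vegcore_cols):
    """Per-datasource view: just the VegCore columns of that datasource's occurrence_all."""
    cols = ", ".join('"%s"' % c for c in vegcore_cols)
    return ('CREATE OR REPLACE VIEW "%s"."occurrence_VegCore" AS SELECT %s '
            'FROM "%s".occurrence_all;' % (schema, cols, schema))
```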
planned: create public.occurrence_VegCore view which appends each datasource's occurrence_VegCore view ¶
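- and a sketch of the appending view, assuming every per-datasource occurrence_VegCore view exposes the same columns in the same order:

```python
def public_occurrence_vegcore_sql(schemas):
    """Cross-datasource view: append (UNION ALL) each datasource's occurrence_VegCore."""
    parts = ['SELECT * FROM "%s"."occurrence_VegCore"' % s for s in schemas]
    return ('CREATE OR REPLACE VIEW public."occurrence_VegCore" AS\n'
            + "\nUNION ALL\n".join(parts) + ";")
```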
normalize ¶
either of the following:
run VegBank XML normalization algorithm (a.k.a. column-based import/row-based import) ¶
- this is the very complicated step we are trying to replace with the underlined steps
- it includes several files/functions which are very long, marked (HUGE)
- takes a very long time
- README sections: Single datasource import, Full database import (HUGE)
- scripts:
bin/import_all
;bin/map (HUGE)
- make targets:
inputs/input.Makefile
>%/import
using$(import)
- utils:
lib/db_xml.py
>put_table()
,put() (HUGE)
;lib/sql_io.py
>put_table() (HUGE)
- mappings:
mappings/VegCore-VegBIEN.csv (HUGE)
;inputs/*/*/VegBIEN.csv
;inputs/*/*/test.xml.ref
planned: try inserting each input table into each output table ¶
- ignore duplicates
- may be able to use existing import mechanism, but it must ignore the mapping to VegBIEN because the columns have already been renamed
- the runtime is much less than that of XML normalization
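- a sketch of the planned insert-ignore using PostgreSQL's ON CONFLICT DO NOTHING (9.5+); the table and column names are placeholders for whichever staging and output tables line up:

```python
def insert_ignore_sql(in_table, out_table, columns):
    """Insert every input row into the output table, silently skipping rows that
    collide with an existing unique constraint (the 'ignore duplicates' part)."""
    col_list = ", ".join('"%s"' % c for c in columns)
    return ('INSERT INTO %s (%s)\nSELECT %s FROM %s\nON CONFLICT DO NOTHING;'
            % (out_table, col_list, col_list, in_table))

# hypothetical usage with placeholder names:
# print(insert_ignore_sql('"<datasource>"."<table>"', 'public."<output_table>"',
#                         ["<VegCore column>", "<VegCore column>"]))
```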