2014-04-03 conference call¶

Martha's notes ¶

upcoming¶

the next conference call is next week at the usual time (Th. 9am PT/9am Tucson/12pm ET)

availability¶

Brad will now be able to stay until the end of April, not March
however, Brad will not be available, even for conference calls, after April
Mark will be around for the next 2 weeks, then traveling
please add your availability for spring 2014 to the *spreadsheet*:

Loading Google Spreadsheet...

decisions¶

aggregating validations ¶

we do want to run all queries for each datasource, even those queries that are not applicable to all datasources, because they contain columns that some datasources do not provide (Brad)
however, this is actually not necessary, as shown in the validations on sparse datasources infographic
even though the validations just test the normalization mechanism, not the source-specific preprocessing, we want to test all datasources, in case the example datasource used to develop the queries does not test every edge case
assume person running scripts has access to the shell as well as the DB (ie. the pipeline does not have to be self-contained inside the DB) (Brad)

specimens aggregating validations ¶

~~will implement workaround for the slowdown in query #12 instead of moving validations sequencing to shell scripts to prevent a single query from rolling back the the set of diffs~~
note that if there are future query slowdowns, this can also be fixed by setting statement_timeout locally on public_validations.rematerialize_out_view(text, regclass) .

plots aggregating validations ¶

defer denormalizing normalized plots datasources (TEAM, Madidi) so that this doesn't hold up the aspects of the validations that need to get done while Brad is available

import process¶

should switch to importing to a denormalized VegCore schema, so that we don't import each datasource differently

Postgres upgrades¶

in general, wait a bit before upgrading (Martha)

to do for Aaron¶

aggregating validations ¶

~~specimens queries~~
1. ~~implement workaround for the slowdown in query #12~~
2. ~~run pipeline on NY to generate diffs~~
plots queries
1. write denormalized plots input queries, using VegBank as the example datasource
2. finish fixing plots output queries
validate datasources
1. SALVIAS
2. denormalized plots datasources: VegBank, CVS, CTFS
3. specimens: NY, others
4. FIA (special case, with separate input queries)
5. normalized plots datasources: TEAM, Madidi
  1. denormalize
  2. validate

later¶

move validations sequencing to shell scripts

import process ¶

~~estimate how long it would take to switch to importing to a denormalized VegCore schema~~ see Importing to a denormalized VegCore schema
create denormalized VegCore schema
- should contain all the terms in the VegCore data dictionary
switch to importing to a denormalized VegCore schema
1. denormalize the remaining normalized datasources: TEAM, Madidi
2. materialize the denormalized views of the staging tables
3. rename all specimens tables and materialized views to the same table name
4. for each datasource, add NULL columns for all VegCore terms that are not present in the datasource
5. remove source-specific columns
6. re-order columns to match the order in the denormalized VegCore schema

info¶

TNRS¶

after-next PHP upgrade will break TNRS (Brad)

MySQL¶

MySQL nested queries can be slow, because MySQL sometimes won't use indexes for them (Brad)
- can solve this by running the nested query to a temp table and then using the temp table

Files (0)

Project

General

Profile

Wiki

2014-04-03 conference call¶

Martha's notes ¶

upcoming¶

availability¶

decisions¶

aggregating validations ¶

specimens aggregating validations ¶

plots aggregating validations ¶

import process¶

Postgres upgrades¶

to do for Aaron¶

aggregating validations ¶

later¶

import process ¶

info¶

TNRS¶

MySQL¶

Project

General

Profile

Wiki

2014-04-03 conference call¶

Martha's notes¶

upcoming¶

availability¶

decisions¶

aggregating validations¶

specimens aggregating validations¶

plots aggregating validations¶

import process¶

Postgres upgrades¶

to do for Aaron¶

aggregating validations¶

later¶

import process¶

info¶

TNRS¶

MySQL¶

Martha's notes ¶

aggregating validations ¶

specimens aggregating validations ¶

plots aggregating validations ¶

aggregating validations ¶

import process ¶