2014-03-06 conference call¶
Brad's notes¶
After our call, Martha and I spent some time discussing how to speed up progress. Here are the decisions we made:
1. You should complete all output queries first, both plot and specimens. That will enable me to ensure that any input queries I write return exactly the same columns as your output queries.
2. You should complete the generic input queries (denormalized vegCore plots and vegCore specimens) after completing all output queries. This will help you focus on ensuring that the queries run (that is, that they do not throw run-time errors) without worrying about troubleshooting validation issues.
3. You should proceed with the actual validation only after you have written all the queries.
4. You should denormalize TEAM, Madidi, and CTFS only AFTER validating all the other sources, both specimens and plots. We do not know how long denormalization will take, and would prefer to get the other sources completed before embarking on denormalization.
5. I will write the input validations against the original data for SALVIAS and FIA. I will also validate them. You do not need to work on the input queries for either of these sources, nor run the validations.

Bearing the above in mind, here is the new order of priority:
1. Finish all plot output queries. Make sure they work and are stable. This is a single set of 18 queries modeled after the SALVIAS queries I provided.
2. Write specimens output queries. Make sure they work and are stable. This is a single set of 16 queries modeled after the NYBG queries I provided.
3. Write plots input queries against the data in the denormalized vegCore schema. Make sure they work and are stable. This is a single set of 18 queries that match the plot output queries.
4. Write specimens input queries against the specimen data in the vegCore schema. Make sure they work and are stable. This is a single set of 16 queries that match the specimen output queries.
5. Run plot validations for the two plot data sets that have already been denormalized, excluding FIA (i.e., VegBank and CVS)
6. Run all specimen validations
7. Denormalize the remaining plot datasources TEAM, Madidi, and CTFS so they will work with the denormalized plot input queries
8. Complete plot validations on TEAM, Madidi, and CTFS

I appreciate that the output queries are challenging due to normalization, but the input queries should be faster to write. As you work on completing the queries, please send me each day the most recent version of the queries you are working on (plot output, specimen output, plot input or specimen input) so I can gauge progress and help with any problems you might be encountering.
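To illustrate how a matched input/output pair might be diffed once both sides exist (a minimal sketch only; all table and column names here are assumed, and the real column lists come from the SALVIAS-modeled output queries):

    -- Hypothetical pair: one of the 18 plot queries.
    -- The output query reads the normalized (VegBIEN-side) data;
    -- the input query reads the denormalized vegCore staging data.
    -- Both must return exactly the same columns, in the same order.
    CREATE TEMP VIEW plot_output_01 AS
    SELECT l.authorlocationcode AS plot_code,      -- assumed column names
           l.areatotal          AS plot_area_ha
    FROM   location l;

    CREATE TEMP VIEW plot_input_01 AS
    SELECT v.plotcode     AS plot_code,            -- assumed column names
           v.plotarea_ha  AS plot_area_ha
    FROM   vegcore_plots v;

    -- Validation: any row returned by one side but not the other is a mismatch.
    (TABLE plot_input_01 EXCEPT TABLE plot_output_01)
    UNION ALL
    (TABLE plot_output_01 EXCEPT TABLE plot_input_01);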
upcoming¶
- the next conference call is next week at the usual time (Th. 9am PT/10am Tucson/12pm ET)
availability¶
- Brad is leaving at the end of the month
- please add your availability for spring 2014 to the *spreadsheet*:
decisions¶
aggregating validations¶
- OK for the aggregating validations to validate just the normalization process, not also the staging table loading and denormalization (Martha, Brad)
- better to denormalize input data first, so that all the input queries can be written once on a common schema (Martha, Brad)
schema design¶
- better to use dummy entries in tables to avoid LEFT JOINs (Brad)
to do for Brad¶
aggregating validations¶
- renumber FIA input queries to match the SALVIAS query #s
- validate SALVIAS against the modified output queries
- write FIA input queries to match the modified output queries
to do for Aaron¶
aggregating validations¶
- make sure all the plots output queries work and are stable
- write specimens output queries
- write denormalized plots input queries from the FIA input queries
- write specimens input queries from the NY input queries; see specimens input queries (done using the steps under Aggregating validations refactoring)
- run all validations
- denormalize the remaining plots datasources TEAM, Madidi, and CTFS (we aren't using the normalized part) so they will work with the denormalized plots validation queries
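A purely hypothetical sketch of what denormalizing one of these sources could involve (the source schema and column names are invented, and the target is the same assumed vegcore_plots staging table referenced in the sketches above):

    -- Flatten a normalized source into the common denormalized staging table
    -- so the existing plot input queries can run against it unchanged.
    INSERT INTO vegcore_plots (plotcode, latitude_deg, longitude_deg, plotarea_ha)
    SELECT p.plot_code,
           s.latitude,
           s.longitude,
           p.area_ha
    FROM   team.plot p                              -- assumed source tables
    JOIN   team.site s ON s.site_id = p.site_id;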
VegCore¶
- make denormalized VegCore SQL schema from the data dictionary
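As a rough sketch only (the column names below are placeholders; the real list comes from the VegCore data dictionary), the denormalized plots table assumed in the sketches above might look like:

    -- Hypothetical excerpt of a denormalized vegCore plots table:
    -- one wide row per observation, readable without joins.
    CREATE TABLE vegcore_plots (
        plotcode        text,
        projectname     text,
        latitude_deg    double precision,
        longitude_deg   double precision,
        plotarea_ha     double precision,
        obsstartdate    date,
        taxonname       text,
        stemdiameter_cm double precision
    );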
schema changes¶
- insert a dummy project for every locationevent that doesn't have one, rather than using LEFT JOINs
- specimens data should have one project per herbarium
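A minimal sketch of the dummy-project change, assuming hypothetical column names (project_id as the key, projectname as the label):

    -- Give every locationevent a real project row to point at, so plain
    -- INNER JOINs to project no longer drop rows (no LEFT JOIN needed).
    INSERT INTO project (project_id, projectname)
    VALUES (0, '[no project]');                 -- placeholder row

    UPDATE locationevent
    SET    project_id = 0
    WHERE  project_id IS NULL;                  -- attach orphaned events

    -- Per the note above, specimens data would instead get one
    -- placeholder project row per herbarium.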