Aggregating validations on sparse datasources

sparse vs. full datasource

a sparse datasource is one that does not itself provide all the VegCore columns used in the validations queries

specimens

sparseness doesn't seem to be a problem for almost all specimens datasources: so far, it looks like all except UNCC provide the columns used in the validations queries. UNCC itself is only missing lat/long, which we could add NULL columns for if needed.

| query | columns used | columns provided by full datasource: NY | columns provided by sparse datasource: UNCC |
| _specimens_14_count_of_all_invalid_verbatim_lat_long | decimalLatitude, decimalLongitude | decimalLatitude, decimalLongitude | -, - |

plots

sparseness is primarily an issue for projects. however, project-related sparseness could easily be solved just by adding a NULL project column to applicable datasources (FIA, etc.). note that this does not require adding NULL columns for every single VegCore term as in the denormalized VegCore schema approach.

| query | columns used | columns provided by full datasource: SALVIAS | columns provided by sparse datasource: FIA |
| _plots_02_list_of_project_names | project_name | project_name | - |
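
the NULL project column idea can be sketched as follows. this is only an illustration under stated assumptions: it uses an in-memory SQLite database so that it is self-contained, the table name `fia_plots` and its `plot_name` column are hypothetical, and the SELECT is a guess at what a "list of project names" query might look like, not the actual text of _plots_02_list_of_project_names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for a sparse plots datasource (such as FIA)
# that has no project column at all.
conn.execute("CREATE TABLE fia_plots (plot_name TEXT)")
conn.executemany(
    "INSERT INTO fia_plots (plot_name) VALUES (?)",
    [("plot_a",), ("plot_b",)],
)

# Add the missing column. With no default given, existing rows get NULL.
conn.execute("ALTER TABLE fia_plots ADD COLUMN project_name TEXT")

# Assumed shape of a project-names query; the real query is not shown here.
project_names = conn.execute(
    "SELECT DISTINCT project_name FROM fia_plots "
    "WHERE project_name IS NOT NULL ORDER BY project_name"
).fetchall()

# Column names now present on the table.
columns = [row[1] for row in conn.execute("PRAGMA table_info(fia_plots)")]
```

with the NULL column in place, the query runs on the sparse datasource instead of failing on a missing column, and simply returns no project names. only the one column the query needs is added.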

current approach vs. with denormalized VegCore schema

the following infographic shows how our current validations approach applies to sparse datasources: it handles them by bypassing the queries that are not applicable to a given datasource. because the current approach already works, we don't need to switch to a different approach (such as a denormalized VegCore schema) for the validations to work.
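
as a rough sketch of what bypassing non-applicable queries could look like, the snippet below compares the columns each query uses against the columns a datasource provides. the function name `split_queries` and the dictionaries are hypothetical, the column sets only list the columns named in the specimens table above, and this is not necessarily how the actual validations pipeline decides which queries to run.

```python
from typing import Dict, List, Set, Tuple


def split_queries(
    provided: Set[str], query_columns: Dict[str, Set[str]]
) -> Tuple[List[str], List[str]]:
    """Return (applicable, bypassed) query names for one datasource."""
    applicable: List[str] = []
    bypassed: List[str] = []
    for name in sorted(query_columns):
        # A query is applicable only if every column it uses is provided.
        if query_columns[name] <= provided:
            applicable.append(name)
        else:
            bypassed.append(name)
    return applicable, bypassed


# Columns used by the specimens query listed on this page.
SPECIMENS_QUERIES: Dict[str, Set[str]] = {
    "_specimens_14_count_of_all_invalid_verbatim_lat_long": {
        "decimalLatitude",
        "decimalLongitude",
    },
}

# Only the columns relevant to the query above are listed (assumption).
NY_COLUMNS: Set[str] = {"decimalLatitude", "decimalLongitude"}
UNCC_COLUMNS: Set[str] = set()  # UNCC is missing lat/long

ny_result = split_queries(NY_COLUMNS, SPECIMENS_QUERIES)
uncc_result = split_queries(UNCC_COLUMNS, SPECIMENS_QUERIES)
```

here the lat/long query is applicable to NY and bypassed for UNCC, without any change to either datasource's schema.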

essentially, switching to the denormalized VegCore schema approach would delay the validations by potentially 2 more weeks (until the last week of April, when Brad will only have 1 week left), just for the sake of running a few non-applicable queries on some datasources.
