Aggregating validations on sparse datasources¶
sparse vs. full datasource¶
a sparse datasource is one that does not itself provide all the VegCore columns used in the validations queries
specimens¶
sparseness doesn't seem to be a problem for specimens datasources: so far, all of them except UNCC provide the columns used in the validations queries, and UNCC itself is missing only lat/long, for which we could add NULL columns if needed (see the sketch after the table below).
| query | columns used | columns provided by full datasource: NY | columns provided by sparse datasource: UNCC |
|-------|--------------|------------------------------------------|---------------------------------------------|
| _specimens_14_count_of_all_invalid_verbatim_lat_long | decimalLatitude, decimalLongitude | decimalLatitude, decimalLongitude | -, - |
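for example, the missing lat/long could be exposed as NULL columns in a wrapper view, roughly as in the following sketch (the "UNCC".specimens table name, the view name, and the column types are assumptions for illustration, not the actual staging schema):

```sql
-- a minimal sketch, assuming a hypothetical staging table "UNCC".specimens;
-- the real table/view names may differ.
-- exposing the missing lat/long as NULL columns lets the lat/long validation
-- query reference them without erroring.
CREATE OR REPLACE VIEW "UNCC".specimens_for_validation AS
SELECT
    s.*,
    NULL::double precision AS "decimalLatitude",   -- not provided by UNCC
    NULL::double precision AS "decimalLongitude"   -- not provided by UNCC
FROM "UNCC".specimens s;
```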
plots¶
sparseness is primarily an issue for projects. however, project-related sparseness could easily be solved by adding a NULL project column to the applicable datasources (FIA, etc.); see the sketch after the table below. note that this does not require adding NULL columns for every single VegCore term, as the denormalized VegCore schema approach would.
| query | columns used | columns provided by full datasource: SALVIAS | columns provided by sparse datasource: FIA |
|-------|--------------|----------------------------------------------|---------------------------------------------|
| _plots_02_list_of_project_names | project_name | project_name | - |
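a minimal sketch of adding the NULL project column (the "FIA".plots table name and the view name are assumptions for illustration, not the actual staging schema):

```sql
-- a minimal sketch, assuming a hypothetical staging table "FIA".plots;
-- the real table/view names may differ.
-- a single NULL project_name column satisfies the project-related validation
-- queries without adding NULL columns for every VegCore term.
CREATE OR REPLACE VIEW "FIA".plots_for_validation AS
SELECT
    p.*,
    NULL::text AS project_name  -- FIA does not provide a project column
FROM "FIA".plots p;
```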
current approach vs. denormalized VegCore schema approach¶
the following infographic shows how our current validations approach applies to sparse datasources, and demonstrates that it handles them successfully by bypassing non-applicable queries. since the current approach works, we don't need to switch to a different approach (such as a denormalized VegCore schema) for the validations to work.
essentially, the denormalized VegCore schema approach would delay the validations by potentially 2 more weeks (until the last week of April, when Brad will have only 1 week left), just for the sake of running a few non-applicable queries on some datasources.
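for reference, "bypassing non-applicable queries" can be as simple as checking whether the datasource provides the required columns before running a query. a rough sketch against information_schema (the schema, table, and column names here are hypothetical, for illustration only):

```sql
-- illustrative sketch: run a validation query only if the datasource's staging
-- table provides the columns that query needs; otherwise skip it.
DO $$
BEGIN
    IF (SELECT count(*)
        FROM information_schema.columns
        WHERE table_schema = 'UNCC'
          AND table_name   = 'specimens'
          AND column_name  IN ('decimalLatitude', 'decimalLongitude')) = 2
    THEN
        RAISE NOTICE 'running _specimens_14_count_of_all_invalid_verbatim_lat_long';
        -- the actual validation query would be run here
    ELSE
        RAISE NOTICE 'skipping: required columns not provided by this datasource';
    END IF;
END $$;
```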