Progress update

Individual datasource removal

  • individual datasource removal was optimized so that datasources can be removed quickly
  • removing a medium-sized datasource (ACAD, 45,000 rows) now takes 30 s, and removing a large datasource (MO, 4 million rows) takes 1 hour
  • this was done by adding covering indexes on foreign keys, so that the cascading deletes can quickly find the rows to delete
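
The effect of indexing a foreign key can be sketched as follows. This is a minimal illustration only, using an in-memory SQLite database as a stand-in for the real one; the table, column, and index names (`source`, `observation`, `observation_source_id_idx`) and the row counts are hypothetical and not taken from VegBIEN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical parent/child tables; the child references the parent with a cascading delete
conn.executescript("""
    CREATE TABLE source (source_id INTEGER PRIMARY KEY, shortname TEXT UNIQUE);
    CREATE TABLE observation (
        observation_id INTEGER PRIMARY KEY,
        source_id INTEGER NOT NULL REFERENCES source (source_id) ON DELETE CASCADE
    );
    -- index on the referencing column, so the cascade does not have to scan the child table
    CREATE INDEX observation_source_id_idx ON observation (source_id);
""")

conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, "ACAD"), (2, "MO")])
conn.executemany(
    "INSERT INTO observation (source_id) VALUES (?)",
    [(1,)] * 3 + [(2,)] * 5,
)

# The lookup a cascading delete needs: find the child rows of one parent
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT observation_id FROM observation WHERE source_id = 1"
).fetchall()

# Removing one datasource cascades to its observations only
conn.execute("DELETE FROM source WHERE shortname = ?", ("ACAD",))
conn.commit()
remaining = conn.execute(
    "SELECT source_id, COUNT(*) FROM observation GROUP BY source_id"
).fetchall()
```

In this sketch, `plan` can be printed to check that the lookup goes through the index rather than a full table scan, and `remaining` shows that only the other datasource's rows are left.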

Individual datasource refresh

  • individual datasource refresh was also optimized so that datasources can be reloaded quickly after fixing mapping issues
  • this was done by test-reloading a small sample datasource and examining slow-running queries for missing indexes, etc.
  • datasources can now be reloaded without needing to remove the existing import
  • this ensures that the database always contains a complete import of each datasource, even during a reload
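
One way to get this "always a complete import" property is to replace a datasource's rows inside a single transaction. The sketch below assumes that approach, which is not necessarily how the actual import scripts work; the `observation` table, the `reload_datasource` helper, and the sample taxon names are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3

def reload_datasource(conn, source, taxa):
    """Replace all rows of `source` atomically; on error the old import is kept."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("DELETE FROM observation WHERE source = ?", (source,))
        conn.executemany(
            "INSERT INTO observation (source, taxon) VALUES (?, ?)",
            [(source, taxon) for taxon in taxa],
        )

def taxa_of(conn, source):
    """Return the sorted taxon names currently stored for `source`."""
    rows = conn.execute(
        "SELECT taxon FROM observation WHERE source = ? ORDER BY taxon", (source,)
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (source TEXT NOT NULL, taxon TEXT NOT NULL)")

# Initial import (sample data for illustration only)
reload_datasource(conn, "ACAD", ["Acer rubrum", "Picea mariana"])

# A reload containing a bad row (NULL taxon) fails partway through;
# the rollback leaves the previous complete import in place
try:
    reload_datasource(conn, "ACAD", ["Acer rubrum", None])
except sqlite3.IntegrityError:
    pass
after_failed_reload = taxa_of(conn, "ACAD")

# A successful reload swaps the old rows for the new ones in one step
reload_datasource(conn, "ACAD", ["Abies balsamea"])
after_good_reload = taxa_of(conn, "ACAD")
```

Because the delete and the inserts commit together, other sessions see either the old import or the new one, never a partial state.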

Datasource validation

Derived data version info

Source-level tracking of import and revision

  • fields to store this have been added to the source metadata table:
    • datecreated
    • createdby
    • datelastmodified
    • lastmodifiedby
  • the dates are populated with the start time of the import
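
A minimal sketch of how these fields might be filled in, assuming the metadata table is called `source`, is keyed by a `shortname` column, and that a re-import updates only the last-modified pair. All three are assumptions, as are the `record_import` helper and the sample user names; SQLite again stands in for the real database.

```python
import sqlite3
from datetime import datetime, timezone

def record_import(conn, shortname, user, started_at):
    """Stamp the source row with the import's start time (not its end time)."""
    stamp = started_at.isoformat()
    with conn:
        # A re-import only touches the last-modified fields
        cur = conn.execute(
            "UPDATE source SET datelastmodified = ?, lastmodifiedby = ? "
            "WHERE shortname = ?",
            (stamp, user, shortname),
        )
        if cur.rowcount == 0:
            # First import: created and last-modified fields start out identical
            conn.execute(
                "INSERT INTO source (shortname, datecreated, createdby, "
                "datelastmodified, lastmodifiedby) VALUES (?, ?, ?, ?, ?)",
                (shortname, stamp, user, stamp, user),
            )

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE source (
        shortname TEXT PRIMARY KEY,
        datecreated TEXT, createdby TEXT,
        datelastmodified TEXT, lastmodifiedby TEXT
    )
""")

# Sample start times and users, for illustration only
first_start = datetime(2013, 1, 1, 9, 0, tzinfo=timezone.utc)
second_start = datetime(2013, 2, 1, 9, 0, tzinfo=timezone.utc)
record_import(conn, "ACAD", "importer_a", first_start)
record_import(conn, "ACAD", "importer_b", second_start)

row = conn.execute(
    "SELECT datecreated, createdby, datelastmodified, lastmodifiedby "
    "FROM source WHERE shortname = 'ACAD'"
).fetchone()
```

Capturing the start time once and passing it in means every row written by the same import carries the same timestamp, however long the import runs.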

Normalized VegCore schema

  • when modeling issues are discovered that cannot easily be fixed in VegBIEN, these are instead fixed in normalized VegCore.
    normalized VegCore thereby provides important documentation about the structure of our vegetation data, in a format that is easier to understand than the comparatively verbose VegBIEN ERD.
  • each table in the *VegCore ERD* now links to the corresponding table in *phpMyAdmin* (login bien_read), which contains table and column comments that help explain the schema