Project

General

Profile

Import issues

Merging duplicates with children

Problem: When a new table entry duplicates an existing entry, the 1:1 tables of that table and those tables' children are not merged, causing them to become orphaned

Affected tables:

  • specimenreplicate->taxonoccurrence->locationevent->location

Solutions:

Use inheritance and unique columns where this is an issue

  • Probably the simplest solution because it doesn't require major changes to the import algorithm
  • Only works where inheritance makes sense, but this is usually true where this is a problem
  • In PostgreSQL, need to translate some foreign keys to triggers because "foreign key constraints only apply to single tables, not to their inheritance children" _1
  • Where inheritance doesn't make sense because the relationship isn't always 1:1, use the parent table's unique columns to uniquify the child table
    • e.g. specimenreplicate->locationevent

1 http://www.postgresql.org/docs/9.1/static/ddl-inherit.html#DDL-INHERIT-CAVEATS

Avoid creating duplicate descendants by adding the descendants after their parents are inserted/uniquified

  • For non-nullable descendants, do one of the following
    • Make them nullable and check the not-null constraint later
    • Insert 0 in the field and defer the foreign key constraint
      • Requires running the entire subtree in one transaction
    • Add a trigger that auto-adds an empty descendant, so there is something in the field

Merge descendants when duplicate parents are detected

  • Requires determining what the descendants are, using DB introspection of the foreign keys
  • Use deferred constraints to temporarily set the pkeys of the two children to the same value
  • Allow foreign keys' on update actions to propagate the change to tables that point to the child
  • Before/after doing that, merge all the descendants of the child
  • Delete the not yet attached child, specifying it using its oid