Task #859

open

fix slowness in _taxonlabel_set_parent_id()

Added by Aaron Marcuse-Kubitza almost 11 years ago.

Status: New
Priority: Normal
Start date: 01/20/2014
Due date:
% Done: 0%
Estimated time:
Activity type:

Description

This occurs when doing a full import of NCBI:

(make inputs/.NCBI/nodes/import_scrub by_col=1 continue=1; make inputs/.NCBI/publish) &

To reproduce the problem:

  1. import NCBI:
    unset n; export log= n=40000 version=NCBI_test; inputs=(inputs/test_taxonomic_names/) # NCBI will be imported at beginning
    . bin/import_all

  2. search the log output for INSERT INTO "_taxonlabel_set_parent_id(parent_id=taxonlabel_pkeys.out.taxonl"
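
A minimal sketch of step 2, assuming the import's output was saved to a file. The helper name `search_log` and the file name `import.log` are hypothetical and not part of bin/import_all. It matches only a fixed-string prefix of the INSERT statement above, so the truncated tail of the pattern does not matter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the matching lines (with line numbers) of the
# _taxonlabel_set_parent_id() INSERT found in a saved import log.
# -F matches the pattern as a fixed string rather than a regular expression.
search_log() {
    grep -F -n -- 'INSERT INTO "_taxonlabel_set_parent_id(' "$1"
}

# Usage, assuming the output of step 1 was captured to a file, e.g. with
#   . bin/import_all 2>&1 | tee import.log
# search_log import.log
```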

stats:

n=10000: Took 0:00:13.805218 sec
n=20000: Took 0:01:08.731089 sec -> Took 0:00:31.031543 sec without the nested transaction (xact); with the indexes dropped it took >6 min, so failure to use the indexes is not the cause
n=30000: Took 0:04:32.584036 sec -> Took 0:01:19.635390 sec
n=40000:                         -> Took 0:04:18.476969 sec
n=50000: very long (did not finish)
n="" (all 163907) @r11549: Took 0:05:31.246567 sec -> now: Took >3:16:50.253774 sec (>3:25:26.639058 sec on vegbiendev) (did not finish)
Growth is worse than O(n^2), which suggests a sequential scan. It should be O(n*(logn)^2) = # nodes * # ancestors per node (~logn) * O(logn) index scan for each ancestor.
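
One way to quantify the "worse than O(n^2)" observation is a minimal sketch that assumes the time grows like n^k between two measurements and solves for the local exponent k. It also computes the time ratio the expected O(n*(logn)^2) model would predict. The helpers `growth_exponent` and `predicted_ratio` are hypothetical, not part of the repo. The timings are the "Took" values from the stats above, converted to seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: local growth exponent k, assuming t is proportional
# to n^k between two measurements:  k = ln(t2/t1) / ln(n2/n1).
# k > 2 means worse than quadratic.
growth_exponent() {
    awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" \
        'BEGIN { printf "%.2f\n", log(t2 / t1) / log(n2 / n1) }'
}

# Hypothetical helper: ratio t2/t1 predicted by the expected
# O(n*(logn)^2) cost model (the log base cancels out in the ratio).
predicted_ratio() {
    awk -v n1="$1" -v n2="$2" \
        'BEGIN { printf "%.2f\n", (n2 * log(n2)^2) / (n1 * log(n1)^2) }'
}

growth_exponent 10000 13.805218 20000 68.731089    # original timings; prints 2.32
growth_exponent 20000 31.031543 40000 258.476969   # without nested xact; prints 3.06
predicted_ratio 20000 40000                        # prints 2.29
```

Doubling n from 20000 to 40000 (without the nested xact) multiplied the time by about 8.3, versus about 2.3 predicted by the n*(logn)^2 model.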