Revision 5903

Added by Aaron Marcuse-Kubitza over 12 years ago

sql.py: distinct_table(): Use DISTINCT ON instead of a unique index and insert_select()'s ignore mode to remove duplicate rows. This uses whichever sorting method PostgreSQL deems to be fastest instead of requiring the use of a B-tree index. Since most of the slower operations in TNRS's import are distinct_table() calls, this should speed up the TNRS import, which is a bottleneck for the DB import as a whole because the TNRS import must complete before other datasources can be imported.
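The DISTINCT ON approach described above can be sketched as follows. This is a minimal illustration, not the actual code of distinct_table() in sql.py: the helper names quote_ident() and distinct_on_sql(), their parameters, and the table and column names in the usage line are all hypothetical. It only builds the SQL text, on the assumption that the caller runs it through its own database connection.

```python
def quote_ident(name):
    """Double-quote a PostgreSQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def distinct_on_sql(src_table, dest_table, distinct_cols):
    """Build a statement that copies src_table into a new dest_table,
    keeping one row per distinct combination of distinct_cols.

    Hypothetical helper for illustration; not the signature used in sql.py.
    """
    if not distinct_cols:
        raise ValueError('distinct_cols must not be empty')
    cols = ', '.join(quote_ident(c) for c in distinct_cols)
    # No ORDER BY is added, so the planner is free to choose how it
    # eliminates duplicates; which row of each duplicate group survives
    # is then unspecified.
    return (
        'CREATE TABLE {dest} AS\n'
        'SELECT DISTINCT ON ({cols}) *\n'
        'FROM {src}'
    ).format(dest=quote_ident(dest_table), cols=cols, src=quote_ident(src_table))


# Usage with made-up table and column names
statement = distinct_on_sql('tnrs_input', 'tnrs_distinct', ['name', 'source'])
```

Leaving out ORDER BY matches the intent stated in the commit message of letting PostgreSQL choose the method. If a particular row of each duplicate group had to be kept, an ORDER BY starting with the same columns would be needed.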

Changes

lib
- sql.py (diff)
