Project

General

Profile

Statistics
| Revision:
Name Size Revision Age Author Comment
  _archive 1598 almost 13 years Aaron Marcuse-Kubitza Moved _archive/tapir2flatClient/trunk/client/ t...
  analysis 3076 over 12 years Aaron Marcuse-Kubitza Added top-level analysis dir for range modeling
  bin 3271 over 12 years Aaron Marcuse-Kubitza csv2db: verbosity defaults to 3 so that detaile...
  config 272 about 13 years Aaron Marcuse-Kubitza Moved bien_password to new config dir
  inputs 3282 over 12 years Aaron Marcuse-Kubitza inputs/import.stats.xls: Fixed date for most re...
  lib 3305 over 12 years Aaron Marcuse-Kubitza sql.py: flatten(): Auto-add a pkey on the creat...
  mappings 3229 over 12 years Aaron Marcuse-Kubitza mappings/VegX-VegBIEN.stems.csv: Sort the plant...
  schemas 3300 over 12 years Aaron Marcuse-Kubitza schemas/tree_cross-links.sql: Ancestors table: ...
  to_do 2547 over 12 years Aaron Marcuse-Kubitza to_do/timeline.doc: Updated to reflect the mont...
Makefile 10.5 KB 3249 over 12 years Aaron Marcuse-Kubitza root Makefile: VegBIEN DB: Schemas: Added schem...
README.TXT 2.96 KB 3205 over 12 years Aaron Marcuse-Kubitza README.TXT: Data import: Import data into VegBI...
map 1.21 KB 3140 over 12 years Aaron Marcuse-Kubitza top-level map: Added support for custom public ...

Latest revisions

# Date Author Comment
3305 07/10/2012 08:02 PM Aaron Marcuse-Kubitza

sql.py: flatten(): Auto-add a pkey on the created temp table. This should be standard practice for most temp tables, and for sql_io.put_table() especially this will be useful if we ever want to add back sorting the in_table by row_num (possibly by CLUSTERing on the pkey to avoid pkey index scans).

3304 07/10/2012 07:54 PM Aaron Marcuse-Kubitza

sql.run_query_into() calls: Use new add_pkey_ param instead of manually calling sql.add_pkey()

3303 07/10/2012 07:53 PM Aaron Marcuse-Kubitza

sql.py: run_query_into(): Changed add_indexes_ param to add_pkey_ and add just a pkey if it's set. It's no longer necessary to create indexes on every column of a temp table, because the covering indexes for the join columns have been fixed to have columns in the same order as the output table's corresponding index so that they can be used for a merge join.

3302 07/10/2012 07:41 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Add pkey on pkeys table right when it's created, so that any duplicates are detected right away instead of at the end of the iteration. (Duplicates are created as a result of joins matching multiple rows, which often indicates a database misconfiguration.)

3301 07/10/2012 07:34 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Adding pkey on pkeys table: Removed log message because adding an index is considered a low-level operation, which isn't included in the Redmine SQL

3300 07/10/2012 07:27 PM Aaron Marcuse-Kubitza

schemas/tree_cross-links.sql: Ancestors table: Synced with current definition, which removes unneeded fki_* indexes. Note that the index on ancestor_id might be needed in the future if we ever want to get all the descendants of a plantname/namedplace or perform deletions on plantname/namedplace (which cascade to *_ancestor). For getting all the plantnames/namedplaces (of any rank) for a plantconcept/locationdetermination, though, the *_ancestor_pkey index is sufficient because plantname_id/namedplace_id is the first column in it.

3299 07/10/2012 07:20 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: {plantname,namedplace}_update_ancestors(): Fixed slowdown due to removed index on {plantname,namedplace}.parent_id by adding COALESCE to enable using the plantname_unique index for the lookup instead

3298 07/10/2012 06:26 PM Aaron Marcuse-Kubitza

sql.mk_select() calls: Removed no longer needed start=0 to turn off missing WHERE, LIMIT, or OFFSET clause warning

3297 07/10/2012 06:21 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): Don't output warning if there is no WHERE, LIMIT, or OFFSET clause, because column-based import has many queries where this is the case and it's annoying to need to specify start=0 to turn off this warning

3296 07/10/2012 06:04 PM Aaron Marcuse-Kubitza

sql.py: flatten(): Don't sort the input tables by the pkey because it doesn't matter what order the datasource's rows are inserted in. Note that PostgreSQL doesn't guarantee the order of rows in a table, so it is possible that the rows were being inserted in an unknown order before this change, as well.

View all revisions | View revisions

Also available in: Atom