Name       Size     Revision  Age              Author                 Comment
_archive            1598      almost 13 years  Aaron Marcuse-Kubitza  Moved _archive/tapir2flatClient/trunk/client/ t...
analysis            3076      over 12 years    Aaron Marcuse-Kubitza  Added top-level analysis dir for range modeling
bin                 3271      over 12 years    Aaron Marcuse-Kubitza  csv2db: verbosity defaults to 3 so that detaile...
config              272       about 13 years   Aaron Marcuse-Kubitza  Moved bien_password to new config dir
inputs              3314      over 12 years    Aaron Marcuse-Kubitza  mappings/VegX-VegBIEN.stems.csv: Removed locati...
lib                 3313      over 12 years    Aaron Marcuse-Kubitza  sql.py: distinct_table(): Don't sort the insert...
mappings            3314      over 12 years    Aaron Marcuse-Kubitza  mappings/VegX-VegBIEN.stems.csv: Removed locati...
schemas             3300      over 12 years    Aaron Marcuse-Kubitza  schemas/tree_cross-links.sql: Ancestors table: ...
to_do               2547      over 12 years    Aaron Marcuse-Kubitza  to_do/timeline.doc: Updated to reflect the mont...
Makefile   10.5 KB  3249      over 12 years    Aaron Marcuse-Kubitza  root Makefile: VegBIEN DB: Schemas: Added schem...
README.TXT 2.96 KB  3205      over 12 years    Aaron Marcuse-Kubitza  README.TXT: Data import: Import data into VegBI...
map        1.21 KB  3140      over 12 years    Aaron Marcuse-Kubitza  top-level map: Added support for custom public ...

Latest revisions

# Date Author Comment
3314 07/10/2012 09:07 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: Removed locationevent.datasource_id mappings because locationevents are now scoped by their required location, which itself is scoped by datasource

3313 07/10/2012 08:42 PM Aaron Marcuse-Kubitza

sql.py: distinct_table(): Don't sort the inserted rows by pkey because they should stay in the table order that they were in. (The select order with no ORDER BY should be the table order. Even if it isn't, it doesn't matter what order they are in for our current application.)

3312 07/10/2012 08:38 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Creating an empty pkeys table: Don't sort the inserted result by pkey because it's empty (limit=0)

3311 07/10/2012 08:32 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ignore(): Fixed bug where in_col's table needed to be changed to insert_in_table, because it's insert_in_table's rows that are being modified but mapping (which in_col comes from) qualifies columns by in_table

3310 07/10/2012 08:28 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ignore(): Also add an index on in_col if mapping the value to NULL

3309 07/10/2012 08:28 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ignore(): Only delete from the insert_in_table, because the invalid rows only need to be removed from the rows that are actually being inserted into the DB. If there are invalid rows in the full (not uniquified) in_table, that's OK, as they can still get a valid output pkey if the first copy of a row they were considered a duplicate of is valid (this is a very unusual situation, so this change should not affect most real data).

3308 07/10/2012 08:22 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ignore(): Merged filter_ var into sql.delete() call because that's the only place it's used

3307 07/10/2012 08:18 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): insert_into_pkeys(): Removed no longer used distinct param

3306 07/10/2012 08:16 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Getting output table pkeys of existing/inserted rows: Don't DISTINCT ON the joined rows by input pkey, because this adds sorting overhead. This should not be needed because there generally should not be any duplicate rows for the columns in a unique index (if there are, this is an index configuration problem and should be fixed in the schema). It's possible that partial indexes (with a filter condition) were causing this, but testing without it in place will be needed to determine the cause.

3305 07/10/2012 08:02 PM Aaron Marcuse-Kubitza

sql.py: flatten(): Auto-add a pkey on the created temp table. This should be standard practice for most temp tables, and for sql_io.put_table() especially this will be useful if we ever want to add back sorting the in_table by row_num (possibly by CLUSTERing on the pkey to avoid pkey index scans).
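
The idea in revision 3305 (materializing a query into a temp table that automatically gets a pkey) can be sketched roughly as follows. This is only an illustration, not the project's sql.py code: it uses Python's built-in sqlite3 module in place of the PostgreSQL backend, and the helper name create_temp_with_pkey, the row_num column, and the plots table are all assumptions made up for the example.

```python
import sqlite3


def create_temp_with_pkey(conn, temp_name, select_sql):
    """Hypothetical helper: copy a query's rows into a temp table with a pkey.

    The added row_num column is an INTEGER PRIMARY KEY, which SQLite fills
    in automatically in insertion order (1, 2, 3, ... for a new table).
    Identifiers are interpolated directly, so pass trusted names only.
    """
    cur = conn.cursor()
    # Run the query with LIMIT 0 just to learn its output column names.
    cur.execute(f"SELECT * FROM ({select_sql}) LIMIT 0")
    cols = [d[0] for d in cur.description]
    col_list = ", ".join(f'"{c}"' for c in cols)
    # Create the temp table with the pkey column first, then the data columns.
    cur.execute(
        f'CREATE TEMP TABLE "{temp_name}" '
        f"(row_num INTEGER PRIMARY KEY, {col_list})"
    )
    # Insert the query's rows; row_num is assigned by SQLite.
    cur.execute(f'INSERT INTO "{temp_name}" ({col_list}) {select_sql}')
    conn.commit()
    return cols


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plots (name TEXT, area REAL)")
    conn.executemany(
        "INSERT INTO plots VALUES (?, ?)", [("b", 2.0), ("a", 1.0)]
    )
    create_temp_with_pkey(
        conn, "plots_flat", "SELECT name, area FROM plots ORDER BY name"
    )
    print(conn.execute("SELECT * FROM plots_flat ORDER BY row_num").fetchall())
```

With a pkey like this in place, later steps can look up or order rows by row_num without adding a key after the fact, which appears to be the motivation given in the commit message.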
