/ - Repository - BIEN 3 - NCEAS Projects

Name	Size	Revision	Age	Author	Comment
_archive		1598	over 12 years	Aaron Marcuse-Kubitza	Moved _archive/tapir2flatClient/trunk/client/ t...
analysis		3076	over 12 years	Aaron Marcuse-Kubitza	Added top-level analysis dir for range modeling
bin		3271	over 12 years	Aaron Marcuse-Kubitza	csv2db: verbosity defaults to 3 so that detaile...
config		272	almost 13 years	Aaron Marcuse-Kubitza	Moved bien_password to new config dir
inputs		3282	over 12 years	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Fixed date for most re...
lib		3309	over 12 years	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore(): Only delete f...
mappings		3229	over 12 years	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.stems.csv: Sort the plant...
schemas		3300	over 12 years	Aaron Marcuse-Kubitza	schemas/tree_cross-links.sql: Ancestors table: ...
to_do		2547	over 12 years	Aaron Marcuse-Kubitza	to_do/timeline.doc: Updated to reflect the mont...
Makefile	10.5 KB	3249	over 12 years	Aaron Marcuse-Kubitza	root Makefile: VegBIEN DB: Schemas: Added schem...
README.TXT	2.96 KB	3205	over 12 years	Aaron Marcuse-Kubitza	README.TXT: Data import: Import data into VegBI...
map	1.21 KB	3140	over 12 years	Aaron Marcuse-Kubitza	top-level map: Added support for custom public ...

#	Date	Author	Comment
3309	07/10/2012 08:28 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore(): Only delete from the insert_in_table, because the invalid rows only need to be removed from the rows that are actually being inserted into the DB. If there are invalid rows in the full (not uniquified) in_table, that's OK, as they can still get a valid output pkey if the first copy of a row they were considered a duplicate of is valid (this is a very unusual situation, so this change should not affect most real data).
3308	07/10/2012 08:22 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore(): Merged filter_ var into sql.delete() call because that's the only place it's used
3307	07/10/2012 08:18 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): insert_into_pkeys(): Removed no longer used distinct param
3306	07/10/2012 08:16 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Getting output table pkeys of existing/inserted rows: Don't DISTINCT ON the joined rows by input pkey, because this adds sorting overhead. This should not be needed because there generally should not be any duplicate rows for the columns in a unique index (if there are, this is an index configuration problem and should be fixed in the schema). It's possible that partial indexes (with a filter condition) were causing this, but testing without it in place will be needed to determine the cause.
3305	07/10/2012 08:02 PM	Aaron Marcuse-Kubitza	sql.py: flatten(): Auto-add a pkey on the created temp table. This should be standard practice for most temp tables, and for sql_io.put_table() especially this will be useful if we ever want to add back sorting the in_table by row_num (possibly by CLUSTERing on the pkey to avoid pkey index scans).
3304	07/10/2012 07:54 PM	Aaron Marcuse-Kubitza	sql.run_query_into() calls: Use new add_pkey_ param instead of manually calling sql.add_pkey()
3303	07/10/2012 07:53 PM	Aaron Marcuse-Kubitza	sql.py: run_query_into(): Changed add_indexes_ param to add_pkey_ and add just a pkey if it's set. It's no longer necessary to create indexes on every column of a temp table, because the covering indexes for the join columns have been fixed to have columns in the same order as the output table's corresponding index so that they can be used for a merge join.
3302	07/10/2012 07:41 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Add pkey on pkeys table right when it's created, so that any duplicates are detected right away instead of at the end of the iteration. (Duplicates are created as a result of joins matching multiple rows, which often indicates a database misconfiguration.)
3301	07/10/2012 07:34 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Adding pkey on pkeys table: Removed log message because adding an index is considered a low-level operation, which isn't included in the Redmine SQL
3300	07/10/2012 07:27 PM	Aaron Marcuse-Kubitza	schemas/tree_cross-links.sql: Ancestors table: Synced with current definition, which removes unneeded fki_* indexes. Note that the index on ancestor_id might be needed in the future if we ever want to get all the descendants of a plantname/namedplace or perform deletions on plantname/namedplace (which cascade to _ancestor). For getting all the plantnames/namedplaces (of any rank) for a plantconcept/locationdetermination, though, the _ancestor_pkey index is sufficient because plantname_id/namedplace_id is the first column in it.
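
Revision 3302 above adds the pkey on the pkeys table as soon as the table is created, so that duplicates are detected right away rather than at the end of the iteration. The sketch below illustrates that idea only; it is not the project's sql_io.put_table() code. It uses SQLite from the Python standard library for self-containment (the DISTINCT ON and CLUSTER mentions in the log suggest the real database is PostgreSQL), and the function name, table name, and column names are hypothetical.

```python
import sqlite3

def build_pkeys_table(pairs, pkey_first):
    """Hypothetical helper: load (in_pkey, out_pkey) pairs into a temp table.

    Returns "insert" if a duplicate in_pkey is rejected while rows are being
    inserted, "end" if it is only found when the unique index is added
    afterwards, and None if there are no duplicates.
    """
    conn = sqlite3.connect(":memory:")
    try:
        if pkey_first:
            # The pkey exists from the start, so each insert is checked.
            conn.execute(
                "CREATE TEMP TABLE pkeys "
                "(in_pkey INTEGER PRIMARY KEY, out_pkey INTEGER)"
            )
        else:
            # No constraint yet: duplicates are accepted silently.
            conn.execute(
                "CREATE TEMP TABLE pkeys (in_pkey INTEGER, out_pkey INTEGER)"
            )
        try:
            conn.executemany("INSERT INTO pkeys VALUES (?, ?)", pairs)
        except sqlite3.DatabaseError:
            return "insert"
        if not pkey_first:
            try:
                # Adding the uniqueness constraint late surfaces the
                # duplicate only after all rows have been loaded.
                conn.execute(
                    "CREATE UNIQUE INDEX pkeys_in_pkey ON pkeys (in_pkey)"
                )
            except sqlite3.DatabaseError:
                return "end"
        return None
    finally:
        conn.close()

# A join that matched two output rows for input row 1 (the misconfiguration
# case described in the log) produces a duplicate in_pkey.
rows = [(1, 10), (2, 20), (1, 11)]
early = build_pkeys_table(rows, pkey_first=True)
late = build_pkeys_table(rows, pkey_first=False)
```

With the key in place from the start, the failure is raised by the insert that introduces the duplicate, which makes the offending join easier to locate than a failure reported after the whole batch has been processed.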


root @ 3309
