Name        Size     Revision  Age              Author                 Comment
_archive             1598      almost 13 years  Aaron Marcuse-Kubitza  Moved _archive/tapir2flatClient/trunk/client/ t...
analysis             3076      over 12 years    Aaron Marcuse-Kubitza  Added top-level analysis dir for range modeling
bin                  3271      over 12 years    Aaron Marcuse-Kubitza  csv2db: verbosity defaults to 3 so that detaile...
config               272       about 13 years   Aaron Marcuse-Kubitza  Moved bien_password to new config dir
inputs               3282      over 12 years    Aaron Marcuse-Kubitza  inputs/import.stats.xls: Fixed date for most re...
lib                  3287      over 12 years    Aaron Marcuse-Kubitza  sql_io.py: put_table(): Save default values for...
mappings             3229      over 12 years    Aaron Marcuse-Kubitza  mappings/VegX-VegBIEN.stems.csv: Sort the plant...
schemas              3284      over 12 years    Aaron Marcuse-Kubitza  schemas/vegbien.sql: taxondetermination: Fixed ...
to_do                2547      over 12 years    Aaron Marcuse-Kubitza  to_do/timeline.doc: Updated to reflect the mont...
Makefile    10.5 KB  3249      over 12 years    Aaron Marcuse-Kubitza  root Makefile: VegBIEN DB: Schemas: Added schem...
README.TXT  2.96 KB  3205      over 12 years    Aaron Marcuse-Kubitza  README.TXT: Data import: Import data into VegBI...
map         1.21 KB  3140      over 12 years    Aaron Marcuse-Kubitza  top-level map: Added support for custom public ...

Latest revisions

# Date Author Comment
3287 07/10/2012 04:36 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Save default values for all rows in new temp table full_in_table since in_table may have rows deleted

3286 07/10/2012 04:13 PM Aaron Marcuse-Kubitza

sql.py: Added mk_delete() and delete()

3285 07/10/2012 03:36 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): mk_main_select(): Turned off unnecessary ORDER BY to avoid sorting the entire table every time it's used. (PostgreSQL has no concept of reordering a table and re-using that ordering, so it just re-sorts the table each time. Index scans on the pkey do not appear to be used in practice, according to EXPLAIN results from live imports.) Documented that we instead assume that identical SELECT queries retrieve rows in the same order.

3284 07/10/2012 01:56 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxondetermination: Fixed bug where the taxondetermination_taxonoccurrence_id_fkey trigger was applied before the NOT NULL constraint on taxonoccurrence_id was checked, causing the trigger to fail on NULL taxonoccurrence_ids. The fix makes it an AFTER trigger. (An AFTER trigger will still roll back the entire insert if it fails, even though it runs after the insert itself.)
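
The AFTER-trigger behavior described in this revision can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the real schema targets PostgreSQL, the two tables below are heavily simplified stand-ins, and the trigger body and the helper names (make_db, try_insert, row_count) are hypothetical. It shows that the NOT NULL constraint rejects a NULL before the AFTER trigger ever runs, and that a failing AFTER trigger still undoes the insert.

```python
import sqlite3

# Simplified, assumed schema: not the real vegbien.sql definitions.
SCHEMA = """
CREATE TABLE taxonoccurrence (taxonoccurrence_id INTEGER PRIMARY KEY);
CREATE TABLE taxondetermination (
    taxondetermination_id INTEGER PRIMARY KEY,
    taxonoccurrence_id INTEGER NOT NULL
);
-- Hand-written foreign key check that runs AFTER the row is inserted.
CREATE TRIGGER taxondetermination_taxonoccurrence_id_fkey
AFTER INSERT ON taxondetermination
BEGIN
    SELECT RAISE(ABORT, 'no matching taxonoccurrence')
    WHERE NOT EXISTS (
        SELECT 1 FROM taxonoccurrence
        WHERE taxonoccurrence_id = NEW.taxonoccurrence_id
    );
END;
"""

def make_db():
    """Create an in-memory database with one valid taxonoccurrence row."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO taxonoccurrence VALUES (1)")
    return db

def try_insert(db, taxonoccurrence_id):
    """Return None on success, or the error message if the insert failed."""
    try:
        db.execute(
            "INSERT INTO taxondetermination (taxonoccurrence_id) VALUES (?)",
            (taxonoccurrence_id,),
        )
        return None
    except sqlite3.DatabaseError as e:
        return str(e)

def row_count(db):
    """Number of rows that actually made it into taxondetermination."""
    return db.execute("SELECT count(*) FROM taxondetermination").fetchone()[0]
```

Calling try_insert(db, None) fails on the NOT NULL constraint rather than in the trigger, while try_insert(db, 99) fails in the trigger and leaves row_count(db) unchanged.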

3283 07/09/2012 05:45 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: specimenreplicate: institution_id: Fixed typo in comment

3282 07/09/2012 05:26 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Fixed date for most recent import

3281 07/09/2012 05:26 PM Aaron Marcuse-Kubitza

sql.py: DbConn.run_query(): Put the data source comment on a separate line in the log file instead of using a carriage return. The carriage return sometimes had the desired effect of overwriting the src comment with the first line of the query, but sometimes the line lengths weren't right and there wasn't enough overlap.
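
A minimal sketch of why the carriage-return approach was fragile. It does not reproduce the actual DbConn.run_query() logging code; both function names and the example comment format are assumptions made for illustration. The first function models what a terminal displays when a comment is followed by a carriage return and the first query line, and the second models the replacement behavior of writing the comment on its own line.

```python
def render_with_carriage_return(comment, first_query_line):
    """Model the visible text after printing comment, a carriage return,
    and then the first line of the query on the same terminal line."""
    # The carriage return moves the cursor back to column 0, so the query
    # line overwrites the comment only as far as the query line reaches.
    # Any comment characters beyond that point stay visible.
    return first_query_line + comment[len(first_query_line):]

def format_on_separate_lines(comment, query):
    """Put the data source comment on its own line above the query."""
    return comment + "\n" + query

# Hypothetical values, chosen only to show the two cases.
comment = "/* src: example_datasource */"
short_line = "SELECT 1"
long_line = "SELECT * FROM taxondetermination WHERE taxonoccurrence_id = 1"

leftover = render_with_carriage_return(comment, short_line)  # comment tail remains
clean = render_with_carriage_return(comment, long_line)      # comment fully hidden
```

With the short query line, the tail of the comment is left dangling after the query text, which is the "not enough overlap" case. Writing the comment on a separate line avoids depending on line lengths at all.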

3280 07/09/2012 04:53 PM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: Synced with schema

3279 07/09/2012 04:42 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed per-column indexes, which are no longer needed by either row-based or column-based import because they are able to do a merge join or lookup using the table's UNIQUE INDEX. Instead of forcing the database to build and maintain large indexes (15+ GB!) that are not used, optimization-only (non-UNIQUE) indexes should be added as needed only once the database is actually used for queries. In most cases it will not even be necessary to add additional indexes then, because most UNIQUE indexes can be reused for broad lookups (rather than just duplicate elimination). Even the foreign key covering indexes (fki_*) are not needed because we virtually never delete rows in the DB, and even if we were to start doing that regularly, the cost of maintaining the indexes on import is most likely not worth the speed improvements for cascading deletes.
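
The claim that a UNIQUE index can serve ordinary lookups, not just duplicate elimination, can be checked in miniature. The sketch below uses SQLite through Python's sqlite3 module rather than PostgreSQL, so it only illustrates the general idea; the table, its columns, and the helper names are hypothetical and are not taken from vegbien.sql. It asks the query planner how it would run a lookup on the leading column of a multi-column UNIQUE constraint.

```python
import sqlite3

def make_location_db():
    """Create a table whose only index is the one backing its UNIQUE constraint."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE location ("
        " datasource_id INTEGER NOT NULL,"
        " sourceaccessioncode TEXT NOT NULL,"
        " elevation REAL,"
        " UNIQUE (datasource_id, sourceaccessioncode))"
    )
    return db

def query_plan(db, sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]

db = make_location_db()
plan = query_plan(
    db,
    "SELECT elevation FROM location WHERE datasource_id = ?",
    (1,),
)
# SQLite names the index behind a UNIQUE constraint sqlite_autoindex_<table>_<n>.
uses_unique_index = any("sqlite_autoindex_location_1" in d for d in plan)
```

Here the lookup on datasource_id alone is answered from the UNIQUE constraint's index, with no separate per-column index defined. Whether PostgreSQL makes the same choice for a given query should be confirmed with its own EXPLAIN output.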

3278 07/09/2012 04:32 PM Aaron Marcuse-Kubitza

schemas/py_functions.sql: Removed per-column indexes on relational functions, which are no longer needed by row-based import because it is able to do a merge join-style lookup using the table's UNIQUE INDEX. (Note that column-based import doesn't use the (slower) relational functions at all anymore, and instead calls the corresponding SQL function directly using named arguments.)
