Task #883

Updated by Aaron Marcuse-Kubitza about 10 years ago

h3. issue 

 * due to recent bugs in Postgres[1], a full-database import now uses up all available disk space and crashes, causing disk space errors in 29 of [[VegBIEN_contents#datasources|41 datasources]]. to list the affected logs: 
 <pre> 
 ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk 
 export version=r13016 
 grep --files-with-matches -F "No space left on device" inputs/{.,}*/*/logs/$version.log.sql 
 # and uniqify by datasource 
 </pre> 

 * there is no soft limit on disk space inside Postgres, so the hard limit gets reached instead, causing an error which ricochets across the system and crashes various processes (similar to an out-of-memory condition caused by kernel overcommit, except that Postgres already has a throttle for that problem) 
 * since Postgres does not have a disk space throttle[2], our scripts need to do this instead 

 fn1((. this is a recent problem in Postgres, because we used to need only 100 GB of free disk space for the import (in r6802/2012-12-12), but now we need 1 _TB_ (10x as much). also, it only became an issue on the last import, and affects most datasources, so it was _not_ likely a latent problem that just never crossed the disk space limit. 

 fn2((. the @temp_file_limit@ config param seems to be intended to do this, but "throws an error":http://www.postgresql.org/docs/9.3/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-DISK instead of handling the problem by pausing (self-throttling) 

h3. implementation 

 * @lib/sql.py@ @run_query()@ (the global function, not the method of @DbConn@) should trap @"OperationalError: could not extend file "...": No space left on device"@ and handle it by pausing until disk usage goes back down (i.e. enough space is free again), and then retrying the last SQL command 
 ** the last SQL command does not need to be idempotent, because the error means that it was rolled back
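
a minimal sketch of the pause-and-retry behavior described above, not the actual @lib/sql.py@ code. the names @run_with_disk_throttle()@, @is_disk_full_error()@, @free_bytes()@, and the @min_free@, @poll_secs@, and @max_retries@ parameters are all hypothetical. it assumes the error can be recognized by the "No space left on device" text in its message, and that @run@ is a zero-argument callable which executes one SQL command (and does whatever connection cleanup the DB driver needs before a retry).

```python
import shutil
import time


def is_disk_full_error(exc):
    """Return True if the error message looks like a disk-full condition.

    Assumption: the message contains "No space left on device", as in the
    OperationalError quoted in this issue.
    """
    return "No space left on device" in str(exc)


def free_bytes(path="/"):
    """Free disk space, in bytes, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free


def run_with_disk_throttle(run, min_free, get_free=free_bytes,
                           sleep=time.sleep, poll_secs=60, max_retries=None):
    """Call run(); on a disk-full error, wait for free space, then retry.

    run         -- zero-argument callable that executes one SQL command
    min_free    -- bytes that must be free before retrying (hypothetical knob)
    max_retries -- optional cap, so a command that can never fit gives up
    """
    retries = 0
    while True:
        try:
            return run()
        except Exception as exc:
            if not is_disk_full_error(exc):
                raise  # unrelated errors propagate unchanged
            if max_retries is not None and retries >= max_retries:
                raise  # give up instead of retrying forever
            retries += 1
            # pause (self-throttle) until enough disk space is free again
            while get_free() < min_free:
                sleep(poll_secs)
```

@get_free@ and @sleep@ are parameters only so the loop can be exercised without filling a real disk. a cap such as @max_retries@ is worth considering because a single command that needs more space than will ever be free would otherwise retry indefinitely.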
