Task #883
Updated by Aaron Marcuse-Kubitza over 10 years ago
h3. issue

* due to recent bugs in Postgres[1], full-database import now uses up all the available disk space and crashes, causing disk space errors in 29 of [[VegBIEN_contents#datasources|41 datasources]]:
<pre>
ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk
export version=r13016
grep --files-with-matches -F "No space left on device" inputs/{.,}*/*/logs/$version.log.sql # and uniqify by datasource
</pre>
* there is no soft limit on disk space inside Postgres, so the hard limit gets reached instead. this causes an error that ricochets across the system and crashes various processes (similar to an out-of-memory condition caused by kernel overcommit, except that Postgres already has a throttle for that problem)
* since Postgres does not have a disk space throttle[2], our scripts need to provide one instead

fn1((. this is a recent problem in Postgres: the import used to need only 100 GB of free disk space (in r6802/2012-12-12), but now it needs 1 _TB_ (10x as much). although it only became an issue on the last import, it may have been a latent problem that just never crossed the disk space limit.

fn2((. the @temp_file_limit@ config param seems to be intended for this, but it "throws an error":http://www.postgresql.org/docs/9.3/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-DISK instead of handling the problem by pausing (self-throttling).

h3. implementation

* @lib/sql.py@ @run_query()@ (the global function, not the method of @DbConn@) should trap @"OperationalError: could not extend file "...": No space left on device"@ and handle it by pausing until disk usage goes back down, then retrying the last SQL command
** the last SQL command does not need to be idempotent, because the error means that it was rolled back
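a minimal sketch of the trap-pause-retry logic described above. this is not the actual @lib/sql.py@ code: the names @run_query_throttled()@, @wait_for_free_space()@, @is_disk_full_error()@ and the @min_free_bytes@ / @poll_secs@ parameters are hypothetical. it matches the error by its message text rather than by exception class (the real code would presumably narrow this to the driver's @OperationalError@), and it measures free space with Python's @shutil.disk_usage()@ on a path assumed to be on the Postgres data volume. it also assumes each statement runs in autocommit mode or inside its own savepoint, because in a Postgres transaction block an error aborts the whole transaction, not just the failed statement.

```python
import shutil
import time

# Substring of the Postgres error to trap. Matching on the message is an
# assumption made to keep this sketch driver-independent.
DISK_FULL_MSG = "No space left on device"


def is_disk_full_error(exc):
    """Return True if exc looks like Postgres running out of disk space."""
    return DISK_FULL_MSG in str(exc)


def wait_for_free_space(path, min_free_bytes, poll_secs=60,
                        disk_usage=shutil.disk_usage, sleep=time.sleep):
    """Block until the filesystem holding `path` has at least
    `min_free_bytes` free, re-checking every `poll_secs` seconds."""
    while disk_usage(path).free < min_free_bytes:
        sleep(poll_secs)


def run_query_throttled(execute, query, path, min_free_bytes,
                        poll_secs=60, disk_usage=shutil.disk_usage,
                        sleep=time.sleep):
    """Run execute(query). On a disk-full error, pause until free space
    recovers, then retry the same query. Any other error is re-raised.

    Retrying without idempotence relies on the failed statement having
    been rolled back (see the assumption in the lead-in).
    """
    while True:
        try:
            return execute(query)
        except Exception as exc:
            if not is_disk_full_error(exc):
                raise  # unrelated errors propagate unchanged
            wait_for_free_space(path, min_free_bytes, poll_secs,
                                disk_usage, sleep)
```

@disk_usage@ and @sleep@ are injectable only so the loop can be exercised without a real database or a full disk. as written it retries indefinitely, which matches the "pause until it goes back down" intent, but if a single query alone can exhaust the disk it would loop forever, so a retry cap may be worth adding.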