Task #430
Estimate total storage size of daily 1970-2010 1km dataset
Status: open (100% done)
Description
I was curious how big the daily 1970-2010 1km dataset was going to be, so I did a back-of-the-envelope calculation assuming:
- Global area: 510,072,000 km^2
- Global land area: 148,940,000 km^2
- Count of days 1970-2010: 14,611
I then took various storage types and calculated the total storage requirements:
type       bytes  Just land (TB)  Global (TB)
short int    2         4.0           13.6
long int     4         7.9           27.1
single       4         7.9           27.1
double       8        15.8           54.2
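The table can be reproduced with a quick sketch of the arithmetic (shown here in Python; the original script linked below is in R). The figures match binary terabytes (1 TB = 2^40 bytes), assuming one value per 1 km^2 cell per day:

```python
# Back-of-envelope reproduction of the storage table above.
# Assumes one value per 1 km^2 cell per day; "TB" is binary (2**40 bytes).

GLOBAL_KM2 = 510_072_000  # global surface area, km^2
LAND_KM2 = 148_940_000    # global land area, km^2
DAYS = 14_611             # daily time steps, 1970-2010

TYPES = {"short int": 2, "long int": 4, "single": 4, "double": 8}

for name, nbytes in TYPES.items():
    land_tb = LAND_KM2 * DAYS * nbytes / 2**40
    global_tb = GLOBAL_KM2 * DAYS * nbytes / 2**40
    print(f"{name:10s} {nbytes}  {land_tb:5.1f}  {global_tb:5.1f}")
```

For short integers this gives 4.0 TB (land only) and 13.6 TB (global), matching the table.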
"Just land" assumes that we are only storing the values where there is land (i.e. not a full grid) and is likely an underestimate. If we scale and offset the data, we should be ok to store it as a short integer, so we're looking at somewhere between 4 and 14 TB for the full dataset. If we store it in a compressed format, we may be able to shrink it more.
The code I used to do this, if anyone is curious or wants to check it, is available here: [https://projects.nceas.ucsb.edu/nceas/projects/environment-and-orga/repository/entry/climate/extra/dailydatasize.r?rev=aw%2Fprecip]