Task #430
Estimate total storage size of daily 1970-2010 1km dataset
Status: open (100% done)
Description
I was curious how big the daily 1970-2010 1km dataset was going to be, so I did a back-of-the-envelope calculation assuming:
- Global area: 510,072,000 km^2
- Global land area: 148,940,000 km^2
- Count of days 1970-2010: 14,611 (see the quick check below)
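That count can be checked in R, assuming the period runs from 1970-01-01 through 2010-01-01 inclusive:

    # Count of daily time steps, 1970-01-01 through 2010-01-01 inclusive
    length(seq(as.Date("1970-01-01"), as.Date("2010-01-01"), by = "day"))  # 14611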
I then took various storage types and calculated the total storage requirements:
type        bytes per value   just land (TB)   global (TB)
short int          2                4.0            13.6
long int           4                7.9            27.1
single             4                7.9            27.1
double             8               15.8            54.2
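The arithmetic is just cells x days x bytes per value. A minimal R sketch (not the linked script itself) that reproduces the table, reading "TB" as binary terabytes (1024^4 bytes), which is what the numbers above work out to:

    # Back-of-the-envelope storage for a daily 1 km global grid
    land_km2  <- 148940000                 # global land area (~1 cell per km^2)
    globe_km2 <- 510072000                 # total global area
    n_days    <- 14611                     # daily steps, 1970-2010
    bytes     <- c(short_int = 2, long_int = 4, single = 4, double = 8)

    tb <- function(n_cells) n_cells * n_days * bytes / 1024^4
    round(cbind(just_land = tb(land_km2), global = tb(globe_km2)), 1)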
"Just land" assumes that we are only storing the values where there is land (i.e. not a full grid) and is likely an underestimate. If we scale and offset the data, we should be ok to store it as a short integer, so we're looking at somewhere between 4 and 14 TB for the full dataset. If we store it in a compressed format, we may be able to shrink it more.
The code I used to do this, if anyone is curious or wants to check it, is available here: https://projects.nceas.ucsb.edu/nceas/projects/environment-and-orga/repository/entry/climate/extra/dailydatasize.r?rev=aw%2Fprecip
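On the scale-and-offset point, the packing would look something like this in R, following the standard NetCDF scale_factor/add_offset convention (unpacked = packed * scale_factor + add_offset); the scale and offset values here are made up for illustration:

    # Pack floating-point values into short integers via scale/offset
    scale_factor <- 0.01                   # illustrative choice
    add_offset   <- 0
    x        <- c(-12.34, 0, 25.67)        # e.g. daily tmin in degrees C
    packed   <- as.integer(round((x - add_offset) / scale_factor))
    unpacked <- packed * scale_factor + add_offset  # recovered to ~0.01 precision
    # R integers are 32-bit in memory; an on-disk format (e.g. NetCDF short)
    # would store each packed value in 2 bytes

With a 0.01 scale a 16-bit integer covers roughly +/-327, which is plenty for temperature in degrees C; precipitation would need its own scale and offset chosen from its range.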
Updated by Adam Wilson over 12 years ago
And this is for just one variable, so multiply by three for tmax, tmin, and ppt. Add more if we include any pixel-by-pixel error flags...