Task #430

Estimate total storage size of daily 1970-2010 1km dataset

Added by Adam Wilson about 12 years ago. Updated about 12 years ago.

Status: New
Priority: Normal
Assignee:
Category: Climate
Start date: 06/14/2012
Due date:
% Done: 100%
Estimated time: 1.00 h
Activity type: Coding/analysis

Description

I was curious how big the daily 1970-2010 1km dataset was going to be, so I did a back-of-the-envelope calculation assuming:

  • Global area: 510,072,000 km^2
  • Global land area: 148,940,000 km^2
  • Count of days 1970-2010: 14,611
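
As a quick sanity check on that day count, counting 1970-01-01 through 2010-01-01 inclusive in R gives the same number:

  n.days <- length(seq(as.Date("1970-01-01"), as.Date("2010-01-01"), by = "day"))
  n.days  # 14611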

I then took various storage types and calculated the total storage requirements:

type        bytes   Just land (TB)   Global (TB)
short int   2       4.0              13.6
long int    4       7.9              27.1
single      4       7.9              27.1
double      8       15.8             54.2

"Just land" assumes that we are only storing the values where there is land (i.e. not a full grid) and is likely an underestimate. If we scale and offset the data, we should be ok to store it as a short integer, so we're looking at somewhere between 4 and 14 TB for the full dataset. If we store it in a compressed format, we may be able to shrink it more.

The code I used to do this, if anyone is curious or wants to check it, is available here: https://projects.nceas.ucsb.edu/nceas/projects/environment-and-orga/repository/entry/climate/extra/dailydatasize.r?rev=aw%2Fprecip
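
For anyone who wants to re-run the arithmetic without checking out the repository, here is a minimal R sketch (not the linked dailydatasize.r, just a reproduction of the table above, using binary terabytes):

  # Back-of-envelope storage for the daily 1970-2010 1km dataset
  global.area <- 510072000   # km^2, whole globe
  land.area   <- 148940000   # km^2, land only
  n.days      <- 14611       # daily layers, 1970-2010

  # bytes per cell for each candidate storage type
  bytes <- c("short int" = 2, "long int" = 4, "single" = 4, "double" = 8)

  tb <- 2^40                 # bytes per (binary) terabyte

  data.frame(
    bytes        = bytes,
    just.land.TB = round(land.area  * n.days * bytes / tb, 1),
    global.TB    = round(global.area * n.days * bytes / tb, 1)
  )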

#1 - Updated by Adam Wilson about 12 years ago

And this is for just one variable. So multiply by three for tmax, tmin, and ppt. And add more if we add any pixel-by-pixel error flags...
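
A quick extension of the same arithmetic, assuming exactly three variables and the short-integer estimates above:

  # Three variables (tmax, tmin, ppt) triple the single-variable estimate
  3 * c(land.TB = 4.0, global.TB = 13.6)   # roughly 12 to 41 TB, before any error flags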
