Task #430

Estimate total storage size of daily 1970-2010 1km dataset

Added by Adam Wilson over 12 years ago. Updated over 12 years ago.

Status: New
Priority: Normal
Assignee:
Category: Climate
Start date: 06/14/2012
Due date:
% Done: 100%
Estimated time: 1.00 h
Activity type: Coding/analysis

Description

I was curious how big the daily 1970-2010 1km dataset was going to be, so I did a back-of-the-envelope calculation assuming:

  • Global area: 510,072,000 km^2
  • Global land area: 148,940,000 km^2
  • Count of days 1970-2010: 14,611

I then took various storage types and calculated the total storage requirements:

type       bytes   Just land (TB)   Global (TB)
short int    2          4.0            13.6
long int     4          7.9            27.1
single       4          7.9            27.1
double       8         15.8            54.2

"Just land" assumes that we are only storing the values where there is land (i.e. not a full grid) and is likely an underestimate. If we scale and offset the data, we should be ok to store it as a short integer, so we're looking at somewhere between 4 and 14 TB for the full dataset. If we store it in a compressed format, we may be able to shrink it more.

The code I used to do this, if anyone is curious or wants to check it, is available here: https://projects.nceas.ucsb.edu/nceas/projects/environment-and-orga/repository/entry/climate/extra/dailydatasize.r?rev=aw%2Fprecip
