Project

General

Profile

Download (8.64 KB) Statistics
| Branch: | Revision:
1

    
2
LST Temporal Aggregation Evaluation
3
====
4

    
5

    
6

    
7
### Adam M. Wilson and Giuseppe Amatulli
8

    
9
A short script to explore the implications of a 3-month moving window Land Surface Temperature Climatology.
10

    
11

    
12

    
13

    
14
# Data
15
The following data were extracted from the shapefiles used to fit the tiled interpolations that Alberto has already run. They are limited to the CONUS region and include mean monthly maximum temperature observed at the station locations, as well as the LST from the month before, the month of, and the month following.  Here's what the data look like:
16

    
17
|station      |month    |    TMax|    lst|  lstp1|  lstm1|tile         |
18
|:------------|:--------|-------:|------:|------:|------:|:------------|
19
|USC00100528  |January  |  0.9592|  1.880|  3.406|  1.303|50.0_-115.0  |
20
|USC00100667  |January  |  2.5101|  1.630|  2.264|  0.340|50.0_-115.0  |
21
|USC00101079  |January  |  0.8731|  1.598|  3.976|  3.330|50.0_-115.0  |
22
|USC00101363  |January  |  1.5249|  2.682|  3.616|  1.030|50.0_-115.0  |
23
|USC00101956  |January  |  2.6857|  2.354|  4.175|  3.123|50.0_-115.0  |
24
|USC00102159  |January  |  2.6410|  4.978|  5.748|  3.740|50.0_-115.0  |
25

    
26

    
27

    
28
# TMax~LST relationship seasonal variability 
29

    
30
First let's explore the variability of the TMax~LST relationship by month, grouped by tile.  In this plot grey lines indicate a Tmax~LST regression within each tile (stations may be present in multiple tiles). Variability among grey lines represents spatial variability in the fitted lapse rate and the heavy black line is overall mean relationship (by month).
31
![plot of chunk unnamed-chunk-4](http://i.imgur.com/KYSRkMV.png) 
32

    
33

    
34

    
35
# Comparison of monthly means with 3-month moving window
36
Here we compare the monthly means with a three month moving window (e.g. January mean LST with December-February mean LST).  Note that the relationship is very good (R^2 >0.95) but slightly weaker in winter months, likely due primarily to seasonal minimums.  Heavy black line is 1:1, and thin line is the fitted regression.
37

    
38
![plot of chunk unnamed-chunk-5](http://i.imgur.com/kInQcBr.png) 
39

    
40

    
41
# Comparison of monthly means with 2-month moving window that does not include the month
42
Here we compare the monthly means with a two month moving window that does not include the month of interest (e.g. January mean LST with December and February mean LST, but not including the January LST).  Heavy black line is 1:1, and thin line is the fitted regression.
43

    
44
![plot of chunk unnamed-chunk-6](http://i.imgur.com/8AfNmiq.png) 
45

    
46

    
47
Now let's look at the differences between the 1-month and 2-month LST values.  These represent a measure of how wrong we would be if we only had data from the two surrounding months and not the month in question.  
48

    
49

    
50
```r
51
histogram(dlst2$lst - dlst2$lst2m, col = "grey", xlab = "Anomolies (1 month - 2 month means)")
52
```
53

    
54
![plot of chunk unnamed-chunk-7](http://i.imgur.com/7E9SHnc.png) 
55

    
56
The 95th quantile of the absolute value is only 4.2, so the differences are quite small. From this analysis, it appears that broadening the temporal window will maintain a relatively consistent estimate of LST (R^2 ranged from 0.88-0.9) even when using only data from the surrounding months.
57

    
58
Let's see how the seasonal cycle is represented by these proxies for a few randomly selected stations.  Here the red line is the observed TMax data, the heavy black line represents the mean LST in that month, and the green line is a three-month moving window, while the purple line is the 2-month window (not including the month of interest).
59

    
60

    
61

    
62

    
63
![plot of chunk unnamed-chunk-8](http://i.imgur.com/wOsUVGT.png) 
64

    
65

    
66
# Processing Options
67

    
68
## Solution 1 (relatively easy)
69

    
70
1. Spatial interpolation: fill in LST when another observation is available within  some small distance (3? 5? 7? kilometers)
71
2. Temporal interpolation: Estimate the monthly value at the 15th day by linear interpolation (in the temporal domain) considering the temporally closest observations in the previous and following month.
72

    
73
This is more robust than the simple mean of 3 months.  For example, imagine using a simple mean in the following case:  
74
* month -1: no observation 
75
* month  0: no observation                                     
76
* month +1: observation in the end of the month
77

    
78
Would lead to a prediction = month +1.
79

    
80
Or, alternatively, one in which:
81
* month -1: has data for the full month
82
* month 0: has no data
83
* month +1: has only data near end of month
84

    
85
In this case a mean would be 'pulled' towards the mean value of month-1.  Using a fixed number of observations in t-1 and t+1 and a linear interpolation that accounts for the time of those observations would prevent 
86

    
87
## New Solution 2 (also relatively easy)
88

    
89
1. Create 12 monthly LST climatologies if at least 5 or 10 observations are available for each pixel-month.
90
2. Fill in missing months using temporal interpolation:
91
   * Fit function through all available months (rather than just +/- 1 month) to capture seasonal variabilty.
92
   * Use some function that can capture the cyclical nature (sin, spline, polynomial, etc.)
93

    
94

    
95
Remember from above that the LST curves are fairly smooth (caveat: we've only looked at locations in the CONUS dataset).  So let's try fitting a simple sin function to estimate missing months:
96

    
97
```r
98
### Curve fitting function Set parameter starting values and limits
99
parms = list(phi = 5, A = 0.05)  #starting parameters
100
low = list(phi = -999, A = 0)  #lower bounds - constrain s3 to be positive 
101
high = list(phi = 999, A = 999)  #lower bounds - constrain s3 to be positive
102

    
103
## define a simple model estimating a sin function with variable shift and
104
## amplitude
105
fit.model <- function(parms, x) {
106
    fit = sin(parms[["phi"]] + (x * 2 * pi)) * abs(parms[["A"]])
107
    fit[fit == Inf] = .Machine$double.xmax  #if Inf, replace with maximum value
108
    fit[fit == -Inf] = .Machine$double.xmin  #if Inf, replace with maximum value
109
    list(fit = fit)
110
}
111
## function to minimize the RMSE of predictions
112
minimize.model <- function(parms, x, y) {
113
    ss = sum((y - fit.model(parms, x)$fit)^2)
114
    ss = min(ss, .Machine$double.xmax)  #if Inf, replace with maximum value
115
    ss = max(ss, .Machine$double.xmin)  #if -Inf, replace with maximum value
116
    list(ss = ss)
117
}
118

    
119
## Process station data and make predictions for each month
120
makepreds = function(dat, drop = NULL) {
121
    dat = na.omit(dat)
122
    dat = dat[dat$variable == "lst", ]
123
    ## drop
124
    if (nrow(dat) == 0) {
125
        return(NULL)
126
    }
127
    ## drop some proportion of data if drop is specified
128
    if (!is.null(drop)) 
129
        dat = dat[sample(1:nrow(dat), round(nrow(dat) * (1 - drop))), ]
130
    dat = dat[order(dat$month), ]
131
    dat$time = (1:nrow(dat))/nrow(dat)
132
    xbar = mean(dat$value, na.rm = T)
133
    fit = optim(unlist(parms), minimize.model, x = dat$time, y = dat$value - 
134
        xbar, control = list(trace = 0, maxit = 10000), method = "L-BFGS-B", 
135
        lower = low, upper = high)
136
    
137
    dat$pred = xbar + sin(fit$par[["phi"]] + (dat$time * 2 * pi)) * fit$par[["A"]]
138
    dat$A = fit$par[["A"]]
139
    dat$phi = fit$par[["phi"]]
140
    dat
141
}
142
## apply the function
143
d4 = ddply(dlst2ls, .(station), .fun = makepreds)
144
d4l = melt(d4, id.vars = c("station", "month"), measure.vars = c("value", "pred"))
145
```
146

    
147

    
148
Let's see what that looks  for the 10 example stations above. The red line is the observed LST value and the black line is the predicted LST from the sinusoidal function.
149
![plot of chunk unnamed-chunk-10](http://i.imgur.com/WYgS7qx.png) 
150

    
151
And a histogram of the residuals for these stations:
152
![plot of chunk unnamed-chunk-11](http://i.imgur.com/qRekbih.png) 
153

    
154

    
155
Not bad... Now let's drop 25% of the observations from each station and do it again:
156
![plot of chunk unnamed-chunk-12](http://i.imgur.com/n6ZP6yB.png) 
157

    
158

    
159
And the  of residuals for these 10 stations with 25% of the observations removed:
160
![plot of chunk unnamed-chunk-13](http://i.imgur.com/8Z1Niwd.png) 
161

    
162

    
163
### Apply this function to the full CONUS dataset described above.
164
Now look at the distributions of the residuals for the full conus dataset.
165
![plot of chunk unnamed-chunk-14](http://i.imgur.com/iSdwpOe.png) 
166

    
167
And summarize the residuals into RMSE's by station:
168
![plot of chunk unnamed-chunk-15](http://i.imgur.com/M29nxV1.png) 
169

    
170
So the vast majority of stations will have a RMSE of <5 using this simple method even if 25% of the data are missing.
171

    
172
# Next steps  
173
We should pick 3-4 problematic tiles and 1 tile with little missing data and explore how these various options differ in infilling LST.  It would also be ideal if we could complete the interpolation using these different datasets and compare the resulting estimates of TMax over these tiles. 
174

    
175
#  Remaining Questions  
176

    
177
1. How does this variability compare to spatial variability (e.g. which is worse, spatial or temporal smoothing)?
178
2. How do the anomolies from different temporal windows vary spatially?  
(7-7/16)