Project

General

Profile

Download (11.7 KB) Statistics
| Branch: | Revision:
1

    
2
LST Temporal Aggregation Evaluation
3
====
4
```{r,echo=FALSE,message=FALSE}
5
## get repo info
6
githash=system("git --git-dir /Users/adamw/work/environmental-layers/.git log --pretty=format:'%h' -n 1",intern=T)
7
ghead=paste("Compiled on",date()," using code hash:",githash)
8
#(`r ghead`)
9
```
10

    
11
### Adam M. Wilson and Giuseppe Amatulli
12

    
13
A short script to explore the implications of a 3-month moving window Land Surface Temperature Climatology.
14

    
15
```{r,echo=FALSE,results='hide',message=F}
16
## some setup
17
 uploadimages=T  # upload all images to imgur.com for easy viewing on github
18
opts_knit$set(progress = TRUE, verbose = TRUE,root.dir="~/Downloads",base.url = NULL)
19
if(uploadimages) opts_knit$set(upload.fun =imgur_upload)
20

    
21
opts_chunk$set(fig.width=12, fig.height=8, cache=!uploadimages)
22

    
23
### libraries
24
library(sp)
25
library(reshape)
26
library(plyr)
27
library(latticeExtra)
28
library(knitr)
29

    
30
## Import the data
31
setwd("~")
32
dlst=read.csv("/Users/adamw/Downloads/lst_tmax.csv")
33

    
34
## add month names
35
dlst$month=factor(month.name[dlst$month],ordered=T,levels=month.name)
36

    
37
## add 3 month moving window
38
dlst2=unique(dlst[,c("station","month","TMax","lst","lstp1","lstm1")])
39
dlst2$lst3m=apply(dlst2[,c("lst","lstp1","lstm1")],1,mean)
40
dlst2$lst2m=apply(dlst2[,c("lstp1","lstm1")],1,mean)
41

    
42
```
43

    
44
# Data
45
The following data were extracted from the shapefiles used to fit the tiled interpolations that Alberto has already run. They are limited to the CONUS region and include mean monthly maximum temperature observed at the station locations, as well as the LST from the month before, the month of, and the month following.  Here's what the data look like:
46

    
47
```{r,results="asis", echo=F}
48
kable(head(dlst[,-1]), format="markdown")
49
```
50

    
51

    
52
# TMax~LST relationship seasonal variability 
53

    
54
First let's explore the variability of the TMax~LST relationship by month, grouped by tile.  In this plot grey lines indicate a Tmax~LST regression within each tile (stations may be present in multiple tiles). Variability among grey lines represents spatial variability in the fitted lapse rate and the heavy black line is overall mean relationship (by month).
55
```{r,echo=FALSE}
56
xyplot (TMax~lst | month, groups=tile,data=dlst , 
57
        main="Tmax~LST at station locations by month",
58
        sub="Colors indicate  initial tiled-CONUS TMAX interpolation",
59
        ylab="Observations Tmax", xlab="LST",cex=.2,pch=16,auto.key=list(space="right"))+
60
  layer(panel.abline(0,1,lwd=3,col="black"))+
61
  glayer(panel.abline(lm (y~x),col="grey"),groups=dlst$tile,under=T)
62
```
63

    
64

    
65
# Comparison of monthly means with 3-month moving window
66
Here we compare the monthly means with a three month moving window (e.g. January mean LST with December-February mean LST).  Note that the relationship is very good (R^2 >0.95) but slightly weaker in winter months, likely due primarily to seasonal minimums.  Heavy black line is 1:1, and thin line is the fitted regression.
67

    
68
```{r,echo=FALSE}
69
xyplot (lst~lst3m | month,data=dlst2 , 
70
        main="Tmax~LST at station locations by month",
71
        ylab="LST (monthly)", xlab="LST (3-month Mean)",cex=.5,pch=16,auto.key=list(space="right"))+
72
  layer(panel.abline(0,1,lwd=3,col="black"))+
73
  layer(panel.ablineq(lm(y ~ x), r.sq = TRUE, at = 0.1, digits=2,adj=0:1,pos=4,offset=3,col="black"), style = 2)
74

    
75
```
76

    
77
# Comparison of monthly means with 2-month moving window that does not include the month
78
Here we compare the monthly means with a two month moving window that does not include the month of interest (e.g. January mean LST with December and February mean LST, but not including the January LST).  Heavy black line is 1:1, and thin line is the fitted regression.
79

    
80
```{r,echo=FALSE}
81
xyplot (lst~lst2m | month,data=dlst2 , 
82
        main="LST at station locations by month",
83
        ylab="LST (monthly)", xlab="LST (2-month Mean)",cex=.5,pch=16,auto.key=list(space="right"))+
84
  layer(panel.abline(0,1,lwd=3,col="black"))+
85
  layer(panel.ablineq(lm(y ~ x), r.sq = TRUE, at = 0.2, digits=2,adj=0:1,pos=4,offset=2,col="black"), style = 2)
86

    
87
```
88

    
89
Now let's look at the differences between the 1-month and 2-month LST values.  These represent a measure of how wrong we would be if we only had data from the two surrounding months and not the month in question.  
90

    
91
```{r,fig.height=3}
92
histogram(dlst2$lst-dlst2$lst2m,col="grey",xlab="Anomolies (1 month - 2 month means)")
93
```
94
The 95th quantile of the absolute value is only `r round(quantile(abs(dlst2$lst-dlst2$lst2m),na.rm=T,0.95),1)`, so the differences are quite small. From this analysis, it appears that broadening the temporal window will maintain a relatively consistent estimate of LST (R^2 ranged from 0.88-0.9) even when using only data from the surrounding months.
95

    
96
Let's see how the seasonal cycle is represented by these proxies for a few randomly selected stations.  Here the red line is the observed TMax data, the heavy black line represents the mean LST in that month, and the green line is a three-month moving window, while the purple line is the 2-month window (not including the month of interest).
97

    
98
```{rm,echo=FALSE}
99
## subset to 10 stations
100
dlst2l=melt(dlst2,id.vars=c("station","month"),measure.vars=c("TMax","lst","lst3m","lst2m"))
101
dlst2ls=dlst2l[dlst2l$station%in%sample(unique(dlst2l$station),10),]
102
```
103

    
104
```{r,echo=FALSE}
105
trellis.par.set(superpose.line = list(col = c("red","black","green","blue"),lwd=c(2,2,1,1)),
106
                superpose.symbol = list(col = c("red","black","green","blue"),lwd=c(2,2,1,1)))
107
xyplot (value~month|station, data=dlst2ls, groups=variable,type="l",
108
        main="LST at station locations by month",
109
        ylab="LST (monthly)", xlab="LST (2-month Mean)",cex=.5,pch=16,auto.key=list(space="right"),scales=list(x=list(rot=45)),
110
        layout=c(2,5))
111

    
112
```
113

    
114
# Processing Options
115

    
116
## Solution 1 (relatively easy)
117

    
118
1. Spatial interpolation: fill in LST when another observation is available within  some small distance (3? 5? 7? kilometers)
119
2. Temporal interpolation: Estimate the monthly value at the 15th day by linear interpolation (in the temporal domain) considering the temporally closest observations in the previous and following month.
120

    
121
This is more robust than the simple mean of 3 months.  For example, imagine using a simple mean in the following case:  
122
* month -1: no observation 
123
* month  0: no observation                                     
124
* month +1: observation in the end of the month
125

    
126
Would lead to a prediction = month +1.
127

    
128
Or, alternatively, one in which:
129
* month -1: has data for the full month
130
* month 0: has no data
131
* month +1: has only data near end of month
132

    
133
In this case a mean would be 'pulled' towards the mean value of month-1.  Using a fixed number of observations in t-1 and t+1 and a linear interpolation that accounts for the time of those observations would prevent 
134

    
135
## New Solution 2 (also relatively easy)
136

    
137
1. Create 12 monthly LST climatologies if at least 5 or 10 observations are available for each pixel-month.
138
2. Fill in missing months using temporal interpolation:
139
   * Fit function through all available months (rather than just +/- 1 month) to capture seasonal variabilty.
140
   * Use some function that can capture the cyclical nature (sin, spline, polynomial, etc.)
141

    
142

    
143
Remember from above that the LST curves are fairly smooth (caveat: we've only looked at locations in the CONUS dataset).  So let's try fitting a simple sin function to estimate missing months:
144
```{r,echo=T}
145
### Curve fitting function
146
### Set parameter starting values and limits
147
parms=list(phi=5,A=.05)  #starting parameters
148
low=list(phi=-999,A=0)  #lower bounds - constrain s3 to be positive 
149
high=list(phi=999,A=999)  #lower bounds - constrain s3 to be positive
150

    
151
## define a simple model estimating a sin function with variable shift and amplitude
152
fit.model<-function(parms,x) {
153
      fit=sin(parms[["phi"]]+(x*2*pi))*abs(parms[["A"]])
154
      fit[fit==Inf]=.Machine$double.xmax  #if Inf, replace with maximum value
155
      fit[fit==-Inf]=.Machine$double.xmin  #if Inf, replace with maximum value
156
    list(fit=fit)
157
  } 
158
## function to minimize the RMSE of predictions
159
 minimize.model<-function(parms,x,y) {
160
    ss=sum((y-fit.model(parms,x)$fit)^2)
161
    ss=min(ss,.Machine$double.xmax)  #if Inf, replace with maximum value
162
    ss=max(ss,.Machine$double.xmin)  #if -Inf, replace with maximum value
163
    list(ss=ss)
164
  }
165

    
166
## Process station data and make predictions for each month
167
makepreds=function(dat,drop=NULL){
168
  dat=na.omit(dat)
169
  dat=dat[dat$variable=="lst",]
170
  ## drop
171
  if(nrow(dat)==0){ return(NULL)}
172
  ## drop some proportion of data if drop is specified
173
  if(!is.null(drop)) dat=dat[sample(1:nrow(dat),round(nrow(dat)*(1-drop))),]
174
  dat=dat[order(dat$month),]
175
  dat$time=(1:nrow(dat))/nrow(dat)
176
  xbar=mean(dat$value,na.rm=T)
177
  fit=optim(unlist(parms),minimize.model,
178
            x=dat$time,y=dat$value-xbar,
179
            control=list(trace=0,maxit=10000),method="L-BFGS-B",lower=low,upper=high)
180

    
181
  dat$pred=xbar+sin(fit$par[["phi"]]+(dat$time*2*pi))*fit$par[["A"]]
182
  dat$A=fit$par[["A"]]
183
  dat$phi=fit$par[["phi"]]
184
  dat
185
}
186
## apply the function
187
d4=ddply(dlst2ls,.(station),.fun=makepreds)
188
d4l=melt(d4,id.vars=c("station","month"),measure.vars=c("value","pred"))
189
```
190

    
191
Let's see what that looks  for the 10 example stations above. The red line is the observed LST value and the black line is the predicted LST from the sinusoidal function.
192
```{r,echo=FALSE}
193
xyplot (value~month|station, data=d4l, groups=variable,type="l",
194
        main="LST at station locations by month",
195
        ylab="LST (monthly)", xlab="LST (2-month Mean)",cex=.5,pch=16,auto.key=list(space="right"),scales=list(x=list(rot=45)))
196
```
197
And a histogram of the residuals for these stations:
198
```{r,echo=FALSE,fig.height=3}
199
d4$anomoly=d4$value-d4$pred
200
hist(d4$anomoly,col="grey",main="Histogram of residuals for predictions from sinusoidal model",xlab="Degrees C")
201
```
202

    
203
Not bad... Now let's drop 25% of the observations from each station and do it again:
204
```{r,echo=F}
205
## apply the function
206
d4a=ddply(dlst2ls,.(station),.fun=makepreds,drop=.25)
207
d4a$anomoly=d4a$value-d4a$pred
208
d4al=melt(d4a,id.vars=c("station","month"),measure.vars=c("value","pred"))
209

    
210
xyplot (value~month|station, data=d4al, groups=variable,type="l",
211
        main="LST at station locations by month",
212
        ylab="LST (monthly)", xlab="LST (2-month Mean)",cex=.5,pch=16,auto.key=list(space="right"),scales=list(x=list(rot=45)))
213
```
214

    
215
And the  of residuals for these 10 stations with 25% of the observations removed:
216
```{r,echo=FALSE,fig.height=3}
217
hist(d4a$anomoly,col="grey",main="Histogram of residuals for predictions from sinusoidal model",xlab="Degrees C")
218
```
219

    
220
### Apply this function to the full CONUS dataset described above.
221
Now look at the distributions of the residuals for the full conus dataset.
222
```{r,echo=FALSE,fig.height=3}
223
## apply the function
224
d5=ddply(dlst2l,.(station),.fun=makepreds,drop=.25)
225
d5$anomoly=d5$value-d5$pred
226
hist(d5$anomoly,col="grey",main="Histogram of residuals for predictions from sinusoidal model",xlab="Degrees C")
227
```
228
And summarize the residuals into RMSE's by station:
229
```{r, echo=FALSE,fig.height=3}
230
d5_rmse=ddply(d5,.(station),.fun=function(dat){
231
  sqrt(mean( (dat$pred - dat$value)^2, na.rm = TRUE))
232
})
233
hist(d5_rmse$V1,col="grey",main="Distribution of station RMSE's for predictions from sinusoidal model",xlab="RMSE")
234
```
235
So the vast majority of stations will have a RMSE of <5 using this simple method even if 25% of the data are missing.
236

    
237
# Next steps  
238
We should pick 3-4 problematic tiles and 1 tile with little missing data and explore how these various options differ in infilling LST.  It would also be ideal if we could complete the interpolation using these different datasets and compare the resulting estimates of TMax over these tiles. 
239

    
240
#  Remaining Questions  
241

    
242
1. How does this variability compare to spatial variability (e.g. which is worse, spatial or temporal smoothing)?
243
2. How do the anomolies from different temporal windows vary spatially?  
(5-5/16)