Project

General

Profile

Wilson updates

2013-03-11

What I did these past week(s)
Alberto, Giuseppe, Benoit, and I just had a useful conference call about using NEX/Pleiades. Alberto walked us through the process of loading software and running a job.
  • Giuseppe and Benoit recently received their login keys and will start using the system.
  • Alberto described where he's putting the interpolation and LST climatology outputs, and we discussed a scheme to keep the files organized.
  • Alberto will make his directories readable and share his setup scripts (environ.sh) that load all the software needed to run the interpolation script.

2013-02-03

What I did these past week(s)
I'm in the final stages of processing and validating the MOD09 cloud frequency dataset. The climatologies need to be processed/corrected to remove latitudinal banding due to orbital artefacts, and I'm currently exploring various methods to do this. I am also working on the manuscript, which is fairly well developed and just needs the final methodology and results to be completed.

2013-12-02

What I did these past week(s)
Working on the MOD09 cloud climatology paper and validation. See [[https://projects.nceas.ucsb.edu/nceas/documents/189]] for a presentation describing progress (including an overview of the MOD35 problems).

2013-11-05

What I did these past week(s)
The MOD35 cloud mask paper has been accepted to Remote Sensing of Environment. Alberto finished the global monthly LST averages. He's also close to finishing the code to create/subset/mosaic all the covariate layers to the WGS84 grid.

Next Steps
We'll need to inspect the LST data before beginning the moving-window interpolation.

2013-10-07

What I did these past week(s)
The Bayesian interpolation paper was accepted to the International Journal of Climatology. I've also been exploring different options to improve the quantification of the landcover bias in the MOD35 C5 cloud mask. The paper was conditionally accepted with a few changes, but adding a section with a more thorough evaluation would make it a stronger paper.

Next Steps
Resubmit the land cover bias paper. Then move to the global cloud climatology manuscript.

What obstacles are blocking progress
I still need to decide which cloud product (MOD35 Collection 6 or MOD09 Collection 5) will be used for the final cloud climatology paper.
2013-09-24

What I did these past week(s)
I have mostly finished revisions to the MOD35 cloud mask bias paper and plan to resubmit in the next week. I have also nearly finished revisions to the Bayesian interpolation methodology (copy available upon request).

Next Steps
Resubmit both manuscripts in the next two weeks. Finalize global cloud climatology manuscript.

What obstacles are blocking progress
I still need to decide which cloud product (MOD35 Collection 6 or MOD09 Collection 5) will be used for the final cloud climatology paper.
2013-09-10

What I did these past week(s)
I've worked out all known bugs in the MOD35_L2 gridding procedure. The final one was due to a GDAL bug that replaced the missing-value attribute with 0 if a file contained no missing data, which led to many clear pixels (cloud=0) being labeled as missing in the summary procedure. I've added a workaround for this (e5c2e69b5f7f7ba). However, the Collection 6 data still show some processing-path artefacts, particularly around water/coastlines. A global draft should be available soon.
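A minimal sketch of the kind of check the workaround performs, assuming a classic-format netCDF file and a hypothetical variable named CM (the commit above is the authoritative fix):

library(ncdf4)

# If GDAL wrote _FillValue = 0 for a file with no missing data, clear pixels
# (cloud = 0) would later be treated as missing; reset the attribute if so.
# File name, variable name ("CM"), and fill value (255) are all hypothetical.
nc <- nc_open("gridded_swath.nc", write = TRUE)
fv <- ncatt_get(nc, "CM", "_FillValue")
if (fv$hasatt && fv$value == 0) {
  ncatt_put(nc, "CM", "_FillValue", 255, prec = "byte")
}
nc_close(nc)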

I received the reviews for the manuscript describing the problems I've found in the Collection 5 MOD35 cloud mask, and they are favorable. I plan to resubmit in the next week. I also finally received reviews on my manuscript describing a Bayesian interpolation methodology for generating reliable credible intervals on derived climate metrics (from my dissertation).

Next Steps
Resubmit both manuscripts in the next month. Finalize global cloud climatology manuscript.

What obstacles are blocking progress
I still need to decide which cloud product (MOD35 Collection 6 or MOD09 Collection 5) will be used for the final cloud climatology paper.

2013-08-13

What I did these past week(s)
I'm beginning to write the cloud climatology paper (introduction, methods). I evaluated the updated swtif (see below) and found that it still has interpolation artefacts. I put together a summary of the issue with some example granules and sent it to the developer. He recently sent me another updated version that I haven't checked yet.

Next Steps
Continue work on the cloud climatology paper and check the updated swtif program. If the swtif program is fixed, begin processing the global cloud climatologies.

What obstacles are blocking progress
The ongoing delays in gridding the cloud data have been disappointing. Hopefully the new version will be functional.
2013-07-30

What I did these past week(s)
The MOD35 landcover bias paper has been submitted to Remote Sensing of Environment.

Next Steps
Start working on the cloud climatology paper. The methods will be fairly straightforward, but I'd like to incorporate some biological data to illustrate the utility of the cloud layer. Any suggestions on this are welcome.

What obstacles are blocking progress
The HEG developer has sent me an updated version of the gridding tool (swtif), but the interpolation bug remains. He tells me he's found the source and may have a fixed version this week. I'm putting the cloud processing on hold to see if this actually happens.

2013-07-16

What I did these past week(s)
The short communication about the C5 MOD35 Cloud Product's land cover bias has been reviewed by the cloud mask developer and I'll submit it soon (maybe this week). If anyone has further comments feel free to pass them along.

Next Steps
I'm still seeing gridding artefacts from the swtif program. The developer acknowledged the problem and said he may be able to work on this bug in mid-July, but I'm not too hopeful that I'll get the updated software soon.
So we'll need to choose what to use for this. There are two options, as I see it:
  • C6 MOD35
    • still has gridding artefacts between tiles and isn't suitable as is (much better than it was a month ago, but not good enough).
    • We would have to wait for updated swtif gridding software to make this right.
  • C5 MOD09 internal cloud mask:
    • In the station comparison I've done, it performs better than the C5 MOD35 product.
    • No obvious landcover artefacts
    • Already gridded so would be much easier to work with.
    • The algorithm is less complex/sophisticated, but this is not always a bad thing.
    • It "looks" good because it doesn't seem to use landcover at all, so you get smooth transitions across boundaries and even on coastlines, while MOD35 has more processing artefacts. However, they are >50% different in some regions (see Figure 1 in the attached MS).
    • However, if I go with this, I'll have to rewrite the climatology script to work with these data (which are already on NEX) instead of the swath data. This would probably take ~1 week.

Validation: The station dataset I have ([[http://cdiac.ornl.gov/epubs/ndp/ndp026d/ndp026d.html]]) is perfect, but I'll have to make some decisions about how to do the validation. Since the focus is on climatologies I'll probably limit it to 12-year monthly means, but it would be possible to do timeseries of annual or even monthly cloud frequencies over 2000-2009 (station data stops in 2009).
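Concretely, the climatology-level comparison could look something like the sketch below in R; all file and column names are hypothetical placeholders.

library(raster)

# Hypothetical inputs: one month's 12-year mean cloud frequency and the
# NDP-026D station monthly means (columns lon, lat, cld_obs)
cld <- raster("cloud_climatology_jan.tif")
stations <- read.csv("ndp026d_monthly_means.csv")

# Sample the gridded climatology at the station locations and compare
stations$cld_mod <- extract(cld, stations[, c("lon", "lat")])
ok <- complete.cases(stations[, c("cld_obs", "cld_mod")])
cor(stations$cld_obs[ok], stations$cld_mod[ok])
sqrt(mean((stations$cld_obs[ok] - stations$cld_mod[ok])^2))  # RMSE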

What obstacles are blocking progress
I've spent far too long trying to debug NASA's HEG software and developing my own workarounds for its bugs. I'm leaning towards just using the MOD09 cloud mask for the reasons listed above.

2013-03-01

What I did this past week(s)
I'm wrapping up a short communication about the C5 MOD35 Cloud Product's land cover bias, including artefacts introduced into MOD11 (LST) and MOD17 (NPP).

What I'm working on now
1) I'm wrapping up the paper comparing the various cloud masks (see previous week's note for details), and 2) preparing to generate the climatology on the full dataset (2000-2012). This has required some thought and experimentation in dealing with the large data volume and planning for archiving the various stages of data on the Pleiades cluster.

What obstacles are blocking progress
The full MOD35 dataset is now available on Pleiades, so we're approaching a full global run. So far, I've done summaries of one full year (2009). The first run showed gridding artefacts (adjacent tiles with different mean cloudiness that changes abruptly at the tile boundary) that I'm still working to figure out. Because the underlying swaths are much larger than tiles and adjacent tiles are cut from the same swaths, there shouldn't be any shifts at tile boundaries.

2013-03-01

What I did this past week(s)
I'm writing a short communication about the C5 MOD35 Cloud Product's land cover bias.

The cloud flags from both the MOD35 and MOD09 algorithms were extracted from the “state_1km” Scientific Data Set of the daily MODIS surface reflectance product (MOD09GA) using the Google Earth Engine (GEE, http://earthengine.google.org/). The MOD35 mask is contained in bits 0-1, while the MOD09 ‘internal’ algorithm is in bit 10. The MOD35 bits encode four categories: confidently clear (confidence > 0.99), probably clear (0.99 ≥ confidence > 0.95), probably cloudy (0.95 ≥ confidence > 0.66), and confidently cloudy (confidence ≤ 0.66). We binned “confidently clear” and “probably clear” together as “clear” and the other two classes as “cloudy.” The MOD35 and MOD09 daily cloud mask timeseries were then summarized to climatologies by calculating the proportion of cloudy days during 2009.

I also aggregated the MCD12Q1 MODIS land cover product from 2005 to 1km resolution using the modal land cover type of the 500m pixels within each 1km cell. To assess the impact of the MOD35 algorithm's processing paths (water, coast, desert, land) on cloud frequency, I extracted a global map of the processing path (bits 6-7 of the cloud mask) (Ackerman et al., 2010) from a global set of Collection 5 MOD35_L2 swaths.

To explore how the landcover-associated patterns in cloud cover affect Level 3 MODIS products, we show the proportion of missing data in the 8-day composite Land Surface Temperature (MOD11A2) and the decadal mean Net Primary Productivity (MOD17, Collection 55). For MOD11, we calculated the proportion of 8-day temporal composites with missing values in 2009 using GEE. For MOD17, we used the proportion of 8-day input FPAR/LAI values infilled over 2000-2012, as provided by the Numerical Terradynamic Simulation Group (http://www.ntsg.umt.edu/project/mod17). Four (draft) figures are available here: https://projects.nceas.ucsb.edu/nceas/documents/55.
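A minimal sketch of the bit arithmetic involved (outside GEE), using base R's bit operations; the variable name and example values are hypothetical:

# state: a vector (or raster) of MOD09GA state_1km values; example values only
state <- c(0L, 1025L, 67L)

# MOD35 cloud state: bits 0-1
mod35_state <- bitwAnd(state, 3L)

# MOD09 'internal' cloud flag: bit 10 (1 = cloudy)
mod09_cloud <- bitwAnd(bitwShiftR(state, 10L), 1L)

# The same pattern recovers the processing path from bits 6-7 of the first
# byte of the MOD35_L2 cloud mask: bitwAnd(bitwShiftR(cm_byte1, 6L), 3L)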

What I'm working on now
Wrapping up the paper comparing the various cloud masks.

What obstacles are blocking progress

2013-03-01

What I did this past week(s)
I evaluated the Collection 6 MOD35 cloud mask and determined that it greatly reduces (though doesn't completely remove) the land-cover artefacts in cloud frequency. In summary, there are far fewer days (~20% fewer) reported as cloudy over non-forest in the Venezuela tile in Collection 6. This has broad implications for all derivative MODIS data because it affects the quantity of available information for summaries (such as the 16-day vegetation indices, land surface temperature, and net primary productivity). I also updated and simplified the processing script to calculate the mean cloud 'confidence' (which takes into account the certainty of a pixel being cloudy) in addition to the cloud frequency calculated by thresholding the cloud 'confidence.' See code updates here: https://projects.nceas.ucsb.edu/nceas/projects/environment-and-orga/repository?utf8=%E2%9C%93&rev=aw%2Fprecip. See https://projects.nceas.ucsb.edu/nceas/documents/48 for a presentation describing this work, which I gave on the last call.

I also looked into the expected timeline of collection 6 reprocessing:
  • MOD35 (Cloud Mask): 11/2012 (finished)
  • MOD06 (Cloud Top Properties): Late Summer 2013
  • MOD11/MOD13 (LST/VI): Late 2013-2014

Given that the rest of the products will not be available for months, the current plan is to process the C6 MOD35 data to incorporate into the precipitation interpolation.

What I'm working on now
I'm working on a short communication describing the issue (because it has such broad implications). I expect to have a draft ready in the next week or two. We have downloaded the Aqua data for h11v08 and we are currently downloading the global Collection 6 MOD35 dataset to produce the global cloud climatology.

What obstacles are blocking progress
There is what appears to be a bug in the CDO software that leads to strange lines of missing data in the summary product (not present in the original dataset) in some months. I just got an alternative program (NCO's ncra) working, though I'd like to understand why CDO is failing... UPDATE: I figured out that the problem was a mismatch between the data type (byte) and the _FillValue attribute that CDO was automatically assigning (short). I edited the missing value attribute using ncatted (in NCO) and the missing data lines disappeared.
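The ncatted fix amounts to a one-liner (called from R here; the file name, variable name 'CF', and fill value 255 are hypothetical):

# Overwrite ('o') the _FillValue attribute of variable CF, forcing byte type
# ('b') so the attribute matches the data type
system("ncatted -O -a _FillValue,CF,o,b,255 cloud_summary.nc")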

2013-03-01

What I did this past week(s)
I've identified a significant land cover bias in the MODIS cloud mask (MOD35 Collection 5) which affects most MODIS products (including MOD06 and probably MOD11). See issue #583 for details. However, there is hope that MODIS Collection 6 includes updates that will correct (or reduce) the problem. I also presented an update to the Yale Climate and Energy Institute (which is currently funding me). The presentation, which mostly contains information this group has already seen, is available here: [[https://projects.nceas.ucsb.edu/nceas/documents/40]]. I also uploaded the presentation I gave during the 1/28 call here: [[https://projects.nceas.ucsb.edu/nceas/documents/41]].

I also identified a dataset that should be very useful to validate the cloud climatology (when it is finished). The Cloud Climatology for Land Stations Worldwide, 1971-2009 [[http://cdiac.ornl.gov/epubs/ndp/ndp026d/ndp026d.html]] seems perfect to assess the accuracy of the MODIS cloud climatology. I plan to generate and then validate monthly maps using these stations.

What I'm working on now
The MOD35 Collection 6 data only recently became available and so I am going to process it for tile h11v08 to see if the landcover artefacts are still present. If the new collection improves the cloud climatologies, I'll probably wait for the MOD06 Collection 6 data to become available (in the next month or so, according to the developer) before continuing work on the cloud climatology.

What obstacles are blocking progress
Working with the MOD35 data will require editing the processing code I wrote for MOD06. Hopefully this will not be too arduous, but it will take some time to ensure that I'm translating the bitfields correctly. I also recently realized that gdal can translate MODIS swath data, though this seems to be an undocumented (or not-well-documented) feature. Working with GDAL would be preferable to the idiosyncratic HEG tool provided by NASA.

2013-01-14

What I did this past week(s)
I presented a poster about the progress to date on the precipitation front. Poster is available here: [[https://projects.nceas.ucsb.edu/nceas/documents/34]].

All the code I've been working on is fully functional on Pleiades. R packages are usually easy to install (in a local directory), so I don't foresee any more significant software troubles.

2012-12-19

What I did this past week(s)
In mid-December we worked out all the known bugs in our software requirements on Pleiades (grass/gdal/r/rgdal/HEG/etc.). I now have a processing routine that takes a tile and a list of dates, processes each day's MOD06 level 2 (swath) cloud data as separate jobs (possibly on separate nodes), and then generates the climatologies as netcdf files with metadata. With ~400 nodes it takes only about an hour to process one tile. So far, I've processed the tiles for Oregon (h09v04), Venezuela (h11v08), and Kenya (h21v09).

My initial question for the MOD06 data is whether the additional information in the continuous cloud metrics (such as mean optical thickness) is worth the extra effort over the MOD35 cloud mask (% cloudy days) that is available in the LST data and other MODIS products. To evaluate this, I compared several GAMs predicting (log) mean monthly precipitation at all stations with at least 10 years of data and various combinations of location, elevation, and cloud data. Here are the models I'm considering:
lppt~s(y,x)
lppt~s(y,x)+s(dem)
lppt~s(y,x,dem)
lppt~s(y,x,dem)+s(cld)
lppt~s(y,x)+s(dem)+s(cld)
lppt~s(y,x)+s(cld)
lppt~s(y,x)+s(cot)
lppt~s(y,x)+s(dem)+s(cot)
lppt~s(y,x,dem)+s(cot)
lppt~s(y,x)+s(cer20)
lppt~s(y,x)+s(dem)+cld+cot+cer20
lppt~s(y,x,dem)+cld+cot+cer20
lppt~s(y,x)+s(dem)+s(cld,cot,cer20)
lppt~s(y,x)+s(dem)+s(cld)+s(cot)+s(cer20)
I've been using a 10% holdout for repeated random sub-sampling validation and averaging the predictive performance. The best model varies by tile and month, but the results suggest that inclusion of the continuous cloud metrics does improve the predictive performance over (x,y,dem) models. This is true for RMSE and R^2 (using validation data).
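A minimal sketch of the validation loop for one candidate model, assuming a hypothetical station data frame st with columns lppt, y, x, dem, and cld:

library(mgcv)

# Repeated random sub-sampling validation with a 10% holdout
rmse <- replicate(100, {
  holdout <- sample(nrow(st), round(0.1 * nrow(st)))
  fit <- gam(lppt ~ s(y, x) + s(dem) + s(cld), data = st[-holdout, ])
  pred <- predict(fit, newdata = st[holdout, ])
  sqrt(mean((st$lppt[holdout] - pred)^2))
})
mean(rmse)  # average predictive RMSE across random splits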

What I'm working on now
I'm putting together a poster to present these results (and the MOD06 processing workflow) at the IBS meeting in Miami.

What obstacles are blocking progress
  • Some regions (especially Venezuela) have unexpectedly high numbers of 'missing' cloud observations that seem to be related to land cover (more missing data over deforested areas). I haven't figured out what is going on there, though it may have to do with uncertainty in the cloud flag over areas whose albedo differs from what the algorithm expects (as might be the case in recently deforested areas). It's difficult to diagnose because these patterns are not easily visible in the day-to-day swaths; they only become apparent after creating the climatologies.
  • There are some gridding artifacts in the cloud climatologies. I'm not yet sure where these are being introduced.

What's next
I will present this information on a poster at the IBS meeting in Miami in a few weeks.

2012-11-19

What I did this past week(s)
I have been working to get my MOD06 processing script running in parallel. This is an R script that calls on HEG to grid the swath data and then GRASS to process the desired metrics from the various swath files. I can log into an interactive job on NASA's cluster and successfully run the script. I can also successfully process a series of dates sequentially on a single node. Of course what we would like is the ability to process various dates/tiles in parallel (as a one-program-many-data collection of jobs). So far, I've explored two options:
  • Using Petr Votava's mqueue script, which serves as a wrapper for qsub and allows submission of one-program-many-data analyses. The roadblock I've hit with this approach is that, for some unknown reason, I cannot load the gdal (or rgdal) libraries when the job is submitted using mqueue. If I submit a single day (using qsub or within an interactive job) it works fine, but if I use mqueue it fails with the line "ctrl_connect/connect: Connection refused".
  • The other method is to use the Rmpi library to drive the parallelization from within R (using foreach() or other methods); see the sketch after this section. For this approach I've been stuck trying to get the Rmpi library correctly compiled and linked to the appropriate libraries/modules. I've been communicating with Johnny Chang at the NAS help desk about this issue (see below for a transcript). So far, he has been encouraging me to compile my own version of Rmpi rather than add it to the list of packages available in the R module (though I suspect that most people using R on this system would like the ability to parallelize their code, and Rmpi is, as far as I know, the best way to do this).

Either approach would be fine with me, though I have a preference for simply getting Rmpi installed correctly, as it will offer more flexibility for parallelization in the future. Preliminary analysis revealed that there is significant spatial variability across the Venezuela tile (I was worried it might just be evenly 'cloudy' all the time).
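A minimal sketch of the Rmpi route using foreach via the doMPI package, assuming Rmpi compiles cleanly and a hypothetical process_day() function that wraps the HEG/GRASS steps for one date:

library(doMPI)  # drives foreach over MPI (via Rmpi); also attaches foreach

cl <- startMPIcluster()  # one worker per MPI rank allocated by PBS
registerDoMPI(cl)

dates <- seq(as.Date("2009-01-01"), as.Date("2009-12-31"), by = "day")

# process_day() is hypothetical: grid one day's swaths and compute the metrics
results <- foreach(d = dates, .errorhandling = "pass") %dopar% process_day(d)

closeCluster(cl)
mpi.quit()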

What I'm working on now
We are in contact with software engineers at NASA to resolve the software problem.

What obstacles are blocking progress

The script (which uses GRASS, R, and GDAL) has been ready for months: I completed testing on our local machine in mid-summer and got it running on a single node months ago. The hold-up has been the availability of completely functional software (pxargs/grass/R/gdal) to run it on Pleiades. It was my understanding that our role in this project is to develop the science and that we would have programming assistance from NASA to run the software on the cluster. I've received helpful suggestions, but the responsibility still seems to be on us to troubleshoot compilation of standard software. Given the number of available modules, compilers, libraries, etc., this is not always a trivial task. For example, I successfully compiled gdal and grass, but when I call the script using the mqueue/pxargs parallelization program, it fails on the slave nodes. Andrew graciously compiled a separate copy of gdal which works with pxargs, but when I use that version, GRASS (which also uses gdal) can no longer write out netcdf files. This back-and-forth is a time sink for all of us, since I don't know the system well enough to choose which modules/versions will play nicely together, and others have to try to understand where my attempts are going wrong. I have another help thread going (INC000000035246) about getting the R MPI library (Rmpi) running as an alternative to mqueue/pxargs, though I've had no reply since October 26.

What's next
Once we figure out the processing glitches, I'll move to evaluating the MOD06 tile for use in precipitation interpolation as was done for Oregon. Once this is done, we'll be able to decide whether processing the MOD06 data globally will be worth it. The next step for additional regions will be to first acquire the data (which took a long time for the Venezuela tile) and then run the same program to generate the climatologies.

2012-10-05

What I did this past week(s)
I compiled GRASS, NetCDF 4, and NCO tools on the Pleiades cluster. I finished an initial program for processing the MOD06 cloud data to generate the monthly cloud climatologies. We've downloaded and are currently processing the MOD06 data for the Venezuela tile (h11v08). The current version of this code is available in the repository.

What I'm working on now
I'm now error checking the output from this tile to be sure it aligns with other MODIS (MODLAND) tiles. Along the way I'm continuing to refine the processing workflow.

What obstacles are blocking progress
I've been learning to use NASA's Pleiades cluster, which is powerful but more complex than other clusters I've used. Also, the HEG program used to convert from swath to grid seems to have bugs when projecting to the sinusoidal format (which is strange because NASA uses sinusoidal for virtually everything MODIS). I'm still concerned that there are some projection bugs I need to resolve.

What's next
Evaluating the MOD06 tile for use in precipitation interpolation as was done for Oregon. Once this is done, we'll be able to decide whether processing the MOD06 data globally will be worth it. The next step for additional regions will be to first acquire the data (which took a long time for the Venezuela tile) and then run the same program to generate the climatologies.

2012-08-17

What I did this past week
It's been a busy few weeks. I gave a careful review to Benoit's first draft of the interpolation literature review and made a few suggestions for additional sections that are needed. I've also been working with the folks at NASA to figure out the best way to process the MOD06 cloud data to generate the climatologies we need. Our collaborators there have not worked with swath data before, so it's new to them as well. I successfully compiled GRASS on Pleiades, so I could run my existing script (with a few major edits to make it work on their system), but I would prefer to find a way to accomplish the job using the HEG tool and NASA's existing programs. I'm close to having a procedure for mosaicing and then gridding the swaths, but generating tiles with the exact domain of the MODLAND tiles is surprisingly difficult. Surprisingly, the program designed for this (HEG) will not both mosaic and grid to the sinusoidal projection, so we may have to separate the process into two (or more) steps. I also attended ESA in Portland, OR.

What I'm working on now
Continuing to find a workflow for the MOD06 data and working with Benoit to finalize the interpolation comparisons.

What obstacles are blocking progress
Lack of defined workflow steps for processing of a region. I think we need to get this ironed out soon so we can finalize the data preparation steps (in a way that they can be used for any modis tile).

What's next
I only have a few more weeks (till the end of August) working full time on this project. After that I'm starting a fellowship to look at leaf color in the Northeastern U.S. I will continue to be involved (perhaps 20% time) on this project.

2012-07-13

What I did this past week
I spent some time thinking about the 'fusion' method and how it relates to other methods. I've also begun editing the code to process a full modis tile rather than a smaller region (Oregon) to make comparison with the other regions more straightforward. This involves processing the DEM data for the tile and other data preparation steps.

What I'm working on now
I want to decide whether the additional information in the MOD06 cloud product is worth the extra effort over just using the cloud mask. To do this I'm going to do model selection between various models for monthly and daily precipitation.

What obstacles are blocking progress
Lack of defined workflow steps for processing of a region. I think we need to get this ironed out soon so we can finalize the data preparation steps (in a way that they can be used for any modis tile).

What's next
Estimated Timeline:
  • Comparing MOD06 and the cloud mask for interpolating precipitation in Oregon

2012-07-06

What I did this past week
Communicated with Petr Votava (NASA) about processing the MOD06 cloud data for additional regions. He's started downloading data for h11v08 (Venezuela), installed HEG (the program to grid and subset the swath data), and is working on some test scripts. I've started merging my interpolation scripts with Benoit's so we have a single unified workflow. This included adding an optional buffer to the station subsetting procedure (Issue #438) to keep stations just outside the region of interest, to minimize edge effects and stitching artifacts.

What I'm working on now
I want to decide whether the additional information in the MOD06 cloud product is worth the extra effort over just using the cloud mask. To do this I'm going to do model selection between various models for monthly and daily precipitation.

What obstacles are blocking progress
MOD06 processing

What's next
Estimated Timeline:
  • Comparing MOD06 and the cloud mask for interpolating precipitation in Oregon (7/13)
  • When Petr has the additional tiles downloaded, process them and conduct the same analysis for additional regions (Depends on Petr, hopefully by 7/20)
  • After deciding whether to use the MOD06 or cloud mask, return to model comparison (a la Benoit) for interpolation of precipitation (8/30)
  • ESA conference (Portland, Oregon): August 4-12
  • September 1st - my Yale Climate & Energy Institute fellowship begins (for a separate project) and my time available for this project decreases significantly.

2012-06-22

What I did this past week
Calculated an estimate of the total size of the global, daily, 1970-2000, 1km dataset [https://projects.nceas.ucsb.edu/nceas/issues/430]. I also explored the interactions between LST and LULC using the full grid [https://projects.nceas.ucsb.edu/nceas/issues/418] rather than just the points. See the issue page for a summary.
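The actual estimate is in the linked issue; as a back-of-the-envelope cross-check under stated assumptions (a 30-arc-second global grid, one 2-byte variable, daily values for 31 years):

# Rough, uncompressed size; all constants here are assumptions, not the
# numbers from issue #430
cells <- 43200 * 21600     # 30-arc-second global grid (lon x lat)
days  <- 365.25 * 31       # daily, 1970-2000
bytes <- cells * 2 * days  # one 2-byte variable per cell per day
bytes / 2^40               # roughly 19 TiB, before compression or land masking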

What I'm working on now
I'm off on a week's vacation!

What obstacles are blocking progress
none

What's next
Working with NASA folks to get MOD06 data for several other regions. Then starting interpolation of precipitation.

2012-06-08

What I did this past week
I focused on exploratory data analysis of the MOD06 data. There are encouraging relationships between several of the continuous cloud products (optical thickness and effective radius) and precipitation; I'll show some summary figures next Tuesday. One open question is whether the continuous cloud variables will be better at predicting rainfall than some summary of the cloud mask, such as % cloudy days (which would be much easier to process). I've also done some quick comparisons with PRISM data, and the cloud data show fairly similar spatial patterns. I've also been working with Benoit on the NASA report that summarizes our progress to date.

What I'm working on now
1) A summary of the exploratory analysis to present on Tuesday, and 2) a workflow diagram (possibly using Kepler) to organize the various steps of the process.

What obstacles are blocking progress
none

What's next
Incorporate these data into Benoit's interpolation procedures to start generating interpolated surfaces.

2012-06-01

What I did this past week
I finally finished a workflow to process the MOD06 cloud data (Issue #415: [[https://projects.nceas.ucsb.edu/nceas/issues/415]]) from the 'raw' swaths to gridded daily summaries and monthly climatologies. I've done some preliminary comparisons with mean monthly station data and the results are encouraging. I will summarize these at the next group call.

What I'm working on now
More detailed analysis and comparisons of the MOD06 products with station data and PRISM data.

What obstacles are blocking progress
none

What's next
Incorporate these data into Benoit's interpolation procedures to start generating interpolated surfaces.

2012-05-25

What I did this past week
Monday and Tuesday Benoit and I worked together to outline the interpolation portions of the climate project. In the process, we developed a mind map of the project that is now hosted in the repository.

What I'm working on now
The processing of the swath data has proven much slower than I anticipated. I have figured out how to subset the observations based on various combinations of the QA data; right now I am only keeping 'useful' observations with 'very good' quality. I'm then using the cloud mask to replace the NAs with 0s in 'confidently clear' pixels (to separate NAs due to poor quality from NAs due to clear skies), and averaging the multiple scenes per day to generate a daily image. I've also considered calculating the maximum rather than the mean.
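A minimal sketch of the per-scene screening logic; the QA coding and all object names are hypothetical placeholders:

library(raster)

# Keep only 'useful' observations with 'very good' quality, then set
# confidently-clear pixels (cmask == 0) from NA to 0
screen_scene <- function(cot, qa_useful, qa_quality, cmask) {
  cot[qa_useful != 1 | qa_quality < 3] <- NA
  cot[is.na(cot) & cmask == 0] <- 0
  cot
}

# With 'scenes' a list of per-scene layers for one day, the daily image is:
# screened <- lapply(scenes, function(s)
#   screen_scene(s$cot, s$qa_useful, s$qa_quality, s$cmask))
# daily <- mean(stack(screened), na.rm = TRUE)  # or max() instead of mean()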

What obstacles are blocking progress
I'm using the spgrass6 package in R and would like to parallelize the processing procedure. I am currently processing each day's swath data (multiple scenes) in a different location (though different mapsets work the same) and the different processes are failing if run in parallel. I'm trying to work out why this happens.

What's next

Complete summary metrics and compare to station precipitation.

2012-05-11

What I did this past week
I finished the swath->grid procedure using the hegtool and then imported the resulting tif files into GRASS for further processing.

What I'm working on now
Processing the gridded swaths to summary metrics (monthly means, etc.)

What obstacles are blocking progress
none

What's next

Complete summary metrics and compare to station precipitation.

2012-05-11

What I did this past week
I'm still working on the MODIS cloud product (MOD06) data. After much arm wrestling, I was able to get the hegtool [[http://newsroom.gsfc.nasa.gov/sdptoolkit/HEG/HEGDownload.html]] to work on my machine (it's only offered as a binary for RedHat with 32-bit dependencies). I wrote a function that generates the parameter file for a given MOD06 hdf file to automate the swath->grid process. I've also ordered the full 10-year record for Oregon. Here's the presentation: [[https://projects.nceas.ucsb.edu/nceas/documents/18]]
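A minimal sketch of such a generator; the parameter fields follow HEG's swath-conversion (swtif) format as I understand it, and every path, field name, and corner coordinate here is a hypothetical placeholder:

# Write a HEG/swtif parameter file for one MOD06 swath and run it
write_heg_prm <- function(hdf, field = "Cloud_Optical_Thickness",
                          out = sub("\\.hdf$", ".tif", hdf), prm = "heg.prm") {
  writeLines(c(
    "NUM_RUNS = 1",
    "BEGIN",
    paste0("INPUT_FILENAME = ", hdf),
    "OBJECT_NAME = mod06",
    paste0("FIELD_NAME = ", field, "|"),
    "BAND_NUMBER = 1",
    "SPATIAL_SUBSET_UL_CORNER = ( 46.5 -125.0 )",  # approximate Oregon bbox
    "SPATIAL_SUBSET_LR_CORNER = ( 42.0 -116.0 )",
    "RESAMPLING_TYPE = NN",
    "OUTPUT_PROJECTION_TYPE = SIN",
    paste0("OUTPUT_FILENAME = ", out),
    "OUTPUT_TYPE = GEO",
    "END"
  ), prm)
  system(paste("swtif -p", prm))  # run the HEG swath-to-grid tool
}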

What I'm working on now
I'm now working on the code to turn the scene-by-scene hdf files to some (hopefully) useful summary metrics. This includes first converting them to netcdf files with appropriate CF metadata [[http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html#grid-mappings-and-projections]] so I can use some existing software (NCO and CDO) to quickly process the data to the desired metrics.

What obstacles are blocking progress
The HEG software unfortunately doesn't seem able to correctly transfer the metadata, so I have to enter it myself (well, using code, but it's not automatic). It's also not well documented how exactly the output grid is defined (cell centroids? lower-left corner?), and getting the HDF-EOS files correctly converted to NetCDF takes time. I've also only worked with these kinds of data in geographic coordinates; I'm trying to keep these data in sinusoidal, but it may be that the software won't play nicely.

What's next

Summary metrics of several variables from the cloud product.

2012-05-04

What I did this past week
I've downloaded and started exploring the MODIS MOD06 cloud data [[http://modis-atmos.gsfc.nasa.gov/MOD06_L2/index.html]] for Oregon. These are level 2 (swath) data and so quite messy to work with. For example, after spatial subsetting, there are ~1,600 files (2.8 GB) available that intersect Oregon for the year 2010. These include all view angles, solar angles, data qualities, etc.

I also installed and set up the new server in the Jetz lab. This included figuring out how to compile R with the Intel Math Kernel Library, which has resulted in a 3-40x increase in R's processing speed. I put the details on my blog ([[http://planetflux.adamwilson.us/2012_05_01_archive.html]]).

What I'm working on now
Developing gridded daily summaries of these data to compare with station data.

What obstacles are blocking progress
The MODIS swath-to-grid toolbox [[http://nsidc.org/data/modis/ms2gt/index.html]] was not made to process the MOD06 product (and I can't find any alternative). It may be possible to edit some of the scripts (perl and IDL) that do this processing, but that will take time (I have limited experience with perl and have never worked with IDL). I'm also not sure if the program will work with the open source version of IDL or if we would need to buy a license. Alternatively, I could potentially write my own using R/CDO/NCO.

Also, deciding how to process the cloud data has the potential to be very complicated. View angle becomes much more important when observing clouds well above the surface of the earth, so I imagine it will be tricky to ensure that everything lines up spatially during gridding. This may be why the MODIS atmosphere group has only produced a summary at 1-degree spatial resolution. Fortunately, we don't need to accurately capture cloud physics; we are merely looking for surfaces that correlate with rainfall.

What's next

I plan to explore ways to summarize the various cloud data (daily average, daily maximum) and eventually compare them to station data.