August 19, 2011 Range Modeling

page added by Martha

How to Scale Up Range Mapping

Participants

John Donoghue, Brian Enquist, Brian McGill, Nirav Merchant, Martha Narro 

Discussion

JD: Have 120-150K species they want to run range maps on.
  • Takes 7 min per range map
  • Estimate 400 days total run time
  • Thinking he needs about 100 processors
  • Each species is a separate, completely independent computation.
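As a rough check on the figures above, the arithmetic can be sketched in Python. The inputs are the numbers from the notes (7 minutes per map, 120-150K species, about 100 processors). The function name is only illustrative, and the calculation assumes perfect parallel scaling with no I/O or scheduling overhead, which a real run will not achieve.

```python
def run_time_days(n_species, minutes_per_map=7.0, n_workers=1):
    """Wall-clock days to compute n_species range maps.

    Assumes every map takes the same time and that the work splits
    evenly across workers (each species is independent).
    """
    total_minutes = n_species * minutes_per_map
    return total_minutes / n_workers / (60 * 24)


if __name__ == "__main__":
    for n in (120_000, 150_000):
        serial = run_time_days(n)
        parallel = run_time_days(n, n_workers=100)
        print(f"{n} species: {serial:.0f} days serial, "
              f"{parallel:.1f} days on 100 workers")
```

Straight multiplication of these inputs gives a serial figure on the order of 580 to 730 days, so the 400-day estimate recorded above presumably rests on somewhat different inputs. Either way, about 100 workers would bring the job down to a matter of days.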
NM: I'm thinking of a map reduce approach.
  • Are there individual files?
  • JD: data is in MySQL database
    • Could write out to a text file, then put results back in db.
  • NM: Port data to SQLite? Then use it as the query engine. 
  • BM: Dump each species into a text file, then run from that 
  • NM: Starting with data in text files may be better; the database becomes a bottleneck
    • Shared-nothing infrastructure
  • JD: put each species in a text file
    • Environmental data is large
  • NM: how large?
  • JD: a few gigs per input file, 120K input files 
  • BM: But the environmental data is same for all
  • NM: At TACC, Lustre (the shared file system) can hold a shared data file, so that would work
  • JD: The computation is done with one R script that uses R libraries plus MaxEnt (in Java; .jar file) 
  • NM: We need to know the dependencies. Do they just pipe one to another or is there a step of looking at output? 
  • JD: It just feeds from one to the next 
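The "dump each species into a text file" idea above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the real MySQL database, and the table name `occurrences`, its columns, and the demo records are hypothetical, since the notes do not describe the actual schema.

```python
import csv
import sqlite3
from pathlib import Path


def dump_species_files(conn, out_dir):
    """Write one CSV of (latitude, longitude) per species; return the paths.

    Assumes a hypothetical table occurrences(species, latitude, longitude).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    species = [row[0] for row in conn.execute(
        "SELECT DISTINCT species FROM occurrences ORDER BY species")]
    paths = []
    for name in species:
        # File name derived from the species name; spaces become underscores.
        path = out_dir / (name.replace(" ", "_") + ".csv")
        rows = conn.execute(
            "SELECT latitude, longitude FROM occurrences WHERE species = ?",
            (name,))
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["latitude", "longitude"])
            writer.writerows(rows)
        paths.append(path)
    return paths


def make_demo_db():
    """Build a tiny in-memory database with made-up records for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE occurrences (species TEXT, latitude REAL, longitude REAL)")
    conn.executemany(
        "INSERT INTO occurrences VALUES (?, ?, ?)",
        [("Species a", 32.2, -110.9),
         ("Species a", 31.9, -111.0),
         ("Species b", 34.4, -119.7)])
    return conn
```

Calling `dump_species_files(make_demo_db(), "species_files")` writes one file per species into that directory. Once the files exist, each range-map task can read its own input without touching the database, which is the point of the shared-nothing approach discussed above.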
NM: How quickly can a set of test files be put together?
  • JD: Monday
  • NM: we already have a server provisioned for this group.  A VM with 30-40 gigs storage and \___\_ ram.
    • Use that to do trials while getting a TACC account.
    • Go ahead and move files into iPlant data store.
    • On campus, 1 TB of data should transfer in less than 1 hr.
    • Organize the data in the data store, then the system will pull it in parallel from TACC and run. Workflow engine will do the orchestration.
    • There will be one copy of R, of the environmental layer data and of each input file
    • Task runner will process each file
    • There are workers on each node that talk to the task manager.
    • 100 workers will occupy the 100 cores.
    • Benchmark it on UA and TACC, to determine how best to run.
    • The progression for determining how to run is first run each file. If performance not adequate, use Makeflow. If still not fast enough, use Pegasus. 
  • BM: Write a SQL script that outputs lat/long for each species
  • Action Item: John, organize data and make a flow chart on wiki. 
  • NM: if 4-6 TB of data storage is needed at TACC, the group can have that. Already talked to Dan.
    • Sangeeta and Edwin can help with benchmarking.
    • iRODS is what iPlant’s Data Store is built on. It does parallel transfers, is federated, and supports rules (e.g., to automatically run analyses on new files when they are put in folders). 
  • BM: The geospatial portal is written in Python 
  • NM: The server set up for the group is Python friendly. Has Ubuntu too.
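The first step in the progression above ("first run each file", with workers each taking one independent task) can be sketched on a single machine with Python's standard library. This is not the iPlant workflow engine, Makeflow, or Pegasus; it is only a stand-in to show the pattern. The command run per file is a placeholder, because the notes do not give the name or arguments of the real R script.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command, input_path):
    """Run one independent task; return (input_path, exit_code)."""
    result = subprocess.run(list(command) + [str(input_path)],
                            capture_output=True, text=True)
    return str(input_path), result.returncode


def run_all(command, input_paths, n_workers=4):
    """Process every input file with a pool of n_workers workers.

    Threads suffice here because the real work happens in the child
    processes, not in Python.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(lambda p: run_one(command, p), input_paths)
        return dict(results)


if __name__ == "__main__":
    # Placeholder command: a tiny Python one-liner standing in for the
    # real per-species script, which the notes do not specify.
    placeholder = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    codes = run_all(placeholder, ["species_a.csv", "species_b.csv"])
    failed = [path for path, code in codes.items() if code != 0]
    print(f"{len(codes)} tasks run, {len(failed)} failed")
```

Recording the exit code per input file makes it easy to rerun only the failed species, which matters when there are over 100K independent tasks.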
Discussion of developer resources
  • BE: Who from iPlant will be dedicated to the geospatial project?
  • NM: The first step was getting servers for you guys.
    • We need to know what geospatial platform to use. Expertise needs to come from NCEAS.
    • Customizing GeoDjango for their needs. Need NCEAS person with GIS expertise.
    • Interface expertise will come from iPlant, since it needs GCI and HTML expertise. But the GIS person needs to be in place first. 
  • BM: We have a prototype with a RESTful interface. Need GUI for it. Is that ready enough for an iPlant developer to work on it?
  • NM: Yes. But if you want it connected with a map, you need the GeoDjango expert at NCEAS.
  • OpenStreetMap and OpenLayers are possible options. 
  • NM: You aren’t looking for a straight web interface. You want it integrated with a map.
  • BM: The NCEAS person being recruited won’t have GeoDjango experience.
  • NM: Then may need to do something generic enough for them to click on a layer.
  • JD: OpenStreetMap and JavaScript
  • BM: It’s really a query tool
  • NM: That’s easier. Doesn’t require GIS expertise. If turning layers on and off, it’s more complex and calculations are complex. 
  • BM(?): Next steps: get prototype running at iPlant. Have advisory group meet. A few weeks away from that. Have a RESTful API, so someone can put a GUI on it. 
  • Portal is an \__\_ portal.
  • NM: Standing up a map server is not hard and neither is a UI.
    • Want to make sure the next level of expertise is available. TNRS was a honeymoon period. Have fewer developers available now. Can’t compare the current situation to TNRS. Need to keep expectations realistic. 
  • BM: The portal is being driven by iPlant’s needs. When does iPlant need it? Thought was by renewal.
  • BE: By October
  • NM: Will see who may be available. Next week challenging with the semester beginning. 
  • BE: For the NCEAS hires, one person is dedicated to environmental layers, the other to dynamic BIEN db.
  • They’ll have expertise. 
Discussion of Advisory Group Meeting
  • BE: When can the GIS Advisory Group meet? Need 2-3 hrs.
    • Wants an in-person meeting in Tucson or at least as many people as possible.
  • NM: Need progress on portal before the meeting at NCEAS.
  • BM: Get prototype up at iPlant in next couple of weeks. 
    • Others on advisory group haven’t seen recent version.
    • Aim for a conf. call mid September.
    • Face to face in Oct. at NCEAS.
  • BE: Wants the meeting with iPlant person who will do the work present.
  • NM: Will look at finding a person as soon as possible. But to do it quickly, person may come and go. May not have a dedicated person.
  • BE: Who will have the institutional memory of project goals and technical details?
  • NM: Martha and Nirav for project goals. Technical details will be split among several people (2 or 3). Provides safety on institutional memory in case one person leaves.
  • BE: That’s a larger team than we were talking about.
  • BM: Phylogeny and ranges are the only things we don’t have in the prototype

Action Items

  • John, organize data and make a workflow chart on wiki.
  • Martha: Schedule conference call of Advisory Group. Try for the week after Labor Day toward end of week.
  • Martha: Help John find a wiki tool for creating workflow diagrams.
    • On the wiki, under "Add" there is "UI Mockup"; it's Balsamiq and is not available to all wiki users due to licensing. John tried it and found it not to be the best thing for workflows. He created a workflow as a bullet list. It's [here|~jdonoghue].
  • Martha: Milestones to advisory and working groups.