Exploring Clouds for Acceleration of Science
Internet2 leads the "Exploring Clouds for Acceleration of Science (E-CAS)" project in partnership with representative commercial cloud providers to accelerate scientific discoveries. The effort demonstrates the effectiveness of commercial cloud platforms and services in supporting applications critical to growing academic and research computing and computational science communities, and will illustrate the viability of these services as an option for leading-edge research across a broad scope of science. The project helps researchers understand the potential benefit of larger-scale commercial platforms for simulation and application workflows such as those currently using NSF's High-Performance Computing (HPC), and explores how scientific workflows can innovatively leverage advancements in real-time analytics, artificial intelligence, machine learning, accelerated processing hardware, automation in deployment and scaling, and management of serverless applications in order to provide digital research platforms to a wider range of science. The project aims to accelerate scientific discovery through integration and optimization of commercial cloud service advancements with NSF's cyberinfrastructure resources; identify gaps between cloud provider capabilities and their potential for enhancing academic research; and provide initial steps in documenting emerging tools and leading deployment practices to share with the community.
Cloud computing has revolutionized enterprise computing over the past decade, and it has the potential to have a similar impact on campus-based scientific workloads. The E-CAS project explores this potential by providing two phases of funded campus-based projects addressing acceleration of science. Each phase is followed by a community-led workshop to assess lessons learned and to define leading practices. Projects are selected from two categories: time-to-science (to achieve the best time-to-solution for scientific applications and workflows that may be time- or situation-sensitive) and innovation (to explore innovative use of heterogeneous hardware resources, serverless applications, and/or machine learning to support and extend application workflows). The project is guided by an external advisory board including leading academic experts in computational science and other fields, commercial cloud representatives, NSF program officers, and others. It leverages prior and concurrent NSF investments while creating a new model of scalable cloud service partnerships to enhance science in a broad spectrum of disciplines.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria. Read the NSF announcement.
The proposal submission process for the first phase closed on Friday, February 1, 2019, at 5:00 p.m. (submitters' local time zone).
For questions or comments, please send an email to firstname.lastname@example.org.
News about the E-CAS Project
SDSC’s Phylogenetics Science Gateway Awarded NSF/Internet2 Grant
Apr 09, 2019
E-CAS Project to Explore Clouds for Acceleration of Science
Mar 27, 2019
Internet2 and National Science Foundation Announce Selection of First-Phase Research Proposals for Exploring Clouds for Acceleration of Science (E-CAS) Project
Mar 26, 2019
Internet2 and National Science Foundation Partnership Explores Commercial Cloud Computing in Support of Scientific Research
Dec 13, 2018
National Science Foundation and Internet2 to Explore Cloud Computing to Accelerate Science Frontiers
Nov 15, 2018
Six projects were chosen to participate in the first phase of the Exploring Clouds for Acceleration of Science (E-CAS) project based on their need for on-demand, scalable infrastructure and their innovative use of newer technologies such as hardware accelerators and machine learning platforms. Read the official announcement.
The successful proposals for the year-long first phase of the E-CAS project are:
Accelerating Science by Integrating Commercial Cloud Resources in the CIPRES Science Gateway
Mark Miller, San Diego Supercomputer Center (UCSD)
CIPRES is a web portal that allows scientists around the world to analyze DNA and protein sequence data to determine the natural history of a group or groups of living things. For example, one can ask where mammals originated, how Ebola virus spreads, whether a given plant is really a new species or an unwelcome imported one, or how a given species interacts with other species and its environment over long periods of time. CIPRES helps answer these kinds of questions by providing access to parallel phylogenetics codes run on large HPC clusters provided by the NSF XSEDE program. CIPRES currently runs analyses for about 12,000 scientists per year, and that number is growing each year. CIPRES accelerates research by increasing each researcher’s throughput. Jobs run faster using parallel codes, and users can run many jobs simultaneously on large clusters. For example, CIPRES provides access to P100 GPUs that can speed up some jobs by 100-fold relative to a single-core run. But GPUs are in short supply in the XSEDE portfolio, so usage must be strictly limited. This project will develop the infrastructure needed to cloudburst CIPRES jobs to newer, faster V100 GPUs at AWS. As a result, individual jobs will run up to 1.5-fold faster, and users will have access to twice as many GPU nodes as they did in the previous year. The infrastructure created will also open the door for scalable access to AWS cloud resources through CIPRES for all users.
Investigating Heterogeneous Computing at the Large Hadron Collider
Philip Harris, Massachusetts Institute of Technology (MIT)
At 40 million collisions per second, data rates at the Large Hadron Collider are some of the largest in the world. To contend with these large data rates, a tiered system is used to filter out and reconstruct the most interesting collisions. Unfortunately, this system has limitations. At each tier, some events that contain important physics processes are not selected; these include events with Higgs bosons and potentially dark matter. With expected increases in data rates, these limitations will get worse. To overcome them, we propose to redesign the algorithms using modern machine learning techniques and then to incorporate these algorithms into heterogeneous computing systems. Dramatic improvements in processing time can be obtained by exploiting the high level of parallelization of machine learning algorithms used in conjunction with specialized processors, such as Field Programmable Gate Arrays. By migrating to this paradigm, more data can be processed at the Large Hadron Collider, leading to larger physics output and potentially foundational discoveries in the field. While we focus on the Large Hadron Collider, the lessons are far-reaching and can impact many fields where large data flow is present.
IceCube computing in the cloud
Benedikt Riedel, University of Wisconsin
The IceCube Neutrino Observatory, located at the South Pole, supports science from a number of disciplines, including astrophysics, particle physics, and geographical sciences. It operates continuously and is simultaneously sensitive to the whole sky. Astrophysical neutrinos yield understanding of the most energetic events in the universe and could show the origin of cosmic rays. Being able to burst into the cloud supports follow-up computations of observed events and alerts to and from the community, such as other telescopes and LIGO. This project plans to use custom spot instances and FPGA-based filters in AWS and GPU/TensorFlow machine learning in GCP.
Building Clouds: Worldwide building typology modelling from images
Daniel Aliaga, Purdue University
This Exploring Clouds for Acceleration of Science (E-CAS) project will exploit the computational power and network connectivity to provide a world-scalable solution for generating building-level information for urban canopy parameters as well as for improving the information for estimating local climate zones, both of which are critical to high-resolution urban meteorological/environmental models. The challenge is that current computational models have a bottleneck, not just in terms of the physics and processes within the land surface and boundary layer schemes, but even more critically in the need for a robust means of generating parameter values that define the urban landscape. This is where the proposed E-CAS inverse modeling approach comes into play. By utilizing images and worldwide input about building properties, we can infer a sampling of 3D building models at world scale containing more than just the geometrical shape information and enable world-scale urban weather modeling.
Deciphering the Brain's Neural Code Through Large-Scale Detailed Simulation of Motor Cortex Circuits
William Lytton, State University of New York (SUNY Downstate MC)
This project will investigate how the brain encodes and processes information through very large-scale and detailed simulations of the brain's cortical circuits. The brain cortex is the outermost layer of the brain and is responsible for most high-level functions like vision, language, or reasoning. We have developed the most detailed computational model of the motor cortex circuits using experimental data from over 30 studies. It includes details at multiple scales, from molecular effects inside the neuron to long-range connections from other brain regions. This means we now have our own in silico brain cortex that we can experiment with precisely and repeatedly to try to decipher the neural code. We will use NetPyNE, our own software tool for brain modeling, to run thousands of parallelized simulations exploring different conditions and inputs to the system. Google Cloud and SLURM will enable us to run thousands of these simulations at the same time by employing up to 50k cores concurrently. These cloud computing resources will therefore vastly accelerate our research and help decipher the brain's neural coding mechanisms. This knowledge has far-reaching applications, including developing treatments for brain disorders (which affect 1 out of 4 people), advancing brain-machine interfaces for people with paralysis, and developing novel artificial intelligence algorithms.
Development of BioCompute Objects for Integration into Galaxy in a Cloud Computing Environment
Raja Mazumder, George Washington University
BioCompute Objects allow researchers to describe bioinformatic analyses composed of any number of algorithmic steps and variables to make computational experimental results clearly understandable and easier to repeat. Galaxy is a widely used bioinformatics platform that aims to make computational biology accessible to research scientists who do not have programming experience. The project will create a library of BioCompute Objects that describe bioinformatic workflows on Amazon Web Services, which can be accessed and contributed to by Galaxy users from all over the world. This project also plans to utilize AWS Direct Connect over Internet2 to connect the library of BioCompute Objects to the campus HPC environment at George Washington University.
What is E-CAS?
Who is involved?
What are the objectives of this project?
How will this help the research and science community?
What background and experience does Internet2 bring to this project?
What is Internet2's role in this project?
- Establishing, convening and managing an external Advisory Board
- Working with cloud providers to provide access, documentation, and support for cloud resources
- Working with regional network providers and cloud service providers to establish appropriate connectivity to enable data pipelining between data sources and compute facilities
- Managing the Phase I and II proposal submission, review, and selection processes; and
- Managing and implementing the Phase I and II awards.
How is the project governed?
What is the Advisory Board and what is their role?
Who is on the advisory board?
- Dr. Amy Apon, Ph.D. (Chair)
Co-Director, Complex Systems, Analytics, and Visualization Institute
Professor and Chair, Division of Computer Science, School of Computing
- Professor Thomas E. Cheatham, III
Department of Medicinal Chemistry, College of Pharmacy
Director, Research Computing and CHPC, UIT
University of Utah
- Dr. Valerie Taylor, Ph.D.
Director, Mathematics and Computer Science Division
Argonne National Laboratory
- Dan Stanzione, Ph.D.
Executive Director, Texas Advanced Computing Center
Associate Vice President for Research
The University of Texas at Austin
- Marla Meehl
Section Head: Network Engineering and Telecommunications Section (NETS)
Manager: Front Range GigaPoP (FRGP)
President: Westnet Education and Research Consortium (WERC)
- Sanjay Padhi, Ph.D.
AWS Research Initiatives
Worldwide Public Sector, Amazon Web Services
- Karan Bhatia, Ph.D.
Google Cloud, High Performance Computing
- Jenny Tsai-Smith
Vice President, Oracle Cloud Innovation Accelerator - Higher Education & Research
- Principal Investigator (PI)
President and CEO, Internet2
Jim Bottum, Ana Hunsinger
Why were Google and AWS selected for this project and what role will each play?
Will other cloud providers have the opportunity to participate in the project?
What are the specific focus areas of the project?
Can you describe the specifics of the project?
Phase I: The project’s first phase will include the submission, review, selection, and funding of an estimated six Phase I projects. Each recipient will have one year to perform a six-month operations study and corresponding development work. At the end of Phase I, Internet2 will host a project workshop to assess lessons learned.
Phase II: The project’s second phase will include submission, review, and selection of two Phase II awards. The Phase II awards will be selected from the six Phase I projects and will ideally include one from each area of focus: one from Acceleration of Science and the other from Innovation. The Phase II awardees will have one year to complete their project. At the end of Phase II, Internet2 will host a final workshop to help define and document best practices, lessons learned, and recommendations for sustained, scalable commercial cloud service adoption in research environments.
What are the criteria for proposal submissions?
How will the project submission process work?
Phase II proposals will be selected from the six awarded Phase I proposals. Phase II proposal submissions will include a narrative of up to 15 pages addressing general project goals; previous work within the applicant’s Phase I project demonstrating value, necessity, and potential impact; methodology; and justification of requested resources. Project proposals will also require a 2-page, NSF-format biosketch for the Project PI. Full requirements can be found at www.internet2.edu/ecas.
How will proposals get reviewed?
Who can be an external reviewer?
What is the funding available and how will that aspect of the project work?
Each accepted Phase II proposal will be funded as a one-year sub-award from Internet2 to the awardee, including direct costs for partial salary support for a staff member, postdoctoral fellow, or graduate student (including fringe benefits). The sub-award will also include up to $500,000 in cloud services from an assigned cloud provider (to be transmitted by the awardee directly to the provider from funds received under their Internet2 sub-award under a separate agreement including appropriate award, agency, and OMB compliance requirements). Phase II sub-awards will include indirect cost recovery on salary, fringe benefits, and cloud services.
How will cloud credits be allocated from the cloud service providers?
How do I apply for this project?
Are the submission deadlines flexible?
Who is eligible to submit proposals for this project?
When will the awards be announced?
If I submit a proposal, would my home institution need to be connected to the Internet2 network?