XSEDENet: An Advanced Network Advancing Science in the Age of Big Data
It’s all about headroom—headroom for world-class scientists to work together to accelerate new discoveries that make all our lives healthier, safer and better. To solve today’s most pressing scientific problems more quickly in disciplines such as climatology, oceanography, physics, energy and genomics, scientists must be able to perform collaborative research. This means reliably and efficiently exchanging critical Big Data on a global basis over an advanced, research-grade network.
In April 2013, the Extreme Science and Engineering Discovery Environment (XSEDE)—a virtual computing system designed specifically to enable collaborative science—substantially boosted its data communications capacity. XSEDENet upgraded from a 10-Gigabit Ethernet (GE) network to an overlay network on top of the Internet2 Network’s 100GE Advanced Layer 2 Service. Under the leadership of XSEDE and Internet2, the new infrastructure is providing the most advanced interconnection available for the global scientific community.
- XSEDE Service Provider (SP) sites including FutureGrid, Indiana University, the National Center for Atmospheric Research (NCAR), the National Institute for Computational Sciences (NICS), the National Center for Supercomputing Applications (NCSA), Pittsburgh Supercomputing Center (PSC), the San Diego Supercomputer Center (SDSC), and the Texas Advanced Computing Center (TACC)
Products & Services
The Project – Advanced Networking for the Global Research Community
To support more than 8,000 global researchers engaged in compute-intensive team science disciplines such as energy, genomics, climatology and physics, XSEDE provides a scalable, resilient high-speed networking infrastructure capable of transporting massive datasets and delivering innovative applications and cloud services.
The network consists of nine individual sites supporting 17 supercomputers as well as Big Data analysis and visualization capabilities.
Although the enhanced XSEDENet is one of the most advanced networks available today, continuous improvements are needed to stay ahead of the scientific Big Data curve. Internet2 and XSEDE provide a platform to further improve performance, reliability, and predictability for distributed science applications. Further, through the NSF Campus Cyberinfrastructure – Network Infrastructure and Engineering Program (CC-NIE), campus-level network enhancements are being made to improve data transfer and movement among university-based researchers.
“Our mission is to help drive major increases in researcher productivity by helping them harness the power of collaborating with Big Data,” says XSEDE Principal Investigator John Towns. “To accomplish this, we are going beyond providing networking services associated with high performance computing (HPC). By offering researchers access to many rich sources of data, we’re fostering collaboration on a global basis. And, by focusing on the full potential of this new fully-programmable network (Software Defined Network [SDN] with OpenFlow), we can produce new capabilities for research collaboration on a daily basis. There aren’t many organizations in the world that can offer that kind of service.”
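The "fully-programmable network" Towns refers to is an SDN in which a controller installs match-action rules that switches then apply to traffic. As a rough illustration of that OpenFlow-style idea only, here is a minimal sketch; the rule fields, site names, and port labels are invented for the example and are not XSEDENet's actual configuration.

```python
# Minimal sketch of the match-action idea behind OpenFlow-style SDN:
# a controller installs flow rules, and the switch forwards packets
# by matching header fields against them. All names and values are
# illustrative, not XSEDENet's actual configuration.

from dataclasses import dataclass, field

@dataclass
class FlowRule:
    match: dict          # header fields to match, e.g. {"dst": "psc"}
    action: str          # e.g. "forward:port1" or "drop"
    priority: int = 0

@dataclass
class FlowTable:
    rules: list = field(default_factory=list)

    def install(self, rule):
        # The controller programs the switch by installing rules.
        self.rules.append(rule)
        self.rules.sort(key=lambda r: -r.priority)

    def lookup(self, packet):
        # Highest-priority rule whose match fields all agree wins.
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "drop"  # table-miss default

table = FlowTable()
table.install(FlowRule({"dst": "psc"}, "forward:port1", priority=10))
table.install(FlowRule({"dst": "tacc"}, "forward:port2", priority=10))

print(table.lookup({"src": "ncsa", "dst": "psc"}))   # forward:port1
print(table.lookup({"src": "ncsa", "dst": "sdsc"}))  # drop
```

Because forwarding behavior lives in software-installed rules rather than fixed device firmware, new collaboration services can be rolled out by changing the rules, which is the flexibility the quote describes.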
The Problem – Dealing with Big Data
Many scientific disciplines are being inundated by today’s data deluge. Scientific research is increasingly data-driven and takes place in an environment that includes large collaborative partnerships consisting of researchers, HPC clusters and other supercomputers, and a wide variety of instruments.
The size, complexity and rate of data capture have grown explosively. Drivers of this growth include experiments, observational studies, scientific instruments, simulations and a tidal wave of unstructured data in the form of emails, videos, images, Internet transactions, etc. For example, the Large Hadron Collider and the Square Kilometre Array are instruments that produce petabytes of data, which are shared and analyzed by thousands of scientists around the globe.
Scientific collaboration at this level requires a networking infrastructure that can quickly and transparently move large datasets between locations and provide access to the analysis and visualization tools needed to work with massive amounts of data.
But many campus and data center networks, designed to support a wide variety of organizational missions, have not been designed to handle the movement of the large data files generated by big science.
The Solution – A New Topology
To support collaborative efforts, data must flow freely, unhampered by throughput and other technical network limitations. The XSEDE networking group determined that the best solution to handle the growth of Big Data in the scientific community was to implement an overlay network on top of Internet2’s 100GE Advanced Layer 2 Service.
An overlay network is a private network configured on top of an existing, often shared network infrastructure. The upgraded version of XSEDENet supports the seamless flow of data across the network, allowing researchers in geographically distributed locations to collaborate in state-of-the-art science.
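The essence of an overlay is encapsulation: the private network's traffic rides inside the underlying network's frames, so the shared infrastructure sees only the outer header. The toy sketch below illustrates the round trip; the tag value and header layout are made up for the example and do not describe the actual Advanced Layer 2 Service encapsulation.

```python
# Toy illustration of the overlay idea: private traffic is wrapped
# in an outer header before crossing the shared underlying network,
# and unwrapped at the far edge. Tag value and header layout are
# invented for illustration.

import struct

OVERLAY_TAG = 0x0A0B  # hypothetical overlay identifier

def encapsulate(payload: bytes) -> bytes:
    # Outer 4-byte header: 2-byte tag + 2-byte payload length.
    return struct.pack("!HH", OVERLAY_TAG, len(payload)) + payload

def decapsulate(frame: bytes) -> bytes:
    tag, length = struct.unpack("!HH", frame[:4])
    assert tag == OVERLAY_TAG, "frame does not belong to this overlay"
    return frame[4 : 4 + length]

inner = b"dataset chunk bound for a remote SP site"
print(decapsulate(encapsulate(inner)) == inner)  # True
```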
Integral to the Advanced Layer 2 Service backbone is the Internet2 Innovation Platform, a combination of new technologies and services that provide end-to-end architecture and unified capabilities at the national, regional and campus level.
The platform rests on three core components: abundant 100GE bandwidth, software-defined networking (SDN), and a Science DMZ instrumented with perfSONAR. Together they provide network performance monitoring and a secure blueprint for architecting and optimizing local networks to carry high-bandwidth research data.
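perfSONAR's role in a Science DMZ is active measurement: periodically pushing a known volume of test traffic between hosts and recording the achieved rate, so throughput regressions are caught before they stall a real transfer. The loopback-only sketch below shows that measurement idea in miniature; real perfSONAR deployments use dedicated iperf-style testers and scheduled multi-site measurement meshes, not this code.

```python
# Loopback-only sketch of an active throughput test of the kind
# perfSONAR automates between Science DMZ hosts: send a known volume
# of data over TCP and report the achieved rate. Purely illustrative.

import socket, threading, time

def run_sink(server_sock, counter):
    # Accept one connection and count every byte received.
    conn, _ = server_sock.accept()
    with conn:
        while chunk := conn.recv(65536):
            counter["bytes"] += len(chunk)

def measure_throughput(payload_mb=8):
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    counter = {"bytes": 0}
    t = threading.Thread(target=run_sink, args=(server, counter))
    t.start()

    data = b"\x00" * (1024 * 1024)          # 1 MiB chunks
    start = time.perf_counter()
    with socket.create_connection(server.getsockname()) as c:
        for _ in range(payload_mb):
            c.sendall(data)
    t.join()                                 # sink drains the socket
    server.close()
    elapsed = time.perf_counter() - start
    mbits_per_s = counter["bytes"] * 8 / 1e6 / elapsed
    return counter["bytes"], mbits_per_s

sent, rate = measure_throughput()
print(f"transferred {sent} bytes at {rate:.0f} Mbit/s")
```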
The XSEDE-Wide File System (XWFS) is an important part of the XSEDE infrastructure. XWFS facilitates the sharing of data on multiple XSEDE digital services by presenting a single file system view of data, which can be stored and accessed from systems at NICS, PSC, SDSC and TACC. XWFS uses the XSEDENet network to move data between client and server nodes. XSEDENet is the high-performance interconnect between the XSEDE level 1 Service Providers and other XSEDENet participants.
XSEDE also makes use of Globus Online—a project of the partnership between The University of Chicago and Argonne National Laboratory—which automates the management of data transfer between a variety of resources, such as XSEDE service providers, other supercomputing facilities, cloud resources, campus clients, lab servers and desktop or laptop computers.
Globus Online leverages GridFTP, an extension of the standard File Transfer Protocol (FTP). GridFTP—developed by a working group from the Open Grid Forum—provides reliable, high performance file transfer for grid computing applications and addresses the problem of incompatibility between storage and access systems.
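Among GridFTP's extensions over plain FTP are parallel data streams and restartable partial-file transfers. The thread-based sketch below shows the parallel-stream idea only: the file is split into byte ranges and each worker copies its own range independently. It is not the GridFTP protocol or the Globus API, and the function names are invented for the example.

```python
# Local sketch of GridFTP's parallel-stream idea: split a file into
# byte ranges and let each "stream" copy its own range, so transfers
# can proceed in parallel and be restarted per range. Illustrative
# only; not the GridFTP protocol or the Globus API.

import os, tempfile, threading

def copy_range(src, dst, offset, length):
    # Each worker copies one independent byte range.
    with open(src, "rb") as f_in, open(dst, "r+b") as f_out:
        f_in.seek(offset)
        f_out.seek(offset)
        f_out.write(f_in.read(length))

def parallel_copy(src, dst, streams=4):
    size = os.path.getsize(src)
    with open(dst, "wb") as f:          # preallocate destination
        f.truncate(size)
    chunk = -(-size // streams)         # ceiling division
    workers = [
        threading.Thread(target=copy_range,
                         args=(src, dst, i * chunk,
                               min(chunk, size - i * chunk)))
        for i in range(streams) if i * chunk < size
    ]
    for w in workers: w.start()
    for w in workers: w.join()

# Demo on a small temporary file.
src = tempfile.NamedTemporaryFile(delete=False)
src.write(os.urandom(1_000_003)); src.close()
dst = src.name + ".copy"
parallel_copy(src.name, dst)
print(open(src.name, "rb").read() == open(dst, "rb").read())  # True
```

On a single disk the parallelism buys little, but across a long, high-bandwidth path each stream gets its own TCP window, which is how GridFTP fills large bandwidth-delay products.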
Together, these key elements of the XSEDENet infrastructure provide the headroom and resilience needed to support researchers working with extremely large datasets. The NSF CC-NIE program, with its focus on campus-level network service improvements to support data transfer and movement, is contributing to the development of XSEDENet’s innovation platform.
The Result – Advanced Science Now, A Roadmap for the Future
With the upgrading of XSEDENet, research scientists and engineers using XSEDE services now have an infrastructure in place that directly addresses their Big Data issues. It opens up new levels of collaboration by allowing researchers to share large datasets between remote geographic locations and engage in collaborative team science. The new infrastructure also provides scientists with access to the analysis and visualization tools they need to understand and analyze massive amounts of data.
Towns notes that XSEDENet is already coping with the data deluge. “We are seamlessly transporting multiple terabytes—nearly petabytes—of data each month between the various XSEDE Service Provider sites, and this trend is only increasing,” he says.
Important cancer research utilizing genomic analysis is making good use of the XSEDE infrastructure. Galaxy, a data-intensive bioinformatics program at Penn State, has more than 10,000 users who run four to five thousand analyses every day. Many of them employ XSEDE support and networking resources, as well as a direct connection to Pittsburgh Supercomputing Center’s Blacklight supercomputer, to perform memory-intensive DNA sequence assembly tasks that were previously beyond their reach. Galaxy is a harbinger of what’s to come as research datasets continue their dramatic growth.
At Stony Brook University in New York, researchers are collaborating on a revolution in chemistry by discovering new materials and applications like two-dimensional metals, in which electricity is conducted along the layers of the structure. The discovery may also have applications in the planetary sciences, where high-pressure phenomena abound. Professor Artem R. Oganov’s team used his widely adopted methodology and XSEDE to run the USPEX application, a world-leading code for crystal structure prediction, on Stampede, a supercomputer at the Texas Advanced Computing Center (TACC).
Researchers at TACC are using XSEDENet to access its supercomputers for simulations of the impact of orbital debris on spacecraft as well as fragment impacts on body armor. A leading partner in XSEDE, TACC offers resources that include more than one petaflop of computing capability and more than 30 petabytes of online and archival data storage. As part of the project, TACC provides access to its Ranger, Lonestar, Longhorn, Spur and Ranch supercomputing systems through XSEDE quarterly allocations.
PSC is also playing a leading role in XSEDENet, providing management of the network’s XSEDE side, supporting users, and working directly with Internet2 staff to ensure the system’s smooth operation. PSC offers access via XSEDENet to Blacklight and the Data Supercell, a disk-based, petabyte-scale archival system.
In short, the XSEDENet capabilities now allow scientists to:
- Couple the massive computational capabilities of XSEDE directly to the most advanced network capabilities available
- Promote team science by moving massive amounts of information across the network with the broadest reach among the U.S. and global research communities
- Cope with Big Data issues such as the rapid growth of scientific datasets by exploiting an ultra-fast, resilient high-speed infrastructure with ample headroom
- Benefit from instantaneous, seamless failover and restoration services
- Work with technologists to develop new applications, such as the XSEDE-Wide File System (XWFS), which allows big data to be moved rapidly between XSEDE sites
- Work with a flexible, scalable network that is constantly being upgraded by XSEDE and Internet2 to handle the constantly growing datasets that characterize today’s research efforts
“The collaboration between XSEDE and Internet2 is a growing and evolving relationship,” says Towns. “We are creating an ecosystem that accelerates science by allowing researchers to make full use of the wealth of HPC, data and visualization resources available over XSEDENet. Despite the incredible growth of Big Data, we feel confident that we have a scalable infrastructure in place that can meet the scientific community’s needs now and in the future.”