Wolff's World: (Part 2) Sustaining America's High-Performance Computing Infrastructure
Recommendations on the National Research Council's Interim Report on Advanced Computing Infrastructure
SELECTED COMMUNITY INPUT ON THE REPORT
As noted in Part 1 of this blog, the National Research Council’s (NRC) Future Directions for NSF Advanced Computing Infrastructure to Support U.S. Science and Engineering in 2017-2020: Interim Report, published in 2014, highlighted specific funding models that could be used to address advanced computing in support of National Science Foundation (NSF) programs.
After publishing the Interim Report, the NRC asked for comments from the community to take into consideration for the Final Report, due out this summer. Several groups offered input and opinion:
- Internet2’s comments focused on the networking needs of a high-performance computing infrastructure, especially in the context of a global network infrastructure. Internet2 highlighted the importance of funding projects using public-private partnerships, citing the benefits of collaborative processes and the checks and balances that come with ongoing management of these types of funded projects.
- Two workshops were held in Dallas, TX, in January and in February of this year; participants were HPC leaders from U.S. universities, from National and other USG laboratories, and from industry. Their discussions resulted in a white paper, An Integrative, Cross-Foundation Compute and Data Infrastructure for Science & Engineering Research.
- The three former Directors of the NSF Office of Cyberinfrastructure (Dan Atkins, Ed Seidel, and Alan Blatecky) wrote a letter to the Director of NSF, Dr. France A. Córdova. Although the letter does not address the NRC Interim Report directly, it makes a powerful argument for an organizational restructuring of cyberinfrastructure support within NSF, reiterating and amplifying a recommendation of the workshop white paper cited above.
THE NEED FOR NCDI, AND A PATH FORWARD
The Dallas workshops totaled five days of vigorous and detailed discussions of the dual roles of NSF in providing high-end cyberinfrastructure, and of the critical importance of high-end cyberinfrastructure to NSF’s mission to support leading-edge research in a variety of scientific disciplines.
The workshop white paper notes that there is ample scientific need for leadership-class machines and that, although such machines are available through Federal mission agencies, access to them is restricted – either by agency mission priorities or by lack of resources (e.g., the success rate of requests for allocations under DoE’s 2015 INCITE program was 27%). The importance of mid-range machines is also stressed, but the two primary inadequacies identified by the white paper are (1) insufficient U.S. capacity to attack outstanding problems in the sciences, and (2) the lack of a public long-range plan for U.S. cyberinfrastructure, which leads to short-range and suboptimal planning in the computational sciences.
The white paper recommends that the resources, the services, and the connectivity of a comprehensive national cyberinfrastructure be integrated into a National Computational and Data Infrastructure (NCDI). NCDI would contain leadership-class machines as well as Tier 1 and Tier 2 machines and XSEDE resources, and would provide a framework for campuses implementing regional collaborations; it would moreover include data repositories and the network resources to connect them all effectively. NCDI could also integrate allocation policies and mechanisms when they are needed for shared resources and services. The NCDI long-range plan should provide for review of publicly funded facilities and services at five-year intervals, with options to continue, shut down, or re-compete management; this would provide the stability needed to foster long-range planning in the client disciplines. It would also support more orderly workforce development than is generally attainable under the present policies.
Stability is not stasis. The white paper cites as proof NCAR and the climate research community, for whom NSF’s GEO Directorate has provided stable Tier 1 computational resources for half a century.
Among the benefits of an NCDI are:
- Future major research facilities could use – and plan for – the coherent national infrastructure; augmenting what exists rather than building new cyberinfrastructure would be cost-effective, and the savings could be put toward the science.
- NCDI would provide a frame of reference for major data initiatives such as the National Data Service (NDS) and the Research Data Alliance (RDA), which are now perforce institutionally focused.
- NCDI could facilitate major national and global collaborations such as the NIH BRAIN initiative and multi-messenger astrophysics.
- As mentioned above, NCDI could focus – even sponsor – the development and retention of a skilled workforce of cyberpractitioners, “an intellectual commons,” as the white paper puts it, who now have no natural national affiliations yet who are critical to the conduct of research in a rapidly-changing environment.
The roots of NCDI lie in NSF’s earlier and largely unfulfilled CIF21 program. A path forward for NCDI, which could realize the aspirations of CIF21, is the Major Research Equipment and Facilities Construction (MREFC) program, under which NSF has funded projects such as the ALMA telescope array, the Ocean Observatories Initiative (OOI), and the IceCube neutrino observatory. The white paper also notes that “…the MREFC vehicle is used for multi-agency initiatives, e.g., between NSF and DOE, where MREFC and CD processes have to be intertwined. So this could in principle provide a mechanism for, say, NSF-NIH-DOE cooperation in providing a national computing and data infrastructure.”
To be sure, modifications to the MREFC process – which is tuned to a single investment in a large-scale facility – would be needed to accommodate instead the rolling funding of a continuously evolving infrastructure of computers at various scales, national data repositories at several sites, and the national network.
The ideas behind NCDI are neither new nor exclusively a U.S. concern. At a recent meeting of PRACE – the Partnership for Advanced Computing in Europe – Dr. Augusto Burgueño Arjona of the European Commission read the following snippet from a draft resolution of the Council of the European Union, the EU’s highest legislative body:
“THE COUNCIL OF THE EUROPEAN UNION...
12. STRESSES the importance of PRACE, a world-class European High Performance Computing (HPC) infrastructure for research that provides access to computing resources and services for large-scale scientific and engineering applications; ACKNOWLEDGES the need to develop the new generation of HPC technologies and CALLS for the reinforcement of the interconnected network of data processing facilities GEANT. In this respect, INVITES ESFRI to explore mechanisms for better coordination of Member States' investment strategies in e-infrastructures, covering also HPC, distributed computing, scientific data and networks;”
(ESFRI is the European Strategy Forum on Research Infrastructures, a body of the European Commission with members appointed by the Research Ministers of member states.)
Although the U.S. and European research landscapes differ in many ways, the convergence of computing, data, and networking, together with the “Big Data” onslaught, is forcing reconsideration of 20th-century models of cyberinfrastructure. And in Europe, these considerations have reached the highest level of government.
The organizational change at NSF that moved the Office of Cyberinfrastructure (OCI) from the Office of the Director (OD) of NSF to become the Division of Advanced Cyberinfrastructure in the CISE Directorate has been a source of concern in the HPC community – and this fact is briefly mentioned near the end of the white paper. Arguments for moving OCI back to OD are made in a letter written by the three former Directors of OCI to the Director of NSF, Dr. France A. Córdova.
The letter begins by outlining the lengthy community consultation that ran from the Atkins report of 2003 to the six Task Force reports written under the auspices of the Advisory Committee for Cyberinfrastructure in 2009. The resulting OCI produced shared infrastructure by “…leveraging co-investment with Directorates, through an effective coordination council, and through the director of OCI being a respected, full partner with the ADs and the Director.” That is, if funding for cyberinfrastructure comes “off the top” of the NSF budget, then the director of OCI must be, both organizationally and professionally/intellectually, the peer of the other contenders for slices of the NSF budget: the Assistant Directors. And OCI belongs in the Office of the Director of NSF because its activities cut indiscriminately across all NSF Directorates.
The letter notes “No university has placed the responsibility for campus-wide cyberinfrastructure-enabled research within their computer science department, analogous to what NSF has now done by moving OCI back into CISE.” The NSF GEO Directorate funds NCAR and the NCAR-Wyoming Supercomputer Center; this is an example of a research directorate funding facilities. But the crucial point is that the facilities are used almost exclusively by researchers in the geosciences, so that the facilities and their users form to some degree a homogeneous community within which “infrastructure vs. research” budget discussions can occur inside the GEO Directorate. The letter to Dr. Córdova makes the point that “Housing an OCI function in CISE also creates an inherent tension between funding of research for the CS community (a primary mission) and funding infrastructure and innovative infrastructure development for all of science.”
SUMMARY OF REPORT FINDINGS AND NEXT STEPS
Feedback to the NRC panel on its interim report from the two “Brainstorm HPCD” workshops and from Internet2 has:
- Re-emphasized the criticality of the U.S. national network – i.e., the Internet2 national infrastructure together with its ecosystem of state and regional optical networks – to a fully functional national cyberinfrastructure,
- Recalled past successes of NSF public-private partnerships and recommended their revival,
- Proposed a coherent National Computational and Data Infrastructure (NCDI), together with a new funding discipline and model based on a modification of the MREFC process that by design supports both Federal interagency collaboration and NSF-wide participation in the NCDI, and
- Proposed returning the NSF Advanced Cyberinfrastructure program to the Office of the Director.
The importance of a comprehensive cyberinfrastructure to regional competitiveness has also been reaffirmed by the European Union.
The proposed organizational change at NSF is strongly supported by the three former Directors of the Office of Cyberinfrastructure. It is, however, important to note that the organizational change and the MREFC proposal are independent of each other; i.e., either may occur without the other.
On July 29, 2015, President Obama issued an Executive Order – “Creating a National Strategic Computing Initiative.” DoE, DoD, and NSF are the lead agencies for the NSCI.
In his talk, Dr. Arjona stressed that the text he read was the “final draft” of the resolution, unlikely to change before issuance.
References to, and quotes from, the letter to Dr. Córdova are made with permission of the authors. I am grateful to Alan Blatecky for his permission and for obtaining these permissions from his fellow authors.