Key Takeaways from “Improving Data Mobility & Management for International Climate Science” CrossConnects Workshop
The second CrossConnects Workshop, a collaboration between Internet2 and ESnet focused on bringing domain scientists together with network engineers, was held July 14-16. “Improving Data Mobility & Management for International Climate Science,” hosted by NOAA’s Boulder Lab, focused on science drivers, facilities and best practices and allowed key stakeholders to discuss and brainstorm. One of the workshop's major goals was to establish clear priorities from the climate community so that it can be served better and more efficiently.
Keynotes were given by Dr. Alexander “Sandy” MacDonald, who spoke on the science that drives the data issues, and Dr. Venkatramani Balaji, Head of the Modeling Systems Group for NOAA/GFDL and Princeton University, who focused on the intersection of science and software (the use of information technology to facilitate the science).
Through presentations and ample discussion among key stakeholders from multiple agencies and programs, the workshop aimed to give the climate science community the knowledge, tools, and partners necessary to improve data transfer performance as data scale continues to increase.
Key takeaways from the workshop include:
- A knowledge base for climate-focused workflow tools (like fasterdata.es.net)
- Find the common themes and questions that multiple researchers ask repeatedly. What are those common cases and common end-points/data centers/computational facilities?
- Should processing be where the data is or do we move the data to where the processing is? What does the science community want? A deep discussion ensued on this topic.
- Does the community need a service interface?
- Data origin is important - is it affected when data moves around? Do you lose clarity on what the true source is?
- How do you deal with data that is exascale - do you just replicate it in a few places and then deal with extracting a subset or just run the model where the data is?
- Server-side analysis - build a virtual computer around the world
- Another way to view the question - where do you put the funding and IT responsibility if you put the responsibility of curation of data on the end user?
- Relevant tools - any tools that caught people’s attention?