|
|
|
|
The Office of Science (SC) is the
single largest supporter of basic research in the physical sciences in the
United States, ... providing more than 40 percent of total funding ... for the
Nation's research programs in high-energy physics, nuclear physics, and
fusion energy sciences. (http://www.science.doe.gov) – SC funds 25,000
PhDs and PostDocs |
|
A primary mission of SC's National Labs
is to build and operate very large scientific instruments - particle
accelerators, synchrotron light sources, very large supercomputers - that
generate massive amounts of data and involve very large, distributed
collaborations |
|
Distributed data analysis and
simulation is the emerging approach for these complex problems |
|
ESnet is an SC program whose primary
mission is to enable the large-scale science of the Office of Science (SC)
that depends on: |
|
Sharing of massive amounts of data |
|
Supporting thousands of collaborators
world-wide |
|
Distributed data processing |
|
Distributed data management |
|
Distributed simulation, visualization,
and computational steering |
|
Collaboration with the US and
International Research and Education community |
|
|
|
The systems are data intensive and
high-performance, typically moving terabytes a day for months at a time |
|
The systems are high duty-cycle,
operating most of the day for months at a time in order to meet the
requirements for data movement |
|
The systems are widely distributed
– typically spread over continental or inter-continental distances |
|
Such systems depend on network
performance and availability, but these characteristics cannot be taken for
granted, even in well run networks, when the multi-domain network path is
considered |
|
The applications must be able to get
guarantees from the network that there is adequate bandwidth to accomplish
the task at hand |
|
The applications must be able to get
information from the network that allows graceful failure and auto-recovery
and adaptation to unexpected network conditions that are short of outright
failure |
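To put "terabytes a day for months at a time" in network terms, the following minimal sketch converts a daily data volume into the average rate the path must sustain. The function name and the `duty_cycle` parameter are illustrative assumptions, not part of any ESnet tool; decimal units are assumed (1 TB = 10^12 bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_rate_gbps(terabytes_per_day: float, duty_cycle: float = 1.0) -> float:
    """Average rate in Gb/s needed to move the given volume every day.

    duty_cycle is the fraction of each day the transfer is actually
    running (hypothetical parameter; 1.0 means around the clock).
    """
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    bits_per_day = terabytes_per_day * 1e12 * 8
    active_seconds = SECONDS_PER_DAY * duty_cycle
    return bits_per_day / active_seconds / 1e9


if __name__ == "__main__":
    for tb in (1, 10, 100):
        print(f"{tb:>4} TB/day -> {sustained_rate_gbps(tb):.3f} Gb/s sustained")
```

Under these assumptions, 1 TB/day already needs roughly 0.09 Gb/s sustained with no idle time, and the required rate grows as the duty cycle shrinks, which is why sustained availability matters as much as peak capacity for these systems.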
|
|
|
|
Configurable |
|
Must be able to provide multiple,
specific "paths" (specified by the user as end points) with specific
characteristics |
|
Schedulable |
|
Premium service such as guaranteed
bandwidth will be a scarce resource that is not always freely available,
therefore time slots obtained through a resource allocation process must be
schedulable |
|
Predictable |
|
A committed time slot should be
provided by a network service that is not brittle: rerouting in the face of
network failures is important |
|
Reliable |
|
Reroutes should be largely transparent
to the user |
|
Informative |
|
When users do system planning they
should be able to see average path characteristics, including capacity |
|
When things do go wrong, the network
should report back to the user in ways that are meaningful to the user so
that informed decisions can be made about alternative approaches |
|
Scalable |
|
The underlying network should be able
to manage its resources to provide the appearance of scalability to the user |
|
Geographically comprehensive |
|
The R&E network community must act
in a coordinated fashion to provide this environment end-to-end |
|
|
|
|
Provide configurability,
schedulability, predictability, and reliability with a flexible virtual
circuit service - OSCARS |
|
User* specifies end points, bandwidth,
and schedule |
|
OSCARS can do fast reroute of the
underlying MPLS paths |
|
Provide useful, comprehensive, and
meaningful information on the state of the paths, or potential paths, to the
user |
|
perfSONAR, and associated tools,
provide real-time information in a form that is useful to the user (via
appropriate network abstractions) and that is delivered through standard
interfaces that can be incorporated into SOA-type applications |
|
Techniques need to be developed to
monitor virtual circuits based on the approaches of the various R&E nets
- e.g. MPLS in ESnet, VLANs, TDM/grooming devices (e.g. Ciena Core
Directors), etc., and then integrate this into a perfSONAR framework |
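The virtual circuit request described above (end points, bandwidth, and schedule) can be illustrated with a small admission check. This is a sketch only: it is not the OSCARS API or its path computation. The `Reservation` class, the field names, and the simplification that every reservation shares one link of fixed capacity are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """Hypothetical request: end points, bandwidth, and schedule."""
    src: str
    dst: str
    bandwidth_mbps: int
    start: int  # epoch seconds, inclusive
    end: int    # epoch seconds, exclusive


def peak_committed(existing, start, end):
    """Peak bandwidth already committed at any instant in [start, end)."""
    events = []
    for r in existing:
        lo, hi = max(r.start, start), min(r.end, end)
        if lo < hi:  # reservation overlaps the requested window
            events.append((lo, r.bandwidth_mbps))
            events.append((hi, -r.bandwidth_mbps))
    # At equal timestamps, releases (negative) sort before new commitments.
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def can_admit(request, existing, capacity_mbps):
    """True if the request fits on the (assumed single) shared link."""
    if request.end <= request.start or request.bandwidth_mbps <= 0:
        return False
    used = peak_committed(existing, request.start, request.end)
    return used + request.bandwidth_mbps <= capacity_mbps
```

A real multi-domain service would apply a check of this kind per link along each candidate path and would also have to handle rerouting; the sketch only shows why a guaranteed-bandwidth slot has to be scheduled rather than assumed.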
|
|
|
|
|
The ESnet OSCARS [OSCARS] project has as
its goals: |
|
Traffic isolation and traffic
engineering |
|
Provides for high-performance,
non-standard transport mechanisms that cannot co-exist with commodity
TCP-based transport |
|
Enables the engineering of explicit
paths to meet specific requirements |
|
e.g. bypassing congested links by using
lower-bandwidth, lower-latency paths |
|
Guaranteed bandwidth (Quality of
Service (QoS)) |
|
User specified bandwidth |
|
Addresses deadline scheduling |
|
Where fixed amounts of data have to
reach sites on a fixed schedule,
so that the processing does not fall so far behind that it can never
catch up – very important for experiment data analysis |
|
Reduces cost of handling high bandwidth
data flows |
|
Highly capable routers are not
necessary when every packet goes to the same place |
|
Use lower-cost switches (by a factor of 5)
to route the packets |
|
Secure connections |
|
The circuits are "secure" to the edges
of the network (the site boundary) because they are managed by the control
plane of the network which is isolated from the general traffic |
|
End-to-end (cross-domain) connections
between Labs and collaborating institutions |
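The deadline-scheduling goal above comes down to simple arithmetic, sketched here under stated assumptions: decimal units (1 TB = 10^12 bytes), a constant data production rate, and hypothetical function names chosen for illustration. The first function gives the bandwidth needed to deliver a fixed volume by a deadline; the second shows when a site that has fallen behind can catch up at all.

```python
import math


def required_gbps(data_tb: float, hours_until_deadline: float) -> float:
    """Minimum sustained rate (Gb/s) to deliver data_tb before the deadline."""
    if hours_until_deadline <= 0:
        raise ValueError("deadline must be in the future")
    return data_tb * 8e12 / (hours_until_deadline * 3600) / 1e9


def catch_up_hours(backlog_tb: float, production_gbps: float, link_gbps: float) -> float:
    """Hours needed to clear a backlog while new data keeps arriving.

    Returns math.inf when the link is no faster than the production
    rate, i.e. the processing site can never catch up.
    """
    spare_gbps = link_gbps - production_gbps
    if spare_gbps <= 0:
        return math.inf
    return backlog_tb * 8e12 / (spare_gbps * 1e9) / 3600


if __name__ == "__main__":
    print(f"10 TB in 8 h needs {required_gbps(10, 8):.2f} Gb/s")
    print(f"9 TB backlog, 2 Gb/s produced, 4 Gb/s link: "
          f"{catch_up_hours(9, 2, 4):.1f} h to catch up")
```

Only the bandwidth in excess of the production rate reduces a backlog, so a circuit sized exactly to the production rate leaves no way to recover from an outage.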
|
|
|
|
|
To ensure compatibility, the design and
implementation is done in collaboration with the other major science R&E
networks and end sites |
|
Internet2: Bandwidth Reservation for
User Work (BRUW) |
|
Development of common code base |
|
GÉANT: Bandwidth on Demand (GN2-JRA3),
Performance and Allocated Capacity for End-users (SA3-PACE) and Advance
Multi-domain Provisioning System (AMPS) extends to NRENs |
|
BNL: TeraPaths - A QoS Enabled
Collaborative Data Sharing Infrastructure for Peta-scale Computing Research |
|
GA: Network Quality of Service for
Magnetic Fusion Research |
|
SLAC: Internet End-to-end Performance
Monitoring (IEPM) |
|
USN: Experimental Ultra-Scale Network
Testbed for Large-Scale Science |
|
DRAGON/HOPI: Optical testbed |
|
|
|
|
Remote, multi-institutional, identity
authentication is critical for distributed, collaborative science in order to
permit sharing widely distributed computing and data resources, and other
Grid services |
|
Public Key Infrastructure (PKI) is used
to formalize the existing web of trust within science collaborations and to
extend that trust into cyber space |
|
The function, form, and policy of the
ESnet trust services are driven entirely by the requirements of the science
community and by direct input from the science community |
|
International scope trust agreements
that encompass many organizations are crucial for large-scale collaborations |
|
ESnet has led in negotiating and
managing the cross-site, cross-organization, and international trust
relationships to provide policies that are tailored for collaborative science |
|
This service, together with the
associated ESnet PKI service, is the basis of the routine sharing of HEP
Grid-based computing resources between US and Europe |
|
|
|
|
[OSCARS] |
|
For more information contact Chin
Guok (chin@es.net). Also see http://www.es.net/oscars |
|
[LHC/CMS] |
|
http://cmsdoc.cern.ch/cms/aprom/phedex/prod/Activity::RatePlots?graph=quantity_cumulative&entity=src&src_filter=&dest_filter=&no_mss=true&period=l52w&upto= |
|
[ICFA SCIC] "Networking for High Energy Physics." International
Committee for Future Accelerators (ICFA), Standing Committee on
Inter-Regional Connectivity (SCIC), Professor Harvey Newman, Caltech,
Chairperson. |
|
http://monalisa.caltech.edu:8080/Slides/ICFASCIC2007/ |
|
[E2EMON] Geant2 E2E Monitoring System – developed and
operated by JRA4/WI3, with implementation done at DFN |
|
http://cnmdev.lrz-muenchen.de/e2e/html/G2_E2E_index.html |
|
http://wiki.perfsonar.net/jra1-wiki/index.php/PerfSONAR_support_for_E2E_Link_Monitoring |
|
[TrViz] ESnet PerfSONAR Traceroute Visualizer |
|
https://performance.es.net/cgi-bin/level0/perfsonar-trace.cgi |