Ultimate Integration
Joseph Lappa
Pittsburgh Supercomputing Center
ESCC/Internet2 Joint Techs Workshop

Agenda
Supercomputing 2004 Conference
Application
Ultimate Integration
Resource Overview
Did it work?
What did we take from it?

Supercomputing 2004
Annual Conference
Supercomputers
Storage
Network hardware
Original reason for application
Bandwidth Challenge
Did not enter because of time constraints

Application Requirements
Runs on Lemieux (PSC’s supercomputer)
Application Gateways (AGW)
Cisco CRS-1
40Gb/sec OC-768 cards
Few exist
Single application
Be used with another demo on the show floor if possible

Ultimate Integration Application
Checkpoint Recovery System
Program
Garden-variety Laplace solver instrumented to save its memory state in checkpoint files
Checkpoints memory to remote network clients
Runs on 34 Lemieux nodes
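A minimal sketch of the checkpoint/recovery idea, assuming a small serial Jacobi solver in pure Python. The demo code itself is not shown in these slides, so every name here (make_grid, jacobi_step, checkpoint, restore, solve) is hypothetical, and the in-memory list stands in for the remote network clients.

```python
import pickle

def make_grid(n):
    """n x n grid: top edge held at 1.0, everything else starts at 0.0."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [1.0] * n
    return grid

def jacobi_step(grid):
    """One Jacobi sweep: each interior cell becomes the mean of its 4 neighbours."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new

def checkpoint(step, grid):
    """Serialize solver state to bytes (what would go to a file or a socket)."""
    return pickle.dumps({"step": step, "grid": grid})

def restore(blob):
    """Inverse of checkpoint(): recover the step counter and the grid."""
    state = pickle.loads(blob)
    return state["step"], state["grid"]

def solve(grid, steps, start=0, checkpoint_every=10, sink=None):
    """Run sweeps start+1..steps, appending a checkpoint to sink periodically."""
    for step in range(start + 1, steps + 1):
        grid = jacobi_step(grid)
        if sink is not None and step % checkpoint_every == 0:
            sink.append(checkpoint(step, grid))
    return grid

if __name__ == "__main__":
    saved = []                                   # stand-in for remote checkpoint receivers
    full = solve(make_grid(8), 50, sink=saved)
    step, grid = restore(saved[2])               # pretend the run died after step 30
    resumed = solve(grid, 50, start=step)
    print("resumed from step", step, "matches full run:", resumed == full)
```

Restarting from a saved state and finishing the remaining sweeps gives the same grid as an uninterrupted run, which is the property a checkpoint recovery system depends on.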

Lemieux TCS System
750 Compaq Alphaserver ES45 nodes
SMP
Four 1GHz Alpha Processors
4 GB of Memory
Interconnection
Quadrics Cluster Interconnect
Shared memory library

Application Gateways
750 GigE connections are very expensive
Reuse Quadrics network to attach cheap Linux boxes with GigE
15 AGWs
Single processor Xeons
1 Quadrics card
2 Intel GigE
Each GigE card maxes out at 990 Mb/sec
Only need 30 GigE to fill the link to TeraGrid
Web100 kernel
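The sizing above can be checked with a few lines of arithmetic. This is a sketch only: the 30 Gb/sec target for the TeraGrid link is an assumption inferred from the "30 GigE" figure, and the function names are made up for illustration.

```python
import math

GIGE_MBPS = 990        # per-card ceiling quoted on the slide
AGWS = 15              # gateways listed on the slide
GIGE_PER_AGW = 2       # Intel GigE cards per gateway
TARGET_GBPS = 30       # ASSUMPTION: nominal capacity of the link to TeraGrid

def aggregate_gbps(agws, cards_per_agw, per_card_mbps=GIGE_MBPS):
    """Total GigE capacity of the gateway pool in Gb/sec."""
    return agws * cards_per_agw * per_card_mbps / 1000

def cards_needed(target_gbps, per_card_mbps=GIGE_MBPS):
    """Smallest number of GigE cards whose combined rate reaches the target."""
    return math.ceil(target_gbps * 1000 / per_card_mbps)

if __name__ == "__main__":
    print("pool capacity (Gb/sec):", aggregate_gbps(AGWS, GIGE_PER_AGW))
    print("cards to reach target:", cards_needed(TARGET_GBPS))
```

With 15 gateways and two cards each, the pool comes to 29.7 Gb/sec, just short of a nominal 30 Gb/sec, so 30 GigE ports fill such a link only approximately.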

Network
Cisco 6509
Sup720
WS-X6748-SFP
Two WS-X6704-10GE
Used 4 10GE interfaces
OSPF load balancing was my real worry
>30 GE streams over 4 links
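Why many streams over few links is a worry: equal-cost load balancing typically pins each flow to one link by hashing its addresses and ports, so the spread depends on the hash rather than on the load. The sketch below illustrates the idea only. It uses SHA-256, which is not the hash the 6509 uses, and the addresses and ports are invented.

```python
import hashlib
from collections import Counter

def pick_link(flow, n_links):
    """Map a flow tuple (src, dst, sport, dport) to a link index, deterministically."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links

def spread(flows, n_links):
    """Count how many flows land on each link."""
    counts = Counter(pick_link(f, n_links) for f in flows)
    return [counts.get(i, 0) for i in range(n_links)]

# Hypothetical flows: 32 GigE senders talking to 8 receivers.
flows = [("10.0.0.%d" % (i + 1), "10.1.0.%d" % (i % 8 + 1), 5001 + i, 5001)
         for i in range(32)]

if __name__ == "__main__":
    per_link = spread(flows, 4)
    print("flows per 10GE link:", per_link)
    # More than 10 flows at ~1 Gb/sec each would oversubscribe a 10GE link.
    print("oversubscribed links:", [i for i, c in enumerate(per_link) if c > 10])
```

A perfectly even split would put 8 flows on each 10GE link, but nothing in per-flow hashing guarantees that, and a link that draws more than 10 GigE flows becomes the bottleneck.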

Network
Cisco CRS-1
40 Gb/sec slot
16 slots
For Demo
Two OC-768 cards
Ken Goodwin’s and Kevin McGratten’s big worry was the OC-768 transport
Two 8 Port 10 GE cards
Running production IOS-XR code
Had problems with tracking hardware
Ran both with 2 switching fabrics removed, with no effect on traffic

Network
Cisco CRS-1
One at Westinghouse Machine Room
One on show floor
Fork lift needed to place it
7 feet tall
939 lbs empty
1657 lbs fully loaded

The Magic Box
Stratalight – OTS 4040 transponder “compresses” the 40 Gb/s signal to fit into the spectral bandwidth of a traditional 10G wave
http://www.stratalight.com/
Uses proprietary encoding techniques
The Stratalight transponder was connected to the mux/demux of the 15454 as an alien wavelength

Time Dependences
OC-768 wasn’t worked on until one week before the conference

OC-768

Where Does the Data Land?
Lustre Filesystem
http://www.lustre.org/
Developed by Cluster File Systems
http://www.clusterfs.com/
POSIX compliant, Open Source, parallel file system
Separates metadata and data objects to allow for speed and scaling
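One way to picture the data side of that split is round-robin striping: a file's bytes are dealt out in fixed-size stripes across several object storage targets, while the metadata server only records the layout. The function below is a simplified sketch of that mapping, not Lustre code. The 1 MiB stripe size and stripe count of 5 are assumptions chosen for illustration.

```python
def locate(offset, stripe_size=1 << 20, stripe_count=5):
    """Map a file byte offset to (OST index, byte offset within that OST's object).

    ASSUMPTION: plain round-robin (RAID-0 style) striping with a fixed
    stripe size; real layouts are configurable per file.
    """
    stripe_no = offset // stripe_size            # which stripe of the file
    ost = stripe_no % stripe_count               # stripes rotate across OSTs
    obj_offset = (stripe_no // stripe_count) * stripe_size + offset % stripe_size
    return ost, obj_offset

if __name__ == "__main__":
    for off in (0, 1 << 20, 5 << 20, (5 << 20) + 7):
        print(off, "->", locate(off))
```

Because consecutive stripes sit on different targets, a large sequential write such as a checkpoint can be serviced by all targets in parallel.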

The Show Floor
8 checkpoint servers with 10GigE and InfiniBand connections
5 Lustre OSTs connected via InfiniBand with 2 SCSI disk shelves (RAID5)
Lustre metadata server (MDS) connected via InfiniBand

The Demo

How well did it run?
Laplace Solver w/ Checkpoint Recovery
Using 16 Application Gateways (32 GigE connections): 31.1 Gb/s
Only 32 Lemieux nodes were available
IPERF
Using 17 Application Gateways + 3 single GigE attached machines: 35 Gb/s
Zero SONET errors reported on interface
Over 44TB were transferred
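A quick back-of-the-envelope reading of these numbers, as a sketch. It assumes each gateway in the IPERF run contributed two GigE ports (as in the Laplace run), treats 44 TB as decimal terabytes, and uses hypothetical function names.

```python
def per_connection_mbps(total_gbps, connections):
    """Average rate per GigE connection in Mb/sec."""
    return total_gbps * 1000 / connections

def transfer_hours(terabytes, gbps):
    """Hours needed to move the given volume at a sustained rate."""
    bits = terabytes * 1e12 * 8
    return bits / (gbps * 1e9) / 3600

if __name__ == "__main__":
    laplace = per_connection_mbps(31.1, 16 * 2)     # 32 GigE connections
    iperf = per_connection_mbps(35, 17 * 2 + 3)     # ASSUMPTION: 37 GigE connections
    print("Laplace run, Mb/sec per connection: %.1f" % laplace)
    print("IPERF run, Mb/sec per connection: %.1f" % iperf)
    print("Hours to move 44 TB at 35 Gb/s: %.2f" % transfer_hours(44, 35))
```

The Laplace run averages about 972 Mb/sec per connection, close to the 990 Mb/sec card ceiling. At the peak rate, 44 TB corresponds to a little under three hours of sustained transfer, which is a lower bound on the time actually spent moving data.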

The Team

Just Demoware?
AGWs
qsub command now has AGW option
Can do accounting (and possibly billing)
MySQL database with Web100 stats
Validated that the AGW was a cost-effective solution
OC-768 Metro can be done by mere mortals

Just Demoware??
Application receiver
Laplace solver ran at PSC
Checkpoint receiver program tested / run at both NCSA and SDSC
Ten IA64 compute nodes as receiver
~10 Gb/sec Network to Network (/dev/null)
990 Mb/sec * 10 streams

Thank You