FAST TCP
Bartek Wydrowski
Steven Low

Acks & Collaborators
Caltech
Bunn, Choe, Doyle, Hegde, Jin, Li, Low Newman, Papadoupoulous, Ravot, Singh, Tang, J. Wang, Wei, Wydrowski, Xia
UCLA
Paganini, Z. Wang
StarLight
deFanti, Winkler
CERN
Martin
SLAC
Cottrell
PSC
Mathis

Outline
 Background, motivation
 FAST TCP
Architecture and algorithms
Experimental evaluations
Loss recovery
 MaxNet, SUPA FAST

Performance at large windows

Slide 5

Slide 6

Congestion control
Example congestion measure pl(t)
Loss (Reno)
Queueing delay (Vegas)

TCP/AQM
Congestion control is a distributed asynchronous algorithm to share bandwidth
It has two components
TCP: adapts sending rate (window) to congestion
AQM: adjusts & feeds back congestion information
They form a distributed feedback control system
Equilibrium & stability depends on both TCP and AQM
And on delay, capacity, routing, #connections

Packet & flow level

Reno TCP

Reno TCP

Packet level

Flow level: Reno, HSTCP, STCP, FAST

Flow level: Reno, HSTCP, STCP, FAST

Implementation strategy

Difficulties at large window

Problem: no target

Solution: estimate target

Difficulties at large window

Problem: binary signal

Solution: multibit signal

Difficulties at large window

Outline
 Background, motivation
 FAST TCP
Architecture and algorithms
Experimental evaluations
Loss recovery
 MaxNet, SUPA FAST

Architecture

Architecture
Each component
 designed independently
 upgraded asynchronously

Architecture
Each component
 designed independently
 upgraded asynchronously

Window control algorithm
 Full utilization
 regardless of bandwidth-delay product
 Globally stable
 exponential convergence
 Fairness
 weighted proportional fairness
 parameter a

Window control algorithm

Window control algorithm

Outline
 Background, motivation
 FAST TCP
Architecture and algorithms
Experimental evaluations
Loss recovery
 MaxNet, SUPA FAST

Dynamic sharing: 3 flows

Dynamic sharing: 3 flows

Slide 33

Slide 34

Aggregate throughput
Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

Fairness
Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

Stability
Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

Responsiveness
Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

I2LSR, SC2004 Bandwidth Challenge

Internet2 Abilene Weather Map

“Ultrascale” protocol development:
FAST TCP
FAST TCP
Based on TCP Vegas
Uses end-to-end delay and loss to dynamically adjust the congestion window
Defines an explicit equilibrium

Slide 42

Slide 43

Slide 44

Slide 45

Outline
 Background, motivation
 FAST TCP
Architecture and algorithms
Experimental evaluations
Loss recovery
 MaxNet, SUPA FAST

Loss Recovery Section Overview
Linux & TCP loss recovery has problems; esp. in non-congestion loss environments.
New Loss Architecture:
Determining packet loss & PIF
Decoupled window control
Testing in high loss environment
Receiver window issues
Forward Retransmission
SACK processing optimization
Reorder Detection
Testing in small buffer environment

New Loss Recovery Architecture
New Architecture for loss recovery motivated by new environments:
High loss wireless, 802.11, Satellite
Low loss, but large BDP
Measure of Path ‘difficulty’ should be extended
BDLP: Bandwidith x Delay x (1/(1-Loss))

Slide 49

Haystack - 1 Flow
(Atlanta-> Japan)

Haystack – 2 Flows from 1 machine  (Atlanta -> Japan)

Linux Loss Recovery Problem

New Loss Recovery Architecture
Decouple congestion control from loss recovery:
No rate halving, cwnd resets, etc upon loss.
Window primarily controlled by delay.
Efficient Retransmit mechansim:
Linux TCP does not account PIF well when there are retransmissions
cwnd limits PIF, not write queue length.
Accurately discriminate loss from reordering.
Construct accurate way of determining packet loss and PIF.

Loss Recovery

Loss Recovery – PIF model

Loss Recovery Architecture

Loss Recovery – Window control

Loss Recovery – Test scenario

Slide 59

Slide 60

Slide 61

Forward Retransmission
Current Linux Receiver Window limits transmission speed with high loss high BDP.
Forward retransmission needed to reduce require reorder and write queue sizes.

Slide 63

Slide 64

Slide 65

Forward Retransmission

Forward Retransmission

Slide 68

SACK Processing
Processing of SACK packets in Linux is CPU intensive as write queue needs to be traversed.
All SKB’s prior to SACKed SKB are marked as LOST.
Traversal can invalidate large amount of memory cache.
TOQ allows us to eliminate LOST and RETRANSMITTED flags in SKB’s in the write queue. This allows a number of optimizations
eliminates traversing write queue at each SACK.
Possible to go directly to SKB and mark that as SACKed. Then finding pointer to SKB can be quite fast with
SACK BLOCK pointer cache
SEQ->*SKB lookup table

SACK Processing
Architecture

SACK Processing
CPU Utilization

Packet Reordering

Detecting Reordering
record highest seq received so far (newest_seq)
reorder(t+1) = max(reorder(t),newest_seq – seq)
Easy if packet was never retransmitted; need to
identify retransmitted packets carefully:
If 2 retrans, and xmit time of 2nd << RTT ago, (S)ACK is for 1st xmit
If multiple retrans, unique sender timestamp will identify which xmit caused (S)ACK.

Reordering Detection - Experiment

Reordering Experiment: WITH REORDER DETECTION
Throughput Vs Reorder Injected
@ Different Link Capacities

Reordering Experiment:
Reorder Detected Vs Reorder Injected
@ Different Link Capacities

Conclusions
Decoupling of Loss and Congestion control that FAST has facilitated has allowed the development of a new highly efficient loss recovery mechanism.
By using delay as congestion measure, it is possible to achieve close to maximum possible good-put with loss.

Outline
 Background, motivation
 FAST TCP
Architecture and algorithms
Experimental evaluations
Loss recovery
MaxNet, SUPA FAST

Slide 79

Why explicit signal is useful?
Queuing delay necessary with delay based protocols:
Explicit protocols reduce queuing delay
And Reduce requirement for router buffer sizes
Explicit signal can be used to set right starting rate.
Some challenges with delay based protocols:
Alpha tuning
baseRTT (sampling, route change etc)
BQD Backward queuing delay
Delay noise (OS etc)
Interaction of Link layer retransmission.
Interaction with Wireless coding etc.

Slide 81

MaxNet: System

MaxNet: Source & Link Algorithm

Computing Explicit Signal in real routers

MaxNet & XCP Properties

MaxNet & XCP Properties

XCP Max-Min Fairness

Explicit Starting Rate

Signaling Unified Protocol Architecture
(SUPA FAST TCP)

SUPA FAST TCP
Network Components

Conclusion
MaxNet provides a framework for doing explicit signal congestion control.
A practical approach would involve combining different congestion signals.
Evolution of Internet from loss-based protocols to explicit signaling is possible in an incremental way.
Explicit protocols solve many of the challenges of using loss or delay as a congestion signal.
Widespread deployment not near-term. However, specialized applications where there is pain may be deployed sooner.