FAST TCP
|
|
|
Bartek Wydrowski |
|
Steven Low |
Acks & Collaborators
|
|
|
|
Caltech |
|
Bunn, Choe, Doyle, Hegde, Jin, Li, Low
Newman, Papadoupoulous, Ravot, Singh, Tang, J. Wang, Wei, Wydrowski, Xia |
|
UCLA |
|
Paganini, Z. Wang |
|
StarLight |
|
deFanti, Winkler |
|
CERN |
|
Martin |
|
SLAC |
|
Cottrell |
|
PSC |
|
Mathis |
Outline
|
|
|
|
Background, motivation |
|
FAST TCP |
|
Architecture and algorithms |
|
Experimental evaluations |
|
Loss recovery |
|
MaxNet, SUPA FAST |
Performance at large windows
Slide 5
Slide 6
Congestion control
|
|
|
|
Example congestion measure pl(t) |
|
Loss (Reno) |
|
Queueing delay (Vegas) |
TCP/AQM
|
|
|
|
Congestion control is a distributed
asynchronous algorithm to share bandwidth |
|
It has two components |
|
TCP: adapts sending rate (window) to
congestion |
|
AQM: adjusts & feeds back
congestion information |
|
They form a distributed feedback
control system |
|
Equilibrium & stability depends on
both TCP and AQM |
|
And on delay, capacity, routing,
#connections |
Packet & flow level
Reno TCP
Reno TCP
Packet level
Flow level: Reno, HSTCP,
STCP, FAST
Flow level: Reno, HSTCP,
STCP, FAST
Implementation strategy
Difficulties at large window
Problem: no target
Solution: estimate target
Difficulties at large window
Problem: binary signal
Solution: multibit signal
Difficulties at large window
Outline
|
|
|
|
Background,
motivation |
|
FAST TCP |
|
Architecture and algorithms |
|
Experimental evaluations |
|
Loss recovery |
|
MaxNet,
SUPA FAST |
Architecture
Architecture
|
|
|
Each component |
|
designed independently |
|
upgraded asynchronously |
Architecture
|
|
|
Each component |
|
designed independently |
|
upgraded asynchronously |
Window control algorithm
|
|
|
|
Full utilization |
|
regardless
of bandwidth-delay product |
|
Globally stable |
|
exponential convergence |
|
Fairness |
|
weighted proportional fairness |
|
parameter a |
Window control algorithm
Window control algorithm
Outline
|
|
|
|
Background,
motivation |
|
FAST TCP |
|
Architecture and algorithms |
|
Experimental evaluations |
|
Loss recovery |
|
MaxNet,
SUPA FAST |
Dynamic sharing: 3 flows
Dynamic sharing: 3 flows
Slide 33
Slide 34
Aggregate throughput
|
|
|
Dummynet: cap = 800Mbps; delay =
50-200ms; #flows = 1-14; 29 expts |
Fairness
|
|
|
Dummynet: cap = 800Mbps; delay =
50-200ms; #flows = 1-14; 29 expts |
Stability
|
|
|
Dummynet: cap = 800Mbps; delay =
50-200ms; #flows = 1-14; 29 expts |
Responsiveness
|
|
|
Dummynet: cap = 800Mbps; delay =
50-200ms; #flows = 1-14; 29 expts |
I2LSR, SC2004 Bandwidth
Challenge
Internet2 Abilene Weather
Map
“Ultrascale” protocol
development:
FAST TCP
|
|
|
|
FAST TCP |
|
Based on TCP Vegas |
|
Uses end-to-end delay and loss to
dynamically adjust the congestion window |
|
Defines an explicit equilibrium |
Slide 42
Slide 43
Slide 44
Slide 45
Outline
|
|
|
|
Background,
motivation |
|
FAST TCP |
|
Architecture and algorithms |
|
Experimental evaluations |
|
Loss recovery |
|
MaxNet, SUPA FAST |
Loss Recovery Section
Overview
|
|
|
|
Linux & TCP loss recovery has
problems; esp. in non-congestion loss environments. |
|
New Loss Architecture: |
|
Determining packet loss & PIF |
|
Decoupled window control |
|
Testing in high loss environment |
|
Receiver window issues |
|
Forward Retransmission |
|
SACK processing optimization |
|
Reorder Detection |
|
Testing in small buffer environment |
|
|
|
|
New Loss Recovery
Architecture
|
|
|
|
New Architecture for loss recovery
motivated by new environments: |
|
High loss wireless, 802.11, Satellite |
|
Low loss, but large BDP |
|
|
|
|
|
|
|
|
|
|
|
Measure of Path ‘difficulty’ should be
extended |
|
BDLP: Bandwidith x Delay x (1/(1-Loss)) |
|
|
|
|
Slide 49
Haystack - 1 Flow
(Atlanta-> Japan)
Haystack – 2 Flows from 1
machine (Atlanta -> Japan)
Linux Loss Recovery Problem
New Loss Recovery
Architecture
|
|
|
|
Decouple congestion control from loss
recovery: |
|
No rate halving, cwnd resets, etc upon
loss. |
|
Window primarily controlled by delay. |
|
Efficient Retransmit mechansim: |
|
Linux TCP does not account PIF well
when there are retransmissions |
|
cwnd limits PIF, not write queue
length. |
|
Accurately discriminate loss from
reordering. |
|
Construct accurate way of determining
packet loss and PIF. |
|
|
Loss Recovery
Loss Recovery – PIF model
Loss Recovery Architecture
Loss Recovery – Window
control
Loss Recovery – Test
scenario
Slide 59
Slide 60
Slide 61
Forward Retransmission
|
|
|
Current Linux Receiver Window limits
transmission speed with high loss high BDP. |
|
Forward retransmission needed to reduce
require reorder and write queue sizes. |
Slide 63
Slide 64
Slide 65
Forward Retransmission
Forward Retransmission
Slide 68
SACK Processing
|
|
|
|
|
Processing of SACK packets in Linux is
CPU intensive as write queue needs to be traversed. |
|
All SKB’s prior to SACKed SKB are
marked as LOST. |
|
Traversal can invalidate large amount
of memory cache. |
|
|
|
TOQ allows us to eliminate LOST and
RETRANSMITTED flags in SKB’s in the write queue. This allows a number of
optimizations |
|
eliminates traversing write queue at
each SACK. |
|
Possible to go directly to SKB and mark
that as SACKed. Then finding pointer to SKB can be quite fast with |
|
SACK BLOCK pointer cache |
|
SEQ->*SKB lookup table |
SACK
Processing
Architecture
SACK Processing
CPU Utilization
Packet Reordering
Detecting Reordering
|
|
|
|
record highest seq received so far
(newest_seq) |
|
reorder(t+1) =
max(reorder(t),newest_seq – seq) |
|
|
|
Easy if packet was never retransmitted;
need to
identify retransmitted packets carefully: |
|
If 2 retrans, and xmit time of 2nd
<< RTT ago, (S)ACK is for 1st xmit |
|
If multiple retrans, unique sender
timestamp will identify which xmit caused (S)ACK. |
Reordering Detection -
Experiment
Reordering Experiment: WITH
REORDER DETECTION
Throughput Vs Reorder Injected
@ Different Link Capacities
Reordering Experiment:
Reorder Detected Vs Reorder Injected
@ Different Link Capacities
Conclusions
|
|
|
Decoupling of Loss and Congestion
control that FAST has facilitated has allowed the development of a new highly
efficient loss recovery mechanism. |
|
|
|
By using delay as congestion measure,
it is possible to achieve close to maximum possible good-put with loss. |
|
|
|
|
Outline
|
|
|
|
Background,
motivation |
|
FAST TCP |
|
Architecture and algorithms |
|
Experimental evaluations |
|
Loss recovery |
|
MaxNet, SUPA FAST |
Slide 79
Why explicit signal is
useful?
|
|
|
|
Queuing delay necessary with delay
based protocols: |
|
Explicit protocols reduce queuing delay |
|
And Reduce requirement for router
buffer sizes |
|
Explicit signal can be used to set
right starting rate. |
|
Some challenges with delay based
protocols: |
|
Alpha tuning |
|
baseRTT (sampling, route change etc) |
|
BQD Backward queuing delay |
|
Delay noise (OS etc) |
|
Interaction of Link layer
retransmission. |
|
Interaction with Wireless coding etc. |
|
|
|
|
|
|
Slide 81
MaxNet: System
MaxNet: Source & Link
Algorithm
Computing Explicit Signal in
real routers
MaxNet & XCP Properties
MaxNet & XCP Properties
XCP Max-Min Fairness
Explicit Starting Rate
Signaling Unified Protocol
Architecture
(SUPA FAST TCP)
SUPA FAST TCP
Network Components
Conclusion
|
|
|
MaxNet provides a framework for doing
explicit signal congestion control. |
|
A practical approach would involve
combining different congestion signals. |
|
Evolution of Internet from loss-based
protocols to explicit signaling is possible in an incremental way. |
|
Explicit protocols solve many of the
challenges of using loss or delay as a congestion signal. |
|
Widespread deployment not near-term.
However, specialized applications where there is pain may be deployed sooner. |