stanislav shalunov: linkknot: TCP

Transmission Control Protocol (TCP) carries between 95% and 99% of Internet traffic, by different estimates. Most of the rest is DNS today (May 2001), the publicity around streaming media and network games notwithstanding.

TCP provides the end hosts with the abstraction of a data stream, even though it typically runs over a packet-switched network. That is, applications can listen for connections, open connections, and, once a connection is established, send octets. Transparently to the application, these octets are then packaged into packets, sent, acknowledged, and retransmitted over the network as necessary.

Each TCP connection has two directions, which are essentially independent. In the following we call one party a sender and the other a receiver; of course, data can flow in the other direction as well, with roles being reversed.

TCP adds its own addresses on top of IP addresses: 16-bit TCP port numbers, which it uses to demultiplex the traffic arriving at a host. A TCP connection is identified by a tuple consisting of source IP address, destination IP address, source TCP port number, and destination TCP port number.
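A minimal sketch of how such a tuple can serve as a lookup key. The class names, the addresses, and the "owner" strings are made up for illustration; a real stack keeps far more state per connection.

```python
from typing import Dict, NamedTuple, Optional


class ConnectionKey(NamedTuple):
    """The four values that identify one TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ConnectionTable:
    """Hypothetical table mapping 4-tuples to whoever owns the connection."""

    def __init__(self) -> None:
        self._connections: Dict[ConnectionKey, str] = {}

    def add(self, key: ConnectionKey, owner: str) -> None:
        self._connections[key] = owner

    def lookup(self, src_ip: str, src_port: int,
               dst_ip: str, dst_port: int) -> Optional[str]:
        # A packet belongs to a connection only if all four fields match.
        key = ConnectionKey(src_ip, src_port, dst_ip, dst_port)
        return self._connections.get(key)


table = ConnectionTable()
# Two connections from the same host to the same server port:
# only the source port tells them apart.
table.add(ConnectionKey("192.0.2.10", 40001, "198.51.100.5", 80), "tab A")
table.add(ConnectionKey("192.0.2.10", 40002, "198.51.100.5", 80), "tab B")
```

The two entries differ only in the source port, which is enough to keep the two streams separate.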

TCP provides reliable and in-order data delivery, notification about failures, flow control, and congestion control.

Reliable data delivery is accomplished by transmitting sequence numbers with all packets and having the receiver acknowledge the octet offset up to which it has seen the stream. This lets the sender retransmit lost packets, while the receiver can recover the correct sequence if reordering has occurred.
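The receiver's side of this can be sketched as follows. This is a toy model under simplifying assumptions that are not part of TCP itself: offsets start at zero and never wrap, segments never partially overlap, and the `Receiver` class is a hypothetical name.

```python
class Receiver:
    """Toy receiver: buffers early segments and returns cumulative ACKs."""

    def __init__(self) -> None:
        self.next_expected = 0        # first octet offset not yet received in order
        self.pending = {}             # offset -> payload for segments that arrived early
        self.delivered = bytearray()  # in-order stream handed to the application

    def on_segment(self, offset: int, payload: bytes) -> int:
        """Accept one segment; return the offset acknowledged so far."""
        if offset >= self.next_expected:
            self.pending[offset] = payload
        # Segments below next_expected are duplicates and are ignored.
        # Deliver everything that is now contiguous with the stream.
        while self.next_expected in self.pending:
            chunk = self.pending.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected


rx = Receiver()
# The second segment is reordered behind the third.
acks = [
    rx.on_segment(0, b"abc"),
    rx.on_segment(6, b"ghi"),
    rx.on_segment(3, b"def"),
]
```

The middle acknowledgment repeats the first one, since the receiver is still waiting for offset 3. Repeated acknowledgments of this kind are what the sender later uses as a loss signal.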

If acknowledgments or other required responses don't come back, TCP will try retransmitting for a while but eventually it'll time out and notify the application about the failure.

Flow control is accomplished by the receiver advertising a window. The sender may not have more than window octets outstanding (sent but not yet acknowledged by the receiver). This gives the receiver the ability to throttle the sender down to a rate that the receiver can accept.
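In code, the window rule comes down to one subtraction. The function and parameter names below are illustrative, and sequence number wraparound is ignored.

```python
def sendable_octets(next_to_send: int, last_acked: int,
                    advertised_window: int) -> int:
    """How many more octets the sender may put on the wire right now."""
    outstanding = next_to_send - last_acked  # sent but not yet acknowledged
    return max(0, advertised_window - outstanding)


# 3000 octets are in flight against an 8192-octet window.
room = sendable_octets(next_to_send=5000, last_acked=2000,
                       advertised_window=8192)
```

A receiver that advertises a window of zero stops the sender entirely until it advertises more room.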

Congestion control is somewhat more involved. The sender keeps track of a parameter called the congestion window. The effective window is then the minimum of the receiver-advertised window and the congestion window. The congestion window starts out as one (or, sometimes, experimentally, two) maximum segment sizes (MSS). The sender estimates the current round-trip time (RTT) between itself and the receiver (based on acknowledgment packets that come in response to its packets) and keeps track of this estimate. It's typically an exponentially decaying running average of observed round-trip times. Initially the sender enters the slow start phase of the connection; in the slow start phase, during each RTT with no detected loss, the congestion window is doubled (so it grows exponentially, and "slow start" is in effect a misnomer). Once the first packet loss is detected, slow start is over and the normal phase begins. From this point on, during each RTT when no loss is detected, the congestion window is increased by one maximum segment size; during each RTT when any losses were detected, the congestion window is halved.
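The per-RTT rules above can be sketched as a small state machine. Two details are assumptions made for this sketch rather than statements from the text: the loss that ends slow start also halves the window, and the window never drops below one MSS. The 1000-octet MSS is just a round number.

```python
MSS = 1000  # illustrative segment size in octets


def next_cwnd(cwnd: int, mss: int, loss: bool, in_slow_start: bool):
    """Advance the congestion window by one RTT.

    Returns the new (cwnd, in_slow_start) pair.
    """
    if loss:
        # Assumption: any loss ends slow start and halves the window,
        # with a floor of one MSS.
        return max(mss, cwnd // 2), False
    if in_slow_start:
        return cwnd * 2, True   # exponential growth
    return cwnd + mss, False    # linear growth in the normal phase


def effective_window(cwnd: int, advertised_window: int) -> int:
    """The sender is bound by whichever window is smaller."""
    return min(cwnd, advertised_window)


def simulate(loss_per_rtt, mss: int = MSS):
    """Return the congestion window after each RTT in the loss pattern."""
    cwnd, in_slow_start = mss, True
    trace = []
    for loss in loss_per_rtt:
        cwnd, in_slow_start = next_cwnd(cwnd, mss, loss, in_slow_start)
        trace.append(cwnd)
    return trace


# Three clean RTTs, a loss, two clean RTTs, another loss.
trace = simulate([False, False, False, True, False, False, True])
# trace == [2000, 4000, 8000, 4000, 5000, 6000, 3000]
```

The trace shows both regimes: doubling up to the first loss, then growth by one MSS per RTT punctuated by halvings.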

Loss detection is based on receiving three duplicates of the same acknowledgment, or on acknowledgments failing to arrive within a given time frame. Timers are set to reflect network properties (usually the currently estimated RTT plus four times the currently estimated jitter).
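A sketch of such a timer calculation, combining the running RTT average with a running jitter estimate. The gains of 1/8 and 1/4 and the initial jitter of half the first sample are the values commonly used in TCP implementations; they are assumptions here, not something the text specifies. Clock granularity and any minimum timeout are left out.

```python
class RttEstimator:
    """Keeps smoothed RTT and jitter estimates; yields a retransmission timeout."""

    ALPHA = 1 / 8  # weight of a new sample in the smoothed RTT (assumed value)
    BETA = 1 / 4   # weight of a new sample in the jitter estimate (assumed value)

    def __init__(self) -> None:
        self.srtt = None    # smoothed round-trip time, seconds
        self.rttvar = None  # smoothed deviation ("jitter"), seconds

    def update(self, sample: float) -> float:
        """Feed one RTT measurement; return the timeout in seconds."""
        if self.srtt is None:
            # First measurement: nothing to average against yet.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Update the jitter first, using the old smoothed RTT.
            deviation = abs(self.srtt - sample)
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * deviation
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.srtt + 4 * self.rttvar


est = RttEstimator()
first_rto = est.update(0.100)   # one 100 ms sample
second_rto = est.update(0.100)  # an identical sample shrinks the jitter term
```

With perfectly steady samples the jitter term decays and the timeout drifts down toward the RTT itself; a single outlier pushes it back up.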

If latency is stable, connection throughput is proportional to the sender's window: roughly one window of data is delivered per RTT.
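As a quick arithmetic sketch of that proportionality (the function name and the example figures are illustrative):

```python
def window_limited_throughput_bps(window_octets: int,
                                  rtt_seconds: float) -> float:
    """Throughput in bits per second when one window is delivered per RTT."""
    return window_octets * 8 / rtt_seconds


# A 64 KiB window over a 100 ms path.
bps = window_limited_throughput_bps(65536, 0.100)
```

For these figures the result is a little over 5 Mbit/s, no matter how fast the underlying links are.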

Notice that this procedure, in line with the End-to-End Principle, offloads most of the work to end hosts. Even the determination of fair link share is offloaded to the end hosts: change the magic numbers (1 MSS, 1/2 multiplier), and you get more (or less) than others.

Assuming a steady-state network (essentially, this assumption means that "our" TCP flow takes only a negligible fraction of total capacity), estimated TCP throughput will be roughly proportional to MSS divided by the product of RTT and the square root of the loss probability. (This formula won't work too well for large loss probability values.)
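The proportionality can be turned into a rough calculator. The text gives no constant of proportionality; the default of sqrt(3/2), about 1.22, is the value that falls out of an idealized periodic-loss model and should be treated as an assumption here.

```python
import math


def estimated_throughput_bps(mss_octets: int, rtt_seconds: float,
                             loss_probability: float,
                             constant: float = math.sqrt(1.5)) -> float:
    """Rough steady-state TCP throughput estimate in bits per second.

    Only meaningful for small loss probabilities.
    """
    if not 0 < loss_probability <= 1:
        raise ValueError("loss_probability must be in (0, 1]")
    return constant * mss_octets * 8 / (rtt_seconds * math.sqrt(loss_probability))


# 1460-octet segments, 100 ms RTT, one packet in 10,000 lost.
low_loss = estimated_throughput_bps(1460, 0.100, 0.0001)
# Four times the loss rate on the same path.
high_loss = estimated_throughput_bps(1460, 0.100, 0.0004)
```

The first estimate comes out at roughly 14 Mbit/s. Quadrupling the loss probability halves it, as does doubling the RTT.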

Assuming, at the other extreme, that there's only one TCP connection and no random loss, the connection will get roughly 3/4 of the total link capacity, and router queues will oscillate as the congestion window traces its characteristic sawtooth.
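The 3/4 figure can be checked by averaging the window over one tooth of the sawtooth. The sketch assumes an idealized link with no router buffering, so that a loss occurs as soon as the window reaches the link's capacity (expressed here in segments per RTT); the function name is made up.

```python
def sawtooth_utilization(capacity_segments: int) -> float:
    """Average window over one sawtooth cycle, as a fraction of capacity."""
    half = capacity_segments // 2
    # The window climbs one segment per RTT from half capacity to full
    # capacity, is halved on the loss, and the cycle repeats.
    windows = range(half, capacity_segments + 1)
    return sum(windows) / (len(windows) * capacity_segments)


utilization = sawtooth_utilization(100)
```

The average of a ramp from one half to one is three quarters, which is where the figure comes from. Router queues change the picture somewhat, since they let the window overshoot the link capacity before a loss occurs.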