2. Overview of Concepts

This section introduces traffic control, examines reasons for it, identifies a few advantages and disadvantages, and introduces key concepts used in traffic control.

2.1. What is it?

Traffic control is the name given to the sets of queuing systems and mechanisms by which packets are received and transmitted on a router. This includes deciding (if and) which packets to accept at what rate on the input of an interface and determining which packets to transmit in what order at what rate on the output of an interface.

In the simplest possible model, traffic control consists of a single queue which collects entering packets and dequeues them as quickly as the hardware (or underlying device) can accept them. This sort of queue is a FIFO. This is like a single toll booth for entering a highway. Every car must stop and pay the toll. Other cars wait their turn.

Linux provides this simplest traffic control tool (FIFO), and in addition offers a wide variety of other tools that allow all sorts of control over packet handling.

Note

The default qdisc under Linux is the pfifo_fast, which is slightly more complex than the FIFO.
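
To see which qdisc is currently attached to an interface, the tc utility can be queried directly. A minimal sketch, assuming an interface named eth0:

    # display the qdisc attached to the root of eth0
    tc qdisc show dev eth0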

There are examples of queues in all sorts of software. The queue is a way of organizing the pending tasks or data (see also Section 2.5, “Queues”). Because network links typically carry data in a serialized fashion, a queue is required to manage the outbound data packets.

In the case of a desktop machine and an efficient webserver sharing the same uplink to the Internet, the following contention for bandwidth may occur. The web server may be able to fill up the output queue on the router faster than the data can be transmitted across the link, at which point the router starts to drop packets (its buffer is full!). Now, the desktop machine (with an interactive application user) may be faced with packet loss and high latency. Note that high latency sometimes leads to screaming users! By separating the internal queues used to service these two different classes of application, there can be better sharing of the network resource between the two applications.

Traffic control is a set of tools allowing an administrator granular control over these queues and the queuing mechanisms of a networked device. The power to rearrange traffic flows and packets with these tools is tremendous and can be complicated, but is no substitute for adequate bandwidth.

The term Quality of Service (QoS) is often used as a synonym for traffic control at the IP layer.

2.2. Why use it?

Traffic control tools allow the implementer to apply preferences and organizational or business policies to packets or network flows transmitted into a network. This control allows stewardship over network resources such as throughput or latency.

Fundamentally, traffic control becomes a necessity because of packet switching in networks.

In simplest terms, the traffic control tools allow somebody to enqueue packets into the network differently based on attributes of the packet. The various tools each solve a different problem, and many can be combined to implement complex rules to meet a preference or business goal.

There are many practical reasons to consider traffic control, and many scenarios in which using traffic control makes sense. Below are some examples of common problems which can be solved or at least ameliorated with these tools.

The list below is not an exhaustive list of the solutions available to users of traffic control, but shows the types of common problems that can be solved with traffic control tools to maximize the usability of the network.

Common traffic control solutions

  • Limit total bandwidth to a known rate; TBF, HTB with child class(es).

  • Limit the bandwidth of a particular user, service or client; HTB classes and classifying with a filter (see the sketch after this list).

  • Maximize TCP throughput on an asymmetric link; prioritize transmission of ACK packets, wondershaper.

  • Reserve bandwidth for a particular application or user; HTB with child classes and classifying.

  • Prefer latency sensitive traffic; PRIO inside an HTB class.

  • Manage oversubscribed bandwidth; HTB with borrowing.

  • Allow equitable distribution of unreserved bandwidth; HTB with borrowing.

  • Ensure that a particular type of traffic is dropped; policer attached to a filter with a drop action.
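
As a minimal sketch of the second item above (limiting the bandwidth of a particular client with HTB classes and a filter), assuming an outbound interface named eth0, an illustrative client address of 192.168.1.27 and purely illustrative rates:

    # attach an HTB qdisc as root; unclassified traffic falls into class 1:20
    tc qdisc add dev eth0 root handle 1: htb default 20
    # parent class carrying the full link rate
    tc class add dev eth0 parent 1: classid 1:1 htb rate 10mbit
    # class for the rate-limited client
    tc class add dev eth0 parent 1:1 classid 1:10 htb rate 1mbit ceil 2mbit
    # class for all other traffic
    tc class add dev eth0 parent 1:1 classid 1:20 htb rate 9mbit ceil 10mbit
    # u32 filter steering traffic destined for the client into class 1:10
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
        match ip dst 192.168.1.27/32 flowid 1:10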

Remember that, sometimes, it is simply better to purchase more bandwidth. Traffic control does not solve all problems!

2.3. Advantages

When properly employed, traffic control should lead to more predictable usage of network resources and less volatile contention for these resources. The network then meets the goals of the traffic control configuration. Bulk download traffic can be allocated a reasonable amount of bandwidth even as higher priority interactive traffic is simultaneously serviced. Even low priority data transfer such as mail can be allocated bandwidth without tremendously affecting the other classes of traffic.

In a larger picture, if the traffic control configuration represents policy which has been communicated to the users, then users (and, by extension, applications) know what to expect from the network.

2.4. Disadvantages

Complexity is easily one of the most significant disadvantages of using traffic control. There are ways to become familiar with traffic control tools which ease the learning curve, but identifying a traffic control misconfiguration can be quite a challenge.

Traffic control, when used appropriately, can lead to a more equitable distribution of network resources. It can just as easily be installed in an inappropriate manner, leading to further and more divisive contention for resources.

The computing resources required on a router to support a traffic control scenario need to be capable of handling the increased cost of maintaining the traffic control structures. Fortunately, this is a small incremental cost, but can become more significant as the configuration grows in size and complexity.

For personal use, there's no training cost associated with the use of traffic control, but a company may find that purchasing more bandwidth is a simpler solution than employing traffic control. Training employees and ensuring depth of knowledge may be more costly than investing in more bandwidth.

Although traffic control on packet-switched networks covers a larger conceptual area, you can think of traffic control as a way to provide [some of] the statefulness of a circuit-based network to a packet-switched network.

2.5. Queues

Queues form the backdrop for all of traffic control and are the integral concept behind scheduling. A queue is a location (or buffer) containing a finite number of items waiting for an action or service. In networking, a queue is the place where packets (our units) wait to be transmitted by the hardware (the service). In the simplest model, packets are transmitted on a first-come, first-served basis [2]. In the discipline of computer networking (and more generally computer science), this sort of a queue is known as a FIFO.

Without any other mechanisms, a queue doesn't offer any promise for traffic control. There are only two interesting actions in a queue. Anything entering a queue is enqueued into the queue. To remove an item from a queue is to dequeue that item.

A queue becomes much more interesting when coupled with other mechanisms which can delay, rearrange, drop and prioritize packets in multiple queues. A queue can also use subqueues, which allow for complexity of behaviour in a scheduling operation.

From the perspective of the higher layer software, a packet is simply enqueued for transmission, and the manner and order in which the enqueued packets are transmitted is immaterial to the higher layer. So, to the higher layer, the entire traffic control system may appear as a single queue [3]. It is only by examining the internals of this layer that the traffic control structures become exposed and available.

The image below shows a simplified high-level overview of the queues on the transmit path of the Linux network stack:

Figure 1: Simplified high level overview of the queues on the transmit path of the Linux network stack.

See Section 2.9 for details about the NIC interface and Section 4.9 for details about the driver queue.
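
The enqueue and dequeue activity of a qdisc can be observed with the statistics option of tc. A minimal sketch, assuming an interface named eth0 and a plain FIFO with a finite limit of 100 packets:

    # attach a plain FIFO which holds at most 100 packets
    tc qdisc add dev eth0 root pfifo limit 100
    # show per-qdisc counters (bytes and packets sent, dropped, overlimits)
    tc -s qdisc show dev eth0
    # remove the FIFO again, restoring the default qdisc
    tc qdisc del dev eth0 root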

2.6. Flows

A flow is a distinct connection or conversation between two hosts. Any unique set of packets between two hosts can be regarded as a flow. Under TCP the concept of a connection with a source IP and port and destination IP and port represents a flow. A UDP flow can be similarly defined.

Traffic control mechanisms frequently separate traffic into classes of flows which can be aggregated and transmitted as an aggregated flow (consider DiffServ). Alternate mechanisms may attempt to divide bandwidth equally based on the individual flows.

Flows become important when attempting to divide bandwidth equally among a set of competing flows, especially when some applications deliberately build a large number of flows.
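
One concrete example of a flow-aware mechanism is the SFQ qdisc, which hashes packets into per-flow queues and services those queues round-robin, approximating an equal division of bandwidth among competing flows. A minimal sketch, assuming an interface named eth0:

    # attach SFQ; flows are identified by hashing addresses and ports,
    # and the hash is perturbed every 10 seconds to limit unfair collisions
    tc qdisc add dev eth0 root sfq perturb 10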

2.7. Tokens and buckets

Two of the key underpinnings of shaping mechanisms are the interrelated concepts of tokens and buckets.

In order to control the rate of dequeuing, an implementation can count the number of packets or bytes dequeued as each item is dequeued, although this requires complex usage of timers and measurements to limit accurately. Instead of calculating the current usage and time, one method, used widely in traffic control, is to generate tokens at a desired rate, and only dequeue packets or bytes if a token is available.

Consider the analogy of an amusement park ride with a queue of people waiting to experience the ride. Let's imagine carts which traverse a fixed track. The carts arrive at the head of the queue at a fixed rate. In order to enjoy the ride, each person must wait for an available cart. The cart is analogous to a token and the person is analogous to a packet. Again, this mechanism is a rate-limiting or shaping mechanism. Only a certain number of people can experience the ride in a particular period.

To extend the analogy, imagine an empty line for the amusement park ride and a large number of carts sitting on the track ready to carry people. If a large number of people entered the line together, many (maybe all) of them could experience the ride because of the carts available and waiting. The number of carts available is a concept analogous to the bucket. A bucket contains a number of tokens, and all of the tokens in the bucket can be used without regard for the passage of time.

And to complete the analogy, the carts on the amusement park ride (our tokens) arrive at a fixed rate and are only kept available up to the size of the bucket. So, the bucket is filled with tokens according to the rate, and if the tokens are not used, the bucket can fill up. If tokens are used the bucket will not fill up. Buckets are a key concept in supporting bursty traffic such as HTTP.

The TBF qdisc is a classical example of a shaper (the section on TBF includes a diagram which may help to visualize the token and bucket concepts). The TBF generates rate tokens and only transmits packets when a token is available. Tokens are a generic shaping concept.
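
A minimal TBF sketch, assuming an interface named eth0 and purely illustrative numbers; rate is the token generation rate, burst is the size of the bucket in bytes, and latency bounds how long a packet may wait for tokens before being dropped:

    # shape all traffic leaving eth0 to 1 Mbit/s with a 10 kbyte bucket
    tc qdisc add dev eth0 root tbf rate 1mbit burst 10kb latency 50ms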

In the case that a queue does not need tokens immediately, the tokens can be collected until they are needed. To collect tokens indefinitely would negate any benefit of shaping so tokens are collected until a certain number of tokens has been reached. Now, the queue has tokens available for a large number of packets or bytes which need to be dequeued. These intangible tokens are stored in an intangible bucket, and the number of tokens that can be stored depends on the size of the bucket.

This also means that a bucket full of tokens may be available at any instant. Very predictable regular traffic can be handled by small buckets. Larger buckets may be required for burstier traffic, unless one of the desired goals is to reduce the burstiness of the flows.

In summary, tokens are generated at a given rate, and a maximum of a bucket's worth of tokens may be collected. This allows bursty traffic to be handled, while smoothing and shaping the transmitted traffic.

The concepts of tokens and buckets are closely interrelated and are used in both TBF (one of the classless qdiscs) and HTB (one of the classful qdiscs). Within the tcng language, the use of two- and three-color meters is indubitably a token and bucket concept.

2.8. Packets and frames

The terms for data sent across a network change depending on the layer the user is examining. This document will rather impolitely (and incorrectly) gloss over the technical distinction between packets and frames, although they are outlined here.

The word frame is typically used to describe a layer 2 (data link) unit of data to be forwarded to the next recipient. Ethernet interfaces, PPP interfaces, and T1 interfaces all name their layer 2 data unit a frame. The frame is actually the unit on which traffic control is performed.

A packet, on the other hand, is a higher layer concept, representing layer 3 (network) units. The term packet is preferred in this documentation, although it is slightly inaccurate.

2.9. NIC, Network Interface Controller

A network interface controller is a computer hardware component (unlike the previously discussed components, which are software) that connects a computer to a computer network. The network controller implements the electronic circuitry required to communicate using a specific data link layer and physical layer standard such as Ethernet, Fibre Channel, Wi-Fi or Token Ring. Traffic control must deal with the physical constraints and characteristics of the NIC interface.

2.9.1. Huge Packets from the Stack

Most NICs have a fixed maximum transmission unit (MTU) which is the biggest frame which can be transmitted by the physical medium. For Ethernet the default MTU is 1500 bytes but some Ethernet networks support Jumbo Frames of up to 9000 bytes. Inside the IP network stack, the MTU can manifest as a limit on the size of the packets which are sent to the device for transmission. For example, if an application writes 2000 bytes to a TCP socket then the IP stack needs to create two IP packets to keep the packet size less than or equal to a 1500 byte MTU. For large data transfers the comparably small MTU causes a large number of small packets to be created and transferred through the driver queue.

In order to avoid the overhead associated with a large number of packets on the transmit path, the Linux kernel implements several optimizations: TCP segmentation offload (TSO), UDP fragmentation offload (UFO) and generic segmentation offload (GSO). All of these optimizations allow the IP stack to create packets which are larger than the MTU of the outgoing NIC. For IPv4, packets as large as the IPv4 maximum of 65,535 bytes can be created and queued to the driver queue. In the case of TSO and UFO, the NIC hardware takes responsibility for breaking the single large packet into packets small enough to be transmitted on the physical interface. For NICs without hardware support, GSO performs the same operation in software immediately before queueing to the driver queue.
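
Whether these offloads are active on a given NIC can be checked and changed with ethtool. A minimal sketch, assuming an interface named eth0 and a driver that supports toggling the features:

    # list offload settings, including the segmentation offloads
    ethtool -k eth0
    # disable TSO and GSO, for example when their very large packets
    # are inflating queueing delay
    ethtool -K eth0 tso off gso off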

Recall from earlier that the driver queue contains a fixed number of descriptors which each point to packets of varying sizes. Since TSO, UFO and GSO allow for much larger packets, these optimizations have the side effect of greatly increasing the number of bytes which can be queued in the driver queue. Figure 2 illustrates this concept.

Figure 2: Large packets can be sent to the NIC when TSO, UFO or GSO are enabled. This can greatly increase the number of bytes in the driver queue.

2.10. Starvation and Latency

The queue between the IP stack and the hardware (see chapter 4.2 for details about the driver queue, or chapter 5.5 for how to manage it) introduces two problems: starvation and latency.

If the NIC driver wakes to pull packets off of the queue for transmission and the queue is empty, the hardware will miss a transmission opportunity, thereby reducing the throughput of the system. This is referred to as starvation. Note that an empty queue when the system does not have anything to transmit is not starvation – this is normal. The complication associated with avoiding starvation is that the IP stack which is filling the queue and the hardware driver draining the queue run asynchronously. Worse, the duration between fill or drain events varies with the load on the system and external conditions such as the network interface’s physical medium. For example, on a busy system the IP stack will get fewer opportunities to add packets to the buffer, which increases the chances that the hardware will drain the buffer before more packets are queued. For this reason it is advantageous to have a very large buffer to reduce the probability of starvation and to ensure high throughput.

While a large queue is necessary for a busy system to maintain high throughput, it has the downside of allowing for the introduction of a large amount of latency.

Figure 3: Interactive packet (yellow) behind bulk flow packets (blue)

Figure 3 shows a driver queue which is almost full with TCP segments for a single high bandwidth, bulk traffic flow (blue). Queued last is a packet from a VoIP or gaming flow (yellow). Interactive applications like VoIP or gaming typically emit small packets at fixed intervals which are latency sensitive while a high bandwidth data transfer generates a higher packet rate and larger packets. This higher packet rate can fill the buffer between interactive packets causing the transmission of the interactive packet to be delayed. To further illustrate this behaviour consider a scenario based on the following assumptions:

  • A network interface which is capable of transmitting at 5 Mbit/sec or 5,000,000 bits/sec.

  • Each packet from the bulk flow is 1,500 bytes or 12,000 bits.

  • Each packet from the interactive flow is 500 bytes.

  • The depth of the queue is 128 descriptors.

  • There are 127 bulk data packets and 1 interactive packet queued last.

Given the above assumptions, the time required to drain the 127 bulk packets and create a transmission opportunity for the interactive packet is (127 * 12,000) / 5,000,000 = 0.3048 seconds (roughly 305 milliseconds for those who think of latency in terms of ping results). This amount of latency is well beyond what is acceptable for interactive applications, and this does not even represent the complete round trip time – it is only the time required to transmit the packets queued before the interactive one. As described earlier, the size of the packets in the driver queue can be larger than 1,500 bytes if TSO, UFO or GSO are enabled. This makes the latency problem correspondingly worse.

Choosing the correct size for the driver queue is a Goldilocks problem – it can’t be too small or throughput suffers, it can’t be too big or latency suffers.
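
On many NICs the driver queue (the transmit ring) can be inspected and resized with ethtool. A minimal sketch, assuming an interface named eth0 and a driver that supports ring resizing:

    # show the current and maximum ring sizes for eth0
    ethtool -g eth0
    # shrink the transmit ring, trading throughput headroom for lower latency
    ethtool -G eth0 tx 128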

2.11. Relationship between throughput and latency

In all traffic control systems, there is a relationship between throughput and latency. The maximum information rate of a network link is termed bandwidth, but for the user of a network the bandwidth actually achieved has a dedicated term, throughput.

latency

the delay in time between a sender's transmission and the recipient's decoding or receiving of the data; always non-negative and non-zero (time doesn't move backwards)

in principle, latency is unidirectional; however, almost the entire Internet networking community talks about bidirectional delay: the delay in time between a sender's transmission of data and some sort of acknowledgement of receipt of that data; cf. ping

measured in milliseconds (ms); on Ethernet, latencies are typically between 0.3 and 1.0 ms, and on wide-area networks (i.e. to your ISP, across a large campus or to a remote server) between 5 and 300 ms

throughput

a measure of the total amount of data that can be transmitted successfully between a sender and receiver

measured in bits per second; the measurement most often quoted by complaining users after buying a 10Mbit/s package from their provider and receiving 8.2Mbit/s.
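
The bidirectional delay described above can be sampled from any Linux host with ping, and the reported round-trip times give a rough picture of latency toward a given destination. A minimal sketch, assuming a reachable host named example.net:

    # send ten echo requests and report per-packet and summary round-trip times
    ping -c 10 example.net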

Note

Latency and throughput are general computing terms. For example, application developers speak of user-perceived latency when trying to build responsive tools. Database and filesystem people speak about disk throughput. And, above the network layer, latency of a website name lookup in DNS is a major contributor to the perceived performance of a website. The remainder of this document concerns latency in the network domain, specifically the IP network layer.

During the millennial fin de siècle, many developed-world network service providers had learned that users were interested in the highest possible download throughput (the above-mentioned 10Mbit/s bandwidth figure).

In order to maximize this download throughput, gear vendors and providers commonly tuned their equipment to hold a large number of data packets. When the network was ready to accept another packet, the network gear was certain to have one in its queue and could simply send another packet. In practice, this meant that the user, who was measuring download throughput, would receive a high number and was likely to be happy. This was desirable for the provider because the delivered throughput could more likely meet the advertised number.

This technique effectively maximized throughput, at the cost of latency. Imagine that a high priority packet is waiting at the end of the big queue of packets mentioned above. Perhaps the theoretical latency of the packet on this network is 100ms, but it needs to wait its turn in the queue to be transmitted.

While the decision to maximize throughput has been wildly successful, the effect on latency is significant.

Despite a general warning from Stuart Cheshire in the mid-1990s called It's the Latency, Stupid, it took the novel term bufferbloat, widely publicized about 15 years later by Jim Gettys in an ACM Queue article Bufferbloat: Dark Buffers in the Internet and a Bufferbloat FAQ on his blog, to bring some focus onto the choice of maximizing throughput that both gear vendors and providers preferred.

The relationship (tension) between latency and throughput in packet-switched networks has been well known in the academic, networking and Linux development communities. Linux traffic control core data structures date back to the 1990s and have been continuously developed and extended with new schedulers and features.



[2] This queueing model has long been used in civilized countries to distribute scant food or provisions equitably. William Faulkner is reputed to have walked to the front of the line to fetch his share of ice, proving that not everybody likes the FIFO model, and providing us a model for considering priority queuing.

[3] Similarly, the entire traffic control system appears as a queue or scheduler to the higher layer which is enqueuing packets into this layer.