TCP BBR: a fast and easy way to speed up page loading. A Yandex talk

Modern application-level protocols use multiplexing to speed up data transmission, which raises the requirements for channel reliability. At the YaTalks conference, Alexander Gryanko (phasma) told how we speed up page loading on channels with high packet loss, using the HTTP/2 and TCP BBR protocols as an example.



- Hi. I am Sasha, I work at Yandex, and for the last three years I have been developing an L7 load balancer. I'll tell you about a quick and easy way to speed up your network. We'll start at the seventh layer, HTTP, and move down to the fourth layer, TCP. Today we will talk only about these two layers and dwell on them in some detail.



For the last eight years I have mostly been doing backend development, so my frontend knowledge probably stopped somewhere around the first versions of AngularJS. You probably know better than I do how all of that works. You have already optimized everything and compressed everything, and there I cannot advise you anything.



But I can advise you on how to speed up the network by tuning the server itself, the operating system.







To speed something up, you need metrics. In this case we used the following: the average time to first byte, which shows how fast the TCP layer is, and the time to receive the HTML after the first byte. We ran the experiment, measured these metrics, and after turning on BBR got a speedup of about ten percent.







To understand what ten percent means, look at the absolute value: 66 milliseconds. For comparison, if you open your favorite online multiplayer game, the ping to Western European servers will be roughly 60-70 milliseconds.



How to do it quickly



All of our servers are managed using remote control protocols, in this case SSH. If you haven’t encountered the SSH protocol yet, you can ask your system administrator to configure your server. I will tell you how to convince him to do this.







What is BBR? It is one of the congestion control algorithms, which determines how packets are sent to the network. It is enabled with the following two settings. The first sets the packet scheduler to FQ; later I'll explain why you should use FQ. The second enables the congestion control algorithm itself, that is, BBR.
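
For reference, here is what those two settings look like. This is only a minimal sketch: in practice they are usually applied with sysctl or /etc/sysctl.conf, and writing the /proc/sys files directly, as below, has the same effect. It assumes a Linux kernel with the tcp_bbr module available (4.9 or newer) and root privileges.

```python
# Minimal sketch: the two kernel settings the talk refers to, written directly
# to /proc/sys. In practice they are usually set via `sysctl` or /etc/sysctl.conf.
# Assumes a Linux kernel >= 4.9 with the tcp_bbr module available; needs root.
SETTINGS = {
    "/proc/sys/net/core/default_qdisc": "fq",            # packet scheduler: FQ
    "/proc/sys/net/ipv4/tcp_congestion_control": "bbr",  # congestion control: BBR
}

for path, value in SETTINGS.items():
    with open(path, "w") as f:
        f.write(value)
    print(f"{path} = {value}")
```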



It seems that this is where we could end. But in fact, there are many pitfalls, and most likely, your sysadmin will not just turn on BBR on the server. Therefore, we will go further.



HTTP/2 and multiplexing



We'll start at layer 7, which is HTTP, and slowly work our way down through the protocols.







We'll start with the browsers we interact with every day. On the slide is the web developer console. It has a field that interests us: Protocol.



In our case the protocols are HTTP/1 and HTTP/2. There is also HTTP/3, which is based on Google's QUIC protocol, but we will not touch on it today, since it is still under development and not yet fully approved. Let's go back to HTTP/1.







On the slide is the Wireshark utility, which lets us analyze packets, that is, how we interact with the network. One field is highlighted in green: this is our HTTP request. Below it we can see the bytes as they will appear on the wire.







What does HTTP/1 look like in real life? It is a fairly simple protocol. It is completely text-based: we just write text and send it to the network. Each character is encoded as a byte, shown here as hexadecimal values. On the right is a small fragment of the ASCII table so you can orient yourself.



The first part is the headers, separated from the body by the characters "\r\n\r\n". Here we are simply requesting an ordinary resource with the GET method, so this request has no body. We can see that the bytes roughly match what is in the ASCII table. We are requesting some JS resource. There is also a Host header indicating the domain we are currently working with, and a set of additional headers. They can be custom; you can use any you like.
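
Here is a minimal sketch of such a request sent by hand over a plain socket (the host and path are placeholders): the headers end with "\r\n\r\n", and a GET request carries no body.

```python
# Minimal sketch of a raw HTTP/1.1 request: plain text, headers separated from
# the (empty) body by "\r\n\r\n". Host and path are placeholders.
import socket

request = (
    "GET /static/app.js HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Connection: close\r\n"
    "\r\n"                      # empty line: end of headers, no body for GET
)

with socket.create_connection(("example.com", 80)) as s:
    s.sendall(request.encode("ascii"))   # the same bytes Wireshark would show
    print(s.recv(200).decode("ascii", errors="replace"))
```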







HTTP/2 is a more complex protocol. It is binary, and its smallest unit of information exchange is the frame. There are many frame types for different special cases. You can see on the slide that they are highlighted.







In the first line we can also see that two frames can fit into one packet at once. We will not dwell on all the frame types, there are quite a few of them. Here we are interested in the HEADERS frame, because it is what lets us request resources. I was a bit involved in Wireshark development, helping to improve it in this area.



We can see that there is a GET request: in the middle is its textual representation. But in the right column only one byte is selected, and that single byte is the GET method. In a moment I'll explain why that is.



Next we have the :path header, which indicates the path to the resource, to the JS we are requesting. And there is a set of additional headers that will also be present in the request.



So why are the bytes on the network not the same as what is drawn in the picture? The point is that Wireshark shows us the final result, how it decoded everything. On the network these headers are compressed with a special format called HPACK. It is better to look up the details on the Internet; it is well documented.



Let's go back to our bytes. There is a special field, the stream identifier. It indicates which resource the frames currently belong to. In this case we sent a HEADERS frame and received data in response. When the server sends us the content bytes themselves, a DATA frame is used.
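
A minimal sketch of the fixed 9-byte frame header from RFC 7540, which is where the frame length, type, flags, and stream identifier live; the example bytes below are hypothetical, not taken from the slide.

```python
# Minimal sketch: parsing the fixed 9-byte HTTP/2 frame header (RFC 7540, section 4.1).
import struct

FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x4: "SETTINGS", 0x8: "WINDOW_UPDATE"}

def parse_frame_header(data: bytes):
    length_hi, length_lo, frame_type, flags, stream_id = struct.unpack(">BHBBI", data[:9])
    length = (length_hi << 16) | length_lo   # 24-bit payload length
    stream_id &= 0x7FFFFFFF                  # the top bit is reserved
    return length, FRAME_TYPES.get(frame_type, hex(frame_type)), flags, stream_id

# Hypothetical HEADERS frame: 32-byte payload, END_HEADERS flag, stream 1.
example = bytes([0x00, 0x00, 0x20, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01])
print(parse_frame_header(example))   # (32, 'HEADERS', 4, 1)
```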







HTTP/1 and HTTP/2 are very different protocols. We have already said that HTTP/1 is a text protocol, while HTTP/2 is binary, that is, it works with frames.



With HTTP/1, if you send several requests over a single connection at once, the result is undefined and depends on how the web server developers implemented it. That is, if we make two requests in one connection simultaneously, most likely we will get a response to either the first or the second request, but not to both. So, to load resources in parallel, the browser opens several connections, usually about six, and loads resources over them in parallel.



HTTP/2, in turn, uses a single connection. It establishes the connection and loads all the necessary data inside it, through frames. This technique of packing multiple resources into one connection is called multiplexing.
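
A minimal sketch of multiplexing from the client side, assuming the third-party httpx and h2 packages (`pip install "httpx[http2]"`); the URLs are placeholders. All three requests are issued concurrently and share one HTTP/2 connection to the same origin.

```python
# Sketch assuming the third-party httpx and h2 packages; URLs are placeholders.
# The three requests are multiplexed over a single HTTP/2 connection.
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient(http2=True) as client:
        urls = [f"https://example.com/static/{name}"
                for name in ("app.js", "style.css", "logo.png")]
        responses = await asyncio.gather(*(client.get(u) for u in urls))
        for r in responses:
            print(r.url, r.http_version, r.status_code)

asyncio.run(main())
```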



From how these connections work, it is clear that in the case of packet loss HTTP/1 copes better: a loss in one of the connections most likely will not affect the others, and they will keep loading at the same speed. With HTTP/2, if a packet is lost, the loading of all resources slows down.







It may seem that HTTP/2 is worse, since it is so sensitive to packet loss. But in fact, when we create each of those six connections, we perform the following operation.



The client and server establish a reliable connection, a TCP connection. The client sends two packets to the server, and the server sends one packet to the client. In doing so, both sides confirm that they are ready to transfer data. This, of course, creates overhead, and doing it six times takes a noticeable amount of time.
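
To feel this overhead in practice, here is a minimal sketch that measures how long establishing a single TCP connection takes (the host is a placeholder); connect() returns after roughly one round trip.

```python
# Minimal sketch: measuring how long a TCP connection takes to establish.
# connect() returns after the SYN / SYN-ACK exchange, i.e. roughly one RTT.
# The host is a placeholder.
import socket
import time

start = time.monotonic()
sock = socket.create_connection(("example.com", 443), timeout=5)
elapsed_ms = (time.monotonic() - start) * 1000
print(f"TCP handshake took about {elapsed_ms:.1f} ms")
sock.close()
```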







There is also encryption. If you look at your browser now, you will most likely see a padlock icon. Many people call it SSL, but it is not really SSL; it is TLS. SSL is long obsolete, practically no longer supported, and should be abandoned.



TLS also has its own handshake. Just as with the TCP handshake, the two sides establish a certain state, after which they can continue working. There is room for optimization here too, but browsers do not yet support the things we have already enabled on the server side. We'll wait for everyone to turn them on.







HTTP/1 once tried to solve the problem of loading resources concurrently: the RFC describes pipelining, and at one point it was actually implemented. But due to implementation complexity Internet Explorer never supported it, while Firefox and Chrome did, and then dropped support over time.







Each of the six connections we have already created does not, in fact, close: they keep working just as before. For this, the Keep-Alive technique is used: we create a reliable connection to a specific server and keep working over it.



At the HTTP level this is controlled by a header, in this case Connection. And at the TCP level the operating system itself decides for us what to do.
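
A minimal sketch of keep-alive from the client side using Python's standard http.client; the host and paths are placeholders. Both requests travel over the same TCP connection.

```python
# Minimal sketch of HTTP/1.1 keep-alive: several requests reuse one TCP connection.
# Host and paths are placeholders; http.client keeps the connection open by default.
import http.client

conn = http.client.HTTPSConnection("example.com")
for path in ("/", "/static/app.js"):
    conn.request("GET", path, headers={"Connection": "keep-alive"})
    resp = conn.getresponse()
    resp.read()                      # drain the body so the connection can be reused
    print(path, resp.status)
conn.close()
```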







HTTP/2 has other problems as well. In HTTP/2 we can prioritize streams and send the more important data sooner. But when we try to send a lot of data at once, the buffer on the server may overflow, and then the higher-priority packets simply slow down and end up at the back of the queue.



Packet losses occur, they slow down our loading, and this kind of blocking is called head-of-line blocking.



How TCP Solves Packet Loss Problems



Now we will talk about TCP, that is, our fourth layer. In the next ten minutes, I'll cover how TCP works.







When we are at the dinner table, we ask someone to pass the salt. When the person hands us the salt, we confirm that it has reached us. Here it is the same: we take a segment, transmit it, and wait for an acknowledgement, then transmit the next one. If there is a loss, we retransmit that segment, and eventually it is delivered. This technique of sending one segment at a time is called Stop-and-wait.



But our networks have sped up tremendously over the past 30 years. Perhaps some of you remember dialup and internet metered by the megabyte. By now you may already be able to get gigabit internet at home.



So we can start sending several packets at once. In our example there are three of them: we send a window of three packets and wait for all of them to be acknowledged.



In case of packet loss we can retransmit all packets starting from the first lost one; this technique is called Go-Back-N. Alternatively, we can track every packet and retransmit only those that were lost; this technique is called Selective Repeat, and it is more expensive on the sender side. When we were preparing the slides, it took a long time to figure out how to present this. I got confused by it myself, which is why I came up with the following analogy.
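
To make the Go-Back-N idea concrete, here is a toy, simplified simulation (no timers, random losses, placeholder numbers): on every loss the sender goes back and retransmits everything starting from the first unacknowledged packet.

```python
# Toy sketch of the Go-Back-N idea: on a loss the sender retransmits everything
# from the first unacknowledged packet. The "network" below is simulated.
import random

def go_back_n(total_packets: int, window: int, loss_rate: float = 0.2) -> int:
    base = 0                                  # first unacknowledged packet
    sent = 0                                  # total transmissions, incl. repeats
    while base < total_packets:
        window_end = min(base + window, total_packets)
        for seq in range(base, window_end):   # send the whole window
            sent += 1
            if random.random() < loss_rate:   # packet (or its ACK) was lost
                break                         # receiver discards everything after it
            base = seq + 1                    # cumulative ACK advances the window
    return sent

random.seed(1)
print("transmissions needed:", go_back_n(total_packets=20, window=3))
```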







Everyone knows what a water pipe is. Pipes have different diameters: in some places they are narrower, and the narrowest point determines our maximum throughput. We cannot pour more water through than this bottleneck allows.



Instead we will shoot balls through the pipe from left to right, and from the right side we get confirmations that the balls have arrived. We start sending a stream of balls; let's look at its cross-section. At first the balls fly in one direction, they are acknowledged, and the number of balls in flight grows exponentially. At some point there are so many balls that they slow down and start getting lost. After a loss we back off a little, cutting our window in half, and then try to understand what happened. This first stage is called TCP Slow Start.







After we have halved the window, we can recover: the receiving side asks us to send the lost balls again, shouting that it needs them, and we answer: here are your balls. This phase is called Fast Retransmit and Fast Recovery.







Once we see that everything is fine, we begin to gradually increase the number of balls we send, starting from that halved window. This phase is called Congestion Avoidance: we are trying to avoid losing packets.



The step where the window is cut in half is called Multiplicative Decrease, and the slow growth in the number of balls is called Additive Increase.



When Congestion Avoidance runs into packet loss again, the cycle repeats. But right now we are more interested in the shape of this graph. It looks like a sawtooth, and this sawtooth will come up several more times, so remember what it looks like.







Let's return to the problems of conventional TCP congestion control. Continuing the pipe analogy, we pour packets into it. Since there are other users on the Internet besides us, they pour their packets into the pipe too. At some point the buffers of the routers can overflow and create problems for sending our packets.



There is also the problem of packet loss in wireless networks. Most likely your laptop no longer has an Ethernet port and you are watching this talk over Wi-Fi. Packet loss in Wi-Fi and mobile networks is caused not by router congestion but by radio interference, so loss as a congestion signal is not very useful to us there.



How TCP BBR differs from other algorithms







Here we come to BBR. The name stands for Bottleneck Bandwidth and Round-trip propagation time: the bandwidth of the bottleneck when we do not completely clog the channel, and the time it takes a packet to travel from us to the server and back.



When we send data, the amount of data that can be in flight in this ideal steady state — sent but not yet acknowledged — is called the bandwidth-delay product (BDP). We can push more than the BDP into the network by filling the buffers of network devices, and when those buffers overflow, losses begin.
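
As a worked example with illustrative numbers, here is the BDP for a 100 Mbit/s bottleneck and the 66 ms round trip mentioned at the beginning of the talk.

```python
# Worked example (illustrative numbers): the bandwidth-delay product for a
# 100 Mbit/s bottleneck and the ~66 ms RTT mentioned earlier in the talk.
bandwidth_bits_per_s = 100e6        # bottleneck bandwidth, 100 Mbit/s
rtt_s = 0.066                       # round-trip time, 66 ms

bdp_bytes = bandwidth_bits_per_s / 8 * rtt_s
print(f"BDP = {bdp_bytes / 1024:.0f} KiB can be in flight unacknowledged")
# -> roughly 806 KiB; anything beyond that sits in device buffers or gets dropped
```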



The usual TCP algorithms work on the right side of this graph, where losses occur: we pour in so many packets that losses become inevitable, the packets slow down, and we start collapsing the window.



BBR works on a different principle, close to our pipe analogy: we pour in only as many packets as the pipe can pass. In the startup phase, at the very beginning, we pour in packets until congestion starts.



Sometimes packet loss is still possible, but BBR tries to avoid it. When we have overfilled the pipe, we back off; this phase is called drain.







We return to a stable state where the pipe is completely full, but we are not using the additional buffers, the additional reservoirs. From this position BBR continues to operate.



From time to time we look at what is happening with the network. We track almost every acknowledgement that comes back to us, and based on them we periodically try to slightly increase the rate at which we send packets into the network.







If there are no problems, we stay at this value, that is, keep working at the pace that is comfortable for us. If there were losses, we roll back.



When we receive acknowledgements and see that the speed has improved, we wait a little, looking at an interval of about ten seconds. If during this interval the sending rate grows and packets are acknowledged faster, we enter the ProbeRTT phase and check whether things really have become better.







These phases alternate: we are constantly checking the state of the network.



The BBR algorithm is no longer driven by packet loss, but by channel bandwidth and packet round-trip time.







In fact, it is largely immune to packet loss: it practically does not react to it, and that itself causes some problems. Google has promised that these issues will be fixed in BBR v2.



We have gone through the phases, and here is the sawtooth again, which I have already shown. Ordinary TCP congestion control is highlighted in red: it ramps up, ramps up, slows down, and loses packets again. BBR, on the other hand, settles on the pace it needs, works at it the whole time, and constantly checks whether the network has become a little better and whether it can accelerate.



Our metrics are constantly updated: we track every acknowledgement from the client side and check whether the network has sped up or not.



How is this sending rate controlled? We control the pace of sending with a technique called pacing. It is implemented in the scheduler I mentioned earlier, the FQ scheduler, and also in the kernel's TCP stack itself, but I will get to that later.



As with the pipe, we try to pour in more data while not slowing down and not losing packets. But BBR is not that simple. Most likely you run in containers or use multiple servers, for databases or perhaps for images.







All these servers talk to each other, and on them ordinary TCP congestion control is enabled, not BBR. When the sawtooth we have already seen appears and the window starts to collapse, BBR may sense that the window is collapsing and increase its own sending rate. In doing so it will squeeze ordinary TCP out of the network and dominate it.



If the network is very bad, other problems are possible: regular TCP will stop working altogether, while BBR, being practically insensitive to packet loss, will keep working at some rate.



We can solve this problem inside the data center with the TCP_CONGESTION socket option, which can be set per socket, per connection. As far as I know, almost no web server implements this option today, but our L7 balancer supports it. And back to pacing: if you are running older kernels, there was a bug in the kernel's pacing implementation prior to version 4.20, so in that case it is worth using the FQ scheduler.
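
A minimal sketch of how a per-connection algorithm could be selected with this option (the host is a placeholder; Linux only, and the chosen algorithm must be loaded and permitted by the kernel).

```python
# Minimal sketch: choosing the congestion control algorithm per socket with
# TCP_CONGESTION, e.g. BBR towards clients and CUBIC inside the data center.
# Linux only; the host is a placeholder. The constant is defined in case the
# running Python version does not expose socket.TCP_CONGESTION.
import socket

TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 13)   # 13 on Linux

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# May require the tcp_bbr module to be loaded and "bbr" to be listed in
# net.ipv4.tcp_allowed_congestion_control (or CAP_NET_ADMIN).
sock.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, b"bbr")
sock.connect(("example.com", 443))

name = sock.getsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, 16)
print(name.split(b"\0", 1)[0])   # b'bbr'
sock.close()
```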







Now that you know how TCP works, you can go to your system administrator and tell him why you should enable BBR.



Let's go back to our ten percent. Where does it come from? Carrier networks are now very large; it all comes down to money. You can build links for 100 or 200 terabits and push through a huge amount of 4K video, for example. But your client is still at the endpoint.



And most likely this last mile to the client will be the source of problems: Wi-Fi and LTE will lose packets. With regular TCP we will see slowdowns; BBR solves this problem, and you can turn it on with just the two settings I showed. Thank you all.


