What you need to know if you switch from nginx to envoy: impressions after two years





We use envoy as a front edge proxy that routes incoming traffic to several kubernetes clusters (for new services) and to the backends of our legacy architecture. That is, it combines the roles of a regular load balancer, an SSL termination point, and an API gateway.



Before envoy, we had nginx there, like many others. Cool software, I like it. The whole envoy story began when microservices started multiplying and even ansible templates no longer saved us from the growing time spent managing the nginx config. Rollouts took a long time, and the admins were demoralized by monotonous requests like "set me up a domain for a new service." Better™ automation was clearly needed. Ideally, whoever needs to launch something should be able to do it themselves, preferably in the same place where they configure the other parameters of their service. In addition, I wanted more transparency into what happens inside the front proxy and on the segment between it and the upstreams, plus more native balancing capabilities (retries of various kinds, ejection of unhealthy hosts under certain conditions, health checks). And the cutting-edge technology was attractive, of course.



Long story short, here is a translation of the article about Dropbox's migration to envoy; it goes into a lot of detail on how envoy compares with nginx. I will tell you more about my personal impressions of the results of the transition.



The most important and obvious fact for anyone who has dealt with scalable software: be prepared to pay for it. You pay with a more complex setup (data plane + control plane) and, if your upstreams live not only in kubernetes, possibly with writing your own control plane. In envoy's case specifically, you also pay for the relative youth of the software: some features that are commonplace in nginx are missing, and you have to update more often as those features get added. For example, you may discover that the standard options lack things nginx does by default, such as merging slashes in :path, stripping the port from the Host header, or, God forgive me, rewrite by regexp. As of today, everything on this list has been added, but you will surely find something else.



Positive things



Awful documentation! On the positive side, the envoy team finally hired a tech writer at the end of last year, and things have become much friendlier. At least you no longer need to trace how a request is processed through the source code, learn what some options do exclusively from the answers to your own issue, or be a level 80 google master just to find the options themselves. A lot of this is now in the past. The authors still do not bother to mark which envoy version introduced a given option or to link to issues in the release notes, but at least breaking changes are now listed in a dedicated section of each release, so there is visible progress.



Extended telemetry



Here all hopes were justified: our envoy grafana dashboard now kills any unprepared browser with the sheer number of graphs. But seriously, you can now conveniently monitor what happens to traffic at every stage of its journey, which helps especially in those thrilling detective stories known as post-incident investigations. And, of course, in detecting anomalies.





Anomaly: "Hello." A fragment from the same envoy grafana dashboard.



Control plane



Well, and most importantly, the thing for which all of this was started: we solved the problem of automatic route management. A few words about the approach, in case someone is not familiar with it: the control plane acts as the controller of the data. It manages their storage and generates a config, which is then sent to envoy (the stateless data plane).
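To make the split a bit more tangible, here is a minimal Python sketch of the idea. Everything in it is an assumption made for illustration: the `Service` record, the `ControlPlane` class, and the shape of the rendered snapshot are hypothetical and much simpler than the real xDS resources that envoy consumes. It only shows the principle: the control plane owns the state, renders a complete config from it, and stamps it with a version so the data plane can tell whether anything changed.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Service:
    """Hypothetical record that a service owner registers with the control plane."""
    name: str
    domain: str
    endpoints: tuple  # (host, port) pairs, in kubernetes or in the legacy infrastructure


@dataclass
class ControlPlane:
    """Toy control plane: keeps the state and renders a versioned config snapshot."""
    services: dict = field(default_factory=dict)

    def register(self, service: Service) -> None:
        self.services[service.name] = service

    def snapshot(self) -> dict:
        # Sort by name so the same state always renders the same config.
        ordered = sorted(self.services.values(), key=lambda s: s.name)
        body = {
            "routes": [{"domains": [s.domain], "cluster": s.name} for s in ordered],
            "clusters": [
                {
                    "name": s.name,
                    "endpoints": [{"address": h, "port": p} for h, p in s.endpoints],
                }
                for s in ordered
            ],
        }
        # The version is a hash of the content: it changes only when the config does.
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return {"version": digest[:12], **body}


cp = ControlPlane()
cp.register(Service("billing", "billing.example.com", (("10.0.0.5", 8080),)))
cp.register(Service("legacy-crm", "crm.example.com", (("192.168.7.20", 80),)))
print(json.dumps(cp.snapshot(), indent=2))
```

The proxy side keeps nothing of its own here: it receives the whole snapshot and can be restarted or replaced at any time, which is what "stateless data plane" means in practice.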



If kubernetes alone is your backend, you can take a ready-made control plane such as ambassador. But we also had to manage the old infrastructure, and there were several clusters. So we had to take one of the data plane API implementations offered by the envoy project and bolt on all the features we needed, connecting this part of the infrastructure with the automation in kubernetes. But that is a topic for another interesting story.



Impressions from the process of switching to envoy: "for some reason there were no particular problems, very suspicious."



In short, where to start and what to be ready for right away. After meditating on the envoy documentation and accepting the futility of existence, take two virtual hosts from the old front proxy (the simplest, most typical one and the most sprawling one) and bring them up in envoy, sorting out the options along the way.



The main thing to keep in mind here is that nginx and envoy take very different approaches to writing configs, so be prepared for sharp turns such as writing a 26-line tree of RBAC rules instead of two simple allow/deny entries. In general, accept that a slightly exploding head is normal here, since the envoy config is designed for ease of automation first, not for human readability.
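To give a feel for that allow/deny versus RBAC difference, here is a small Python sketch that renders an envoy HTTP RBAC filter roughly equivalent to nginx's `allow <cidr>; deny all;`. It is not our production config. The helper `rbac_allow_filter`, the policy name, and the example networks are made up for illustration, and matching on `remote_ip` (rather than `direct_remote_ip`) is an assumption that depends on how your proxy learns the client address.

```python
import ipaddress
import json


def rbac_allow_filter(allowed_cidrs, policy_name="allow-listed-networks"):
    """Build an HTTP RBAC filter that admits only the given networks.

    Hypothetical helper: with action ALLOW, a request that matches no
    policy is denied, which plays the role of nginx's `deny all;`.
    """
    principals = []
    for cidr in allowed_cidrs:
        net = ipaddress.ip_network(cidr)
        principals.append(
            {
                "remote_ip": {
                    "address_prefix": str(net.network_address),
                    "prefix_len": net.prefixlen,
                }
            }
        )
    return {
        "name": "envoy.filters.http.rbac",
        "typed_config": {
            "@type": "type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBAC",
            "rules": {
                "action": "ALLOW",
                "policies": {
                    policy_name: {
                        # Any path and method, as long as the source address matches.
                        "permissions": [{"any": True}],
                        "principals": principals,
                    }
                },
            },
        },
    }


rbac_filter = rbac_allow_filter(["10.0.0.0/8", "192.168.1.0/24"])
print(json.dumps(rbac_filter, indent=2))
```

Two lines of nginx turn into a nested structure of a couple of dozen lines, which is painful to write by hand but trivial to generate, and that is the point of the design.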



You may need to put together a cheat sheet that maps options between the two, and to verify that they actually do what you think they do. That is how we once discovered that slash merging in the URL (even after it was added to envoy) works differently: nginx did not change the :path that was sent to the upstream, while envoy did a full rewrite. That would have been fine, but the rewrite came with a bug that turned :path into complete garbage. After our issue it was fixed too, but be careful.
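The difference is easier to see on a toy model. The Python sketch below does not reimplement either proxy. The functions `nginx_like` and `envoy_like` are hypothetical and only model the behavior we observed in our setup: both match routes on the merged path, but only the second one forwards the merged path to the upstream.

```python
import re


def merge_slashes(path: str) -> str:
    """Collapse runs of '/' in the path part, leaving the query string untouched."""
    path_only, sep, query = path.partition("?")
    return re.sub(r"/{2,}", "/", path_only) + sep + query


def nginx_like(path: str) -> dict:
    # Observed behavior: routing uses the merged path, the upstream gets the original.
    return {"matched_on": merge_slashes(path), "sent_upstream": path}


def envoy_like(path: str) -> dict:
    # Observed behavior: the merged path replaces :path for routing and for the upstream.
    merged = merge_slashes(path)
    return {"matched_on": merged, "sent_upstream": merged}


request_path = "/api//v1///users?next=//home"
print(nginx_like(request_path))
print(envoy_like(request_path))
```

If an upstream happens to depend on the doubled slashes it used to receive, this kind of difference only shows up after the switch, so a check like this belongs in the cheat sheet.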



Speaking of issues, I can't help but mention one more positive impression.



Friendly community and developers



Since envoy is CNCF-hosted open source, you can, as is traditional, simply come to the project's GitHub and propose an improvement or ask a question. There is a wild number of issues and the developers are clearly short of hands, yet the worst thing that can happen to your question is that it gets ignored. Most of the time you get at least some answer, even if it is something short along the lines of "sorry, we do not plan to do this." No toxicity, even toward newbie questions; the atmosphere is very friendly.





Atmospheric topics, especially corgis. Screenshot of the envoy public repository on github.com



Well, and as usual, pull requests are welcome. Help is offered even to those who are not particularly good at C++. There are also a number of issues marked with the beginner tag, in case someone wants to contribute and doesn't know where to start.



In addition to GitHub, there are also mailing lists and Slack, but the latter is more often a mess. :)



As for events, there is EnvoyCon, which is currently held online, but I still recommend it.



Outcome



In general, you do not need envoy just because it is trendy and hip, "everyone is switching," and the founder has a funny hairstyle. Stay where you are until it starts to pinch. If you have a startup or just a small project, it is definitely better to stick with nginx, because it is simple and nice. The main thing is to get started.



If you have many services, many developers, and kubernetes, and none of the tradeoffs described above scare you, then it is worth considering and giving it a try.



Good luck and maybe see you at EnvoyCon!


