How we at Dropbox switched from Nginx to Envoy

In this article, we talk about our old Nginx-based infrastructure, its pain points, and the benefits we gained by migrating to Envoy. We compare Nginx and Envoy along several dimensions. We also briefly touch on the migration process, its current state, and the problems we encountered along the way.





By switching most of our traffic to Envoy, we seamlessly migrated a system that handles tens of millions of open connections, millions of requests per second, and terabits of bandwidth. In the process, we became one of the largest Envoy users in the world.



Disclaimer: although we have tried to remain objective, quite a few of the comparisons apply only to Dropbox and our software development principles: we bet on Bazel, gRPC, C++, and Golang.



, Nginx , .



Our old Nginx-based infrastructure



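Our Nginx configuration was static and rendered with a combination of Python2, Jinja2, and YAML. Any change to it required a full redeployment. All dynamic parts, such as upstream management and the stats exporter, were written in Lua. Any sufficiently complex logic was moved to the next proxy layer, written in Go.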



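Nginx served us well for years, but it did not keep up with our current development practices:

  • Our internal and (private) external APIs are gradually migrating from REST to gRPC, which requires all sorts of transcoding features from proxies.
  • Protocol buffers have become the de facto standard for service definitions and configurations.
  • All software, regardless of language, is built and tested with Bazel.
  • Our engineers are heavily involved in essential infrastructure projects in the open-source community.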


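Operationally, Nginx was also quite expensive to maintain:

  • Config generation logic was too flexible and was split between YAML, Jinja2, and Python.
  • Monitoring was a mix of Lua, log parsing, and system-level monitoring.
  • Growing reliance on third-party modules affected stability, performance, and the cost of subsequent upgrades.
  • Nginx deployment and process management differed from the rest of our services and depended heavily on the configuration of other systems: syslog, logrotate, and so on.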


For these reasons, we started looking for a replacement for Nginx.



Why not Bandaid?



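As we often mention, internally we rely heavily on Bandaid, our proxy written in Go. It integrates well with Dropbox infrastructure because it has access to the whole ecosystem of internal Go libraries: monitoring, service discovery, rate limiting, and so on. We considered migrating from Nginx to Bandaid, but several issues prevented us from doing so:

  • Golang is more resource-intensive than C/C++. Low resource usage matters especially on the Edge, where we cannot easily "auto-scale" our deployments.

    • The CPU overhead comes mostly from garbage collection (GC), the HTTP parser, and TLS, the latter being less optimized than BoringSSL, which is used by Nginx\Envoy.
    • The "goroutine-per-request" model and GC overhead greatly increase memory requirements in services with many connections, like ours.
  • There is no FIPS support for the Golang TLS stack.
  • Bandaid has no community outside of Dropbox, so we can rely only on ourselves for feature development.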


So we decided to migrate our traffic infrastructure to Envoy instead.



, Envoy



, , , Envoy , Nginx Envoy.





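Nginx has an event-driven, multi-process architecture. It supports SO_REUSEPORT, EPOLLEXCLUSIVE, and worker-to-CPU pinning. Although it is built around an event loop, it is not fully non-blocking: some operations, such as opening a file or writing access and error logs, can stall the event loop (even with aio, aio_write, and thread pools enabled). This increases tail latencies, which can reach multi-second delays on spinning disks.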



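Envoy has a similar event-driven architecture, except that it uses threads instead of processes. It also supports SO_REUSEPORT (with BPF filter support) and relies on libevent for its event loop (in other words, no epoll(2) tricks such as EPOLLEXCLUSIVE). Envoy performs no blocking I/O in the event loop. Even logging is non-blocking, so it does not cause stalls.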



, Nginx Envoy . — , Nginx Envoy. , , , .



: RPS, , gRPC . . Nginx , . Envoy , envoy-perf, , , . "hulk", "" .



:



  • Nginx . -, SO_REUSEPORT, .
  • Nginx Envoy, Lua Nginx RPS . , lua_shared_dict, mutex. , . - counter(9) FreeBSD, : , , . , Nginx ( , ), , .


Envoy , 60% , Nginx.





, , . , , .



Out of the box, Nginx offers only a basic "stub status" page, which looks like this:



Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
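
To show how little this page exposes, here is a minimal sketch of a parser for the stub status text above. The function name and dictionary keys are our own illustration, not part of Nginx or of any tooling described in this article.

```python
import re

# The sample is copied from the stub status output shown above.
STUB_STATUS_SAMPLE = """\
Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> dict:
    """Turn the plain-text stub status page into a dict of integer counters."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    # The totals line holds three bare integers: accepts, handled, requests.
    totals = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    states = re.search(r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text)
    if not (active and totals and states):
        raise ValueError("unexpected stub status format")
    accepts, handled, requests = (int(v) for v in totals.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


stats = parse_stub_status(STUB_STATUS_SAMPLE)
```

The whole page yields seven numbers per instance, with no breakdown by upstream, status code, or protocol.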


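That was not enough, so we added a simple log_by_lua handler that emits per-request stats based on the headers and variables available in Lua: status codes, sizes, cache hits, and so on. Here is an example of a simple stats-emitting function: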



function _M.cache_hit_stats(stat)
    if _var.upstream_cache_status then
        if _var.upstream_cache_status == "HIT" then
            stat:add("upstream_cache_hit")
        else
            stat:add("upstream_cache_miss")
        end
    end
end


, , error.log, upstream, http, Lua TLS.



Nginx: , , RSS\VMS, TLS .



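Envoy exposes thousands of distinct metrics (in Prometheus format) that describe both the proxied traffic and the internal state of the server: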



$ curl -s http://localhost:3990/stats/prometheus | wc -l
14819
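
Note that wc -l counts every line, including comment lines. As a rough sketch, the Prometheus text format can be summarized a little more precisely by counting samples and distinct metric names. The sample payload below is made up for illustration and is not real output from our Envoy instances.

```python
# Hypothetical, shortened payload in the Prometheus text exposition format.
SAMPLE_STATS = """\
# TYPE envoy_cluster_upstream_rq_total counter
envoy_cluster_upstream_rq_total{envoy_cluster_name="a"} 10
envoy_cluster_upstream_rq_total{envoy_cluster_name="b"} 5
# TYPE envoy_server_uptime gauge
envoy_server_uptime{} 42
"""


def summarize_prometheus_text(text: str):
    """Return (number of samples, set of metric names), skipping comments."""
    names = set()
    samples = 0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "# HELP" / "# TYPE" comments
        samples += 1
        # The metric name ends at the first "{" (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return samples, names


sample_count, metric_names = summarize_prometheus_text(SAMPLE_STATS)
```

In practice the text would come from the /stats/prometheus endpoint shown above rather than from a string literal.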


:



  • \ upstream\ , .
  • : TCP\HTTP\TLS
  • , , .


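Envoy's admin interface deserves a special mention. Besides the /certs, /clusters, and /config_dump endpoints, it provides important introspection tools:

  • /logging, which lets us change log levels on the fly.
  • /cpuprofiler, /heapprofiler, and /contention, which help with performance troubleshooting.
  • /runtime_modify, which lets us change a set of parameters without pushing a new configuration, and so on.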


Envoy . , , , Edge . Nginx OpenTracing, .



Envoy gRPC, syslog-to-hive (, .. Envoy syslog . ). ( !) gRPC, TCP\UDP.



Envoy, , gRP: Access Log Service (ALS). Envoy data plane , .





Nginx "". . ( , , \ ..) ( syslog, HTTP). , Nginx shell-. .



Envoy , data plane control plane, , , . protobuf gRPC, API, xDS. Envoy ( ) xDS. Envoy, UDPA (universal data plane interface) : "de facto" L4\L7. — . ORCA (Open Request Cost Agregation) UDPA , Envoy, Katran, eBPF\XDP L4.



Dropbox, API gRPC. xDS control plane, Envoy , , . Dropbox RPC — , , , , gRPC.



xDS, Nginx, , :





Envoy . control plane Envoy, Istio go-control-plane. Envoy API xDS. gRPC . Golang Envoy API xDS. , , cron\logrotate\syslog\ ..





Nginx . , . — Python2, Jinja2 YAML. , erb, pug, Text::Template m4 ( ! . ):



{% for server in servers %}
server {
    {% for error_page in server.error_pages %}
    error_page {{ error_page.statuses|join(' ') }} {{ error_page.file }};
    {% endfor %}
    ...
    {% for route in service.routes %}
    {% if route.regex or route.prefix or route.exact_path %}
    location {% if route.regex %}~ {{route.regex}}{%
            elif route.exact_path %}= {{ route.exact_path }}{%
            else %}{{ route.prefix }}{% endif %} {
        {% if route.brotli_level %}
        brotli on;
        brotli_comp_level {{ route.brotli_level }};
        {% endif %}
        ...


Nginx : \ . YAML anchors, Jinja2 — , , Python . . , :



  • . — .
  • C. , , . nginx -t.


Envoy : Protocol Buffers. , . , , protobuf , \ — .



Envoy protobuf Python3. proto, Python. :



from google.protobuf.wrappers_pb2 import UInt32Value

from dropbox.proto.envoy.extensions.filters.http.gzip.v3.gzip_pb2 import Gzip
from dropbox.proto.envoy.extensions.filters.http.compressor.v3.compressor_pb2 import Compressor


def default_gzip_config(
    compression_level: Gzip.CompressionLevel.Enum = Gzip.CompressionLevel.DEFAULT,
) -> Gzip:
    return Gzip(
        # Envoy's default is 6 (Z_DEFAULT_COMPRESSION).
        compression_level=compression_level,
        # Envoy's default is 4k (12 bits). Nginx uses 32k (MAX_WBITS, 15 bits).
        window_bits=UInt32Value(value=12),
        # Envoy's default is 5. Nginx uses 8 (MAX_MEM_LEVEL - 1).
        memory_level=UInt32Value(value=5),
        compressor=Compressor(
            content_length=UInt32Value(value=1024),
            remove_accept_encoding_header=True,
            # default_compressible_mime_types() is not shown in this snippet.
            content_type=default_compressible_mime_types(),
        ),
    )


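Note the Python3 type annotations in this code! Combined with the mypy-protobuf plugin, they give us end-to-end typing inside the config generator. IDEs that can check them will immediately highlight type mismatches. Even so, a type-checked protobuf can still be logically invalid. In the example above, the Gzip window_bits field accepts only values from 9 to 15. Such a restriction is easy to define with protoc-gen-validate: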



google.protobuf.UInt32Value window_bits = 9 [(validate.rules).uint32 = {lte: 15 gte: 9}];
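
To make the effect of that rule concrete, here is a hand-written Python sketch of the same bounds check. It is not the code that protoc-gen-validate generates; the constant and function names are assumptions chosen for illustration.

```python
# Bounds taken from the validation rule above: {lte: 15 gte: 9}.
WINDOW_BITS_MIN = 9
WINDOW_BITS_MAX = 15


def validate_window_bits(window_bits: int) -> None:
    """Raise ValueError if window_bits falls outside the allowed range."""
    if not WINDOW_BITS_MIN <= window_bits <= WINDOW_BITS_MAX:
        raise ValueError(
            f"window_bits must be in [{WINDOW_BITS_MIN}, {WINDOW_BITS_MAX}], "
            f"got {window_bits}"
        )


validate_window_bits(12)  # the value used in default_gzip_config: accepted

try:
    validate_window_bits(16)
    rejected = False
except ValueError:
    rejected = True  # a well-typed but logically invalid value is caught
```

The point of declaring the rule in the proto file is that this check lives next to the field definition instead of in every config generator.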


— , :



// Value from 1 to 9 that controls the amount of internal memory used by zlib. Higher values
// use more memory, but are faster and produce better compression results. The default value is 5.
google.protobuf.UInt32Value memory_level = 1 [(validate.rules).uint32 = {lte: 9 gte: 1}];


, protobuf , , , Harvey Tuch, Envoy.





Nginx - , , C. . , . Nginx. , , : hash-, , - , ( RAII), HTTP. , pcre, zlib, openssl libc.



Nginx Perl Javascript. , .



lua-nginx-module OpenResty. . log_by_lua balancer_by_lua backend.



Nginx C++, , . - . , .



Envoy — C++. Nginx, . - , :



  • . — .
  • C++14. , -, . C++14 Golang — Python. (! . )
  • C++14 . abseil, C++, mutex , , \ , .


Envoy Vortex2 ( framework ) 200 stats.



Envoy Lua moonjit, LuaJIT c Lua 5.2. Lua Nginx , Lua Envoy - , . , Lua, , Lua C++ Envoy.



Envoy , WebAssembly (WASM) — , . WASM . . Envoy WebAssembly for Proxies ( SDK C++ Rust), WASM L4\L7. , WASM . , proxy-wasm xDS, A\B . Kubecon'19 ( , ?) WASM Envoy . 60-70% C++.



WASM . , , proxy-wasm ABI. , WebAssembly. , C++.



Istio WebAssembly, WebAssemblyHub . .



Dropbox WebAssembly, , proxy-wasm SDK Go.





Nginx shell-, make. , Bazel , , . Google Nginx Bazel, Nginx, BoringSSL, PCRE, ZLIB Brotli.



, Nginx unit-.



Lua Python:



class ProtocolCountersTest(NginxTestCase):
    @classmethod
    def setUpClass(cls):
        super(ProtocolCountersTest, cls).setUpClass()
        cls.nginx_a = cls.add_nginx(
            nginx_CONFIG_PATH, endpoint=["in"], upstream=["out"],
        )
        cls.start_nginxes()

    @assert_delta(lambda d: d == 0, get_stat("request_protocol_http2"))
    @assert_delta(lambda d: d == 1, get_stat("request_protocol_http1"))
    def test_http(self):
        r = requests.get(self.nginx_a.endpoint["in"].url("/"))
        assert r.status_code == requests.codes.ok


( ip- 127.0.0.1/8, ..) nginx -c.



Envoy, Bazel, : Bazel . copybara protobuf Envoy, UDPA. , .



Envoy unit- ( gtest\gmock) , , . , .



Envoy 100% unit-. CI Azure .



google\benchmark:



$ bazel run --compilation_mode=opt test/common/upstream:load_balancer_benchmark -- --benchmark_filter=".*LeastRequestLoadBalancerChooseHost.*"
BM_LeastRequestLoadBalancerChooseHost/100/1/1000000          848 ms          449 ms            2 mean_hits=10k relative_stddev_hits=0.0102051 stddev_hits=102.051
...


Envoy unit- :



TEST_F(CourierClientIdFilterTest, IdentityParsing) {
  struct TestCase {
    std::vector<std::string> uris;
    Identity expected;
  };
  std::vector<TestCase> tests = {
    {{"spiffe://prod.dropbox.com/service/foo"}, {"spiffe://prod.dropbox.com/service/foo", "foo"}},
    {{"spiffe://prod.dropbox.com/user/boo"}, {"spiffe://prod.dropbox.com/user/boo", "user.boo"}},
    {{"spiffe://prod.dropbox.com/host/strange"}, {"spiffe://prod.dropbox.com/host/strange", "host.strange"}},
    {{"spiffe://corp.dropbox.com/user/bad-prefix"}, {"", ""}},
  };
  for (auto& test : tests) {
    EXPECT_CALL(*ssl_, uriSanPeerCertificate()).WillOnce(testing::Return(test.uris));
    EXPECT_EQ(GetIdentity(ssl_), test.expected);
  }
}


. . unit- , Envoy.



Bazel — , - . , : , , / ..



Bazel , . . , , , ..





Nginx , . : zlib ( ), - TLS PCRE. Nginx , . , libc.



Nginx , OpenBSD. OpenBSD httpd. BSDCon.



, Nginx 30 11 .



Envoy , , C++ , C, Nginx. . , , . .



Envoy . AddressSanitizer, ThreadSanitizer MemorySanitizer. fuzzing.



, IT, OSS-Fuzz, fuzzing. .



, , . 22 .



Envoy , . Envoy Google's Vulnerability Reward Program (VRP). Google , , .



, , CVE-2019–18801



Ubuntu Debian, hardened , Edge. ASLR, :



build:hardened --force_pic
build:hardened --copt=-fstack-clash-protection
build:hardened --copt=-fstack-protector-strong
build:hardened --linkopt=-Wl,-z,relro,-z,now


fork, Nginx, , - , — 1000 . Envoy, , .



, . BoringSSL FIPS, . ASAN Edge.





, .



Nginx , . : , ( ), .



Nginx , . HTTP/2 , gRPC , . gRPC. " " , . , , , .



Envoy ingress\egress , gRPC . : , , . Nginx, Envoy upstream .



Envoy , Envoy S3 . eCache, HTTP Envoy.



Envoy , gRPC:





Envoy , :



  • Egress , Envoy HTTP CONNECT — Squid . Squid Envoy, , data plane ( ).
  • : Courier gRPC Envoy service mesh. Envoy , . Envoy . Hadoop . Superset airflow, presto hive. Grafan MySQL.




Nginx , . , bug tracker. #nginx IRC FreeNode, .



Envoy : \ GitHub, ( Zoom , . ). Slack, .



, HTTP/3.



QUIC HTTP/3 Nginx F5. , . , Cloudflare . Nginx ( — , ! . )



Envoy , quiche. , , , , " ".



, . , Envoy, , gRPC .





Nginx Envoy , DNS. Envoy :



  • Ingress . Dropbox gRPC Envoy. Envoy Edge.
  • Ingress RPS. , gRPC, - .
  • . , HTTP ( ). gRPC long-polling.
  • RPS. API ( ). API gRPC. API REST Edge.
  • Egress, . — AWS, S3. Squid , L4\L7 data plane.


, , www.dropbox.com. Edge Nginx. .



,



. - . API. Dropbox API, curl\wget HTTP/1.0 HTTP. Nginx "de facto", , . Nginx Envoy, API, Envoy . , .



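Non-standard and non-RFC-compliant behaviors:

  • Merging slashes in URLs. URL normalization differs between the two proxies: Nginx merges adjacent slashes by default, while Envoy does not. In Envoy this behavior is opt-in through the merge_slashes option.
  • Ports in virtual host names. Nginx accepts the Host header in both forms: example.com and example.com:port. Some of our API users relied on this. We first worked around it by duplicating virtual hosts in our configuration (with and without the port), and later used an option on the Envoy side: strip_matching_host_port.
  • Transfer-encoding case sensitivity. A small subset of API clients for some reason sent Transfer-Encoding: Chunked (note the capital C). This is technically valid, since RFC7230 states that Transfer-Encoding/TE values are case-insensitive. The fix was trivial.
  • Requests that have both Content-Length and Transfer-Encoding: chunked. Such requests worked with Nginx but broke with Envoy. RFC7230 is tricky here: the general idea is that servers should reject these requests, because they are likely "smuggled". On the other hand, proxies are allowed to drop the Content-Length header and forward the request. We extended http-parser to let library users opt in to such requests, and support is being added to Envoy.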
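
The behaviors in the list above (merge_slashes, strip_matching_host_port, the Chunked capitalization, and Content-Length sent together with Transfer-Encoding: chunked) can be sketched in a few lines of Python. This is an illustration of the logic only, under our own function names; it is not how Nginx or Envoy implement it.

```python
import re


def merge_slashes(url: str) -> str:
    """Collapse runs of slashes in the path, leaving the query string alone."""
    path, sep, query = url.partition("?")
    return re.sub(r"/{2,}", "/", path) + sep + query


def strip_matching_host_port(host: str, listener_port: int) -> str:
    """Drop ':port' from Host only when it equals the listener's port."""
    name, sep, port = host.rpartition(":")
    if sep and name and port.isdigit() and int(port) == listener_port:
        return name
    return host


def is_chunked(transfer_encoding: str) -> bool:
    """Transfer-coding names are case-insensitive, so 'Chunked' counts."""
    return transfer_encoding.split(",")[-1].strip().lower() == "chunked"


def has_conflicting_framing(headers: dict) -> bool:
    """True when a message carries both Content-Length and chunked encoding."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return "content-length" in lowered and is_chunked(
        lowered.get("transfer-encoding", "")
    )


merged = merge_slashes("/a//b///c?x=//y")
stripped = strip_matching_host_port("example.com:443", 443)
kept = strip_matching_host_port("example.com:8080", 443)
conflict = has_conflicting_framing(
    {"Content-Length": "5", "Transfer-Encoding": "Chunked"}
)
```

What to do with a conflicting request (reject it, or drop Content-Length and forward it) is a policy decision that the sketch deliberately leaves out.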


, :



  • circuit-breaking. Envoy , HTTP/1 HTTP/2, circuit breakers . , , Envoy mesh .
  • . Nginx , , HTTP/1.0, chunked. Nginx Content-Length, . Envoy Buffer, , , .


Envoy — . , .



?



  • HTTP/3. . Envoy . Linux UDP Edge.
  • xDS . Load Reporting service (LRS) Endpoint discovery service (EDS) Envoy, gRPC.
  • Envoy WASM. Golang proxy-wasm SDK — Envoy Go, Golang.
  • Bandaid. Dropbox data plane — . , Bandaid ( ) Envoy. , .
  • Envoy mobile. , Envoy . (HTTP/3, gRPC, TLS 1.3 ..) .




. Traffic Runtime, : Agata Cieplik, Jeffrey Gensler, Konstantin Belyalov, Louis Opter, Naphat Sanguansin, Nikita V. Shirokov, Utsav Shah, Yi-Shu Tai, Envoy, .



Runtime Ruslan Nigmatullin, Envoy, Envoy MVP, .



