PostgreSQL on K8s at Zalando: two years in combat. Alexander Kukushkin (Zalando)

We all know that most DBAs are very conservative and prefer their databases to live exclusively on dedicated servers. In the modern world of microservices, Kafka and Kubernetes, the number of databases grows in direct proportion to the size of the organization and very quickly exceeds what can comfortably be managed manually or semi-automatically.

I have been working at Zalando for almost 7 years now. How many of you have heard of Zalando?

  • For those who have not: it is a company similar to Lamoda in Russia.

  • We sell clothes and shoes, but we do it in Europe, in 17 countries.

  • We have 7 of our own logistics centers and warehouses.

  • Zalando employs over 15,000 people.

  • Of these, about 2,000 work in technology, spread across roughly 200 teams that write applications.

  • Recently we have been deploying a lot on Kubernetes and working with it extensively.


  • , Kubernetes, , .
  • , Postgres Kubernetes Spilo Patroni.
  • , Postgres-Operator Kubernetes.
  • – , , .

  • Kubernetes . 140 . 50/50 production/test environment. . . cost unit 2 Kubernetes-. , . .
  • production deployment CI/CD. docker image, , CI/CD.
  • production Kubernetes- , . request, 4- , - , . -.

Postgres Kubernetes? . 10 Postgres- Kubernetes-.

Postgres-Operator Postgres Kubernetes , 140, .

Kubernetes, Postgres? . , , Kubernetes.

  • Kubernetes . tools.
  • Kubernetes . .

. -. worker-, , Kubernetes , kubelet, docker, fluentd, kube-proxy . .


- . docker . Kubernetes . , PersistentVolumes PersistentVolumeClaim.

– StatefulSets, , -, , . . . , -, StatefulSets PersistentVolumeClaim PersistentVolumeClaim templates volume, volume , .

Postgres Kubernetes, . , Kubernetes docker. , - .

  • docker image. Spilo. Spilo – . image Postgres, . . , 9.3 12.
  • postgres’ extensions , pg_partman, pg_cron, postgis, etc, timescaledb.
  • tools , pgq, pgbouncer, wal-e/wal-g. , , docker Kubernetes, , image Kubernetes EC2 instance Amazon.
  • HA Patroni,
  • .

Patroni? , , . Postgres, HA.

Patroni Python. Kubernetes. Postgres first class citizen Kubernetes, . . Postgres .

Patroni Postgres Kubernetes supervisor , . . .

Patroni – , , failover . Patroni , . . InitDB Postgres, Patroni point in time recovery, .

, , Patroni .

, Patroni, Postgres. - Postgres, Patroni: « ». .

? StatefulSet. . . PersistentVolume. StatefulSet, demo-0 demo-1.

, – Patroni. Patroni kubernetes’ endpoint. . . , Patroni , . , , , endpoint, IP.

demo — repl. , labelSelector: role = replica. , labelSelector.
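
As a rough illustration of how a label selector such as role = replica picks pods, here is a minimal Python sketch. It only mimics equality-based label matching in memory. The pod dictionaries, the select_pods helper and the label values assigned to demo-0 and demo-1 are assumptions made for the example, not actual Patroni or Kubernetes code.

```python
from typing import Any, Dict, List

Pod = Dict[str, Any]


def select_pods(pods: List[Pod], selector: Dict[str, str]) -> List[str]:
    """Return the names of pods whose labels contain every key/value in selector."""
    matched = []
    for pod in pods:
        labels = pod.get("labels", {})
        if all(labels.get(key) == value for key, value in selector.items()):
            matched.append(pod["name"])
    return matched


# Hypothetical state: demo-0 currently holds the leader role, demo-1 is a replica.
pods = [
    {"name": "demo-0", "labels": {"cluster-name": "demo", "role": "master"}},
    {"name": "demo-1", "labels": {"cluster-name": "demo", "role": "replica"}},
]

replicas = select_pods(pods, {"cluster-name": "demo", "role": "replica"})
print(replicas)
```

If the role labels are swapped, the same selector matches demo-0 instead, so nothing that connects through the selector has to be reconfigured.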


, , YAML manifests. . , YAML. , .

Helm, . . CI/CD deployment. . rolling upgrade. minor Postgres, docker image, ? StatefulSet , StatefulSet, . . .

, , rolling upgrade. rolling upgrade Kubernetes-.

? , : 1, 2, 3. availability , . . -. , volumes .

Kubernetes upgrade, workers, . . . cloud environment AWS, - EC2 instance, . .

? , 3 , 3- . 2 availability .

Kubernetes , . Patroni . enter option , . . connections , . , .


Kubernetes rolling upgrade .


? – .

, 3 failover , . . 3 3 failover. B – 2, C – 2.


, , . . , : « Postgres». , pull request Git. kubectl Amazon. .

, - instance, .




  • Deployments. . .
  • Upgrades clusters. rolling upgrade Postgres. rolling upgrade Kubernetes .
  • : , , .
  • failovers maintenance.

Postgres-Operator. Kubernetes, , . . , , . – , .

Postgres, YAML-. .

-, , ID , . . . Team, , ACID. ? , . . Atomicity, Consistency, Isolation, Durability.

-, volume. – 1 . – 2. Postgres. . : «, , . owner ».


DB deployer. , CI/CD. YAML- CRD-, . Postgres-operator event . StatefulSet - . endpoint, . Postgres, . . superuser , .

Kubernetes , .

rolling upgrade Kubernetes?


3 , 3 . , 3 , .

, . Kubernetes , .

. , , .


, . . switchover = 1.

Switchover . , , , , . . , downtime .

? issues ?

-, Kubernetes- AWS. .

AWS API , API. , - , AWS .

? Kubernetes AWS API , volumes, , , volumes , postgres’ . , . .

, deployment , . , .

EC2 instance Amazon. , , , , . Amazon, EBS volumes instances. ? , . . - , instances. , instance Amazon, volumes . . . 30 , . , .

Kubernetes, , Postgres, , . Postgres . Patroni . Postgres , Patroni . – crash loop. , .

partitions , -. volume . . volume, , throughput IOPS. volume .

auto-extend volumes? Amazon . API. volume 100 , .

, , , , , auto-extend. , , . . .

volumes , .

. , - jobs . .

? HA , Disaster Recovery , wal-e continuous archiving , basebackup.

wal-e – , - . pg_stat_statements 2- . , . , : UPDATE WHERE id IN 150 . . . Postgres – .

Pg_stat_statements 2- . pg_stat_statements , . Kubernetes , , , . .

wal-e , . , , postgres’ - label- . - reinitializing.

– - tools, , , wal-g, pgBackRest. . -, , Postgres 9.6, 9.5 . -, , , .

. wal-e, , basebackup wal-e.

. Out-Of-Memory? docker Kubernetes – . Postgres, , 9. , . production .

. dmesg. , Memory cgroup out of memory Postgres. , ?

? process ID, .

, , . dmesg -T -. OOM system control «oom_score_adj», . Patroni Postgres, . . , .

memory limit 8 , cgroup , 6 + postgres’ shared buffers 2 . 6 . postgres’ , , , .

. . , cgroup shared memory , - .

, shared buffers 25 % 20 %. , , . . .

Postgres 11- . production minor releases, . , , .

. , – , - , shared memory. docker shared memory 64 .

Postgres 11? Postgres 11 parallel hash join. ? worker hash, shared memory. 64 , hash .

? docker dev/shm, .

Kubernetes . . . – tmpfs volume dshm.

, . . volume – enableShmVolume. , , volume. , .
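
The /dev/shm fix can be sketched as a pod-spec fragment. The Python below builds it from plain dictionaries: an emptyDir volume with medium Memory (which Kubernetes backs with tmpfs), named dshm and mounted at /dev/shm. The add_shm_volume helper and the container name are hypothetical; this is an illustration of the idea under those assumptions, not the operator's actual code.

```python
import copy
import json
from typing import Any, Dict


def add_shm_volume(pod_spec: Dict[str, Any], container_name: str) -> Dict[str, Any]:
    """Return a copy of pod_spec with a memory-backed volume mounted at /dev/shm."""
    spec = copy.deepcopy(pod_spec)

    # emptyDir with medium "Memory" is a tmpfs mount, so it replaces the
    # small default shared-memory segment that the container runtime provides.
    spec.setdefault("volumes", []).append(
        {"name": "dshm", "emptyDir": {"medium": "Memory"}}
    )

    for container in spec.get("containers", []):
        if container["name"] == container_name:
            container.setdefault("volumeMounts", []).append(
                {"name": "dshm", "mountPath": "/dev/shm"}
            )
    return spec


# Hypothetical single-container pod; "postgres" is an assumed container name.
base_spec = {"containers": [{"name": "postgres"}]}
patched = add_shm_volume(base_spec, "postgres")
print(json.dumps(patched, indent=2))
```

Memory written to a tmpfs volume generally counts against the container's memory limit, so the limit has to leave room for it.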

Postgres . -, failover , . . Patroni, - events. Patroni failover , .

, , FATAL too many connections. . . 12- Postgres . max_wal_senders max_connections. wal_senders Postgres. .

Postgres – Built-in connection pooler.

– :

  • , cluster manifest, , . , : 100 . , , . , . OOM-Killer . , .

    . , : 4 , 32 . , 5 64 , , Kubernetes’ . , - .

  • ? production - ServiceAccount, Spilo. , , Postgres read-only. ServiceAccount , , - , . .

  • YAML-.


, , , , array . .

tools, , Postgres , , 10.10, . 10. volume . .

tools . , , Git .

environment «». .

1 500 postgres’ . 100 Kubernetes-. . , on-Call , , , , . . - .

, . , , Patroni, Spilo, .

, open source. . Patroni Spilo .

! , .


availability ?



, anti-affinity, . . .

! . : production?*

, . 600 1 400 production. . . 600 . , . , , environment . , . , production 2- .

, external volume, . . Host Path , . . - ?

, . . . i3-volume Amazon . ? EBS , . , . . , . , .

, IO-bound , ?

, . Amazon i3-instances. NVMe . instance , . , , . Kubernetes team , , , rolling upgrade , . . 1-2 . 1-2 - .

! ?

wal-e. docker cron, basebackup. archive_command, . . WAL, , S3 Amazon. , basebackup + WAL . retention – 5 , . . 5 .

! . 1 400? ? 2?

200 . , , , , . . Kafka. , . , . . , . , , . . . 80, . . .

, , Postgres ?

7 . . , . pets world cattle. Pet – , -, . – , . . - , .


, ! EBS volumes ?

gp2 , . io1 – . 3,000 IOPS, io1 , , .

EBS gp2, 250 ?

. Kubernetes. – volumes, RAID. . Kubernetes . Kubernetes , EC2 i3-instance with NVMe, instance, EBS , stripe.

Kubernetes + AWS?

, . . . . CPU, memory limit request 100 millicore, 100, 10 . . . . , 101, – . . .

RPO, RTO Postgres ?

, Kafka. . . , .

In the worst case we typically lose the last 1-2 WAL segments. As a rule, replication does not lag.

With a light load, 1-2 segments can amount to half a day.

Yes. If there is no load, the segments may not be rotated at all: with no transactions, nothing is switched even after the timeout.

Can rotation be forced automatically?

It should rotate on timeout, but if there are no transactions, the segments are not rotated. I dealt with this recently.
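
To make the "half a day" figure concrete, here is a back-of-the-envelope calculation in Python. It assumes the default 16 MiB WAL segment size and a constant write rate, both of which are simplifications. The loss_window_seconds helper is made up for the example.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default PostgreSQL segment size (assumed here)


def loss_window_seconds(segments: int, wal_bytes_per_second: float) -> float:
    """Time needed to fill `segments` WAL segments at a constant write rate."""
    if wal_bytes_per_second <= 0:
        raise ValueError("write rate must be positive; an idle cluster never fills a segment")
    return segments * WAL_SEGMENT_BYTES / wal_bytes_per_second


# A nearly idle database writing about 1 KiB of WAL per second:
hours = loss_window_seconds(2, 1024) / 3600
print(f"{hours:.1f} hours")
```

At that rate two unarchived segments cover about nine hours of changes, which is why a loss of 1-2 segments is not the same as a loss of a few seconds on a quiet database.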
