Common mistakes when building highly available clusters and how to avoid them. Alexander Kukushkin



You have just installed PostgreSQL, started your first cluster, created some tables, loaded data, and even tuned the PostgreSQL configuration a little to improve performance. Now you are thinking about how to make the cluster highly available. Unfortunately, PostgreSQL cannot fail over automatically when the primary becomes unavailable, but fortunately this can be achieved with third-party tools. The task seems clear, so you start studying the advantages and disadvantages of each tool in order to choose the best one. And ... you are already on the wrong track, because you must first decide on your SLA, RTO, and RPO values. In this talk, I will cover a number of mistakes that database administrators make when setting up and operating a highly available Postgres cluster with automatic failover.





! . 6 . Zalando. , high availability postgres’ Patroni.





, , Zalando. , Lamoda. .





Postgres. -.



4- Amazon. Amazon. , , . Amazon EC2 instances, Kubernetes, EC2 instances.





, - . - high availability. , , high availability disaster recovery, .



, , high availability failover.



, . , , .



, HA , , . .





( ?). HA?





. ? . , . , , , .. . .





. - , , , , , BIOS - .



downtime. - , . , , - , : . . . , , . . . , , update delete where clause. , .





. . , --.



, , – . . 15 .



99,95 %, . . . ? , , Amazon, RDS, . , , , , .
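An availability figure such as 99.95 % is easier to reason about once it is converted into a downtime budget. The following is a minimal sketch of that arithmetic, not something taken from the talk: the function name and the choice of a 365-day year and a 30-day month are assumptions made for illustration.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the maximum downtime, in minutes, that still meets the target.

    availability_percent: target availability, e.g. 99.95
    period_days: length of the measurement window (assumed, not from the talk)
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    # Compare a few common targets over a year and over a 30-day window.
    for target in (99.9, 99.95, 99.99):
        per_year = downtime_budget_minutes(target)
        per_month = downtime_budget_minutes(target, period_days=30)
        print(f"{target}%: {per_year:.1f} min/year, {per_month:.1f} min per 30 days")
```

Under these assumptions, 99.95 % leaves roughly 262.8 minutes (about 4.4 hours) of downtime per year, or about 21.6 minutes in a 30-day window.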



, .



, . . , Google Amazon , , .





- ? . IBM, . , .



, , - - . - agreed.





? , level an agreed. , . . , , , , , . . - , , .



, , - level indication. , availability. level objective – , . , . .. , availability , , . , , - .





, .



:



  • .
  • .
  • - .
  • .
  • .




? . disaster recovery, .. .





HA, . - .



DBA , -. , - disaster, .



. RPO (recovery point objective), . ., . RTO (recovery time objective). .



, SLA, SLI, SLA , . RTO – .





https://en.wikipedia.org/wiki/File:RPO_RTO_example_converted.png



disaster recovery. RPO, RTO. . RTO ( ).



, . , . , .



, , , . PostgreSQL ( ), . - (), .



RPO RTO – . . . , , .





Postgres? , failover . , - . Postgres.



, - archive command. - -. . . , Postgres WAL - -. , -, archive_timeout , . . , . - , , 5 30 , .



pg_receivewal - . , . archive_command.



, - , . , , .



, , Postgres . , - . , , , . Zalando - , . , - , , .



RTO . , , RTO , 15 , failover , , . . . - , DBA. , . , - . , - , DBA .



, failover.





- failover, . , , . . , availability, .



- , – , , - . , , - .





. , . . - . RPO RTO.





?





: « . . , . ».



Postgres? PostgreSQL XC/XL. , – global transaction manager. . - global transaction manager.



BDR . - . , – eventual consistency. , - . , . , , - , . - .



Eventual consistency – - . . . . . , , - , .



Postgres Pro. - . ? – . – , , . . . latency . , , , . , , , , , .





?



-, quorum. Quorum . . . . Google, : « Google ?». : «, ». ? - Google, , , . quorum .
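To make the idea of a quorum concrete, here is a minimal sketch of the usual majority rule, as used by consensus stores such as Etcd. The function names are hypothetical and the code is only an illustration of the arithmetic, not part of any tool mentioned in the talk.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster must have at least one member")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many members can be lost while a majority still remains."""
    return cluster_size - quorum_size(cluster_size)


def has_quorum(reachable_members: int, cluster_size: int) -> bool:
    """True if the reachable members form a majority of the full cluster."""
    return reachable_members >= quorum_size(cluster_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

The sketch shows why two members are not enough: a two-member cluster needs both members for a majority, so it tolerates no failures, while three members tolerate one.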



, , — fencing. - , , . , , STONITH (Shoot The Other Node In The Head) , . . - , , .



. - switch . . - .



– watchdog. , Postgres, , , , .



Linux watchdog, . , - , , split-brain.





quorum, fencing, watchdog ? . . , GitHub.





https://github.com/MasahikoSawada/pg_keeper



? , - , slave, . , slave.



? network split, . . . slave . – split-brain.



, . Masahiko Sawada. , .





? - , primary, standby, , .





, – . , . , , .





, - network split, standby. .





standby. . . fencing . , , , , - , .





- , quorum. Quorum , , , , Etcd, , . . Patroni. Patroni Etcd, (), .



, Etcd, standby , .





.





https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-october/



– , , . GoCardless. Corosync + Pacemaker. , , .



Pacemaker quorum. IP , , watchdog. , , .





? RAID-. , . ? , () , Pacemaker , . . , .



, , , postgres’ . crash recovery, , failover .



, Pacemaker. , Pacemaker .



.





? ?



, , , . . . , Airbus A380. . - , .



, postgres’ , , . .



- , , . , , . .





https://github.blog/2018-10-30-oct21-post-incident-analysis/



. . GitHub, Postgres , .



GitHub -. , . . . , , , . , Jobs GitHub.com . Latency - 60 .





? . . , .



Jobs GitHub.com , , 60 . . . , , ( ).



, , . , , . .





? Failover . failover, , , , .



. Pg_rewind . MySQL, , . Postgres , .





https://about.gitlab.com/blog/2017/02/10/postmortem-of-database-outage-of-january-31/



, . GitLab. . .



- , . . WAL- , , .



pg_basebackup. pg_basebackup, . Ctrl+C, PGDATA, – .



? pg_basebackup checkpoint spread, . . , checkpoint. verbose mode, pg_basebackup.



- .



? , , . .



Pg_dump, , . , , , c . - , Postgres , , pg_dump .



Postgres, pg_dump. . 9.5, pg_dump 9.2 9.5. . . , .



– Microsoft Azure. , - , .



– LVM, . , staging. , – , . , .



. 6 . . . , . 24 . , , 6- .





?



RPO RTO , .



, - , , RPO 24 , .



. , .



, . Runbooks. , - . – pg_basebackup , .



, , . pg_basebackup , , . , . disaster.



, . . , .





, .





. , Patroni , . . CPU, . ., , , . , , . , , checkpoints . . .





. , alert. RTO ( : RTO, RPO) , , . . , Postgres, , . .





: « Patroni postgres’ ?». Patroni, , max connections , , .



Linux, huge pages, shared memory, semaphores, overcommit . .



Patroni postgres’ , . . shared_buffers, max_wal_size, checkpoint completion_target, random_page_cost . . . , , .





.



. RPO RTO. .





, RTO, , , HA, , , .



availability, . .



, , , .



. , . . disaster recovery .



. , ! , .





Questions



. , . , . . , HA . , . -. . . . disaster recovery – .



. , , , . , , , . , . .



, , . . , , , . – 100 % .



IBM, , .



, 99,99. .



, , . : IBM , . , . . , .



, , 10 .



, , . .



. , – . , . , . . .



switch - , . .



?



.



, , , switches?



, , .



, Patroni. , , - , . Patroni ?



Patroni recovery.conf . recovery_min_apply_delay, . Patroni – load balancing , , , stale . failover .



, !



! . , , , HA? , , witness. . Witness – . . . ?



, . , quorum. Quorum – . , . , . , . , . Postgres. . . -, quorum.



!



! , Zalando, , Amazon. , Amazon , ?*



, . - – . - . . . . , , . , . , , , . Amazon . . , . . . . Amazon .



, - , . , .



! . , Patroni? , , .



Nothing wrong.



Do you plan to expose the metrics that Patroni currently returns as JSON from its /patroni endpoint in Prometheus format, or not?



Basically, Patroni is an open source project. If you want, you can implement it and open a pull request. I will be happy to review it and merge it.



Thanks!



I believe a ticket has even been opened for this.
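As an illustration of what such a conversion could look like, here is a minimal sketch that flattens a JSON status document into the Prometheus text exposition format. It is not Patroni code: the sample payload, its field names, and the `patroni` metric prefix are assumptions chosen for the example, and only numeric and boolean values are exported.

```python
import json
import re
from numbers import Number

# Hypothetical status document; the field names are assumptions for this sketch.
SAMPLE_JSON = """
{
  "state": "running",
  "role": "replica",
  "timeline": 3,
  "xlog": {"received_location": 100, "replayed_location": 90, "paused": false}
}
"""


def sanitize(name: str) -> str:
    """Replace characters that are not valid in a Prometheus metric name."""
    return re.sub(r"[^a-zA-Z0-9_:]", "_", name)


def flatten(prefix: str, value):
    """Yield (metric_name, numeric_value) pairs from nested JSON data."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        yield prefix, int(value)
    elif isinstance(value, Number):
        yield prefix, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(f"{prefix}_{sanitize(key)}", child)
    # Strings, lists and nulls are skipped in this sketch.


def to_prometheus(payload: dict, prefix: str = "patroni") -> str:
    """Render the numeric fields of payload as Prometheus text format lines."""
    lines = [f"{name} {value}" for name, value in flatten(prefix, payload)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_prometheus(json.loads(SAMPLE_JSON)), end="")
```

String fields such as the role would more naturally become labels than metric values. That design choice is left out here to keep the sketch short.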



No more questions, thank you everyone!



