Hello! My name is Pavel, and I work at 2GIS. Our company makes a city information guide and navigation service. It is a genuinely useful product that makes life in the city easier.
, -. Infrastructure & Operations, IO. -. .
Stack, , . Kubernetes, Golang, Python, Bash. Elasticsearch, Cassandra , , Postgres, , .
, Postgres. , : , , , , , production.
Postgres, Postgres, , . , – , , . .
. . . PHP, Python Java Scala. Golang, Node.js , FrontEnd.
Postgres. MySQL, Mongo, , Postgres.
DevOps, , . . , . , Postgres. Postgres, , .
, Postgres , , . - , Kubernetes, Kubernetes production-. , , Postgres , , . , .
Postgres Ansible. -, , Chef, .
– , , IO, .
, , Postgres. , , . , . , , . . 9.4 Postgres Postgres 8- .
, - , . . : « , ?». , . . .
, production, , , autofailover , , . – - , - .
, Postgres – . pooling, , . . , .
, point in time recovery. WAL-.
: infrastructure as code, , . . Ansible. - Python Bash .
, , . - , , , inventory . . . «deploy» .
Gitlab, , Gitlab, continuous integration, continuous delivery. . . «deploy» .
PostgreSQL
- PostgreSQL 9.4. 3,5 . 9.4 baseline. 9.6, -, , . 9.6, . . .
- . – . , . .
- HA. repmgr 2ndQuadrant. repmgr . , HA, , , , failovers . . . repmgr . . downtime, . 2 repmgr , failover .
- repmgr failover_command. , repmgr , failover, . . . . . . , , , . . . . failover, . , . .
- . – . . , , . repmgr.
, , , ping’, . . warning, . . , - , , . .
, . . repmgr , . hot standby, . restore - WALs. , . , , .
, , . . . , , , , . fallback , .
. . , archive_command, rsync, . . rsync ssh. , , . , , , streaming . , .
. keepalive. keepalive , . , keepalive , , . . .
DNS round robin, . . , , .
keepalive PgBouncer. pooler, . , . . . Pgpool. , , , , slave.
, , , . Pgpool . . , , – , , . . . , read only . , .
Pgpool. PgBouncer . PgBouncer’ . failover_command PgBouncer. PgBouncer .
, , Pgpool slave. , slave .
read only . . . , , .
, . , . . PgBouncer HAProxy, , slave round robin .
pooling , . . Hikari Pool. , , . .
Barman.
- Barman cron , .
- archive_command WAL- Barman, PITR .
, , . , , . - WAL-, , PITR . , , . , , . , , .
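One way to notice that WAL files have stopped reaching the backup server is to watch the archiver's backlog on the master. The following is a minimal Python sketch, not the check used in the talk. It relies on the documented PostgreSQL behaviour that a segment waiting for `archive_command` has a `.ready` file in `archive_status`, which becomes `.done` once archiving succeeds. The function name, the threshold, and the demo directory are hypothetical.

```python
import os
import tempfile


def count_pending_wal(archive_status_dir):
    """Count WAL segments marked .ready, i.e. not yet archived."""
    return sum(
        1 for name in os.listdir(archive_status_dir) if name.endswith(".ready")
    )


def backlog_is_alarming(pending, threshold=10):
    """Hypothetical alert rule: too many unarchived segments."""
    return pending > threshold


# Demo against a throwaway directory that imitates archive_status.
# On a real server this would be pg_xlog/archive_status (9.x)
# or pg_wal/archive_status (10 and later) inside the data directory.
with tempfile.TemporaryDirectory() as status_dir:
    for name in (
        "000000010000000000000001.ready",
        "000000010000000000000002.done",
        "000000010000000000000003.ready",
    ):
        open(os.path.join(status_dir, name), "w").close()
    pending = count_pending_wal(status_dir)
    print(pending, backlog_is_alarming(pending))  # prints: 2 False
```

A steadily growing count usually means `archive_command` is failing, which both blocks point in time recovery and lets WAL pile up on the master's disk.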
:
- archive_command.
- , WAL- , , , .
- Prometheus. pool- . . . . , ( , ) exporter, Prometheus, .
- exporters Golang, Python, Bash.
- exporters. Node exporter – exporter; Postgres exporter, Postgres; exporter PgBouncer Python. Cgroups exporters Golang. , , .
- : , , , .
- . exporter’ Postgres , exporter’ , . , , . . ., , , Postgres . , , tuples, WAL- , , . , . . . . - Prometheus.
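As a rough illustration of what a small custom exporter of this kind produces, here is a minimal Python sketch that renders a few pooler counters in the Prometheus text exposition format. The metric names, the `pool` label, and the sample values are hypothetical and are not taken from the exporters described in the talk. A real exporter would read its values from the PgBouncer admin console and serve the text over HTTP.

```python
def escape_label(value):
    """Escape a label value as the Prometheus text format requires."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_metrics(stats):
    """Render {(metric, pool): value} as Prometheus exposition text.

    Metric and label names here are illustrative assumptions.
    """
    lines = []
    declared = set()
    for (metric, pool), value in sorted(stats.items()):
        if metric not in declared:
            # One TYPE line per metric family, before its samples.
            lines.append(f"# TYPE {metric} gauge")
            declared.add(metric)
        lines.append(f'{metric}{{pool="{escape_label(pool)}"}} {value}')
    return "\n".join(lines) + "\n"


# Hypothetical sample values, as they might be read from a pooler.
sample = {
    ("pgbouncer_clients_active", "app_db"): 12,
    ("pgbouncer_clients_waiting", "app_db"): 0,
}
print(render_metrics(sample), end="")
```

Keeping the rendering as a pure function makes the exporter easy to test without a running pooler; only the code that collects the numbers has to talk to the database.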
Grafana. Cgroups exporter’. , , , . . .
alerts. Alerts Prometheus AlertManager.
alerts, . . . , failover, failover , failover . . alerts, , Slack, : «, !».
, . , , , , . . .
, , , , .
. ELK stack , . . Elasticsearch, Kibana LogStash. , Postgres CSV-, Python Beaver. . LogStash. LogStash - , , , , - .
Kibana , , , .
Pgbadger. , , , slow . , . Perl. , . Kubernetes. Kubernetes. LogStash. LogStash HTTP Pgbadger , . , , .
Ansible. , , , . , Postgres. Postgresql.conf, - . , , .
, Ansible. 25 . , . . . . , , - , , .
20 . 12 , . . , . , , .
. . . , , , . GitHub, . .
Ansible. . . . testing, production staging. staging production, . , Ansible . - testing, - , environments, , «deploy». .
, . Ansible Vault. Ansible .
Ubuntu Linux , . , , apt. , , - , , .
. . . : , , read only , - , , , prepared statements, .
bootstrap . . . , , , . , . . . . .
Git-, , Gitlab.
. workflows. . -. . - , . . , , , , . , , . .
OpenStack. . , . OpenStack API Ansible, OpenStack, heat-api, . . - , , heat-api . . . : « . ». , . . OpenStack . .
bootstrap, . Debian, Ubuntu. . . , . , , , , . . . , .
. Ansible. , . . . . . , , , , , . . .
. Gitlab.
Gitlab jobs. Jobs – , .
. . syntax check, , . . pipeline, syntax check, . . , Ok, , staging production.
, , , , , testing staging production.
. ? , , . , . . . 15 «deploy», . endpoint . , pipeline, . . .
. , . . .
. , Postgres, . , . , , , .
. rep managers, , , Postgres. , .
Failover . Failover, , . failover. Failover .
switchover, . . failover . Switchover . , , . switchover, . downtime, . repmgr (Replication Manager) – , -, .
falls :
- , . . , , , , rep managers. . . , , .
- - , . .
- . , Postgres, - postgresql.conf, . , , . . , . .
- – . . , , , downtime. downtime.
- , . , pushes .
Kubernetes, - . , Postgres Kubernetes. Stolon, Patroni. , , .
, . . , - , - , . , , , , , .
– PostgreSQL 10-11. , «» . , , .
! Ansible, ?
, - - . 2-3 .
- Ansible Postgres, ? , - ?
Ansible?
.
, Ansible, . . , .
! Barman ? , 9.6 . , 10 11 , ?
, - Barman. , . . . Ansible. - , , WAL-G. , , WAL-G (: ). , - . S3, , , ?
. 9.6. . 9.4 .
! , . , ?
.
staging?
. . testing, staging. staging, , -…. , , . , WALs, . , , …. , , , . . , ? production .
! ! . Postgres?
.
, Postgres Professional, . . Postgres Pro Standard. , . pg_probackup, Barman, . ?
, . , - Postgres Pro’ . - . , , . .
. . : , .