How not to sleep through problems in Postgres databases. Nikolay Samokhvalov (Postgres.ai)



To keep a database healthy, you need to look “under the hood” periodically and “probe” it for early symptoms. In other words, you need a preventive examination, also known as a technical audit of the database or a health check.













  • Postgres . - .
  • , , , , , , , production.
  • , .


, , Cloud Vendors . - , Kubernetes . . provisioning, , , failover, . – DBA, . .



. .








Postgres ? , . , - , Postgres? . .



Open source , , . , . , Postgres , plain English. , .





. Postgres.





HA, downtime , autofailover, .





, .





pet, . . , , cattle . .



, , - , , backend engineers. , , , . - . .



. - , – . , , . . .



. , . , , , - -.



– . . . , . , , . .





, ?



, , , . .



, 100 Postgres. , . , . , .



, , . , . . , . .





, . cloud , instance, . self-management cloud , , . , , . failover , , .





, , - .





, - , . . , , . , , , , - .



- performance. , workload , .



- . - - , , , , fsync . , hard reset . .





, , .





. open source , . 1.1, 3-4 1.0, . . production , . open source, GitLab . , - .








. , , CPU . , .



. , , . . Datadog, . Grafana - pg_stat_statements. Okmeter, . .



. - . , , bloat. 10-20 . . 5 – . . , .



Postgres-checkup – , , .



, - . . - , . . Postgres-checkup .





, . . . , - , , . , .





. – , , - .



, , , - - . , , , . . , – . , , , , , . .





– . , Postgres-checkup. . Postgres-checkup . .



Postgres-checkup - , , , 30 - , .





. - toolset. , . , .



https://github.com/NikolayS/postgres_dba



https://www.youtube.com/watch?v=V-cwPLtDtSY



Postgres_dba, GitHub. , psql . : “, , ”. . - .



. . , , PSQL-, , . . . .





pg-utils, pgsql-bloat-estimation, pgx_scripts, pgcluu, check_postgres.pl, pghero, sqlcheck, heroku-pg-extras



. . , pghero, Ruby. , .



, check_postgres.pl, Perl . .



, :



  • .
  • .
  • multi-node analysis.


Postgres-checkup.










https://gitlab.com/postgres-ai/postgres-checkup



GitLab. Postgres_dba, .



, , . : , . , , .





30 , . . , - - . , bloat-. statement_timeout . . , , . , , . , .



, application_name checkup. , , .
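The surviving text mentions statement_timeout and a session application name of checkup. The following is a minimal sketch of how a diagnostic session could cap query runtime and label itself so it is easy to spot in pg_stat_activity, using a libpq key=value connection string. The helper names and the 30-second default are illustrative assumptions, not postgres-checkup's actual code.

```python
def quote_conninfo_value(value: str) -> str:
    """Quote a value for a libpq key=value connection string.

    libpq requires backslashes and single quotes inside a quoted value
    to be escaped with a backslash.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(host: str, dbname: str, user: str,
                   statement_timeout: str = "30s",
                   application_name: str = "checkup") -> str:
    """Build a connection string for a low-impact diagnostic session.

    application_name labels the session in pg_stat_activity; the 'options'
    keyword passes "-c statement_timeout=..." so every query in the session
    is cancelled if it runs too long. Both defaults are assumptions.
    """
    params = {
        "host": host,
        "dbname": dbname,
        "user": user,
        "application_name": application_name,
        "options": f"-c statement_timeout={statement_timeout}",
    }
    return " ".join(f"{key}={quote_conninfo_value(val)}" for key, val in params.items())
```

For example, build_conninfo("db1.example.com", "app", "checkup_user") (hypothetical host and user) yields host='db1.example.com' dbname='app' user='checkup_user' application_name='checkup' options='-c statement_timeout=30s'.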





, , multi-node analysis. .
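The surviving text names multi-node analysis, meaning the primary and its replicas are examined together rather than one server at a time. Below is a minimal sketch of one such check: reporting settings whose values differ between nodes. It assumes each node's settings have already been collected into plain dictionaries, for example from the pg_settings view. The function name and sample values are hypothetical.

```python
from typing import Dict, Optional


def diff_settings(nodes: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, Optional[str]]]:
    """Return only the settings whose values differ between nodes.

    `nodes` maps a node name to its {setting name: value} pairs. A setting
    that is missing on some node is reported as None for that node.
    """
    all_names = set()
    for settings in nodes.values():
        all_names.update(settings)

    differing = {}
    for name in sorted(all_names):
        values = {node: settings.get(name) for node, settings in nodes.items()}
        if len(set(values.values())) > 1:
            differing[name] = values
    return differing
```

With a primary using work_mem = 64MB and a replica using 4MB, only work_mem appears in the result. Settings that match everywhere are filtered out, so the report stays short.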



, , , , 4 . – 10 . 10 – , 2 . , , . .








, . ssh-.





. , DBA . , DBA, . remote-ssh PSQL . .



– . RDS Postgres, , , . .



RDS , , instances, . .
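The surviving text contrasts two access modes. In one, checks run over SSH and psql executes on the database host. In the other, the tool connects directly to services such as RDS, where there is no SSH access. Below is a minimal sketch that builds both command lines. The function name and host names are hypothetical, but the psql flags are real: -X skips psqlrc, -A gives unaligned output and -t prints rows only.

```python
import shlex
from typing import List, Optional


def build_psql_command(query: str, pg_host: Optional[str] = None,
                       ssh_host: Optional[str] = None,
                       dbname: str = "postgres") -> List[str]:
    """Build an argv list that runs one query through psql.

    With ssh_host, psql runs on the database machine itself and, without -h,
    connects through the local Unix socket. Without ssh_host, psql connects
    to pg_host over the network, the only option when there is no shell
    access, as on managed services such as RDS.
    """
    # -X: skip ~/.psqlrc, -A: unaligned output, -t: rows only (no headers)
    psql = ["psql", "-X", "-A", "-t", "-d", dbname, "-c", query]
    if ssh_host:
        # ssh hands the remote command to a shell, so quote every argument.
        return ["ssh", ssh_host, " ".join(shlex.quote(arg) for arg in psql)]
    if pg_host is None:
        raise ValueError("either ssh_host or pg_host is required")
    return ["psql", "-h", pg_host] + psql[1:]
```

Returning an argv list rather than one shell string lets the caller pass it to subprocess without local shell quoting issues. Only the remote half goes through a shell, which is why it alone is quoted.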





, , JSON-. . Postgres. , . . . . JSON, - . , machine friendly.



markdown report. Markdown reports , , , GitHub issues GitLab issues. , . , , report GitHub issues, : “, - , - ”, . markdown , - .



markdown PDF html. html, JSON , . . , . machine friendly human friendly. , , , .
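The surviving text describes two output formats: machine-friendly JSON, and human-friendly Markdown that can be pasted into GitHub or GitLab issues. Below is a minimal sketch of the JSON-to-Markdown step. It assumes a report is a JSON array of flat objects. This is not postgres-checkup's actual report schema, and the field names in the test data are hypothetical.

```python
import json


def json_to_markdown_table(report_json: str) -> str:
    """Render a JSON array of flat objects as a Markdown table.

    Column order follows the keys of the first object; '|' inside a cell is
    escaped so it does not break the table.
    """
    rows = json.loads(report_json)
    if not rows:
        return "_no data_"
    headers = list(rows[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join(" --- " for _ in headers) + "|",
    ]
    for row in rows:
        cells = [str(row.get(h, "")).replace("|", "\\|") for h in headers]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Keeping JSON as the source of truth and generating Markdown (and from it PDF or HTML) means every human-readable format is derived from the same data.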





. . docker-, , GitLab. . .



. . , . Gitlab.com. checkups, reports CI/CD . Kubernetes runner, . , . , .










. 50 , . issues , . . , .



. . , . , . . , - autofailover . , . . . .



. , bloat, , , . .










– . runner GitLab's . , , . 1 vCPU . , . .







Conclusions, recommendations , , .



. , 3 :



  • 1 – , . . , . , , . , .
  • 2 – conclusions, . . . plain English Russian. : “ , - . , ”.
  • 3 – action item, . . , , , , , .


. , .



conclusions, recommendations , . 5 , -, . conclusions, recommendations . . checkup@postgres.ai.





. , .





, , – , – , . . .



? switchover failover, . , .





https://why-upgrade.depesz.com/



, , GNU , , . , , , community . , , , , , , .



, , . : why-upgrade.depesz.com. , . – - .
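The surviving text points to why-upgrade.depesz.com for seeing what changed between releases. Before using it, a check needs the exact version, which the real server_version_num setting provides. The sketch below splits that number into major and minor parts, including the pre-10 two-part major versions such as 9.6. The latest-minor table is a hypothetical input that you would keep up to date from the official release list.

```python
from typing import Dict, Tuple


def parse_server_version_num(num: int) -> Tuple[str, int]:
    """Split server_version_num into (major version, minor release).

    Since Postgres 10 the major version is a single number (110005 -> "11", 5);
    before that it had two parts (90624 -> "9.6", 24).
    """
    if num >= 100000:
        return str(num // 10000), num % 10000
    return f"{num // 10000}.{num // 100 % 100}", num % 100


def minor_releases_behind(num: int, latest_minor: Dict[str, int]) -> int:
    """How many minor releases the server lags behind the newest one.

    latest_minor is a hypothetical, manually maintained table such as
    {"11": 7}; it must be filled from the official release list.
    """
    major, minor = parse_server_version_num(num)
    return latest_minor[major] - minor
```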





, -, . . – .





-, , , , , . . 9.5 9.6, , failover , . 9.6 – .



, . , fsync. Linux fsync , , fsync , , . Postgres , , .



. , : “, , , , corruption , , ? , , corruption?”. , – , – , .





. , . “ ”. , , . current database. . , .





, , . . ALTER EXTENSION .





. . , , , , 100. . , , , .





, , . . -, log_destination, . . csv-, .



, . . , . , failover, , . .





Work_mem . . HA, . , work_mem , , . – . .





, – ALTER SYSTEM. ALTER SYSTEM, postgresql.auto.conf. , - ALTER SYSTEM. , , . , , , , , , Ansible, Puppet . .



, 3 . , Git, , .
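The surviving text warns that ALTER SYSTEM writes to postgresql.auto.conf, which can silently conflict with a postgresql.conf managed by Ansible, Puppet or Git. Postgres reads postgresql.auto.conf after postgresql.conf, so its values take precedence. Below is a simplified sketch that lists such overrides from the contents of the two files. The helper names are hypothetical, and the parser ignores include directives and other parts of the real file syntax.

```python
from typing import Dict, Optional, Tuple


def parse_conf(text: str) -> Dict[str, str]:
    """Parse simple `name = value` lines, skipping blanks and comments.

    Deliberately simplified: no include directives, no optional '=',
    and a '#' inside a quoted value is treated as a comment start.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        settings[name.lower()] = value.strip("'")
    return settings


def overridden_by_alter_system(main_conf: str, auto_conf: str) -> Dict[str, Tuple[Optional[str], str]]:
    """Settings in postgresql.auto.conf whose value differs from postgresql.conf.

    postgresql.auto.conf is read after postgresql.conf, so its values win.
    The result maps name -> (value in postgresql.conf or None, effective value).
    """
    main, auto = parse_conf(main_conf), parse_conf(auto_conf)
    return {name: (main.get(name), value)
            for name, value in auto.items() if main.get(name) != value}
```

On a running server, the pg_file_settings view exposes the same information, including which entry was actually applied.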





– Postgres- . PGDATA. PGDATA, WAL directory, – , . .



, mount point . , . , rotational disk. , .



. SSD. .



stats_temp_directory. . .



, . . . , , , .





. HighLoad Backend Conf . YouTube. , , Postgres -. bloat, . . , .





, . , . . , , . . observations , autovacuum , , , , . Postgres 12.



, 12 - 10 .



12 , , - . , bloat .
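The surviving text refers to Postgres 12 and a factor of 10. This sketch assumes the passage refers to the Postgres 12 change that lowered the default autovacuum_vacuum_cost_delay from 20 ms to 2 ms. It shows what that change does to the I/O ceiling of cost-based autovacuum throttling. Further assumptions:

- the default cost limit of 200, which is shared across all autovacuum workers;
- a page-miss cost of 10, the default up to Postgres 13 (Postgres 14 lowered it to 2);
- 8 KiB pages.

It also ignores the time spent doing the work itself.

```python
PAGE_SIZE_BYTES = 8192  # default Postgres block size


def autovacuum_read_limit_mib_s(cost_limit: int = 200,
                                cost_delay_ms: float = 2.0,
                                page_miss_cost: int = 10) -> float:
    """Upper bound on autovacuum read throughput under cost-based throttling.

    Autovacuum accumulates cost points until it reaches cost_limit, then
    sleeps for cost_delay_ms. If every page is a miss (not in shared
    buffers), one round processes cost_limit / page_miss_cost pages.
    Only the sleep time is counted, so this is an upper bound.
    """
    pages_per_round = cost_limit / page_miss_cost
    rounds_per_second = 1000.0 / cost_delay_ms
    return pages_per_round * rounds_per_second * PAGE_SIZE_BYTES / 2**20
```

With these defaults the function returns 7.8125 MiB/s for a 20 ms delay and 78.125 MiB/s for 2 ms, which is the tenfold difference.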





, , transaction id wraparound.



mailchimp, . . , transaction id wraparound. .



, 50 % . capacity used . . 10 %, Postgres, 10 % autovacuum .
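The surviving text describes transaction ID wraparound in terms of “capacity used” and mentions 50% and 10%. A common way to express this is the age of the oldest unfrozen transaction ID as a share of 2^31, roughly 2.1 billion. The age can come from, for example, SELECT max(age(datfrozenxid)) FROM pg_database. This convention is sketched here as an assumption, not as postgres-checkup's exact formula. On this scale, the default autovacuum_freeze_max_age of 200 million works out to about 9.3%. That is the point at which autovacuum is forced to run to prevent wraparound.

```python
WRAPAROUND_LIMIT = 2**31  # about 2.1 billion XIDs can be "in the past" at once
DEFAULT_FREEZE_MAX_AGE = 200_000_000  # default autovacuum_freeze_max_age


def wraparound_capacity_used(xid_age: int) -> float:
    """Percent of the transaction ID space consumed, given age(datfrozenxid)."""
    return 100.0 * xid_age / WRAPAROUND_LIMIT
```

Under this convention, 50% means the oldest unfrozen transaction ID is about 1.07 billion transactions old.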





Heap bloat – table bloat. human friendly , . . , . , . , 50 %. , . , bloat.





, index bloat – , , , index scan index only scan. .



, index bloat – . 12 , 12 . , .



. 40 %, . 50 % β€” . 90 % β€” , 10 . 90 % β€” bloat, 10 % β€” . 10 . , .



90 % 99 %, 9 % , 10 100 . . .
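The surviving numbers (90% versus 99%, 10 versus 100) follow from simple arithmetic. If a fraction b of an object is bloat, only 1 - b of it is live data, so the object is 1 / (1 - b) times larger than it needs to be. Going from 90% to 99% bloat looks like a 9-point change, but it means the object grows from 10 times to 100 times the size of its live data. A minimal sketch (the function name is hypothetical):

```python
def size_vs_live_data(bloat_fraction: float) -> float:
    """How many times larger an object is than its live data.

    With bloat fraction b, live data is (1 - b) of the size, so the object
    is 1 / (1 - b) times bigger than it needs to be.
    """
    if not 0 <= bloat_fraction < 1:
        raise ValueError("bloat_fraction must be in [0, 1)")
    return 1.0 / (1.0 - bloat_fraction)
```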



. bloat , .





: , estimated. , . .



, , , bloat, .



int4, int8, int2, int8. , internals, , padding and alignment. int4 int8 4 , . . - bloat . .
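The surviving text lists the column order int4, int8, int2, int8 and refers to internals and alignment padding. Postgres stores each fixed-width column at an offset that is a multiple of its type's alignment. On typical 64-bit platforms that is 2 bytes for int2, 4 for int4 and 8 for int8, so a poorly ordered row wastes bytes on padding. Below is a simplified sketch that ignores the tuple header, the NULL bitmap and variable-length types. The helper name is hypothetical.

```python
from typing import List

# (size, alignment) in bytes for common fixed-width types on 64-bit platforms.
TYPE_LAYOUT = {"int2": (2, 2), "int4": (4, 4), "int8": (8, 8)}


def data_width(columns: List[str]) -> int:
    """Bytes used by a row's fixed-width columns, including alignment padding.

    Each column starts at the next offset that is a multiple of its type's
    alignment; the tuple header and NULL bitmap are not included.
    """
    offset = 0
    for col_type in columns:
        size, align = TYPE_LAYOUT[col_type]
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offset += size
    return offset
```

For int4, int8, int2, int8 the function returns 32 bytes. Reordering the same columns to int8, int8, int4, int2 gives 22 bytes. A bloat estimate that does not model this padding can misjudge how large the table should be.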





. 1,000,000 . bloat , 0. bloat , 23% bloat.





, 31 %. , .



. checkup , database lab , . . – , vacuum full. . . . , bloat.



, , estimates . .





Index analysis – . , . 20-30 % . , , .



– 2: 1 – , 2 – .



– 2 . , .
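The surviving text says index analysis produces two kinds of findings. One common kind is the redundant index, whose key columns are a leading prefix of another index's key columns on the same table. Below is a heuristic sketch that assumes index definitions are already available as (name, table, columns) tuples. It deliberately ignores uniqueness, partial and expression indexes, INCLUDE columns and operator classes. Any of these can make an apparently redundant index necessary. The names are hypothetical.

```python
from typing import Dict, List, Tuple

Index = Tuple[str, str, Tuple[str, ...]]  # (index name, table, key columns)


def redundant_indexes(indexes: List[Index]) -> List[Tuple[str, str]]:
    """Find (redundant, covering) pairs of indexes on the same table.

    An index is reported when its key columns are a leading prefix of
    another index's key columns; for exact duplicates, the name that sorts
    later is reported as the redundant one.
    """
    by_table: Dict[str, List[Index]] = {}
    for idx in indexes:
        by_table.setdefault(idx[1], []).append(idx)

    pairs = []
    for table_indexes in by_table.values():
        for name_a, _, cols_a in table_indexes:
            for name_b, _, cols_b in table_indexes:
                if name_a == name_b or cols_b[:len(cols_a)] != cols_a:
                    continue
                if len(cols_a) < len(cols_b) or (len(cols_a) == len(cols_b) and name_a > name_b):
                    pairs.append((name_a, name_b))
    return pairs
```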





, , . Postgres-checkup , , . .





. , , . , .



, , , , , , . , .



, , . . , . . Postgres, . Reference table, . . . , , Seq Scan , , .



, , , - . - , .





. . . , . . .





. Pg_stat_statements . , , . pg_stat_statements . , – .



. Pg_stat_statements. ? Postgres . , buffers buffer pool. . , . read, . . buffer pool , . , Postgres , , , . , .
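The surviving text explains that pg_stat_statements tracks buffer usage. A block counted as “read” was not found in Postgres's own buffer pool, but it may still have come from the operating system's page cache rather than from disk. Below is a small sketch of the resulting hit ratio, computed from the pg_stat_statements columns shared_blks_hit and shared_blks_read. The function name is hypothetical.

```python
from typing import Optional


def shared_hit_ratio(shared_blks_hit: int, shared_blks_read: int) -> Optional[float]:
    """Fraction of block requests served from shared buffers (None if no requests).

    A block counted in shared_blks_read was not in shared buffers; it may
    still have been served by the OS page cache, not by the disk.
    """
    total = shared_blks_hit + shared_blks_read
    if total == 0:
        return None
    return shared_blks_hit / total
```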





, pg_stat_kcache, pg_stat_statements. , . , overhead – .



- , , .



. . . , - . , - SELECT, INSERT.



pg_stat_statements , . . SELECT * FROM TABLE WHERE id = 10, id = $1 , .
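pg_stat_statements groups queries that differ only in constants, so SELECT * FROM TABLE WHERE id = 10 is recorded with id = $1. Postgres does this on the parsed query tree, not on the text. The regular-expression version below is only a rough text-level approximation for illustration. It mishandles cases such as negative numbers and dollar-quoted strings.

```python
import re

# Single-quoted string literals (with '' escapes) or standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize_query(sql: str) -> str:
    """Replace literals with $1, $2, ... in order of appearance."""
    counter = 0

    def placeholder(_match):
        nonlocal counter
        counter += 1
        return f"${counter}"

    return _LITERAL.sub(placeholder, sql)
```

Digits inside identifiers such as t1 are left alone because the pattern requires a word boundary before the number.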



. , - .



– total time. .





, checkup , , . .



, 10 15 , . . , . .



, total time , . , . . 50 % , .



? 4 . . duration, . , 10 , 15 , .



, - . , . Total time 1.848 s/sec. , , , . . . , , . .



– , . Calls. 1.00/call, , , . . , , , 10ms/call . average latency.



, pg_stat_statements , average, , – . .





, – 10 . total time, calls – 171.21/sec ( ). . . , . , 4.03 blk/call ( call), i.e. I/O , . . . . . , , CPU bound.





– workload, . . Total time – 19.95% , . . 1/5. – 19.06%. .
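The surviving figures are s/sec, calls/sec, ms/call, blk/call and a query group's share of total time, such as 19.95%. All of them can be derived from two snapshots of pg_stat_statements taken some seconds apart, because its counters are cumulative. Below is a minimal sketch under those assumptions. The Counters class and the function name are hypothetical. The total_time_ms field corresponds to total_time in Postgres 12 and earlier and to total_exec_time from Postgres 13 on, both in milliseconds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Counters:
    """Cumulative pg_stat_statements counters for one query (time in ms)."""
    calls: int
    total_time_ms: float
    shared_blks_hit: int
    shared_blks_read: int


def workload_report(before: Dict[str, Counters], after: Dict[str, Counters],
                    seconds: float) -> List[dict]:
    """Turn two snapshots taken `seconds` apart into rates and shares."""
    deltas = {}
    for qid, end in after.items():
        start = before.get(qid, Counters(0, 0.0, 0, 0))
        deltas[qid] = Counters(end.calls - start.calls,
                               end.total_time_ms - start.total_time_ms,
                               end.shared_blks_hit - start.shared_blks_hit,
                               end.shared_blks_read - start.shared_blks_read)
    grand_total_ms = sum(d.total_time_ms for d in deltas.values()) or 1.0

    rows = []
    for qid, d in deltas.items():
        blocks = d.shared_blks_hit + d.shared_blks_read
        rows.append({
            "queryid": qid,
            # seconds spent executing per wall-clock second
            "time_s_per_s": d.total_time_ms / 1000.0 / seconds,
            "calls_per_s": d.calls / seconds,
            "ms_per_call": d.total_time_ms / d.calls if d.calls else 0.0,
            "blks_per_call": blocks / d.calls if d.calls else 0.0,
            "pct_of_total_time": 100.0 * d.total_time_ms / grand_total_ms,
        })
    return sorted(rows, key=lambda r: r["time_s_per_s"], reverse=True)
```

Summed over all queries, time_s_per_s approximates how many sessions were busy executing queries on average. If statistics are reset or an entry is evicted between snapshots, deltas can go negative, and a real tool has to detect that.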





30 %, , .





. , , , 7.63% .



. , . . , . . , Okmeter , -, .





? : . , workload 9.264 s/sec ( ), 12 . . , load average . .





. , , first word analysis. SELECT , INSERT, UPDATE. . , , .
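The surviving text mentions a “first word analysis”, which groups the workload by its leading SQL keyword (SELECT, INSERT, UPDATE, and so on). Below is a minimal sketch that assumes (query text, total time) pairs taken from pg_stat_statements. Queries that start with WITH or with a comment would need extra handling. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def time_by_first_word(stats: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum total time per leading SQL keyword (SELECT, INSERT, UPDATE, ...)."""
    totals: Dict[str, float] = defaultdict(float)
    for query, total_time_ms in stats:
        words = query.split()
        first = words[0].upper() if words else "<empty>"
        totals[first] += total_time_ms
    return dict(totals)
```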







– int4. (Ruby, Java) int8 . int4 2.1 .



, , primary key int4, . , . . . .
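The surviving text warns about int4 primary keys, because int4 tops out at 2,147,483,647 (about 2.1 billion). Below is a minimal sketch of the capacity check. It assumes the current value comes from the sequence behind the key, for example last_value in the pg_sequences view (Postgres 10+), and that the daily growth rate is known. The function names are hypothetical.

```python
INT4_MAX = 2**31 - 1  # 2,147,483,647


def int4_capacity_used(last_value: int) -> float:
    """Percent of the int4 range consumed by a sequence-backed primary key."""
    return 100.0 * last_value / INT4_MAX


def days_until_exhausted(last_value: int, ids_per_day: float) -> float:
    """Days left before the key overflows, assuming steady growth."""
    return (INT4_MAX - last_value) / ids_per_day
```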








  • ,
  • .




Questions



! ! . , , , , Postgres , . . . Postgres : “ slave, ”. , ?



. . -, , . , - , . , .



, rolling update – , . - . . . , . , . . . . checkups , .



.



. , . , Ansible, Puppet Kubernetes . , , . . , .



! , , , ?



. observations, recommendations. , , . . – , . . USS Enterprise, , .



5 . , , . .



observations . , – , , , . , .



, - . , DBA. Postgres , , ?



. , , - . . observations human friendly. , , 5 ( ), , . – . , , , . . .



, ?



, USA market enterprise. , open source. , - . .



