Removing bottlenecks in PostgreSQL under high loads

Hello! Enrollment is now open at OTUS for a new stream of the PostgreSQL course. On this occasion we have, as is now tradition, prepared a translation of some useful material on the topic.






Taking Peter Zaitsev's article on MySQL performance bottlenecks as a starting point, I want to talk a little about PostgreSQL.



ORM frameworks are often used to work with PostgreSQL these days. They usually work well, but over time the load increases and it becomes necessary to tune the database server. As reliable as PostgreSQL is, it can slow down as traffic increases.



There are many ways to eliminate performance bottlenecks, but in this article we will focus on the following:



  • Server parameters
  • Connection management
  • Autovacuum tuning
  • Advanced autovacuum tuning
  • Table bloat
  • Hot spots in data
  • Application servers
  • Replication
  • Server environment


About "categories" and "potential impact"



"Complexity" refers to how difficult the proposed solution is to implement, and "potential impact" indicates the degree of improvement in system performance you can expect. However, because of a system's age, its type, technical debt, and so on, accurately assessing complexity and impact can be problematic. In difficult situations, the final choice is always yours.



Categories:



  • Complexity

    • Low
    • Medium
    • High
    • Low-Medium-High
  • Potential Impact

    • Low
    • Medium
    • High
    • Low-Medium-High


Server parameters



Difficulty: Low.

Potential Impact: High.



Not so long ago, then-current versions of postgres could still run on i386 hardware. The default settings have changed since, but they are still tuned to consume the fewest resources possible.



These settings are very easy to change and are usually configured during the initial installation. Incorrect values of these parameters can lead to high CPU and I/O utilization:



  • effective_cache_size — roughly 50 to 75% of total RAM
  • shared_buffers — 1/4 to 1/3 of total RAM
  • work_mem — about 10 MB


The recommended value for effective_cache_size above is a rule of thumb; it can be estimated more precisely from the free + cached figures reported by top.



Calculating the value of shared_buffers is an interesting puzzle. You can look at it from two angles: if you have a small database, you can set shared_buffers large enough to fit the entire database in RAM. Alternatively, you can size it so that only the frequently used tables and indexes stay in memory (remember the 80/20 rule). Setting the value to 1/3 of RAM used to be the recommendation, but as typical memory sizes grew it was reduced to 1/4. If too little memory is allocated, I/O and CPU load will increase; allocating too much shows up as CPU and I/O load reaching a plateau.
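
For illustration, here is a minimal postgresql.conf sketch for a hypothetical dedicated server with 16 GB of RAM; the numbers merely illustrate the ratios above and are not recommendations for your workload:

shared_buffers = 4GB              # ~1/4 of RAM
effective_cache_size = 12GB       # ~50-75% of RAM; a planner hint, allocates nothing
work_mem = 10MB                   # per sort/hash operation, per session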







Another factor to consider is the OS cache. Given enough RAM, Linux will cache tables and indexes in memory and, depending on the settings, can make PostgreSQL believe it is reading data from disk rather than from RAM. Because the same page can sit in both the postgres buffer and the OS cache, this double caching is one of the reasons not to make shared_buffers too large. The pg_buffercache extension lets you watch cache usage in real time.
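
For example, a query like the following (a sketch; it assumes the extension is installed in the current database) shows which relations occupy the most shared buffers:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;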



The work_mem parameter specifies the amount of memory used for sort operations. Setting this value too low guarantees poor performance, since the sort will be performed using temporary files on disk. On the other hand, although setting a large value does not affect performance, with a large number of connections there is a risk of running out of RAM. By analyzing the memory used by all requests and sessions, you can calculate the required value.



Using EXPLAIN ANALYZE, you can see how sort operations are performed and, by changing the value for the session, determine at what point they start spilling to disk.
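
A session-level experiment might look like this (the table and sizes are hypothetical); in the EXPLAIN ANALYZE output, watch the Sort Method line switch between an on-disk external merge and an in-memory quicksort:

SET work_mem = '1MB';
EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at;
-- ... Sort Method: external merge  Disk: 98304kB

SET work_mem = '256MB';
EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at;
-- ... Sort Method: quicksort  Memory: 131072kB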



You can also use benchmarking tools to find suitable values.



Connection management



Difficulty: Low.

Potential Impact: Low-Medium-High.



High load is usually associated with increased client sessions per unit of time. Too many of them can block processes, cause delays, or even lead to errors.



The simple solution is to increase the maximum number of concurrent connections:



# postgresql.conf: the default is set to 100
max_connections = 100






But a more efficient approach is connection pooling. There are many solutions, but the most popular is PgBouncer. It can manage connections using one of three modes (a sample configuration sketch follows the list):



  • Session pooling. A client holds its server connection for the entire lifetime of its session, and the connection is returned to the pool only when the client disconnects. This is the default and most conservative mode.
  • Transaction pooling. A server connection is assigned to a client only for the duration of a transaction; as soon as it completes, PgBouncer returns the connection to the pool. This lets far more clients be served than there are server connections.
  • Statement pooling. The most aggressive mode: the connection is returned to the pool after every statement, and transactions spanning multiple statements are not allowed.
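
A minimal pgbouncer.ini sketch; the database name, addresses, paths, and pool sizes are assumptions for illustration:

[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; one of: session, transaction, statement
max_client_conn = 1000       ; clients pgbouncer will accept
default_pool_size = 20       ; server connections per database/user pair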


You also need to pay attention to Secure Sockets Layer (SSL). When it is enabled, connections use SSL by default, which increases CPU load compared to unencrypted connections. For regular clients you can configure host-based authentication without SSL in pg_hba.conf, while requiring SSL for administrative tasks and streaming replication.
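
A pg_hba.conf sketch of that split; the database, user, and network values are assumptions:

# TYPE   DATABASE      USER        ADDRESS         METHOD
host     appdb         appuser     10.0.0.0/24     md5    # regular clients, no SSL
hostssl  all           admin       0.0.0.0/0       md5    # administration over SSL
hostssl  replication   replicator  10.0.1.0/24     md5    # streaming replication over SSL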



Autovacuum tuning



Difficulty: Medium.

Potential Impact: Low-Medium.



Multi-Version Concurrency Control (MVCC) is one of the fundamental principles that make PostgreSQL such a popular database solution. One of its annoying side effects, however, is that every update or delete leaves behind dead row versions that must eventually be disposed of. An improperly configured autovacuum process can degrade performance, and the more loaded the server, the more pronounced the problem becomes.



The following parameters are used to control the autovacuum daemon:



  • autovacuum_max_workers. The number of worker processes that vacuum tables in parallel (the default is 3). Each worker consumes its own share of CPU and memory, so increase the value cautiously on busy servers with many tables.
  • maintenance_work_mem. The amount of memory each autovacuum worker may use (unless autovacuum_work_mem overrides it). Too small a value slows vacuuming down; too large a value, multiplied by the number of workers, can exhaust RAM.
  • autovacuum_freeze_max_age. The maximum age (in transactions) that a table's txid can reach before an aggressive vacuum is forced to prevent TXID WRAPAROUND. If wraparound is ever reached, the server stops accepting writes, so monitor the age of the oldest txid; forced anti-wraparound vacuums show up in pg_stat_activity marked "to prevent wraparound".
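
To keep an eye on wraparound, a standard catalog query such as this one shows how far each database is from the threshold:

SELECT datname,
       age(datfrozenxid) AS xid_age   -- transactions since the oldest unfrozen txid
FROM pg_database
ORDER BY xid_age DESC;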


Beware of overloading RAM and CPU. The higher the initially set value, the greater the risk of resource depletion when the load on the system increases. If set too high, performance can drop dramatically when a certain load level is exceeded.



As with work_mem, this value can be calculated arithmetically or determined by benchmarking.



Advanced autovacuum tuning



Difficulty: High.

Potential Impact: High.



Because of its complexity, this method should be used only when system performance is approaching the physical limits of the host and that has genuinely become an issue.



Runtime autovacuum options are configured in postgresql.conf. Unfortunately, there is no one-size-fits-all solution that will work in any high-load system.
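
As a starting point, these are the global knobs that are usually adjusted; the values are illustrative assumptions, not recommendations:

autovacuum_max_workers = 6               # more workers for databases with many tables
autovacuum_naptime = 30s                 # wake the launcher more often than the 1min default
autovacuum_vacuum_scale_factor = 0.05    # vacuum after ~5% of a table changes (default 0.2)
autovacuum_vacuum_cost_limit = 1000      # allow each worker more I/O per cycle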



Storage options for tables. Often a significant portion of the load falls on just a few tables in the database. Tuning autovacuum per table is a great way to avoid having to run VACUUM manually, which can noticeably affect the system.



You can set these per-table storage parameters with:



ALTER TABLE table_name SET (storage_parameter = value);
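
For instance, a hypothetical hot table orders could be vacuumed more aggressively than the global defaults (the values are illustrative):

ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,   -- vacuum after ~1% of rows change
    autovacuum_vacuum_cost_delay = 0         -- do not throttle the worker on this table
);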


Table bloat



Difficulty: Low.

Potential Impact: Medium-High.



Over time, system performance can degrade because of excessive table bloat caused by an inadequate cleanup policy, so even a tuned autovacuum daemon and manually launched VACUUM may not solve the problem. In such cases the pg_repack extension comes to the rescue.



The pg_repack extension can be used to rebuild and reorganize tables and indexes in production.
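
A usage sketch, assuming pg_repack is installed both as a server extension and as a client utility; the table and database names are hypothetical:

-- inside the target database
CREATE EXTENSION pg_repack;

# then from the shell: rebuild one bloated table online
pg_repack --table=orders --jobs=2 mydb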



Hot spots in data



Difficulty: High.

Potential Impact: Low-Medium-High.



As with MySQL, eliminating hot spots in the data requires an understanding of how your application accesses it, and may even call for changes to your system's architecture.



First of all, you should pay attention to the following:



  • Indexes. Make sure there are indexes on the columns being searched. You can use the system catalogs and views to verify that queries actually use the indexes. Use the pg_stat_statements extension and pgBadger to analyze query performance (see the query sketch after this list).
  • Heap Only Tuples (HOT). You may have too many indexes. Dropping unused indexes reduces potential bloat and table size, and lets more updates take the fast HOT path, which avoids touching the indexes at all.
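
For example, with pg_stat_statements loaded (it must be listed in shared_preload_libraries), a query like this surfaces the most expensive statements:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query,
       calls,
       total_exec_time,   -- called total_time in versions before PostgreSQL 13
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;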








Application servers



Difficulty: Low.

Potential Impact: High.



Avoid running applications (PHP, Java, Python) and postgres on the same host. Be careful with applications in these languages: they can consume large amounts of RAM, especially during garbage collection, competing with the database for resources and reducing overall performance.



Replication



Difficulty: Low.

Potential Impact: High.



Synchronous and asynchronous replication. Recent versions of postgres support logical and streaming replication in both synchronous and asynchronous modes. Although the default mode is asynchronous, you need to weigh the implications of synchronous replication, especially on networks with significant latency.
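
As a sketch, synchronous replication to a hypothetical standby named standby1 is enabled on the primary like this; every commit then waits for the standby's confirmation, adding a network round trip:

# postgresql.conf on the primary
synchronous_standby_names = 'standby1'   # must match the standby's application_name
synchronous_commit = on                  # 'remote_apply' is stricter still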



Server environment



Last but not least is simply increasing host capacity. Let's take a look at how each resource affects PostgreSQL performance:



  • CPU. Sorting, hashing, and executing query plans are CPU-bound operations; more and faster cores help the server handle many concurrent sessions.
  • RAM. More memory means a larger OS page cache and more headroom for shared_buffers and work_mem, which translates directly into less disk I/O.
  • Network. Make sure bandwidth and latency between the application servers and the database do not become the limiting factor.
  • Disk subsystem. Under high load, I/O is usually the first physical bottleneck:

    • Prefer SSD/NVMe drives to spinning disks for the data directory.
    • Choose a RAID configuration appropriate to the workload.
    • Spread I/O across devices with tablespaces where it helps.
    • Move the WAL to a separate disk so WAL writes do not compete with table and index I/O, and consider shipping WAL segments (log shipping) to another host for archiving or a standby.
