Many systems have a pattern of constant accumulation of data over time. Moreover, most of them never change again - that is, they are written in append-only mode .
These are not only various kinds of logs and equipment metrics, but also such seemingly unrelated things as correspondence between users or comments on news.
About a year ago, I already wrote about the model for organizing the partitioning of such data and the cascading changes in the database structure caused by this. And today, using the example of our service for analyzing PostgreSQL server logs, we will analyze the features of servicing databases organized in this way, and how a competent approach (and a little night work) can reduce infrastructure costs significantly .
Something we are heavily loading the disk ...
The entire base of our service is "sawn" into daily sections and reorganized for maximum throughput. You can read more about this in the articles "Writing in PostgreSQL on the sublight: 1 host, 1 day, 1TB" and "Saving a penny on large volumes in PostgreSQL" .
(. " PostgreSQL"), "" - 100% , , :
" - , - !" ?
, 200MB/s:
... .
- ?.. , ! - !
? , ? 18 maintenance-? VACUUM, ANALYZE, CREATE INDEX ..
PostgreSQL, " PostgreSQL โ , ".
" ! - "
-, autoVACUUM/autoANALYZE:
, autovacuum (to prevent wraparound) - - "" , ! , "" , 150GBโฆ .
" , - , ."
, autoANALYZE?..
ANALYZE- append-only ! , PK.
, " " - autoanalyze. :
ALTER SYSTEM SET autovacuum_analyze_scale_factor = 1;
-- , (x2)
ALTER SYSTEM SET autovacuum_analyze_threshold = 100000;
-- ... 100K
"" . , - , .
:
INSERT ... ON CONFLICT UPDATE
- append-only ""
pg_repack
MVCC, "-". .
"" , , cron-, 00:15, " " pg_repack, "" :
pgrepack โ Postgres Pro Standard, . CLUSTER
VACUUM FULL
, ยซ ยป, . pgrepack , ,CLUSTER
.
VACUUM FREEZE
(append-only) - VACUUM FREEZE
, "" :
VACUUM FREEZE rawdata_20190419; VACUUM FREEZE rawplan_20190419; ...
!
20 , (disk busy) ~60%: