Sometime in the distant future, automatic deletion of unnecessary data will be one of the important tasks of a DBMS [1]. In the meantime, we ourselves need to take care of deleting or moving unnecessary data to less expensive storage systems. Let's say you decide to delete several million rows. Quite a simple task, especially if the condition is known and there is a suitable index. "DELETE FROM table1 WHERE col1 =: value" - what could be easier, right?
Video:
I have been on the Highload program committee since the first year, i.e. since 2007.
And I have been with Postgres since 2005. Used it in many projects.
A group with RuPostges also since 2007.
At Meetup, we have grown to 2100+ participants. It is the second largest in the world after New York, ahead of San Francisco for a long time.
. , . Postgres. .
https://postgres.ai/ – . , , .
- , Postgres - . , , , , DBA . , .
https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
VLDB -. . , , . .
– 1 000 000 . , 100 . .
https://vldb2019.github.io/files/VLDB19-keynote-2-slides.pdf
? , . . .
, , . , , .
, , , , . . , , . .
, . – . , - , . - , . . . - , , . .
, , - .
, , . . .
. , – . staging – . – . – .
. DELETE .
, , . .
. - , .
dev, staging – . , , . , .
? . , - . , , . , DELETE .
? , . ?
, review, . . DBA- . , prod, .
, - .
.
- , Postgres MySQL .
, , - .
, - , ?
DBA. DBA , : « ». , GitLab, GitHub code review , DBA prod, DBA : « ».
, disk IO , , , .
– . , prod , , staging . , 1 000 , .
, , . . , , . -.
. , , . , . , YouTube .
, ? , latency . , 100 %. , NVMe , , , . , , .
, . . Switchover. . .
- ? DBA , checkpoint tuning. , checkpoint tuning.
checkpoint? . , . , , , write-ahead log. - , , , , REDO. . , checkpoint. .
Postgres . 10-15- . checkpoint – .
c Postgres check-up, . . . - . , checkpoints 90 % .
? . Checkpoint timeout , , 10 . , .
max_wal_saze 1 . , Postgres 300-400 . checkpoint .
, , , , checkpoint , 30 , . .
, . . . max_wal_size. .
, , . . , .
, .
– max_wal_size. . 1 . DELETE .
, . , disk IO . , WAL , . . . , checkpoint . , .
max_wal_size. . , . . , 10 – , 1, 2, 4, 8 . . , prod. , Postgres .
, , DELETE, checkpoint’.
Checkpoint - – .
: DELETE , «» .
. . 1 max_wal_size , . – , . . . , DELETE .
prod, , , DELETE .
, 16 , , . – , . . , . . – . – . , , 16 .
64 , . , , - .
?
, , checkpoint tuning, , , , .
checkpoint , , , , , , , , . checkpoint , , .
checkpoint . .
. Postgres 8 , Linux 4 . full_page_writes. . , , , , .
WAL , , checkpoint , , . . 8 , , 100 . .
, .
, , checkpoint , . checkpoints, , full_page_writes = on , , . . WAL . , , .
, , .
max_wal_size, , checkpoint, wal writer. .
. ? , , checkpoint . REDO . .
, checkpoint , , kill -9 Postgres .
, , , . . REDO .
, . -, checkpoint, , . , -, . checkpoints -, , , WAL checkpoint. . . .
max_wal_size , , max_wal_size 64 , 10 . – . -. , - : « ? 3-5 ?». .
. Patroni. , , . autofailover Postgres. GitLab Data Egret .
autofailover, 30 , 10 ? , . . . , .
, . , , - 10 .
- , autofailover. , , 64, 100 – . . , .
, , , max_wal_size =1, 8, . . , . ?
, , . .
. , «BEGIN, DELETE, ROLLBACK», DELETE . . . , . . bloat . DELETE.
DELETE c ROLLBACK checkpoint tuning, database labs.
«i». Postgres . , . : ctid, xmid, xmax.
Ctid – . , .
, ROOLBACK . . . , . .
Xmax – . , Postgres , , , 0, – . , DELETE . database labs .
. DBA , : « ?». . , .
, . . DELETE . 20 , . , , , .
?
, , . , , throttling.
. , , , , , , autovacuum, , . , - , , - . .
https://postgres.ai/products/joe/
. , : « ?».
, , transaction overhead, . . . .
: , .
? , . . 50 . - , . , . - 100 , , , 100 , . .
, 10- , , - . , . . , transaction overhead . , .
. - . . . . . DELETE UPDATE.
, , , DELETE. , .
, . index scan, index only scan. . . .
, . , , - . , . database labs.
-, production . , , , . , . , - , .
, , . . . 30 , . - RESET, . . . .
https://docs.gitlab.com/ee/development/background_migrations.html
? 3 , .
. . , . . 100 , 5 , 1 000 . , . ID . .
– . Gitlab. . ID , 10 000 . - . . .
, , . .
. . , . . , , ID. ID, index only scan, heap .
, index only scan – , index scan.
, . BATCH_SIZE . , . , , , , . for update skip locked. Postgres , . . CTE – . CTE – returning *
. returning id, *
, .
? , . . ID created_at -. min, max . - . . .
. , , , heap only tuples updates. . . Postgres . pg_stat_user_tables . – hot updates .
, . updates, , . ( - updates, -), . . . , . . , - , , -, , .
, batches , - . .
— https://gitlab.com/snippets/1890447
Blocked autovacuum — https://gitlab.com/snippets/1889668
Blocking issue — https://gitlab.com/snippets/1890428
№ 5 . Okmeter Postgres. Postgres, , . - , - . Okmeter – , , . .
, dead tuples . , - . , , . .
IO, , .
. OLTP . , .
– ? . . autovacuum . . hot standby , . , .
, alert . . autovacuum. Avito, . , , autovacuum. , - . alert .
issues. . - - . Data Egret CTE, . . . . statement_timeout . lock_timeout .
.
, – . , . 2 – .
, production, . production.
. , . DBA , , . .
. , , REPACK . . , .
, , . -. : . , .
, , open source. GitLab. , DBA. database lab, . . , Joe. production. Joe slack, : «explain - » . DELETE , .
, 10 , database lab 10 . 10- 10 . , . , . . . .
thin provisioning. . , , . . . .
: 5 , , 30 . , . . , .
Postgres.ai . , , . . . .
, , , , . . . , , . , , . ?
. , pg_repack, , , 4- . , , 8 .
. . . . . . . . pg_repack. . , , . , . . , . , , .
pg_repack GitHub , , int 4 int 8, pg_repack . , , . , pg_repack : « », . . , . .
, .
Bloat , . , , . . . . Python, .
MySQL, . .
, 90 %. 5 %, .
! prod, - , ?
. . , , , , - . , . , , , .
! , Postgres, - , . . Postgres, - DELETE deferent - , , - ?
SQL , Postgres ? . . . .
.
, checkpoint tuning . - , , . .
, , , ? .. ?
, . Nancy, checkpoint tuning. - Postgres? , . . , . . . , auto tuning . . checkpoint tuning . . . performance, shell buffer . .
checkpoint tuning : , cloud, Nancy . max_wal_size . , .
! . , autovacuum . ? . ?
Autovacuum – , , . , , . , . , . autovacuum – . OLTP: autovacuum. hot standby feedback , autovacuum , . , , . . , – . -, . . . . , autovacuum, .
! , . , , . , . . . live, live, , 60-70 %. ,
DBA, , , — . , . , , production . . . -. , – . .
garbage select , , deleted flag
, autovacuum Postgres.
, ?
Autovacuum garbage collector.
!
! , - ?
, .
Is it possible then to protect ourselves if we have locked a table that should not be used?
Of course have. But this is a question like a chicken and an egg. If we all know what will happen in the future, then, of course, we will do everything great. But the business is changing, there are new columns, new queries. And then - oops, we want to remove this. But this ideal situation occurs in life, but not always. But overall it's a good idea. Just truncate and that's it.