Getting acquainted with pg_probackup. Part two




We continue getting acquainted with the pg_probackup tool.



In the first part, we installed pg_probackup, created and configured an instance, took two backups (a full one and an incremental one in DELTA mode), and learned how to view and change the instance configuration. We also got a list of backups and wrote a script (bkp_base.sh) that backs up the cluster and sends the result of the latest backup operation to the monitoring system. Today we will tackle equally interesting tasks.



Problem 2



Given: We have two servers. The first hosts our database (hostname srv_db1, user postgres), and the second will store backups (hostname srv_bkp, user backup_user). In addition to the backups, the same server will store copies of the write-ahead logs (WAL), so that we can restore to an arbitrary point in time (Point-in-time recovery) within the last 3 days.



Solution:



To be able to restore to a selected point in time (a restore point), we must have a backup taken before that point, as well as all the WAL files from the moment the backup started up to the restore point.



We already have backups; it remains to configure WAL archiving to srv_bkp.

Connect to srv_db1 as the postgres user and run the following commands:



ssh-keygen -t rsa
ssh-copy-id backup_user@srv_bkp


Let's modify the ~/.ssh/authorized_keys file on srv_bkp:



command="pg_probackup-12 agent" ssh-rsa AAAA....


Back on srv_db1, we need to enable archive mode (archive_mode) and configure the archive_command parameter, which contains the command used to archive a completed WAL segment.



psql -c 'show archive_mode'


If it returns off, change it to on:



psql -c 'alter system set archive_mode = on'


For the change to take effect, PostgreSQL must be restarted, but we will postpone that for now and configure one more parameter.



If the stream of WAL files is large enough, the backup server may soon run out of space, so we can add the --compress option to the archive_command line; the log files will then be compressed before being sent to srv_bkp. And we do not need to worry about unpacking these files separately during recovery: pg_probackup can work with compressed files.



alter system set archive_command = 'pg_probackup-12 archive-push -B /home/backup_user/backup_db --instance=db1 --wal-file-path=%p --wal-file-name=%f --remote-host=srv_bkp --remote-user=backup_user --compress';


Now you can restart PostgreSQL.
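The restart might look like this (a sketch assuming the PGDG systemd unit name postgresql-12, which matches the /var/lib/pgsql/12 layout used later; adjust to your distribution), after which both settings can be verified:

sudo systemctl restart postgresql-12   # assumed unit name; adjust to your setup
psql -c 'show archive_mode'
psql -c 'show archive_command'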



What we did in the first task is called a standalone (autonomous) backup: such a copy contains all the WAL files needed to restore it. In the current task we are moving from standalone copies to archive copies. Archive copies do not carry the necessary logs inside them, but that does not matter, because we are going to keep all the WAL files we need in an archive. When restoring from such backups, the WAL files will be copied from the archive.



Since we are switching from standalone backups (taken in stream mode) to archive backups (archive mode), a situation may arise in which we took a copy while archive mode was not yet enabled, and some of the WAL segments have since been removed. This means that the first backup after switching to archive mode cannot be taken in PAGE mode, because the sequence of WAL segments in the archive between the previous and the current copy may be incomplete.



So let's take a DELTA backup using the script created in the first task:



./bkp_base.sh DELTA


Then we will create our first backup in PAGE mode:



./bkp_base.sh PAGE
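As in the first part, we can list the backups to make sure both completed successfully (their Status should be OK):

pg_probackup show --instance=db1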


Let me remind you that three incremental modes are available: PAGE, DELTA, and PTRACK. They differ in how they obtain information about the changed pages:



  • DELTA — all data files are read, but only the pages changed since the previous backup are copied;
  • PAGE — the changed pages are determined from the WAL files accumulated since the previous backup, so a WAL archive is required;
  • PTRACK — an analogue of Oracle's Block Change Tracking: when a page is modified, it is marked in a special map, so changed pages can be found without reading all the data files or WAL. This mode requires PTRACK support on the PostgreSQL side (the ptrack extension).


Now let's think: for a backup to be restorable, it needs the WAL files created while the backup was being taken. That is, if an incremental backup takes 30 minutes and 10 GB of WAL is generated during those 30 minutes, then only those 10 GB are needed to restore from it consistently. All other WAL files are needed only for recovering to a selected point in time (Point-in-time recovery).



The task says that we want to be able to recover to any point in time within the last 3 days. That is, all WAL for this period must be kept; in addition, we must keep all WAL needed to restore from earlier backups. But we do not need to store any other WAL files!



And while we could remove obsolete WAL files with the find command, adding -mtime and -exec rm {} to it, determining which WAL segments are needed to consistently restore a particular backup is far from easy. Fortunately, the developers thought about this and added the --wal-depth parameter, which sets the WAL retention depth, measured in backups.
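Just for illustration (do not run this on a real archive!), such a find-based cleanup might look like the sketch below. It assumes pg_probackup's archive layout of <backup catalog>/wal/<instance>, and it blindly deletes everything older than three days, regardless of whether some backup still needs those segments:

find /home/backup_user/backup_db/wal/db1 -type f -mtime +3 -exec rm {} \;   # blind deletion; ignores backup consistency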



It can be schematically described as follows:







Note that it is now somewhere in the middle of Saturday, which means we can delete all WAL files that are not needed to restore the backups of the last three days (shown in brown on the diagram). The WALs that are still needed are highlighted in yellow. A logical question may arise here: since it is already mid-Saturday, some of the logs created on Wednesday morning are no longer needed and could be deleted. That is true, and you could schedule the deletion of excess WAL as often as every minute; but in this article we delete logs together with the creation of backups, so even though they no longer fall under the retention policy, they will be deleted when the next backup is created.



Let's change the settings of the db1 instance and set the WAL retention depth:



pg_probackup set-config --instance db1 --wal-depth=3


Let's check that the settings have been applied:



pg_probackup show-config --instance=db1 | grep wal-depth


Add the --delete-wal flag to the backup command in the bkp_base.sh script, and also remove the --stream switch, since we are switching from standalone backups to archive ones:



pg_probackup backup --instance=db1 -j 2 --progress -b $1 --compress --delete-expired --delete-wal
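After these edits, bkp_base.sh might look roughly like this; a minimal sketch, assuming BACKUP_PATH and the db1 instance are configured for backup_user as in the first part, with the monitoring call reduced to a hypothetical placeholder:

#!/bin/bash
# bkp_base.sh - minimal sketch; the backup mode (DELTA, PAGE, FULL) is passed as $1
pg_probackup backup --instance=db1 -j 2 --progress -b $1 --compress --delete-expired --delete-wal
# hypothetical placeholder for the monitoring call described in part one:
# send_backup_status_to_monitoring $?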


At this point, we have configured the creation of archive backups on a separate server. Log files are also archived there, which allows us not only to restore a specific backup, but also to perform Point-in-time recovery, i.e. recovery to a specific point in time.



Since we now have an archive of WAL files, we can use PAGE mode to create incremental backups. Let me remind you that in this mode the changes relative to the previous backup are computed not from the data files but from the WAL accumulated since the previous backup.



P.S. For educational purposes only! Let's create a table in the database on the srv_db1 server:



psql -c 'create table timing(time_now timestamp with time zone)'


Then we will write the following line in crontab:



* * * * * psql -c 'insert into timing select now()'


Every minute, the current time will be recorded in the database; this information will be useful when we restore to a point in time.
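A quick sanity check that rows are arriving (a simple query against the timing table we just created):

psql -c 'select max(time_now) from timing'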



Problem 3



Given:



We have two servers: the first hosts our database (hostname srv_db1, user postgres), and the second stores archive backups and WAL files (hostname srv_bkp, user backup_user). A third server, srv_db2 (user postgres), appears in our environment; we have to deploy a replica of our cluster on it and reconfigure pg_probackup so that it takes backups from the replica.



Solution:



The Internet is full of descriptions of how to create replicas in PostgreSQL: just type "creating a replica in PostgreSQL" into a search engine and take your pick! There are docs, articles, and even video tutorials. All of these methods are good, but they usually do not take into account that we already have backups. We want to create the replica from a backup, thereby keeping the read load off the master. That is, our production server will not even be aware that a replica is being created next to it (with reservations, of course: a replication slot will need to be created and access rights configured, but while the replica is being built there will be no additional load on the master).



We set up key-based access between the srv_bkp and srv_db2 servers, and install PostgreSQL and pg_probackup on srv_db2 (we did everything except the PostgreSQL installation during the first task; if you have questions about installing the DBMS, take a look here).



Go to srv_db2



ssh-keygen -t rsa
ssh-copy-id backup_user@srv_bkp


Go to srv_bkp



ssh-copy-id postgres@srv_db2


Let's indulge our inner paranoid and edit ~/.ssh/authorized_keys on srv_db2 - insert



command="pg_probackup-12 agent"


before the new keys.



Replaying WAL files is much slower than restoring from a backup, so let's create one more incremental backup: connect to the srv_bkp server as backup_user and run the command:



pg_probackup backup --instance=db1 -j 2 --progress -b PAGE --compress


Why didn't we use the script we created? Because earlier we added the --delete-wal option to it; after taking that backup we would no longer be able to restore to a point in time three days in the past. Note that if we keep this extra backup, the next backup made by our script will still retain WAL for only the last two days, so once we have restored from it, it makes sense to delete it.



Now we perform the restore:



time pg_probackup restore --instance=db1 -D /var/lib/pgsql/12/data -j 2 --restore-as-replica --progress --remote-proto=ssh --remote-host=srv_db2 --archive-host=srv_bkp --archive-user=backup_user --log-level-console=log --log-level-file=verbose --log-filename=restore_rep.log


The /var/lib/pgsql/12/data directory must be empty. In addition, on the srv_db1 server you must modify pg_hba.conf to allow access from srv_db2 using the replication protocol:



host    replication     all             srv_db2                 md5


We re-read the configuration:



psql -c 'select pg_reload_conf()'


Checking for typos:



psql -c 'select * from pg_hba_file_rules'


Create a ~/.pgpass file on srv_db2, specifying the credentials for connecting to srv_db1, this time for the replication pseudo-database:



srv_db1:5432:replication:backup:Strong_PWD


And restrict its permissions to 600:



chmod 600 ~/.pgpass


We start the cluster on srv_db2.



Let's check that everything works. We will verify it in several ways.



Look into the replica's log file; somewhere near the end, the following line should appear:



database system is ready to accept read only connections


psql -c 'select pg_is_in_recovery()' 


should return t



Now let's create a table t1 on the master:



srv_db1: psql -c 'create table t1()'


Let's check if it appeared on the replica.



srv_db2: psql -c '\d'


The table is there, so replication is working. Now let's drop the table on the master:



srv_db1: psql -c 'drop table t1'


Of course, on a real database we would now create a replication slot on the master and configure the replica to connect through it, but the topic of this article is backups, not replicas, so let's continue.



So, the replica is working; if necessary, we can fail over to it. But let's ask ourselves: can we do better?

Of course we can. Why take backups from the master when we can shift that read load to the replica?



CAUTION! REPLICA LAG MUST BE MONITORED; OTHERWISE YOU MAY END UP UNKNOWINGLY TAKING BACKUPS FROM A REPLICA THAT HAS FALLEN BEHIND.
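One simple way to keep an eye on the lag is to run the following on the replica (now() minus the timestamp of the last replayed transaction gives the approximate delay):

psql -c 'select now() - pg_last_xact_replay_timestamp() as replication_lag'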



Let's do that!



We make changes in the cluster settings on the srv_db1 and srv_db2 servers:



alter system set archive_timeout=180;
select pg_reload_conf();
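A quick check that the new value has taken effect:

psql -c 'show archive_timeout'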


Go to srv_bkp and change the value of the remote-host parameter:



pg_probackup set-config --instance db1 --remote-host=srv_db2
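As before, we can verify that the setting was applied:

pg_probackup show-config --instance=db1 | grep remote-host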


We add connection strings for the srv_db2 server to .pgpass on srv_bkp:



srv_db2:5432:replication:backup:Strong_PWD
srv_db2:5432:backupdb:backup:Strong_PWD


And let's try to take another backup.



srv_bkp: ./bkp_base.sh PAGE


We see that the backup has worked.



The problem is solved!



The next part will be devoted to restoring from backups: we will consider various recovery options, learn how to restore to a specific backup and to a point in time, and get acquainted with partial and incremental restore.


