A guide to backing up databases

“Oh, no shelter can withstand a meteor impact. But you, like everyone else, surely have a backup copy, so don't worry.”



Stanisław Lem, “The Star Diaries of Ijon Tichy”


Backing up refers to keeping a copy of your data somewhere outside the primary storage location.







The main purpose of a backup is to restore data after it has been lost. In this regard we often hear that if you have a replica of the database, you can always restore the data from it and therefore do not need a backup. In fact, a backup solves at least three tasks that cannot be solved with a replica, and besides, you cannot initialize a replica without a backup.



First, a backup allows you to recover data after a logical error. For example, an accountant deleted a group of transactions, or a DBA destroyed a tablespace. Both operations are perfectly legitimate from the database's point of view, and replication will faithfully reproduce them in the replica.



Second, modern DBMSs are very reliable software, but occasionally the internal structures of a database get corrupted and access to the data is lost. What makes it especially painful is that such corruption usually happens under high load or during an update, and both high load and regular updates indicate that the database is by no means a test one and that the data stored in it is valuable.



Finally, the third task that requires a backup is cloning the database, for example, for testing purposes.



One way or another, database backup is based on one of two principles:



  • extracting the data and saving it in an arbitrary format;
  • taking a snapshot of the database files and saving the logs.


Let's take a closer look at these principles and the tools that implement them.



Dumping data



The set of utilities supplied with any DBMS includes tools for dumping and loading data. The data is saved either in text format or in a binary format specific to the particular DBMS. The table below lists these tools:



DBMS | Binary format | Text format
Oracle | DataPump Export / DataPump Import; Export / Import | SQL*Plus / SQL*Loader
PostgreSQL | pg_dump, pg_dumpall / pg_restore | pg_dump, pg_dumpall / psql
Microsoft SQL Server | bcp | bcp
DB2 | unload / load | unload / load
MySQL | - | mysqldump, mysqlpump / mysql, mysqlimport
MongoDB | mongodump / mongorestore | mongoexport / mongoimport
Cassandra | nodetool snapshot / sstableloader | cqlsh


The advantage of the text format is that it can be edited or even generated by external programs; the binary format, in turn, allows faster dumping and loading thanks to parallelism and the absence of format conversion.
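

To make this concrete, here is a minimal sketch of both variants using PostgreSQL's pg_dump; the database name mydb and the file names are assumptions:

  # text (plain SQL) dump, restored with psql
  pg_dump -Fp mydb > mydb.sql
  psql -d mydb_restored -f mydb.sql

  # binary (custom-format) dump, restored with pg_restore using four parallel jobs
  pg_dump -Fc mydb -f mydb.dump
  pg_restore -d mydb_restored -j 4 mydb.dump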



Despite the simplicity and obviousness of the idea, dumping is rarely used to back up heavily loaded production databases. Here is why a dump is not suitable as a full-fledged backup:



  • the dump process puts a significant load on the source system;
  • dumping takes a long time, so by the time it finishes the dump is already out of date;
  • it is almost impossible to make a consistent dump of the entire database under high load, because the DBMS has to keep a snapshot of its state as of the moment the dump started; the more transactions are committed while the dump runs, the larger this snapshot grows (out-of-date row versions in PostgreSQL, undo space in Oracle, tempdb in Microsoft SQL Server, etc.);
  • a dump preserves the logical structure of the data but not its physical structure - the physical storage parameters of tables, indexes, and so on; rebuilding indexes during the load can take a long time.


Nevertheless, dumps have their advantages:



  • high selectivity: you can dump individual tables, individual fields, and even individual rows;
  • the dumped data can be loaded into a database of a different version, and if the dump is in text format, even into a different DBMS.


Thus, dumps are mainly used for tasks such as backing up small tables (for example, reference tables) or distributing datasets with the next release of an application.
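

For illustration, dumping a single reference table might look like this; the table name, the filter condition, and the file names are hypothetical:

  # PostgreSQL: dump one small reference table in binary format
  pg_dump -t currency_rates -Fc mydb -f currency_rates.dump

  # MySQL: dump only the rows of one table that match a condition
  mysqldump mydb currency_rates --where="updated_at >= '2020-01-01'" > currency_rates.sql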



The most common method for backing up databases is copying database files.



"Cold" saving of database files



The obvious idea is to stop the database and copy all of its files. This is called a cold backup. The method is extremely reliable and simple, but it has two obvious drawbacks:



  • from a cold backup you can restore only the state the database was in at the moment of shutdown; transactions performed after the database was started again will not be in the "cold" copy;
  • not every database has a maintenance window during which it can be stopped.


If cold backups suit you, remember the following:



  • «» . , «» , . , Oracle online redo, , , . PostgreSQL , , .
  • , . , «» .


"Hot" saving of database files



Most modern database backups are made by copying the database files without stopping the database - a "hot" backup. Several problems arise here:



  • At the start of copying, the contents of the database may not coincide with the contents of files, since some of the information is in the cache and has not yet been written to disk.
  • During copying, the contents of the database may change. If mutable data structures are used, the contents of the files change, and when immutable structures are used, the set of files changes: new files appear, and old ones are deleted.
  • Since writing data to the database and reading the database files are not synchronized in any way, the backup program can read a torn page, half of which comes from the old version of the page and half from the new one.


For the backup to be consistent, every DBMS has a command that tells it a backup is starting. Syntactically, this command looks different in different systems:



  • in Oracle it is a separate command, ALTER DATABASE / TABLESPACE BEGIN BACKUP;
  • in PostgreSQL, the pg_start_backup() function;
  • in Microsoft SQL Server and DB2, preparation for a backup is performed implicitly by the BACKUP DATABASE command;
  • in MySQL Enterprise, Percona Server, Cassandra, and MongoDB, preparation is performed implicitly by an external utility - mysqlbackup, Percona XtraBackup, OpsCenter, and Ops Manager, respectively.


Despite the syntactic differences, the process of preparing for a backup looks the same.



This is how preparation for a backup looks in a DBMS with mutable disk structures, that is, in all traditional disk-based relational systems:



  1. The moment the backup starts is recorded; from this point on, the backup must include the database logs.
  2. A checkpoint is performed, that is, all changes made to data pages up to this moment are flushed to disk. This guarantees that recovery will not need any logs written before the start of the backup.
  3. Logging of full page images is turned on: the first time a page changes after the checkpoint, its entire image is written to the log rather than just the change. Writing to the database is not synchronized with copying the files, so the copy may contain torn pages, half old and half new. During recovery such a page is simply overwritten by the full image from the log, so the inconsistency is harmless.
  4. The headers of the data files stop being updated: the checkpoint information in them is frozen as of the start of the backup, so that during recovery the database knows from which point the logs must be applied.


After all these steps are completed, the data files can be copied with operating system tools - cp, rsync, and others. Backup mode reduces database performance: first, the volume of logs grows, and second, if the backup mode is interrupted abnormally, recovery takes longer because the data file headers have not been updated. The faster the copy is finished, the better for the database, so it makes sense to use tools such as file system snapshots or splitting a mirror (BCV) in a disk array. Some DBMSs (Oracle, PostgreSQL) let the administrator choose the copying method, while others (Microsoft SQL Server) provide an interface for integrating their native backup utilities with the file system or storage system.



When the copy is finished, the database must be returned to its normal state. In Oracle this is done with the ALTER DATABASE / TABLESPACE END BACKUP command, in PostgreSQL by calling the pg_stop_backup() function, and in other databases by the internal routines of the corresponding commands or by external services.
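

Putting these steps together for PostgreSQL, the whole cycle might look like the sketch below; it uses the exclusive pg_start_backup()/pg_stop_backup() interface available up to version 14, and the label and paths are assumptions:

  psql -c "SELECT pg_start_backup('nightly');"    # enter backup mode
  rsync -a /var/lib/postgresql/data/ /backup/base/
  psql -c "SELECT pg_stop_backup();"              # return the database to its normal state
  # in Oracle the copy would be bracketed by
  #   ALTER DATABASE BEGIN BACKUP;  and  ALTER DATABASE END BACKUP;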



Here's what the timeline for the backup process looks like:







  • Preparation for backup (begin backup) takes time, sometimes considerable. Even if mirrored volumes or snapshot file systems are used, the backup process will not be instantaneous.
  • Together with the data files, you must save the logs covering the interval from the moment preparation for the backup began to the moment the database returned to its normal state.
  • Such a backup can restore the database to the moment it returned to its normal state; recovery to an earlier point in time is not possible.


With databases using immutable data structures (snapshots, LSM trees), the situation is easier. Preparation for backup consists of the following steps:



  1. Data from memory is flushed to disk.
  2. The list of files included in the backup is recorded. Until the backup process is over, the database is prohibited from deleting these files, even if they are no longer needed.


Upon a signal about the end of the backup, a database with immutable structures can again delete unnecessary files.
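

For example, with Cassandra such a backup might look roughly like this; the keyspace name, snapshot tag, and paths are assumptions:

  nodetool snapshot -t nightly my_keyspace       # flush memtables and hard-link the current SSTables
  # copy the snapshot directories while Cassandra keeps the files pinned
  rsync -aR /var/lib/cassandra/data/my_keyspace/*/snapshots/nightly/ /backup/cassandra/
  nodetool clearsnapshot -t nightly my_keyspace  # allow the files to be deleted again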



Point-in-time recovery



A backup allows you to restore the state of the database as of the moment the command returning it from backup mode completed. However, a disaster requiring recovery can happen at any time. The task of restoring the database to an arbitrary moment is called point-in-time recovery.



To do this, the database logs must be saved continuously from the moment the backup finished, and applied to the restored copy during recovery. Once the database has been restored to its state at the end of the backup, its files (and cached pages) are guaranteed to be correct, so the special logging mode is no longer needed. By applying the logs up to the desired moment, you can obtain the state of the database at any point in time.
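

In PostgreSQL, for instance, point-in-time recovery boils down to restoring the file copy and telling the server where the archived logs are and when to stop applying them. A hedged sketch for version 12 or later; the target time and paths are assumptions, and $PGDATA stands for the data directory:

  # after unpacking the base backup into $PGDATA:
  echo "restore_command = 'cp /backup/wal/%f %p'"     >> $PGDATA/postgresql.conf
  echo "recovery_target_time = '2020-06-01 12:30:00'" >> $PGDATA/postgresql.conf
  touch $PGDATA/recovery.signal    # older versions use recovery.conf instead
  pg_ctl -D $PGDATA start          # the server replays the archived logs up to the target time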



While the speed of restoring a backup is limited only by disk bandwidth, the speed of applying the logs is usually limited by CPU performance. Changes that happened concurrently in the source database are replayed sequentially during recovery, in the order they are read from the log. The recovery time therefore grows linearly with the distance between the recovery point and the end of the backup. Because of this, full backups have to be taken fairly often: at least once a week for databases with a light transactional load, and up to daily for heavily loaded databases.



Incremental backups



To speed up point-in-time recovery, we would like to make backups as often as possible, while not consuming extra disk space and not burdening the database with backup tasks.



The solution to the problem is an incremental backup, that is, copying only those data pages that have changed since the previous backup.

Incremental backups only make sense for DBMSs that use mutable data structures.



The increment can be calculated both from the full backup (cumulative copy) and from any previous copy (differential copy).







Unfortunately, there is no uniform terminology, and different manufacturers use different terms:



DBMS | Differential | Cumulative
Oracle | Differential | Cumulative
PostgresPro | Incremental | -
Microsoft SQL Server | - | Differential
IBM DB2 | Delta | Incremental
MySQL Enterprise | Incremental | Differential
Percona Server | Incremental | -


With incremental backups, the point-in-time restore process looks like this:



  • the last full backup taken before the recovery point is restored;
  • the incremental copies are restored on top of the full copy;
  • the logs are applied from the moment the backup started up to the recovery point.


Having a cumulative copy speeds up the recovery process. For example, to restore the database state to a point between T3 and T4, you need to restore two incremental copies, while to restore to a point after T4, only one.

Obviously, the volume of one cumulative copy is less than the volume of several differential copies, because some pages have changed several times, and each incremental copy contains its own version of the page.



There are three ways to create an incremental copy:



  1. making a full copy and computing its difference from the previous full copy;
  2. parsing the logs, building a list of changed pages, and backing up the pages on that list;
  3. asking the database itself which pages have changed.


The first method saves disk space, but does not solve the problem of reducing the load on the database. Moreover, if we have a full backup, then turning it into an incremental backup is pointless, since restoring a full backup is faster than restoring a previous full copy and an increment. With this approach, it is better to shift the task of saving disk space to special components with built-in deduplication mechanisms. These can be either special storage systems (EMC DataDomain, HPE StorageWorks VLS, the entire NetApp line), or software products (ZFS, Veritas NetBackup PureFile, Windows Server Data Deduplication).



The second and third methods differ in how the list of changed pages is determined. Parsing the logs is more resource-intensive, and it requires knowing the structure of the log files. Asking the database itself which pages have changed is the easiest way, but for that the DBMS kernel must support block change tracking.



Incremental backup first appeared in Oracle Recovery Manager (RMAN), which shipped with the Oracle 8i release. Oracle implemented changed block tracking right away, so there is no need to parse the logs.
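

A sketch of what this might look like in RMAN; the change tracking file path is an assumption:

  # level 0 is the full copy that later increments are based on,
  # level 1 copies only the blocks changed since the previous backup
  echo "BACKUP INCREMENTAL LEVEL 0 DATABASE;" | rman target /
  echo "BACKUP INCREMENTAL LEVEL 1 DATABASE;" | rman target /

  # block change tracking is enabled in the database itself
  echo "ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '/u01/oracle/bct.f';" | sqlplus / as sysdba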



PostgreSQL does not track changed blocks, so the pg_probackup utility, developed by the Russian company Postgres Professional, finds the changed pages by analyzing the log. However, the company also ships PostgresPro, which includes the ptrack extension for tracking page changes. When pg_probackup is used with PostgresPro, the utility asks the database itself for the modified pages, just as RMAN does.
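

A minimal pg_probackup session might look like this; the backup catalog path and instance name are assumptions, and the exact options may differ between versions:

  pg_probackup init -B /backup/catalog
  pg_probackup add-instance -B /backup/catalog -D /var/lib/postgresql/data --instance main
  pg_probackup backup -B /backup/catalog --instance main -b FULL    # full copy
  pg_probackup backup -B /backup/catalog --instance main -b PAGE    # increment built by parsing the WAL
  pg_probackup backup -B /backup/catalog --instance main -b PTRACK  # increment via the ptrack extension (PostgresPro)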



Microsoft SQL Server, like Oracle, tracks modified pages, but the BACKUP command only allows full and cumulative backups.
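

For example (the database name and file paths are assumptions):

  # full copy, then a cumulative copy ("differential" in SQL Server terms)
  sqlcmd -Q "BACKUP DATABASE Sales TO DISK = '/var/opt/mssql/backup/sales_full.bak'"
  sqlcmd -Q "BACKUP DATABASE Sales TO DISK = '/var/opt/mssql/backup/sales_diff.bak' WITH DIFFERENTIAL"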



DB2 has the ability to track changed pages, but it is disabled by default. Once enabled, DB2 will allow full, differential, and cumulative backups.



An important difference between the tools described in this section (except pg_probackup) and file-based backup tools is that they ask the database for page images instead of reading the data from disk themselves. The drawback of this approach is a small additional load on the database. However, it is more than compensated by the fact that the page read is always correct, so there is no need to enable the special logging mode for the duration of the backup.



Note again that incremental backups do not remove the need for logs when restoring to an arbitrary point in time. Therefore, in production databases the logs are continuously copied to external media, while full and/or incremental backups are created on a schedule.
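

In PostgreSQL terms this means keeping WAL archiving permanently enabled, for example with settings like these; the archive location is an assumption, and changing archive_mode requires a server restart:

  echo "archive_mode = on" >> $PGDATA/postgresql.conf
  echo "archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'" >> $PGDATA/postgresql.conf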



The best implementation of the idea of incremental backup today is the Zero Data Loss Recovery Appliance (an "engineered system" in Oracle terminology), a specialized Oracle solution for backing up its own databases. The appliance is a cluster of servers with a large amount of disk space running a modified version of the Recovery Manager software; it can work both with other Oracle engineered systems (Database Appliance, Exadata, SPARC Supercluster) and with Oracle databases on traditional infrastructure. Unlike "regular" RMAN, ZDLRA implements the "incremental forever" concept: the system makes a full copy of the database only once and afterwards takes only incremental copies. Additional RMAN modules allow merging copies, producing new full copies from incremental ones.



To the credit of the Russian developers, it should be noted that pg_probackup can also merge incremental copies.







Unlike many similar questions, the question of which backup method is best has an unambiguous answer: the best one is the native utility for the DBMS in use that supports incremental backups.



For the DBA, the most important issues are the choice of backup strategy and the integration of database backups into the corporate infrastructure. But these questions are beyond the scope of this article.


