What is replication and why is it needed
By itself, replication refers to the process of synchronizing multiple copies of an object. In our case, such an object is the database server, and the data itself is of the greatest value. If we have two or more servers, and in any possible way maintain a synchronized set of data on them, we have implemented system replication. Even the manual option c
mysqldump -> mysql load
is also replication.
It should be understood that data replication itself has no value, and is only a tool for solving the following tasks:
- improving data reading performance. With the help of replication, we will be able to maintain several copies of the server, and distribute the load between them.
- . , . , .
- . , , , .
- . , ( , ), , .
- . , , .
- . .
MySQL
The replication process involves the propagation of data changes from the master server (usually referred to as the master, master ), to one or more slave servers (slave, slave ). There are also more complex configurations, in particular with several master servers, but for each change on a specific master server, the remaining masters conditionally become slaves and consume these changes.
In general, MySQL replication consists of three steps:
- The master server writes data changes to the log. This log is called binary log , and changes are called binary log events.
- The slave copies the changes to the binary log into its own, called the relay log .
- The slave replays the changes from the relay log, applying them to its own data.
Replication types
There are two fundamentally different approaches to replication: command - by- command and row-by-row . In the case of command-by-command replication, data change requests (INSERT, UPDATE, DELETE) are logged to the master log, and the slaves reproduce the same commands exactly. With row-by-row replication, the log will directly change the rows in the tables, and the same actual changes will then be applied to the slave.
As there is no silver bullet, each of these methods has advantages and disadvantages. Replication by command is easier to implement and understand, and reduces the load on the master and on the network. However, per-command replication can lead to unpredictable effects when using non-deterministic functions such as NOW (), RAND (), etc. There can also be problems caused by data out of sync between master and slave. Row-by-row replication leads to more predictable results, since actual data changes are captured and reproduced. However, this method can significantly increase the load on the master server, which has to log every change in the log, and on the network through which these changes propagate.
MySQL supports both replication methods, and the default (we can say that the recommended one) changed depending on the version. Modern versions like MySQL 8 use row-based replication by default.
The second principle for dividing replication approaches is the number of master servers... The presence of one master server implies that only it accepts data changes, and is a kind of standard from which changes are already propagated to many slaves. In the case of master-master replication, we get both some profit and problems. One of the advantages, for example, is that we can give remote clients from the same Sydney and Helsinki an equally fast opportunity to write their changes to the database. This leads to the main drawback, if both clients simultaneously changed the same data, whose changes are considered final, whose transaction is committed, and whose transaction is rolled back.
Also, it is worth noting that the presence of a master replication master in general cannot increase the performance of writing data in the system. Let's imagine that our only master can process up to 1000 requests at a time. By adding a replicated second master to it, we will not be able to process 1000 requests on each of them, since in addition to processing “their” requests, they will have to apply the changes made on the second master. That, in the case of command-by-command replication, will make the total possible load on both no more than on the weakest of them, and with row-by-row replication, the effect is not entirely predictable, it can be either positive or negative, depending on specific conditions.
An example of building a simple replication in MySQL
Now is the time to create a simple replication configuration in MySQL. For this we will use the Docker and MySQL images from dockerhub , as well as the world database .
To begin with, we will launch two containers, one of which we will later configure as a master, and the other as a slave. Let's network them so they can talk to each other.
docker run -d --name samplereplication-master -e MYSQL_ALLOW_EMPTY_PASSWORD=true -v ~/path/to/world/dump:/docker-entrypoint-initdb.d mysql:8.0
docker run -d --name samplereplication-slave -e MYSQL_ALLOW_EMPTY_PASSWORD=true mysql:8.0
docker network create samplereplication
docker network connect samplereplication samplereplication-master
docker network connect samplereplication samplereplication-slave
The volume connection with the world.sql dump is specified for the master container in order to simulate the presence of some initial base on it. When creating a container, mysql will download and execute sql scripts located in the docker-entrypoint-initdb.d directory.
To work with configuration files, we need a text editor. Any convenient one can be used, I prefer vim.
docker exec samplereplication-master apt-get update && docker exec samplereplication-master apt-get install -y vim
docker exec samplereplication-slave apt-get update && docker exec samplereplication-slave apt-get install -y vim
First of all, let's create an account on the master that will be used for replication:
docker exec -it samplereplication-master mysql
mysql> CREATE USER 'replication'@'%';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'replication'@'%';
Next, let's change the configuration files for the master server:
docker exec -it samplereplication-master bash
~ vi /etc/mysql/my.cnf
The following parameters should be added to the my.cnf file in the [mysqld] section:
server_id = 1 # log_bin = mysql-bin #
When enabling / disabling binary log, a server restart is required. In the case of Docker, the container is reloaded.
docker restart samplereplication-master
Make sure the binary log is enabled. Specific values such as file name and position may vary.
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 156 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
In order to start data replication, it is necessary to “pull up” the slave to the state of the master. To do this, you need to temporarily block the wizard itself to make a snapshot of the actual data.
mysql> FLUSH TABLES WITH READ LOCK;
Next, we'll use mysqldump to export data from the database. Of course, in this example, you can use the same world.sql, but let's get closer to a more realistic scenario.
docker exec samplereplication-master mysqldump world > /path/to/dump/on/host/world.sql
After that, it is necessary to execute the SHOW MASTER STATUS command again, and remember or write down the File and Position values. These are the so-called coordinates of the binary log. It is from them that we will further indicate to start the slave. Now we can unlock the master again:
mysql> UNLOCK TABLES;
The master is configured and ready to replicate to other servers. Let's move on to the slave now. First of all, load the dump from the master into it.
docker cp /path/to/dump/on/host/world.sql samplereplication-slave:/tmp/world.sql
docker exec -it samplereplication-slave mysql
mysql> CREATE DATABASE `world`;
docker exec -it samplereplication-slave bash
~ mysql world < /tmp/world.sql
And then we will change the slave config by adding parameters:
log_bin = mysql-bin # server_id = 2 # relay-log = /var/lib/mysql/mysql-relay-bin # relay-log-index = /var/lib/mysql/mysql-relay-bin.index # read_only = 1 # “ ”
After that, reload the slave:
docker restart samplereplication-slave
And now we need to tell the slave which server will be the master for it, and where to start replicating data. Instead of MASTER_LOG_FILE and MASTER_LOG_POS, you must substitute the values obtained from the SHOW MASTER STATUS on the master. These parameters are collectively called binary log coordinates.
mysql> CHANGE MASTER TO MASTER_HOST='samplereplication-master', MASTER_USER='replication', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=156;
Let's start replaying the relay log, and check the replication status:
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
SLAVE STATUS
*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: samplereplication-master Master_User: replication Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 156 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 324 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 156 Relay_Log_Space: 533 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: c341beb7-3a33-11eb-9440-0242ac110002 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: Master_public_key_path: Get_master_public_key: 0 Network_Namespace: 1 row in set, 1 warning (0.00 sec)
If everything went well, your status should look similar. Key parameters here:
- Slave_IO_State - actually, the state of replication.
- Read_Master_Log_Pos is the last position read from the master log.
- Relay_Master_Log_File - The current master log file.
- Seconds_Behind_Master - slave lag behind master, in seconds.
- Last_IO_Error , Last_SQL_Error - replication errors, if any.
Let's try to change the data on the master:
docker exec -it samplereplication-master mysql
mysql> USE world;
mysql> INSERT INTO city (Name, CountryCode, District, Population) VALUES ('Test-Replication', 'ALB', 'Test', 42);
And check if they appeared on the slave.
docker exec -it samplereplication-slave mysql
mysql> USE world;
mysql> SELECT * FROM city ORDER BY ID DESC LIMIT 1;
+------+------------------+-------------+----------+------------+
| ID | Name | CountryCode | District | Population |
+------+------------------+-------------+----------+------------+
| 4081 | Test-Replication | ALB | Test | 42 |
+------+------------------+-------------+----------+------------+
1 row in set (0.00 sec)
Excellent! The entered record is also visible on the slave. Congratulations, you have now created your first MySQL replication!
Conclusion
I hope that within the framework of this article I was able to give a basic understanding of replication processes, familiarize myself with the use of this tool, and try to independently implement a simple example of replication in MySQL. The topic of replication, and its practical application is extremely extensive, and if you are interested in this topic, I can recommend the following sources to study:
- Report "How MySQL Replication Works" by Andrey Aksenov (Sphinx)
- The book “MySQL to the Maximum. Optimization, replication, backup ”- Baron Schwartz, Petr Zaitsev, Vadim Tkachenko
- "Highload" - here you can find specific recipes for data replication
Hope you found this article helpful and welcome your feedback and comments!