Failsafe Zabbix: migrating from asynchronous to group replication

Introduction

Zabbix supports several databases, but only MySQL and PostgreSQL looked suitable for my installation: PostgreSQL with repmgr and pgbouncer (or something like stolon) on one hand, and MySQL Group Replication on the other. Since the current setup already runs MySQL and I wanted to stay with stock components, the choice fell on the second option.





So what exactly is MySQL Group Replication? As the name suggests, it is a group of servers that store the same set of data. The maximum number of nodes in a group is 9, and it can work in single-primary or multi-primary mode. The most interesting part is that everything happens automatically: election of a new primary, detection of a failed node, split-brain handling, database recovery. The functionality ships as the group_replication and mysql_clone plugins; members communicate via the Group Communication System protocol, which is based on the Paxos algorithm. Group Replication is available since MySQL 5.7.17 and 8.0.1 (the mysql_clone plugin appeared later, in 8.0.17).
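Because the group agrees on every change over a Paxos-based protocol, it stays writable only while a majority of members can reach each other. The arithmetic behind that is worth keeping in mind when sizing the group; a small illustration (not MySQL code, just the quorum rule):

```python
def majority(n: int) -> int:
    """Smallest number of members that constitutes a quorum in a group of n."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """How many members may fail while the group still keeps a quorum."""
    return (n - 1) // 2

# A 3-node group survives 1 failure; the maximum 9-node group survives 4.
for n in (3, 5, 9):
    print(f"{n} nodes: quorum={majority(n)}, tolerates {tolerated_failures(n)} failures")
```

This is also why 3 is the practical minimum for a fault-tolerant group: with 2 nodes, losing either one loses the quorum.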





My current installation runs Zabbix 5.0 LTS on MySQL 5.7; the migration will also raise MySQL to 8.0, which makes it all the more interesting.





Replication monitoring

Yes, yes. It's like TDD, only in administration: first you prepare monitoring, so that the new cluster immediately appears on the radar of the monitoring system and not a single problem escapes its watchful eye. Since you do not have group replication (GR) yet, the output of the commands below will be empty, so I am showing example output from an already running cluster.





The main source of information about the status of nodes is the command:





SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 500049c2-99b7-11e9-8d36-e4434b5f9d0c | example1.com |        3306 | ONLINE       | SECONDARY   | 8.0.13         |
| group_replication_applier | 50024be2-9889-11eb-83da-e4434ba03de0 | example2.com |        3306 | ONLINE       | PRIMARY     | 8.0.13         |
| group_replication_applier | 500b2035-986e-11eb-a9f8-564d00018ad1 | example3.com |        3306 | ONLINE       | SECONDARY   | 8.0.13         |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
      
      



The MEMBER_STATE column shows the state of each node; the possible values are described at https://dev.mysql.com/doc/refman/8.0/en/group-replication-server-states.html. In a healthy cluster, every member is ONLINE.





More detailed statistics are available for each member:





SELECT * FROM performance_schema.replication_group_member_stats\G
*************************** 1. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 16178860996821458:41
                                 MEMBER_ID: 500049c2-99b7-11e9-8d36-e4434b5f9d0c
               COUNT_TRANSACTIONS_IN_QUEUE: 0
                COUNT_TRANSACTIONS_CHECKED: 75715997
                  COUNT_CONFLICTS_DETECTED: 0
        COUNT_TRANSACTIONS_ROWS_VALIDATING: 1957048
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: 500049c2-99b7-11e9-8d36-e4434b5f9d0c:1-1821470279,
500293cf-594c-11ea-aafd-e4434ba03de0:1-622868371,
5000d25c-059e-11e8-822b-564d00018ad1:1-140221041,
c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:1-125382195
            LAST_CONFLICT_FREE_TRANSACTION: c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:125471159
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
         COUNT_TRANSACTIONS_REMOTE_APPLIED: 5664
         COUNT_TRANSACTIONS_LOCAL_PROPOSED: 75710337
         COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 2. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 16178860996821458:41
                                 MEMBER_ID: 50024be2-9889-11eb-83da-e4434ba03de0
               COUNT_TRANSACTIONS_IN_QUEUE: 0
                COUNT_TRANSACTIONS_CHECKED: 75720452
                  COUNT_CONFLICTS_DETECTED: 0
        COUNT_TRANSACTIONS_ROWS_VALIDATING: 1955202
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: 500049c2-99b7-11e9-8d36-e4434b5f9d0c:1-1821470279,
500293cf-594c-11ea-aafd-e4434ba03de0:1-622868371,
5000d25c-059e-11e8-822b-564d00018ad1:1-140221041,
c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:1-125377993
            LAST_CONFLICT_FREE_TRANSACTION: c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:125470919
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
         COUNT_TRANSACTIONS_REMOTE_APPLIED: 75711354
         COUNT_TRANSACTIONS_LOCAL_PROPOSED: 9105
         COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 3. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 16178860996821458:41
                                 MEMBER_ID: 500b2035-986e-11eb-a9f8-564d00018ad1
               COUNT_TRANSACTIONS_IN_QUEUE: 38727
                COUNT_TRANSACTIONS_CHECKED: 49955241
                  COUNT_CONFLICTS_DETECTED: 0
        COUNT_TRANSACTIONS_ROWS_VALIDATING: 1250063
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: 500049c2-99b7-11e9-8d36-e4434b5f9d0c:1-1821470279,
500293cf-594c-11ea-aafd-e4434ba03de0:1-622868371,
5000d25c-059e-11e8-822b-564d00018ad1:1-140221041,
c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:1-125382195
            LAST_CONFLICT_FREE_TRANSACTION: c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:125430975
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 47096
         COUNT_TRANSACTIONS_REMOTE_APPLIED: 49908155
         COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
         COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
3 rows in set (0.00 sec)
      
      



Pay special attention to COUNT_TRANSACTIONS_IN_QUEUE: group replication has no Seconds_Behind_Master analogue, so the number of transactions waiting in the queue is the closest indicator of how far a node lags behind. In the example above, the third node has a non-empty queue and is still catching up with the group.





Based on these two queries, create Zabbix items and triggers: alert when a member leaves the ONLINE state, when the number of members changes, and when the transaction queue keeps growing. That way the cluster is watched from the first minute of its life.
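The trigger logic is easy to prototype outside Zabbix first. A minimal sketch, assuming member rows shaped like the output of the two performance_schema tables above (the dict keys and the queue threshold are my own, not Zabbix built-ins):

```python
def cluster_problems(members, queue_threshold=1000):
    """Return human-readable problems for a list of member dicts.

    Each dict mimics one row joined from replication_group_members
    and replication_group_member_stats:
    {'host': ..., 'state': ..., 'queue': ...}
    """
    problems = []
    online = [m for m in members if m['state'] == 'ONLINE']
    # quorum check: a group of n needs n // 2 + 1 reachable ONLINE members
    if len(online) < len(members) // 2 + 1:
        problems.append('group has lost quorum')
    for m in members:
        if m['state'] != 'ONLINE':
            problems.append(f"{m['host']} is {m['state']}")
        elif m['queue'] > queue_threshold:
            problems.append(f"{m['host']} lags: {m['queue']} transactions queued")
    return problems
```

An empty return value means the cluster is healthy; anything else maps naturally onto a Zabbix trigger.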





With monitoring ready, we can move on to the migration itself. The plan below looks long, but most of the steps are quick.





The migration plan:

  1. Open TCP port 33061 between the future cluster nodes (it is used for group communication). Check the firewall rules;
  2. Build MySQL 8.0 packages (I am on FreeBSD, so Poudriere takes care of that);
  3. Announce downtime and stop the Zabbix server;
  4. Make sure the Secondary has applied everything from the master (that is, replication has no lag). Stop replication;
  5. Stop MySQL 5.7;
  6. Take a backup (just in case; ideally it will not be needed);
  7. Install the MySQL 8.0 packages;
  8. Start MySQL 8.0 (a separate mysql_upgrade run is not required, version 8 upgrades the data dictionary itself);
  9. Add primary keys to the tables that lack them (history, history_uint, etc.). Group replication refuses to work without them;
  10. Check that the upgrade finished cleanly (look through the error log);
  11. Remove the old asynchronous replication configuration (with RESET SLAVE ALL;);
  12. Configure and bootstrap group replication on the first node;
  13. Point the Zabbix server and the Zabbix frontend at the new node;
  14. Add the remaining nodes (repeating steps 4 through 8 on each of them, etc.);
  15. Check that all members are ONLINE;
  16. Update the Ansible playbooks to reflect the new configuration;
  17. Enable group_replication_start_on_boot on all nodes;
  18. Set up HADNS;
  19. Close the downtime and keep an eye on the monitoring;





And the rollback plan, in case something goes wrong:

  1. Stop MySQL 8.0;
  2. Return the old configuration files;
  3. Remove the MySQL 8.0 packages and reinstall MySQL 5.7;
  4. Restore the database from the backup and start everything again;

Most of the steps above are self-explanatory.





Steps 9, 12 and 14 deserve a closer look.





Step 9:

Group replication requires every table to have a primary key (or a non-null unique key equivalent); writes to tables without one fail. The Zabbix schema contains several such tables.

The following query finds InnoDB tables without a suitable key:





SELECT tables.table_schema , tables.table_name , tables.engine
 FROM information_schema.tables
 LEFT JOIN (
 SELECT table_schema , table_name
 FROM information_schema.statistics
 GROUP BY table_schema, table_name, index_name HAVING
 SUM( case when non_unique = 0 and nullable != 'YES' then 1 else 0 end ) = count(*) ) puks
ON tables.table_schema = puks.table_schema and tables.table_name = puks.table_name
WHERE puks.table_name is null
 AND tables.table_type = 'BASE TABLE' AND Engine="InnoDB";
      
      



In the Zabbix database, these turn out to be the history* tables and dbversion. The missing keys are added like this:





ALTER TABLE history ADD COLUMN `id` BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE history_uint ADD COLUMN `id` BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE history_text ADD COLUMN `id` BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE history_str ADD COLUMN `id` BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE history_log ADD COLUMN `id` BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE dbversion ADD PRIMARY KEY (mandatory);
      
      



Keep in mind that on large history tables these ALTERs run for a long time. The added id columns are also a deviation from the stock schema, something to remember during future Zabbix upgrades.





Step 12:

Group replication is first configured and bootstrapped on a single node; the rest will join later. The following settings go into my.cnf:





server-id=[unique number]
gtid_mode=ON
enforce_gtid_consistency=ON
log_bin=binlog
log_slave_updates=ON
binlog_format=ROW
master_info_repository=TABLE
relay_log_info_repository=TABLE
transaction_write_set_extraction=XXHASH64

disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"
plugin_load_add='group_replication.so;mysql_clone.so'
ssl-ca=/usr/local/etc/ssl/mysql/ca.crt
ssl-cert=/usr/local/etc/ssl/mysql/server.crt
ssl-key=/usr/local/etc/ssl/mysql/server.key
group_replication_ssl_mode=VERIFY_IDENTITY
group_replication_group_name="[UUID; can be generated with SELECT UUID();]"
group_replication_start_on_boot=off # do not join the group automatically while the cluster is being assembled
group_replication_local_address="[this node's host name].com:33061"
group_replication_group_seeds="example1.com:33061,example2.com:33061,example3.com:33061"
group_replication_ip_allowlist="2.2.2.2/32,3.3.3.3/32,4.4.4.4/32"
group_replication_member_weight=50

group_replication_recovery_use_ssl=ON
group_replication_recovery_ssl_verify_server_cert=ON
group_replication_recovery_ssl_ca=/usr/local/etc/ssl/mysql/ca.crt
group_replication_recovery_ssl_cert=/usr/local/etc/ssl/mysql/server.crt
group_replication_recovery_ssl_key=/usr/local/etc/ssl/mysql/server.key
      
      



Most of the parameters in my.cnf are standard for GTID-based replication and are well described in the documentation, so I will only comment on a few. Pay attention to group_replication_start_on_boot: it is deliberately off for now, so that a restarted server does not try to join the group before the cluster is fully assembled; we will switch it to on later.





Check the current binary log format with SHOW VARIABLES LIKE 'binlog_format'; and, if needed, change it with SET GLOBAL binlog_format = ROW; since group replication works only with row-based logging.





Because group_replication_ssl_mode is set to VERIFY_IDENTITY and group_replication_recovery_ssl_verify_server_cert is enabled, the server certificates must carry the host names in the Subject Alternative Name (SAN) field, and those names must match the ones listed in group_replication_group_seeds.





group_replication_member_weight influences the election of a new primary: when the current primary leaves the group, the member with the highest weight wins (ties are broken by the server UUID). Give a higher value to the node you would prefer to become primary; here all nodes have the same weight of 50.
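My understanding of the election rule, sketched for illustration (simplified: the real server also takes member versions into account when the group runs mixed versions): highest weight wins, and among equal weights the lexicographically smallest server UUID is chosen.

```python
def elect_primary(candidates):
    """candidates: list of (server_uuid, member_weight) for surviving members.

    Sort key: highest weight first; among equal weights, smallest UUID wins.
    """
    return min(candidates, key=lambda c: (-c[1], c[0]))[0]
```

With equal weights of 50 on every node, as in the config above, the outcome of a failover is effectively decided by the UUIDs.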





Now create a replication user. Binary logging is disabled for the session so that these statements are not replicated:





SET SQL_LOG_BIN=0;
CREATE USER 'replication'@'%' IDENTIFIED BY '[ ]' REQUIRE SSL;
GRANT replication slave ON *.* TO 'replication'@'%';
GRANT BACKUP_ADMIN ON *.* TO 'replication'@'%';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
      
      



Install the plugins, if they were not already loaded through plugin_load_add:





INSTALL PLUGIN group_replication SONAME 'group_replication.so';
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
SHOW PLUGINS;
      
      



Then tell the server which credentials to use for the distributed recovery channel:





CHANGE REPLICATION SOURCE TO SOURCE_USER='replication', SOURCE_PASSWORD='[password]'
  FOR CHANNEL 'group_replication_recovery';
      
      



Everything is ready to bootstrap the group. The group_replication_bootstrap_group variable tells a server to create a new group instead of joining an existing one, so it must be enabled on a single node only, and only for the very first start:





SET GLOBAL group_replication_bootstrap_group=ON; # only on the first node, only for the first start
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF; # turn it off right away, or a restart could create a second group
      
      



Check that the node has started and is ONLINE:





mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | ce9be252-2b71-11e6-b8f4-00212844f856 | example1.com |        3306 | ONLINE       |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
      
      



At this point we have a working MySQL "cluster" of one node.





Step 14:

By this point Zabbix is already working with the new server. The remaining MySQL nodes are prepared in the same way and joined to the group one by one.





On each new node, repeat the configuration from step 12, changing the node-specific parameters (server-id, group_replication_local_address). The important difference is that group_replication_bootstrap_group stays off: the node must join the existing group, not found a new one.





Thanks to Distributed Recovery and the mysql_clone plugin, a joining node catches up automatically: if the transactions it is missing are still present in a donor's binary logs, they are simply replayed; otherwise the donor's entire data set is cloned over.
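The decision a joining node faces can be sketched as follows (a deliberate simplification: GTID sets are modeled as plain Python sets, whereas MySQL stores them as interval lists):

```python
def pick_recovery_method(needed_gtids: set, donor_binlog_gtids: set) -> str:
    """Binary-log recovery suffices only when the donor still holds every
    transaction the joiner is missing; otherwise fall back to a full clone."""
    if needed_gtids <= donor_binlog_gtids:   # subset test
        return 'binlog'
    return 'clone'
```

In practice this means that a node that was down briefly rejoins cheaply, while a brand-new node (or one whose gap exceeds the donors' binlog retention) triggers a clone.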





When every member shows ONLINE, the cluster is assembled.





Do not forget to change group_replication_start_on_boot from off to on in my.cnf on all nodes, otherwise a restarted MySQL will not rejoin the group.





A few queries that are useful in day-to-day operation:

SELECT * FROM performance_schema.replication_group_members; - shows the members of the group and their state.

SELECT * FROM performance_schema.replication_group_member_stats\G - shows per-member transaction statistics.

SELECT group_replication_set_as_primary('[member uuid]'); - manually transfers the primary role to the specified member.





Pointing Zabbix at the cluster

The Zabbix server and frontend can be pointed at only one database address, and in single-primary mode only the Primary accepts writes. So whenever the Primary changes, Zabbix has to be reconnected to the new one, or it will keep trying to write to a read-only node. This is where HADNS helps: Zabbix connects to the database by DNS name, and HADNS keeps that name's IP record pointing at the current primary.
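The idea behind the DNS-based switchover can be sketched as a small control loop (the update_dns helper is hypothetical; HADNS's real interface differs, this only illustrates the logic):

```python
import time

def current_primary(members):
    """members mimics replication_group_members rows: (host, role, state)."""
    for host, role, state in members:
        if role == 'PRIMARY' and state == 'ONLINE':
            return host
    return None  # no usable primary right now

def follow_primary(fetch_members, update_dns, record='zabbix-db.example.com'):
    """Poll the group and repoint the DNS record whenever the primary moves."""
    last = None
    while True:
        primary = current_primary(fetch_members())
        if primary and primary != last:
            update_dns(record, primary)  # repoint the record at the new primary
            last = primary
        time.sleep(5)
```

Zabbix itself only ever knows the one DNS name; failovers become invisible to it apart from a short reconnect.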





Perhaps not everything here is done as elegantly as one would like. You may prefer to use mysql-shell and mysqlrouter and convert Group Replication into an InnoDB Cluster, or to add HAProxy, especially when deploying Zabbix from scratch. I hope this story serves as a good starting point and proves helpful. Thanks for your attention!





Additional literature

https://dev.mysql.com/doc/refman/8.0/en/group-replication.html





https://blog.zabbix.com/scaling-zabbix-with-mysql-innodb-cluster/8472/





https://en.wikipedia.org/wiki/Paxos_(computer_science)







