keepalived实现MySQL MGR高可用(keepalived 2.2.7 + mysql 5.7.35)

发布时间 2023-09-04 13:59:15作者: xiaoyaozhe

Linux系统-部署-运维系列导航

 

一、架构介绍

MySQL MGR实现了MySQL服务的高可用、高扩展、高可靠,但在客户端只能同时连接一台服务实例,在master切换后,客户端无法感知并自动切换,所以需要搭配keepalived实现MySQL MGR集群在客户端的高可用。
 

二、架构搭建

架构设计

MySQL MGR一主一从
机器名称  
IP
服务器角色
备注
localhost
192.168.11.13
keepalived master
mysql mgr master
keepalived 2.2.7 + mysql 5.7.35
localhost
192.168.11.14
keepalived backup
mysql mgr slave
keepalived 2.2.7 + mysql 5.7.35

高可用自动切换策略

  1. 将所有keepalived节点配置为backup角色,仅根据权重竞争master
  2. 所有节点的权重分为基础权重与脚本权重,其中脚本权重根据脚本执行结果动态增加,且每次脚本检查后都是基于基础权重增加,不会累计
  3. 脚本权重设置原则:最小基础权重 + 脚本权重 > 最大基础权重
  4. 如果mgr角色为master,则需要竞争keepalived master,所以不管keepalived角色如何,都要控制keepalived增加权重
  5. 如果mgr角色为backup,则控制keepalived不能增加权重,如果此时keepalived角色为master,则需要转让其master角色,即重启keepalived,触发VIP漂移
 
特别关注:keepalived角色竞争,只有在节点启动(加入集群)、master离线时执行
根据当前mgr角色 与 keepalived角色,需要执行的操作如下
 
mgr master
mgr slave
keepalived master
增加权重
保持权重,触发VIP漂移(即重启keepalived)
keepalived backup
增加权重
保持权重
 
keepalived实现MySQL MGR高可用,包括以下步骤
  1. mysql mgr集群搭建
  2. keepalived安装
  3. keepalived配置
  4. keepalived运行测试
 

架构实现

1.mysql mgr集群搭建
 
2.keepalived安装
 
3.keepalived配置
本文设计keepalived + mysql mgr流程如下
3.1 keepalived服务配置(keepalived.conf)
本方案实现keepalived根据mgr集群实际节点角色自动漂移VIP,所以将在所有keepalived节点上做相同配置(除各自基础权重)
特别关注:配置文件中指定的所有路径,请在启动前确保已存在
! Configuration File for keepalived
###使用说明开始###
#keepalived实现mysql mgr 集群高可用,流程如下
#0.所有keepalived节点配置为backup,非抢占模式,基础权重建议为 708090,脚本权重为30,即最小权重+脚本权重>最大权重
#1.如果mysql服务异常,启动服务
#2.如果mysql为master,则返回成功0,keepalived增加权重
#3.如果mysql为slave,则返回失败100,keepalived不增加权重
#4.如果mysql为slave,且keepalived为master,则keepalived切换
###使用说明结束###
global_defs {
   notification_email {
     
   }   
   #keepalived机器标识,无特殊作用,一般为机器名
   router_id ha_mysql
}

#检查脚本,可以用来关联业务,脚本执行结果决定是否准备切换
vrrp_script ha_mysql{
    #脚本路径,脚本执行是否成功,根据脚本的退出码确认,默认为0,即exit 0
    script "/usr/local/ha_mysql/ha_mysql.sh"
     #脚本检测周期,单位秒
    interval 2
     ##权重策略是:根据脚本执行结果计算权重,然后触发keepalived重新选举
    #当weight > 0时:脚本执行成功了 Priority + Weight,执行失败 Priority
    #当weight < 0时:脚本执行成功了 Priority 执行失败 Priority + Weight
     weight 30
}

#VRRP协议配置
vrrp_instance VI_1 {
    #集群初始状态统一配置为 BACKUP,当至少2台keepalived启动后,将根据priority重新竞选角色
    state BACKUP
    interface enp0s3
    #虚拟路由id,同一个集群中的keepalived设置一致
    virtual_router_id 100
    #优先级决定最终的master角色
    priority 70
    #不抢占,即异常恢复后,不立即抢占master角色
    nopreempt
    #主备之间通信检查的时间间隔,单位秒
    advert_int 1
    authentication {
    #keepalived之间认证类型为密码
        auth_type PASS
        auth_pass 1234
    }
    #虚拟IP池
    virtual_ipaddress {
        #VIP地址,一行一个,格式为 <IP地址>/<掩码> brd <广播地址> dev <网卡设备> scope <范围如global> label <网卡别名>
        192.168.11.200/24
    }

    #检查脚本,与vrrp_script对应
    track_script{
        ha_mysql
    }
}

 

3.2 mysql业务检测脚本
ha_mysql.sh脚本
配置脚本可执行权限
[root@localhost ~]# chmod +x /usr/local/ha_mysql/ha_mysql.sh 

 

脚本内容,自动检测mysql mgr集群角色,以及keepalived角色,实现自动将vip漂移到mgr master
#!/bin/bash
###使用说明开始###
#配合keepalived实现mysql mgr 集群高可用,流程如下
#1.如果mysql服务异常,启动服务
#2.如果mysql为master,则返回成功0,keepalived增加权重
#3.如果mysql为slave,则返回失败100,keepalived不增加权重
#4.如果mysql为slave,且keepalived为master,则keepalived切换
###使用说明结束###

#监控日志
source /etc/profile
vip="192.168.11.200"
monitorLogPath=/usr/local/ha_mysql/monitor.log
touch $monitorLogPath

#格式化日期时间
function getDatetime(){
    local cur=`date "+%Y-%m-%d %H:%M:%S"`
     echo $cur  
}

#获取local IP
function getLocalIp(){
    echo `ifconfig enp0s3 | grep -w "inet" | awk '{print $2}'`
}

#获取master IP
function getMasterIp(){
    echo `mysql -uroot -pxxxxxxx -e "SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_ID = (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME= 'group_replication_primary_member')" | awk 'NR==2{print($3)}'`
}

#检测keepalived是否master
function checkKeepalivedMaster(){
    local num=`ip a | grep "${vip}" | wc -l`
    if [ $num -eq 0 ];then
        echo 0
    else
        echo 1
    fi    
}

#检测mysql进程
function checkMysqlProcess(){
    #检测mysql服务,可以使用进程,也可以使用端口
    #此处检测mysql进程
    local num=`ps -C mysqld --no-header | wc -l`
    #local num=`netstat -lntup | grep -w 3307 | wc -l`
    if [ $num -eq 0 ];then
        echo 0
    else
        echo 1
    fi
}

#检测MGR Master
function checkMgrMaster(){
    #获取本机IP
    localIp=`getLocalIp`
    #获取mgr master ip
    masterIp=`getMasterIp`
    
    #判断当前实例是否为master
    if [ "${localIp}" = "${masterIp}" ];then
        echo 1
    else
        echo 0
    fi
}

#定义变量,mysql是否运行
run=`checkMysqlProcess`

#1.如果mysql服务异常,启动服务
if [ $run -eq 0 ];then
    #mysql异常,先启动mysql
    echo `getDatetime` "mysql error,start mysql" >> $monitorLogPath 
    service mysqld start
    run=`checkMysqlProcess`
    if [ $run -eq 0 ];then
        #启动mysql失败,停止keepalived服务,停止热备,触发keepalived切换
        echo `getDatetime` "start mysql failed" >> $monitorLogPath
    fi
#else
#    echo "mysql is alive"
fi

mgrMaster=`checkMgrMaster`
keepalivedMasger=`checkKeepalivedMaster`
if [ ${mgrMaster} -eq 1 ];then
    #2.如果mysql为master,则返回成功0,keepalived增加权重
    #echo `getDatetime` "mgr master,add priority" >> $monitorLogPath
    exit 0
else
    #3.如果mysql为slave,则返回失败100,keepalived不增加权重
    if [ ${keepalivedMasger} -eq 1 ];then
        #4.如果mysql为slave,且keepalived为master,则keepalived切换(服务重启)
        echo `getDatetime` "mgr slave,keepalived master,restart keepalived" >> $monitorLogPath
        service keepalived restart
        #keepalived服务重启,本脚本返回结果为重启前的keepalived进程,此时无需exit
    else
        #echo `getDatetime` "mgr slave,keepalived slave,keep priority" >> $monitorLogPath
        exit 100
    fi
fi

 

三、架构测试

1.确认mgr master
mysql> select * from performance_schema.replication_group_members ;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 33e3b5a5-7367-11ec-9299-0800272dd186 | 192.168.11.14 |        3307 | ONLINE       |
| group_replication_applier | fdbf6d8c-7365-11ec-bf67-08002746658b | 192.168.11.13 |        3307 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> select * from performance_schema.global_status where variable_name like '%group%';
+----------------------------------+--------------------------------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | fdbf6d8c-7365-11ec-bf67-08002746658b |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)

 

2.以上确认11.13服务器为mgr master,在11.13确认VIP已经绑定
[root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:46:65:8b brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.13/24 brd 192.168.11.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.11.200/24 scope global secondary enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::c08b:489f:1587:3bb6/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::a33a:d49b:da44:119a/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::d071:eb89:2f12:8e56/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@localhost ~]# 

 

3.模拟11.13 MySQL服务宕机
[root@localhost ~]# service mysqld stop
Redirecting to /bin/systemctl stop mysqld.service

 

4.确认 mgr master 已经改为 11.14
此时,由于mysql健康检查脚本中已经自动启动mysql服务器,所以正常情况下无需运维人员介入
mysql> select * from performance_schema.replication_group_members ;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 33e3b5a5-7367-11ec-9299-0800272dd186 | 192.168.11.14 |        3307 | ONLINE       |
| group_replication_applier | fdbf6d8c-7365-11ec-bf67-08002746658b | 192.168.11.13 |        3307 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> select * from performance_schema.global_status where variable_name like '%group%';
+----------------------------------+--------------------------------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 33e3b5a5-7367-11ec-9299-0800272dd186 |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

 

5.在11.13上确认VIP已经失效
[root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:46:65:8b brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.13/24 brd 192.168.11.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::c08b:489f:1587:3bb6/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::a33a:d49b:da44:119a/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::d071:eb89:2f12:8e56/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@localhost ~]# 

 

6.在11.14上确认VIP已经漂移
[root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:2d:d1:86 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.14/24 brd 192.168.11.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.11.200/24 scope global secondary enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::c08b:489f:1587:3bb6/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::a33a:d49b:da44:119a/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::d071:eb89:2f12:8e56/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever

 

四、后记

关于keepalived VIP漂移 与 mgr 角色转换

MySQL MGR master故障后,slave会自动升级为 master,此时故障服务器恢复后重新加入集群,自动为 slave角色,处于只读状态。如果此时VIP重新漂移至该节点,则客户端会引起事务提交异常。
为了避免出现此问题,本文中将所有 keepalived 实例设置为 backup 角色,同时添加了 nopreempt 配置项,即设置为 非抢占 模式,如此,MySQL服务重启 与 keepalived 服务重启后,都不会主动竞争 master,客户端业务保持正常。
 
特别关注:系统默认启用了SELinux内核模块(安全子系统),所以在服务绑定/监听某些端口时,提示无访问权限,此时需要禁用SELinux,修改 /etc/selinux/config 文件,设置SELINUX=disabled
Can't start server: Bind on TCP/IP port: Permission denied

 

特别关注:selinux设置完成需要重启生效,如果当前不方便重启,可以执行 setenforce 0 临时关闭selinux,下次重启是配置再生效
 
特别关注:系统默认启用了防火墙,请在启动服务前关闭防火墙,或在防火墙中添加服务端口