Implementing Redis Sentinel and Simulating a Master Failure

Published: 2023-10-07 15:29:22 Author: 小糊涂90

 

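1. Overview
The sentinel mechanism solves the Redis high-availability problem: when the master fails, a replica (slave) is automatically promoted to master, so the Redis service stays usable.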

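2. Implementing Sentinel
Sentinel presupposes a working Redis master-replica replication environment; on top of it we build a highly available architecture with one master, two replicas and Sentinel. Note: masterauth must also be set in the master's configuration file, and its value must be identical on the master and on all replicas, because after a failover the old master rejoins as a replica and has to authenticate against the new master.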
sentinel master 10.0.0.150
sentinel slave1 10.0.0.160
sentinel slave2 10.0.0.170

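2.1. Preparation for Sentinel: set up the master-replica replication architecture

2.1.1. Key settings in redis.conf on all master and replica nodes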
[root@centos8 ~]#dnf -y install redis
[root@centos8 ~]#sed -i -e 's/bind 127.0.0.1/bind 0.0.0.0/' -e 's/^# masterauth .*/masterauth 123456/' -e 's/^# requirepass .*/requirepass 123456/' /etc/redis.conf

2.1.2. On all replica nodes
[root@centos8 ~]#echo "replicaof 10.0.0.150 6379" >> /etc/redis.conf

2.1.3. Run on all master and replica nodes
[root@centos8 ~]#systemctl enable --now redis

2.1.4. Check the status of the master server
[root@centos8 ~]#redis-cli -a 123456 info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.160,port=6379,state=online,offset=84,lag=0
slave1:ip=10.0.0.170,port=6379,state=online,offset=84,lag=0
master_replid:a2491766a22c952f10cbdc8f4566d82c0ec87a7b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:84
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:84
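
Before configuring Sentinel it helps to confirm that both replicas are really online. The following is a minimal sketch, assuming the INFO replication format shown above; count_online_replicas is a hypothetical helper name, not a Redis command.

```shell
#!/bin/sh
# Count the replicas reported as state=online in `INFO replication`
# output read from stdin. Hypothetical helper, not part of Redis.
count_online_replicas() {
    # Replica lines look like:
    #   slave0:ip=10.0.0.160,port=6379,state=online,offset=84,lag=0
    # tr removes the carriage returns that redis-cli output may carry.
    tr -d '\r' | grep -c '^slave[0-9][0-9]*:.*state=online'
}

# Example against the live master (password taken from this walkthrough):
#   redis-cli -a 123456 info replication | count_online_replicas
```

For the output above this should print 2; a smaller number suggests a replica has not finished connecting yet.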

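2.2. Configuring Sentinel
Sentinel is actually a special Redis server: it supports some Redis commands, but many others are not supported. By default it listens on port 26379/tcp. Sentinels do not have to run on the same hosts as the Redis servers, but they usually do. All sentinel nodes use the same configuration file.

2.2.1. Edit the sentinel configuration file
#If Redis was compiled from source, a sentinel.conf is in the source directory; copy it into the installation directory, e.g. /apps/redis/etc/sentinel.conf

Configure on all sentinel servers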
[root@centos8 ~]#vim /etc/redis-sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile ""

dir "/tmp"
#Working directory

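sentinel monitor mymaster 10.0.0.150 6379 2
#Address and port of the master server in the mymaster group
#2 is the quorum: the number of sentinels that must consider the master down before it is marked
#objectively down (ODOWN) and a failover is attempted. It is usually set to a majority of all sentinel
#nodes (whose total is normally an odd number >= 3, such as 3, 5 or 7); for example, with 3 sentinels,
#3/2 = 1.5, rounded up to 2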

sentinel auth-pass mymaster 123456
#Password of the master in the mymaster group; note that this line must come after the sentinel monitor line above

sentinel down-after-milliseconds mymaster 30000
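#Time after which a node of the mymaster group that stops responding is considered subjectively down (SDOWN), in milliseconds; 3000 is suggested (this example keeps 30000)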

sentinel parallel-syncs mymaster 1
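#Number of replicas that resynchronize with the new master at the same time after a failover; a smaller number makes the total sync time longer but reduces the load on the new master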

sentinel failover-timeout mymaster 180000
#Timeout for repointing all replicas to the new master, in milliseconds

sentinel deny-scripts-reconfig yes
#Forbid changing the notification and client-reconfig scripts at runtime

logfile /var/log/redis/sentinel.log
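
The quorum rule described in the comment above (more than half of the sentinels) can be checked with a little shell arithmetic. This is only an illustrative sketch; majority is a hypothetical helper name. Note that the quorum only decides when the master is marked ODOWN; the failover itself still has to be authorized by a majority of the sentinels.

```shell
#!/bin/sh
# Smallest integer strictly greater than N/2, i.e. a majority of N sentinels.
# Hypothetical helper for illustration only.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# Typical odd sentinel counts and the resulting quorum value.
for n in 3 5 7; do
    echo "sentinels=$n quorum=$(majority "$n")"
done
# prints:
#   sentinels=3 quorum=2
#   sentinels=5 quorum=3
#   sentinels=7 quorum=4
```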

2.2.2. Start the sentinel service on all three machines
[root@centos8 ~]#systemctl enable --now redis-sentinel.service

2.2.3. Check the configuration file on each machine and make sure the myid values differ
[root@centos8 ~]#cat /etc/redis-sentinel.conf |grep myid
sentinel myid 6b318eb64c0e2f5c97a4a9120f38b9be4ba07dc7
[root@centos8 ~]#cat /etc/redis-sentinel.conf |grep myid
sentinel myid 7714ef6638d3caa3b9a1e89f81315ea584e3550f
[root@centos8 ~]#cat /etc/redis-sentinel.conf |grep myid
sentinel myid 107800d9f386633fbc5b25165ae3d076c52053aa
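
Comparing three 40-character IDs by eye is error-prone (a sentinel configuration file copied from another node after its first start would carry the same myid). A small sketch that reports duplicates, assuming the "sentinel myid <id>" line format shown above; duplicate_myids is a hypothetical helper name, and the ssh loop in the comment assumes root SSH access to the three hosts.

```shell
#!/bin/sh
# Print every sentinel myid that occurs more than once on stdin;
# print nothing when all IDs are unique. Hypothetical helper.
duplicate_myids() {
    # Expected input: one "sentinel myid <id>" line per sentinel.
    awk '$1 == "sentinel" && $2 == "myid" { print $3 }' | sort | uniq -d
}

# Example (assumes root SSH access to the three hosts of this walkthrough):
#   for h in 10.0.0.150 10.0.0.160 10.0.0.170; do
#       ssh root@"$h" grep myid /etc/redis-sentinel.conf
#   done | duplicate_myids
```

Empty output means every sentinel has its own ID.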

2.2.4. Verify the port
[root@centos8 ~]#ss -ntl |grep 26379
LISTEN 0 128 0.0.0.0:26379 0.0.0.0:*

2.2.5. Check the log
[root@centos8 ~]#cat /var/log/redis/sentinel.log
32728:X 20 Jan 2022 18:08:31.874 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
32728:X 20 Jan 2022 18:08:31.874 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=32728, just started
32728:X 20 Jan 2022 18:08:31.874 # Configuration loaded
32728:X 20 Jan 2022 18:08:31.874 * supervised by systemd, will signal readiness
32728:X 20 Jan 2022 18:08:31.875 * Running mode=sentinel, port=26379.
32728:X 20 Jan 2022 18:08:31.875 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
32728:X 20 Jan 2022 18:08:31.876 # Sentinel ID is 6b318eb64c0e2f5c97a4a9120f38b9be4ba07dc7
32728:X 20 Jan 2022 18:08:31.876 # +monitor master mymaster 10.0.0.150 6379 quorum 2
32728:X 20 Jan 2022 18:08:31.876 * +slave slave 10.0.0.160:6379 10.0.0.160 6379 @ mymaster 10.0.0.150 6379
32728:X 20 Jan 2022 18:08:31.877 * +slave slave 10.0.0.170:6379 10.0.0.170 6379 @ mymaster 10.0.0.150 6379
32728:X 20 Jan 2022 18:12:12.831 * +sentinel sentinel 7714ef6638d3caa3b9a1e89f81315ea584e3550f 10.0.0.160 26379 @ mymaster 10.0.0.150 6379
32728:X 20 Jan 2022 18:12:41.172 * +sentinel sentinel 107800d9f386633fbc5b25165ae3d076c52053aa 10.0.0.170 26379 @ mymaster 10.0.0.150 6379

2.2.6. Check the current sentinel status
[root@centos8 ~]#redis-cli -p 26379 info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.150:6379,slaves=2,sentinels=3
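
For scripted checks, single fields can be pulled out of the master0 line. A minimal sketch, assuming the INFO sentinel format shown above with exactly one monitored master; sentinel_master_field is a hypothetical helper name.

```shell
#!/bin/sh
# Print one field (name, status, address, slaves, sentinels) of the
# master0 line from `INFO sentinel` output read from stdin.
# Hypothetical helper, not part of Redis.
sentinel_master_field() {
    # 1. strip carriage returns, 2. keep only the master0 line without its
    # prefix, 3. put each key=value pair on its own line, 4. print the value.
    tr -d '\r' | sed -n 's/^master0://p' | tr ',' '\n' | sed -n "s/^$1=//p"
}

# Example against a local sentinel:
#   redis-cli -p 26379 info sentinel | sentinel_master_field address
```

With the output above, the address field is 10.0.0.150:6379 and the sentinels field is 3.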

3. Simulating a master failure
3.1. Stop the Redis master to test failover. Run on 10.0.0.150
[root@centos8 ~]#killall redis-server

3.2. Check the sentinel information: the master has automatically switched to 10.0.0.160
[root@centos8 ~]#redis-cli -a 123456 -p 26379 info sentinel
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.160:6379,slaves=2,sentinels=3

3.3. Sentinel log messages during the failover
[root@centos8 ~]#cat /var/log/redis/sentinel.log
930:X 20 Jan 2022 18:39:36.834 # +sdown master mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:36.908 # +odown master mymaster 10.0.0.150 6379 #quorum 2/2
930:X 20 Jan 2022 18:39:36.908 # +new-epoch 3
930:X 20 Jan 2022 18:39:36.908 # +try-failover master mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:36.910 # +vote-for-leader 6b318eb64c0e2f5c97a4a9120f38b9be4ba07dc7 3
930:X 20 Jan 2022 18:39:36.913 # 7714ef6638d3caa3b9a1e89f81315ea584e3550f voted for 6b318eb64c0e2f5c97a4a9120f38b9be4ba07dc7 3
930:X 20 Jan 2022 18:39:36.913 # 107800d9f386633fbc5b25165ae3d076c52053aa voted for 6b318eb64c0e2f5c97a4a9120f38b9be4ba07dc7 3
930:X 20 Jan 2022 18:39:37.002 # +elected-leader master mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:37.002 # +failover-state-select-slave master mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:37.061 # +selected-slave slave 10.0.0.160:6379 10.0.0.160 6379 @ mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:37.061 * +failover-state-send-slaveof-noone slave 10.0.0.160:6379 10.0.0.160 6379 @ mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:37.147 * +failover-state-wait-promotion slave 10.0.0.160:6379 10.0.0.160 6379 @ mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:37.252 # +promoted-slave slave 10.0.0.160:6379 10.0.0.160 6379 @ mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:37.252 # +failover-state-reconf-slaves master mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:37.327 * +slave-reconf-sent slave 10.0.0.170:6379 10.0.0.170 6379 @ mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:37.876 * +slave-reconf-inprog slave 10.0.0.170:6379 10.0.0.170 6379 @ mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:37.983 # -odown master mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:38.917 * +slave-reconf-done slave 10.0.0.170:6379 10.0.0.170 6379 @ mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:39.017 # +failover-end master mymaster 10.0.0.150 6379
930:X 20 Jan 2022 18:39:39.017 # +switch-master mymaster 10.0.0.150 6379 10.0.0.160 6379
930:X 20 Jan 2022 18:39:39.017 * +slave slave 10.0.0.170:6379 10.0.0.170 6379 @ mymaster 10.0.0.160 6379
930:X 20 Jan 2022 18:39:39.017 * +slave slave 10.0.0.150:6379 10.0.0.150 6379 @ mymaster 10.0.0.160 6379
930:X 20 Jan 2022 18:40:09.089 # +sdown slave 10.0.0.150:6379 10.0.0.150 6379 @ mymaster 10.0.0.160 6379
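
In a long log the decisive event is the +switch-master line, which names the old and the new master. A small sketch that extracts the most recent one, assuming the log line layout shown above (PID, date, time, "#", event, master name, old address, new address); last_switch_master is a hypothetical helper name.

```shell
#!/bin/sh
# Print "old_ip:old_port -> new_ip:new_port" for the last +switch-master
# event found on stdin; print nothing if there was no switch.
# Hypothetical helper; field positions assume the log format shown above.
last_switch_master() {
    awk '$7 == "+switch-master" { line = $9 ":" $10 " -> " $11 ":" $12 }
         END { if (line != "") print line }'
}

# Example:
#   last_switch_master < /var/log/redis/sentinel.log
```

For the log above this prints 10.0.0.150:6379 -> 10.0.0.160:6379.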

3.4. After the failover the Redis configuration file is modified automatically; check on 10.0.0.170
[root@centos8 ~]#grep ^replicaof /etc/redis.conf
replicaof 10.0.0.160 6379

3.5. The IP in the sentinel monitor line of the sentinel configuration file is modified as well
[root@centos8 ~]#cat /etc/redis-sentinel.conf |grep -v ^# |grep monitor
sentinel monitor mymaster 10.0.0.160 6379 2

3.6. Current Redis state: node 160 is the master and node 170 is a replica (checked on 10.0.0.160)
[root@centos8 ~]#redis-cli -a 123456 info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.170,port=6379,state=online,offset=486781,lag=1
master_replid:433ef00d3756763aa600b3630775ee4f90cff41a
master_replid2:4a53d9f341ddfb8d32ec934d812f2b63f56d68e9
master_repl_offset:487051
second_repl_offset:325790
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:305377
repl_backlog_histlen:181675


3.7. Bring the failed former master back; it rejoins the Redis replication group as a replica (run on 10.0.0.150)
[root@centos8 ~]#systemctl restart redis
[root@centos8 ~]#cat /etc/redis.conf |grep replicaof
# Master-Replica replication. Use replicaof to make a Redis instance a copy of
# replicaof <masterip> <masterport>
replicaof 10.0.0.160 6379

4. Sentinel operations
Manually take the master offline (force a failover)
sentinel failover <masterName>

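Example: manual failover
[root@centos8 ~]#vim /etc/redis.conf
replica-priority 10 #Priority: sentinel prefers the replica with the lowest value when choosing the new master; the default is 100, and 0 means the replica is never promoted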
[root@centos8 ~]#redis-cli   -p 26379
127.0.0.1:26379> sentinel failover mymaster
OK
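
The effect of replica-priority can be illustrated with a deliberately simplified sketch: among the candidates, take the one with the lowest non-zero priority. Real Sentinel additionally compares replication offset and run ID when priorities are equal, which this sketch ignores; pick_by_priority is a hypothetical helper name and 10.0.0.180 is a made-up host used only to show the priority 0 case.

```shell
#!/bin/sh
# Pick the host with the lowest non-zero replica-priority.
# Input on stdin: one "<host> <replica-priority>" pair per line.
# Simplified illustration only; not how Sentinel breaks ties.
pick_by_priority() {
    # Drop priority 0 (never promoted), sort numerically by priority,
    # keep the first line and print only the host.
    awk '$2 > 0' | sort -k2,2n | head -n 1 | cut -d' ' -f1
}

printf '10.0.0.150 10\n10.0.0.170 100\n10.0.0.180 0\n' | pick_by_priority
# prints: 10.0.0.150
```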