Redis Cluster Core Concepts

Published: 2023-10-10 09:56:46, Author: 普里莫


Redis high-availability cluster options

  • Sentinel
  • Redis-Cluster
  • Codis

Introduction to Redis Cluster

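When does Redis read the dump.rdb file? At startup: when the server boots it loads dump.rdb, unless AOF is enabled (appendonly yes), in which case it loads the AOF file instead.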

What Redis Cluster provides

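1) Redis Cluster is an installation that distributes data across multiple Redis nodes.

2) Redis Cluster does not support multi-key commands whose keys map to different hash slots. Executing them would require moving data between nodes, and under heavy load such commands would degrade cluster performance and cause unpredictable behavior. Multi-key commands whose keys all map to the same slot do work.

3) Redis Cluster provides a degree of availability through partitioning. Even if some nodes fail or cannot communicate, the cluster can keep serving requests.

4) Redis Cluster automatically splits data across multiple nodes.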
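Point 2 is enforced per hash slot. A multi-key command is accepted only if all of its keys map to the same slot; otherwise the node replies with a CROSSSLOT error. A hash tag, meaning the non-empty part of the key between the first `{` and the following `}`, lets related keys share a slot because only that part is hashed.

The sketch below models the check. It assumes Python's `binascii.crc_hqx(data, 0)`, which computes the same CRC16 (XMODEM variant) that Redis uses. `key_hash_slot` and `check_same_slot` are illustrative names and are not part of any Redis client.

```python
import binascii

SLOTS = 16384

def key_hash_slot(key: str) -> int:
    """Hash slot of a key, honouring a {hash tag} if one is present."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed.
        if end > start + 1:
            data = data[start + 1:end]
    # crc_hqx with initial value 0 is CRC16/XMODEM.
    return binascii.crc_hqx(data, 0) % SLOTS

def check_same_slot(*keys: str) -> int:
    """Return the common slot, or raise the way a node answers CROSSSLOT."""
    slots = {key_hash_slot(k) for k in keys}
    if len(slots) != 1:
        raise ValueError("CROSSSLOT Keys in request don't hash to the same slot")
    return slots.pop()

# Both keys hash only "user:1", so a multi-key command on them is allowed.
print(check_same_slot("{user:1}:name", "{user:1}:age"))
```

With plain keys such as `name` and `123456789`, which land in different slots, `check_same_slot` raises instead.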

How Redis Cluster stores data

Hash slots

Hash algorithms

  • CRC16 (the algorithm Redis Cluster uses to map keys to slots)
  • CRC32 算法
# Store a key
set name ljy
 
# CRC16 checksum of the key
crc16(name) = 5798
# Take it modulo 16384 (e.g. in Python)
5798 % 16384
# The result is the key's hash slot
5798
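The slot of key name can be checked with a bit-by-bit CRC16. The Redis Cluster specification names the variant as XMODEM: polynomial 0x1021, initial value 0, no reflection, no final XOR.

This minimal Python sketch is written for clarity; Redis itself uses a lookup table. `crc16_xmodem` and `hash_slot` are illustrative names, and `hash_slot` ignores hash tags.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16/XMODEM: polynomial 0x1021, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                  # feed the next byte into the high bits
        for _ in range(8):
            if crc & 0x8000:              # top bit set: shift, then apply the polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """HASH_SLOT = CRC16(key) mod 16384 (hash tags not handled here)."""
    return crc16_xmodem(key.encode()) % 16384

print(crc16_xmodem(b"123456789") == 0x31C3)  # True: the standard XMODEM check value
print(hash_slot("name"))                      # 5798
```

The result, 5798, is the same slot that appears in the MOVED reply later in this article.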

Redis Cluster features

High performance

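1. The 16384 slots are spread evenly across the shard nodes.

2. To store data, Redis computes crc16(key) and takes it modulo 16384, giving a slot number between 0 and 16383.

3. The data is stored on the master of the shard that owns that slot.

4. If the node the client is connected to does not own the slot, it replies with a MOVED redirect naming the owning node. A cluster-aware client (such as redis-cli -c) then resends the command to that node.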
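Steps 3 and 4 amount to a slot-to-node lookup. This is a minimal sketch assuming a client keeps a table of contiguous slot ranges per master. The ranges and addresses are the ones redis-cli assigns in the multi-host deployment later in this article. `SlotTable` is a hypothetical name.

```python
import bisect

class SlotTable:
    """Hypothetical client-side map from hash slot to the master that owns it."""

    def __init__(self, ranges):
        # ranges: iterable of (first_slot, last_slot, "host:port"), non-overlapping
        self.ranges = sorted(ranges)
        self.starts = [first for first, _, _ in self.ranges]

    def owner(self, slot: int) -> str:
        if not 0 <= slot <= 16383:
            raise ValueError(f"slot out of range: {slot}")
        # Index of the last range whose first slot is <= slot
        i = bisect.bisect_right(self.starts, slot) - 1
        if i < 0 or slot > self.ranges[i][1]:
            raise LookupError(f"slot {slot} is not assigned to any node")
        return self.ranges[i][2]

table = SlotTable([
    (0, 5460, "172.16.1.51:7000"),
    (5461, 10922, "172.16.1.52:7002"),
    (10923, 16383, "172.16.1.53:7004"),
])
connected = "172.16.1.51:7000"
slot = 5798                          # slot of key "name"
target = table.owner(slot)
if target != connected:
    print(f"MOVED {slot} {target}")  # prints: MOVED 5798 172.16.1.52:7002
```

Binary search over the range starts keeps each lookup logarithmic. A real client could equally use a flat 16384-entry array.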

High availability

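When the cluster is built, each shard's master is given a replica (the equivalent of slaveof). When a master goes down, the cluster performs an automatic failover similar to Sentinel's.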


Clients can connect to any node

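Suppose a client is connected to shard A and asks for a key whose slot belongs to another shard. Node A does not proxy the request. Instead, it tells the client which node owns the slot, and a cluster-aware client resends the command to that node.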

Data sharding in Redis Cluster (design)

Redis Cluster uses sharding rather than consistent hashing. A cluster has 16384 hash slots, and every key belongs to exactly one of them. The slot of a key is computed as CRC16(key) % 16384, where CRC16(key) is the CRC16 checksum of the key. For example, in a three-node cluster:

1. Node A handles hash slots 0 to 5500.

2. Node B handles hash slots 5501 to 11000.

3. Node C handles hash slots 11001 to 16383.


How Redis Cluster operates

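  • All Redis nodes are connected to each other (PING/PONG messages) and use a binary protocol internally to save transfer time and bandwidth.
  • A node is only considered failed (FAIL) when more than half of the masters in the cluster have detected it as failing.
  • Clients connect to Redis nodes directly, with no proxy layer in between. A client does not need to connect to every node; any reachable node will do.
  • All physical nodes are mapped onto slots [0-16383], and the cluster maintains the node <-> slot <-> key mapping.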
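Here is the majority rule from the second bullet as a small sketch. It assumes:

- the majority is counted over the masters that serve at least one slot (the cluster_size field of CLUSTER INFO);
- the majority is strict, size // 2 + 1.

The function names are illustrative. Real nodes also expire old failure reports, which this ignores.

```python
def needed_quorum(cluster_size: int) -> int:
    """Strict majority of the slot-serving masters."""
    return cluster_size // 2 + 1

def is_marked_fail(failure_reports: int, cluster_size: int) -> bool:
    """A suspected (PFAIL) node becomes FAIL once a majority of masters report it."""
    return failure_reports >= needed_quorum(cluster_size)

for size in (3, 4, 5):
    print(size, "masters -> quorum", needed_quorum(size))
```

With 3 masters, reports from 2 are enough. This also suggests why losing 2 of 3 masters at once stops the cluster: the single survivor cannot form a majority on its own.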

Replication in Redis Cluster

Redis Cluster uses master-replica replication so that it keeps working when some nodes go offline or cannot reach the majority of nodes. Every node in the cluster has 1 to N copies (replicas): one copy is the master, and the remaining N-1 copies are slaves.

In the A, B, C example above, if node B goes offline the cluster can no longer operate, because no node serves hash slots 5501 to 11000.

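Suppose that when the cluster was created (or at least before B went offline) we had added a replica B1 to master B. When B goes offline, the cluster promotes B1 to master in B's place. B1 continues serving slots 5501 to 11000, so losing master B does not stop the cluster.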

However, if both B and B1 go offline, Redis Cluster still stops working.

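Cluster replication reuses the SLAVEOF code, so cluster nodes replicate exactly as SLAVEOF replication does. Inside a cluster, however, a replica is set up with CLUSTER REPLICATE rather than SLAVEOF.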

Takeaway: a master and its replica must never run on the same server.
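This rule can be checked automatically from CLUSTER NODES output. The minimal sketch below assumes the standard line layout: node ID, ip:port@cport, flags, then master ID, followed by other fields. The function names are hypothetical. The sample lines come from the multi-host deployment later in this article, with IDs shortened and timestamps replaced by 0.

```python
def parse_nodes(text: str) -> dict:
    """Parse CLUSTER NODES output into {node_id: (host, flags, master_id)}."""
    nodes = {}
    for line in text.strip().splitlines():
        node_id, addr, flags, master_id = line.split()[:4]
        # "172.16.1.51:7000@17000" -> "172.16.1.51"
        host = addr.split("@")[0].rsplit(":", 1)[0]
        nodes[node_id] = (host, flags.split(","), master_id)
    return nodes

def colocated_pairs(text: str) -> list:
    """Return (replica_id, master_id) pairs where a replica shares a host with its master."""
    nodes = parse_nodes(text)
    pairs = []
    for node_id, (host, flags, master_id) in nodes.items():
        if "slave" in flags and master_id in nodes and nodes[master_id][0] == host:
            pairs.append((node_id, master_id))
    return pairs

SAMPLE = """\
d2f578c3 172.16.1.51:7000@17000 myself,master - 0 0 1 connected 0-5460
f0a45f49 172.16.1.51:7001@17001 slave 4652baa7 0 0 5 connected
efc678a3 172.16.1.52:7002@17002 master - 0 0 3 connected 5461-10922
0ec56f7c 172.16.1.52:7003@17003 slave d2f578c3 0 0 1 connected
4652baa7 172.16.1.53:7004@17004 master - 0 0 5 connected 10923-16383
d44a61b4 172.16.1.53:7005@17005 slave efc678a3 0 0 3 connected
"""

print(colocated_pairs(SAMPLE))  # []: every replica is on a different host
```

Run against the single-host lab later in this article, the same check would flag every pair, which is acceptable there only because it is a test setup.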

Failover in Redis Cluster

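1) Nodes in the cluster monitor each other for failures.

2) When a master goes down, its replicas start an election and the other masters vote. The winning replica takes over the failed master's slots.

3) In other words, cluster nodes have failure detection and failover built in, much like Sentinel.

4) Sentinel is a separate monitoring process, whereas these features are built into the cluster nodes themselves, so the two run very differently. Although their functions are similar, Redis Cluster does not reuse Sentinel's code.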
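The Redis Cluster specification describes when a replica of a failed master starts its election:

DELAY = 500 ms + random(0 to 500 ms) + REPLICA_RANK × 1000 ms

Rank 0 is the replica with the most up-to-date replication offset. The replica is promoted once a majority of masters vote for it.

The sketch below models only the waiting rule. The function names, the second replica B2 and the offset values are made up for illustration.

```python
import random

def replica_ranks(offsets: dict) -> dict:
    """Rank replicas by replication offset: the most up-to-date one gets rank 0."""
    ordered = sorted(offsets, key=offsets.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered)}

def election_delay_ms(rank: int, rng: random.Random) -> float:
    """Wait before asking masters for votes: 500 ms + random 0..500 ms + rank * 1000 ms."""
    return 500 + rng.uniform(0, 500) + rank * 1000

# Hypothetical replicas of failed master B with made-up replication offsets.
rng = random.Random(7)
for name, rank in replica_ranks({"B1": 1200, "B2": 900}).items():
    print(name, "rank", rank, "waits about", round(election_delay_ms(rank, rng)), "ms")
```

The rank term makes a lagging replica normally start later than a fresher one, so the replica holding the most data usually wins the election.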

Two cases when Redis Cluster executes a command

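1) The command reaches the right node: the slot of the key it operates on is served by the node that received it. That node executes the command just like a standalone Redis server.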


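2) The command reaches the wrong node: the receiving node does not serve the key's slot. It returns a redirection error telling the client which node should execute the command, and the client resends the command to that node.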


Redis Cluster deployment (Redis 7.2, multiple hosts, multiple instances per host)

Environment

Public IP    Internal IP    Host    Ports         Software
10.0.0.51    172.16.1.51    db01    7000, 7001    redis-server, redis-cli
10.0.0.52    172.16.1.52    db02    7002, 7003    redis-server, redis-cli
10.0.0.53    172.16.1.53    db03    7004, 7005    redis-server, redis-cli

Create the instances

# Create the data directories
[root@db01 ~]# mkdir -p /data/redis/700{0,1}
[root@db02 ~]# mkdir -p /data/redis/700{2,3}
[root@db03 ~]# mkdir -p /data/redis/700{4,5}
 
# Write one config file per instance
[root@db01 redis]# vim /data/redis/7000/redis.conf
port 7000
daemonize yes
pidfile /data/redis/7000/redis.pid
loglevel notice
logfile /data/redis/7000/redis.log
dbfilename dump.rdb
dir /data/redis/7000
bind 172.16.1.51
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
 
[root@db01 redis]# vim /data/redis/7001/redis.conf
port 7001
daemonize yes
pidfile /data/redis/7001/redis.pid
loglevel notice
logfile /data/redis/7001/redis.log
dbfilename dump.rdb
dir /data/redis/7001
bind 172.16.1.51
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
 
[root@db02 redis-7.2.0]# vim /data/redis/7002/redis.conf    
port 7002
daemonize yes
pidfile /data/redis/7002/redis.pid
loglevel notice
logfile /data/redis/7002/redis.log
dbfilename dump.rdb
dir /data/redis/7002
bind 172.16.1.52
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
 
[root@db02 redis-7.2.0]# vim /data/redis/7003/redis.conf
port 7003
daemonize yes
pidfile /data/redis/7003/redis.pid
loglevel notice
logfile /data/redis/7003/redis.log
dbfilename dump.rdb
dir /data/redis/7003
bind 172.16.1.52
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
 
[root@db03 redis-7.2.0]# vim /data/redis/7004/redis.conf
port 7004
daemonize yes
pidfile /data/redis/7004/redis.pid
loglevel notice
logfile /data/redis/7004/redis.log
dbfilename dump.rdb
dir /data/redis/7004
bind 172.16.1.53
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
 
[root@db03 redis-7.2.0]# vim /data/redis/7005/redis.conf
port 7005
daemonize yes
pidfile /data/redis/7005/redis.pid
loglevel notice
logfile /data/redis/7005/redis.log
dbfilename dump.rdb
dir /data/redis/7005
bind 172.16.1.53
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000

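Older versions only (Redis 3.x/4.x): install the Ruby environment. Skip this on Redis 5.0 and later, where redis-cli --cluster replaces redis-trib.rb.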

# Install Ruby
[root@db01 redis]# yum install ruby rubygems -y
 
# List the configured gem sources
[root@db01 ~]# gem source --list
*** CURRENT SOURCES ***
 
https://rubygems.org/
 
# Switch to the Aliyun gem mirror
[root@db01 ~]# gem sources -a https://mirrors.aliyun.com/rubygems/
https://mirrors.aliyun.com/rubygems/ added to sources
[root@db01 ~]# gem source --remove https://rubygems.org/
https://rubygems.org/ removed from sources
 
# Install the redis gem (the Ruby client that redis-trib.rb needs)
[root@db01 ~]# gem install redis -v 3.3.3

Start the instances

# Start each instance
[root@db01 ~]# redis-server /data/redis/7000/redis.conf 
[root@db01 ~]# redis-server /data/redis/7001/redis.conf 
[root@db02 ~]# redis-server /data/redis/7002/redis.conf 
[root@db02 ~]# redis-server /data/redis/7003/redis.conf 
[root@db03 ~]# redis-server /data/redis/7004/redis.conf 
[root@db03 ~]# redis-server /data/redis/7005/redis.conf
 
# Check listening ports (each instance also listens on its cluster bus port, client port + 10000)
[root@db01 redis]# netstat -lntup
tcp        0      0 172.16.1.51:17000       0.0.0.0:*               LISTEN      27950/redis-server  
tcp        0      0 172.16.1.51:17001       0.0.0.0:*               LISTEN      27931/redis-server  
tcp        0      0 172.16.1.51:7000        0.0.0.0:*               LISTEN      27950/redis-server  
tcp        0      0 172.16.1.51:7001        0.0.0.0:*               LISTEN      27931/redis-server  
 
[root@db02 ~]# netstat -lntup
tcp        0      0 172.16.1.52:17002       0.0.0.0:*               LISTEN      27469/redis-server  
tcp        0      0 172.16.1.52:17003       0.0.0.0:*               LISTEN      27485/redis-server  
tcp        0      0 172.16.1.52:7002        0.0.0.0:*               LISTEN      27469/redis-server  
tcp        0      0 172.16.1.52:7003        0.0.0.0:*               LISTEN      27485/redis-server  
 
[root@db03 ~]# netstat -lntup
tcp        0      0 172.16.1.53:17004       0.0.0.0:*               LISTEN      27472/redis-server  
tcp        0      0 172.16.1.53:17005       0.0.0.0:*               LISTEN      27488/redis-server  
tcp        0      0 172.16.1.53:7004        0.0.0.0:*               LISTEN      27472/redis-server  
tcp        0      0 172.16.1.53:7005        0.0.0.0:*               LISTEN      27488/redis-server 

Create the cluster

# Old versions (redis-trib.rb): --replicas 1 gives every master one replica; the tool decides which nodes become masters
[root@db01 redis]# redis-trib.rb create --replicas 1 172.16.1.51:7000 172.16.1.51:7001 172.16.1.52:7002 172.16.1.52:7003 172.16.1.53:7004 172.16.1.53:7005
 
# New versions (Redis 5.0+): let redis-cli allocate the slots and create the shards
## Help for the cluster subcommands
[root@db01 redis]# redis-cli --cluster help
[root@db01 redis]# redis-cli --cluster create --cluster-replicas 1 172.16.1.51:7000 172.16.1.51:7001 172.16.1.52:7002 172.16.1.52:7003 172.16.1.53:7004 172.16.1.53:7005
# The proposed allocation looks like this
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 172.16.1.52:7003 to 172.16.1.51:7000
Adding replica 172.16.1.53:7005 to 172.16.1.52:7002
Adding replica 172.16.1.51:7001 to 172.16.1.53:7004
M: d2f578c347db4999e7bfaa201f10e1aa0a2c7049 172.16.1.51:7000
   slots:[0-5460] (5461 slots) master
S: f0a45f49c3af1f06d472963273ccf72806d90375 172.16.1.51:7001
   replicates 4652baa7d9684723950efb2780e61bbb1fdd314e
M: efc678a3dba4e8b1685bb653d654ab00c14a731d 172.16.1.52:7002
   slots:[5461-10922] (5462 slots) master
S: 0ec56f7ceb0889db4e4e144030bf97b27602ab64 172.16.1.52:7003
   replicates d2f578c347db4999e7bfaa201f10e1aa0a2c7049
M: 4652baa7d9684723950efb2780e61bbb1fdd314e 172.16.1.53:7004
   slots:[10923-16383] (5461 slots) master
S: d44a61b4dce8c7a09dfea30f0486b05483e207b2 172.16.1.53:7005
   replicates efc678a3dba4e8b1685bb653d654ab00c14a731d
Can I set the above configuration? (type 'yes' to accept): yes
 
# Check the replicas and their masters
[root@db01 redis]# redis-cli -h 172.16.1.51 -p 7000 cluster nodes | grep slave
d44a61b4dce8c7a09dfea30f0486b05483e207b2 172.16.1.53:7005@17005 slave efc678a3dba4e8b1685bb653d654ab00c14a731d 0 1693968000834 3 connected
f0a45f49c3af1f06d472963273ccf72806d90375 172.16.1.51:7001@17001 slave 4652baa7d9684723950efb2780e61bbb1fdd314e 0 1693967999523 5 connected
0ec56f7ceb0889db4e4e144030bf97b27602ab64 172.16.1.52:7003@17003 slave d2f578c347db4999e7bfaa201f10e1aa0a2c7049 0 1693968000530 1 connected

MOVED redirection

Now write a key through node 172.16.1.51:7000:

[root@db01 redis]# redis-cli -h 172.16.1.51 -p 7000
172.16.1.51:7000> set name ljy
(error) MOVED 5798 172.16.1.52:7002

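The node refuses and tells you to write on 172.16.1.52:7002 instead, because key name hashes to slot 5798, which that node owns.

This is a MOVED redirection.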
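A cluster-aware client such as redis-cli -c parses this error, remembers where the slot lives, and retries on the named node. The raw error string is `MOVED <slot> <host>:<port>`; redis-cli adds the `(error)` prefix only when displaying it.

This is a minimal sketch of the parsing and cache-update step. The names are hypothetical, and real clients do more, such as refreshing the whole slot map.

```python
def parse_moved(error: str):
    """Split 'MOVED <slot> <host>:<port>' into (slot, host, port); None if not a MOVED error."""
    parts = error.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, _, port = parts[2].rpartition(":")
    return int(parts[1]), host, int(port)

slot_cache = {}  # slot -> "host:port", filled lazily from redirects

def on_error(error: str):
    """Record where a slot lives and return the node to retry on, or None."""
    moved = parse_moved(error)
    if moved is None:
        return None
    slot, host, port = moved
    slot_cache[slot] = f"{host}:{port}"
    return slot_cache[slot]

print(on_error("MOVED 5798 172.16.1.52:7002"))  # 172.16.1.52:7002
```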

Connecting with -c

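How do you avoid this? Just add -c when starting redis-cli. The flag does no harm when the server is not running in cluster mode, so it is worth adding every time. Against a cluster, redis-cli then follows MOVED redirects automatically: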

[root@db01 redis]# redis-cli -h 172.16.1.51 -p 7000 -c
172.16.1.51:7000> set name ljy
-> Redirected to slot [5798] located at 172.16.1.52:7002
OK
172.16.1.52:7002> 

Adding nodes

Add a new master

# Create two more instances on db03
[root@db03 7006]# cat redis.conf 
port 7006
daemonize yes
pidfile /data/redis/7006/redis.pid
loglevel notice
logfile /data/redis/7006/redis.log
dbfilename dump.rdb
dir /data/redis/7006
bind 172.16.1.53
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
 
[root@db03 7007]# cat redis.conf 
port 7007
daemonize yes
pidfile /data/redis/7007/redis.pid
loglevel notice
logfile /data/redis/7007/redis.log
dbfilename dump.rdb
dir /data/redis/7007
bind 172.16.1.53
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
 
# Start the two new instances
[root@db03 7007]# redis-server /data/redis/7006/redis.conf
[root@db03 7007]# redis-server /data/redis/7007/redis.conf
 
# Old versions: add the new node to the cluster
[root@db01 ~]# redis-trib.rb add-node 172.16.1.53:7007 172.16.1.51:7000
 
# New versions: add the new node to the cluster
[root@db01 redis]# redis-cli --cluster add-node 172.16.1.53:7007 172.16.1.51:7000
>>> Adding node 172.16.1.53:7007 to cluster 172.16.1.51:7000
>>> Performing Cluster Check (using node 172.16.1.51:7000)
M: d2f578c347db4999e7bfaa201f10e1aa0a2c7049 172.16.1.51:7000
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: d44a61b4dce8c7a09dfea30f0486b05483e207b2 172.16.1.53:7005
   slots: (0 slots) slave
   replicates efc678a3dba4e8b1685bb653d654ab00c14a731d
M: 4652baa7d9684723950efb2780e61bbb1fdd314e 172.16.1.53:7004
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: f0a45f49c3af1f06d472963273ccf72806d90375 172.16.1.51:7001
   slots: (0 slots) slave
   replicates 4652baa7d9684723950efb2780e61bbb1fdd314e
S: 0ec56f7ceb0889db4e4e144030bf97b27602ab64 172.16.1.52:7003
   slots: (0 slots) slave
   replicates d2f578c347db4999e7bfaa201f10e1aa0a2c7049
M: efc678a3dba4e8b1685bb653d654ab00c14a731d 172.16.1.52:7002
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Getting functions from cluster
>>> Send FUNCTION LIST to 172.16.1.53:7007 to verify there is no functions in it
>>> Send FUNCTION RESTORE to 172.16.1.53:7007
>>> Send CLUSTER MEET to node 172.16.1.53:7007 to make it join the cluster.
[OK] New node added correctly.
 
# Check the cluster state and note the new node's ID
[root@db01 redis]# redis-cli -h 172.16.1.51 -p 7000 cluster nodes
d44a61b4dce8c7a09dfea30f0486b05483e207b2 172.16.1.53:7005@17005 slave efc678a3dba4e8b1685bb653d654ab00c14a731d 0 1693971466398 3 connected
4652baa7d9684723950efb2780e61bbb1fdd314e 172.16.1.53:7004@17004 master - 0 1693971467103 5 connected 10923-16383
f0a45f49c3af1f06d472963273ccf72806d90375 172.16.1.51:7001@17001 slave 4652baa7d9684723950efb2780e61bbb1fdd314e 0 1693971466000 5 connected
0ec56f7ceb0889db4e4e144030bf97b27602ab64 172.16.1.52:7003@17003 slave d2f578c347db4999e7bfaa201f10e1aa0a2c7049 0 1693971466096 1 connected
d2f578c347db4999e7bfaa201f10e1aa0a2c7049 172.16.1.51:7000@17000 myself,master - 0 1693971465000 1 connected 0-5460
b392d9623c7408569b71d86dca9bded5878648f4 172.16.1.53:7007@17007 master - 0 1693971466902 0 connected
efc678a3dba4e8b1685bb653d654ab00c14a731d 172.16.1.52:7002@17002 master - 0 1693971465392 3 connected 5461-10922
 
b392d9623c7408569b71d86dca9bded5878648f4
 
# Reshard (assign slots) in old versions
[root@db01 ~]# redis-trib.rb reshard 172.16.1.51:7000
 
# Reshard in new versions
[root@db01 redis]# redis-cli --cluster reshard 172.16.1.51 7000
>>> Performing Cluster Check (using node 172.16.1.51:7000)
M: d2f578c347db4999e7bfaa201f10e1aa0a2c7049 172.16.1.51:7000
   slots:[1365-5460] (4096 slots) master
   1 additional replica(s)
S: d44a61b4dce8c7a09dfea30f0486b05483e207b2 172.16.1.53:7005
   slots: (0 slots) slave
   replicates efc678a3dba4e8b1685bb653d654ab00c14a731d
M: 4652baa7d9684723950efb2780e61bbb1fdd314e 172.16.1.53:7004
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
S: f0a45f49c3af1f06d472963273ccf72806d90375 172.16.1.51:7001
   slots: (0 slots) slave
   replicates 4652baa7d9684723950efb2780e61bbb1fdd314e
S: 0ec56f7ceb0889db4e4e144030bf97b27602ab64 172.16.1.52:7003
   slots: (0 slots) slave
   replicates d2f578c347db4999e7bfaa201f10e1aa0a2c7049
M: b392d9623c7408569b71d86dca9bded5878648f4 172.16.1.53:7007
   slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
M: efc678a3dba4e8b1685bb653d654ab00c14a731d 172.16.1.52:7002
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
## How many slots to move to the new node (16384 / 4 masters = 4096)
How many slots do you want to move (from 1 to 16384)? 4096
## ID of the node that receives them (7007)
What is the receiving node ID? b392d9623c7408569b71d86dca9bded5878648f4
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
## Source nodes: 'all' takes slots from every existing master
Source node #1: all
## Confirm the proposed plan
Do you want to proceed with the proposed reshard plan (yes/no)? yes
 
# Check the cluster state
[root@db01 redis]# redis-cli -h 172.16.1.51 -p 7000 cluster nodes
d44a61b4dce8c7a09dfea30f0486b05483e207b2 172.16.1.53:7005@17005 slave efc678a3dba4e8b1685bb653d654ab00c14a731d 0 1693971767198 3 connected
4652baa7d9684723950efb2780e61bbb1fdd314e 172.16.1.53:7004@17004 master - 0 1693971766000 5 connected 12288-16383
f0a45f49c3af1f06d472963273ccf72806d90375 172.16.1.51:7001@17001 slave 4652baa7d9684723950efb2780e61bbb1fdd314e 0 1693971766593 5 connected
0ec56f7ceb0889db4e4e144030bf97b27602ab64 172.16.1.52:7003@17003 slave d2f578c347db4999e7bfaa201f10e1aa0a2c7049 0 1693971766192 1 connected
d2f578c347db4999e7bfaa201f10e1aa0a2c7049 172.16.1.51:7000@17000 myself,master - 0 1693971765000 1 connected 1365-5460
b392d9623c7408569b71d86dca9bded5878648f4 172.16.1.53:7007@17007 master - 0 1693971766594 7 connected 0-1364 5461-6826 10923-12287
efc678a3dba4e8b1685bb653d654ab00c14a731d 172.16.1.52:7002@17002 master - 0 1693971766694 3 connected 6827-10922
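Answering 'all' as the source spreads the 4096 slots over the existing masters in proportion to what each one holds. The status above shows that 7007 received:

- 0-1364 (1365 slots) from 7000;
- 5461-6826 (1366 slots) from 7002;
- 10923-12287 (1365 slots) from 7004.

The sketch below reproduces that split. It is modeled on the rounding redis-cli appears to use: the largest source is rounded up and the others down. It is an illustration, not redis-cli's code, and `reshard_plan` is a hypothetical name.

```python
import math

def reshard_plan(sources: dict, num_slots: int) -> dict:
    """How many slots each source master gives up when 'all' is chosen."""
    total = sum(sources.values())
    # Largest source first; it is rounded up, the others are rounded down.
    ordered = sorted(sources.items(), key=lambda item: item[1], reverse=True)
    plan = {}
    for i, (node, held) in enumerate(ordered):
        share = num_slots / total * held
        plan[node] = math.ceil(share) if i == 0 else math.floor(share)
    return plan

before = {"172.16.1.51:7000": 5461, "172.16.1.52:7002": 5462, "172.16.1.53:7004": 5461}
print(reshard_plan(before, 16384 // 4))
```

With other numbers the rounded shares may not add up exactly to the requested total; this sketch does not correct for that.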

Adding a replica

# Add a replica (old versions): 7006 becomes a replica of 7007
[root@db01 ~]# redis-trib.rb add-node --slave --master-id b392d9623c7408569b71d86dca9bded5878648f4 172.16.1.53:7006 172.16.1.51:7000
 
# Add a replica (new versions): 7006 becomes a replica of 7007
[root@db01 redis]# redis-cli --cluster add-node --cluster-slave --cluster-master-id b392d9623c7408569b71d86dca9bded5878648f4 172.16.1.53:7006 172.16.1.53:7007
>>> Adding node 172.16.1.53:7006 to cluster 172.16.1.53:7007
>>> Performing Cluster Check (using node 172.16.1.53:7007)
M: b392d9623c7408569b71d86dca9bded5878648f4 172.16.1.53:7007
   slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
S: f0a45f49c3af1f06d472963273ccf72806d90375 172.16.1.51:7001
   slots: (0 slots) slave
   replicates 4652baa7d9684723950efb2780e61bbb1fdd314e
M: d2f578c347db4999e7bfaa201f10e1aa0a2c7049 172.16.1.51:7000
   slots:[1365-5460] (4096 slots) master
   1 additional replica(s)
S: d44a61b4dce8c7a09dfea30f0486b05483e207b2 172.16.1.53:7005
   slots: (0 slots) slave
   replicates efc678a3dba4e8b1685bb653d654ab00c14a731d
S: 0ec56f7ceb0889db4e4e144030bf97b27602ab64 172.16.1.52:7003
   slots: (0 slots) slave
   replicates d2f578c347db4999e7bfaa201f10e1aa0a2c7049
M: efc678a3dba4e8b1685bb653d654ab00c14a731d 172.16.1.52:7002
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
M: 4652baa7d9684723950efb2780e61bbb1fdd314e 172.16.1.53:7004
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 172.16.1.53:7006 to make it join the cluster.
Waiting for the cluster to join
 
>>> Configure node as replica of 172.16.1.53:7007.
[OK] New node added correctly.
 
# Check the cluster state again
[root@db01 redis]# redis-cli -h 172.16.1.51 -p 7000 cluster nodes
d44a61b4dce8c7a09dfea30f0486b05483e207b2 172.16.1.53:7005@17005 slave efc678a3dba4e8b1685bb653d654ab00c14a731d 0 1693971972409 3 connected
4652baa7d9684723950efb2780e61bbb1fdd314e 172.16.1.53:7004@17004 master - 0 1693971972911 5 connected 12288-16383
f0a45f49c3af1f06d472963273ccf72806d90375 172.16.1.51:7001@17001 slave 4652baa7d9684723950efb2780e61bbb1fdd314e 0 1693971973515 5 connected
0ec56f7ceb0889db4e4e144030bf97b27602ab64 172.16.1.52:7003@17003 slave d2f578c347db4999e7bfaa201f10e1aa0a2c7049 0 1693971974419 1 connected
d2f578c347db4999e7bfaa201f10e1aa0a2c7049 172.16.1.51:7000@17000 myself,master - 0 1693971972000 1 connected 1365-5460
e801f09e8ef1940ad2ec5890a570f5bd953ab62f 172.16.1.53:7006@17006 slave b392d9623c7408569b71d86dca9bded5878648f4 0 1693971972000 7 connected
b392d9623c7408569b71d86dca9bded5878648f4 172.16.1.53:7007@17007 master - 0 1693971972409 7 connected 0-1364 5461-6826 10923-12287
efc678a3dba4e8b1685bb653d654ab00c14a731d 172.16.1.52:7002@17002 master - 0 1693971973414 3 connected 6827-10922

Removing nodes

# Reshard: move all of 7007's slots to 7002
[root@db01 redis]# redis-cli --cluster reshard 172.16.1.51 7000
>>> Performing Cluster Check (using node 172.16.1.51:7000)
M: d2f578c347db4999e7bfaa201f10e1aa0a2c7049 172.16.1.51:7000
   slots:[1365-5460] (4096 slots) master
   1 additional replica(s)
S: d44a61b4dce8c7a09dfea30f0486b05483e207b2 172.16.1.53:7005
   slots: (0 slots) slave
   replicates efc678a3dba4e8b1685bb653d654ab00c14a731d
M: 4652baa7d9684723950efb2780e61bbb1fdd314e 172.16.1.53:7004
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
S: f0a45f49c3af1f06d472963273ccf72806d90375 172.16.1.51:7001
   slots: (0 slots) slave
   replicates 4652baa7d9684723950efb2780e61bbb1fdd314e
S: 0ec56f7ceb0889db4e4e144030bf97b27602ab64 172.16.1.52:7003
   slots: (0 slots) slave
   replicates d2f578c347db4999e7bfaa201f10e1aa0a2c7049
S: e801f09e8ef1940ad2ec5890a570f5bd953ab62f 172.16.1.53:7006
   slots: (0 slots) slave
   replicates b392d9623c7408569b71d86dca9bded5878648f4
M: b392d9623c7408569b71d86dca9bded5878648f4 172.16.1.53:7007
   slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
   1 additional replica(s)
M: efc678a3dba4e8b1685bb653d654ab00c14a731d 172.16.1.52:7002
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096
What is the receiving node ID? efc678a3dba4e8b1685bb653d654ab00c14a731d
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: b392d9623c7408569b71d86dca9bded5878648f4
Source node #2: done
 
# Check the cluster state
[root@db01 redis]# redis-cli -h 172.16.1.51 -p 7000 cluster nodes
d44a61b4dce8c7a09dfea30f0486b05483e207b2 172.16.1.53:7005@17005 slave efc678a3dba4e8b1685bb653d654ab00c14a731d 0 1693972598374 8 connected
4652baa7d9684723950efb2780e61bbb1fdd314e 172.16.1.53:7004@17004 master - 0 1693972597872 5 connected 12288-16383
f0a45f49c3af1f06d472963273ccf72806d90375 172.16.1.51:7001@17001 slave 4652baa7d9684723950efb2780e61bbb1fdd314e 0 1693972597369 5 connected
0ec56f7ceb0889db4e4e144030bf97b27602ab64 172.16.1.52:7003@17003 slave d2f578c347db4999e7bfaa201f10e1aa0a2c7049 0 1693972597168 1 connected
d2f578c347db4999e7bfaa201f10e1aa0a2c7049 172.16.1.51:7000@17000 myself,master - 0 1693972597000 1 connected 1365-5460
e801f09e8ef1940ad2ec5890a570f5bd953ab62f 172.16.1.53:7006@17006 slave efc678a3dba4e8b1685bb653d654ab00c14a731d 0 1693972598575 8 connected
b392d9623c7408569b71d86dca9bded5878648f4 172.16.1.53:7007@17007 slave efc678a3dba4e8b1685bb653d654ab00c14a731d 0 1693972597000 8 connected
efc678a3dba4e8b1685bb653d654ab00c14a731d 172.16.1.52:7002@17002 master - 0 1693972598877 8 connected 0-1364 5461-12287
## 7007, now without slots, and its replica 7006 have both automatically become replicas of 7002
 
# Remove a node (old versions)
[root@db01 ~]# redis-trib.rb del-node 172.16.1.51:7000 b392d9623c7408569b71d86dca9bded5878648f4
 
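# Remove nodes (new versions); del-node takes a node address and the ID of the node to remove
## Remove 7007
[root@db01 redis]# redis-cli --cluster del-node 172.16.1.53:7007 b392d9623c7408569b71d86dca9bded5878648f4
## Remove 7006
[root@db01 redis]# redis-cli --cluster del-node 172.16.1.53:7006 e801f09e8ef1940ad2ec5890a570f5bd953ab62f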
 
# Check the cluster state
[root@db01 redis]# redis-cli -h 172.16.1.51 -p 7000 cluster nodes
d44a61b4dce8c7a09dfea30f0486b05483e207b2 172.16.1.53:7005@17005 slave efc678a3dba4e8b1685bb653d654ab00c14a731d 0 1693972887208 8 connected
4652baa7d9684723950efb2780e61bbb1fdd314e 172.16.1.53:7004@17004 master - 0 1693972886705 5 connected 12288-16383
f0a45f49c3af1f06d472963273ccf72806d90375 172.16.1.51:7001@17001 slave 4652baa7d9684723950efb2780e61bbb1fdd314e 0 1693972887610 5 connected
0ec56f7ceb0889db4e4e144030bf97b27602ab64 172.16.1.52:7003@17003 slave d2f578c347db4999e7bfaa201f10e1aa0a2c7049 0 1693972886203 1 connected
d2f578c347db4999e7bfaa201f10e1aa0a2c7049 172.16.1.51:7000@17000 myself,master - 0 1693972885000 1 connected 1365-5460
efc678a3dba4e8b1685bb653d654ab00c14a731d 172.16.1.52:7002@17002 master - 0 1693972886000 8 connected 0-1364 5461-12287

Redis Cluster deployment (Redis 7.2, multiple instances on a single host)

Environment

Public IP    Internal IP    Port    Role
10.0.0.51    172.16.1.51    6380    master
10.0.0.51    172.16.1.51    6381    master
10.0.0.51    172.16.1.51    6382    master
10.0.0.51    172.16.1.51    6384    replica of 6380
10.0.0.51    172.16.1.51    6385    replica of 6381
10.0.0.51    172.16.1.51    6383    replica of 6382

For installing Redis 7.2, see the separate installation guide.

Create the instances

# Create the instance directories
[root@db01 ~]# mkdir -p /data/redis/63{80..85}
 
# Write the per-instance config files
[root@db01 redis]# cat > /data/redis/6380/redis.conf << 'EOF'
port 6380
daemonize yes
pidfile /data/redis/6380/redis.pid
loglevel notice
logfile "/data/redis/6380/redis.log"
dbfilename dump.rdb
dir /data/redis/6380
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
bind 0.0.0.0
EOF
 
[root@db01 redis]# cat > /data/redis/6381/redis.conf << 'EOF'
port 6381
daemonize yes
pidfile /data/redis/6381/redis.pid
loglevel notice
logfile "/data/redis/6381/redis.log"
dbfilename dump.rdb
dir /data/redis/6381
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
bind 0.0.0.0
EOF
 
[root@db01 redis]# cat > /data/redis/6382/redis.conf << 'EOF'
port 6382
daemonize yes
pidfile /data/redis/6382/redis.pid
loglevel notice
logfile "/data/redis/6382/redis.log"
dbfilename dump.rdb
dir /data/redis/6382
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
bind 0.0.0.0
EOF
 
[root@db01 redis]# cat > /data/redis/6383/redis.conf << 'EOF'
port 6383
daemonize yes
pidfile /data/redis/6383/redis.pid
loglevel notice
logfile "/data/redis/6383/redis.log"
dbfilename dump.rdb
dir /data/redis/6383
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
bind 0.0.0.0
EOF
 
[root@db01 redis]# cat > /data/redis/6384/redis.conf << 'EOF'
port 6384
daemonize yes
pidfile /data/redis/6384/redis.pid
loglevel notice
logfile "/data/redis/6384/redis.log"
dbfilename dump.rdb
dir /data/redis/6384
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
bind 0.0.0.0
EOF
 
[root@db01 redis]# cat > /data/redis/6385/redis.conf << 'EOF'
port 6385
daemonize yes
pidfile /data/redis/6385/redis.pid
loglevel notice
logfile "/data/redis/6385/redis.log"
dbfilename dump.rdb
dir /data/redis/6385
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
bind 0.0.0.0
EOF
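The six files differ only in the port number, so they can be generated from one template. This sketch assumes the same /data/redis/<port> layout and directives as above. base_dir is a parameter so the script can be tried in a scratch directory first, and `write_configs` is a hypothetical helper name.

```python
from pathlib import Path

TEMPLATE = """\
port {port}
daemonize yes
pidfile {dir}/redis.pid
loglevel notice
logfile "{dir}/redis.log"
dbfilename dump.rdb
dir {dir}
protected-mode no
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
bind 0.0.0.0
"""

def write_configs(base_dir: str, ports) -> list:
    """Create <base_dir>/<port>/redis.conf for every port; return the paths written."""
    written = []
    for port in ports:
        inst_dir = Path(base_dir) / str(port)
        inst_dir.mkdir(parents=True, exist_ok=True)
        conf = inst_dir / "redis.conf"
        conf.write_text(TEMPLATE.format(port=port, dir=inst_dir))
        written.append(conf)
    return written
```

On db01 it would be called as write_configs("/data/redis", range(6380, 6386)).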

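Start Redis. This assumes systemd units named redis6380 to redis6385 were created when Redis 7.2 was installed; they are not shown here. Without them, start each instance with redis-server /data/redis/<port>/redis.conf, as in the multi-host section.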

[root@db01 redis]# systemctl start redis6380
[root@db01 redis]# systemctl start redis6381
[root@db01 redis]# systemctl start redis6382
[root@db01 redis]# systemctl start redis6383
[root@db01 redis]# systemctl start redis6384
[root@db01 redis]# systemctl start redis6385

Joining the cluster

Introduce every node to 6380 with CLUSTER MEET:

[root@db01 ~]# redis-cli -p 6380
127.0.0.1:6380> cluster meet 172.16.1.51 6381
OK
127.0.0.1:6380> cluster meet 172.16.1.51 6382
OK
127.0.0.1:6380> cluster meet 172.16.1.51 6383
OK
127.0.0.1:6380> cluster meet 172.16.1.51 6384
OK
127.0.0.1:6380> cluster meet 172.16.1.51 6385
OK

List all nodes currently in the cluster:

127.0.0.1:6380> CLUSTER NODES
945117e8396fc7e99bb4f39aa573d247b941bd30 172.16.1.51:6382@16382 master - 0 1693908669219 2 connected
e8b6d0a1c84e775451c8be6d2b730759bd3a7cb4 172.16.1.51:6384@16384 master - 0 1693908668715 0 connected
cb1e9177d5c9b116aa569a611c652ec274f75906 172.16.1.51:6380@16380 myself,master - 0 1693908669000 4 connected
c08d0c382974ea7e2cab815ecc85c052abbc3cc6 172.16.1.51:6385@16385 master - 0 1693908669219 5 connected
c3375f0370ba5dd4354c70b34625bbc7336bbbd8 172.16.1.51:6381@16381 master - 0 1693908670225 1 connected
1e2a8c8981fb04e0d80daaa05a22adda68749c94 172.16.1.51:6383@16383 master - 0 1693908669000 3 connected

Check the listening ports:

[root@db01 ~]# netstat -lntup | grep redis
tcp        0      0 0.0.0.0:16380           0.0.0.0:*               LISTEN      9001/redis-server 0 
tcp        0      0 0.0.0.0:16381           0.0.0.0:*               LISTEN      9003/redis-server 0 
tcp        0      0 0.0.0.0:16382           0.0.0.0:*               LISTEN      8998/redis-server 0 
tcp        0      0 0.0.0.0:16383           0.0.0.0:*               LISTEN      9000/redis-server 0 
tcp        0      0 0.0.0.0:16384           0.0.0.0:*               LISTEN      8996/redis-server 0 
tcp        0      0 0.0.0.0:16385           0.0.0.0:*               LISTEN      9002/redis-server 0 

Master-replica setup

The six instances have no master-replica relationships yet, so set them up now. Note the node IDs of the masters only, taken from the CLUSTER NODES output above:

Node    node-id
172.16.1.51:6380 cb1e9177d5c9b116aa569a611c652ec274f75906
172.16.1.51:6381 c3375f0370ba5dd4354c70b34625bbc7336bbbd8
172.16.1.51:6382 945117e8396fc7e99bb4f39aa573d247b941bd30

First, on 172.16.1.51:6384, replicate master 172.16.1.51:6380 using its node ID:

127.0.0.1:6384> CLUSTER REPLICATE cb1e9177d5c9b116aa569a611c652ec274f75906

Then, on 172.16.1.51:6385, replicate master 172.16.1.51:6381:

127.0.0.1:6385> CLUSTER REPLICATE c3375f0370ba5dd4354c70b34625bbc7336bbbd8

Then, on 172.16.1.51:6383, replicate master 172.16.1.51:6382:

127.0.0.1:6383> CLUSTER REPLICATE 945117e8396fc7e99bb4f39aa573d247b941bd30

Check the cluster nodes again:

127.0.0.1:6380> CLUSTER NODES
945117e8396fc7e99bb4f39aa573d247b941bd30 172.16.1.51:6382@16382 master - 0 1693912017000 2 connected
e8b6d0a1c84e775451c8be6d2b730759bd3a7cb4 172.16.1.51:6384@16384 slave cb1e9177d5c9b116aa569a611c652ec274f75906 0 1693912018000 4 connected
cb1e9177d5c9b116aa569a611c652ec274f75906 172.16.1.51:6380@16380 myself,master - 0 1693912018000 4 connected
c08d0c382974ea7e2cab815ecc85c052abbc3cc6 172.16.1.51:6385@16385 slave c3375f0370ba5dd4354c70b34625bbc7336bbbd8 0 1693912018552 1 connected
c3375f0370ba5dd4354c70b34625bbc7336bbbd8 172.16.1.51:6381@16381 master - 0 1693912018049 1 connected
1e2a8c8981fb04e0d80daaa05a22adda68749c94 172.16.1.51:6383@16383 slave 945117e8396fc7e99bb4f39aa573d247b941bd30 0 1693912019056 2 connected
 
# myself marks the node you are connected to

Assigning slots

Next, assign the slots. To spread future writes evenly, distribute the slots evenly across the masters.

Assign slots only to masters. Replicas get none; they serve as backups of their masters and, optionally, for reads.

Use Python to work out how many slots each master gets:

$ python3
 
>>> divmod(16384,3)
(5461, 1)

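Slots are numbered from 0 to 16383, 16384 in total. Since 16384 = 3 × 5461 + 1, one master gets one extra slot. The allocation is:

Node                Slot range       Slots
172.16.1.51:6380    0 - 5461         5462
172.16.1.51:6381    5462 - 10922     5461
172.16.1.51:6382    10923 - 16383    5461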

Assign them:

[root@db01 ~]# redis-cli -p 6380 cluster addslots {0..5461}
[root@db01 ~]# redis-cli -p 6381 cluster addslots {5462..10922}
[root@db01 ~]# redis-cli -p 6382 cluster addslots {10923..16383}
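The divmod arithmetic and the three commands above can be generated together. This sketch assumes the remainder slots go to the first masters, which matches the commands above: 6380 gets 5462 slots and the others 5461. `split_slots` is an illustrative name.

```python
def split_slots(num_masters: int, total: int = 16384) -> list:
    """Split slots 0..total-1 into contiguous (first, last) ranges whose sizes differ by at most 1."""
    base, extra = divmod(total, num_masters)
    ranges, start = [], 0
    for i in range(num_masters):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

for port, (first, last) in zip((6380, 6381, 6382), split_slots(3)):
    # {first..last} is bash brace expansion, as in the commands above
    print(f"redis-cli -p {port} cluster addslots {{{first}..{last}}}")
```

The loop prints exactly the three addslots commands shown above.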

Check that the slots were assigned correctly (output trimmed):

[root@db01 ~]# redis-cli -p 6380
127.0.0.1:6380> CLUSTER NODES
945117e8396fc7e99bb4f39aa573d247b941bd30 172.16.1.51:6382@16382 master - 0 1693912541903 2 connected 10923-16383
e8b6d0a1c84e775451c8be6d2b730759bd3a7cb4 172.16.1.51:6384@16384 slave cb1e9177d5c9b116aa569a611c652ec274f75906 0 1693912541501 4 connected
cb1e9177d5c9b116aa569a611c652ec274f75906 172.16.1.51:6380@16380 myself,master - 0 1693912540000 4 connected 0-5461
c08d0c382974ea7e2cab815ecc85c052abbc3cc6 172.16.1.51:6385@16385 slave c3375f0370ba5dd4354c70b34625bbc7336bbbd8 0 1693912540093 1 connected
c3375f0370ba5dd4354c70b34625bbc7336bbbd8 172.16.1.51:6381@16381 master - 0 1693912540898 1 connected 5462-10922
1e2a8c8981fb04e0d80daaa05a22adda68749c94 172.16.1.51:6383@16383 slave 945117e8396fc7e99bb4f39aa573d247b941bd30 0 1693912541501 2 connected
 
# The slot ranges are listed at the end of each master's line

Checking the state

Check that the cluster state is ok. It becomes ok once all slots are assigned; if it is not, check whether you mistyped a slot range when assigning them:

127.0.0.1:6380> CLUSTER INFO
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:5
cluster_my_epoch:4
cluster_stats_messages_ping_sent:7347
cluster_stats_messages_pong_sent:7311
cluster_stats_messages_meet_sent:5
cluster_stats_messages_sent:14663
cluster_stats_messages_ping_received:7311
cluster_stats_messages_pong_received:7352
cluster_stats_messages_received:14663
total_cluster_links_buffer_limit_exceeded:0
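This check can be scripted. The minimal sketch below parses CLUSTER INFO text, which is field:value lines as shown above. The function name and the choice of fields to check are assumptions.

```python
def cluster_healthy(info: str) -> bool:
    """True if cluster_state is ok and all 16384 slots are assigned and ok."""
    fields = {}
    for line in info.strip().splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key] = value
    return (
        fields.get("cluster_state") == "ok"
        and fields.get("cluster_slots_assigned") == "16384"
        and fields.get("cluster_slots_ok") == "16384"
    )

sample = """cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_fail:0"""
print(cluster_healthy(sample))  # True
```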

Failover

Simulating a failure

Simulate 172.16.1.51:6380 going down; 172.16.1.51:6384 should take over its work:

[root@db01 ~]# redis-cli -p 6380 shutdown 

Log in to any node and look at the cluster nodes:

127.0.0.1:6381> CLUSTER NODES
# Marked as failed
cb1e9177d5c9b116aa569a611c652ec274f75906 172.16.1.51:6380@16380 master,fail - 1693913160858 1693913158341 4 disconnected
 
1e2a8c8981fb04e0d80daaa05a22adda68749c94 172.16.1.51:6383@16383 slave 945117e8396fc7e99bb4f39aa573d247b941bd30 0 1693913241542 2 connected
945117e8396fc7e99bb4f39aa573d247b941bd30 172.16.1.51:6382@16382 master - 0 1693913240837 2 connected 10923-16383
c08d0c382974ea7e2cab815ecc85c052abbc3cc6 172.16.1.51:6385@16385 slave c3375f0370ba5dd4354c70b34625bbc7336bbbd8 0 1693913241000 1 connected
c3375f0370ba5dd4354c70b34625bbc7336bbbd8 172.16.1.51:6381@16381 myself,master - 0 1693913240000 1 connected 5462-10922
# 6384 has replaced 6380 as master and taken over its slots
e8b6d0a1c84e775451c8be6d2b730759bd3a7cb4 172.16.1.51:6384@16384 master - 0 1693913241844 6 connected 0-5461

Recovery

Restart 172.16.1.51:6380:

[root@db01 ~]# systemctl start redis6380

Log in to 172.16.1.51:6380: it has rejoined the cluster automatically, now as a replica of 172.16.1.51:6384.

[root@db01 ~]# redis-cli -p 6380
127.0.0.1:6380> CLUSTER NODES
945117e8396fc7e99bb4f39aa573d247b941bd30 172.16.1.51:6382@16382 master - 0 1693913476563 2 connected 10923-16383
c3375f0370ba5dd4354c70b34625bbc7336bbbd8 172.16.1.51:6381@16381 master - 0 1693913478000 1 connected 5462-10922
e8b6d0a1c84e775451c8be6d2b730759bd3a7cb4 172.16.1.51:6384@16384 master - 0 1693913478072 6 connected 0-5461
1e2a8c8981fb04e0d80daaa05a22adda68749c94 172.16.1.51:6383@16383 slave 945117e8396fc7e99bb4f39aa573d247b941bd30 0 1693913477000 2 connected
c08d0c382974ea7e2cab815ecc85c052abbc3cc6 172.16.1.51:6385@16385 slave c3375f0370ba5dd4354c70b34625bbc7336bbbd8 0 1693913478574 1 connected
cb1e9177d5c9b116aa569a611c652ec274f75906 172.16.1.51:6380@16380 myself,slave e8b6d0a1c84e775451c8be6d2b730759bd3a7cb4 0 1693913478000 6 connected

CLUSTER commands

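Commonly used cluster commands are listed below. They are run as CLUSTER <subcommand> [arguments], for example CLUSTER INFO.

The list is not exhaustive; run CLUSTER HELP for the full list: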

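Command                            Description
INFO                               Return information about the cluster
MEET <ip> <port> [<bus-port>]      Add a node to the cluster
MYID                               Return the current node's cluster ID
NODES                              Return the cluster configuration as seen by the current node
REPLICATE <node-id>                Make the current node a replica of the given node
FAILOVER [FORCE|TAKEOVER]          Promote the current replica to master
RESET [HARD|SOFT]                  Reset the current node
ADDSLOTS <slot> [<slot> ...]       Assign one or more slots to the current node; run it from bash to use {int..int} ranges
DELSLOTS <slot> [<slot> ...]       Remove one or more slots from the current node; run it from bash to use {int..int} ranges
FLUSHSLOTS                         Remove all slots assigned to the current node
FORGET <node-id>                   Remove a node from the current node's view of the cluster
COUNT-FAILURE-REPORTS <node-id>    Return the number of failure reports for the given node
COUNTKEYSINSLOT <slot>             Return the number of keys in a slot
GETKEYSINSLOT <slot> <count>       Return up to count key names stored in a slot on the current node
KEYSLOT <key>                      Return the hash slot of a key
SAVECONFIG                         Save the cluster configuration to disk (nodes.conf)
SLOTS                              Return the mapping of slot ranges to nodes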