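In my earlier article Redis failover, I described how to install Redis and make it highly available with Sentinel. Redis also has built-in cluster support (available since Redis 3.0; the version I use here is 6.2.6). This post records how to build a Redis cluster and use it.

We use the following six machines to build the Redis cluster: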

  1. 172.19.65.196
  2. 172.19.72.108
  3. 172.19.72.112
  4. 172.19.72.203
  5. 172.19.65.228
  6. 172.19.65.136

Download the source and compile

First, download the Redis source code on 172.19.65.196 and compile it. The version I downloaded is 6.2.6.

useradd -m redis
su - redis
wget https://download.redis.io/redis-stable.tar.gz
tar -zxvf redis-stable.tar.gz
cd redis-stable
make

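The compiled executables are in the src directory:

File              Purpose
redis-server      Starts the Redis server
redis-cli         Redis command-line client
redis-sentinel    Redis Sentinel, covered in Redis failover
redis-benchmark   Redis performance benchmarking tool
redis-check-rdb   Checks the state of an RDB snapshot file
redis-check-aof   Checks the state of an AOF file

For now we only need the compiled redis-server binary. Copy redis-server and the redis.conf configuration file to the redis user's home directory: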

cp /home/redis/redis-stable/src/redis-server /home/redis
cp /home/redis/redis-stable/redis.conf /home/redis

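Edit the configuration file and sync the files to all machines

Edit redis.conf and set the following properties. The comments sit on their own lines because redis.conf does not accept a trailing comment after a directive:

# Enable Redis Cluster mode
cluster-enabled yes
# File where the cluster state is stored; it is managed by Redis and should not be edited by hand
cluster-config-file nodes.conf
# A node that does not respond within this many milliseconds is considered failed
cluster-node-timeout 15000
# Enable AOF data persistence
appendonly yes
# Listen on all network interfaces so that any host can reach the Redis service
bind 0.0.0.0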
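Before copying redis.conf to the other machines, it can help to confirm that the cluster directives are present and well-formed. The sketch below is a hypothetical helper (check_cluster_conf is my own name, not part of Redis). It checks three of the settings listed above and assumes each is written exactly as shown, on a line of its own with nothing after the value.

```shell
# Hypothetical helper: report cluster directives that are missing from a
# redis.conf file or that carry extra text (such as a trailing comment).
check_cluster_conf() {
  conf=$1
  missing=0
  for directive in "cluster-enabled yes" \
                   "cluster-config-file nodes.conf" \
                   "appendonly yes"; do
    # The directive must be the only thing on its line.
    if ! grep -Eq "^[[:space:]]*${directive}[[:space:]]*$" "$conf"; then
      echo "missing or malformed: $directive"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_cluster_conf /home/redis/redis.conf returns 0 when all three directives are found and 1 otherwise.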

Next, sync these two files to the remaining five machines by running the following commands on each of them:

useradd -m redis
su - redis
rsync -azvhP root@172.19.65.196:/home/redis/redis-server :/home/redis/redis.conf ./

Start the Redis processes and build the cluster

Once redis-server and redis.conf have been distributed to all machines, start the Redis process on every machine:

./redis-server redis.conf

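After the redis-server processes are running on all six machines, copy the redis-cli program we just compiled to any one of the machines. Then connect to all the redis-server instances and create the cluster with one replica per master: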

~ ./redis-cli --cluster create 172.19.65.196:6379 172.19.72.108:6379 \
172.19.72.112:6379 172.19.72.203:6379 172.19.65.228:6379 \
172.19.65.136:6379 --cluster-replicas 1
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 172.19.65.228:6379 to 172.19.65.196:6379
Adding replica 172.19.65.136:6379 to 172.19.72.108:6379
Adding replica 172.19.72.203:6379 to 172.19.72.112:6379
M: 8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379
   slots:[0-5460] (5461 slots) master
M: 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379
   slots:[5461-10922] (5462 slots) master
M: 2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379
   slots:[10923-16383] (5461 slots) master
S: 31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379
   replicates 2335076efd1d6f38eac1228d5b326380d92056f4
S: 0146973e61ffe3d9f63da5dfb9e565e02b1774b6 172.19.65.228:6379
   replicates 8e172b28314aad39c31ace1229f7d1ae4cdb4973
S: f76fff860057dfab9d4df63b7ee183bb0a23e7df 172.19.65.136:6379
   replicates 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03
Can I set the above configuration? (type 'yes' to accept):

The output above shows the layout of the cluster that is about to be created. redis-cli asks whether to apply this configuration; type yes and press Enter.

>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join

>>> Performing Cluster Check (using node 172.19.65.196:6379)
M: 8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: f76fff860057dfab9d4df63b7ee183bb0a23e7df 172.19.65.136:6379
   slots: (0 slots) slave
   replicates 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03
M: 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: 2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379
   slots: (0 slots) slave
   replicates 2335076efd1d6f38eac1228d5b326380d92056f4
S: 0146973e61ffe3d9f63da5dfb9e565e02b1774b6 172.19.65.228:6379
   slots: (0 slots) slave
   replicates 8e172b28314aad39c31ace1229f7d1ae4cdb4973
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Once the command finishes, the cluster has been created. Based on the output above, the six nodes now have the following roles:

Node             Role
172.19.65.196    master, holds slots 0-5460
172.19.72.108    master, holds slots 5461-10922
172.19.72.112    master, holds slots 10923-16383
172.19.72.203    replica of 172.19.72.112:6379
172.19.65.228    replica of 172.19.65.196:6379
172.19.65.136    replica of 172.19.72.108:6379
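The three slot ranges come from dividing the 16384 hash slots as evenly as possible across the masters. The following sketch reproduces that split with awk. The function name slot_ranges is hypothetical, and the rounding rule is an assumption chosen to match the ranges printed above, not a copy of the redis-cli source.

```shell
# Hypothetical helper: print an even split of the 16384 hash slots
# across the given number of masters.
slot_ranges() {
  awk -v masters="$1" 'BEGIN {
    total = 16384
    per_node = total / masters          # fractional slots per master
    first = 0
    cursor = 0
    for (i = 0; i < masters; i++) {
      last = int(cursor + per_node - 1 + 0.5)      # round to nearest slot
      if (i == masters - 1 || last > total - 1)
        last = total - 1                           # last master takes the remainder
      printf "Master[%d] -> Slots %d - %d\n", i, first, last
      first = last + 1
      cursor += per_node
    }
  }'
}
```

With three masters, slot_ranges 3 prints 0 - 5460, 5461 - 10922, and 10923 - 16383, the same ranges as in the cluster creation output.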

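Files generated after the cluster starts

Besides redis-server and redis.conf, the redis user's home directory now also contains appendonly.aof, dump.rdb, and nodes.conf:

File              Purpose
appendonly.aof    AOF file; records every Redis write operation to disk by appending
dump.rdb          RDB snapshot file; produced by persisting the in-memory data to disk
nodes.conf        Used by the Redis process to store cluster configuration; do not edit by hand

The contents of nodes.conf are shown below. The file stores the cluster configuration: which nodes are masters, which are replicas (shown as slave), and which master each replica follows.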

f76fff860057dfab9d4df63b7ee183bb0a23e7df 172.19.65.136:6379@16379 slave 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 0 1650355208025 2 connected
5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379@16379 master - 0 1650355209991 2 connected 5461-10922
2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379@16379 master - 0 1650355210995 3 connected 10923-16383
31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379@16379 slave 2335076efd1d6f38eac1228d5b326380d92056f4 0 1650355213006 3 connected
0146973e61ffe3d9f63da5dfb9e565e02b1774b6 172.19.65.228:6379@16379 slave 8e172b28314aad39c31ace1229f7d1ae4cdb4973 0 1650355212000 1 connected
8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379@16379 myself,master - 0 0 1 connected 0-5460
vars currentEpoch 6 lastVoteEpoch 0
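Each node line in this file has the form: node ID, address, flags, master ID (or - for a master), followed by timing, epoch, link state, and slot fields. As a rough illustration, the awk sketch below turns those lines into a replica-to-master list. The function name replica_map is hypothetical, and the script assumes the field layout shown above.

```shell
# Hypothetical helper: read nodes.conf-style lines on stdin and print
# "replica -> master" pairs, sorted by replica address.
replica_map() {
  awk '
    { split($2, hostport, "@") }        # drop the @cluster-bus-port suffix
    $3 ~ /master/ { master_addr[$1] = hostport[1] }
    $3 ~ /slave/  { replica_addr[$1] = hostport[1]; follows[$1] = $4 }
    END {
      for (id in replica_addr)
        print replica_addr[id] " -> " master_addr[follows[id]]
    }
  ' | sort
}
```

It can be fed the file directly with replica_map < nodes.conf. The trailing vars line is ignored because its third field is neither master nor slave.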

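Trigger a failover

We can connect to redis-server with the client and run commands; the -c option enables cluster mode, so the client follows redirections between nodes. Run cluster nodes to view the current nodes of the cluster, which lists both the master and the replica (slave) nodes: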

~ ./redis-cli -c -h 172.19.65.196 -p 6379
> cluster nodes
f76fff860057dfab9d4df63b7ee183bb0a23e7df 172.19.65.136:6379@16379 slave 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 0 1650356492455 2 connected
5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379@16379 master - 0 1650356495473 2 connected 5461-10922
2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379@16379 master - 0 1650356491451 3 connected 10923-16383
31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379@16379 slave 2335076efd1d6f38eac1228d5b326380d92056f4 0 1650356494468 3 connected
0146973e61ffe3d9f63da5dfb9e565e02b1774b6 172.19.65.228:6379@16379 slave 8e172b28314aad39c31ace1229f7d1ae4cdb4973 0 1650356493462 1 connected
8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379@16379 myself,master - 0 0 1 connected 0-5460

Run ./redis-cli -h 172.19.72.112 -p 6379 debug segfault to crash the Redis process on node 112, then check the cluster with cluster nodes again:

~ ./redis-cli -h 172.19.65.196 cluster nodes | grep master
5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379@16379 master - 0 1650356951720 2 connected 5461-10922
2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379@16379 master,fail - 1650356740773 1650356736743 3 disconnected
31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379@16379 master - 0 1650356954752 7 connected 10923-16383
8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379@16379 myself,master - 0 0 1 connected 0-5460

Node 112 is now flagged as fail, and 203 has taken over its role as the new master, serving slots 10923-16383. The cluster is healthy again.
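Spotting the failed node by eye works for six nodes but gets tedious on larger clusters. A small filter over the cluster nodes output can list them instead. This is only a sketch: failed_nodes is a hypothetical name, and it matches the exact fail flag in the third field, so nodes that are merely suspected (flag fail?) are not reported.

```shell
# Hypothetical helper: read "cluster nodes" output on stdin and print the
# address of every node whose flags include "fail".
failed_nodes() {
  awk '{
    n = split($3, flags, ",")           # flags look like "master,fail"
    for (k = 1; k <= n; k++)
      if (flags[k] == "fail") {
        split($2, hostport, "@")        # drop the @cluster-bus-port suffix
        print hostport[1]
      }
  }'
}
```

For example, piping ./redis-cli -h 172.19.65.196 cluster nodes into failed_nodes on the cluster state above would list 172.19.72.112:6379.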

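Read and write data

For convenience, we can add daemonize yes to redis.conf so that Redis runs as a daemon. To change the configuration without stopping the whole cluster, stop one node at a time, edit its configuration, and start it again before moving on to the next node.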

Run ./redis-cli -c -h 172.19.65.196 -p 6379 to enter the Redis interactive command line:

> set counter 100
-> Redirected to slot [6680] located at 172.19.65.136:6379
OK
> incr counter
(integer) 101
> incr counter
(integer) 102
> incr counter
(integer) 103
> incr counter
(integer) 104
> incr counter
(integer) 105
> incr counter
(integer) 106
> incr counter
(integer) 107
> incr counter
(integer) 108
> incr counter
(integer) 109
> incr counter
(integer) 110
> RPUSH mylist 11
-> Redirected to slot [5282] located at 172.19.65.228:6379
(integer) 1
> RPUSH mylist 22
(integer) 2
> RPUSH mylist 33
(integer) 3
> LRANGE mylist 0 -1
1) "11"
2) "22"
3) "33"
> hmset user:1000 username antirez birthyear 1977 verified 1
OK
> hget user:1000 username
"antirez"
> hgetall user:1000
1) "username"
2) "antirez"
3) "birthyear"
4) "1977"
5) "verified"
6) "1"
> hget user:1000 birthyear
"1977"
> SADD myset 1 12 3 3 1 2 33 88 1 2 3
(integer) 6
> SMEMBERS myset
1) "1"
2) "2"
3) "3"
4) "12"
5) "33"
6) "88"
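The redirects in this session happen because every key belongs to one of the 16384 hash slots, computed as CRC16 of the key modulo 16384, and each slot is served by exactly one master. The sketch below computes that slot in POSIX shell. The function names crc16 and key_slot are hypothetical, and hash tags (the {...} part of a key) are deliberately not handled. On a live cluster, the built-in CLUSTER KEYSLOT command gives the authoritative answer.

```shell
# CRC16 (XMODEM variant: polynomial 0x1021, initial value 0), computed
# bit by bit over the bytes of the argument.
crc16() {
  crc=0
  # od prints each byte of the key as an unsigned decimal number
  for byte in $(printf '%s' "$1" | od -An -v -tu1); do
    crc=$(( crc ^ (byte << 8) ))
    bit=0
    while [ "$bit" -lt 8 ]; do
      if [ $(( crc & 0x8000 )) -ne 0 ]; then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
      bit=$(( bit + 1 ))
    done
  done
  echo "$crc"
}

# Hypothetical helper: map a key (without hash tags) to its hash slot.
key_slot() {
  echo $(( $(crc16 "$1") % 16384 ))
}
```

Running key_slot counter or key_slot mylist should give the slot numbers reported in the redirect messages above. A client started without -c does not follow such redirects and receives a MOVED error instead.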

References

Scaling with Redis Cluster
In-depth Redis: Redis Cluster (深入学习Redis之Redis Cluster)