Redis cluster tutorial - Preface
This document is a gentle introduction to Redis Cluster that avoids hard-to-understand
distributed systems concepts. It provides instructions about how to set up a cluster,
test it, and operate it, without going into the details that are covered in the
Redis Cluster specification, just describing how the system behaves from the point
of view of the user.
However, this tutorial tries to provide information about the availability
and consistency characteristics of Redis Cluster from the point of view
of the final user, stated in a simple-to-understand way.
Note that this tutorial requires Redis version 3.0 or higher.
If you plan to run a serious Redis Cluster deployment, the more formal
specification is suggested reading, even if not strictly required. However,
it is a good idea to start from this document, play with Redis Cluster for
some time, and only later read the specification.
Gist
Requires Redis 3.x or above; this document is a user-oriented overview.
Redis Cluster 101 - the basics
Redis Cluster provides a way to run a Redis installation where data is
automatically sharded across multiple Redis nodes.
Redis Cluster also provides some degree of availability during partitions;
in practical terms, this is the ability to continue operations when some
nodes fail or are unable to communicate. However, the cluster stops operating
in the event of larger failures (for example, when the majority of masters
are unavailable).
Translator's note: presumably because of the cluster's quorum mechanism.
So, in practical terms, what do you get with Redis Cluster?
The ability to automatically split your dataset among multiple nodes.
The ability to continue operations when a subset of the nodes is experiencing
failures or is unable to communicate with the rest of the cluster.
Gist
Redis Cluster distributes data across nodes automatically and is fault
tolerant to a certain extent.
Redis Cluster TCP ports
Every Redis Cluster node requires two open TCP connections: the normal Redis
TCP port used to serve clients, for example 6379, plus the port obtained by
adding 10000 to the data port, so 16379 in the example.
This second high port is used for the Cluster bus, a node-to-node
communication channel that uses a binary protocol. The Cluster bus is used by
nodes for failure detection, configuration updates, failover authorization,
and so forth. Clients should never try to communicate with the cluster bus
port, but always with the normal Redis command port; however, make sure you
open both ports in your firewall, otherwise Redis Cluster nodes will not be
able to communicate.
The offset between the command port and the cluster bus port is fixed and is
always 10000.
Note that for a Redis Cluster to work properly you need, for each node:
The normal client communication port (usually 6379), used to communicate with
clients, must be open to all the clients that need to reach the cluster, plus
all the other cluster nodes (which use the client port for key migrations).
The cluster bus port (the client port + 10000) must be reachable from all
the other cluster nodes.
If you don't open both TCP ports, your cluster will not work as expected.
The cluster bus uses a different, binary protocol for node-to-node data
exchange, which is more suited to exchanging information between nodes using
little bandwidth and processing time.
Gist
Two ports must be open on every node: the data port and the cluster bus port.
Redis Cluster data sharding
Redis Cluster does not use consistent hashing, but a different form of
sharding where every key is conceptually part of what we call a hash slot.
There are 16384 hash slots in Redis Cluster, and to compute the hash slot of
a given key, we simply take the CRC16 of the key modulo 16384.
Every node in a Redis Cluster is responsible for a subset of the hash
slots, so for example you may have a cluster with 3 nodes, where:
Node A contains hash slots from 0 to 5500.
Node B contains hash slots from 5501 to 11000.
Node C contains hash slots from 11001 to 16383.
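Continuing the earlier sketch, a client conceptually maps the computed slot to
the node serving it. The ranges below are just this document's example, not a
fixed assignment; real clients learn the actual slot map from the cluster itself:
    # Hypothetical slot-to-node routing for the 3-node example above.
    EXAMPLE_RANGES = [
        (0, 5500, "A"),
        (5501, 11000, "B"),
        (11001, 16383, "C"),
    ]

    def node_for_slot(slot: int) -> str:
        for low, high, node in EXAMPLE_RANGES:
            if low <= slot <= high:
                return node
        raise ValueError("slot out of range")

    print(node_for_slot(7000))  # -> "B"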
This allows adding and removing nodes in the cluster easily. For example, if
I want to add a new node D, I need to move some hash slots from nodes
A, B, C to D. Similarly, if I want to remove node A from the cluster, I can
just move the hash slots served by A to B and C. When node A is empty, I can
remove it from the cluster completely.
Because moving hash slots from one node to another does not require stopping
operations, adding and removing nodes, or changing the percentage of hash
slots held by nodes, does not require any downtime.
Redis Cluster supports multiple key operations as long as all the keys
involved in a single command execution (or whole transaction, or Lua
script execution) belong to the same hash slot. The user can force
multiple keys to be part of the same hash slot by using a concept called
hash tags.
Hash tags are documented in the Redis Cluster specification, but the gist
is that if there is a substring between {} brackets in a key, only what is
inside the brackets is hashed. So, for example, this{foo}key and another{foo}key
are guaranteed to be in the same hash slot, and can be used together
in a command with multiple keys as arguments.
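A rough sketch of the hash-tag rule (the precise definition is in the Redis
Cluster specification): only the first non-empty {...} substring of the key,
if any, is fed to the CRC16 computation shown earlier.
    # Sketch: extract the part of a key that is actually hashed.
    # If there is a non-empty substring between the first '{' and the first
    # '}' that follows it, only that substring is hashed; otherwise the
    # whole key is hashed.
    def hash_tag(key: str) -> str:
        start = key.find("{")
        if start != -1:
            end = key.find("}", start + 1)
            if end > start + 1:  # tag must be non-empty
                return key[start + 1:end]
        return key

    # Both keys hash on "foo", so they are guaranteed to be in the same slot.
    assert hash_tag("this{foo}key") == hash_tag("another{foo}key") == "foo"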
Gist
Sharding is based on hash slots, and each node in the cluster is responsible
for a subset of the slots. Adding or removing nodes only requires moving slots
between nodes. Keys that share a {} hash tag land in the same slot, which
makes multi-key operations convenient.
Redis Cluster master-slave model
In order to remain available when a subset of master nodes are failing or
are not able to communicate with the majority of nodes, Redis Cluster
uses a master-slave model where every hash slot has from 1 (the master
itself) to N replicas (N-1 additional slave nodes).
In our example cluster with nodes A, B, C, if node B fails the cluster
is not able to continue, since we no longer have a way to serve hash
slots in the range 5501-11000.
However, if when the cluster is created (or at a later time) we add a
slave node to every master, so that the final cluster is composed of the
master nodes A, B, C and the slave nodes A1, B1, C1, the system is able to
continue if node B fails.
Node B1 replicates B, so if B fails, the cluster will promote node B1 as
the new master and will continue to operate correctly.
However, note that if nodes B and B1 fail at the same time, Redis Cluster
is not able to continue to operate.
Redis Cluster consistency guarantees
Redis Cluster is not able to guarantee strong consistency. In practical
terms, this means that under certain conditions it is possible for
Redis Cluster to lose writes that were acknowledged by the system
to the client.
The first reason why Redis Cluster can lose writes is that it uses
asynchronous replication. This means that during writes the following happens:
Your client writes to the master B.
The master B replies OK to your client.
The master B propagates the write to its slaves B1, B2 and B3.
As you can see, B does not wait for an acknowledgement from B1, B2, B3 before
replying to the client, since this would be a prohibitive latency penalty for
Redis. So if your client writes something, B acknowledges the write but crashes
before being able to send the write to its slaves, one of the slaves (which did
not receive the write) can be promoted to master, losing the write forever.
This is very similar to what happens with most databases that are
configured to flush data to disk every second, so it is a scenario
you are already able to reason about because of past experience with
traditional database systems not involving distributed systems.
Similarly, you can improve consistency by forcing the database to flush
data to disk before replying to the client, but this usually results in
prohibitively low performance. In the case of Redis Cluster, that would be
the equivalent of synchronous replication.
Basically, there is a trade-off between performance and consistency.
Redis Cluster has support for synchronous writes when absolutely needed,
implemented via the WAIT command. This makes losing writes a lot less
likely; however, note that Redis Cluster does not implement strong
consistency even when synchronous replication is used: it is always
possible, under more complex failure scenarios, that a slave that was not
able to receive the write is elected as master.
There is another notable scenario where Redis Cluster will lose writes,
which happens during a network partition where a client is isolated with
a minority of instances including at least a master.
Take as an example our 6-node cluster composed of A, B, C, A1, B1, C1,
with 3 masters and 3 slaves. There is also a client, which we will call Z1.
After a partition occurs, it is possible that on one side of the
partition we have A, C, A1, B1, C1, and on the other side we have B and Z1.
Z1 is still able to write to B, which will accept its writes. If the
partition heals in a very short time, the cluster will continue normally.
However, if the partition lasts enough time for B1 to be promoted to
master on the majority side of the partition, the writes that Z1 has been
sending to B will be lost.
Note that there is a maximum window for the amount of writes Z1 will
be able to send to B: if enough time has elapsed for the majority
side of the partition to elect a slave as master, every master node
on the minority side stops accepting writes.
Translator's note: in other words, when the cluster's network splits into two
parts, there is a detection window during which masters on the minority side
still accept writes and the majority side has not yet elected a new master;
once that window expires, the majority side starts an election and masters on
the minority side stop accepting writes.
This amount of time is a very important configuration directive of
Redis Cluster, and is called the node timeout.
After the node timeout has elapsed, a master node is considered to be
failing, and can be replaced by one of its replicas. Similarly, after the
node timeout has elapsed without a master node being able to sense the
majority of the other master nodes, it enters an error state and stops
accepting writes.
Redis Cluster configuration parameters
We are about to create an example cluster deployment. Before continuing,
let's introduce the configuration parameters that Redis Cluster introduces
in the redis.conf file. Some will be obvious, others will become clearer as
you continue reading.
cluster-enabled <yes/no>: If yes, enables Redis Cluster support in a specific
Redis instance. Otherwise the instance starts as a standalone instance as usual.
cluster-config-file <filename>: Note that despite the name of this option, this
is not a user-editable configuration file, but the file where a Redis Cluster
node automatically persists the cluster configuration (the state, basically)
every time there is a change, in order to be able to re-read it at startup.
The file lists things like the other nodes in the cluster, their state, persistent
variables, and so forth. Often this file is rewritten and flushed on disk as a
result of some message reception.
cluster-node-timeout <milliseconds>: The maximum amount of time a Redis Cluster
node can be unavailable, without it being considered as failing. If a master
node is not reachable for more than the specified amount of time, it will be
failed over by its slaves. This parameter controls other important things in
Redis Cluster. Notably, every node that can't reach the majority of master nodes
for the specified amount of time will stop accepting queries.
cluster-slave-validity-factor <factor>: If set to zero, a slave will always try
to failover a master, regardless of the amount of time the link between the
master and the slave remained disconnected. If the value is positive, a maximum
disconnection time is calculated as the node timeout value multiplied by the factor
provided with this option, and if the node is a slave, it will not try to start a
failover if the master link was disconnected for more than the specified amount of
time. For example, if the node timeout is set to 5 seconds and the validity factor
is set to 10, a slave disconnected from the master for more than 50 seconds will
not try to failover its master. Note that any value different than zero may result
in Redis Cluster being unavailable after a master failure if there is no slave able
to failover it. In that case the cluster will become available again only when the
original master rejoins the cluster.
cluster-migration-barrier <count>: Minimum number of slaves a master will remain
connected with, for another slave to migrate to a master which is no longer covered
by any slave. See the appropriate section about replica migration in this tutorial
for more information.
cluster-require-full-coverage <yes/no>: If this is set to yes, as it is by default,
the cluster stops accepting writes if some percentage of the key space is not
covered by any node. If the option is set to no, the cluster will still serve
queries even if only requests about a subset of keys can be processed.
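As a rough illustration (not a complete configuration), the cluster-related part of
a node's redis.conf built only from the directives above might look like the sketch
below; the file name and timeout value are just example choices:
    # Example values only; cluster-config-file is created and rewritten
    # by the node itself, it just needs a per-node file name.
    port 6379
    cluster-enabled yes
    cluster-config-file nodes.conf
    cluster-node-timeout 5000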