Cluster Installation
1. Prepare the package repository and pick a release to install (on all nodes). Note: Ceph-CSI requires Nautilus (N) or later; see the Aliyun mirror https://mirrors.aliyun.com/ceph/ for available releases.
cat >/etc/yum.repos.d/ceph.repo<<EOF
[ceph]
name=ceph
baseurl=https://mirrors.aliyun.com/ceph/rpm-octopus/el7/x86_64/
gpgcheck=0
[ceph-noarch]
name=cephnoarch
baseurl=https://mirrors.aliyun.com/ceph/rpm-octopus/el7/noarch/
gpgcheck=0
EOF
# Available releases
https://mirrors.aliyun.com/ceph/rpm-luminous/el7/ # Luminous (L)
https://mirrors.aliyun.com/ceph/rpm-mimic/el7/ # Mimic (M)
https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/ # Nautilus (N)
https://mirrors.aliyun.com/ceph/rpm-octopus/el7/ # Octopus (O)
2. Configure hostnames, name resolution, and passwordless SSH login (details omitted)
hostnamectl set-hostname node01
hostnamectl set-hostname node02
hostnamectl set-hostname node03
cat >>/etc/hosts<<EOF
192.168.20.128 node01
192.168.20.129 node02
192.168.20.130 node03
EOF
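Step 2 leaves the passwordless-login setup out; a minimal sketch follows. The gen_ssh_setup helper is hypothetical and only prints the commands (a dry run) so they can be reviewed before running; node names match the example cluster.

```shell
# Hypothetical dry-run helper: prints the passwordless-SSH setup commands
# instead of executing them, so they can be reviewed first.
gen_ssh_setup() {
  # Generate a key once on the admin node, then copy it to every node.
  echo "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
  for node in node01 node02 node03; do
    echo "ssh-copy-id root@$node"
  done
}
gen_ssh_setup
```

Once verified, run the printed commands by hand (or pipe the output to sh).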
3. Install the deployment tool on the admin node (the following steps run on the admin node only)
# Install the 2.0.1 build of ceph-deploy from the Aliyun mirror; the 1.5.25 build in the EPEL repo does not support recent Ceph releases well
rpm -ivh https://mirrors.aliyun.com/ceph/rpm-15.2.10/el7/noarch/ceph-deploy-2.0.1-0.noarch.rpm
# Install dependencies
yum install python-setuptools python2-subprocess32 ceph-common -y
4. Create a new cluster, designating the Mon nodes; this writes ceph.conf and ceph.mon.keyring to the current directory.
--cluster-network carries internal cluster traffic; in production it should be a separate private subnet
ceph-deploy new node01 node02 node03 --cluster-network 192.168.20.0/24 --public-network 192.168.20.0/24
5. Install the Ceph packages on the cluster nodes
ceph-deploy install --no-adjust-repos node01 node02 node03
# The command above is equivalent to running yum -y install ceph ceph-radosgw on each node
6. Create and initialize the Mon nodes (the Mon daemon listens on port 6789)
ceph-deploy mon create-initial
7. Push the config file and admin keyring to the other hosts
ceph-deploy admin node01 node02 node03
# Push the config file only
ceph-deploy config push node01 node02 node03
8. Deploy Mgr on the target hosts
ceph-deploy mgr create node01 node02
9. Check the cluster status
ceph -s
10. Handling cluster warnings
# mon is allowing insecure global_id reclaim
ceph config set mon auth_allow_insecure_global_id_reclaim false
# Module 'restful' has failed dependency: No module named 'pecan'
pip3 install pecan werkzeug
systemctl restart ceph-mon.target
systemctl restart ceph-mgr.target
11. List all available disks on the cluster nodes
ceph-deploy disk list node01 node02 node03
12. Zap the disks on the cluster nodes that will be used as OSD devices
ceph-deploy disk zap node01 /dev/sdb /dev/sdc /dev/sdd
ceph-deploy disk zap node02 /dev/sdb /dev/sdc /dev/sdd
ceph-deploy disk zap node03 /dev/sdb /dev/sdc /dev/sdd
13. Create OSDs on the cluster nodes
In production, dedicated block-db and block-wal devices can be specified to improve performance
ceph-deploy osd create node01 --bluestore --data /dev/sdb
ceph-deploy osd create node02 --bluestore --data /dev/sdb
ceph-deploy osd create node03 --bluestore --data /dev/sdb
ceph-deploy osd create node01 --bluestore --data /dev/sdc
ceph-deploy osd create node02 --bluestore --data /dev/sdc
ceph-deploy osd create node03 --bluestore --data /dev/sdc
ceph-deploy osd create node01 --bluestore --data /dev/sdd
ceph-deploy osd create node02 --bluestore --data /dev/sdd
ceph-deploy osd create node03 --bluestore --data /dev/sdd
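The nine ceph-deploy invocations above follow a regular pattern; the hypothetical gen_osd_create helper below prints them as a dry run, in the same disk-first order:

```shell
# Hypothetical dry-run helper: prints the nine "ceph-deploy osd create"
# commands (disk-first order, matching the list above) without executing them.
gen_osd_create() {
  for disk in /dev/sdb /dev/sdc /dev/sdd; do
    for node in node01 node02 node03; do
      echo "ceph-deploy osd create $node --bluestore --data $disk"
    done
  done
}
gen_osd_create
```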
14. Create and inspect a storage pool
ceph osd pool create mypool 128 128 replicated
ceph osd pool ls
ceph osd pool stats mypool
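The pg_num/pgp_num values passed to ceph osd pool create are commonly sized with the rule of thumb (OSD count x 100) / replica count, rounded up to a power of two. A sketch of that calculation (pg_count is a hypothetical helper, not a ceph command):

```shell
# Rule-of-thumb PG sizing: (OSDs * 100) / replicas, rounded up to the next
# power of two. Hypothetical helper for illustration only.
pg_count() {
  local osds=$1 replicas=$2
  local target=$(( osds * 100 / replicas ))
  local pg=1
  while [ "$pg" -lt "$target" ]; do
    pg=$(( pg * 2 ))
  done
  echo "$pg"
}
pg_count 9 3   # 9 OSDs, 3 replicas -> 512
pg_count 3 3   # 3 OSDs, 3 replicas -> 128
```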
15. Pool application-type operations
osd pool application enable <pool> <app> [--yes-i-really-mean-it]
osd pool application disable <pool> <app> [--yes-i-really-mean-it]
osd pool application set <pool> <app> <key> <value>
osd pool application rm <pool> <app> <key>
osd pool application get [<pool>] [<app>] [<key>]
ceph osd pool application enable mypool rbd
16. Get and set a pool's replica count, minimum replica count, pg_num, pgp_num, etc.
osd pool get <poolname> size|min_size|pg_num|pgp_num
ceph osd pool get mypool size
ceph osd pool set mypool min_size 1
ceph osd pool get mypool all
17. rados pool and object operations
rados mkpool mypool 128 128
rados lspools
rados -p mypool put <obj-name> <infile> [--offset offset]
rados -p mypool get <obj-name> <outfile>
rados -p mypool rm <obj-name>
rados ls -p mypool
18. Show how a given object maps into the Ceph cluster
ceph osd map mypool <obj-name>
Cluster Maintenance
1. Add Mon nodes
ceph-deploy mon add node04
ceph-deploy mon add node05
2. Add a Mgr node
ceph-deploy mgr create node03
3. Check the Mon quorum status
ceph mon stat
ceph quorum_status --format json-pretty
4. When creating an OSD, store its three kinds of data on separate devices: object data blobs, SST files (RocksDB), and WAL files
ceph-deploy osd create {node} --data /path/to/data --block-db /path/to/db-device --block-wal /path/to/wal-device
5. Delete a pool
ceph daemon mon.node01 config set mon_allow_pool_delete true
ceph daemon mon.node02 config set mon_allow_pool_delete true
ceph daemon mon.node03 config set mon_allow_pool_delete true
ceph osd pool rm <poolname> {<poolname>} {<sure>}
rados rmpool <pool-name> [<pool-name> --yes-i-really-really-mean-it]
6. Stop and remove an OSD
ceph osd out <ids>...
systemctl stop ceph-osd@<id>.service
ceph osd crush reweight <id> <weight:float>
ceph osd purge <id|osd.id> [--force] [--yes-i-really-mean-it]
# Removing an OSD before Luminous (L)
ceph osd crush reweight osd.<id> 0
systemctl stop ceph-osd@<id>.service
ceph osd out <id>
ceph osd crush remove osd.<id>
ceph osd rm <id>
ceph auth del osd.<id>
7. Device class operations
# Recommended common device classes include hdd, ssd, nvme, scm, and any
ceph osd crush class create <class>
ceph osd crush class ls
ceph osd crush class ls-osd <class>
ceph osd crush class rename <srcname> <dstname>
ceph osd crush class rm <class>
# Query and set the device class
ceph osd crush get-device-class osd.<id>
ceph osd crush set-device-class ssd osd.<id>
ceph osd crush rm-device-class osd.<id>
CRUSH Management
1. CRUSH map hierarchy (example)
ceph osd crush add-bucket datacenter0 datacenter
ceph osd crush add-bucket room0 room
ceph osd crush add-bucket rack0 rack
ceph osd crush add-bucket rack1 rack
ceph osd crush add-bucket rack2 rack
ceph osd crush move room0 datacenter=datacenter0
ceph osd crush move rack0 room=room0
ceph osd crush move rack1 room=room0
ceph osd crush move rack2 room=room0
ceph osd crush link node01 rack=rack0
ceph osd crush link node02 rack=rack1
ceph osd crush link node03 rack=rack2
Example tree structure (ceph osd tree output)
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0.05867 datacenter datacenter0
-10 0.05867 room room0
-11 0.02928 rack rack0
-3 0.02928 host node01
0 hdd 0.00490 osd.0 up 1.00000 1.00000
3 hdd 0.00490 osd.3 up 1.00000 1.00000
6 hdd 0.01949 osd.6 up 1.00000 1.00000
-12 0.01469 rack rack1
-5 0.01469 host node02
1 hdd 0.00490 osd.1 up 1.00000 1.00000
4 hdd 0.00490 osd.4 up 1.00000 1.00000
7 hdd 0.00490 osd.7 up 1.00000 1.00000
-13 0.01469 rack rack2
-7 0.01469 host node03
2 hdd 0.00490 osd.2 up 1.00000 1.00000
5 hdd 0.00490 osd.5 up 1.00000 1.00000
8 hdd 0.00490 osd.8 up 1.00000 1.00000
-1 0.05867 root default
-3 0.02928 host node01
0 hdd 0.00490 osd.0 up 1.00000 1.00000
3 hdd 0.00490 osd.3 up 1.00000 1.00000
6 hdd 0.01949 osd.6 up 1.00000 1.00000
-5 0.01469 host node02
1 hdd 0.00490 osd.1 up 1.00000 1.00000
4 hdd 0.00490 osd.4 up 1.00000 1.00000
7 hdd 0.00490 osd.7 up 1.00000 1.00000
-7 0.01469 host node03
2 hdd 0.00490 osd.2 up 1.00000 1.00000
5 hdd 0.00490 osd.5 up 1.00000 1.00000
8 hdd 0.00490 osd.8 up 1.00000 1.00000
2. CRUSH rules
List rules
ceph osd crush rule ls
Dump a rule
ceph osd crush rule dump {name}
Add a simple rule
# ceph osd crush rule create-simple {rulename} {root} {bucket-type} {firstn|indep}
ceph osd crush rule create-simple deleteme default host firstn
Add a replicated rule
# ceph osd crush rule create-replicated <name> <root> <failure-domain> <class>
ceph osd crush rule create-replicated fast default host ssd
Add an erasure-code rule
ceph osd crush rule create-erasure {rulename} {profilename}
Delete a rule
ceph osd crush rule rm {name}
3. CRUSH storage policy (example)
Set device classes
# ceph osd crush set-device-class <class> <osdId> [<osdId>]
ceph osd crush rm-device-class osd.8 osd.7 osd.6
ceph osd crush set-device-class ssd osd.8 osd.7 osd.6
Create rules
# ceph osd crush rule create-replicated <rule-name> <root> <failure-domain-type> <device-class>
ceph osd crush rule create-replicated slow default host hdd
ceph osd crush rule create-replicated fast default host ssd
Assign rules to pools
# ceph osd pool set <poolname> crush_rule <rule-name>
ceph osd pool set mypool1 crush_rule slow
ceph osd pool set mypool2 crush_rule fast
CRUSH map excerpt
...
rule slow {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}
rule fast {
    id 2
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}
...
4. Editing the CRUSH map
Get the CRUSH map
ceph osd getcrushmap -o {compiled-crushmap-filename}
Decompile the CRUSH map
crushtool -d {compiled-crushmap-filename} -o {decompiled-crushmap-filename}
Compile the CRUSH map
crushtool -c {decompiled-crush-map-filename} -o {compiled-crush-map-filename}
Set the CRUSH map
ceph osd setcrushmap -i {compiled-crushmap-filename}
5. Rule parameter reference
rule <rulename> {
    ruleset <ruleset>
    type [replicated|erasure]
    min_size <min-size>
    max_size <max-size>
    step take <bucket-name>
    step [choose|chooseleaf] [firstn|indep] <num> type <bucket-type>
    step emit
}
- type: the rule type; currently only replicated and erasure are supported, with replicated as the default.
- min_size: the minimum replica count a pool may have and still select this rule.
- max_size: the maximum replica count a pool may have and still select this rule.
- step take <bucket-name>: take the named bucket as the starting point and iterate down the tree.
- step choose firstn {num} type {bucket-type}: select a number of buckets of the given type; this is usually the pool's replica count. If {num} == 0, select pool-num-replicas buckets; if 0 < {num} < pool-num-replicas, select {num}; if {num} < 0, select pool-num-replicas - |{num}|.
- step chooseleaf firstn {num} type {bucket-type}: select a set of buckets of the given type and pick one leaf node (OSD) from each bucket in the set; the number of buckets is the pool's replica count, with the same {num} semantics as above.
- step emit: output the current selections and clear the stack. Usually placed at the end of a rule, but also used when the same rule draws from different trees.
- choose: stops once a bucket of the requested type has been selected and moves on to the next step.
- chooseleaf: after selecting the requested bucket, keeps recursing until it reaches an OSD.
- firstn and indep: both are depth-first traversals. The difference shows when the requested count cannot be filled: with num = 4 and only three usable results, firstn returns [1,2,4] while indep returns [1,2,CRUSH_ITEM_NONE,4]. firstn is used in most cases.
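The {num} semantics for choose/chooseleaf described above can be sketched as a small calculation (chosen_buckets is a hypothetical helper for illustration, not a ceph command):

```shell
# How {num} in "step choose/chooseleaf firstn {num}" maps to the number of
# buckets selected, given the pool's replica count.
chosen_buckets() {
  local num=$1 replicas=$2
  if [ "$num" -eq 0 ]; then
    echo "$replicas"             # num == 0: select pool-num-replicas buckets
  elif [ "$num" -gt 0 ]; then
    echo "$num"                  # 0 < num: select num buckets
  else
    echo $(( replicas + num ))   # num < 0: select pool-num-replicas - |num|
  fi
}
chosen_buckets 0 3    # -> 3
chosen_buckets 2 3    # -> 2
chosen_buckets -1 3   # -> 2
```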