I. Ceph-Based Storage Solutions (Part 1)
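1. Deploying a Ceph Storage Cluster on Kubernetes with Rook
Rook (https://rook.io) is a self-managing, distributed storage orchestration system that provides convenient storage solutions for K8S.
Rook does not provide storage itself. Instead, it acts as an adaptation layer between Kubernetes and the storage system, simplifying the deployment and maintenance of that storage system.
Why use Rook?
- Ceph officially recommends Rook for deploying and managing Ceph on Kubernetes.
- It interacts with the data store through native Kubernetes mechanisms.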
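2. Introduction to Ceph
Ceph is a unified, distributed storage system designed for excellent performance, reliability, and scalability.
Ceph supports three types of storage:
- Block storage (RBD): can be mounted directly as a disk.
- File system (CephFS): a POSIX-compliant network file system focused on high performance and large-capacity storage.
- Object storage (RADOSGW): provides a RESTful interface as well as bindings for several programming languages. It is compatible with S3 and Swift (OpenStack's object storage).
2.1 Core components
Ceph has three main core components:
- OSD: stores all of the cluster's data and objects. It handles the cluster's data workload, recovery, backfilling, and rebalancing, sends heartbeats to other OSD daemons, and reports monitoring information to the Monitors.
- Monitor: monitors the state of the whole cluster, manages client authentication and authorization, and keeps the cluster data consistent.
- MDS: stores the file system's metadata and manages the directory structure. Object storage and block devices do not need the metadata service.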
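3. Installing the Ceph Cluster
Rook supports K8S v1.19+ on amd64 (x86_64) or arm64 CPUs.
Installing a Ceph cluster with Rook requires the following prerequisites:
- At least 3 nodes, all able to schedule Pods, to meet Ceph's replica high-availability requirements.
- An existing K8S cluster.
- Each OSD node has at least one raw device (not partitioned and not formatted with a file system).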
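You can check the node-count prerequisite from the master node. The sketch below is a hypothetical helper and not part of Rook.

It reads `kubectl get nodes --no-headers` output on stdin and counts nodes whose STATUS is exactly `Ready`. This excludes `NotReady` nodes and cordoned nodes (`Ready,SchedulingDisabled`). It assumes the default column layout. It does not inspect taints, so a tainted control-plane node would still be counted.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of Rook.

# count_ready_nodes: count lines of `kubectl get nodes --no-headers` output
# (read from stdin) whose STATUS column is exactly "Ready".
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# check_min_nodes [MIN]: succeed if at least MIN (default 3) nodes are Ready
# and not cordoned; taints are NOT checked.
check_min_nodes() {
  local min="${1:-3}" n
  n=$(count_ready_nodes)
  if (( n >= min )); then
    echo "OK: ${n} schedulable Ready node(s)"
  else
    echo "FAIL: only ${n} schedulable Ready node(s), need at least ${min}" >&2
    return 1
  fi
}
```

Usage: `kubectl get nodes --no-headers | check_min_nodes 3`.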
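3.1 Deploying Ceph on K8S
Add an extra unformatted disk (raw device) to each worker node of the K8S cluster; the figure below shows the steps.
Set the new disk to independent mode (simulating a standalone cloud disk from a public cloud provider) and start the k8s cluster. Then run the following command on each node to check whether the disk meets Ceph's deployment requirements: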
# lsblk -f
[root@node-1-231 ~]# lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
├─sda1 xfs ea9cfeb1-5d17-4cfd-8023-1d2db4e5ec4d /boot
└─sda2 LVM2_member eWjchz-pMad-Fhb2-kyJr-BMwp-aTpd-Mkkzkk
  ├─centos-root xfs fec91ee5-2ef1-49f3-8c13-65fc7b58518a /
  └─centos-swap swap eac66189-d0c0-41dd-9262-9e46e0572202
sdb
sr0 iso9660 CentOS 7 x86_64 2020-11-03-14-55-29-00
[root@node-1-232 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 50G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 49G 0 part
  ├─centos-root 253:0 0 47G 0 lvm /
  └─centos-swap 253:1 0 2G 0 lvm
sdb 8:16 0 50G 0 disk
sr0 11:0 1 973M 0 rom
[root@node-1-232 ~]# lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
├─sda1 xfs 5e7ffc85-45bc-4f5a-bb18-defc8064f0a3 /boot
└─sda2 LVM2_member SozvYr-Il0h-TGMA-SZIV-riiA-ZLFo-cH0zcP
  ├─centos-root xfs 974b0d0e-2e7e-4483-bcad-a55ed5f71fc6 /
  └─centos-swap swap 6cb8ab0e-63c6-4186-9bc9-07bcd3cae8bb
sdb
sr0 iso9660 CentOS 7 x86_64 2020-11-03-14-55-29-00
[root@node-1-233 ~]# lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
├─sda1 xfs fcabb174-6e36-4990-b6fe-425a2d67c6ad /boot
└─sda2 LVM2_member tPeLVy-DViM-Ww2D-5pac-iVM7-KBLB-jC7ZfP
  ├─centos-root xfs 90a807fd-3c20-47f6-b438-5e340c150195 /
  └─centos-swap swap e148f1f3-0cfb-41c1-9d12-2a40824c99ca
sdb
sr0 iso9660 CentOS 7 x86_64 2020-11-03-14-55-29-00
In the lsblk -f output, sdb is the raw device just added to each worker node (its FSTYPE is empty), so it can be allocated to Ceph.
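Checking the `lsblk -f` trees by eye works for three nodes, but a scripted check is less error-prone. The function below is a hypothetical helper. It assumes a util-linux `lsblk` that supports pairs output (`-P`) and the `PKNAME` (parent kernel name) column.

It prints whole disks that have no filesystem signature and no child devices (partitions or LVM volumes). `sda` also has an empty FSTYPE, but it is excluded because it has partitions. This is only a rough pre-check; Rook applies its own device filtering when it prepares OSDs.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Rook or Ceph.
# find_raw_disks: read `lsblk -P -o NAME,TYPE,FSTYPE,PKNAME` output from stdin
# and print every disk that has an empty FSTYPE and no child devices.
find_raw_disks() {
  awk '
    {
      name = ""; type = ""; fstype = ""; pk = ""
      rest = $0
      # Walk the KEY="value" pairs emitted by lsblk -P
      while (match(rest, /[A-Z]+="[^"]*"/)) {
        kv = substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
        eq = index(kv, "=")
        key = substr(kv, 1, eq - 1)
        val = substr(kv, eq + 2, length(kv) - eq - 2)   # strip the quotes
        if (key == "NAME") name = val
        else if (key == "TYPE") type = val
        else if (key == "FSTYPE") fstype = val
        else if (key == "PKNAME") pk = val
      }
      if (pk != "") has_child[pk] = 1        # this device has a parent
      if (type == "disk" && fstype == "") cand[++n] = name
    }
    END {
      for (i = 1; i <= n; i++) {
        d = cand[i]
        if (!(d in has_child)) print d
      }
    }
  '
}
```

Usage on a node: `lsblk -P -o NAME,TYPE,FSTYPE,PKNAME | find_raw_disks`. On the nodes above this should print `sdb`.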
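Modify the Rook CSI driver registration path
Note: the Rook CSI driver mounts volumes under the directory given by the kubelet's --root-dir parameter. Set Rook CSI's kubelet path to match the actual --root-dir value. If the two differ, mounting storage will fail with errors.
The default path is /var/lib/kubelet/ and rarely needs changing. Modify it only if your kubelet uses a non-default path:
vim rook/deploy/examples/operator.yaml
# Uncomment and set to the kubelet's actual --root-dir
# ROOK_CSI_KUBELET_DIR_PATH: "/var/lib/kubelet"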
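To find the value `ROOK_CSI_KUBELET_DIR_PATH` needs, you can read `--root-dir` from the running kubelet's command line. The helper below is a hypothetical sketch.

It accepts both the `--root-dir=/path` and `--root-dir /path` forms, and falls back to the default `/var/lib/kubelet` when the flag is absent. It only inspects the command line string it is given.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Rook.
# kubelet_root_dir "CMDLINE": print the --root-dir value from a kubelet command
# line, or the default /var/lib/kubelet if the flag is not present.
kubelet_root_dir() {
  local -a args
  local i
  read -ra args <<< "$1"          # split on whitespace without globbing
  for (( i = 0; i < ${#args[@]}; i++ )); do
    case "${args[i]}" in
      --root-dir=*)
        printf '%s\n' "${args[i]#--root-dir=}"
        return 0
        ;;
      --root-dir)
        if (( i + 1 < ${#args[@]} )); then
          printf '%s\n' "${args[i+1]}"
          return 0
        fi
        ;;
    esac
  done
  printf '%s\n' "/var/lib/kubelet"
}
```

Usage on a node: `kubelet_root_dir "$(ps -o args= -C kubelet)"`.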
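Enable the Rook admission controller in the k8s cluster. An admission controller intercepts requests to the K8S API Server after authentication and authorization, but before the objects are persisted, and validates them. Rook's admission controller depends on cert-manager, so install cert-manager in the K8S cluster with the following command before installing Rook:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml
Alternatively, download the manifest first and apply the local copy:
wget https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml
[root@master-1-230 2.3]# kubectl apply -f cert-manager.yaml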
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
configmap/cert-manager created
configmap/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-cluster-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
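The output above only shows that the objects were created; the cert-manager deployments still have to become ready. The sketch below is a hypothetical helper that checks readiness.

It reads `kubectl get deploy --no-headers` output from stdin, assuming the default NAME READY UP-TO-DATE AVAILABLE AGE columns. It succeeds only when every deployment reports all of its replicas ready.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of cert-manager or Rook.
# all_deployments_ready: read `kubectl get deploy --no-headers` output from stdin
# and succeed only if every READY column is "n/n" with n > 0.
all_deployments_ready() {
  awk '
    {
      total++
      split($2, r, "/")                 # "1/1" -> r[1]=1, r[2]=1
      if (r[2] + 0 == 0 || r[1] + 0 != r[2] + 0) notready++
    }
    END { if (total > 0 && notready == 0) exit 0; exit 1 }
  '
}
```

Usage: `kubectl -n cert-manager get deploy --no-headers | all_deployments_ready && echo "cert-manager is ready"`. kubectl can also block until the deployments are available: `kubectl -n cert-manager wait --for=condition=Available deployment --all --timeout=300s`.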
Check whether LVM2 is installed on the nodes that will run Ceph, and install it if it is missing:
# yum install lvm2 -y
# yum list installed|grep lvm2
lvm2.x86_64 7:2.02.187-6.el7_9.5 @updates
lvm2-libs.x86_64 7:2.02.187-6.el7_9.5 @updates
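To script this check across nodes, the hypothetical helper below reads `yum list installed` output from stdin. It succeeds only if a package with exactly the given name is listed, so `lvm2-libs` alone does not count as `lvm2`. On RPM-based systems, `rpm -q lvm2` is a simpler equivalent.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of yum.
# yum_pkg_installed NAME: read `yum list installed` output from stdin and succeed
# if the first column, with its ".arch" suffix removed, equals NAME exactly.
yum_pkg_installed() {
  awk -v want="$1" '
    {
      pkg = $1
      sub(/\.[^.]*$/, "", pkg)          # "lvm2.x86_64" -> "lvm2"
      if (pkg == want) found = 1
    }
    END { if (found) exit 0; exit 1 }
  '
}
```

Usage: `yum list installed | yum_pkg_installed lvm2 || yum install -y lvm2`.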
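Ceph requires a Linux kernel with the RBD module. Run lsmod | grep rbd on each K8S node to check. If the command returns nothing, the RBD module is not loaded on that system.
# Add the RBD module load command to the boot-time module scripts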
cat > /etc/sysconfig/modules/rbd.modules << EOF
#!/bin/bash
modprobe rbd
EOF
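# Make the script executable
chmod +x /etc/sysconfig/modules/rbd.modules
# Load the module now as well (the script above only runs at boot)
modprobe rbd
# Check that the RBD module loaded successfully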
# lsmod |grep rbd
rbd 83733 0
libceph 306750 1 rbd
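Note that `lsmod | grep rbd` also prints the `libceph` line, because its "Used by" column mentions `rbd`. A stricter check matches the module name against the first column only.

The hypothetical helper below reads `lsmod` output from stdin and does exactly that.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of kmod.
# module_loaded NAME: read `lsmod` output from stdin and succeed if a line's
# first column (the module name) equals NAME exactly.
module_loaded() {
  awk -v want="$1" '$1 == want { found = 1 } END { if (found) exit 0; exit 1 }'
}
```

Usage: `lsmod | module_loaded rbd || modprobe rbd`.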
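II. Ceph-Based Storage Solutions (Part 2)
3.2 Deploying a Ceph Storage Cluster in K8S with Rook
Deploy a Ceph cluster using the example component manifests provided by the Rook project.
Cloning from GitHub (https://github.com/rook/rook.git) with git is too slow, so a Gitee mirror is used instead.
Use git to download the example manifests locally (from the Gitee repository):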
[root@master-1-230 2.3]# git clone --single-branch --branch v1.12.8 https://gitee.com/ikubernetesi/rook.git
Cloning into 'rook'...
remote: Enumerating objects: 92664, done.
remote: Counting objects: 100% (92664/92664), done.
remote: Compressing objects: 100% (26041/26041), done.
remote: Total 92664 (delta 65003), reused 92443 (delta 64848), pack-reused 0
Receiving objects: 100% (92664/92664), 49.01 MiB | 7.46 MiB/s, done.
Resolving deltas: 100% (65003/65003), done.
Note: checking out 'aa3eab85caba76b3a1b854aadb0b7e3faa8a43cb'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b new_branch_name
# Change into the local example manifest directory
cd rook/deploy/examples
# Create the Rook and Ceph CRDs and common resources in the K8S cluster
# kubectl apply -f crds.yaml -f common.yaml
[root@master-1-230 examples]# kubectl apply -f crds.yaml -f common.yaml
customresourcedefinition.apiextensions.k8s.io/cephblockpoolradosnamespaces.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephblockpools.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephbucketnotifications.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephbuckettopics.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephclients.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephclusters.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephcosidrivers.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystemmirrors.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystems.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystemsubvolumegroups.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephnfses.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectrealms.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstores.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstoreusers.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectzonegroups.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectzones.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephrbdmirrors.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/objectbucketclaims.objectbucket.io created
customresourcedefinition.apiextensions.k8s.io/objectbuckets.objectbucket.io created
namespace/rook-ceph created
clusterrole.rbac.authorization.k8s.io/cephfs-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/cephfs-external-provisioner-runner created
clusterrole.rbac.authorization.k8s.io/objectstorage-provisioner-role created
clusterrole.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/rbd-external-provisioner-runner created
clusterrole.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
clusterrole.rbac.authorization.k8s.io/rook-ceph-global created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-system created
clusterrole.rbac.authorization.k8s.io/rook-ceph-object-bucket created
clusterrole.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrole.rbac.authorization.k8s.io/rook-ceph-system created
clusterrolebinding.rbac.authorization.k8s.io/cephfs-csi-nodeplugin-role created
clusterrolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/objectstorage-provisioner-role-binding created
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-global created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-object-bucket created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-system created
role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
role.rbac.authorization.k8s.io/rbd-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
role.rbac.authorization.k8s.io/rook-ceph-mgr created
role.rbac.authorization.k8s.io/rook-ceph-osd created
role.rbac.authorization.k8s.io/rook-ceph-purge-osd created
role.rbac.authorization.k8s.io/rook-ceph-rgw created
role.rbac.authorization.k8s.io/rook-ceph-system created
rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-purge-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-rgw created
rolebinding.rbac.authorization.k8s.io/rook-ceph-system created
serviceaccount/objectstorage-provisioner created
serviceaccount/rook-ceph-cmd-reporter created
serviceaccount/rook-ceph-mgr created
serviceaccount/rook-ceph-osd created
serviceaccount/rook-ceph-purge-osd created
serviceaccount/rook-ceph-rgw created
serviceaccount/rook-ceph-system created
serviceaccount/rook-csi-cephfs-plugin-sa created
serviceaccount/rook-csi-cephfs-provisioner-sa created
serviceaccount/rook-csi-rbd-plugin-sa created
serviceaccount/rook-csi-rbd-provisioner-sa created
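To confirm that the CRDs were registered, list them by name. The hypothetical helpers below read `kubectl get crd -o name` output from stdin.

`count_rook_crds` counts the CRDs in the `ceph.rook.io` API group. In the v1.12.8 transcript above there are 17 of them, but the number depends on the Rook version. `missing_crds` prints any required CRD that is absent.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of Rook.

# count_rook_crds: print how many CRD names end in ".ceph.rook.io".
count_rook_crds() {
  grep -c '\.ceph\.rook\.io$' || true   # grep -c prints 0 (exit 1) when nothing matches
}

# missing_crds NAME...: print each required CRD name not present in the input.
missing_crds() {
  local have crd
  have=$(cat)
  for crd in "$@"; do
    grep -qxF "customresourcedefinition.apiextensions.k8s.io/${crd}" <<< "$have" \
      || printf '%s\n' "$crd"
  done
}
```

Usage: `kubectl get crd -o name | missing_crds cephclusters.ceph.rook.io cephblockpools.ceph.rook.io cephfilesystems.ceph.rook.io`. Empty output means all listed CRDs exist.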
Deploy the Rook Operator, the component through which Rook interacts with Kubernetes; the whole cluster needs only one replica. The Rook Operator configuration cannot be changed after the Ceph cluster is installed, otherwise Rook will delete the Ceph cluster and rebuild it. Edit the configuration in operator.yaml:
vim operator.yaml
# Image acceleration: if the default registries are slow, uncomment the CSI images below and point them at a mirror
# ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.9.0"
# ROOK_CSI_REGISTRAR_IMAGE: "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0"
# ROOK_CSI_RESIZER_IMAGE: "registry.k8s.io/sig-storage/csi-resizer:v1.8.0"
# ROOK_CSI_PROVISIONER_IMAGE: "registry.k8s.io/sig-storage/csi-provisioner:v3.5.0"
# ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.k8s.io/sig-storage/csi-snapshotter:v6.2.2"
# ROOK_CSI_ATTACHER_IMAGE: "registry.k8s.io/sig-storage/csi-attacher:v4.3.0"
# In production, the raw-device auto-discovery daemon is usually enabled
ROOK_ENABLE_DISCOVERY_DAEMON: "true"
# Also enable the node-affinity environment variable for the discovery agent
# Enable node affinity for the CephCSI provisioners (uncomment it). It applies to both the CephFS and RBD provisioners; to schedule them separately, also enable the dedicated settings below
CSI_PROVISIONER_NODE_AFFINITY: "role=storage-node; storage=rook-ceph"
# To give the CephFS and RBD provisioners separate affinity, additionally enable their dedicated settings
#CSI_RBD_PROVISIONER_NODE_AFFINITY: "role=rbd-node"
#CSI_CEPHFS_PROVISIONER_NODE_AFFINITY: "role=cephfs-node"
# Enable node affinity for the CephCSI plugins (uncomment it). It applies to both the CephFS and RBD plugins; to schedule them separately, also enable the dedicated settings below
CSI_PLUGIN_NODE_AFFINITY: "role=storage-node; storage=rook-ceph"
# To give the CephFS and RBD plugins separate affinity, additionally enable their dedicated settings
# CSI_RBD_PLUGIN_NODE_AFFINITY: "role=rbd-node"
# CSI_CEPHFS_PLUGIN_NODE_AFFINITY: "role=cephfs-node"
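Instead of editing in vim, the same changes can be scripted. The function below is a hypothetical sketch that assumes GNU `sed -i -E`. It uncomments a `KEY: ...` line if needed and sets its value, keeping the line's indentation.

It rewrites every line that begins with `KEY:`, whether commented or not. For a key that also appears in an example comment, such as `CSI_CEPHFS_PLUGIN_NODE_AFFINITY`, check the result with `grep -n KEY operator.yaml`. It is not suited to multi-line `KEY: |` values, and VALUE must not contain double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Rook; assumes GNU sed.
# set_operator_setting FILE KEY VALUE
# Rewrites "  # KEY: ..." or "  KEY: ..." as '  KEY: "VALUE"' in FILE, in place.
set_operator_setting() {
  local file="$1" key="$2" value="$3" esc
  # Escape characters that are special in a sed replacement: \ & and the | delimiter
  esc=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
  sed -i -E "s|^([[:space:]]*)#?[[:space:]]*${key}:.*\$|\1${key}: \"${esc}\"|" "$file"
}
```

Usage: `set_operator_setting operator.yaml ROOK_ENABLE_DISCOVERY_DAEMON true`.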
After editing, label the storage nodes so that they match the node-affinity settings:
[root@master-1-230 ~]# kubectl label nodes node-1-231 node-1-232 node-1-233 role=storage-node
node/node-1-231 labeled
node/node-1-232 labeled
node/node-1-233 labeled
[root@master-1-230 ~]# kubectl label nodes node-1-231 node-1-232 node-1-233 storage=rook-ceph
node/node-1-231 labeled
node/node-1-232 labeled
node/node-1-233 labeled
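To confirm that the labels line up with the affinity strings, convert the `"key=value; key=value"` form used in operator.yaml into a kubectl label selector and list the matching nodes.

This hypothetical helper assumes that every `;`-separated term must hold at once, which is why both labels were applied above. It also assumes each term is a plain `key=value` or a bare `key`. It does not handle the JSON/YAML `NodeAffinity` form that operator.yaml also accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Rook.
# affinity_to_selector "k1=v1; k2=v2": print "k1=v1,k2=v2" for use with kubectl -l.
affinity_to_selector() {
  local -a terms
  local term out=""
  IFS=';' read -ra terms <<< "$1"
  for term in "${terms[@]}"; do
    term="${term//[[:space:]]/}"      # drop spaces around each term
    [[ -z "$term" ]] && continue       # skip empty terms, e.g. from a trailing ';'
    out="${out:+${out},}${term}"
  done
  printf '%s\n' "$out"
}
```

Usage: `kubectl get nodes -l "$(affinity_to_selector 'role=storage-node; storage=rook-ceph')"` should list node-1-231, node-1-232 and node-1-233. A bare `key` term becomes an "exists" selector.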
Once the changes are made, deploy the Rook Ceph Operator from the master node. The edited operator.yaml:
cat operator.yaml
[root@master-1-230 examples]# cat operator.yaml
#################################################################################################################
# The deployment for the rook operator
# Contains the common settings for most Kubernetes deployments.
# For example, to create the rook-ceph cluster:
# kubectl create -f crds.yaml -f common.yaml -f operator.yaml
# kubectl create -f cluster.yaml
#
# Also see other operator sample files for variations of operator.yaml:
# - operator-openshift.yaml: Common settings for running in OpenShift
###############################################################################################################
# Rook Ceph Operator Config ConfigMap
# Use this ConfigMap to override Rook-Ceph Operator configurations.
# NOTE! Precedence will be given to this config if the same Env Var config also exists in the
# Operator Deployment.
# To move a configuration(s) from the Operator Deployment to this ConfigMap, add the config
# here. It is recommended to then remove it from the Deployment to eliminate any future confusion.
kind: ConfigMap
apiVersion: v1
metadata:
  name: rook-ceph-operator-config
  # should be in the namespace of the operator
  namespace: rook-ceph # namespace:operator
data:
  # The logging level for the operator: ERROR | WARNING | INFO | DEBUG
  ROOK_LOG_LEVEL: "INFO"
# Allow using loop devices for osds in test clusters.
  ROOK_CEPH_ALLOW_LOOP_DEVICES: "false"
# Enable the CSI driver.
# To run the non-default version of the CSI driver, see the override-able image properties in operator.yaml
  ROOK_CSI_ENABLE_CEPHFS: "true"
# Enable the default version of the CSI RBD driver. To start another version of the CSI driver, see image properties below.
  ROOK_CSI_ENABLE_RBD: "true"
# Enable the CSI NFS driver. To start another version of the CSI driver, see image properties below.
  ROOK_CSI_ENABLE_NFS: "false"
  ROOK_CSI_ENABLE_GRPC_METRICS: "false"
# Set to true to enable Ceph CSI pvc encryption support.
  CSI_ENABLE_ENCRYPTION: "false"
# Set to true to enable host networking for CSI CephFS and RBD nodeplugins. This may be necessary
# in some network configurations where the SDN does not provide access to an external cluster or
# there is significant drop in read/write performance.
# CSI_ENABLE_HOST_NETWORK: "true"
# Set to true to enable adding volume metadata on the CephFS subvolume and RBD images.
# Not all users might be interested in getting volume/snapshot details as metadata on CephFS subvolume and RBD images.
# Hence enable metadata is false by default.
# CSI_ENABLE_METADATA: "true"
# cluster name identifier to set as metadata on the CephFS subvolume and RBD images. This will be useful in cases
# like for example, when two container orchestrator clusters (Kubernetes/OCP) are using a single ceph cluster.
# CSI_CLUSTER_NAME: "my-prod-cluster"
# Set logging level for cephCSI containers maintained by the cephCSI.
# Supported values from 0 to 5. 0 for general useful logs, 5 for trace level verbosity.
# CSI_LOG_LEVEL: "0"
# Set logging level for Kubernetes-csi sidecar containers.
# Supported values from 0 to 5. 0 for general useful logs (the default), 5 for trace level verbosity.
# CSI_SIDECAR_LOG_LEVEL: "0"
# Set replicas for csi provisioner deployment.
  CSI_PROVISIONER_REPLICAS: "2"
# OMAP generator will generate the omap mapping between the PV name and the RBD image.
# CSI_ENABLE_OMAP_GENERATOR need to be enabled when we are using rbd mirroring feature.
# By default OMAP generator sidecar is deployed with CSI provisioner pod, to disable
# it set it to false.
# CSI_ENABLE_OMAP_GENERATOR: "false"
# set to false to disable deployment of snapshotter container in CephFS provisioner pod.
  CSI_ENABLE_CEPHFS_SNAPSHOTTER: "true"
# set to false to disable deployment of snapshotter container in NFS provisioner pod.
  CSI_ENABLE_NFS_SNAPSHOTTER: "true"
# set to false to disable deployment of snapshotter container in RBD provisioner pod.
  CSI_ENABLE_RBD_SNAPSHOTTER: "true"
# Enable cephfs kernel driver instead of ceph-fuse.
# If you disable the kernel client, your application may be disrupted during upgrade.
# See the upgrade guide: https://rook.io/docs/rook/latest/ceph-upgrade.html
# NOTE! cephfs quota is not supported in kernel version < 4.17
  CSI_FORCE_CEPHFS_KERNEL_CLIENT: "true"
# (Optional) policy for modifying a volume's ownership or permissions when the RBD PVC is being mounted.
# supported values are documented at https://kubernetes-csi.github.io/docs/support-fsgroup.html
  CSI_RBD_FSGROUPPOLICY: "File"
# (Optional) policy for modifying a volume's ownership or permissions when the CephFS PVC is being mounted.
# supported values are documented at https://kubernetes-csi.github.io/docs/support-fsgroup.html
  CSI_CEPHFS_FSGROUPPOLICY: "File"
# (Optional) policy for modifying a volume's ownership or permissions when the NFS PVC is being mounted.
# supported values are documented at https://kubernetes-csi.github.io/docs/support-fsgroup.html
  CSI_NFS_FSGROUPPOLICY: "File"
# (Optional) Allow starting unsupported ceph-csi image
  ROOK_CSI_ALLOW_UNSUPPORTED_VERSION: "false"
# (Optional) control the host mount of /etc/selinux for csi plugin pods.
  CSI_PLUGIN_ENABLE_SELINUX_HOST_MOUNT: "false"
# The default version of CSI supported by Rook will be started. To change the version
# of the CSI driver to something other than what is officially supported, change
# these images to the desired release of the CSI driver.
# ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.9.0"
# ROOK_CSI_REGISTRAR_IMAGE: "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0"
# ROOK_CSI_RESIZER_IMAGE: "registry.k8s.io/sig-storage/csi-resizer:v1.8.0"
# ROOK_CSI_PROVISIONER_IMAGE: "registry.k8s.io/sig-storage/csi-provisioner:v3.5.0"
# ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.k8s.io/sig-storage/csi-snapshotter:v6.2.2"
# ROOK_CSI_ATTACHER_IMAGE: "registry.k8s.io/sig-storage/csi-attacher:v4.3.0"
# To indicate the image pull policy to be applied to all the containers in the csi driver pods.
# ROOK_CSI_IMAGE_PULL_POLICY: "IfNotPresent"
# (Optional) set user created priorityclassName for csi plugin pods.
  CSI_PLUGIN_PRIORITY_CLASSNAME: "system-node-critical"
# (Optional) set user created priorityclassName for csi provisioner pods.
  CSI_PROVISIONER_PRIORITY_CLASSNAME: "system-cluster-critical"
# CSI CephFS plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
# Default value is RollingUpdate.
# CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY: "OnDelete"
# A maxUnavailable parameter of CSI cephFS plugin daemonset update strategy.
# Default value is 1.
# CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY_MAX_UNAVAILABLE: "1"
# CSI RBD plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
# Default value is RollingUpdate.
# CSI_RBD_PLUGIN_UPDATE_STRATEGY: "OnDelete"
# A maxUnavailable parameter of CSI RBD plugin daemonset update strategy.
# Default value is 1.
# CSI_RBD_PLUGIN_UPDATE_STRATEGY_MAX_UNAVAILABLE: "1"
# CSI NFS plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
# Default value is RollingUpdate.
# CSI_NFS_PLUGIN_UPDATE_STRATEGY: "OnDelete"
# kubelet directory path, if kubelet configured to use other than /var/lib/kubelet path.
# ROOK_CSI_KUBELET_DIR_PATH: "/var/lib/kubelet"
# Labels to add to the CSI CephFS Deployments and DaemonSets Pods.
# ROOK_CSI_CEPHFS_POD_LABELS: "key1=value1,key2=value2"
# Labels to add to the CSI RBD Deployments and DaemonSets Pods.
# ROOK_CSI_RBD_POD_LABELS: "key1=value1,key2=value2"
# Labels to add to the CSI NFS Deployments and DaemonSets Pods.
# ROOK_CSI_NFS_POD_LABELS: "key1=value1,key2=value2"
# (Optional) CephCSI CephFS plugin Volumes
# CSI_CEPHFS_PLUGIN_VOLUME: |
# - name: lib-modules
# hostPath:
# path: /run/current-system/kernel-modules/lib/modules/
# - name: host-nix
# hostPath:
# path: /nix
# (Optional) CephCSI CephFS plugin Volume mounts
# CSI_CEPHFS_PLUGIN_VOLUME_MOUNT: |
# - name: host-nix
# mountPath: /nix
# readOnly: true
# (Optional) CephCSI RBD plugin Volumes
# CSI_RBD_PLUGIN_VOLUME: |
# - name: lib-modules
# hostPath:
# path: /run/current-system/kernel-modules/lib/modules/
# - name: host-nix
# hostPath:
# path: /nix
# (Optional) CephCSI RBD plugin Volume mounts
# CSI_RBD_PLUGIN_VOLUME_MOUNT: |
# - name: host-nix
# mountPath: /nix
# readOnly: true
# (Optional) CephCSI provisioner NodeAffinity (applied to both CephFS and RBD provisioner).
  CSI_PROVISIONER_NODE_AFFINITY: "role=storage-node; storage=rook-ceph"
# (Optional) CephCSI provisioner tolerations list(applied to both CephFS and RBD provisioner).
# Put here list of taints you want to tolerate in YAML format.
# CSI provisioner would be best to start on the same nodes as other ceph daemons.
# CSI_PROVISIONER_TOLERATIONS: |
# - effect: NoSchedule
# key: node-role.kubernetes.io/control-plane
# operator: Exists
# - effect: NoExecute
# key: node-role.kubernetes.io/etcd
# operator: Exists
# (Optional) CephCSI plugin NodeAffinity (applied to both CephFS and RBD plugin).
  CSI_PLUGIN_NODE_AFFINITY: "role=storage-node; storage=rook-ceph"
# (Optional) CephCSI plugin tolerations list(applied to both CephFS and RBD plugin).
# Put here list of taints you want to tolerate in YAML format.
# CSI plugins need to be started on all the nodes where the clients need to mount the storage.
# CSI_PLUGIN_TOLERATIONS: |
# - effect: NoSchedule
# key: node-role.kubernetes.io/control-plane
# operator: Exists
# - effect: NoExecute
# key: node-role.kubernetes.io/etcd
# operator: Exists
# (Optional) CephCSI RBD provisioner NodeAffinity (if specified, overrides CSI_PROVISIONER_NODE_AFFINITY).
# CSI_RBD_PROVISIONER_NODE_AFFINITY: "role=rbd-node"
# (Optional) CephCSI RBD provisioner tolerations list(if specified, overrides CSI_PROVISIONER_TOLERATIONS).
# Put here list of taints you want to tolerate in YAML format.
# CSI provisioner would be best to start on the same nodes as other ceph daemons.
# CSI_RBD_PROVISIONER_TOLERATIONS: |
# - key: node.rook.io/rbd
# operator: Exists
# (Optional) CephCSI RBD plugin NodeAffinity (if specified, overrides CSI_PLUGIN_NODE_AFFINITY).
# CSI_RBD_PLUGIN_NODE_AFFINITY: "role=rbd-node"
# (Optional) CephCSI RBD plugin tolerations list(if specified, overrides CSI_PLUGIN_TOLERATIONS).
# Put here list of taints you want to tolerate in YAML format.
# CSI plugins need to be started on all the nodes where the clients need to mount the storage.
# CSI_RBD_PLUGIN_TOLERATIONS: |
# - key: node.rook.io/rbd
# operator: Exists
# (Optional) CephCSI CephFS provisioner NodeAffinity (if specified, overrides CSI_PROVISIONER_NODE_AFFINITY).
# CSI_CEPHFS_PROVISIONER_NODE_AFFINITY: "role=cephfs-node"
# (Optional) CephCSI CephFS provisioner tolerations list(if specified, overrides CSI_PROVISIONER_TOLERATIONS).
# Put here list of taints you want to tolerate in YAML format.
# CSI provisioner would be best to start on the same nodes as other ceph daemons.
# CSI_CEPHFS_PROVISIONER_TOLERATIONS: |
# - key: node.rook.io/cephfs
# operator: Exists
# (Optional) CephCSI CephFS plugin NodeAffinity (if specified, overrides CSI_PLUGIN_NODE_AFFINITY).
# CSI_CEPHFS_PLUGIN_NODE_AFFINITY: "role=cephfs-node"
# NOTE: Support for defining NodeAffinity for operators other than "In" and "Exists" requires the user to input a
# valid v1.NodeAffinity JSON or YAML string. For example, the following is valid YAML v1.NodeAffinity:
# CSI_CEPHFS_PLUGIN_NODE_AFFINITY: |
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: myKey
# operator: DoesNotExist
# (Optional) CephCSI CephFS plugin tolerations list(if specified, overrides CSI_PLUGIN_TOLERATIONS).
# Put here list of taints you want to tolerate in YAML format.
# CSI plugins need to be started on all the nodes where the clients need to mount the storage.
# CSI_CEPHFS_PLUGIN_TOLERATIONS: |
# - key: node.rook.io/cephfs
# operator: Exists
# (Optional) CephCSI NFS provisioner NodeAffinity (overrides CSI_PROVISIONER_NODE_AFFINITY).
# CSI_NFS_PROVISIONER_NODE_AFFINITY: "role=nfs-node"
# (Optional) CephCSI NFS provisioner tolerations list (overrides CSI_PROVISIONER_TOLERATIONS).
# Put here list of taints you want to tolerate in YAML format.
# CSI provisioner would be best to start on the same nodes as other ceph daemons.
# CSI_NFS_PROVISIONER_TOLERATIONS: |
# - key: node.rook.io/nfs
# operator: Exists
# (Optional) CephCSI NFS plugin NodeAffinity (overrides CSI_PLUGIN_NODE_AFFINITY).
# CSI_NFS_PLUGIN_NODE_AFFINITY: "role=nfs-node"
# (Optional) CephCSI NFS plugin tolerations list (overrides CSI_PLUGIN_TOLERATIONS).
# Put here list of taints you want to tolerate in YAML format.
# CSI plugins need to be started on all the nodes where the clients need to mount the storage.
# CSI_NFS_PLUGIN_TOLERATIONS: |
# - key: node.rook.io/nfs
# operator: Exists
# (Optional) CEPH CSI RBD provisioner resource requirement list, Put here list of resource
# requests and limits you want to apply for provisioner pod
#CSI_RBD_PROVISIONER_RESOURCE: |
# - name : csi-provisioner
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-resizer
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-attacher
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-snapshotter
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-rbdplugin
# resource:
# requests:
# memory: 512Mi
# cpu: 250m
# limits:
# memory: 1Gi
# cpu: 500m
# - name : csi-omap-generator
# resource:
# requests:
# memory: 512Mi
# cpu: 250m
# limits:
# memory: 1Gi
# cpu: 500m
# - name : liveness-prometheus
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# (Optional) CEPH CSI RBD plugin resource requirement list, Put here list of resource
# requests and limits you want to apply for plugin pod
#CSI_RBD_PLUGIN_RESOURCE: |
# - name : driver-registrar
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# - name : csi-rbdplugin
# resource:
# requests:
# memory: 512Mi
# cpu: 250m
# limits:
# memory: 1Gi
# cpu: 500m
# - name : liveness-prometheus
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# (Optional) CEPH CSI CephFS provisioner resource requirement list, Put here list of resource
# requests and limits you want to apply for provisioner pod
#CSI_CEPHFS_PROVISIONER_RESOURCE: |
# - name : csi-provisioner
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-resizer
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-attacher
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-snapshotter
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-cephfsplugin
# resource:
# requests:
# memory: 512Mi
# cpu: 250m
# limits:
# memory: 1Gi
# cpu: 500m
# - name : liveness-prometheus
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# (Optional) CEPH CSI CephFS plugin resource requirement list, Put here list of resource
# requests and limits you want to apply for plugin pod
#CSI_CEPHFS_PLUGIN_RESOURCE: |
# - name : driver-registrar
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# - name : csi-cephfsplugin
# resource:
# requests:
# memory: 512Mi
# cpu: 250m
# limits:
# memory: 1Gi
# cpu: 500m
# - name : liveness-prometheus
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# (Optional) CEPH CSI NFS provisioner resource requirement list, Put here list of resource
# requests and limits you want to apply for provisioner pod
# CSI_NFS_PROVISIONER_RESOURCE: |
# - name : csi-provisioner
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-nfsplugin
# resource:
# requests:
# memory: 512Mi
# cpu: 250m
# limits:
# memory: 1Gi
# cpu: 500m
# - name : csi-attacher
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# (Optional) CEPH CSI NFS plugin resource requirement list, Put here list of resource
# requests and limits you want to apply for plugin pod
# CSI_NFS_PLUGIN_RESOURCE: |
# - name : driver-registrar
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# - name : csi-nfsplugin
# resource:
# requests:
# memory: 512Mi
# cpu: 250m
# limits:
# memory: 1Gi
# cpu: 500m
# Configure CSI Ceph FS grpc and liveness metrics port
# Set to true to enable Ceph CSI liveness container.
  CSI_ENABLE_LIVENESS: "false"
# CSI_CEPHFS_GRPC_METRICS_PORT: "9091"
# CSI_CEPHFS_LIVENESS_METRICS_PORT: "9081"
# Configure CSI RBD grpc and liveness metrics port
# CSI_RBD_GRPC_METRICS_PORT: "9090"
# CSI_RBD_LIVENESS_METRICS_PORT: "9080"
# CSIADDONS_PORT: "9070"
# Set CephFS Kernel mount options to use https://docs.ceph.com/en/latest/man/8/mount.ceph/#options
# Set to "ms_mode=secure" when connections.encrypted is enabled in CephCluster CR
# CSI_CEPHFS_KERNEL_MOUNT_OPTIONS: "ms_mode=secure"
# Whether the OBC provisioner should watch on the operator namespace or not, if not the namespace of the cluster will be used
ROOK_OBC_WATCH_OPERATOR_NAMESPACE: "true"
# Whether to start the discovery daemon to watch for raw storage devices on nodes in the cluster.
# This daemon does not need to run if you are only going to create your OSDs based on StorageClassDeviceSets with PVCs.
ROOK_ENABLE_DISCOVERY_DAEMON: "true"
# The timeout value (in seconds) of Ceph commands. It should be >= 1. If this variable is not set or is an invalid value, it's default to 15.
ROOK_CEPH_COMMANDS_TIMEOUT_SECONDS: "15"
# Enable the csi addons sidecar.
CSI_ENABLE_CSIADDONS: "false"
# Enable watch for faster recovery from rbd rwo node loss
ROOK_WATCH_FOR_NODE_FAILURE: "true"
# ROOK_CSIADDONS_IMAGE: "quay.io/csiaddons/k8s-sidecar:v0.7.0"
# The CSI GRPC timeout value (in seconds). It should be >= 120. If this variable is not set or is an invalid value, it's default to 150.
CSI_GRPC_TIMEOUT_SECONDS: "150"
ROOK_DISABLE_ADMISSION_CONTROLLER: "true"
# Enable topology based provisioning.
CSI_ENABLE_TOPOLOGY: "false"
# Domain labels define which node labels to use as domains
# for CSI nodeplugins to advertise their domains
# NOTE: the value here serves as an example and needs to be
# updated with node labels that define domains of interest
# CSI_TOPOLOGY_DOMAIN_LABELS: "kubernetes.io/hostname,topology.kubernetes.io/zone,topology.rook.io/rack"
# Enable read affinity for RBD volumes. Recommended to
# set to true if running kernel 5.8 or newer.
CSI_ENABLE_READ_AFFINITY: "false"
# CRUSH location labels define which node labels to use
# as CRUSH location. This should correspond to the values set in
# the CRUSH map.
# Defaults to all the labels mentioned in
# https://rook.io/docs/rook/latest/CRDs/Cluster/ceph-cluster-crd/#osd-topology
# CSI_CRUSH_LOCATION_LABELS: "kubernetes.io/hostname,topology.kubernetes.io/zone,topology.rook.io/rack"
# Whether to skip any attach operation altogether for CephCSI PVCs.
# See more details [here](https://kubernetes-csi.github.io/docs/skip-attach.html#skip-attach-with-csi-driver-object).
# If set to false it skips the volume attachments and makes the creation of pods using the CephCSI PVC fast.
# **WARNING** It's highly discouraged to use this for RWO volumes. for RBD PVC it can cause data corruption,
# csi-addons operations like Reclaimspace and PVC Keyrotation will also not be supported if set to false
# since we'll have no VolumeAttachments to determine which node the PVC is mounted on.
# Refer to this [issue](https://github.com/kubernetes/kubernetes/issues/103305) for more details.
CSI_CEPHFS_ATTACH_REQUIRED: "true"
CSI_RBD_ATTACH_REQUIRED: "true"
CSI_NFS_ATTACH_REQUIRED: "true"
# Rook Discover toleration. Will tolerate all taints with all keys.
# (Optional) Rook Discover tolerations list. Put here list of taints you want to tolerate in YAML format.
# DISCOVER_TOLERATIONS: |
# - effect: NoSchedule
# key: node-role.kubernetes.io/control-plane
# operator: Exists
# - effect: NoExecute
# key: node-role.kubernetes.io/etcd
# operator: Exists
# (Optional) Rook Discover priority class name to set on the pod(s)
# DISCOVER_PRIORITY_CLASS_NAME: "<PriorityClassName>"
# (Optional) Discover Agent NodeAffinity.
DISCOVER_AGENT_NODE_AFFINITY: |
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: role
operator: In
values:
- storage-node
- key: storage
operator: In
values:
- rook-ceph
# (Optional) Discover Agent Pod Labels.
# DISCOVER_AGENT_POD_LABELS: "key1=value1,key2=value2"
# Disable automatic orchestration when new devices are discovered
ROOK_DISABLE_DEVICE_HOTPLUG: "false"
# The duration between discovering devices in the rook-discover daemonset.
ROOK_DISCOVER_DEVICES_INTERVAL: "60m"
# DISCOVER_DAEMON_RESOURCES: |
# - name: DISCOVER_DAEMON_RESOURCES
# resources:
# limits:
# cpu: 500m
# memory: 512Mi
# requests:
# cpu: 100m
# memory: 128Mi
---
# OLM: BEGIN OPERATOR DEPLOYMENT
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-operator
  namespace: rook-ceph # namespace:operator
  labels:
    operator: rook
    storage-backend: ceph
    app.kubernetes.io/name: rook-ceph
    app.kubernetes.io/instance: rook-ceph
    app.kubernetes.io/component: rook-ceph-operator
    app.kubernetes.io/part-of: rook-ceph-operator
spec:
  selector:
    matchLabels:
      app: rook-ceph-operator
  strategy:
    type: Recreate
  replicas: 1
  template:
    metadata:
      labels:
        app: rook-ceph-operator
    spec:
      serviceAccountName: rook-ceph-system
      containers:
        - name: rook-ceph-operator
          #image: rook/ceph:v1.12.8
          image: docker.io/rook/ceph:v1.12.8
          args: ["ceph", "operator"]
          securityContext:
            runAsNonRoot: true
            runAsUser: 2016
            runAsGroup: 2016
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - mountPath: /var/lib/rook
              name: rook-config
            - mountPath: /etc/ceph
              name: default-config-dir
            - mountPath: /etc/webhook
              name: webhook-cert
          ports:
            - containerPort: 9443
              name: https-webhook
              protocol: TCP
          env:
            # If the operator should only watch for cluster CRDs in the same namespace, set this to "true".
            # If this is not set to true, the operator will watch for cluster CRDs in all namespaces.
            - name: ROOK_CURRENT_NAMESPACE_ONLY
              value: "false"
            # Whether to start pods as privileged that mount a host path, which includes the Ceph mon and osd pods.
            # Set this to true if SELinux is enabled (e.g. OpenShift) to workaround the anyuid issues.
            # For more details see https://github.com/rook/rook/issues/1314#issuecomment-355799641
            - name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED
              value: "false"
            # Provide customised regex as the values using comma. For eg. regex for rbd based volume, value will be like "(?i)rbd[0-9]+".
            # In case of more than one regex, use comma to separate between them.
            # Default regex will be "(?i)dm-[0-9]+,(?i)rbd[0-9]+,(?i)nbd[0-9]+"
            # Add regex expression after putting a comma to blacklist a disk
            # If value is empty, the default regex will be used.
            - name: DISCOVER_DAEMON_UDEV_BLACKLIST
              value: "(?i)dm-[0-9]+,(?i)rbd[0-9]+,(?i)nbd[0-9]+"
            # Time to wait until the node controller will move Rook pods to other
            # nodes after detecting an unreachable node.
            # Pods affected by this setting are:
            # mgr, rbd, mds, rgw, nfs, PVC based mons and osds, and ceph toolbox
            # The value used in this variable replaces the default value of 300 secs
            # added automatically by k8s as Toleration for
            # <node.kubernetes.io/unreachable>
            # The total amount of time to reschedule Rook pods in healthy nodes
            # before detecting a <not ready node> condition will be the sum of:
            #  --> node-monitor-grace-period: 40 seconds (k8s kube-controller-manager flag)
            #  --> ROOK_UNREACHABLE_NODE_TOLERATION_SECONDS: 5 seconds
            - name: ROOK_UNREACHABLE_NODE_TOLERATION_SECONDS
              value: "5"
            # The name of the node to pass with the downward API
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # The pod name to pass with the downward API
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # The pod namespace to pass with the downward API
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          # Recommended resource requests and limits, if desired
          #resources:
          #  limits:
          #    cpu: 500m
          #    memory: 512Mi
          #  requests:
          #    cpu: 100m
          #    memory: 128Mi
          # Uncomment it to run lib bucket provisioner in multithreaded mode
          #- name: LIB_BUCKET_PROVISIONER_THREADS
          #  value: "5"
      # Uncomment it to run rook operator on the host network
      #hostNetwork: true
      volumes:
        - name: rook-config
          emptyDir: {}
        - name: default-config-dir
          emptyDir: {}
        - name: webhook-cert
          emptyDir: {}
# OLM: END OPERATOR DEPLOYMENT
[root@master-1-230 examples]# kubectl apply -f operator.yaml
configmap/rook-ceph-operator-config created
deployment.apps/rook-ceph-operator created
[root@master-1-230 examples]# kubectl get pod -n rook-ceph
NAME READY STATUS RESTARTS AGE
rook-ceph-operator-9864d576b-mdj9r 1/1 Running 0 2m7s
rook-discover-7ngcr 1/1 Running 0 2m6s
rook-discover-dh2tx 1/1 Running 0 2m6s
rook-discover-hpx98 1/1 Running 0 2m6s
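You can check a short pod list like this one by eye, but the listings later in this walkthrough contain more than twenty pods. The following is a minimal sketch of a hypothetical helper, check_pods, which is not part of Rook or kubectl. It reads the default `kubectl get pod` table on stdin and assumes the NAME READY STATUS column order shown above. It reports every pod that is neither Completed nor Running with all of its containers ready.

```shell
# Hypothetical helper (not part of Rook): reads `kubectl get pod` table output
# on stdin and prints every pod that is neither Completed nor Running with all
# containers ready. Exit status: 0 if all pods look healthy, 1 otherwise.
check_pods() {
  awk '
    NR == 1 { next }                   # skip the header row
    $3 == "Completed" { next }         # finished one-shot pods are fine
    {
      split($2, ready, "/")            # READY column, e.g. 2/3
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1, $2, $3               # NAME READY STATUS of the problem pod
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

Typical use is `kubectl get pod -n rook-ceph | check_pods && echo "all pods healthy"`. The helper reads only the first three columns, so a RESTARTS value such as `1 (24m ago)` does not affect the result. The same check is also useful after applying cluster.yaml, when the rook-ceph-osd-prepare-* pods end up Completed.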
Confirm that all rook-ceph-operator related pods are running, then edit rook/deploy/examples/cluster.yaml:
[root@master-1-230 examples]# cat cluster.yaml
#################################################################################################################
# Define the settings for the rook-ceph cluster with common settings for a production cluster.
# All nodes with available raw devices will be used for the Ceph cluster. At least three nodes are required
# in this example. See the documentation for more details on storage settings available.
# For example, to create the cluster:
# kubectl create -f crds.yaml -f common.yaml -f operator.yaml
# kubectl create -f cluster.yaml
#################################################################################################################
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph # namespace:cluster
spec:
  cephVersion:
    # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
    # v16 is Pacific, and v17 is Quincy.
    # RECOMMENDATION: In production, use a specific version tag instead of the general v17 flag, which pulls the latest release and could result in different
    # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
    # If you want to be more precise, you can always use a timestamp tag such quay.io/ceph/ceph:v17.2.6-20230410
    # This tag might not contain a new Ceph version, just security fixes from the underlying operating system, which will reduce vulnerabilities
    image: quay.io/ceph/ceph:v17.2.6
    # Whether to allow unsupported versions of Ceph. Currently `pacific`, `quincy`, and `reef` are supported.
    # Future versions such as `squid` (v19) would require this to be set to `true`.
    # Do not set to true in production.
    allowUnsupported: false
  # The path on the host where configuration files will be persisted. Must be specified.
  # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
  # In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
  dataDirHostPath: /var/lib/rook
  # Whether or not upgrade should continue even if a check fails
  # This means Ceph's status could be degraded and we don't recommend upgrading but you might decide otherwise
  # Use at your OWN risk
  # To understand Rook's upgrade process of Ceph, read https://rook.io/docs/rook/latest/ceph-upgrade.html#ceph-version-upgrades
  skipUpgradeChecks: false
  # Whether or not continue if PGs are not clean during an upgrade
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  # WaitTimeoutForHealthyOSDInMinutes defines the time (in minutes) the operator would wait before an OSD can be stopped for upgrade or restart.
  # If the timeout exceeds and OSD is not ok to stop, then the operator would skip upgrade for the current OSD and proceed with the next one
  # if `continueUpgradeAfterChecksEvenIfNotHealthy` is `false`. If `continueUpgradeAfterChecksEvenIfNotHealthy` is `true`, then operator would
  # continue with the upgrade of an OSD even if its not ok to stop after the timeout. This timeout won't be applied if `skipUpgradeChecks` is `true`.
  # The default wait timeout is 10 minutes.
  waitTimeoutForHealthyOSDInMinutes: 10
  mon:
    # Set the number of mons to be started. Generally recommended to be 3.
    # For highest availability, an odd number of mons should be specified.
    count: 3
    # The mons should be on unique nodes. For production, at least 3 nodes are recommended for this reason.
    # Mons should only be allowed on the same node for test environments where data loss is acceptable.
    allowMultiplePerNode: false
  mgr:
    # When higher availability of the mgr is needed, increase the count to 2.
    # In that case, one mgr will be active and one in standby. When Ceph updates which
    # mgr is active, Rook will update the mgr services to match the active mgr.
    count: 2
    allowMultiplePerNode: false
    modules:
      # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
      # are already enabled by other settings in the cluster CR.
      - name: pg_autoscaler
        enabled: true
  # enable the ceph dashboard for viewing cluster status
  dashboard:
    enabled: true
    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    # urlPrefix: /ceph-dashboard
    # serve the dashboard at the given port.
    # port: 8443
    # serve the dashboard using SSL
    ssl: true
    # The url of the Prometheus instance
    # prometheusEndpoint: <protocol>://<prometheus-host>:<port>
    # Whether SSL should be verified if the Prometheus server is using https
    # prometheusEndpointSSLVerify: false
  # enable prometheus alerting for cluster
  monitoring:
    # requires Prometheus to be pre-installed
    enabled: false
    # Whether to disable the metrics reported by Ceph. If false, the prometheus mgr module and Ceph exporter are enabled.
    # If true, the prometheus mgr module and Ceph exporter are both disabled. Default is false.
    metricsDisabled: false
  network:
    connections:
      # Whether to encrypt the data in transit across the wire to prevent eavesdropping the data on the network.
      # The default is false. When encryption is enabled, all communication between clients and Ceph daemons, or between Ceph daemons will be encrypted.
      # When encryption is not enabled, clients still establish a strong initial authentication and data integrity is still validated with a crc check.
      # IMPORTANT: Encryption requires the 5.11 kernel for the latest nbd and cephfs drivers. Alternatively for testing only,
      # you can set the "mounter: rbd-nbd" in the rbd storage class, or "mounter: fuse" in the cephfs storage class.
      # The nbd and fuse drivers are *not* recommended in production since restarting the csi driver pod will disconnect the volumes.
      encryption:
        enabled: false
      # Whether to compress the data in transit across the wire. The default is false.
      # Requires Ceph Quincy (v17) or newer. Also see the kernel requirements above for encryption.
      compression:
        enabled: false
      # Whether to require communication over msgr2. If true, the msgr v1 port (6789) will be disabled
      # and clients will be required to connect to the Ceph cluster with the v2 port (3300).
      # Requires a kernel that supports msgr v2 (kernel 5.11 or CentOS 8.4 or newer).
      requireMsgr2: false
    # enable host networking
    #provider: host
    # enable the Multus network provider
    #provider: multus
    #selectors:
      # The selector keys are required to be `public` and `cluster`.
      # Based on the configuration, the operator will do the following:
      #   1. if only the `public` selector key is specified both public_network and cluster_network Ceph settings will listen on that interface
      #   2. if both `public` and `cluster` selector keys are specified the first one will point to 'public_network' flag and the second one to 'cluster_network'
      #
      # In order to work, each selector value must match a NetworkAttachmentDefinition object in Multus
      #
      # public: public-conf --> NetworkAttachmentDefinition object name in Multus
      # cluster: cluster-conf --> NetworkAttachmentDefinition object name in Multus
    # Provide internet protocol version. IPv6, IPv4 or empty string are valid options. Empty string would mean IPv4
    #ipFamily: "IPv6"
    # Ceph daemons to listen on both IPv4 and Ipv6 networks
    #dualStack: false
    # Enable multiClusterService to export the mon and OSD services to peer cluster.
    # This is useful to support RBD mirroring between two clusters having overlapping CIDRs.
    # Ensure that peer clusters are connected using an MCS API compatible application, like Globalnet Submariner.
    #multiClusterService:
    #  enabled: false
  # enable the crash collector for ceph daemon crash collection
  crashCollector:
    disable: false
    # Uncomment daysToRetain to prune ceph crash entries older than the
    # specified number of days.
    #daysToRetain: 30
  # enable log collector, daemons will log on files and rotate
  logCollector:
    enabled: true
    periodicity: daily # one of: hourly, daily, weekly, monthly
    maxLogSize: 500M # SUFFIX may be 'M' or 'G'. Must be at least 1M.
  # automate [data cleanup process](https://github.com/rook/rook/blob/master/Documentation/Storage-Configuration/ceph-teardown.md#delete-the-data-on-hosts) in cluster destruction.
  cleanupPolicy:
    # Since cluster cleanup is destructive to data, confirmation is required.
    # To destroy all Rook data on hosts during uninstall, confirmation must be set to "yes-really-destroy-data".
    # This value should only be set when the cluster is about to be deleted. After the confirmation is set,
    # Rook will immediately stop configuring the cluster and only wait for the delete command.
    # If the empty string is set, Rook will not destroy any data on hosts during uninstall.
    confirmation: ""
    # sanitizeDisks represents settings for sanitizing OSD disks on cluster deletion
    sanitizeDisks:
      # method indicates if the entire disk should be sanitized or simply ceph's metadata
      # in both case, re-install is possible
      # possible choices are 'complete' or 'quick' (default)
      method: quick
      # dataSource indicate where to get random bytes from to write on the disk
      # possible choices are 'zero' (default) or 'random'
      # using random sources will consume entropy from the system and will take much more time then the zero source
      dataSource: zero
      # iteration overwrite N times instead of the default (1)
      # takes an integer value
      iteration: 1
    # allowUninstallWithVolumes defines how the uninstall should be performed
    # If set to true, cephCluster deletion does not wait for the PVs to be deleted.
    allowUninstallWithVolumes: false
  # To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
  # The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
  # tolerate taints with a key of 'storage-node'.
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: role
                  operator: In
                  values:
                    - storage-node
      # podAffinity:
      # podAntiAffinity:
      # topologySpreadConstraints:
      # tolerations:
      # - key: storage-node
      #   operator: Exists
    # The above placement information can also be specified for mon, osd, and mgr components
    # mon:
    # Monitor deployments may contain an anti-affinity rule for avoiding monitor
    # collocation on the same node. This is a required rule when host network is used
    # or when AllowMultiplePerNode is false. Otherwise this anti-affinity rule is a
    # preferred rule with weight: 50.
    # osd:
    # prepareosd:
    # mgr:
    # cleanup:
  annotations:
    # all:
    # mon:
    # osd:
    # cleanup:
    # prepareosd:
    # clusterMetadata annotations will be applied to only `rook-ceph-mon-endpoints` configmap and the `rook-ceph-mon` and `rook-ceph-admin-keyring` secrets.
    # And clusterMetadata annotations will not be merged with `all` annotations.
    # clusterMetadata:
    #   kubed.appscode.com/sync: "true"
    # If no mgr annotations are set, prometheus scrape annotations will be set by default.
    # mgr:
  labels:
    # all:
    # mon:
    # osd:
    # cleanup:
    # mgr:
    # prepareosd:
    # monitoring is a list of key-value pairs. It is injected into all the monitoring resources created by operator.
    # These labels can be passed as LabelSelector to Prometheus
    # monitoring:
    # crashcollector:
  resources:
    #The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
    # mgr:
    #   limits:
    #     cpu: "500m"
    #     memory: "1024Mi"
    #   requests:
    #     cpu: "500m"
    #     memory: "1024Mi"
    osd:
      limits:
        cpu: "800m"
        memory: "2048Mi"
      requests:
        cpu: "800m"
        memory: "2048Mi"
    # The above example requests/limits can also be added to the other components
    # mon:
    # osd:
    # For OSD it also is a possible to specify requests/limits based on device class
    # osd-hdd:
    # osd-ssd:
    # osd-nvme:
    # prepareosd:
    # mgr-sidecar:
    # crashcollector:
    # logcollector:
    # cleanup:
    # exporter:
  # The option to automatically remove OSDs that are out and are safe to destroy.
  removeOSDsIfOutAndSafeToRemove: false
  priorityClassNames:
    #all: rook-ceph-default-priority-class
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical
    #crashcollector: rook-ceph-crashcollector-priority-class
  storage: # cluster level storage configuration and selection
    useAllNodes: false
    useAllDevices: false
    #deviceFilter:
    config:
      # crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
      # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
      # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
      # osdsPerDevice: "1" # this value can be overridden at the node or device level
      # encryptedDevice: "true" # the default value for this option is "false"
    # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
    # nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
    nodes:
      - name: "node-1-231"
        devices: # specific devices to use for storage can be specified for each node
          - name: "sdb"
      - name: "node-1-232"
        devices: # specific devices to use for storage can be specified for each node
          - name: "sdb"
      - name: "node-1-233"
        devices: # specific devices to use for storage can be specified for each node
          - name: "sdb"
    # - name: "nvme01" # multiple osds can be created on high performance devices
    #   config:
    #     osdsPerDevice: "5"
    # - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
    # config: # configuration can be specified at the node level which overrides the cluster level config
    # - name: "172.17.4.301"
    #   deviceFilter: "^sd."
    # when onlyApplyOSDPlacement is false, will merge both placement.All() and placement.osd
    onlyApplyOSDPlacement: false
    # Time for which an OSD pod will sleep before restarting, if it stopped due to flapping
    # flappingRestartIntervalHours: 24
  # The section for configuring management of daemon disruptions during upgrade or fencing.
  disruptionManagement:
    # If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
    # via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md). The operator will
    # block eviction of OSDs by default and unblock them safely when drains are detected.
    managePodBudgets: true
    # A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
    # default DOWN/OUT interval) when it is draining. This is only relevant when  `managePodBudgets` is `true`. The default value is `30` minutes.
    osdMaintenanceTimeout: 30
    # A duration in minutes that the operator will wait for the placement groups to become healthy (active+clean) after a drain was completed and OSDs came back up.
    # Operator will continue with the next drain if the timeout exceeds. It only works if `managePodBudgets` is `true`.
    # No values or 0 means that the operator will wait until the placement groups are healthy before unblocking the next drain.
    pgHealthCheckTimeout: 0
  # healthChecks
  # Valid values for daemons are 'mon', 'osd', 'status'
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    # Change pod liveness probe timing or threshold values. Works for all mon,mgr,osd daemons.
    livenessProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false
    # Change pod startup probe timing or threshold values. Works for all mon,mgr,osd daemons.
    startupProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false
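Apart from the placement and storage.nodes settings above, the main change is the resource requests and limits for the cluster's OSDs: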
  resources:
    #The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
    # mgr:
    #   limits:
    #     cpu: "500m"
    #     memory: "1024Mi"
    #   requests:
    #     cpu: "500m"
    #     memory: "1024Mi"
    osd:
      limits:
        cpu: "800m"
        memory: "2048Mi"
      requests:
        cpu: "800m"
        memory: "2048Mi"
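Make sure the worker nodes carry the labels used above (role=storage-node for placement, and storage=rook-ceph for the discover agent), then apply the YAML file: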
[root@master-1-230 examples]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master-1-230 Ready control-plane 15d v1.27.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master-1-230,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=
node-1-231 Ready <none> 15d v1.27.6 apptype=core,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ingress=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-1-231,kubernetes.io/os=linux,role=storage-node,route-reflector=true,storage=rook-ceph
node-1-232 Ready <none> 15d v1.27.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ingress=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-1-232,kubernetes.io/os=linux,role=storage-node,route-reflector=true,storage=rook-ceph
node-1-233 Ready <none> 14d v1.27.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-1-233,kubernetes.io/os=linux,role=storage-node,storage=rook-ceph
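The placement rule in cluster.yaml only matches nodes labeled role=storage-node. The DISCOVER_AGENT_NODE_AFFINITY setting in operator.yaml requires both role=storage-node and storage=rook-ceph. The following is a minimal sketch of a hypothetical filter, nodes_with_labels, that lists the nodes carrying every given label. It assumes that LABELS is the last column of `kubectl get nodes --show-labels`, as in the output above.

```shell
# Hypothetical helper: reads `kubectl get nodes --show-labels` output on stdin
# and prints the NAME of every node whose LABELS column (the last column)
# contains all key=value pairs passed as arguments.
nodes_with_labels() {
  awk -v want="$*" '
    NR == 1 { next }                          # skip the header row
    {
      labels = "," $NF ","                    # wrap in commas for exact matching
      n = split(want, req, " ")               # required key=value pairs
      for (i = 1; i <= n; i++)
        if (index(labels, "," req[i] ",") == 0) next
      print $1                                # node carries every label
    }
  '
}
```

For the output above, `kubectl get nodes --show-labels | nodes_with_labels role=storage-node storage=rook-ceph` prints node-1-231, node-1-232 and node-1-233. You can fix a node that is missing a label with `kubectl label node <node-name> role=storage-node storage=rook-ceph`.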
[root@master-1-230 examples]# kubectl apply -f cluster.yaml
cephcluster.ceph.rook.io/rook-ceph created
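Check the final state of the pods.
At first, no pods starting with rook-ceph-osd-prepare appeared. The listing below shows the final state, after they had run: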
[root@master-1-230 examples]# kubectl get pod -n rook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-959c2 2/2 Running 0 25m
csi-cephfsplugin-cd569 2/2 Running 0 25m
csi-cephfsplugin-provisioner-5c7bd896d6-cmpvp 5/5 Running 0 25m
csi-cephfsplugin-provisioner-5c7bd896d6-vk6bc 5/5 Running 0 25m
csi-cephfsplugin-r6g2b 2/2 Running 0 25m
csi-rbdplugin-7thfw 2/2 Running 0 25m
csi-rbdplugin-provisioner-5b7866fb56-t8258 5/5 Running 1 (24m ago) 25m
csi-rbdplugin-provisioner-5b7866fb56-x7gtn 5/5 Running 0 25m
csi-rbdplugin-vj5pw 2/2 Running 1 25m
csi-rbdplugin-whn7d 2/2 Running 0 25m
rook-ceph-crashcollector-node-1-231-5bcd77ff56-x64gf 1/1 Running 0 2m21s
rook-ceph-crashcollector-node-1-232-68df497f97-b44vf 1/1 Running 0 22m
rook-ceph-crashcollector-node-1-233-5ff5fdc9db-hg5v5 1/1 Running 0 2m20s
rook-ceph-mgr-a-59f4bf89f7-mrknn 3/3 Running 0 23m
rook-ceph-mgr-b-6b8b5445c6-lffmq 3/3 Running 1 (11m ago) 23m
rook-ceph-mon-a-95f46b898-fjzs8 2/2 Running 0 26m
rook-ceph-mon-b-67b76b6c8-xhx8x 2/2 Running 0 23m
rook-ceph-mon-c-6c79d7566-nwq8c 2/2 Running 0 23m
rook-ceph-operator-9864d576b-dghhn 1/1 Running 0 26m
rook-ceph-osd-0-799fb76cbc-g442n 2/2 Running 0 2m23s
rook-ceph-osd-1-55fc8f5cbf-wb8xm 2/2 Running 0 2m21s
rook-ceph-osd-2-5fc8688ddc-s6bvk 2/2 Running 0 2m20s
rook-ceph-osd-prepare-node-1-231-bjgrg 0/1 Completed 0 108s
rook-ceph-osd-prepare-node-1-232-tqtz9 0/1 Completed 0 105s
rook-ceph-osd-prepare-node-1-233-njpqw 0/1 Completed 0 102s
rook-ceph-tools-757999d6c7-n9nrl 1/1 Running 0 12m
rook-discover-2x5nj 1/1 Running 0 26m
rook-discover-b45gk 1/1 Running 0 26m
rook-discover-pmbwh 1/1 Running 0 26m
Investigating the cause:
1. Check the Rook Operator: first make sure the Rook Operator pod is running:
[root@master-1-230 examples]# kubectl get pods -n rook-ceph -l app=rook-ceph-operator
NAME READY STATUS RESTARTS AGE
rook-ceph-operator-9864d576b-mdj9r 1/1 Running 0 10m
2. Check the cluster status. Run the following command repeatedly to see the phases the Rook Ceph cluster goes through:
[root@master-1-230 examples]# kubectl get cephclusters -n rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL FSID
rook-ceph /var/lib/rook 3 7m38s Progressing Configuring Ceph Mons
[root@master-1-230 examples]# kubectl get cephclusters -n rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL FSID
rook-ceph /var/lib/rook 3 14m Progressing Configuring Ceph OSDs
[root@master-1-230 examples]# kubectl get cephclusters -n rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL FSID
rook-ceph /var/lib/rook 3 17m Progressing Configuring Ceph OSDs HEALTH_OK 29ac7b3c-9f78-4d1a-9874-838ba78d423a
[root@master-1-230 examples]# kubectl get cephclusters -n rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL FSID
rook-ceph /var/lib/rook 3 35m Ready Cluster created successfully HEALTH_WARN 29ac7b3c-9f78-4d1a-9874-838ba78d423a
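When Rook deploys Ceph, the cluster goes through several phases, mainly the following:
- Pending: Rook is waiting to act, typically for further instructions from the Operator or for Kubernetes resources to be created.
- Progressing: work is in progress, such as configuring the Monitors or creating OSDs. Rook carries out these configuration tasks automatically to bring the Ceph cluster up.
- Ready: all Ceph components have started and are running, so the cluster has been initialized and can be used. Ready describes the orchestration only. Ceph's own health is reported separately in the HEALTH column, which shows HEALTH_WARN in the last output above.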
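You do not have to re-run `kubectl get cephclusters` by hand until the phase changes; the wait can be scripted. The following is a minimal sketch of a hypothetical polling helper, wait_for_value, that re-runs any command until the command prints the expected value. The retry count and interval in the usage note are illustrative choices, not Rook defaults.

```shell
# Hypothetical helper: re-runs a command until its output equals WANT.
# Usage: wait_for_value WANT TRIES INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 once the value is seen, 1 after TRIES unsuccessful attempts.
wait_for_value() {
  local want="$1" tries="$2" interval="$3" i=1 current=""
  shift 3
  while [ "$i" -le "$tries" ]; do
    # A failing command (e.g. API server briefly unreachable) counts as "not yet".
    current=$("$@" 2>/dev/null) || current=""
    if [ "$current" = "$want" ]; then
      echo "reached ${want} after ${i} attempt(s)"
      return 0
    fi
    echo "attempt ${i}/${tries}: current value is '${current}'" >&2
    i=$((i + 1))
    if [ "$i" -le "$tries" ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

For this cluster, a possible invocation is `wait_for_value Ready 80 30 kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.status.phase}'`, which polls for up to about 40 minutes. The run above took about 35 minutes to reach Ready. The jsonpath assumes that the phase is stored in `.status.phase` of the CephCluster resource, which is where the PHASE column appears to come from.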
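In Rook, the "Configuring Ceph OSDs" message means that Rook is setting up the Ceph OSDs (Object Storage Daemons) so that each one is initialized and ready to join the Ceph cluster. This phase roughly involves the following:
- OSD preparation: a rook-ceph-osd-prepare pod runs on each storage node and prepares the configured disk with ceph-volume. With the BlueStore backend (--osd-store-type=bluestore in the log below), the disk is used directly and no filesystem is created on it.
- Starting the OSD daemons: once a disk is prepared, Rook starts a rook-ceph-osd-N pod for it. The OSD then joins the Ceph cluster and stores and serves data.
- Pools: this phase does not create data pools. They are defined separately, for example by the CephBlockPool in the RBD StorageClass section below.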
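The listing above shows the state after all pods have finished. The pods whose names start with rook-ceph-osd-prepare are one-shot jobs. They inspect each storage node's disks and prepare OSDs on the devices listed under storage.nodes (here sdb on each node), so Completed is their normal final state. Consider the cluster installed successfully when rook-ceph-osd-0, rook-ceph-osd-1 and rook-ceph-osd-2 exist and are Running and all the other pods above have started normally.
If these three OSD pods do not start, check the logs of the prepare pods: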
# Get the prepare pods in the cluster
[root@master-1-230 examples]# kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare
NAME READY STATUS RESTARTS AGE
rook-ceph-osd-prepare-node-1-231-bjgrg 0/1 Completed 0 2m50s
rook-ceph-osd-prepare-node-1-232-tqtz9 0/1 Completed 0 2m47s
rook-ceph-osd-prepare-node-1-233-njpqw 0/1 Completed 0 2m44s
# view the logs for the node of interest in the "provision" container
[root@master-1-230 examples]# kubectl -n rook-ceph logs rook-ceph-osd-prepare-node-1-231-bjgrg provision
2023-11-27 12:46:59.288379 I | cephcmd: desired devices to configure osds: [{Name:sdb OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}]
2023-11-27 12:46:59.289162 I | rookcmd: starting Rook v1.12.8 with arguments '/rook/rook ceph osd provision'
2023-11-27 12:46:59.289171 I | rookcmd: flag values: --cluster-id=7760cd16-0f50-452b-817e-b2a264ef3d34, --cluster-name=rook-ceph, --data-device-filter=, --data-device-path-filter=, --data-devices=[{"id":"sdb","storeConfig":{"osdsPerDevice":1}}], --encrypted-device=false, --force-format=false, --help=false, --location=, --log-level=DEBUG, --metadata-device=, --node-name=node-1-231, --osd-crush-device-class=, --osd-crush-initial-weight=, --osd-database-size=0, --osd-store-type=bluestore, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=false, --replace-osd=-1
2023-11-27 12:46:59.289176 I | ceph-spec: parsing mon endpoints: b=10.107.141.230:6789,c=10.111.113.147:6789,a=10.109.192.103:6789
2023-11-27 12:46:59.298174 I | op-osd: CRUSH location=root=default host=node-1-231
2023-11-27 12:46:59.298189 I | cephcmd: crush location of osd: root=default host=node-1-231
2023-11-27 12:46:59.298201 D | exec: Running command: dmsetup version
2023-11-27 12:46:59.303187 I | cephosd: Library version: 1.02.181-RHEL8 (2021-10-20)
Reference: https://rook.github.io/docs/rook/latest-release/Troubleshooting/ceph-common-issues/#investigation_4
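Reading every provision log in full is slow when several nodes are involved. The following is a minimal sketch that assumes the log format shown above: date, time, a one-letter level such as I or D, then `|`. rook_log_problems keeps only W (warning) and E (error) lines, and dump_prepare_problems applies it to the provision container of every prepare pod. Both function names are hypothetical. The W/E level letters are an assumption based on the I/D letters visible in the log.

```shell
# Hypothetical filter: keeps only warning (W) and error (E) lines from Rook
# logs formatted as "DATE TIME LEVEL | component: message", read on stdin.
rook_log_problems() {
  awk '$4 == "|" && ($3 == "W" || $3 == "E")'
}

# Hypothetical wrapper: prints the warnings/errors from the "provision"
# container of every rook-ceph-osd-prepare pod. Needs kubectl access.
dump_prepare_problems() {
  local pod
  for pod in $(kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare -o name); do
    echo "== ${pod}"
    kubectl -n rook-ceph logs "$pod" -c provision | rook_log_problems
  done
}
```

Run `dump_prepare_problems` from the master node. An empty section under a pod name means that no W or E lines were found for that node.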
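4. Installing Extensions
4.1 Deploying the Ceph Dashboard
The Ceph Dashboard is a built-in, web-based management and monitoring application that ships as part of the open-source Ceph distribution. It shows the basic status information of the Ceph cluster.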
cd rook/deploy/examples
[root@master-1-230 examples]# kubectl apply -f dashboard-external-https.yaml
service/rook-ceph-mgr-dashboard-external-https created
This creates a NodePort service, which makes the dashboard reachable from outside the cluster:
[root@master-1-230 examples]# kubectl get svc -n rook-ceph|grep dashboard
rook-ceph-mgr-dashboard ClusterIP 10.104.211.115 <none> 8443/TCP 7m55s
rook-ceph-mgr-dashboard-external-https NodePort 10.96.206.45 <none> 8443:31276/TCP 7s
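Open https://192.168.1.230:31276/ in a browser (the port is the NodePort shown in the service listing above).
Rook creates a default admin user and stores its password in a Secret named rook-ceph-dashboard-password in the namespace where Rook runs. Retrieve the password with the following command: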
[root@master-1-230 examples]# kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}"|base64 --decode && echo
TK,hG.8@;uMa7>}o67)4
4.2 Rook toolbox: a container with common tools for debugging and testing Rook
cd rook/deploy/examples
[root@master-1-230 examples]# kubectl apply -f toolbox.yaml -n rook-ceph
deployment.apps/rook-ceph-tools created
Exec into the rook-ceph-tools container and run ceph -s:
[root@master-1-230 ~]# kubectl exec -it `kubectl get pods -n rook-ceph|grep rook-ceph-tools|awk '{print $1}'` -n rook-ceph -- bash
bash-4.4$ ceph -s
  cluster:
    id:     1d32b39f-eaa1-43f0-a3f7-1b9ac1aa2865
    health: HEALTH_WARN
            2 daemons have recently crashed

  services:
    mon: 3 daemons, quorum a,b,c (age 25m)
    mgr: a(active, since 5m), standbys: b
    osd: 3 osds: 3 up (since 5m), 3 in (since 5m)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   62 MiB used, 150 GiB / 150 GiB avail
    pgs:     1 active+clean
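Summary:
Deploying Ceph in a Kubernetes cluster with Rook is an approach that can be used directly in production.
- If production runs on a public cloud, building your own Ceph cluster is not recommended.
- Deployment is only the first step; ongoing maintenance and tuning are what really matter.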
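III. Ceph-based Storage Solutions (Part 2)
3. StorageClasses based on RBD/CephFS
3.1 Deploy the RBD StorageClass
Ceph can provide object storage (RADOSGW), block storage (RBD) and file system storage (CephFS) at the same time.
RBD is short for RADOS Block Device. RBD block storage is the most stable and most commonly used storage type.
An RBD block device behaves like a disk and can be mounted.
RBD block devices support snapshots, multiple replicas, cloning and consistency; data is striped across multiple OSDs in the Ceph cluster. Note: for the file-system volumes used in this article, RBD supports only the ReadWriteOnce access mode!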
1) Create the StorageClass
cd rook/deploy/examples/csi/rbd
[root@master-1-230 rbd]# kubectl apply -f storageclass.yaml
cephblockpool.ceph.rook.io/replicapool created
storageclass.storage.k8s.io/rook-ceph-block created
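As the output shows, csi/rbd/storageclass.yaml creates two objects: a CephBlockPool named replicapool and a StorageClass named rook-ceph-block. The following is a trimmed sketch of what the file roughly contains (comments removed); field values are assumed to follow the upstream Rook example and may differ between Rook versions, so compare with your copy.

```yaml
# Ceph pool backing the RBD images: 3 replicas, each on a different host.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
# StorageClass that asks the RBD CSI driver to create images in replicapool.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com   # prefix = operator namespace
parameters:
  clusterID: rook-ceph                     # namespace of the Rook cluster
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete
```

With failureDomain: host and size: 3, the pool needs OSDs on at least three different hosts, which is why the prerequisites ask for three nodes.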
2) Check that the pool was created
[root@master-1-230 rbd]# kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash
bash-4.4$ ceph osd lspools
1 .mgr
2 replicapool
bash-4.4$
3) View the StorageClass
[root@master-1-230 2.4]# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client k8s-sigs.io/nfs-subdir-external-provisioner Delete Immediate false 46h
nfs-storageclass k8s-sigs.io/nfs-subdir-external-provisioner Retain Immediate false 46h
rook-ceph-block rook-ceph.rbd.csi.ceph.com Delete Immediate true 2m45s
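ALLOWVOLUMEEXPANSION is true for rook-ceph-block, so a bound claim from this class can be grown in place by raising spec.resources.requests.storage and re-applying the claim; claims can only grow, never shrink. A sketch using the my-mysql-data claim from the test step of this section, with 4Gi as an arbitrary example size:

```yaml
# rbd-pvc.yml with a larger request; re-apply with: kubectl apply -f rbd-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-mysql-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi          # was 2Gi; 4Gi is only an example value
  storageClassName: rook-ceph-block
```

The new size shows up in the CAPACITY column of kubectl get pvc once the resize has completed.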
4) Make rook-ceph-block the default StorageClass
[root@master-1-230 rbd]# kubectl patch storageclass rook-ceph-block -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/rook-ceph-block patched
After the change, check the StorageClasses again (rook-ceph-block is now marked "default"):
[root@master-1-230 2.4]# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client k8s-sigs.io/nfs-subdir-external-provisioner Delete Immediate false 46h
nfs-storageclass k8s-sigs.io/nfs-subdir-external-provisioner Retain Immediate false 46h
rook-ceph-block (default) rook-ceph.rbd.csi.ceph.com Delete Immediate true 7m49s
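With rook-ceph-block marked as default, a PVC that does not set storageClassName is provisioned by Ceph RBD. Only one StorageClass should carry the is-default-class annotation; if another class was already the default, set its annotation to "false". A sketch of such a claim, where the name default-sc-test is hypothetical:

```yaml
# PVC without storageClassName: the default StorageClass (rook-ceph-block) is used.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: default-sc-test     # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # storageClassName is omitted on purpose. Setting it to "" instead would
  # disable dynamic provisioning and only bind to a PV without a class.
```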
5) Test
Create a PVC with storageClassName set to rook-ceph-block:
[root@master-1-230 2.4]# cat rbd-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-mysql-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: rook-ceph-block
[root@master-1-230 2.4]# kubectl apply -f rbd-pvc.yml
persistentvolumeclaim/my-mysql-data created
[root@master-1-230 2.4]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-20a64f91-4637-4bf8-b33f-27597536f44f 1Gi RWX Retain Bound default/nginx-storage-test-pvc-nginx-storage-stat-1 nfs-storageclass 46h
pvc-5cb97247-3811-477c-9b29-5100456d244f 500Mi RWX Retain Bound default/test-pvc nfs-storageclass 46h
pvc-62ff0d61-f674-4fbb-843c-de8d314e0243 1Gi RWO Retain Bound default/www-nfs-web-0 nfs-storageclass 34h
pvc-68d96576-f53a-43b0-9ff7-ac95ca8c401b 500Mi RWX Retain Bound default/test-pvc01 nfs-storageclass 34h
pvc-6e7f4303-5969-42a2-89a8-46d02f91c6f8 2Gi RWO Delete Bound default/my-mysql-data rook-ceph-block 57s
pvc-d76f65f2-0637-404f-8d08-5f174a07630a 1Gi RWX Retain Bound default/nginx-storage-test-pvc-nginx-storage-stat-2 nfs-storageclass 46h
pvc-f1106903-53db-490b-ba5f-f5981b431063 1Gi RWX Retain Bound default/nginx-storage-test-pvc-nginx-storage-stat-0 nfs-storageclass 46h
[root@master-1-230 2.4]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
my-mysql-data Bound pvc-6e7f4303-5969-42a2-89a8-46d02f91c6f8 2Gi RWO rook-ceph-block 59s
nginx-storage-test-pvc-nginx-storage-stat-0 Bound pvc-f1106903-53db-490b-ba5f-f5981b431063 1Gi RWX nfs-storageclass 46h
nginx-storage-test-pvc-nginx-storage-stat-1 Bound pvc-20a64f91-4637-4bf8-b33f-27597536f44f 1Gi RWX nfs-storageclass 46h
nginx-storage-test-pvc-nginx-storage-stat-2 Bound pvc-d76f65f2-0637-404f-8d08-5f174a07630a 1Gi RWX nfs-storageclass 46h
test-pvc Bound pvc-5cb97247-3811-477c-9b29-5100456d244f 500Mi RWX nfs-storageclass 46h
test-pvc01 Bound pvc-68d96576-f53a-43b0-9ff7-ac95ca8c401b 500Mi RWX nfs-storageclass 34h
www-nfs-web-0 Bound pvc-62ff0d61-f674-4fbb-843c-de8d314e0243 1Gi RWO nfs-storageclass 34h
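The claim is named for a database, the typical RBD use case. A sketch of a single-instance MySQL Deployment consuming it; the image tag, the subPath and the plaintext root password are assumptions for a lab setup (use a Secret in practice). strategy: Recreate is used because an RWO RBD volume can be attached to only one node, so a rolling update could leave the new Pod waiting for the volume if it is scheduled on another node.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql                         # hypothetical name
spec:
  replicas: 1                         # RWO block volume: a single writer
  strategy:
    type: Recreate                    # stop the old Pod before starting the new one
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0            # assumed tag
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "changeme"       # lab only; use a Secret in production
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
              # Keep the datadir in a subdirectory instead of the volume root,
              # where the ext4 file system has a lost+found directory.
              subPath: mysql
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: my-mysql-data
```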
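3.2 Deploy the CephFS StorageClass
CephFS lets users mount a POSIX-compatible shared directory on multiple hosts, similar to an NFS share or a CIFS shared directory.
Create the file system: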
cd rook/deploy/examples
[root@master-1-230 examples]# kubectl apply -f filesystem.yaml
cephfilesystem.ceph.rook.io/myfs created
Confirm that the file system is up (the MDS pods are running):
[root@master-1-230 examples]# kubectl -n rook-ceph get pod -l app=rook-ceph-mds
NAME READY STATUS RESTARTS AGE
rook-ceph-mds-myfs-a-59ff4c8cbb-tdfnk 2/2 Running 0 85s
rook-ceph-mds-myfs-b-57db4d84fb-zsrvz 2/2 Running 0 84s
Check the detailed status of the file system:
[root@master-1-230 ~]# kubectl exec -it `kubectl get pods -n rook-ceph|grep rook-ceph-tools|awk '{print $1}'` -n rook-ceph -- bash
bash-4.4$ ceph status
  cluster:
    id:     1d32b39f-eaa1-43f0-a3f7-1b9ac1aa2865
    health: HEALTH_WARN
            2 daemons have recently crashed
            1 mgr modules have recently crashed

  services:
    mon: 3 daemons, quorum a,b,c (age 42m)
    mgr: a(active, since 40m), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 42m), 3 in (since 95m)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 81 pgs
    objects: 33 objects, 498 KiB
    usage:   82 MiB used, 150 GiB / 150 GiB avail
    pgs:     81 active+clean

  io:
    client:   853 B/s rd, 1 op/s rd, 0 op/s wr
bash-4.4$
1) Create the StorageClass
cd rook/deploy/examples/csi/cephfs
[root@master-1-230 cephfs]# kubectl apply -f storageclass.yaml
storageclass.storage.k8s.io/rook-cephfs created
2) View the StorageClass
[root@master-1-230 cephfs]# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client k8s-sigs.io/nfs-subdir-external-provisioner Delete Immediate false 46h
nfs-storageclass k8s-sigs.io/nfs-subdir-external-provisioner Retain Immediate false 46h
rook-ceph-block (default) rook-ceph.rbd.csi.ceph.com Delete Immediate true 13m
rook-cephfs rook-ceph.cephfs.csi.ceph.com Delete Immediate true 32s
CephFS is used the same way as RBD: just set storageClassName. Note that RBD supports only ReadWriteOnce here, whereas CephFS also supports ReadWriteMany.
3) Test
[root@master-1-230 2.4]# cat cephfs-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data-pvc
spec:
  accessModes:
    #- ReadWriteOnce
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: rook-cephfs
[root@master-1-230 2.4]# kubectl apply -f cephfs-pvc.yml
persistentvolumeclaim/redis-data-pvc created
Create a Pod that uses the PVC as its storage to verify that data persists:
[root@master-1-230 2.4]# cat test-cephfs-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  containers:
    - name: redis
      image: redis:4-alpine
      ports:
        - containerPort: 6379
          name: redisport
      volumeMounts:
        - mountPath: /data
          name: redis-pvc
  volumes:
    - name: redis-pvc
      persistentVolumeClaim:
        claimName: redis-data-pvc
Apply the YAML file:
[root@master-1-230 2.4]# kubectl apply -f test-cephfs-pod.yml
pod/redis created
Verify the CephFS storage:
[root@master-1-230 cephfs]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
my-mysql-data Bound pvc-6e7f4303-5969-42a2-89a8-46d02f91c6f8 2Gi RWO rook-ceph-block 66m
nginx-storage-test-pvc-nginx-storage-stat-0 Bound pvc-f1106903-53db-490b-ba5f-f5981b431063 1Gi RWX nfs-storageclass 47h
nginx-storage-test-pvc-nginx-storage-stat-1 Bound pvc-20a64f91-4637-4bf8-b33f-27597536f44f 1Gi RWX nfs-storageclass 47h
nginx-storage-test-pvc-nginx-storage-stat-2 Bound pvc-d76f65f2-0637-404f-8d08-5f174a07630a 1Gi RWX nfs-storageclass 47h
redis-data-pvc Bound pvc-7e2ef016-b412-4d3b-93ee-fc89cd0b0ac5 2Gi RWX rook-cephfs 60m
test-pvc Bound pvc-5cb97247-3811-477c-9b29-5100456d244f 500Mi RWX nfs-storageclass 47h
test-pvc01 Bound pvc-68d96576-f53a-43b0-9ff7-ac95ca8c401b 500Mi RWX nfs-storageclass 35h
www-nfs-web-0 Bound pvc-62ff0d61-f674-4fbb-843c-de8d314e0243 1Gi RWO nfs-storageclass 35h
[root@master-1-230 cephfs]#
[root@master-1-230 cephfs]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-20a64f91-4637-4bf8-b33f-27597536f44f 1Gi RWX Retain Bound default/nginx-storage-test-pvc-nginx-storage-stat-1 nfs-storageclass 47h
pvc-5cb97247-3811-477c-9b29-5100456d244f 500Mi RWX Retain Bound default/test-pvc nfs-storageclass 47h
pvc-62ff0d61-f674-4fbb-843c-de8d314e0243 1Gi RWO Retain Bound default/www-nfs-web-0 nfs-storageclass 35h
pvc-68d96576-f53a-43b0-9ff7-ac95ca8c401b 500Mi RWX Retain Bound default/test-pvc01 nfs-storageclass 35h
pvc-6e7f4303-5969-42a2-89a8-46d02f91c6f8 2Gi RWO Delete Bound default/my-mysql-data rook-ceph-block 66m
pvc-7e2ef016-b412-4d3b-93ee-fc89cd0b0ac5 2Gi RWX Delete Bound default/redis-data-pvc rook-cephfs 7m20s
pvc-d76f65f2-0637-404f-8d08-5f174a07630a 1Gi RWX Retain Bound default/nginx-storage-test-pvc-nginx-storage-stat-2 nfs-storageclass 47h
pvc-f1106903-53db-490b-ba5f-f5981b431063 1Gi RWX Retain Bound default/nginx-storage-test-pvc-nginx-storage-stat-0 nfs-storageclass
[root@master-1-230 cephfs]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nfs-web-0 1/1 Running 4 (132m ago) 35h
redis 1/1 Running 0 59m
[root@master-1-230 ~]# kubectl exec -it redis -- sh
/data # redis-cli
127.0.0.1:6379> set mykey "k8syyds"
OK
127.0.0.1:6379> get mykey
"k8syyds"
127.0.0.1:6379> BGSAVE
Background saving started
127.0.0.1:6379>
127.0.0.1:6379> exit
/data # ls
dump.rdb
/data # exit
# Delete the Pod
[root@master-1-230 2.4]# kubectl delete -f test-cephfs-pod.yml
pod "redis" deleted
[root@master-1-230 2.4]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nfs-web-0 1/1 Running 4 (135m ago) 35h
[root@master-1-230 2.4]#
# Create the Pod again
[root@master-1-230 2.4]# kubectl apply -f test-cephfs-pod.yml
pod/redis created
# Verify that the data persisted
[root@master-1-230 2.4]# kubectl exec -it redis -- sh
/data # ls
dump.rdb
/data # redis-cli
127.0.0.1:6379> get mykey
"k8syyds"
127.0.0.1:6379>
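The test above uses a single Pod. Because rook-cephfs supports ReadWriteMany, several Pods on different nodes can mount the same CephFS volume at the same time. A sketch with a hypothetical claim shared-web-pvc and a three-replica nginx Deployment; all names and the image tag are assumptions:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-web-pvc          # hypothetical name
spec:
  accessModes:
    - ReadWriteMany             # supported by rook-cephfs, not by rook-ceph-block here
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shared-web              # hypothetical name
spec:
  replicas: 3                   # every replica mounts the same CephFS volume
  selector:
    matchLabels:
      app: shared-web
  template:
    metadata:
      labels:
        app: shared-web
    spec:
      containers:
        - name: nginx
          image: nginx:1.25-alpine   # assumed tag
          ports:
            - containerPort: 80
          volumeMounts:
            - name: html
              mountPath: /usr/share/nginx/html
      volumes:
        - name: html
          persistentVolumeClaim:
            claimName: shared-web-pvc
```

A file written into /usr/share/nginx/html from one replica (for example via kubectl exec) is visible in the other replicas, which is what distinguishes CephFS from RBD in this setup.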
Reference documentation: https://rook.io/docs/rook/latest-release/Storage-Configuration/Shared-Filesystem-CephFS/filesystem-storage/
Filesystem Storage Overview
A filesystem storage (also named shared filesystem) can be mounted with read/write permission from multiple pods. This may be useful for applications which can be clustered using a shared filesystem.
This example runs a shared filesystem for the kube-registry.
Prerequisites
This guide assumes you have created a Rook cluster as explained in the main quickstart guide
Multiple Filesystems Support
Multiple filesystems are supported as of the Ceph Pacific release.
Create the Filesystem
Create the filesystem by specifying the desired settings for the metadata pool, data pools, and metadata server in the CephFilesystem CRD. In this example we create the metadata pool with replication of three and a single data pool with replication of three. For more options, see the documentation on creating shared filesystems.
Save this shared filesystem definition as filesystem.yaml:
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - name: replicated
      replicated:
        size: 3
  preserveFilesystemOnDelete: true
  metadataServer:
    activeCount: 1
    activeStandby: true
The Rook operator will create all the pools and other resources necessary to start the service. This may take a minute to complete.
# Create the filesystem
kubectl create -f filesystem.yaml
[...]
To confirm the filesystem is configured, wait for the mds pods to start:
$ kubectl -n rook-ceph get pod -l app=rook-ceph-mds
NAME READY STATUS RESTARTS AGE
rook-ceph-mds-myfs-7d59fdfcf4-h8kw9 1/1 Running 0 12s
rook-ceph-mds-myfs-7d59fdfcf4-kgkjp 1/1 Running 0 12s
To see detailed status of the filesystem, start and connect to the Rook toolbox. A new line will be shown with ceph status for the mds service. In this example, there is one active instance of MDS which is up, with one MDS instance in standby-replay mode in case of failover.
$ ceph status
[...]
services:
mds: myfs-1/1/1 up {[myfs:0]=mzw58b=up:active}, 1 up:standby-replay
Provision Storage
Before Rook can start provisioning storage, a StorageClass needs to be created based on the filesystem. This is needed for Kubernetes to interoperate with the CSI driver to create persistent volumes.
Save this storage class definition as storageclass.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # clusterID is the namespace where the rook cluster is running
  # If you change this namespace, also change the namespace below where the secret namespaces are defined
  clusterID: rook-ceph
  # CephFS filesystem name into which the volume shall be created
  fsName: myfs
  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-replicated
  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
If you've deployed the Rook operator in a namespace other than "rook-ceph", change the prefix in the provisioner to match the namespace you used. For example, if the Rook operator is running in "rook-op", the provisioner value should be "rook-op.cephfs.csi.ceph.com".
Create the storage class.
kubectl create -f deploy/examples/csi/cephfs/storageclass.yaml
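3.3 Use cases
CephFS:
Pros:
- Low read latency and good I/O bandwidth, especially for large files
- Highly flexible; supports all Kubernetes access modes
- Supports ReadWriteMany
Cons:
- Only average small-file read/write performance; high write latency
Suitable for:
- Workloads that need flexibility (mounting on multiple Kubernetes nodes)
- File reads and writes that are not sensitive to I/O latency, or storage that does not hold huge numbers of small files; commonly used as the storage backend that applications and middleware mount
Ceph RBD:
Pros:
- Good I/O bandwidth
- Very low read/write latency
- Supports image snapshots and image export
Cons:
- No multi-node mounting (ReadWriteOnce only)
Suitable for:
- Applications with high I/O bandwidth and latency requirements that do not need several nodes reading and writing the same data at once, for example databases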
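RBD's snapshot support is exposed to Kubernetes through the CSI snapshot API. The sketch below is modeled on Rook's csi/rbd snapshot examples and assumes that the VolumeSnapshot CRDs and the snapshot controller (external-snapshotter) are already installed in the cluster; the snapshot name is hypothetical and my-mysql-data is the RBD claim created earlier.

```yaml
# Snapshot class for the RBD CSI driver (same "rook-ceph" prefix rule as the provisioner).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbdplugin-snapclass
driver: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph                   # namespace of the Rook cluster
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph
deletionPolicy: Delete
---
# Point-in-time snapshot of an existing RBD-backed PVC.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-mysql-data-snap               # hypothetical name
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: my-mysql-data
```

To restore, create a new PVC with storageClassName rook-ceph-block and a dataSource of kind VolumeSnapshot (apiGroup snapshot.storage.k8s.io) pointing at the snapshot.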
Cleaning up Ceph data:
OSD pods are failing to start
Symptoms
- OSD pods are failing to start
- You have started a cluster after tearing down another cluster
Investigation
When an OSD starts, the device or directory will be configured for consumption. If there is an error with the configuration, the pod will crash and you will see the CrashLoopBackOff status for the pod. Look in the osd pod logs for an indication of the failure.
$ kubectl -n rook-ceph logs rook-ceph-osd-fl8fs
...
One common case for failure is that you have re-deployed a test cluster and some state may remain from a previous deployment. If your cluster is larger than a few nodes, you may get lucky enough that the monitors were able to start and form quorum. However, now the OSDs pods may fail to start due to the old state. Looking at the OSD pod logs you will see an error about the file already existing.
$ kubectl -n rook-ceph logs rook-ceph-osd-fl8fs
...
2017-10-31 20:13:11.187106 I | mkfs-osd0: 2017-10-31 20:13:11.186992 7f0059d62e00 -1 bluestore(/var/lib/rook/osd0) _read_fsid unparsable uuid
2017-10-31 20:13:11.187208 I | mkfs-osd0: 2017-10-31 20:13:11.187026 7f0059d62e00 -1 bluestore(/var/lib/rook/osd0) _setup_block_symlink_or_file failed to create block symlink to /dev/disk/by-partuuid/651153ba-2dfc-4231-ba06-94759e5ba273: (17) File exists
2017-10-31 20:13:11.187233 I | mkfs-osd0: 2017-10-31 20:13:11.187038 7f0059d62e00 -1 bluestore(/var/lib/rook/osd0) mkfs failed, (17) File exists
2017-10-31 20:13:11.187254 I | mkfs-osd0: 2017-10-31 20:13:11.187042 7f0059d62e00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (17) File exists
2017-10-31 20:13:11.187275 I | mkfs-osd0: 2017-10-31 20:13:11.187121 7f0059d62e00 -1 ** ERROR: error creating empty object store in /var/lib/rook/osd0: (17) File exists
Solution
If the error is from the file that already exists, this is a common problem reinitializing the Rook cluster when the local directory used for persistence has not been purged. This directory is the dataDirHostPath setting in the cluster CRD and is typically set to /var/lib/rook. To fix the issue you will need to delete all components of Rook and then delete the contents of /var/lib/rook (or the directory specified by dataDirHostPath) on each of the hosts in the cluster. Then when the cluster CRD is applied to start a new cluster, the rook-operator should start all the pods as expected.
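Both settings involved here live in the CephCluster resource. The partial sketch below shows only the relevant fields (required fields such as cephVersion and mon are omitted, so it cannot be applied on its own): dataDirHostPath is the host directory that must be emptied on every node before reinstalling, and cleanupPolicy asks Rook to wipe it during teardown. The sanitizeDisks values are illustrative; set the confirmation only when you really intend to destroy the cluster.

```yaml
# Partial CephCluster spec: only the cleanup-related fields are shown.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # Host directory holding mon/OSD state; it must be empty on every node
  # before a new cluster is created with the same path.
  dataDirHostPath: /var/lib/rook
  cleanupPolicy:
    # Only this exact string enables host cleanup when the CephCluster is deleted.
    confirmation: "yes-really-destroy-data"
    sanitizeDisks:
      method: quick       # "quick" or "complete"
      dataSource: zero    # "zero" or "random"
      iteration: 1
```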
OSD pods are not created on my devices
Symptoms
- No OSD pods are started in the cluster
- Devices are not configured with OSDs even though specified in the Cluster CRD
- One OSD pod is started on each node instead of multiple pods for each device
Investigation
First, ensure that you have specified the devices correctly in the CRD. The Cluster CRD has several ways to specify the devices that are to be consumed by the Rook storage:
- useAllDevices: true: Rook will consume all devices it determines to be available
- deviceFilter: Consume all devices that match this regular expression
- devices: Explicit list of device names on each node to consume
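For the lab in this article, where each worker has one raw disk sdb (the OSD prepare log earlier shows --data-devices=[{"id":"sdb",...}]), the storage section of the CephCluster could look like the partial sketch below. Two alternatives are shown; the node names come from this environment and must match each node's kubernetes.io/hostname label.

```yaml
# Partial CephCluster spec (storage section only).
spec:
  storage:
    # Option A: use every node, but only devices whose name matches the regex.
    useAllNodes: true
    useAllDevices: false
    deviceFilter: "^sdb$"
    # Option B (instead of A): list nodes and devices explicitly.
    # useAllNodes: false
    # useAllDevices: false
    # nodes:
    #   - name: "node-1-231"
    #     devices:
    #       - name: "sdb"
    #   - name: "node-1-232"
    #     devices:
    #       - name: "sdb"
    #   - name: "node-1-233"
    #     devices:
    #       - name: "sdb"
```

Either way, a disk that still carries partitions or a file system is skipped, as the log lines below illustrate.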
Second, if Rook determines that a device is not available (has existing partitions or a formatted filesystem), Rook will skip consuming the devices. If Rook is not starting OSDs on the devices you expect, Rook may have skipped it for this reason. To see if a device was skipped, view the OSD preparation log on the node where the device was skipped. Note that it is completely normal and expected for OSD prepare pod to be in the completed state. After the job is complete, Rook leaves the pod around in case the logs need to be investigated.
# Get the prepare pods in the cluster
$ kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare
NAME READY STATUS RESTARTS AGE
rook-ceph-osd-prepare-node1-fvmrp 0/1 Completed 0 18m
rook-ceph-osd-prepare-node2-w9xv9 0/1 Completed 0 22m
rook-ceph-osd-prepare-node3-7rgnv 0/1 Completed 0 22m
# view the logs for the node of interest in the "provision" container
$ kubectl -n rook-ceph logs rook-ceph-osd-prepare-node1-fvmrp provision
[...]
Here are some key lines to look for in the log:
# A device will be skipped if Rook sees it has partitions or a filesystem
2019-05-30 19:02:57.353171 W | cephosd: skipping device sda that is in use
2019-05-30 19:02:57.452168 W | skipping device "sdb5": ["Used by ceph-disk"]
# Other messages about a disk being unusable by ceph include:
Insufficient space (<5GB) on vgs
Insufficient space (<5GB)
LVM detected
Has BlueStore device label
locked
read-only
# A device is going to be configured
2019-05-30 19:02:57.535598 I | cephosd: device sdc to be configured by ceph-volume
# For each device configured you will see a report printed to the log
2019-05-30 19:02:59.844642 I | Type Path LV Size % of device
2019-05-30 19:02:59.844651 I | ----------------------------------------------------------------------------------------------------
2019-05-30 19:02:59.844677 I | [data] /dev/sdc 7.00 GB 100%
Solution
Either update the CR with the correct settings, or clean the partitions or filesystem from your devices. To clean devices from a previous install see the cleanup guide.
After the settings are updated or the devices are cleaned, trigger the operator to analyze the devices again by restarting the operator. Each time the operator starts, it will ensure all the desired devices are configured. The operator does automatically deploy OSDs in most scenarios, but an operator restart will cover any scenarios that the operator doesn't detect automatically.
# Restart the operator to ensure devices are configured. A new pod will automatically be started when the current operator pod is deleted.
$ kubectl -n rook-ceph delete pod -l app=rook-ceph-operator
[...]
Node hangs after reboot
This issue is fixed in Rook v1.3 or later.
For troubleshooting, see the official documentation: https://rook.github.io/docs/rook/latest-release/Troubleshooting/ceph-common-issues/#investigation_3