Deploying Ceph with Rook

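I. Terminology

OSD: the storage daemon attached directly to a physical disk or directory on each cluster node. The OSDs provide the cluster's replication, high availability, and fault tolerance.
MON: the cluster monitor. Every node in the cluster reports to the MONs, which record the cluster topology and the information about where data is stored.
MDS: the metadata server, which tracks the file hierarchy and stores the metadata for CephFS.
RGW: the RADOS Gateway, which exposes a RESTful API for object storage.
MGR: the manager daemon, which provides additional monitoring and interfaces.

Official requirement: a cluster needs at least one Ceph Monitor and two OSD daemons.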

II. Deployment steps

1. Pull the images

# Every node in the Ceph cluster must pull all of these images; nodes outside the Ceph cluster only need csi-node-driver-registrar:v2.0.1
docker pull ceph/ceph:v15.2.5
docker pull rook/ceph:v1.5.1
docker pull registry.aliyuncs.com/it00021hot/cephcsi:v3.1.2
docker pull registry.aliyuncs.com/it00021hot/csi-node-driver-registrar:v2.0.1
docker pull registry.aliyuncs.com/it00021hot/csi-attacher:v3.0.0
docker pull registry.aliyuncs.com/it00021hot/csi-provisioner:v2.0.0
docker pull registry.aliyuncs.com/it00021hot/csi-snapshotter:v3.0.0
docker pull registry.aliyuncs.com/it00021hot/csi-resizer:v1.0.0

# Retag the images with the names Rook expects
docker tag registry.aliyuncs.com/it00021hot/csi-snapshotter:v3.0.0 k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.0
docker tag registry.aliyuncs.com/it00021hot/csi-resizer:v1.0.0 k8s.gcr.io/sig-storage/csi-resizer:v1.0.0
docker tag registry.aliyuncs.com/it00021hot/cephcsi:v3.1.2 quay.io/cephcsi/cephcsi:v3.1.2
docker tag registry.aliyuncs.com/it00021hot/csi-node-driver-registrar:v2.0.1 k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.1
docker tag registry.aliyuncs.com/it00021hot/csi-attacher:v3.0.0 k8s.gcr.io/sig-storage/csi-attacher:v3.0.0
docker tag registry.aliyuncs.com/it00021hot/csi-provisioner:v2.0.0 k8s.gcr.io/sig-storage/csi-provisioner:v2.0.0

2. Clone the project

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git

3. Remove the taint from the master nodes so that pods can be scheduled on them

kubectl get no -o yaml | grep taint -A 5
kubectl taint nodes --all node-role.kubernetes.io/master-

4. Install the Ceph cluster (run on the master1 node)

Change to the directory that holds the Ceph manifests
cd rook/cluster/examples/kubernetes/ceph/
kubectl create -f crds.yaml -f common.yaml 
kubectl create -f operator.yaml

Check the creation status
[root@k8s-master ceph]# kubectl -n rook-ceph get pod -o wide

Note: in my test only a single rook-ceph-operator pod appeared here, which differs slightly from the reference document.

5. Label the OSD nodes

I labeled master1 and master2:

kubectl label nodes k8s-master1 ceph-osd=enabled
kubectl label nodes k8s-master2 ceph-osd=enabled

6. Edit cluster.yaml, mainly to set the disks each node contributes

First use lsblk -f to find the names of the empty disks.
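
Picking the empty disks out of the lsblk listing by eye is error-prone, so here is a minimal sketch of a filter for it. It assumes the listing is produced with lsblk -P -o NAME,TYPE,FSTYPE,PKNAME; the function name empty_disks is made up for this example. It prints whole disks that carry no file system signature and have no partitions, which is only a first approximation of what Rook will accept.

```shell
# Hypothetical helper: reads `lsblk -P -o NAME,TYPE,FSTYPE,PKNAME` output on
# stdin and prints the names of disks with no file system and no partitions.
empty_disks() {
  awk '
    {
      name = ""; type = ""; fstype = ""; pk = ""
      for (i = 1; i <= NF; i++) {
        n = index($i, "=")
        key = substr($i, 1, n - 1)
        val = substr($i, n + 1)
        gsub(/"/, "", val)            # strip the quotes around the value
        if (key == "NAME") name = val
        else if (key == "TYPE") type = val
        else if (key == "FSTYPE") fstype = val
        else if (key == "PKNAME") pk = val
      }
      if (pk != "") has_child[pk] = 1  # remember every parent device
      if (type == "disk" && fstype == "") candidates[++count] = name
    }
    END {
      for (i = 1; i <= count; i++)
        if (!(candidates[i] in has_child)) print candidates[i]
    }
  '
}
```

Run it as lsblk -P -o NAME,TYPE,FSTYPE,PKNAME | empty_disks, and double-check the result before putting a device name into cluster.yaml.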

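1) Edit the storage section
In the storage settings, useAllNodes and useAllDevices both default to true, which means Rook would take over and initialize every usable device on every node in the cluster. Set both values to false.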

storage: # cluster level storage configuration and selection
 useAllNodes: false     # do not use every node
 useAllDevices: false   # do not use every device
 nodes:
 - name: "k8s-master1"  # host name of the storage node
   devices:
   - name: "sdb"    # use the disk /dev/sdb
 - name: "k8s-master2"
   devices:
   - name: "sda"

2) Switch the network to host mode to work around a bug that prevents CephFS PVCs from being used (skip this change if you do not use CephFS)

  network:
    provider: host

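An official example in which a single node contributes several disks to Ceph:

To expand later, add the new disks to the configuration file and apply it again.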

https://rook.io/docs/rook/v1.5/ceph-cluster-crd.html

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v15.2.11
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  dashboard:
    enabled: true
  # cluster level storage configuration and selection
  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter:
    config:
      metadataDevice:
      databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger)
    nodes:
    - name: "172.17.4.201"
      devices:  # specific devices to use for storage can be specified for each node
      - name: "sdb" # Whole storage device
      - name: "sdc1" # One specific partition. Should not have a file system on it.
      - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # both device name and explicit udev links are supported
      config:         # configuration can be specified at the node level which overrides the cluster level config
        storeType: bluestore
    - name: "172.17.4.301"
      deviceFilter: "^sd."

7. Apply cluster.yaml

kubectl apply -f cluster.yaml

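8. Expose ceph-dashboard externally

Changing the type of the rook-ceph-mgr-dashboard service to NodePort still did not make the dashboard reachable, and after I deleted the type field the service could no longer be found. Nothing else I tried worked, so I had to redeploy. The steps were:

# Delete the services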
kubectl delete -f cluster.yaml
kubectl delete -f operator.yaml
kubectl delete -f crds.yaml -f common.yaml

# Clean the disks
See the post "rook(v0.9)+ceph cleanup process notes":
https://blog.csdn.net/dazuiba008/article/details/90053860
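
The linked post targets Rook v0.9. As a rough sketch for the v1.5 setup used here, the per-node cleanup described in Rook's teardown documentation comes down to removing the dataDirHostPath directory and wiping each OSD disk. The helper below only prints those commands so they can be reviewed before anything destructive is run; the name print_cleanup_plan is hypothetical, and /var/lib/rook and /dev/sdb are simply the values used elsewhere in these notes.

```shell
# Hypothetical dry-run helper: prints (does not execute) the cleanup commands
# for one node, given the dataDirHostPath and one OSD disk.
print_cleanup_plan() {
  data_dir=$1
  disk=$2
  case "$disk" in
    /dev/*) ;;
    *) echo "disk must be a /dev/... path: $disk" >&2; return 1 ;;
  esac
  # Remove the Rook state directory left on the host.
  printf 'rm -rf %s\n' "$data_dir"
  # Destroy the partition table, then overwrite the start of the disk.
  printf 'sgdisk --zap-all %s\n' "$disk"
  printf 'dd if=/dev/zero of=%s bs=1M count=100 oflag=direct,dsync\n' "$disk"
}
```

For example, print_cleanup_plan /var/lib/rook /dev/sdb lists the three commands for k8s-master1. Run them by hand on that node only after confirming the device name.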

9. Expose ceph-dashboard externally again

# This YAML file is included in the cloned directory
kubectl create -f dashboard-external-https.yaml

10. Get the dashboard login password

The user is admin. Retrieve the password as follows:
[root@k8s-master ceph]# kubectl get secrets -n rook-ceph rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d
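
The values under .data in a Kubernetes Secret are base64-encoded, which is why the command pipes into base64 -d. The decoded value has no trailing newline, so it can run straight into the shell prompt. A small sketch of a wrapper that adds the newline; decode_field is a made-up name.

```shell
# Hypothetical helper: decodes a base64 value read from stdin and ends the
# output with a newline so the result is not glued to the next prompt.
decode_field() {
  base64 -d
  echo
}
```

It would be used as kubectl get secrets -n rook-ceph rook-ceph-dashboard-password -o jsonpath='{.data.password}' | decode_field.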

Open https://10.5.86.9:31157

User: admin

Password: [;oX}4/Q?<1AK^\HXTSX

11. Install the toolbox, which contains common tools for debugging and testing Rook

kubectl apply -f toolbox.yaml
Once the toolbox pod is running, use the following command to get a shell inside it:

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash

Commands you may need:

ceph status
ceph osd status
ceph df
rados df

12. Create the pool and StorageClass

# Only two servers have disks attached for the Ceph cluster, so change the replica count in the file to 2
kubectl apply -f csi/rbd/storageclass.yaml
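
The replica count can be edited by hand, or rewritten on the fly. The sketch below assumes that the pool's replicated size is the only line in the file that starts with "size:" (check this in your copy of storageclass.yaml); with_replica_size is a hypothetical helper name.

```shell
# Hypothetical helper: copies a manifest from stdin to stdout, replacing the
# number on any line of the form "size: N" with the requested replica count.
with_replica_size() {
  replicas=$1
  sed -E "s/^([[:space:]]*size:[[:space:]]*)[0-9]+/\1${replicas}/"
}
```

One way to use it is with_replica_size 2 < csi/rbd/storageclass.yaml | kubectl apply -f -, which leaves the file on disk untouched.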

13. Enable Prometheus monitoring afterwards

If you forgot to enable Prometheus monitoring in cluster.yaml during deployment, run the following in the toolbox container:

ceph mgr module enable prometheus
ceph mgr services

References:
https://blog.51cto.com/foxhound/2553979 (covers installing Prometheus monitoring)

1) Install the Prometheus operator

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.40.0/bundle.yaml

2) Install Prometheus

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph/monitoring
kubectl create -f service-monitor.yaml
kubectl create -f prometheus.yaml

kubectl create -f prometheus-service.yaml

https://www.cnblogs.com/deny/p/14229987.html

https://blog.csdn.net/vic_qxz/article/details/119512240 (automatic discovery of newly added disks)

14. Testing Ceph monitoring with Zabbix

https://baijiahao.baidu.com/s?id=1690836335544859492&wfr=spider&for=pc

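15. Resource customization

OSD: about 4 GB of memory is recommended per TB of disk. It is best to set limits and requests to the same values, which gives the pod a higher QoS class. If memory was not limited in cluster.yaml beforehand, it can be limited in the corresponding OSD deployment. Note that the requests in the fragment below are lower than the limits; raise them to match the limits to follow the advice above.

        name: osd
        resources:
          limits:
            cpu: "1"
            memory: 4Gi
          requests:
            cpu: 300m
            memory: 256Mi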
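
The sizing rule above is simple arithmetic. As a sketch, treating "4G per T" as 4 GiB per whole TB of disk (an assumption about the units), the memory value for an OSD could be derived like this; osd_memory_gib is a made-up name.

```shell
# Hypothetical helper: prints a Kubernetes memory quantity for an OSD that
# backs a disk of the given size in whole TB, at 4 GiB per TB.
osd_memory_gib() {
  disk_tb=$1
  echo "$((disk_tb * 4))Gi"
}
```

For a 2 TB disk, osd_memory_gib 2 prints 8Gi, which would go into both the limits and the requests of the osd container.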

16. Expanding OSDs

There are two ways: add more OSDs on the existing servers, or add extra nodes. In both cases, edit cluster.yaml:

nodes:
- name: "node-1"
  devices:
  - name: "vdb"
  - name: "vdc"  # newly added disk
    config:
      storeType: bluestore # the bluestore storage engine can speed up storage
      journalSizeMB: "4096"
- name: "node-2"
  devices:
  - name: "vdb"
  - name: "/dev/vdc"
    config:
      storeType: bluestore
      journalSizeMB: "4096"

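Then run kubectl apply -f cluster.yaml

After an expansion the cluster rebalances automatically and migrates data, so it is best to add only one OSD at a time to avoid I/O problems.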
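
One way to pace the expansion is to wait until the cluster reports HEALTH_OK before adding the next OSD. A minimal sketch, meant to run inside the toolbox pod where the ceph command is available; wait_for_health_ok and its arguments are hypothetical, and the health command can be swapped out for testing.

```shell
# Hypothetical helper: polls a health command until its output starts with
# HEALTH_OK. Usage: wait_for_health_ok ATTEMPTS INTERVAL_SECONDS [COMMAND...]
# With no COMMAND given it runs `ceph health`.
wait_for_health_ok() {
  attempts=$1
  interval=$2
  shift 2
  [ "$#" -gt 0 ] || set -- ceph health
  i=0
  while [ "$i" -lt "$attempts" ]; do
    case "$("$@" 2>/dev/null)" in
      HEALTH_OK*) return 0 ;;   # rebalancing has finished
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 1                      # still not healthy after all attempts
}
```

For example, wait_for_health_ok 60 10 checks every 10 seconds for up to 10 minutes and returns non-zero if the cluster has not settled by then.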

17. Configuring BlueStore acceleration (I do not fully understand this part)

nodes:
- name: "node-1"
  devices:
  - name: "vdb"
  - name: "vdc"  # newly added disk
    config:
      storeType: "bluestore" # the bluestore storage engine can speed up storage
      metadataDevice: "/dev/vdd"
      databaseSizeMB: "4096"
      walSizeMB: "4096"
