Cloud Native Study Notes - DAY6

Published 2023-06-15 14:30:26, Author: jack_028


1 Using Velero with minio to back up and restore Kubernetes cluster data

1.1 Velero overview

1.1.1 A brief introduction to Velero

  • Velero is an open-source cloud-native disaster recovery and migration tool from VMware, written in Go. It can safely back up, restore, and migrate Kubernetes cluster resources and data. Website: https://velero.io/.

  • Velero is Spanish for "sailboat", which fits the Kubernetes community's naming style. Heptio, the company that originally developed Velero, has been acquired by VMware.

  • Velero works with standard Kubernetes clusters, both private and public cloud. Besides disaster recovery it can also migrate resources, i.e. move containerized applications from one cluster to another.

  • Velero works by backing up Kubernetes data to object storage for high availability and persistence; the default backup retention period is 720 hours. Backups can be downloaded and restored when needed (a small TTL example is sketched below).
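
As a small illustration (not from the original notes; the backup name is arbitrary and the velero-system namespace matches the one created later in this document), the retention period can be overridden per backup with the --ttl flag:

#keep this backup for 7 days instead of the default 720h
velero backup create demo-backup --ttl 168h0m0s --namespace velero-system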

1.1.2 Differences between Velero and etcd snapshot backups

  • An etcd snapshot is a full backup of the whole cluster (similar to a full MySQL dump). Even if you only need to restore a single resource object (similar to restoring a single MySQL database), you still have to roll the entire cluster back to the snapshot state (similar to a full MySQL restore), which affects pods and services running in other namespaces (similar to affecting data in other MySQL databases).

  • Velero can back up selectively, for example a single namespace or individual resource objects, and during a restore it can restore just that namespace or resource object without affecting pods and services in other namespaces.

  • Velero supports object storage such as ceph and OSS, while an etcd snapshot is a local file.

  • Velero has built-in scheduled (periodic) backups; etcd snapshots can also be scheduled, e.g. with a CronJob (a minimal manual snapshot command is sketched after this list).

  • Velero supports creating and restoring AWS EBS snapshots.

    https://www.qloudx.com/velero-for-kubernetes-backup-restore-stateful-workloads-with-aws-ebs-snapshots/

    https://github.com/vmware-tanzu/velero-plugin-for-aws #Elastic Block Store
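
For comparison, a minimal manual etcd snapshot looks like the sketch below; the endpoint and certificate paths are assumptions for a typical self-hosted etcd and are not taken from this environment:

#take a local etcd snapshot (endpoint and certificate paths are illustrative)
ETCDCTL_API=3 etcdctl snapshot save /data/backup/etcd-snapshot-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/etcd.pem \
  --key=/etc/kubernetes/ssl/etcd-key.pem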

1.1.3 Velero architecture

(screenshot: Velero architecture diagram)

1.1.4 Velero backup workflow

  • The Velero client calls the Kubernetes API server to create a Backup task.

  • The Backup controller (the Velero server) learns about the backup task from the API server via the watch mechanism.

  • The Backup controller (the Velero server) starts the backup and queries the API server for the data to be backed up.

  • The Backup controller (the Velero server) uploads the collected data to the configured object storage server.

(screenshot: Velero backup workflow diagram)
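
The Backup task created in the first step is an ordinary custom resource, so besides the velero CLI it can also be inspected with kubectl; a quick sketch (assumes the velero-system namespace used later in this document, and <backup-name> is a placeholder):

#list and inspect Backup custom resources created through the velero CLI
kubectl get backups.velero.io -n velero-system
kubectl describe backups.velero.io <backup-name> -n velero-system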

1.2 Backing up and restoring business data in a Kubernetes cluster with Velero

1.2.1 Deployment environment

1.2.2 Deploy minio

Pick a suitable node for minio; in this lab it is deployed on the deploy node.

1.2.2.1 Create the data directory and pull the image

#deploy minio on the deploy node
root@k8s-deploy:/usr/local/src# mkdir -p /data/minio
#image source: https://hub.docker.com/r/minio/minio
root@k8s-deploy:~# docker pull minio/minio:RELEASE.2022-04-12T06-55-35Z

1.2.2.2 Create the minio container

If no credentials are specified, the default username and password are minioadmin/minioadmin; they can be customized with environment variables as shown below.

root@k8s-deploy:~# docker run --name minio \
-p 9000:9000 \
-p 9999:9999 \
-d --restart=always \
-e "MINIO_ROOT_USER=admin" \
-e "MINIO_ROOT_PASSWORD=12345678" \
-v /data/minio/data:/data \
minio/minio:RELEASE.2022-04-12T06-55-35Z server /data --console-address '0.0.0.0:9999'

root@k8s-deploy:~# docker logs 51180fafd8c6
API: http://172.17.0.2:9000  http://127.0.0.1:9000
Console: http://0.0.0.0:9999
Documentation: https://docs.min.io
Finished loading IAM sub-system (took 0.0s of 0.0s to load data).

1.2.2.3 Log in to the minio console

(screenshot: minio console login page)

1.2.2.4 Create a minio bucket

(screenshots: creating the minio bucket in the console)
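
If you prefer the command line to the web console, the bucket can also be created with the minio client; a sketch under the assumption that the bucket is named velerodata (as used by velero install later) and that 192.168.1.129:9000 is the minio address referenced in the velero configuration:

#install the mc client and create the bucket from the shell
wget https://dl.min.io/client/mc/release/linux-amd64/mc -O /usr/local/bin/mc && chmod +x /usr/local/bin/mc
mc alias set myminio http://192.168.1.129:9000 admin 12345678
mc mb myminio/velerodata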

1.2.3 Deploy velero

Pick a suitable node for velero; in this lab it is deployed on the master node.

1.2.3.1 Check the Kubernetes versions supported by velero

Project page: https://github.com/vmware-tanzu/velero

(screenshot: velero / Kubernetes version compatibility matrix)

1.2.3.2 Deploy velero on the master node

1.2.3.2.1 Install the velero client
root@k8s-master1:/usr/local/src# wget https://github.com/vmware-tanzu/velero/releases/download/v1.11.0/velero-v1.11.0-linux-amd64.tar.gz
root@k8s-master1:/usr/local/src# tar zxvf velero-v1.11.0-linux-amd64.tar.gz
root@k8s-master1:/usr/local/src# cp velero-v1.11.0-linux-amd64/velero /usr/local/bin/
root@k8s-master1:/usr/local/src# velero --help
1.2.3.2.2 Prepare the velero server deployment environment
1 #Create the working directory
root@k8s-master1:~# mkdir /data/velero -p
root@k8s-master1:~# cd /data/velero
root@k8s-master1:/data/velero# 

2 #Prepare the credentials file velero uses to access minio
root@k8s-master1:/data/velero# vim velero-auth.txt 
[default]
aws_access_key_id = admin
aws_secret_access_key = 12345678

3 #Prepare the certificate-signing environment
root@k8s-master1:/data/velero# apt install golang-cfssl
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.4/cfssl_1.6.4_linux_amd64
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.4/cfssljson_1.6.4_linux_amd64 
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.4/cfssl-certinfo_1.6.4_linux_amd64
root@k8s-master1:/data/velero# mv cfssl-certinfo_1.6.4_linux_amd64 cfssl-certinfo
root@k8s-master1:/data/velero# mv cfssl_1.6.4_linux_amd64 cfssl
root@k8s-master1:/data/velero# mv cfssljson_1.6.4_linux_amd64 cfssljson
root@k8s-master1:/data/velero# cp cfssl-certinfo cfssl cfssljson /usr/local/bin/
root@k8s-master1:/data/velero# chmod  a+x /usr/local/bin/cfssl* 

4 #Prepare the awsuser certificate signing request (CSR) file
root@k8s-master1:/data/velero# vim awsuser-csr.json
{
  "CN": "awsuser",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "Shanghai",
      "L": "Shanghai",
      "O": "k8s",
      "OU": "System"
    }
  ]
}

5 #Sign and issue the awsuser certificate
> For version >= 1.24.x:
> root@k8s-deploy:~# scp /etc/kubeasz/clusters/k8s-cluster1/ssl/ca-config.json 192.168.1.101:/data/velero #copy ca-config.json from the deploy node to the master
> root@k8s-master1:/data/velero# /usr/local/bin/cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem -ca-key=/etc/kubernetes/ssl/ca-key.pem -config=./ca-config.json -profile=kubernetes ./awsuser-csr.json | cfssljson -bare awsuser

6 #Verify that the certificate files have been generated
root@k8s-master1:/data/velero# ll awsuser*
-rw-r--r-- 1 root root  220 Apr 14 12:29 awsuser-csr.json
-rw------- 1 root root 1679 Apr 14 12:30 awsuser-key.pem
-rw-r--r-- 1 root root  997 Apr 14 12:30 awsuser.csr
-rw-r--r-- 1 root root 1387 Apr 14 12:30 awsuser.pem

7 #Copy the certificates into the api-server certificate directory. Any path would work; they are placed here because the certificates of the other Kubernetes components live here:
root@k8s-master1:/data/velero# cp awsuser-key.pem /etc/kubernetes/ssl/
root@k8s-master1:/data/velero# cp awsuser.pem /etc/kubernetes/ssl/

8 #Generate the kubeconfig file used for authentication
8.1 #Set the cluster (clusters) entry in the kubeconfig file
# export KUBE_APISERVER="https://192.168.1.101:6443"
# kubectl config set-cluster kubernetes \
--certificate-authority=/etc/kubernetes/ssl/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=./awsuser.kubeconfig

8.2 #Set the user (users) entry in the kubeconfig file
# kubectl config set-credentials awsuser \
--client-certificate=/etc/kubernetes/ssl/awsuser.pem \
--client-key=/etc/kubernetes/ssl/awsuser-key.pem \
--embed-certs=true \
--kubeconfig=./awsuser.kubeconfig

8.3 #Set the context (contexts) entry in the kubeconfig file
# kubectl config set-context kubernetes \
--cluster=kubernetes \
--user=awsuser \
--namespace=velero-system \
--kubeconfig=./awsuser.kubeconfig

8.4 #Set the default context (current-context) in the kubeconfig file
# kubectl config use-context kubernetes --kubeconfig=awsuser.kubeconfig

9 #Create the awsuser clusterrolebinding in the cluster, binding awsuser to cluster-admin:
# kubectl create clusterrolebinding awsuser --clusterrole=cluster-admin --user=awsuser

10 #Create the namespace:
# kubectl create ns velero-system
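
Before installing the server, it is worth confirming that the freshly generated kubeconfig actually authenticates against the cluster; a quick check that is not part of the original steps:

#the awsuser kubeconfig should be able to read the new namespace
kubectl --kubeconfig=./awsuser.kubeconfig get ns velero-system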
1.2.3.2.3 Install the velero server
velero install \
    --namespace velero-system \
    --kubeconfig  ./awsuser.kubeconfig \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.5.5 \
    --bucket velerodata  \
    --secret-file ./velero-auth.txt \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.1.129:9000

If the installation is slow, try pulling the required images manually on the nodes
root@k8s-node2:~# nerdctl  pull velero/velero-plugin-for-aws:v1.5.5
root@k8s-node2:~# nerdctl  pull velero/velero:v1.11.0
1.2.3.2.4 Verify the velero server deployment
root@k8s-master1:/data/velero# kubectl get pod -n velero-system -o wide
root@k8s-master1:/data/velero# kubectl logs deployment/velero -n velero-system
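
It is also useful to confirm that velero can reach the minio bucket; a hedged check with the standard velero CLI (the backup storage location should eventually report itself as Available):

velero backup-location get --namespace velero-system --kubeconfig ./awsuser.kubeconfig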

1.2.4 Verify velero backup and restore

1.2.4.1 Backing up and restoring a single namespace

1.2.4.1.1 Back up the default namespace
root@k8s-master1:/data/velero# DATE=`date +%Y%m%d%H%M%S`
root@k8s-master1:/data/velero# velero backup create default-backup-${DATE} \
--include-cluster-resources=true \
--include-namespaces default \
--kubeconfig=./awsuser.kubeconfig \
--namespace velero-system
Backup request "default-backup-20230610160039" submitted successfully.
Run `velero backup describe default-backup-20230610160039` or `velero backup logs default-backup-20230610160039` for more details.
root@k8s-master1:/data/velero#
1.2.4.1.2 View the default namespace backup

(screenshot: the default namespace backup listed in minio)
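
The same information can be read from the CLI instead of the minio console; a small sketch using the backup name returned above:

velero backup get -n velero-system
velero backup describe default-backup-20230610160039 -n velero-system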

1.2.4.1.3 Delete data in the default namespace and verify the restore
# delete the pod and the deployment in the default namespace
root@k8s-master1:/data/velero# kubectl get deploy
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deploy   3/3     3            3           21d
root@k8s-master1:/data/velero# kubectl get pods
NAME                            READY   STATUS    RESTARTS        AGE
mysqltest                       1/1     Running   3 (114m ago)    3d9h
nginx-deploy-77b7c4686c-2rwhv   1/1     Running   11 (114m ago)   21d
nginx-deploy-77b7c4686c-2vwtq   1/1     Running   11 (114m ago)   21d
nginx-deploy-77b7c4686c-hrmbm   1/1     Running   11 (114m ago)   21d
root@k8s-master1:/data/velero# kubectl delete pod mysqltest
pod "mysqltest" deleted
root@k8s-master1:/data/velero# kubectl delete deploy nginx-deploy
deployment.apps "nginx-deploy" deleted
root@k8s-master1:/data/velero# kubectl get pods
No resources found in default namespace.
root@k8s-master1:/data/velero# kubectl get deploys
error: the server doesn't have a resource type "deploys"

#restore the default namespace data with velero
root@k8s-master1:/data/velero# velero restore create --from-backup default-backup-20230610160039 --wait --kubeconfig=./awsuser.kubeconfig --namespace velero-system
Restore request "default-backup-20230610160039-20230610161754" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
.............................................
Restore completed with status: Completed. You may check for more information using the commands `velero restore describe default-backup-20230610160039-20230610161754` and `velero restore logs default-backup-20230610160039-20230610161754`.

#check that the restore has completed
root@k8s-master1:/data/velero# velero restore describe default-backup-20230610160039-20230610161754 -n velero-system
Name:         default-backup-20230610160039-20230610161754
Namespace:    velero-system
Labels:       <none>
Annotations:  <none>

Phase:                       Completed
Total items to be restored:  442
Items restored:              442

Started:    2023-06-10 16:17:54 +0800 CST
Completed:  2023-06-10 16:18:39 +0800 CST

#check that the pods and the deployment have been restored
root@k8s-master1:/data/velero# kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
mysqltest                       1/1     Running   0          3m16s
nginx-deploy-77b7c4686c-2rwhv   1/1     Running   0          3m16s
nginx-deploy-77b7c4686c-2vwtq   1/1     Running   0          3m16s
nginx-deploy-77b7c4686c-hrmbm   1/1     Running   0          3m16s
root@k8s-master1:/data/velero# kubectl get deploy
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deploy   3/3     3            3           3m4s
1.2.4.1.4 Back up the p1 namespace
root@k8s-master1:/data/velero# kubectl get deploy -n p1
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
p1-consumer-deployment      1/1     1            1           3d
p1-dubboadmin-deployment    1/1     1            1           2d23h
p1-jenkins-deployment       1/1     1            1           3d3h
p1-nginx-deployment         1/1     1            1           6d23h
p1-provider-deployment      1/1     1            1           3d
p1-tomcat-app1-deployment   1/1     1            1           6d23h
wordpress-app-deployment    1/1     1            1           2d20h
zookeeper1                  1/1     1            1           3d1h
zookeeper2                  1/1     1            1           3d1h
zookeeper3                  1/1     1            1           3d1h

root@k8s-master1:/data/velero# kubectl get pods -n p1
NAME                                        READY   STATUS    RESTARTS        AGE
mysql-0                                     2/2     Running   6 (134m ago)    3d4h
mysql-1                                     2/2     Running   6 (134m ago)    3d4h
mysql-2                                     2/2     Running   6 (134m ago)    3d4h
p1-consumer-deployment-5d55cdc8b9-7v9mz     1/1     Running   3 (134m ago)    3d
p1-dubboadmin-deployment-58c584b79-ffh4q    1/1     Running   3 (134m ago)    2d23h
p1-jenkins-deployment-7fb4bdd4c5-7bt67      1/1     Running   3 (134m ago)    3d3h
p1-nginx-deployment-88b498c8f-f82h5         1/1     Running   14 (133m ago)   6d23h
p1-provider-deployment-bd8954776-pfdd9      1/1     Running   3 (134m ago)    3d
p1-tomcat-app1-deployment-f5bdb7f9b-29ct2   1/1     Running   7 (134m ago)    6d23h
ubuntu1804                                  1/1     Running   4 (134m ago)    3d20h
wordpress-app-deployment-6cd464f4b-qdrxm    2/2     Running   6 (134m ago)    2d20h
zookeeper1-7fccd69c54-r24cd                 1/1     Running   3 (134m ago)    3d1h
zookeeper2-784448bdcf-2mhtl                 1/1     Running   3 (134m ago)    3d1h
zookeeper3-cff5f4c48-rkz9k                  1/1     Running   3 (134m ago)    3d1h

root@k8s-master1:/data/velero# DATE=`date +%Y%m%d%H%M%S`

root@k8s-master1:/data/velero# velero backup create p1-ns-backup-${DATE} \
--include-cluster-resources=true \
--include-namespaces p1 \
--kubeconfig=/root/.kube/config \
--namespace velero-system
Backup request "p1-ns-backup-20230610163123" submitted successfully.
Run `velero backup describe p1-ns-backup-20230610163123` or `velero backup logs p1-ns-backup-20230610163123` for more details.
root@k8s-master1:/data/velero#
1.2.4.1.5 View the p1 namespace backup

(screenshot: the p1 namespace backup listed in minio)

1.2.4.1.6 Delete data in the p1 namespace and verify the restore
#delete data in the p1 namespace
root@k8s-master1:/data/velero# kubectl delete deploy wordpress-app-deployment -n p1
deployment.apps "wordpress-app-deployment" deleted
root@k8s-master1:/data/velero# kubectl delete pod ubuntu1804 -n p1
pod "ubuntu1804" deleted
root@k8s-master1:/data/velero# kubectl delete svc wordpress-app-spec -n p1
service "wordpress-app-spec" deleted

root@k8s-master1:/data/velero# kubectl get deploy -n p1
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
p1-consumer-deployment      1/1     1            1           3d
p1-dubboadmin-deployment    1/1     1            1           2d23h
p1-jenkins-deployment       1/1     1            1           3d3h
p1-nginx-deployment         1/1     1            1           6d23h
p1-provider-deployment      1/1     1            1           3d
p1-tomcat-app1-deployment   1/1     1            1           6d23h
zookeeper1                  1/1     1            1           3d1h
zookeeper2                  1/1     1            1           3d1h
zookeeper3                  1/1     1            1           3d1h
root@k8s-master1:/data/velero# kubectl get pod -n p1
NAME                                        READY   STATUS    RESTARTS        AGE
mysql-0                                     2/2     Running   6 (138m ago)    3d4h
mysql-1                                     2/2     Running   6 (138m ago)    3d4h
mysql-2                                     2/2     Running   6 (138m ago)    3d4h
p1-consumer-deployment-5d55cdc8b9-7v9mz     1/1     Running   3 (138m ago)    3d
p1-dubboadmin-deployment-58c584b79-ffh4q    1/1     Running   3 (138m ago)    2d23h
p1-jenkins-deployment-7fb4bdd4c5-7bt67      1/1     Running   3 (138m ago)    3d3h
p1-nginx-deployment-88b498c8f-f82h5         1/1     Running   14 (138m ago)   6d23h
p1-provider-deployment-bd8954776-pfdd9      1/1     Running   3 (138m ago)    3d
p1-tomcat-app1-deployment-f5bdb7f9b-29ct2   1/1     Running   7 (138m ago)    6d23h
zookeeper1-7fccd69c54-r24cd                 1/1     Running   3 (138m ago)    3d1h
zookeeper2-784448bdcf-2mhtl                 1/1     Running   3 (138m ago)    3d1h
zookeeper3-cff5f4c48-rkz9k                  1/1     Running   3 (138m ago)    3d1h
root@k8s-master1:/data/velero# kubectl get svc -n p1
NAME                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                        AGE
mysql                    ClusterIP   None             <none>        3306/TCP                                       3d5h
mysql-read               ClusterIP   10.100.40.200    <none>        3306/TCP                                       3d5h
p1-consumer-server       NodePort    10.100.169.100   <none>        80:50161/TCP                                   3d
p1-dubboadmin-service    NodePort    10.100.237.71    <none>        80:31080/TCP                                   2d23h
p1-jenkins-service       NodePort    10.100.125.72    <none>        80:38080/TCP                                   3d3h
p1-nginx-service         NodePort    10.100.62.129    <none>        80:30090/TCP,443:30091/TCP                     6d23h
p1-provider-spec         NodePort    10.100.176.153   <none>        80:41948/TCP                                   3d
p1-tomcat-app1-service   ClusterIP   10.100.32.185    <none>        80/TCP                                         6d23h
zookeeper                ClusterIP   10.100.79.229    <none>        2181/TCP                                       3d1h
zookeeper1               NodePort    10.100.128.90    <none>        2181:32181/TCP,2888:34414/TCP,3888:36670/TCP   3d1h
zookeeper2               NodePort    10.100.198.234   <none>        2181:32182/TCP,2888:60525/TCP,3888:37743/TCP   3d1h
zookeeper3               NodePort    10.100.195.52    <none>        2181:32183/TCP,2888:43020/TCP,3888:49232/TCP   3d1h

#run the velero restore
root@k8s-master1:/data/velero# velero restore create --from-backup p1-ns-backup-20230610163123 --wait --kubeconfig=./awsuser.kubeconfig --namespace velero-system
Restore request "p1-ns-backup-20230610163123-20230610164002" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
.......................................................
Restore completed with status: Completed. You may check for more information using the commands `velero restore describe p1-ns-backup-20230610163123-20230610164002` and `velero restore logs p1-ns-backup-20230610163123-20230610164002`.

#check that the restore has completed
root@k8s-master1:/data/velero# velero restore describe p1-ns-backup-20230610163123-20230610164002 -n velero-system
Name:         p1-ns-backup-20230610163123-20230610164002
Namespace:    velero-system
Labels:       <none>
Annotations:  <none>

Phase:                       Completed
Total items to be restored:  492
Items restored:              492

Started:    2023-06-10 16:40:02 +0800 CST
Completed:  2023-06-10 16:40:57 +0800 CST

#check that the data has been restored
root@k8s-master1:/data/velero# kubectl get pods -n p1 |grep ubuntu
ubuntu1804                                  1/1     Running   0               3m4s
root@k8s-master1:/data/velero# kubectl get deploy -n p1
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
p1-consumer-deployment      1/1     1            1           3d
p1-dubboadmin-deployment    1/1     1            1           2d23h
p1-jenkins-deployment       1/1     1            1           3d3h
p1-nginx-deployment         1/1     1            1           6d23h
p1-provider-deployment      1/1     1            1           3d
p1-tomcat-app1-deployment   1/1     1            1           6d23h
wordpress-app-deployment    1/1     1            1           2m53s
zookeeper1                  1/1     1            1           3d1h
zookeeper2                  1/1     1            1           3d1h
zookeeper3                  1/1     1            1           3d1h
root@k8s-master1:/data/velero# kubectl get svc -n p1
NAME                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                        AGE
mysql                    ClusterIP   None             <none>        3306/TCP                                       3d5h
mysql-read               ClusterIP   10.100.40.200    <none>        3306/TCP                                       3d5h
p1-consumer-server       NodePort    10.100.169.100   <none>        80:50161/TCP                                   3d
p1-dubboadmin-service    NodePort    10.100.237.71    <none>        80:31080/TCP                                   2d23h
p1-jenkins-service       NodePort    10.100.125.72    <none>        80:38080/TCP                                   3d3h
p1-nginx-service         NodePort    10.100.62.129    <none>        80:30090/TCP,443:30091/TCP                     6d23h
p1-provider-spec         NodePort    10.100.176.153   <none>        80:41948/TCP                                   3d
p1-tomcat-app1-service   ClusterIP   10.100.32.185    <none>        80/TCP                                         6d23h
wordpress-app-spec       NodePort    10.100.4.6       <none>        80:30031/TCP,443:30033/TCP                     3m16s
zookeeper                ClusterIP   10.100.79.229    <none>        2181/TCP                                       3d1h
zookeeper1               NodePort    10.100.128.90    <none>        2181:32181/TCP,2888:34414/TCP,3888:36670/TCP   3d1h
zookeeper2               NodePort    10.100.198.234   <none>        2181:32182/TCP,2888:60525/TCP,3888:37743/TCP   3d1h
zookeeper3               NodePort    10.100.195.52    <none>        2181:32183/TCP,2888:43020/TCP,3888:49232/TCP   3d1h

1.2.4.2 Backing up specific resource objects

#create test pods
root@k8s-master1:/data/velero# kubectl run net-test1 --image=centos:7.9.2009 sleep 10000000000 -n p1
pod/net-test1 created
root@k8s-master1:/data/velero# kubectl run net-test1 --image=centos:7.9.2009 sleep 10000000000 -n test
pod/net-test1 created
root@k8s-master1:/data/velero# kubectl get pods -n p1 |grep net-test
net-test1                                   1/1     Running   0               19s
root@k8s-master1:/data/velero# kubectl get pods -n test |grep net-test
net-test1   1/1     Running   0          19s

#back up the pod resources
root@k8s-master1:/data/velero# DATE=`date +%Y%m%d%H%M%S`
root@k8s-master1:/data/velero# velero backup create pod-backup-${DATE} --include-cluster-resources=true --ordered-resources 'pods=p1/net-test1,test/net-test1' --namespace velero-system --include-namespaces=p1,test
Backup request "pod-backup-20230610170037" submitted successfully.
Run `velero backup describe pod-backup-20230610170037` or `velero backup logs pod-backup-20230610170037` for more details.

#delete the pods
root@k8s-master1:/data/velero# kubectl delete pod net-test1 -n p1
pod "net-test1" deleted
root@k8s-master1:/data/velero# kubectl delete pod net-test1 -n test
pod "net-test1" deleted
root@k8s-master1:/data/velero# kubectl get pods -n p1 |grep net-test
root@k8s-master1:/data/velero# kubectl get pods -n test |grep net-test
No resources found in test namespace.

#run the velero restore
root@k8s-master1:/data/velero# velero restore create --from-backup pod-backup-20230610170037 --wait \
--kubeconfig=./awsuser.kubeconfig \
--namespace velero-system
Restore request "pod-backup-20230610170037-20230610170617" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
.......................................................
Restore completed with status: Completed. You may check for more information using the commands `velero restore describe pod-backup-20230610170037-20230610170617` and `velero restore logs pod-backup-20230610170037-20230610170617`.

#check that the pods have been restored
root@k8s-master1:/data/velero# kubectl get pods -n p1 |grep net-test
net-test1                                   1/1     Running   0               65s
root@k8s-master1:/data/velero# kubectl get pods -n test |grep net-test
net-test1   1/1     Running   0          70s
root@k8s-master1:/data/velero#

1.2.4.3 Batch backup of all namespaces

root@k8s-master1:/data/velero# cat ns-backup.sh
#do not backup default namespace
NS_NAME=`kubectl get ns | awk '{if (NR>2){print}}' | awk '{print $1}'`
DATE=`date +%Y%m%d%H%M%S`
cd /data/velero/
for i in $NS_NAME
do
  velero backup create ${i}-ns-backup-${DATE} --include-cluster-resources=true --include-namespaces ${i} --kubeconfig=/root/.kube/config --namespace velero-system
done

root@k8s-master1:/data/velero# bash /data/velero/ns-backup.sh
Backup request "kube-node-lease-ns-backup-20230610172713" submitted successfully.
Run `velero backup describe kube-node-lease-ns-backup-20230610172713` or `velero backup logs kube-node-lease-ns-backup-20230610172713` for more details.
Backup request "kube-public-ns-backup-20230610172713" submitted successfully.
Run `velero backup describe kube-public-ns-backup-20230610172713` or `velero backup logs kube-public-ns-backup-20230610172713` for more details.
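
Instead of looping over namespaces in a shell script, velero also ships a built-in scheduler for periodic backups; a hedged sketch (the schedule name and cron expression are illustrative):

#back up the p1 namespace every day at 02:00, kept for the default 720h
velero schedule create p1-daily --schedule="0 2 * * *" --include-namespaces p1 --include-cluster-resources=true --namespace velero-system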

Verify that the backups were created

(screenshot: per-namespace backups listed in minio)

2 K8s HPA controller

2.1 Pod scaling overview

(screenshot: pod scaling overview)

2.1.1 Manually adjusting the pod replica count

root@k8s-master1:~# kubectl --help |grep scale
  scale           Set a new size for a deployment, replica set, or replication controller
  autoscale       Auto-scale a deployment, replica set, stateful set, or replication controller
root@k8s-master1:~# kubectl scale --help

root@k8s-master1:~# kubectl get deploy -n p1
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
p1-consumer-deployment      1/1     1            1           3d2h
p1-dubboadmin-deployment    1/1     1            1           3d1h
p1-jenkins-deployment       1/1     1            1           3d5h
p1-nginx-deployment         1/1     1            1           7d1h

root@k8s-master1:~# kubectl scale deploy p1-nginx-deployment --replicas=3 -n p1
deployment.apps/p1-nginx-deployment scaled

root@k8s-master1:~# kubectl get deploy -n p1
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
p1-consumer-deployment      1/1     1            1           3d2h
p1-dubboadmin-deployment    1/1     1            1           3d1h
p1-jenkins-deployment       1/1     1            1           3d5h
p1-nginx-deployment         3/3     3            3           7d1h

2.1.2 Types of dynamic scaling controllers

(screenshot: dynamic scaling controller types)

2.2 HPA controller overview

  • The Horizontal Pod Autoscaling (HPA) controller automatically adjusts the number of pods running in the cluster based on predefined thresholds and the pods' current resource utilization (automatic horizontal scaling); a minimal imperative example with kubectl autoscale is sketched after this list.

  • HPA was introduced in Kubernetes 1.1. Early versions used the Heapster component to collect pod metrics; since Kubernetes 1.11 metrics are collected by the Metrics Server and exposed through the aggregated APIs (e.g. metrics.k8s.io, custom.metrics.k8s.io, external.metrics.k8s.io), which the HPA controller queries in order to scale pods up or down based on a given resource utilization.

  • Scaling trigger condition: avg(CurrentPodsConsumption) / Target > 1.1 or < 0.9, i.e. the metrics of the N pods are summed, averaged over the number of pods and divided by the target; a value above 1.1 triggers a scale-up, a value below 0.9 triggers a scale-down.

  • Desired replica formula: desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)] #ceil rounds up to the nearest integer.

    For example, if the current metric value is 200m and the desired value is 100m, the replica count is doubled because 200.0 / 100.0 == 2.0; if the current value is 50m, the replica count is halved because 50.0 / 100.0 == 0.5. If the ratio is close enough to 1.0 (within a globally configurable tolerance, 0.1 by default), the control plane skips the scaling operation.

  • The kube-controller-manager defaults related to HPA are:

    --horizontal-pod-autoscaler-sync-period #how often the metrics are queried, every 15s by default (changeable via this flag)

    --horizontal-pod-autoscaler-downscale-stabilization #scale-down stabilization window, 5 minutes by default; the metric must stay below target for 5 minutes before scaling down.

    --horizontal-pod-autoscaler-cpu-initialization-period #initialization delay during which a pod's CPU metrics are ignored, 5 minutes by default

    --horizontal-pod-autoscaler-initial-readiness-delay #pod readiness delay; pods within this window are treated as not ready and their metrics are not collected, 30 seconds by default

    --horizontal-pod-autoscaler-tolerance #the deviation the HPA controller tolerates (a float, 0.1 by default), i.e. the current/target ratio must be above 1+0.1=1.1 or below 1-0.1=0.9 to trigger scaling. For example, with a CPU utilization target of 50% and a current value of 80%, 80/50=1.6 > 1.1 triggers a scale-up; conversely a ratio below 0.9 triggers a scale-down
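
The same kind of HPA can be created imperatively with kubectl; a minimal sketch (the deployment name is taken from the example in 2.4, and the values mirror the hpa.yaml shown there):

#min 3, max 10 replicas, 60% average CPU utilization target
kubectl autoscale deployment p1-tomcat-app1-deployment -n p1 --min=3 --max=10 --cpu-percent=60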

2.3 Deploying metrics-server

  • HPA needs metric data, so metrics-server must be deployed; HPA uses metrics-server as its data source.

  • Project page: https://github.com/kubernetes-sigs/metrics-server

  • Metrics Server is the built-in source of container resource metrics for Kubernetes.

  • Metrics Server collects resource metrics from the kubelet on each node and exposes them in the Kubernetes apiserver through the Metrics API, for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler; the data can also be viewed with kubectl top node/pod.

2.3.1 Deploy metrics-server

(screenshot: metrics-server overview)

#the yaml file is downloaded from github; note that the image address inside must be changed, the default registry is not reachable
root@image-build:/opt/k8s-data/yaml/6-20230521/metrics-server-0.6.1-case# kubectl apply -f metrics-server-v0.6.1.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

root@image-build:/opt/k8s-data/yaml/6-20230521/metrics-server-0.6.1-case# kubectl get pods -A |grep metrics-server
kube-system                    metrics-server-8c7f58775-d47pm              1/1     Running   0               25m

#view pod metrics
root@image-build:/opt/k8s-data/yaml/6-20230521/metrics-server-0.6.1-case# kubectl top pod
NAME                            CPU(cores)   MEMORY(bytes)
mysqltest                       0m           0Mi
nginx-deploy-77b7c4686c-2rwhv   0m           7Mi
nginx-deploy-77b7c4686c-2vwtq   0m           7Mi
nginx-deploy-77b7c4686c-hrmbm   0m           7Mi 

#view node metrics
root@image-build:/opt/k8s-data/yaml/6-20230521/metrics-server-0.6.1-case# kubectl top node
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
192.168.1.101   92m          2%     1215Mi          15%
192.168.1.102   79m          1%     1094Mi          14%
192.168.1.103   75m          1%     1102Mi          14%
192.168.1.111   133m         3%     4194Mi          54%
192.168.1.112   335m         8%     5457Mi          71%
192.168.1.113   154m         3%     4434Mi          57%
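
If kubectl top returns errors, it helps to confirm that the aggregated Metrics API itself is registered and serving; a quick hedged check:

kubectl get apiservices v1beta1.metrics.k8s.io
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | head -c 300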

2.4 Using HPA

2.4.1 HPA controller effect diagram

(screenshot: HPA controller effect diagram)

2.4.2 HPA controller manifest

root@image-build:/opt/k8s-data/yaml/6-20230521/metrics-server-0.6.1-case# cat hpa.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  namespace: p1
  name: p1-tomcat-app1-deployment
  labels:
    app: p1-tomcat-app1
    version: v2beta1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: p1-tomcat-app1-deployment
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60
  #metrics:
  #- type: Resource
  #  resource:
  #    name: cpu
  #    targetAverageUtilization: 60
  #- type: Resource
  #  resource:
  #    name: memory

2.4.3 Deploy the web service and the HPA controller

root@image-build:/opt/k8s-data/yaml/6-20230521/metrics-server-0.6.1-case# cat tomcat-app1.yaml
kind: Deployment
#apiVersion: extensions/v1beta1
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app1-deployment-label
  name: p1-tomcat-app1-deployment
  namespace: p1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: p1-tomcat-app1-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app1-selector
    spec:
      containers:
      - name: p1-tomcat-app1-container
        #image: tomcat:7.0.93-alpine
        image: lorel/docker-stress-ng
        args: ["--vm", "2", "--vm-bytes", "256M"]
        ##command: ["/apps/tomcat/bin/run_tomcat.sh"]
        imagePullPolicy: IfNotPresent
        ##imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
        env:
        - name: "password"
          value: "123456"
        - name: "age"
          value: "18"
        resources:
          limits:
            cpu: 1
            memory: "256Mi"
          requests:
            cpu: 500m
            memory: "256Mi"

---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: p1-tomcat-app1-service-label
  name: p1-tomcat-app1-service
  namespace: p1
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
    #nodePort: 40003
  selector:
    app: p1-tomcat-app1-selector

#create the pods
root@image-build:/opt/k8s-data/yaml/6-20230521/metrics-server-0.6.1-case# kubectl apply -f tomcat-app1.yaml
deployment.apps/p1-tomcat-app1-deployment created
service/p1-tomcat-app1-service created
#create the hpa
root@image-build:/opt/k8s-data/yaml/6-20230521/metrics-server-0.6.1-case# kubectl apply -f hpa.yaml
horizontalpodautoscaler.autoscaling/p1-tomcat-app1-deployment created

#check the hpa; since these are stress-test pods, CPU usage is basically maxed out
root@image-build:/opt/k8s-data/yaml/6-20230521/metrics-server-0.6.1-case# kubectl get hpa -n p1
NAME                        REFERENCE                              TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
p1-tomcat-app1-deployment   Deployment/p1-tomcat-app1-deployment   200%/60%   3         10        3          34s

# describe hpa shows the automatic scale-up in progress
root@image-build:/opt/k8s-data/yaml/6-20230521/metrics-server-0.6.1-case# kubectl describe hpa -n p1
Name:                                                  p1-tomcat-app1-deployment
Namespace:                                             p1
Labels:                                                app=p1-tomcat-app1
                                                       version=v2beta1
Annotations:                                           <none>
CreationTimestamp:                                     Sun, 11 Jun 2023 17:47:51 +0800
Reference:                                             Deployment/p1-tomcat-app1-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  200% (1) / 60%
Min replicas:                                          3
Max replicas:                                          10
Deployment pods:                                       6 current / 7 desired
Conditions:
  Type            Status  Reason              Message

  ----            ------  ------              -------

  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 7
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age   From                       Message

  ----    ------             ----  ----                       -------

  Normal  SuccessfulRescale  38s   horizontal-pod-autoscaler  New size: 3; reason: Current number of replicas below Spec.MinReplicas
  Normal  SuccessfulRescale  23s   horizontal-pod-autoscaler  New size: 6; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  7s    horizontal-pod-autoscaler  New size: 7; reason: cpu resource utilization (percentage of request) above target

2.4.4 Verify HPA scaling

The scale-up history can also be seen by describing the deployment

root@image-build:/opt/k8s-data/yaml/6-20230521/metrics-server-0.6.1-case# kubectl describe deploy p1-tomcat-app1-deployment -n p1
Name:                   p1-tomcat-app1-deployment
Namespace:              p1
CreationTimestamp:      Sun, 11 Jun 2023 17:47:35 +0800
Labels:                 app=p1-tomcat-app1-deployment-label
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=p1-tomcat-app1-selector
Replicas:               10 desired | 10 updated | 10 total | 10 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=p1-tomcat-app1-selector
  Containers:
   p1-tomcat-app1-container:
    Image:      lorel/docker-stress-ng
    Port:       8080/TCP
    Host Port:  0/TCP
    Args:
      --vm
      2
      --vm-bytes
      256M
    Limits:
      cpu:     1
      memory:  256Mi
    Requests:
      cpu:     500m
      memory:  256Mi
    Environment:
      password:  123456
      age:       18
    Mounts:      <none>
  Volumes:       <none>
Conditions:
  Type           Status  Reason

  ----           ------  ------

  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   p1-tomcat-app1-deployment-64fd4566f (10/10 replicas created)
Events:
  Type    Reason             Age    From                   Message

  ----    ------             ----   ----                   -------

  Normal  ScalingReplicaSet  2m17s  deployment-controller  Scaled up replica set p1-tomcat-app1-deployment-64fd4566f to 2
  Normal  ScalingReplicaSet  106s   deployment-controller  Scaled up replica set p1-tomcat-app1-deployment-64fd4566f to 3 from 2
  Normal  ScalingReplicaSet  91s    deployment-controller  Scaled up replica set p1-tomcat-app1-deployment-64fd4566f to 6 from 3
  Normal  ScalingReplicaSet  75s    deployment-controller  Scaled up replica set p1-tomcat-app1-deployment-64fd4566f to 7 from 6
  Normal  ScalingReplicaSet  60s    deployment-controller  Scaled up replica set p1-tomcat-app1-deployment-64fd4566f to 10 from 7

3 K8s resource limits

3.1 Overview of K8s resource limits

3.1.1 Types of resource limits in Kubernetes

  • Kubernetes can limit CPU and memory for a single container

  • Kubernetes can limit CPU and memory for a single pod

  • Kubernetes can limit CPU and memory for an entire namespace

3.1.2 Introduction to K8s resource limits

  • If a running container does not define its own resource limits (memory, CPU) but a LimitRange is defined in its namespace, the container inherits the default limits from that LimitRange.

  • If the namespace has no LimitRange either, the container can consume up to the host's maximum available resources, until no resources are left and the host's OOM Killer is triggered.

  • https://kubernetes.io/zh/docs/tasks/configure-pod-container/assign-cpu-resource/

    CPU is limited in units of cores; whole cores, fractional cores or millicores (m, milli) can be used:

    2 = 2 cores = 200%, 0.5 = 500m = 50%, 1.2 = 1200m = 120%

  • https://kubernetes.io/zh/docs/tasks/configure-pod-container/assign-memory-resource/

    Memory is limited in bytes; the units E, P, T, G, M, K, Ei, Pi, Ti, Gi, Mi, Ki can be used

    1536Mi = 1.5Gi

  • requests is the amount of resources a node must have available for the kube-scheduler to schedule the pod onto it.

  • limits is the upper bound of resources the pod may use once it is running.

(screenshot: requests and limits illustration)

3.2 Resource limit configuration

3.2.1 Limiting CPU and memory for a single container

(screenshot: per-container resource limits illustration)

3.2.1.1 Limiting memory for a single container

#stress-ng is told through its arguments how much memory to consume during the stress test, while resources.limits.memory caps the container's memory usage
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# cat case1-pod-memory-limit.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: limit-test-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels: #rs or deployment
      app: limit-test-pod
    # matchExpressions:
    # - {key: app, operator: In, values: [ng-deploy-80,ng-rs-81]}

  template:
    metadata:
      labels:
        app: limit-test-pod
    spec:
      containers:
      - name: limit-test-container
        image: lorel/docker-stress-ng
        resources:
          limits:
            cpu: 1
            memory: 256Mi #the container may use at most 256Mi of memory
          requests:
            cpu: 1
            memory: 256Mi #the kube-scheduler only schedules the pod onto a node with at least 256Mi of memory available
        #command: ["stress"]
        args: ["--vm", "2", "--vm-bytes", "256M"] #the stress test tries to allocate 2*256M = 512M of memory
      #nodeSelector:
      #  env: group1

# deploy the test deployment
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl apply -f case1-pod-memory-limit.yml
deployment.apps/limit-test-deployment created

The pod's memory usage stays below 256Mi because limits caps the memory at 256Mi

(screenshot: pod memory usage stays under the 256Mi limit)

3.2.1.2 Limiting CPU and memory for a single container

# the container is limited to 1.2 CPU and 384Mi of memory via limits
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# cat case2-pod-memory-and-cpu-limit.yml
#apiVersion: extensions/v1beta1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: limit-test-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels: #rs or deployment
      app: limit-test-pod
    # matchExpressions:
    # - {key: app, operator: In, values: [ng-deploy-80,ng-rs-81]}

  template:
    metadata:
      labels:
        app: limit-test-pod
    spec:
      containers:
      - name: limit-test-container
        image: lorel/docker-stress-ng
        resources:
          limits:
            cpu: "1.2"
            memory: "384Mi"
          requests:
            memory: "100Mi"
            cpu: "500m"
        #command: ["stress"]
        args: ["--vm", "2", "--vm-bytes", "256M"]
      #nodeSelector:
      #  env: group1

root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl apply -f case2-pod-memory-and-cpu-limit.yml
deployment.apps/limit-test-deployment created

Neither CPU nor memory usage exceeds the limits

(screenshot: CPU and memory usage stay within the limits)

3.2.2 Limiting CPU and memory for a single pod

A LimitRange constrains the resource usage of individual Pods or containers (it only applies to resources created after the LimitRange; resources created earlier are not affected). It can:

https://kubernetes.io/zh/docs/concepts/policy/limit-range/

  • Limit the minimum and maximum compute resources of each Pod or container in a namespace

  • Limit the ratio between the compute resource request and limit of each Pod or container in a namespace

  • Limit the minimum and maximum storage each PersistentVolumeClaim in a namespace may request

  • Set default compute resource requests and limits for containers in a namespace and inject them into containers automatically at runtime

# edit the LimitRange yaml file
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# cat case3-LimitRange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: limitrange-p1
  namespace: p1
spec:
  limits:
  - type: Container       #resource type being constrained, one of Container, Pod, PersistentVolumeClaim
    max:                  #containers.resources may not exceed these values, otherwise the resource cannot be created
      cpu: "2"            #maximum CPU for a single container
      memory: "2Gi"       #maximum memory for a single container
    min:                  #containers.resources may not be smaller than these values, otherwise the resource cannot be created; max and min together define the allowed range for containers.resources
      cpu: "500m"         #minimum CPU for a single container
      memory: "512Mi"     #minimum memory for a single container
    default:
      cpu: "500m"         #default runtime CPU limit for a container, applied when containers.resources.limits is not defined
      memory: "512Mi"     #default runtime memory limit for a container, applied when containers.resources.limits is not defined
    defaultRequest:
      cpu: "500m"         #default CPU request used for scheduling, applied when containers.resources.requests is not defined
      memory: "512Mi"     #default memory request used for scheduling, applied when containers.resources.requests is not defined
    maxLimitRequestRatio:
      cpu: 2              #the CPU limit/request ratio may be at most 2
      memory: 2           #the memory limit/request ratio may be at most 2
  - type: Pod             #resource type being constrained; Pod constrains the Pods in the namespace
    max:
      cpu: "4"            #maximum CPU for a single Pod
      memory: "4Gi"       #maximum memory for a single Pod
  - type: PersistentVolumeClaim #resource type being constrained; constrains the PersistentVolumeClaims in the namespace
    max:
      storage: 50Gi       #maximum requests.storage for a PVC
    min:
      storage: 30Gi       #minimum requests.storage for a PVC

#create the LimitRange
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl apply -f case3-LimitRange.yaml
limitrange/limitrange-p1 created
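
To see the defaults and bounds that will be applied to new pods, the LimitRange can be described first; a quick check that was not part of the original run:

kubectl describe limitranges limitrange-p1 -n p1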

#run a new stress-test pod
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl run stress-ng --image=lorel/docker-stress-ng  -n p1 -- --cpu 2 --vm 4 --vm-bytes 256M

#the pod's CPU and memory inherit the default limits from the LimitRange
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl top pod stress-ng -n p1
NAME        CPU(cores)   MEMORY(bytes)
stress-ng   502m         511Mi

#try deleting the LimitRange
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl delete limitranges limitrange-p1 -n p1
limitrange "limitrange-p1" deleted

#the limits on the existing pod are not removed
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl top pod stress-ng -n p1
NAME        CPU(cores)   MEMORY(bytes)
stress-ng   499m         400Mi

#check the pod spec: the limit values are unchanged, still the LimitRange defaults
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl get pods stress-ng -n p1 -o custom-columns=:.spec.containers[0].resources
map[limits:map[cpu:500m memory:512Mi] requests:map[cpu:500m memory:512Mi]]

3.2.3 Limiting CPU and memory for an entire namespace

https://kubernetes.io/zh/docs/concepts/policy/resource-quotas/

  • Limit the total number of objects of a given type (e.g. Pods, Services) that can be created

  • Limit the total compute resources (CPU, memory) and storage resources (persistent volume claims) that objects of a given type may consume

#edit the ResourceQuota yaml file; the total number of pods is limited to 5
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# cat case6-ResourceQuota-magedu.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota-p1
  namespace: p1
spec:
  hard:
    requests.cpu: "8"
    limits.cpu: "8"
    requests.memory: 4Gi
    limits.memory: 4Gi
    requests.nvidia.com/gpu: 4
    #pods: "100"
    pods: "5"
    services: "100"

#create the ResourceQuota
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl apply -f case6-ResourceQuota-magedu.yaml
resourcequota/quota-p1 configured

#view the ResourceQuota; the namespace already contains two pods
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl get resourcequota -n p1
NAME       AGE   REQUEST                                                                                               LIMIT
quota-p1   19m   pods: 2/5, requests.cpu: 0/8, requests.memory: 0/4Gi, requests.nvidia.com/gpu: 0/4, services: 0/100   limits.cpu: 0/8, limits.memory: 0/4Gi

#edit the deployment yaml with replicas set to 5
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# cat case7-namespace-pod-limit-test.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: p1-nginx-deployment-label
  name: p1-nginx-deployment
  namespace: p1
spec:
  replicas: 5
  selector:
    matchLabels:
      app: p1-nginx-selector
  template:
    metadata:
      labels:
        app: p1-nginx-selector
    spec:
      containers:
      - name: p1-nginx-container
        image: nginx:1.16.1
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          protocol: TCP
          name: http
        env:
        - name: "password"
          value: "123456"
        - name: "age"
          value: "18"
        resources:
          limits:
            cpu: 0.5
            memory: 0.5Gi
          requests:
            cpu: 0.5
            memory: 0.5Gi

---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: p1-nginx-service-label
  name: p1-nginx-service
  namespace: p1
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
    nodePort: 30033
  selector:
    app: p1-nginx-selector

#apply the deployment yaml
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl apply -f case7-namespace-pod-limit-test.yaml
deployment.apps/p1-nginx-deployment created
service/p1-nginx-service created

#since the pod quota is 5 and 2 other pods already exist, only 3 of the 5 replicas can be created; the remaining 2 cannot be created
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case# kubectl get pods -n p1
NAME                                   READY   STATUS    RESTARTS        AGE
net-test1                              1/1     Running   2 (6h37m ago)   2d
p1-nginx-deployment-7b6b8f6ddd-26gwx   1/1     Running   0               6s
p1-nginx-deployment-7b6b8f6ddd-fnv6k   1/1     Running   0               6s
p1-nginx-deployment-7b6b8f6ddd-pjh6m   1/1     Running   0               6s
ubuntu1804                             1/1     Running   2 (6h37m ago)   2d
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-limit-case#
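
The reason the two missing replicas are never created is recorded on the ReplicaSet rather than the Deployment; a hedged way to check (assuming the ReplicaSet carries the pod template label app=p1-nginx-selector):

#the events should contain FailedCreate ... exceeded quota messages
kubectl describe rs -n p1 -l app=p1-nginx-selector | tail -n 20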

4 K8s RBAC authorization

4.1 K8s API authentication and authorization flow

The user in a kubeconfig file is used for authentication (in users.user.client-certificate-data, the certificate CN identifies the user and O the group; the CN is usually the same as users.name but does not have to be). Authorization can then happen in two ways: either the kubeconfig user (its O or CN) is bound directly to a role, or, if no such binding exists, a users.user.token can be added to the kubeconfig. Adding a token grants the permissions of the corresponding ServiceAccount (because the ServiceAccount is usually bound to some role; the ServiceAccount name and the kubeconfig user name do not have to match, there is no direct relationship between them). Once the token is added, the kubeconfig can also be used to log in to the dashboard directly.

(screenshot: K8s API authentication/authorization flow)

4.2 K8s API authorization modes

Authorization modes: https://kubernetes.io/zh/docs/reference/access-authn-authz/authorization

(screenshots: Kubernetes authorization modes)

4.3 RBAC overview

The RBAC API declares four kinds of Kubernetes objects: Role, ClusterRole, RoleBinding and ClusterRoleBinding.

  • Role: defines a set of rules for accessing Kubernetes resources within a namespace.

  • RoleBinding: binds users (subjects) to a Role.

  • ClusterRole: defines a set of rules for accessing Kubernetes resources across the cluster (including all namespaces).

  • ClusterRoleBinding: binds users (subjects) to a ClusterRole.

(screenshot: RBAC objects and bindings)

4.4 RBAC example

4.4.1 Create a ServiceAccount and its token

1 #create the jack ServiceAccount
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role# cat jack-sa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jack
  namespace: p1
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role# kubectl apply -f jack-sa.yaml
serviceaccount/jack created

2 #create the jack-role Role
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role# cat jack-role.yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: p1
  name: jack-role
rules:
- apiGroups: ["*"]
  resources: ["pods"]
  #verbs: ["*"]
  ##RO-Role
  verbs: ["get", "watch", "list"]

- apiGroups: ["*"]
  resources: ["pods/exec"]
  #verbs: ["*"]
  ##RO-Role
  verbs: ["get", "watch", "list", "create"]

- apiGroups: ["extensions", "apps/v1"]
  resources: ["deployments"]
  #verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  ##RO-Role
  verbs: ["get", "watch", "list"]

- apiGroups: ["*"]
  resources: ["*"]
  #verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  ##RO-Role
  verbs: ["get", "watch", "list"]
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role# kubectl apply -f jack-role.yaml
role.rbac.authorization.k8s.io/jack-role created

3 #bind the jack ServiceAccount to the jack-role Role, i.e. grant the jack service account the corresponding permissions
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role# cat jack-role-bind.yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rolebind-jack-to-jack-role
  namespace: p1
subjects:
- kind: ServiceAccount
  name: jack
  namespace: p1
roleRef:
  kind: Role
  name: jack-role
  apiGroup: rbac.authorization.k8s.io
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role# kubectl apply -f jack-role-bind.yaml
rolebinding.rbac.authorization.k8s.io/rolebind-jack-to-jack-role created

4 #create a service-account-token type secret for the jack ServiceAccount
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role# cat jack-token.yaml
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: jack-user-token
  namespace: p1
  annotations:
    kubernetes.io/service-account.name: "jack"

root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role# kubectl get secret -n p1
No resources found in p1 namespace.
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role# kubectl apply -f jack-token.yaml
secret/jack-user-token created

5 #get the secret name
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role# kubectl get secret -n p1
NAME              TYPE                                  DATA   AGE
jack-user-token   kubernetes.io/service-account-token   3      5s 

6 #extract the token from the secret
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role# kubectl get secret jack-user-token -n p1 -o jsonpath={.data.token} |base64 -d
eyJhbGciOiJSUzI1NiIsImtpZCI6InR1WlBGMzZMM2VtUUtOzI5OVBCdXlKYpFSHJrUhwcXZpbzJrLVY4eTQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJwMSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJqYWNrLXVzZXItdG9rZW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiamFjayIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6Ijg5ZDgzZWVjLWJiODQtNGU1NC04NGMwLTA4NjQzNGM1MGE3YyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpwMTpqYWNrIn0.nBBtmyQ06cusMVOt8DyTkV8huq25kO2AAI1MuWDmY-jZwphjVteA6VbvmvXAMhYGEJHP3RxdQN9t8Xw0B2ytZgMBuXN2-ruHWdCyLnGdGgMvL5PmVX5M8pumumsdgqwxY_xWS8jEyyiMMJcvJ9YRYqiEg9ujhBLffiMdAePqfmAgWVP8bXBkmMVpXsz1R9JpN8sSa15xNJ7wgyL9i5ht4vjQG7cvMnTkK1oe3nFJC3Peyoa4e7FZHTp_vhRWfh7OF04nI9FpatD_OcE2-fllQVnsNHXArEBEXEy2f-FJYbh4z7N_ZwFhubii-hoUyQzmFbuZT3k78WevpPn2iOo9Jg 
root@image-build:/opt/k8s-data/yaml/6-20230521/p1-role#
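
The token can also be tested directly against the API server before it is used anywhere else; a hedged sketch (192.168.1.188:6443 is the API server address used for the kubeconfig in the next section):

#listing pods in p1 should succeed, while other namespaces should be forbidden
TOKEN=$(kubectl get secret jack-user-token -n p1 -o jsonpath={.data.token} | base64 -d)
curl -sk -H "Authorization: Bearer $TOKEN" https://192.168.1.188:6443/api/v1/namespaces/p1/pods | head -n 20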

4.4.2 Logging in with a token

Log in to the dashboard with the ServiceAccount token obtained above

(screenshots: logging in to the dashboard with the token)

4.4.3 Logging in with a kubeconfig file

4.4.3.1 Generate the kubeconfig file used to log in

1 Create the CSR file; note that the file name, CN and O must be changed accordingly (the signing step below uses jack-csr.json):
# cat magedu-csr.json
{
  "CN": "China",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}

2 Sign the certificate
root@k8s-master1:/data/jack# cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem  -ca-key=/etc/kubernetes/ssl/ca-key.pem -config=/data/jack/ca-config.json  -profile=kubernetes jack-csr.json | cfssljson -bare jack
2023/06/14 09:25:26 [INFO] generate received request
2023/06/14 09:25:26 [INFO] received CSR
2023/06/14 09:25:26 [INFO] generating key: rsa-2048
2023/06/14 09:25:26 [INFO] encoded CSR
2023/06/14 09:25:26 [INFO] signed certificate with serial number 72278441432380952837221348922187521983434009056
2023/06/14 09:25:26 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for
websites. For more information see the Baseline Requirements for the Issuance and Management
of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org);
specifically, section 10.2.3 ("Information Requirements").

root@k8s-master1:/data/jack# ls
ca-config.json  jack-csr.json  jack-key.pem  jack.csr  jack.pem

3 Generate the kubeconfig file
3.1 #Set the cluster (clusters) entry in the kubeconfig file
root@k8s-master1:/data/jack# kubectl config set-cluster cluster1 --certificate-authority=/etc/kubernetes/ssl/ca.pem --embed-certs=true --server=https://192.168.1.188:6443 --kubeconfig=jack.kubeconfig
Cluster "cluster1" set.
root@k8s-master1:/data/jack# ls
ca-config.json  jack-csr.json  jack-key.pem  jack.csr  jack.kubeconfig  jack.pem

3.2 #Set the user (users) entry in the kubeconfig file
root@k8s-master1:/data/jack# cp *.pem /etc/kubernetes/ssl/
root@k8s-master1:/data/jack# kubectl config set-credentials jack \
--client-certificate=/etc/kubernetes/ssl/jack.pem \
--client-key=/etc/kubernetes/ssl/jack-key.pem \
--embed-certs=true \
--kubeconfig=jack.kubeconfig
User "jack" set.

3.3 #Set the context (contexts) entry in the kubeconfig file
root@k8s-master1:/data/jack# kubectl config set-context jack@cluster1 \
--cluster=cluster1 \
--user=jack \
--namespace=p1 \
--kubeconfig=jack.kubeconfig
Context "jack@cluster1" created.

3.4 #Set the default context (current-context) in the kubeconfig file
root@k8s-master1:/data/jack# kubectl config use-context jack@cluster1 --kubeconfig=jack.kubeconfig
Switched to context "jack@cluster1".

4 View the token stored in the secret
root@k8s-master1:/data/jack# kubectl describe secret -n p1
Name:         jack-user-token
Namespace:    p1
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: jack
              kubernetes.io/service-account.uid: 89d83eec-bb84-4e54-84c0-086434c50a7c

Type:  kubernetes.io/service-account-token

Data
====

ca.crt:     1310 bytes
namespace:  2 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IR1WlBGMzZMM2VtUUtOazI5OVBCdXlKY1pFSHJrUDhwcXZpbzJrLVY4eTQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJwMSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJqYWNrLXVzZXItdG9rZW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiamFjayIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6Ijg5ZDgzZWVjLWJiODQtNGU1NC04NGMwLTA4NjQzNGM1MGE3YyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpwMTpqYWNrIn0.nBBtmyQ06cusMVOt8DyTkV8huq25kO2AAI1MuWDmY-jZwphjVteA6VbvmvXAMhYGEJHP3RxdQN9t8Xw0B2ytZgMBuXN2-ruHWdCyLnGdGgMvL5PmVX5M8pumumsdgqwxY_xWS8jEyyiMMJcvJ9YRYqiEg9ujhBLffiMdAePqfmAgWVP8bXBkmMVpXsz1R9JpN8sSa15xNJ7wgyL9i5ht4vjQG7cvMnTkK1oe3nFJC3Peyoa4e7FZHTp_vhRWfh7OF04nI9FpatD_OcE2-fllQVnNHXArEBEXEy2f-FJYbh4z7N_ZwFhubii-hoUyQzmFbuZT3k78WevpPn2iOo9Jg

5 Write the token into the jack.kubeconfig file
vi jack.kubeconfig
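
Instead of pasting the token into the file by hand, it can also be merged into the existing user entry non-interactively; a hedged alternative (kubectl config set-credentials merges new fields onto an existing user):

TOKEN=$(kubectl get secret jack-user-token -n p1 -o jsonpath={.data.token} | base64 -d)
kubectl config set-credentials jack --token=$TOKEN --kubeconfig=jack.kubeconfig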

4.4.3.2 Copy the finished kubeconfig out

Log in to the dashboard

(screenshot: logging in to the dashboard with the kubeconfig file)

5 K8s affinity and anti-affinity

5.1 Pod scheduling flow

(screenshots: pod scheduling flow)

5.2 nodeSelector

5.2.1 nodeSelector overview

  • nodeSelector schedules pods onto nodes that carry specific labels.

    Reference: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/assign-pod-node/

  • It can be used to influence scheduling based on the type of service, e.g. scheduling pods with high disk I/O requirements onto SSD nodes and memory-hungry pods onto nodes with more memory.

  • It can also be used to separate the pods of different projects, e.g. give nodes project-specific labels and schedule accordingly.

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl describe node 192.168.1.111
Name:               192.168.1.111
Roles:              node
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=192.168.1.111
                    kubernetes.io/os=linux
                    kubernetes.io/role=node
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true

5.2.2 nodeSelector example

Label the node

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl label node 192.168.1.111 project="p1"
node/192.168.1.111 labeled
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl label node 192.168.1.111 disktype="ssd"
node/192.168.1.111 labeled

Schedule the pod to the target node; the keys and values specified in the YAML must exactly match the node's labels

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case1-nodeSelector.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app2-deployment-label
  name: p1-tomcat-app2-deployment
  namespace: p1
spec:
  replicas: 4
  selector:
    matchLabels:
      app: p1-tomcat-app2-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app2-selector
    spec:
      containers:
      - name: p1-tomcat-app2-container
        image: tomcat:7.0.94-alpine
        imagePullPolicy: IfNotPresent
        #imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
        env:
        - name: "password"
          value: "123456"
        - name: "age"
          value: "18"
        resources:
          limits:
            cpu: 1
            memory: "512Mi"
          requests:
            cpu: 500m
            memory: "512Mi"
      nodeSelector:
        project: p1
        disktype: ssd

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case1-nodeSelector.yaml
deployment.apps/p1-tomcat-app2-deployment created

#All pods were scheduled to 192.168.1.111; the target node must satisfy every label listed under nodeSelector at the same time
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                        READY   STATUS              RESTARTS       AGE     IP              NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-5674f678b-fr4xd   0/1     ContainerCreating   0              16s     <none>          192.168.1.111   <none>           <none>
p1-tomcat-app2-deployment-5674f678b-n8cx8   0/1     ContainerCreating   0              16s     <none>          192.168.1.111   <none>           <none>
p1-tomcat-app2-deployment-5674f678b-ps6hv   0/1     ContainerCreating   0              16s     <none>          192.168.1.111   <none>           <none>
p1-tomcat-app2-deployment-5674f678b-tfsjm   0/1     ContainerCreating   0              16s     <none>          192.168.1.111   <none>           <none>
ubuntu1804                                  1/1     Running             3 (5h4m ago)   3d20h   10.200.117.36   192.168.1.111   <none>           <none>
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case#

Add another label to a second node

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl label node 192.168.1.112 disktype="ssd"
node/192.168.1.112 labeled
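
To double-check which labels the worker nodes now carry before running the scheduling cases, the labels can be listed as columns (a quick verification step, not part of the original walkthrough):

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get nodes -L project,disktype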

5.3 nodeName

nodeName schedules the pod directly onto the node with the given name, skipping the scheduler's normal node selection

5.3.1 nodeName example

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get nodes
NAME            STATUS                     ROLES    AGE   VERSION
192.168.1.101   Ready,SchedulingDisabled   master   53d   v1.26.4
192.168.1.102   Ready,SchedulingDisabled   master   53d   v1.26.4
192.168.1.103   Ready,SchedulingDisabled   master   52d   v1.26.4
192.168.1.111   Ready                      node     53d   v1.26.4
192.168.1.112   Ready                      node     53d   v1.26.4
192.168.1.113   Ready                      node     52d   v1.26.4

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case2-nodename.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app2-deployment-label
  name: p1-tomcat-app2-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1-tomcat-app2-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app2-selector
    spec:
      nodeName: 192.168.1.113
      containers:
      - name: p1-tomcat-app2-container
        image: tomcat:7.0.94-alpine
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
        env:
        - name: "password"
          value: "123456"
        - name: "age"
          value: "18"
        resources:
          limits:
            cpu: 1
            memory: "512Mi"
          requests:
            cpu: 500m
            memory: "512Mi"

#Create the pod
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case2-nodename.yaml
deployment.apps/p1-tomcat-app2-deployment created

#The pod was scheduled to the specified node 192.168.1.113
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                        READY   STATUS    RESTARTS        AGE     IP               NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-6f4b69c7f-6vr5b   1/1     Running   0               9s      10.200.182.136   192.168.1.113   <none>           <none>

5.4 node affinity

affinity is a feature introduced starting with Kubernetes 1.2. Similar to nodeSelector, it lets you constrain which nodes a Pod can be scheduled to, and currently comes in two forms:

  • requiredDuringSchedulingIgnoredDuringExecution #the scheduling condition must be satisfied; if no node matches, the pod is not scheduled

  • preferredDuringSchedulingIgnoredDuringExecution #the scheduling condition is preferred; if no node matches, the pod can still be scheduled to a node that does not satisfy it

  • IgnoredDuringExecution means that if a node's labels change while the Pod is running and the affinity rule is no longer satisfied, the Pod keeps running on that node.

  • Like nodeSelector, affinity and anti-affinity control where pods are scheduled, but affinity (亲和) and anti-affinity (反亲和) are considerably more expressive.

Comparison of affinity with nodeSelector:

1. Affinity and anti-affinity are not limited to AND matching on exact labels; they support the operators In, NotIn, Exists, DoesNotExist, Gt and Lt (a short sketch of the less-used operators follows at the end of this overview).

  • In: the label value is in the given list (a match schedules the pod to the node, i.e. node affinity)

  • NotIn: the label value is not in the given list (the pod is not scheduled to such nodes, i.e. anti-affinity)

  • Gt: the label value is greater than the given value (written as a string, compared as an integer)

  • Lt: the label value is less than the given value (written as a string, compared as an integer)

  • Exists: the specified label key exists

  • DoesNotExist: the specified label key does not exist

2. Matching can be soft or hard; with soft matching, if the scheduler cannot find a matching node the pod is still scheduled to some other, non-matching node.

3. Affinity rules can also be defined against pods, e.g. which pods may or may not be co-located on the same node.

Note:

  • If a nodeSelectorTerms list contains multiple matchExpressions entries, a node only needs to satisfy one of them to be schedulable, i.e. the entries are OR-ed; see case 3.1.1.

  • If a single matchExpressions entry lists multiple keys, a node must satisfy all of them to be schedulable, i.e. the keys within one entry are AND-ed; see case 3.1.2.
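
As a quick illustration of the operators not exercised in the cases below, a single matchExpressions entry could combine Exists with Gt. This is only a sketch; gpu-count is a made-up example label, not one present in this environment:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: disktype        #Exists: the key only has to be present, any value matches, so no values list is given
                operator: Exists
              - key: gpu-count       #hypothetical numeric label; Gt takes exactly one value and compares it as an integer
                operator: Gt
                values:
                - "2"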

5.4.1 Node hard affinity

  1. If the hard-affinity condition is not satisfied, the pod stays Pending and cannot be scheduled

  2. When nodeSelectorTerms contains multiple matchExpressions entries, matching any one of them is enough to schedule

  3. When one matchExpressions entry contains multiple keys, every key must match for that entry to count as matched

5.4.1.1 nodeAffinity with multiple matchExpressions

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case3-1.1-nodeAffinity-requiredDuring-matchExpressions.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app2-deployment-label
  name: p1-tomcat-app2-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1-tomcat-app2-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app2-selector
    spec:
      containers:
      - name: p1-tomcat-app2-container
        image: tomcat:7.0.94-alpine
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms: #if there are multiple matchExpressions below, a node only needs to match the key/values of any one of them to be schedulable
            - matchExpressions: #condition 1: one key with multiple values, matching any single value is enough
              - key: disktype
                operator: In
                values:
                - ssd # matching just one of the values is enough to schedule
                - xxx
            - matchExpressions: #condition 2: one key with multiple values, matching any single value is enough
              - key: project
                operator: In
                values:
                - xxx  #if condition 1 already matches a key and value, the pod can be scheduled even if neither of these values matches
                - nnn

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case3-1.1-nodeAffinity-requiredDuring-matchExpressions.yaml
deployment.apps/p1-tomcat-app2-deployment created
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1
NAME                                         READY   STATUS    RESTARTS        AGE
p1-tomcat-app2-deployment-6f67848ff7-wn2bd   1/1     Running   0               9s
ubuntu1804                                   1/1     Running   3 (7h16m ago)   3d23h
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS        AGE     IP               NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-6f67848ff7-wn2bd   1/1     Running   0               14s     10.200.182.173   192.168.1.113   <none>           <none>
ubuntu1804                                   1/1     Running   3 (7h16m ago)   3d23h   10.200.117.36    192.168.1.111   <none>           <none>
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get node 192.168.1.113 --show-labels
NAME            STATUS   ROLES   AGE   VERSION   LABELS
192.168.1.113   Ready    node    52d   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.1.113,kubernetes.io/os=linux,kubernetes.io/role=node

5.4.1.2 nodeAffinity with multiple keys

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case3-1.2-nodeAffinity-requiredDuring-matchExpressions.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app2-deployment-label
  name: p1-tomcat-app2-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1-tomcat-app2-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app2-selector
    spec:
      containers:
      - name: p1-tomcat-app2-container
        image: tomcat:7.0.94-alpine
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: disktype #key 1: if a key has multiple values, matching any one value counts as matching this key
                operator: In
                values:
                - ssd
                - hdd
              - key: project #key 2: this key must also match one of its values, i.e. key 1 and key 2 must both match, otherwise the pod is not scheduled
                operator: In
                values:
                - p1

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case3-1.2-nodeAffinity-requiredDuring-matchExpressions.yaml
deployment.apps/p1-tomcat-app2-deployment created

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                        READY   STATUS    RESTARTS        AGE     IP              NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-c9bf59d6f-wgpkd   1/1     Running   0               21s     10.200.117.63   192.168.1.111   <none>           <none>
ubuntu1804                                  1/1     Running   3 (7h32m ago)   3d23h   10.200.117.36   192.168.1.111   <none>           <none>

#Only 192.168.1.111 carries both disktype=ssd and project=p1, so the pod was scheduled to that node
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get node 192.168.1.111 --show-labels
NAME            STATUS   ROLES   AGE   VERSION   LABELS
192.168.1.111   Ready    node    53d   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.1.111,kubernetes.io/os=linux,kubernetes.io/role=node,project=p1

5.4.2 Node soft affinity

Node soft affinity tries to schedule the pod onto a node whose labels match; if no node matches, the pod can still be scheduled

5.4.2.1 Soft affinity example

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case3-2.1-nodeAffinity-preferredDuring.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app2-deployment-label
  name: p1-tomcat-app2-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1-tomcat-app2-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app2-selector
    spec:
      containers:
      - name: p1-tomcat-app2-container
        image: tomcat:7.0.94-alpine
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80 #soft affinity condition 1; the higher the weight, the more it contributes to the node score and the more preferred it is
            preference:
              matchExpressions:
              - key: project
                operator: In
                values:
                  - p1
          - weight: 60 #soft affinity condition 2; with its lower weight it is effectively used when condition 1 cannot be satisfied
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd


#Create the pod; it is preferentially scheduled to the node labeled project=p1
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case3-2.1-nodeAffinity-preferredDuring.yaml
deployment.apps/p1-tomcat-app2-deployment created

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS     AGE   IP              NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-84cc6dbd75-vxn6d   1/1     Running   0            13s   10.200.117.25   192.168.1.111   <none>           <none>
ubuntu1804                                   1/1     Running   3 (8h ago)   4d    10.200.117.36   192.168.1.111   <none>           <none>

#Delete the deployment, remove the project=p1 label from the node, then recreate the pod; it is now scheduled to the node labeled disktype=ssd
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl delete -f case3-2.1-nodeAffinity-preferredDuring.yaml
deployment.apps "p1-tomcat-app2-deployment" deleted
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl label node 192.168.1.111 project-
node/192.168.1.111 unlabeled
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case3-2.1-nodeAffinity-preferredDuring.yaml
deployment.apps/p1-tomcat-app2-deployment created
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS     AGE   IP               NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-84cc6dbd75-kvqtv   1/1     Running   0            12s   10.200.182.170   192.168.1.113   <none>           <none>
ubuntu1804                                   1/1     Running   3 (8h ago)   4d    10.200.117.36    192.168.1.111   <none>           <none>

#Delete the deployment, remove the disktype=ssd label from the node, then recreate the pod; even with no matching labels at all the pod can still be scheduled to some node
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl delete -f case3-2.1-nodeAffinity-preferredDuring.yaml
deployment.apps "p1-tomcat-app2-deployment" deleted
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl label node 192.168.1.113 disktype-
node/192.168.1.113 unlabeled
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case3-2.1-nodeAffinity-preferredDuring.yaml
deployment.apps/p1-tomcat-app2-deployment created
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS     AGE   IP               NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-84cc6dbd75-nrcdm   1/1     Running   0            9s    10.200.182.181   192.168.1.113   <none>           <none>
ubuntu1804                                   1/1     Running   3 (8h ago)   4d    10.200.117.36    192.168.1.111   <none>           <none>

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get node 192.168.1.113 --show-labels
NAME            STATUS   ROLES   AGE   VERSION   LABELS
192.168.1.113   Ready    node    52d   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.1.113,kubernetes.io/os=linux,kubernetes.io/role=node

5.4.3 Combining node hard affinity and soft affinity

Node hard affinity: conditions the target node must satisfy; if no node satisfies them, the pod stays Pending and cannot be scheduled

Node soft affinity: the pod prefers nodes with matching labels, but if no node matches it can still be scheduled

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case3-2.2-nodeAffinity-requiredDuring-preferredDuring.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app2-deployment-label
  name: p1-tomcat-app2-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1-tomcat-app2-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app2-selector
    spec:
      containers:
      - name: p1-tomcat-app2-container
        image: tomcat:7.0.94-alpine
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution: #hard affinity: conditions the target node must satisfy
            nodeSelectorTerms:
            - matchExpressions: #hard condition 1
              - key: "kubernetes.io/role"
                operator: NotIn
                values:
                - "master" #hard rule: only nodes whose kubernetes.io/role label is not master, i.e. never schedule to a master node (node anti-affinity)
          preferredDuringSchedulingIgnoredDuringExecution: #soft affinity: prefer nodes matching the labels below
          - weight: 80
            preference:
              matchExpressions:
              - key: project
                operator: In
                values:
                  - p1
          - weight: 60
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd

#With none of the 3 worker nodes matching the soft-affinity labels, the pod was still scheduled, landing on 192.168.1.113
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get nodes --show-labels
NAME            STATUS                     ROLES    AGE   VERSION   LABELS
192.168.1.101   Ready,SchedulingDisabled   master   53d   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.1.101,kubernetes.io/os=linux,kubernetes.io/role=master
192.168.1.102   Ready,SchedulingDisabled   master   53d   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.1.102,kubernetes.io/os=linux,kubernetes.io/role=master
192.168.1.103   Ready,SchedulingDisabled   master   52d   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.1.103,kubernetes.io/os=linux,kubernetes.io/role=master
192.168.1.111   Ready                      node     53d   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.1.111,kubernetes.io/os=linux,kubernetes.io/role=node
192.168.1.112   Ready                      node     53d   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.1.112,kubernetes.io/os=linux,kubernetes.io/role=node
192.168.1.113   Ready                      node     52d   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.1.113,kubernetes.io/os=linux,kubernetes.io/role=node
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case3-2.2-nodeAffinity-requiredDuring-preferredDuring.yaml
deployment.apps/p1-tomcat-app2-deployment created
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS     AGE   IP               NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-5fb9c58bd7-9mxl9   1/1     Running   0            9s    10.200.182.143   192.168.1.113   <none>           <none>
ubuntu1804                                   1/1     Running   3 (9h ago)   4d    10.200.117.36    192.168.1.111   <none>           <none>

5.4.4 Node anti-affinity

Use NotIn or DoesNotExist to express anti-affinity conditions; a DoesNotExist variant is sketched at the end of this subsection

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat my-case3-3.1-nodeantiaffinity.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app2-deployment-label
  name: p1-tomcat-app2-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1-tomcat-app2-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app2-selector
    spec:
      containers:
      - name: p1-tomcat-app2-container
        image: tomcat:7.0.94-alpine
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions: #condition 1
              - key: disktype
                operator: NotIn #the target node must not carry a disktype label whose value is ssd
                values:
                - ssd #never scheduled to a node labeled disktype=ssd, i.e. only nodes without disktype=ssd are eligible

#Create the pod
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f my-case3-3.1-nodeantiaffinity.yaml
deployment.apps/p1-tomcat-app2-deployment configured

#The pod was scheduled to 192.168.1.113, which does not carry the disktype=ssd label.
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                        READY   STATUS    RESTARTS        AGE     IP               NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-749fcc6cc-dlxgt   1/1     Running   0               14s     10.200.182.163   192.168.1.113   <none>           <none>
ubuntu1804                                  1/1     Running   3 (6h48m ago)   3d22h   10.200.117.36    192.168.1.111   <none>           <none>
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case#

#Delete the deployment
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl delete -f my-case3-3.1-nodeantiaffinity.yaml
deployment.apps "p1-tomcat-app2-deployment" deleted

#Label 192.168.1.113 with disktype=ssd
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl label node 192.168.1.113 disktype=ssd
node/192.168.1.113 labeled

#Recreate the pod
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f my-case3-3.1-nodeantiaffinity.yaml
deployment.apps/p1-tomcat-app2-deployment created

#The pod is now Pending: every worker node carries disktype=ssd, so no node satisfies the anti-affinity rule
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1
NAME                                        READY   STATUS    RESTARTS        AGE
p1-tomcat-app2-deployment-749fcc6cc-dv2dw   0/1     Pending   0               5s
ubuntu1804                                  1/1     Running   3 (6h54m ago)   3d22h

#kubectl describe pod shows the reason: the 3 master nodes are unschedulable and the 3 worker nodes do not match the node affinity/selector
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl describe pod p1-tomcat-app2-deployment-749fcc6cc-dv2dw  -n p1
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  30s   default-scheduler  0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..
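
A DoesNotExist variant of the same anti-affinity idea, sketched here for comparison (not run in this environment): it admits only nodes that do not carry the disktype label at all, regardless of its value, which is stricter than NotIn with a single value:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: disktype
                operator: DoesNotExist   #no values list is allowed with Exists/DoesNotExist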

5.5 podAffinity and podAntiAffinity

5.5.1 Pod affinity and anti-affinity overview

  • Pod affinity and anti-affinity constrain which nodes a new Pod may be scheduled to based on the labels of Pods already running on those nodes, not on the node labels themselves.

  • The rule is of the form: if node A is already running one or more Pods that satisfy the rule of the new Pod B, then under affinity Pod B is scheduled onto node A, and under anti-affinity it is not.

  • The rule is expressed as a LabelSelector with an optional list of namespaces. Pod affinity and anti-affinity can select namespaces because Pods are namespaced, whereas nodes belong to no namespace (which is why node affinity needs no namespace); a label selector that matches Pod labels must therefore state which namespaces it applies to.

  • Conceptually, a topology domain is a region with a topology (and a failure domain when it goes down): a single node in the cluster, a rack, a cloud provider availability zone, a cloud provider region, and so on. topologyKey defines whether the affinity or anti-affinity granularity is the node level or, say, the zone level, so the scheduler can identify and pick the correct target topology domain (see the zone-level sketch after this list).

  • The legal operators for Pod affinity and anti-affinity are In, NotIn, Exists and DoesNotExist.

  • In Pod affinity, topologyKey may not be empty in either requiredDuringSchedulingIgnoredDuringExecution or preferredDuringSchedulingIgnoredDuringExecution (Empty topologyKey is not allowed.).

  • In Pod anti-affinity, topologyKey may likewise not be empty in requiredDuringSchedulingIgnoredDuringExecution or preferredDuringSchedulingIgnoredDuringExecution (Empty topologyKey is not allowed.).

  • For requiredDuringSchedulingIgnoredDuringExecution Pod anti-affinity, the admission controller LimitPodHardAntiAffinityTopology was introduced to restrict topologyKey to kubernetes.io/hostname; if topologyKey should also be usable for other custom topologies, modify or disable that admission controller.

  • Apart from the above, topologyKey can be any legal label key.
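
To make the topologyKey granularity concrete, the sketch below uses the standard topology.kubernetes.io/zone label as the topology domain, which would place the new pod in the same zone (rather than on the same node) as a pod labeled project=python. This assumes the nodes carry zone labels, which is not the case in this lab environment:

      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: project
                operator: In
                values:
                  - python
            topologyKey: topology.kubernetes.io/zone   #zone-level domain instead of kubernetes.io/hostname
            namespaces:
              - p1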

5.5.2 Deploy a web service

5.5.2.1 Write the YAML file

Deploy an nginx service in the p1 namespace; this nginx pod will be used by the pod affinity and anti-affinity tests that follow, and it carries the following labels:

 app: python-nginx-selector

 project: python

5.5.2.2 Deploy the nginx web service

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case4-4.1-nginx.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: python-nginx-deployment-label
  name: python-nginx-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: python-nginx-selector
  template:
    metadata:
      labels:
        app: python-nginx-selector
        project: python
    spec:
      containers:
      - name: python-nginx-container
        image: nginx:1.20.2-alpine
        #command: ["/apps/tomcat/bin/run_tomcat.sh"]
        #imagePullPolicy: IfNotPresent
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          protocol: TCP
          name: http
        - containerPort: 443
          protocol: TCP
          name: https
        env:
        - name: "password"
          value: "123456"
        - name: "age"
          value: "18"

---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: python-nginx-service-label
  name: python-nginx-service
  namespace: p1
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
    nodePort: 30014
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
    nodePort: 30453
  selector:
    app: python-nginx-selector
    project: python #one or more selector labels; a pod must carry all of the listed labels to be selected by this Service

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case4-4.1-nginx.yaml
deployment.apps/python-nginx-deployment created
service/python-nginx-service created

5.5.3 podAffinity - soft affinity

Pod soft affinity: the new pod prefers to be scheduled onto the node where a pod matching the soft-affinity labels is already running. If such a pod exists, the new pod is placed on that pod's node; if no pod matches the labels, the new pod can still be scheduled elsewhere

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case4-4.2-podaffinity-preferredDuring.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app2-deployment-label
  name: p1-tomcat-app2-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1-tomcat-app2-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app2-selector
    spec:
      containers:
      - name: p1-tomcat-app2-container
        image: tomcat:7.0.94-alpine
        imagePullPolicy: IfNotPresent
        #imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
      affinity:
        podAffinity:  #Pod affinity
          #requiredDuringSchedulingIgnoredDuringExecution: #hard affinity: matched against existing pods; the pod is only scheduled if the match succeeds, otherwise scheduling is refused.
          preferredDuringSchedulingIgnoredDuringExecution: #soft affinity: matched against existing pods; if a match is found the pod is scheduled to a node in the same topologyKey domain, otherwise kubernetes schedules it on its own.
          - weight: 100
            podAffinityTerm:
              labelSelector: #label selection
                matchExpressions: #expression matching
                - key: project
                  operator: In
                  values:
                    #- pythonX
                    - python
              topologyKey: kubernetes.io/hostname
              namespaces:
                - p1
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case#

#Because the labels matched, the new pod p1-tomcat-app2-deployment-7ff64c99fd-7vccg was also scheduled to 192.168.1.112
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case4-4.2-podaffinity-preferredDuring.yaml
deployment.apps/p1-tomcat-app2-deployment created
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS      AGE     IP              NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-7ff64c99fd-7vccg   1/1     Running   0             5s      10.200.81.43    192.168.1.112   <none>           <none>
python-nginx-deployment-cf5f7d897-wkppd      1/1     Running   1 (13m ago)   9h      10.200.81.50    192.168.1.112   <none>           <none>
ubuntu1804                                   1/1     Running   4 (13m ago)   4d12h   10.200.117.40   192.168.1.111   <none>           <none>

5.5.4 podAffinity - hard affinity

Pod hard affinity: the new pod must be scheduled onto the node where a pod matching the hard-affinity labels is running. If such a pod exists, the new pod goes to that pod's node; if no pod matches the hard-affinity labels, there is no eligible node and the new pod stays Pending

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case4-4.3-podaffinity-requiredDuring.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app2-deployment-label
  name: p1-tomcat-app2-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1-tomcat-app2-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app2-selector
    spec:
      containers:
      - name: p1-tomcat-app2-container
        image: tomcat:7.0.94-alpine
        imagePullPolicy: IfNotPresent
        #imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution: #hard affinity
          - labelSelector:
              matchExpressions:
              - key: project
                operator: In
                values:
                  - python
            topologyKey: "kubernetes.io/hostname"
            namespaces:
              - p1
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case4-4.3-podaffinity-requiredDuring.yaml
deployment.apps/p1-tomcat-app2-deployment created
#The new pod was scheduled to the same node as the matching pod
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                        READY   STATUS    RESTARTS      AGE     IP              NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-765695f5d-g5d2t   1/1     Running   0             6s      10.200.81.22    192.168.1.112   <none>           <none>
python-nginx-deployment-cf5f7d897-wkppd     1/1     Running   1 (30m ago)   10h     10.200.81.50    192.168.1.112   <none>           <none>
ubuntu1804                                  1/1     Running   4 (30m ago)   4d13h   10.200.117.40   192.168.1.111   <none>           <none>

#Change the hard-affinity label of the pod to be created so that no existing pod matches; the new pod then cannot be scheduled
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl delete -f case4-4.3-podaffinity-requiredDuring.yaml
deployment.apps "p1-tomcat-app2-deployment" deleted
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# vi case4-4.3-podaffinity-requiredDuring.yaml
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case4-4.3-podaffinity-requiredDuring.yaml
deployment.apps/p1-tomcat-app2-deployment created
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS      AGE     IP              NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-54964ddfd8-ljhr4   0/1     Pending   0             9s      <none>          <none>          <none>           <none>
python-nginx-deployment-cf5f7d897-wkppd      1/1     Running   1 (32m ago)   10h     10.200.81.50    192.168.1.112   <none>           <none>
ubuntu1804                                   1/1     Running   4 (32m ago)   4d13h   10.200.117.40   192.168.1.111   <none>           <none>

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl describe pod p1-tomcat-app2-deployment-54964ddfd8-ljhr4 -n p1

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  51s   default-scheduler  0/6 nodes are available: 3 node(s) didn't match pod affinity rules, 3 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..

5.5.5 podAntiAffinity - hard anti-affinity

Pod hard anti-affinity: the new pod must not be scheduled onto any node where a pod matching the hard anti-affinity labels is running. If matching pods exist, their nodes are excluded; if no pod matches (i.e. there are no excluded nodes), the new pod can be scheduled anywhere

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case4-4.4-podAntiAffinity-requiredDuring.yaml
kind: Deployment
#apiVersion: extensions/v1beta1
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app2-deployment-label
  name: p1-tomcat-app2-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1-tomcat-app2-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app2-selector
    spec:
      containers:
      - name: p1-tomcat-app2-container
        image: tomcat:7.0.94-alpine
        imagePullPolicy: IfNotPresent
        #imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: project
                operator: In
                values:
                  - python
            topologyKey: "kubernetes.io/hostname"
            namespaces:
              - p1
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case4-4.4-podAntiAffinity-requiredDuring.yaml
deployment.apps/p1-tomcat-app2-deployment created

#After the hard anti-affinity match, the new pod p1-tomcat-app2-deployment-59cf558b96-4hxdt was scheduled to a different node than the matching pod
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS      AGE     IP               NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-59cf558b96-4hxdt   1/1     Running   0             7s      10.200.182.184   192.168.1.113   <none>           <none>
python-nginx-deployment-cf5f7d897-wkppd      1/1     Running   1 (43m ago)   10h     10.200.81.50     192.168.1.112   <none>           <none>
ubuntu1804                                   1/1     Running   4 (43m ago)   4d13h   10.200.117.40    192.168.1.111   <none>           <none>

5.5.6 podAntiAffinity - soft anti-affinity

Pod soft anti-affinity: the new pod prefers not to be scheduled onto the node where a pod matching the soft anti-affinity labels is running. If no pod matches (i.e. there is no node to avoid), the new pod can be scheduled anywhere. If scheduling onto the other nodes fails (for example because of resources or taints), the new pod can still be placed on the node of the matching pod

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case4-4.5-podAntiAffinity-preferredDuring.yaml
kind: Deployment
#apiVersion: extensions/v1beta1
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app2-deployment-label
  name: p1-tomcat-app2-deployment
  namespace: p1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: p1-tomcat-app2-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app2-selector
    spec:
      containers:
      - name: p1-tomcat-app2-container
        image: tomcat:7.0.94-alpine
        imagePullPolicy: IfNotPresent
        #imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: project
                  operator: In
                  values:
                    - python
              topologyKey: kubernetes.io/hostname
              namespaces:
                - p1

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case4-4.5-podAntiAffinity-preferredDuring.yaml
deployment.apps/p1-tomcat-app2-deployment configured

#Under normal conditions the pods are not scheduled to the node where the soft anti-affinity matching pod runs.
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS       AGE     IP               NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-697448c476-2jfss   1/1     Running   0              47s     10.200.182.134   192.168.1.113   <none>           <none>
p1-tomcat-app2-deployment-697448c476-dfnf8   1/1     Running   0              46s     10.200.117.33    192.168.1.111   <none>           <none>
python-nginx-deployment-cf5f7d897-wkppd      1/1     Running   1 (142m ago)   11h     10.200.81.50     192.168.1.112   <none>           <none>
ubuntu1804                                   1/1     Running   4 (142m ago)   4d15h   10.200.117.40    192.168.1.111   <none>           <none>

#Delete the deployment
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl delete -f case4-4.5-podAntiAffinity-preferredDuring.yaml
#Taint the nodes other than the one hosting the soft anti-affinity matching pod, so they cannot be scheduled
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl taint node 192.168.1.111 key1=value1:NoSchedule
node/192.168.1.111 tainted
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl taint node 192.168.1.113 key1=value1:NoSchedule
node/192.168.1.113 tainted
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case4-4.5-podAntiAffinity-preferredDuring.yaml
deployment.apps/p1-tomcat-app2-deployment created
#With the other nodes unschedulable, the new pods were scheduled onto the node where the soft anti-affinity matching pod runs
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS       AGE     IP              NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app2-deployment-697448c476-dmvx4   1/1     Running   0              10s     10.200.81.1     192.168.1.112   <none>           <none>
p1-tomcat-app2-deployment-697448c476-rvrcc   1/1     Running   0              10s     10.200.81.41    192.168.1.112   <none>           <none>
python-nginx-deployment-cf5f7d897-wkppd      1/1     Running   1 (146m ago)   11h     10.200.81.50    192.168.1.112   <none>           <none>
ubuntu1804                                   1/1     Running   4 (145m ago)   4d15h   10.200.117.40   192.168.1.111   <none>           <none>

5.6 Taints and tolerations

A tainted node normally repels new pods; a pod can only be scheduled onto such a node if it defines a toleration for the node's taint

5.6.1 Taints overview

  • A taint (taints) makes a node repel Pod scheduling; it is the exact opposite of affinity, i.e. a tainted node and a pod are in a repelling relationship for scheduling.

  • A toleration (toleration) lets a Pod tolerate a node's taints; if the pod tolerates the taint, it can be scheduled onto the node even though the node carries the taint.

    Reference: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/taint-and-toleration/

  • The three taint effects:

    NoSchedule: k8s will not schedule Pods onto a node with this taint

    PreferNoSchedule: k8s will try to avoid scheduling Pods onto a node with this taint

    NoExecute: k8s will not schedule Pods onto a node with this taint, and Pods already running on the node are force-evicted

#Taint a node
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl taint node 192.168.1.111 key1=value1:NoSchedule
node/192.168.1.111 tainted
#View the node's taints
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl describe node 192.168.1.111 | grep Taint
Taints:             key1=value1:NoSchedule
#Remove the taint from the node
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl taint node 192.168.1.111 key1:NoSchedule-
node/192.168.1.111 untainted
#View the node's taints again
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl describe node 192.168.1.111 | grep Taint
Taints:             <none>


5.6.2 Tolerations overview

  • tolerations

    Define which node taints a Pod can accept; with a matching toleration the Pod may be scheduled onto a node carrying that taint.

  • Taint matching via operator (a short sketch follows below)

    If operator is Exists, the toleration needs no value; it matches any taint with that key and effect directly.

    If operator is Equal, a value must be given and it must be equal to the value of the node's taint (with the key matching the taint's key).
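
A minimal sketch of the Exists form, together with the tolerationSeconds field that only applies to the NoExecute effect (keys and durations here are illustrative, not taken from the case below):

      tolerations:
      - key: "key1"
        operator: "Exists"                    #no value needed: any taint with key key1 and effect NoSchedule is tolerated
        effect: "NoSchedule"
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 60                 #the pod may stay on a NoExecute-tainted node for 60s before being evicted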

5.6.3 Toleration example

root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# cat case5.1-taint-tolerations.yaml
kind: Deployment
#apiVersion: extensions/v1beta1
apiVersion: apps/v1
metadata:
  labels:
    app: p1-tomcat-app1-deployment-label
  name: p1-tomcat-app1-deployment
  namespace: p1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1-tomcat-app1-selector
  template:
    metadata:
      labels:
        app: p1-tomcat-app1-selector
    spec:
      containers:
      - name: p1-tomcat-app1-container
        #image: harbor.p1.net/p1/tomcat-app1:v7
        image: tomcat:7.0.93-alpine
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule"

---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: p1-tomcat-app1-service-label
  name: p1-tomcat-app1-service
  namespace: p1
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
    #nodePort: 40003
  selector:
    app: p1-tomcat-app1-selector

#Taint the nodes
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl taint node 192.168.1.111 key2=value2:NoSchedule
node/192.168.1.111 tainted
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl taint node 192.168.1.112 key2=value2:NoSchedule
node/192.168.1.112 tainted
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl taint node 192.168.1.113 key1=value1:NoSchedule
node/192.168.1.113 tainted

#Create the pods
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case5.1-taint-tolerations.yaml
deployment.apps/p1-tomcat-app1-deployment created
service/p1-tomcat-app1-service created

#Because the pods tolerate the taint on 192.168.1.113, they were all scheduled onto that node
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS       AGE     IP               NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app1-deployment-5df4685596-crsjb   1/1     Running   0              6s      10.200.182.174   192.168.1.113   <none>           <none>
p1-tomcat-app1-deployment-5df4685596-hl2q4   1/1     Running   0              10s     10.200.182.166   192.168.1.113   <none>           <none>
p1-tomcat-app1-deployment-5df4685596-q4k6r   1/1     Running   0              10s     10.200.182.178   192.168.1.113   <none>           <none>

#Change the node's taint effect to PreferNoSchedule
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl describe node 192.168.1.113 | grep Taints
Taints:             key1=value1:PreferNoSchedule
#Remove the toleration from the pod spec
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# vi case5.1-taint-tolerations.yaml
#Recreate the pods
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl apply -f case5.1-taint-tolerations.yaml
deployment.apps/p1-tomcat-app1-deployment created
service/p1-tomcat-app1-service created
#Even without a toleration, when the node's taint is PreferNoSchedule (try not to schedule here) and no other node is schedulable, pods can still land on the PreferNoSchedule node
root@image-build:/opt/k8s-data/yaml/6-20230521/Affinit-case# kubectl get pods -n p1 -o wide
NAME                                         READY   STATUS    RESTARTS       AGE     IP               NODE            NOMINATED NODE   READINESS GATES
p1-tomcat-app1-deployment-69d9d6d598-4lv27   1/1     Running   0              2s      10.200.182.145   192.168.1.113   <none>           <none>
p1-tomcat-app1-deployment-69d9d6d598-cdwmw   1/1     Running   0              17s     10.200.182.157   192.168.1.113   <none>           <none>
p1-tomcat-app1-deployment-69d9d6d598-xzwvc   1/1     Running   0              2s      10.200.182.144   192.168.1.113   <none>           <none>

5.7 Eviction

5.7.1 Eviction overview

Node-pressure eviction is the process in which each kubelet proactively terminates Pods to reclaim resources such as memory and disk space on its node. The kubelet monitors the node's CPU, memory, disk space, filesystem inodes and similar resources; when one or more of them reach a certain level of consumption, the kubelet force-evicts one or more Pods on the node to prevent an OOM caused by the node being unable to allocate resources.


Reference: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/node-pressure-eviction/

  • Host memory

    memory.available #available memory on the node, default threshold <100Mi

  • nodefs is the node's main filesystem, used for local disk volumes, emptyDir, log storage and so on; by default it is /var/lib/kubelet/, or the mount directory specified to the kubelet via --root-dir

    nodefs.inodesFree #available inodes of nodefs, default <5%

    nodefs.available #available space of nodefs, default <10%

  • imagefs is an optional filesystem the container runtime uses to store container images and container writable layers

    imagefs.inodesFree #available inode percentage of imagefs

    imagefs.available #available disk space percentage of imagefs, default <15%

  • pid.available #available PID percentage

  • kube-controller-manager eviction: evicts pods after a node goes down

  • kubelet eviction: evicts pods based on node load, resource utilization and similar signals

5.7.2 Eviction priority

Eviction (node-pressure eviction) automatically force-evicts pods when a node runs short of resources, to keep the node itself healthy. Kubernetes evicts Pods based on their QoS (quality of service) class; there are currently three classes (a command for checking a pod's class follows the examples):

Guaranteed: #limits and requests are equal; highest class, evicted last
resources:
  limits:
    cpu: 500m
    memory: 256Mi
  requests:
    cpu: 500m
    memory: 256Mi

Burstable: #limits and requests differ; middle class, evicted after BestEffort
resources:
  limits:
    cpu: 500m
    memory: 256Mi
  requests:
    cpu: 256m
    memory: 128Mi

BestEffort: #no limits at all, i.e. resources is empty; lowest class, evicted first
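
The class Kubernetes assigned to a running pod can be read from its status; a quick check (pod name is a placeholder):

kubectl get pod <pod-name> -n p1 -o jsonpath='{.status.qosClass}'
#prints Guaranteed, Burstable or BestEffort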

5.7.3 Eviction conditions

  • eviction-signal: the eviction signal the kubelet watches to decide whether to evict, e.g. reading memory.available via cgroupfs and then evaluating it against the threshold.

  • operator: the comparison operator used to decide whether the resource usage triggers eviction.

  • quantity: the usage threshold the signal is compared against, such as memory.available: 300Mi or nodefs.available: 10%. For example, nodefs.available<10% means eviction is triggered when the node's free disk space drops below 10%.

5.7.4 Soft eviction

Soft eviction does not evict pods immediately; a grace period can be defined, and only when the condition has persisted beyond the grace period does the kubelet kill the pods and trigger eviction. The soft eviction settings are (a config sketch follows the list):

  • eviction-soft: the soft eviction threshold, e.g. memory.available < 1.5Gi; if the condition persists longer than the specified grace period, Pod eviction is triggered.

  • eviction-soft-grace-period: the soft eviction grace period, e.g. memory.available=1m30s, defining how long the soft eviction condition must hold before Pods are evicted.

  • eviction-max-pod-grace-period: the maximum allowed pod termination grace period (in seconds) used when terminating Pods because a soft eviction threshold was met.
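
A minimal sketch of how these three settings might look in /var/lib/kubelet/config.yaml, using the KubeletConfiguration field names; the thresholds are illustrative values, not recommendations:

evictionSoft:
  memory.available: "1.5Gi"
  nodefs.available: "15%"
evictionSoftGracePeriod:
  memory.available: "1m30s"
  nodefs.available: "2m"
evictionMaxPodGracePeriod: 60   #seconds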

5.7.5 Hard eviction

Hard eviction thresholds have no grace period; when a hard eviction threshold is reached, the kubelet kills the pods immediately and evicts them.

The kubelet ships with default hard eviction thresholds (they can be tuned):

  • kubelet service file, where resource reservations and related flags can be adjusted

    vim /etc/systemd/system/kubelet.service

     --kube-reserved #defines the CPU, memory and other resources reserved for kube components

     --system-reserved #defines the resources reserved for system daemons such as sshd and udev

    Resource reservation reference: https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/reserve-compute-resources/

     --eviction-hard #hard eviction resource thresholds for Pods

  • kubelet configuration file, where the hard eviction thresholds can be adjusted

    vim /var/lib/kubelet/config.yaml

    evictionHard:
      imagefs.available: 15%
      memory.available: 300Mi
      nodefs.available: 10%
      nodefs.inodesFree: 5%

The same settings can also be kept together in the KubeletConfiguration file; see the sketch below.
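
For reference, the reservations and hard-eviction thresholds above can also be expressed entirely in the KubeletConfiguration file instead of kubelet.service flags; a hedged sketch with illustrative reservation values:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:               #resources reserved for kube components (same meaning as --kube-reserved)
  cpu: "500m"
  memory: "1Gi"
systemReserved:             #resources reserved for system daemons such as sshd and udev
  cpu: "500m"
  memory: "1Gi"
evictionHard:
  imagefs.available: "15%"
  memory.available: "300Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"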