Incident record: pods evicted and recreated in production due to insufficient disk space (The node had condition: [DiskPressure])

Published 2023-11-27 17:52:35, author: YYQ-

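# A record of a production incident in which "The node had condition: [DiskPressure]" caused pods to be evicted and recreated over and over, with monitoring alerting nonstop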

# On the k8s management host, the msg pods turned out to have been recreated many times
[root@VM_248_6_centos ~]# kubectl get pod -n cms-v2-prod
NAME                                            READY   STATUS    RESTARTS   AGE
(omitted) ......
cms-msg-deploy-6987c5cb8d-6fczf                 1/1     Running   0          9d
cms-msg-deploy-6987c5cb8d-867ls                 1/1     Running   0          69m    # pod that was just recreated
cms-msg-deploy-6987c5cb8d-btlxd                 1/1     Running   0          8h
cms-msg-deploy-6987c5cb8d-k96xk                 1/1     Running   0          4h18m
cms-msg-deploy-6987c5cb8d-pjkx2                 1/1     Running   0          165m
cms-msg-deploy-6987c5cb8d-r4hdd                 1/1     Running   0          55m    # pod that was just recreated
(omitted) ......

# Check the k8s cluster events: the evicted and recreated pods are all on servers 10.169.248.131 and 10.169.248.132
[root@VM_248_6_centos ~]# kubectl get event
LAST SEEN   TYPE      REASON             OBJECT                                 MESSAGE
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-4nk44    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-4nk44 to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-4nk44    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-85wgh    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-85wgh to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-85wgh    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-8929w    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-8929w to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-8929w    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-8tvtx    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-8tvtx to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-8tvtx    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-92l6g    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-92l6g to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-92l6g    The node had condition: [DiskPressure].
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-bcqrs    The node was low on resource: ephemeral-storage. Container sidecar-jdk was using 38073452Ki, which exceeds its request of 0.
54m         Normal    Killing            pod/cms-msg-deploy-6987c5cb8d-bcqrs    Stopping container sidecar-jdk
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-lvktl    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-lvktl to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-lvktl    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-m664z    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-m664z to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-m664z    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-mfdw2    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-mfdw2 to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-mfdw2    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-nd9rt    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-nd9rt to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-nd9rt    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-ngvhw    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-ngvhw to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-ngvhw    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-r4hdd    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-r4hdd to 10.169.248.131
54m         Normal    Pulled             pod/cms-msg-deploy-6987c5cb8d-r4hdd    Container image "ccr.yxyun.yuexiu.com/idc1-yxhq-ump-registry/cms-v2-msg:release-2.0.27.20231010" already present on machine
54m         Normal    Created            pod/cms-msg-deploy-6987c5cb8d-r4hdd    Created container cms-msg-container
54m         Normal    Started            pod/cms-msg-deploy-6987c5cb8d-r4hdd    Started container cms-msg-container
54m         Normal    Pulled             pod/cms-msg-deploy-6987c5cb8d-r4hdd    Container image "ccr.yxyun.yuexiu.com/idc1-yxhq-ump-registry/openjdk:8u232-stretch-yak-dubbo-cmsapm" already present on machine
54m         Normal    Created            pod/cms-msg-deploy-6987c5cb8d-r4hdd    Created container sidecar-jdk
54m         Normal    Started            pod/cms-msg-deploy-6987c5cb8d-r4hdd    Started container sidecar-jdk
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-w8xsq    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-w8xsq to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-w8xsq    The node had condition: [DiskPressure].
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-m664z
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-lvktl
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-8tvtx
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-mfdw2
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-ngvhw
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-4nk44
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-w8xsq
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-nd9rt
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-85wgh
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   (combined from similar events): Created pod: cms-msg-deploy-6987c5cb8d-r4hdd
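
With this many events, it can help to reduce the listing to just the evicted pods. The following is a minimal sketch, assuming the `kubectl get event` column layout shown above (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE). The function name `list_evicted_pods` is hypothetical and was not part of the original troubleshooting.

```shell
# Hypothetical helper: reads `kubectl get event` output on stdin and prints
# the OBJECT column (4th field) of every Warning/Evicted row, de-duplicated.
list_evicted_pods() {
  awk '$2 == "Warning" && $3 == "Evicted" { print $4 }' | sort -u
}

# Usage, with the namespace from the listing above:
#   kubectl get event -n cms-v2-prod | list_evicted_pods
# A server-side alternative is kubectl's event field selector:
#   kubectl get event -n cms-v2-prod --field-selector reason=Evicted
```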


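# On servers 10.169.248.131 and 10.169.248.132, the kubelet log (image GC manager) reports that image filesystem usage reached 85%, which crosses the remaining-space threshold at which the kubelet is configured to evict pods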
[root@VM_248_131_centos ~]# grep '85%' /var/log/messages
Nov 27 08:22:28 VM_248_131_centos kubelet: I1127 08:22:28.134844     935 image_gc_manager.go:300] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 5028521574 bytes down to the low threshold (80%).
[root@VM_248_132_centos ~]# grep '85%' /var/log/messages
Nov 27 09:41:20 VM_248_132_centos kubelet: I1127 09:41:20.609047     940 image_gc_manager.go:300] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 4976412262 bytes down to the low threshold (80%).
Nov 27 15:26:21 VM_248_132_centos kubelet: I1127 15:26:21.135458     940 image_gc_manager.go:300] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 4357973606 bytes down to the low threshold (80%).

# View the kubelet's pod eviction policy
[root@VM_248_131_centos ~]# cat /var/lib/kubelet/config.yaml
evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
[root@VM_248_132_centos ~]# cat /var/lib/kubelet/config.yaml
evictionHard:
  imagefs.available: 15%    # pods are evicted once available space on the image filesystem drops below 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
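
The `imagefs.available: 15%` threshold means eviction starts once the image filesystem has less than 15% of its capacity available, which corresponds to roughly 85% usage. As a rough illustration of that arithmetic, the sketch below compares a mount point's available space against a percentage threshold. The function names are hypothetical. It reads `df` rather than the kubelet's own statistics, so its result is only an approximation of what the kubelet sees.

```shell
# Succeeds (exit 0) when available space is below threshold_pct of capacity.
# Arguments: available KiB, capacity KiB, threshold percentage (integer).
is_below_threshold() {
  [ $(( $1 * 100 )) -lt $(( $2 * $3 )) ]
}

# Hypothetical wrapper: checks one mount point using POSIX `df -P -k` output,
# where column 2 is the total size and column 4 the available space in KiB.
check_mount() {
  mount_point=$1
  threshold_pct=$2
  set -- $(df -P -k "$mount_point" | awk 'NR == 2 { print $2, $4 }')
  if is_below_threshold "$2" "$1" "$threshold_pct"; then
    echo "$mount_point: available space is below ${threshold_pct}%"
  else
    echo "$mount_point: ok"
  fi
}

# Usage, with the image filesystem mount and threshold from this incident:
#   check_mount /var/lib/docker 15
```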

# Check disk usage on 132
[root@VM_248_132_centos ~]# df -h
Filesystem       Size  Used Avail Use% Mounted on
devtmpfs          32G     0   32G   0% /dev
tmpfs             32G   24K   32G   1% /dev/shm
tmpfs             32G  2.8M   32G   1% /run
tmpfs             32G     0   32G   0% /sys/fs/cgroup
/dev/vda1         50G   32G   16G  67% /
/dev/vdb          99G   43G   52G  46% /var/lib/docker
10.169.248.85:/  200G   22G  179G  11% /paas
tmpfs            6.3G     0  6.3G   0% /run/user/0
# Check disk usage on 131
[root@VM_248_131_centos ~]# df -h
Filesystem       Size  Used Avail Use% Mounted on
devtmpfs          32G     0   32G   0% /dev
tmpfs             32G   24K   32G   1% /dev/shm
tmpfs             32G  2.6M   32G   1% /run
tmpfs             32G     0   32G   0% /sys/fs/cgroup
/dev/vda1         50G   15G   33G  30% /
/dev/vdb          99G   71G   24G  76% /var/lib/docker
10.169.248.85:/  200G   22G  179G  11% /paas
tmpfs            6.3G     0  6.3G   0% /run/user/0

Solution: expand the disk space of the /var/lib/docker partition, and adjust the cleanup script for the logs generated by the msg pods.
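
The cleanup script itself is not shown in this post, so the following is only a sketch of what such a script might do, assuming the logs are `*.log` files that can be deleted once they pass a retention period. The function name, log directory and retention period are placeholders, not the values used in production.

```shell
# Hypothetical cleanup helper: deletes *.log files under log_dir whose
# modification time is more than retention_days days old.
clean_old_logs() {
  log_dir=$1
  retention_days=$2
  find "$log_dir" -type f -name '*.log' -mtime +"$retention_days" -exec rm -f {} +
}

# Usage (path and retention are placeholders, e.g. run from a daily cron job):
#   clean_old_logs /path/to/msg/logs 7
```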