1. System Environment
This article is based on Kubernetes 1.21.9 running on the Linux distribution CentOS 7.4.
Server version | Docker version | Kubernetes (k8s) cluster version | CPU architecture |
---|---|---|---|
CentOS Linux release 7.4.1708 (Core) | Docker version 20.10.12 | v1.21.9 | x86_64 |
Kubernetes cluster layout: k8scloude1 is the master node; k8scloude2 and k8scloude3 are worker nodes.
Server | OS version | CPU architecture | Processes | Role |
---|---|---|---|---|
k8scloude1/192.168.110.130 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kube-apiserver,etcd,kube-scheduler,kube-controller-manager,kubelet,kube-proxy,coredns,calico | k8s master node |
k8scloude2/192.168.110.129 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker node |
k8scloude3/192.168.110.128 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker node |
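2. Introduction
In Kubernetes, keeping applications highly available and stable is essential. To that end, Kubernetes provides mechanisms that monitor the state of containers, automatically restart unhealthy containers, and stop sending traffic to containers that are not ready. Two of these mechanisms are the livenessProbe and the readinessProbe.
This article introduces the livenessProbe and readinessProbe in Kubernetes and walks through examples of how to use them.
Using these probes requires a working Kubernetes cluster. For installing and deploying a Kubernetes (k8s) cluster, see the blog post "Centos7 安装部署Kubernetes(k8s)集群" (Installing and deploying a Kubernetes (k8s) cluster on CentOS 7): https://www.cnblogs.com/renshengdezheli/p/16686769.html.
3. Overview of Kubernetes Health Checks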
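Kubernetes supports three kinds of health checks (probes): livenessProbe, readinessProbe and startupProbe. The kubelet runs these probes periodically to check whether the service inside a container is healthy.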
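- livenessProbe: checks whether the container is still working. If the probe fails failureThreshold times in a row (for example because the service inside the container no longer responds), the kubelet marks the container as Unhealthy, kills it and restarts it according to the pod's restartPolicy. The problem is handled by a restart: the container is restarted in place inside the same pod; the pod is not deleted and recreated, which is why in the examples below the pod keeps its name and IP and only the RESTARTS counter increases. Probe mechanisms: exec (run a command), httpGet, tcpSocket.
- readinessProbe: checks whether the container is ready to receive traffic. When the probe fails, the container is marked as not Ready and the pod is removed from the endpoints of every Service that selects it. Nothing is restarted; user requests are simply no longer forwarded to this pod (this only takes effect when the pod is accessed through a Service). Probe mechanisms: exec, httpGet, tcpSocket.
- startupProbe: checks whether the application in the container has finished starting. It runs repeatedly (every periodSeconds) until it succeeds once; until then the liveness and readiness probes are disabled, and if it fails failureThreshold times in a row the container is killed and restarted. After its first success it is not run again, which makes it useful for slow-starting applications.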
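To make the difference between the first two probes concrete, here is a minimal Python sketch of how consecutive probe results turn into an action. It is a simplified model, not the kubelet's code: the class ProbeTracker and its return strings are hypothetical, the startupProbe is left out, and the defaults failureThreshold: 3 and successThreshold: 1 are the Kubernetes defaults.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeTracker:
    """Hypothetical model of how consecutive probe results become an action (not kubelet code)."""
    kind: str                   # "liveness" or "readiness"
    failure_threshold: int = 3  # Kubernetes default for failureThreshold
    success_threshold: int = 1  # Kubernetes default for successThreshold
    failures: int = 0
    successes: int = 0
    healthy: bool = field(init=False)

    def __post_init__(self) -> None:
        # A container with a readiness probe counts as not ready until a probe passes;
        # for liveness the container is assumed healthy until probes say otherwise.
        self.healthy = self.kind != "readiness"

    def observe(self, passed: bool) -> str:
        # Count consecutive results; a result of the opposite kind resets the other counter.
        if passed:
            self.successes, self.failures = self.successes + 1, 0
            if self.successes >= self.success_threshold:
                self.healthy = True
        else:
            self.failures, self.successes = self.failures + 1, 0
            if self.failures >= self.failure_threshold:
                self.healthy = False

        if self.kind == "liveness":
            if not self.healthy:
                # Liveness failure: the container is restarted and counting starts over.
                self.healthy, self.failures = True, 0
                return "restart container"
            return "ok"
        # Readiness failure: nothing restarts, the pod only leaves the Service endpoints.
        return "ready" if self.healthy else "not ready"


if __name__ == "__main__":
    live = ProbeTracker("liveness")
    print([live.observe(r) for r in [True, False, False, False]])
    # ['ok', 'ok', 'ok', 'restart container']
    ready = ProbeTracker("readiness")
    print([ready.observe(r) for r in [True, False, False, False, True]])
    # ['ready', 'ready', 'ready', 'not ready', 'ready']
```

With the defaults, a liveness probe needs three failures in a row before a restart, while a readiness probe turns ready again as soon as a single probe passes.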
This article focuses on the livenessProbe and the readinessProbe.
4. Creating a Pod Without Probes
Create a directory for the YAML files and a namespace
[root@k8scloude1 ~]# mkdir probe
[root@k8scloude1 ~]# kubectl create ns probe
namespace/probe created
[root@k8scloude1 ~]# kubens probe
Context "kubernetes-admin@kubernetes" modified.
Active namespace is "probe".
There are no pods yet
[root@k8scloude1 ~]# cd probe/
[root@k8scloude1 probe]# pwd
/root/probe
[root@k8scloude1 probe]# kubectl get pod
No resources found in probe namespace.
First create an ordinary pod named liveness-exec that runs one container from the busybox image. The container runs the command given in args: touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 6000.
[root@k8scloude1 probe]# vim pod.yaml
[root@k8scloude1 probe]# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  #terminationGracePeriodSeconds: 0 means no grace period: when the container is stopped it is killed immediately instead of being given time to finish in-flight work.
  terminationGracePeriodSeconds: 0
  containers:
  - name: liveness
    image: busybox
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 6000
#first create an ordinary pod
[root@k8scloude1 probe]# kubectl apply -f pod.yaml
pod/liveness-exec created
Check the pod
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-exec 1/1 Running 0 6s 10.244.112.176 k8scloude2 <none> <none>
List the files in /tmp inside the pod
[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
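After the pod has been running for 30 seconds, /tmp/healthy is deleted and the container keeps running for another 6000 seconds. We treat the pod as healthy while /tmp/healthy exists and as unhealthy once it is gone, but because no probe is configured, Kubernetes notices nothing and the pod stays Running with 0 restarts.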
[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-exec 1/1 Running 0 3m29s 10.244.112.176 k8scloude2 <none> <none>
Delete the pod, then add a probe
[root@k8scloude1 probe]# kubectl delete -f pod.yaml
pod "liveness-exec" deleted
[root@k8scloude1 probe]# kubectl get pod -o wide
No resources found in probe namespace.
5. Adding a livenessProbe
5.1 livenessProbe using a command (exec)
Create a pod with a livenessProbe
This creates a pod named liveness-exec that runs one busybox container executing the command in args: touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600.
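The pod also defines a livenessProbe. The probe uses exec to run cat /tmp/healthy: if the command exits with status 0 (the file exists), Kubernetes considers the container healthy; otherwise the probe fails, and after failureThreshold consecutive failures (3 by default) Kubernetes restarts the container.
The first liveness probe runs 5 seconds after the container starts, and then every 5 seconds.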
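The exec mechanism boils down to "run a command, exit status 0 means success". A minimal Python sketch of that rule follows; the function name exec_probe is hypothetical, and because a probe command that cannot start or runs past its timeout should not count as healthy, both cases are treated as failures here. The demo substitutes a Python one-liner for cat /tmp/healthy (an uncaught FileNotFoundError exits with status 1) so it also runs where cat is unavailable.

```python
import os
import subprocess
import sys
import tempfile
from typing import Sequence


def exec_probe(command: Sequence[str], timeout_seconds: float = 1.0) -> bool:
    """Run a command like an exec probe: exit status 0 means success.

    A command that cannot be started or exceeds the timeout counts as a failure.
    """
    try:
        result = subprocess.run(list(command), capture_output=True, timeout=timeout_seconds)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in for "cat /tmp/healthy" using a temporary directory instead of /tmp.
    path = os.path.join(tempfile.mkdtemp(), "healthy")
    check = [sys.executable, "-c", f"open({path!r}).close()"]

    open(path, "w").close()                        # like "touch /tmp/healthy"
    print(exec_probe(check, timeout_seconds=5))    # True
    os.remove(path)                                # like "rm -rf /tmp/healthy"
    print(exec_probe(check, timeout_seconds=5))    # False
```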
[root@k8scloude1 probe]# vim podprobe.yaml
#now add a health check using the command (exec) method
[root@k8scloude1 probe]# cat podprobe.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: liveness
    image: busybox
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      #do not probe during the first 5 seconds after the container starts
      initialDelaySeconds: 5
      #probe every 5 seconds
      periodSeconds: 5
[root@k8scloude1 probe]# kubectl apply -f podprobe.yaml
pod/liveness-exec created
Watch the files in /tmp and the pod status
[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
healthy
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-exec 1/1 Running 0 18s 10.244.112.177 k8scloude2 <none> <none>
[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
healthy
[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-exec 1/1 Running 0 36s 10.244.112.177 k8scloude2 <none> <none>
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-exec 1/1 Running 0 43s 10.244.112.177 k8scloude2 <none> <none>
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-exec 1/1 Running 1 50s 10.244.112.177 k8scloude2 <none> <none>
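With the probe in place, once /tmp/healthy no longer exists the livenessProbe fails and the container is restarted: RESTARTS goes from 0 to 1 while the pod name and IP stay the same. Without terminationGracePeriodSeconds: 0, the first restart usually happens only at about 75 seconds, because the kubelet then allows the default 30-second grace period before force-killing a container that does not exit on SIGTERM.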
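The 75-second figure can be checked with a little arithmetic. Assuming (as an idealization) that probes fire exactly at initialDelaySeconds and then every periodSeconds, and that a probe fails only if it runs after the file was removed, the default failureThreshold of 3 means the probes at 35 s, 40 s and 45 s must all fail before the container is killed. The hypothetical function first_kill_time below adds the grace period on top; with a grace period of 0 it predicts about 45 s, consistent with the transcript (RESTARTS 0 at 43 s, 1 at 50 s).

```python
def first_kill_time(file_removed_at: float, initial_delay: float, period: float,
                    failure_threshold: int = 3, grace_period: float = 0) -> float:
    """Idealized time (seconds after container start) at which a failing liveness
    probe gets the container killed.

    Assumes probes fire exactly at initial_delay, initial_delay + period, ...,
    a probe fails only if it runs strictly after the file was removed, and the
    whole grace period elapses (the process does not exit on SIGTERM).
    """
    failures = 0
    t = initial_delay
    while True:
        if t > file_removed_at:
            failures += 1
            if failures == failure_threshold:
                return t + grace_period
        else:
            failures = 0
        t += period


# Values from podprobe.yaml: file removed at ~30 s, initialDelaySeconds 5, periodSeconds 5.
print(first_kill_time(30, 5, 5))                   # 45
print(first_kill_time(30, 5, 5, grace_period=30))  # 75
```

The same model gives 30 s for the tcpSocket example in 5.3 (initialDelaySeconds 10, periodSeconds 10, probe failing from the start): first_kill_time(0, 10, 10).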
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-exec 1/1 Running 3 2m58s 10.244.112.177 k8scloude2 <none> <none>
Delete the pod
[root@k8scloude1 probe]# kubectl delete -f podprobe.yaml
pod "liveness-exec" deleted
[root@k8scloude1 probe]# kubectl get pod -o wide
No resources found in probe namespace.
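5.2 livenessProbe using httpGet
This creates a pod named liveness-httpget that runs one nginx container. The container has an HTTP GET liveness probe that requests nginx's default page /index.html on port 80. The probe succeeds if the response status code is at least 200 and below 400; if the request fails, returns another status, or times out, Kubernetes considers the container unhealthy and restarts it.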
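The first liveness probe runs 10 seconds after the container starts, and then every 10 seconds. failureThreshold: 3 means the container is restarted after 3 consecutive failures; successThreshold: 1 means a single success after a failure is enough to consider the container healthy again (for liveness probes this value must be 1); timeoutSeconds: 10 sets the timeout of each probe request to 10 seconds.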
[root@k8scloude1 probe]# vim podprobehttpget.yaml
#the httpGet method
[root@k8scloude1 probe]# cat podprobehttpget.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-httpget
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /index.html
        port: 80
        scheme: HTTP
      #do not probe during the first 10 seconds after the container starts
      initialDelaySeconds: 10
      #probe every 10 seconds
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
[root@k8scloude1 probe]# kubectl apply -f podprobehttpget.yaml
pod/liveness-httpget created
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-httpget 1/1 Running 0 6s 10.244.112.178 k8scloude2 <none> <none>
Check the file /usr/share/nginx/html/index.html
[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
/usr/share/nginx/html/index.html
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-httpget 1/1 Running 0 2m3s 10.244.112.178 k8scloude2 <none> <none>
Delete the file /usr/share/nginx/html/index.html
[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- rm /usr/share/nginx/html/index.html
[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
ls: cannot access '/usr/share/nginx/html/index.html': No such file or directory
command terminated with exit code 2
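Watch the pod status and the file. The probe requests /index.html on port 80; with the file gone nginx can no longer serve it, the probe fails, and after 3 consecutive failures the livenessProbe restarts the container (RESTARTS becomes 1). The restarted container starts again from the nginx image, so index.html is back afterwards.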
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-httpget 1/1 Running 1 2m43s 10.244.112.178 k8scloude2 <none> <none>
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-httpget 1/1 Running 1 2m46s 10.244.112.178 k8scloude2 <none> <none>
[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
/usr/share/nginx/html/index.html
#after the restart index.html exists again, so the probe on port 80 passes
[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
/usr/share/nginx/html/index.html
Delete the pod
[root@k8scloude1 probe]# kubectl delete -f podprobehttpget.yaml
pod "liveness-httpget" deleted
[root@k8scloude1 probe]# kubectl get pod -o wide
No resources found in probe namespace.
5.3 livenessProbe using tcpSocket
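This creates a pod named liveness-tcpsocket that runs one nginx container with a TCP socket liveness probe, which checks whether a TCP connection to port 8080 can be opened. If the connection cannot be established, Kubernetes considers the container unhealthy and restarts it.
The first liveness probe runs 10 seconds after the container starts, and then every 10 seconds. failureThreshold: 3 allows at most 3 consecutive failures, successThreshold: 1 requires one success for the container to count as healthy, and timeoutSeconds: 10 sets the timeout of each probe to 10 seconds.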
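A tcpSocket probe only checks that a TCP connection can be opened; nothing is sent or read. A minimal Python sketch of that check follows (the function name tcp_socket_probe is hypothetical); in the demo, a listener on a free local port plays the role of nginx on port 80, and the same port after the listener is closed plays the role of the unused port 8080.

```python
import socket


def tcp_socket_probe(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """Succeed if a TCP connection can be opened; nothing is sent or read."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Connection refused, unreachable host or timeout: the probe fails.
        return False


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen()
    port = listener.getsockname()[1]
    print(tcp_socket_probe("127.0.0.1", port))  # True: a process is listening
    listener.close()
    print(tcp_socket_probe("127.0.0.1", port))  # False: nothing listens any more
```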
[root@k8scloude1 probe]# vim podprobetcpsocket.yaml
#the tcpSocket method:
[root@k8scloude1 probe]# cat podprobetcpsocket.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-tcpsocket
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      tcpSocket:
        port: 8080
      #do not probe during the first 10 seconds after the container starts
      initialDelaySeconds: 10
      #probe every 10 seconds
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
[root@k8scloude1 probe]# kubectl apply -f podprobetcpsocket.yaml
pod/liveness-tcpsocket created
Watch the pod status. nginx listens on port 80, but the probe checks port 8080, so the probe is bound to fail and the livenessProbe restarts the container.
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-tcpsocket 1/1 Running 0 10s 10.244.112.179 k8scloude2 <none> <none>
[root@k8scloude1 probe]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-tcpsocket 1/1 Running 1 55s 10.244.112.179 k8scloude2 <none> <none>
Delete the pod
[root@k8scloude1 probe]# kubectl delete -f podprobetcpsocket.yaml
pod "liveness-tcpsocket" deleted
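Next, add a readinessProbe.
6. readinessProbe
A readinessProbe never restarts anything; it only stops user requests from being forwarded to the failing pod. To demonstrate this, we create three pods and a Service (svc) that forwards user requests to all three.
Tip: to check whether the YAML indentation lines up in vim, use :set cuc (cursorcolumn); turn it off again with :set nocuc.
Create the pod. The readinessProbe checks /tmp/healthy: the pod is ready while the file exists and not ready when it is missing. The lifecycle postStart hook creates /tmp/healthy right after the container starts.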
[root@k8scloude1 probe]# vim podreadinessprobecommand.yaml
[root@k8scloude1 probe]# cat podreadinessprobecommand.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-exec
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: readiness
    image: nginx
    imagePullPolicy: IfNotPresent
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      #do not probe during the first 5 seconds after the container starts
      initialDelaySeconds: 5
      #probe every 5 seconds
      periodSeconds: 5
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh","-c","touch /tmp/healthy"]
Create three pods with different names (sed rewrites the pod name for the second and third copy)
[root@k8scloude1 probe]# kubectl apply -f podreadinessprobecommand.yaml
pod/readiness-exec created
[root@k8scloude1 probe]# sed 's/readiness-exec/readiness-exec2/' podreadinessprobecommand.yaml | kubectl apply -f -
pod/readiness-exec2 created
[root@k8scloude1 probe]# sed 's/readiness-exec/readiness-exec3/' podreadinessprobecommand.yaml | kubectl apply -f -
pod/readiness-exec3 created
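Check the pods and their labels. readiness-exec3 is still 0/1 because its readiness probe has not succeeded yet (it was created only 9 seconds earlier, and the first probe waits initialDelaySeconds: 5).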
[root@k8scloude1 probe]# kubectl get pod -o wide --show-labels
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES LABELS
readiness-exec 1/1 Running 0 23s 10.244.112.182 k8scloude2 <none> <none> test=readiness
readiness-exec2 1/1 Running 0 15s 10.244.251.236 k8scloude3 <none> <none> test=readiness
readiness-exec3 0/1 Running 0 9s 10.244.112.183 k8scloude2 <none> <none> test=readiness
A few seconds later all three pods are ready; all three carry the same label
[root@k8scloude1 probe]# kubectl get pod -o wide --show-labels
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES LABELS
readiness-exec 1/1 Running 0 26s 10.244.112.182 k8scloude2 <none> <none> test=readiness
readiness-exec2 1/1 Running 0 18s 10.244.251.236 k8scloude3 <none> <none> test=readiness
readiness-exec3 1/1 Running 0 12s 10.244.112.183 k8scloude2 <none> <none> test=readiness
To tell the three pods apart, change each pod's nginx index file
[root@k8scloude1 probe]# kubectl exec -it readiness-exec -- sh -c "echo 111 > /usr/share/nginx/html/index.html"
[root@k8scloude1 probe]# kubectl exec -it readiness-exec2 -- sh -c "echo 222 > /usr/share/nginx/html/index.html"
[root@k8scloude1 probe]# kubectl exec -it readiness-exec3 -- sh -c "echo 333 > /usr/share/nginx/html/index.html"
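Create a Service that forwards user requests to these three pods. kubectl expose uses the pod's labels (test=readiness) as the Service selector, so svc1 selects all three pods, not only readiness-exec.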
[root@k8scloude1 probe]# kubectl expose --name=svc1 pod readiness-exec --port=80
service/svc1 exposed
Three pods carry the label test=readiness
[root@k8scloude1 probe]# kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
svc1 ClusterIP 10.101.38.121 <none> 80/TCP 23s test=readiness
[root@k8scloude1 probe]# kubectl get pod --show-labels
NAME READY STATUS RESTARTS AGE LABELS
readiness-exec 1/1 Running 0 7m14s test=readiness
readiness-exec2 1/1 Running 0 7m6s test=readiness
readiness-exec3 1/1 Running 0 7m test=readiness
Access the Service: user requests are spread across all three pods
[root@k8scloude1 probe]# while true ; do curl -s 10.101.38.121 ; sleep 1 ; done
333
111
333
222
111
......
Delete the probe file in pod readiness-exec2
[root@k8scloude1 probe]# kubectl exec -it readiness-exec2 -- rm /tmp/healthy
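Because the probe on /tmp/healthy now fails, the READY column of readiness-exec2 changes to 0/1, but its STATUS is still Running and we can still exec into the pod. A readinessProbe only stops user requests from being forwarded to the failing pod, so the pod is neither restarted nor deleted.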
[root@k8scloude1 probe]# kubectl get pod --show-labels
NAME READY STATUS RESTARTS AGE LABELS
readiness-exec 1/1 Running 0 10m test=readiness
readiness-exec2 0/1 Running 0 10m test=readiness
readiness-exec3 1/1 Running 0 10m test=readiness
[root@k8scloude1 probe]# kubectl exec -it readiness-exec2 -- bash
root@readiness-exec2:/# exit
exit
kubectl get ev (list events) shows the warning "88s Warning Unhealthy pod/readiness-exec2 Readiness probe failed: cat: /tmp/healthy: No such file or directory"
[root@k8scloude1 probe]# kubectl get ev
LAST SEEN TYPE REASON OBJECT MESSAGE
......
32m Normal Pulled pod/readiness-exec2 Container image "nginx" already present on machine
32m Normal Created pod/readiness-exec2 Created container readiness
32m Normal Started pod/readiness-exec2 Started container readiness
15m Normal Killing pod/readiness-exec2 Stopping container readiness
13m Normal Scheduled pod/readiness-exec2 Successfully assigned probe/readiness-exec2 to k8scloude3
13m Normal Pulled pod/readiness-exec2 Container image "nginx" already present on machine
13m Normal Created pod/readiness-exec2 Created container readiness
13m Normal Started pod/readiness-exec2 Started container readiness
88s Warning Unhealthy pod/readiness-exec2 Readiness probe failed: cat: /tmp/healthy: No such file or directory
32m Normal Scheduled pod/readiness-exec3 Successfully assigned probe/readiness-exec3 to k8scloude3
32m Normal Pulled pod/readiness-exec3 Container image "nginx" already present on machine
32m Normal Created pod/readiness-exec3 Created container readiness
32m Normal Started pod/readiness-exec3 Started container readiness
15m Normal Killing pod/readiness-exec3 Stopping container readiness
13m Normal Scheduled pod/readiness-exec3 Successfully assigned probe/readiness-exec3 to k8scloude2
13m Normal Pulled pod/readiness-exec3 Container image "nginx" already present on machine
13m Normal Created pod/readiness-exec3 Created container readiness
13m Normal Started pod/readiness-exec3 Started container readiness
Access the Service again: requests now go only to 111 and 333, which shows that the readinessProbe is working.
[root@k8scloude1 probe]# while true ; do curl -s 10.101.38.121 ; sleep 1 ; done
111
333
333
333
111
......
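Conceptually, the endpoint list of a Service is the set of pods that match its selector and are currently Ready. The sketch below models that rule in Python using the pod names, IPs and label from the transcript above; the Pod class and service_endpoints function are hypothetical, and in a real cluster this list is maintained by the endpoints controller (in the cluster, kubectl get endpoints svc1 shows the current list).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pod:
    name: str
    ip: str
    labels: Dict[str, str] = field(default_factory=dict)
    ready: bool = True


def service_endpoints(selector: Dict[str, str], pods: List[Pod]) -> List[str]:
    """IPs a Service would route to: pods whose labels match the whole selector and are Ready."""
    return [
        pod.ip
        for pod in pods
        if pod.ready and all(pod.labels.get(key) == value for key, value in selector.items())
    ]


# State at the end of the demo: /tmp/healthy was removed in readiness-exec2, so it is not Ready.
pods = [
    Pod("readiness-exec", "10.244.112.182", {"test": "readiness"}),
    Pod("readiness-exec2", "10.244.251.236", {"test": "readiness"}, ready=False),
    Pod("readiness-exec3", "10.244.112.183", {"test": "readiness"}),
]
print(service_endpoints({"test": "readiness"}, pods))  # ['10.244.112.182', '10.244.112.183']
```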
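7. Summary
You should now know how to use the livenessProbe and readinessProbe to monitor the health of containers in Kubernetes. By periodically running a command and checking its exit code, sending an HTTP GET request, or opening a TCP connection, Kubernetes can automatically restart unhealthy containers (livenessProbe) and stop sending traffic to containers that are not ready (readinessProbe), which improves application availability and stability.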