【Kubernetes】Container Probes

Published 2023-11-23 11:46:59 · Author: 皮皮1109


Kubernetes provides probes: the kubelet periodically runs diagnostics against a container to learn the state of the application inside it, and acts on the result (for example, restarting the container or cutting off traffic). Kubernetes offers three kinds of probes: the readiness probe, the liveness probe, and the startup probe. If a probe is not configured, the corresponding check is treated as successful by default.

Each probe supports four check mechanisms: exec, grpc, httpGet, and tcpSocket.

  1. exec: runs a command inside the container; if the command exits normally with status code 0, the diagnostic is considered successful.

  2. grpc: performs a remote procedure call via gRPC; if the response status is SERVING, the diagnostic is considered successful. The application must implement the gRPC health-checking protocol itself, as documented at【https://grpc.github.io/grpc/core/md_doc_health-checking.html】. The health-check service is defined in the proto file roughly like this:

    syntax = "proto3";
     
    package grpc.health.v1;
     
    message HealthCheckRequest {
      string service = 1;
    }
     
    message HealthCheckResponse {
      enum ServingStatus {
        UNKNOWN = 0;
        SERVING = 1;
        NOT_SERVING = 2;
        SERVICE_UNKNOWN = 3;  // Used only by the Watch method.
      }
      ServingStatus status = 1;
    }
     
    service Health {
      rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
     
      rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
    }
    
  3. httpGet: issues an HTTP GET request against an address inside the container; a response status code of at least 200 and below 400 (the half-open interval [200, 400)) counts as success.

  4. tcpSocket: performs a TCP check against the container's IP address and port; if the port is open, the check is considered successful.
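Whichever mechanism is used, a probe also carries a common set of timing fields; the examples below only set initialDelaySeconds and periodSeconds. For reference, a probe stanza with the common knobs might look like this (all values here are illustrative, not taken from the article):

```yaml
livenessProbe:
  httpGet:                # any of the four mechanisms goes here
    path: /
    port: 80
  initialDelaySeconds: 5  # wait this long after container start before the first check
  periodSeconds: 5        # interval between checks (default 10)
  timeoutSeconds: 1       # per-check timeout (default 1)
  successThreshold: 1     # consecutive successes to count as healthy (must be 1 for liveness and startup)
  failureThreshold: 3     # consecutive failures to count as failed (default 3)
```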

Liveness Probe

The liveness probe (livenessProbe) indicates whether the container is running. If the probe fails, the kubelet kills the container, and the restart policy then determines whether it is restarted. If no liveness probe is provided, the result defaults to success.

exec

apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
spec:
  containers:
    - name: liveness
      image: busybox
      args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
      livenessProbe:
        exec:
          command:
            - cat
            - /tmp/healthy
        initialDelaySeconds: 5
        periodSeconds: 5

The manifest above uses the exec mechanism. On startup the container creates the file /tmp/healthy, so for the first 30 seconds the probe succeeds.

[root@linux1 yamls]#  kubectl create -f liveness-exec.yaml 
pod/liveness-exec created
[root@linux1 yamls]# kubectl describe -f liveness-exec.yaml 
# (output truncated)
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  31s   default-scheduler  Successfully assigned default/liveness-exec to linux2
  Normal  Pulling    31s   kubelet            Pulling image "busybox"
  Normal  Pulled     15s   kubelet            Successfully pulled image "busybox" in 15.482s (15.482s including waiting)
  Normal  Created    15s   kubelet            Created container liveness
  Normal  Started    15s   kubelet            Started container liveness

The pod starts without issue. After 30 seconds, however, /tmp/healthy is removed and the probe starts to fail. Checking the events again 30 seconds later:

[root@linux1 yamls]# kubectl describe -f liveness-exec.yaml 
# (output truncated)
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  2m19s                default-scheduler  Successfully assigned default/liveness-exec to linux2
  Normal   Pulled     2m3s                 kubelet            Successfully pulled image "busybox" in 15.482s (15.482s including waiting)
  Warning  Unhealthy  79s (x3 over 89s)    kubelet            Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
  Normal   Killing    79s                  kubelet            Container liveness failed liveness probe, will be restarted
  Normal   Pulling    49s (x2 over 2m19s)  kubelet            Pulling image "busybox"
  Normal   Pulled     34s                  kubelet            Successfully pulled image "busybox" in 15.49s (15.49s including waiting)
  Normal   Created    33s (x2 over 2m3s)   kubelet            Created container liveness
  Normal   Started    33s (x2 over 2m3s)   kubelet            Started container liveness

The event "Container liveness failed liveness probe, will be restarted" shows that the probe failed, and because of the restart policy the container is restarted:

[root@linux1 yamls]# kubectl get pod
NAME            READY   STATUS    RESTARTS      AGE
liveness-exec   1/1     Running   3 (33s ago)   5m3s

By the time of this query, the container had already been restarted three times.
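Whether a killed container comes back is governed by the Pod-level restartPolicy, which defaults to Always for bare Pods; it can also be set explicitly (a minimal sketch, not part of the article's manifest):

```yaml
spec:
  restartPolicy: Always   # Always (default) | OnFailure | Never
  containers:
    - name: liveness
      image: busybox
```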

httpGet

liveness-http.yaml

apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  containers:
    - name: liveness
      image: nginx
      livenessProbe:
        httpGet:
          path: / # request path
          port: 80 # port number; a named port may also be used
          scheme: HTTP # protocol: HTTP or HTTPS, must be uppercase
          httpHeaders: # custom HTTP request headers may also be set
            - name: key
              value: value
        initialDelaySeconds: 3
        periodSeconds: 3

Apply the manifest and check the status:

[root@linux1 yamls]#  kubectl create -f liveness-http.yaml
pod/liveness-http created

[root@linux1 yamls]# kubectl get pod
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   0          101s

No restarts, so the probe is passing. Now test the failure case: change the probe port to 81 and re-apply the manifest:

[root@linux1 yamls]# kubectl delete -f liveness-http.yaml 
pod "liveness-http" deleted

[root@linux1 yamls]# kubectl create -f liveness-http.yaml 
pod/liveness-http created

[root@linux1 yamls]# kubectl get pod -w
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   0          23s
liveness-http   1/1     Running   1 (16s ago)   28s

The container was restarted. Check the description:

[root@linux1 yamls]# kubectl describe -f liveness-http.yaml
# (output truncated)
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  65s                default-scheduler  Successfully assigned default/liveness-http to linux2
  Normal   Pulled     64s                kubelet            Successfully pulled image "nginx" in 491ms (491ms including waiting)
  Normal   Pulled     37s                kubelet            Successfully pulled image "nginx" in 15.487s (15.487s including waiting)
  Normal   Pulling    26s (x3 over 65s)  kubelet            Pulling image "nginx"
  Normal   Killing    26s (x2 over 53s)  kubelet            Container liveness failed liveness probe, will be restarted
  Normal   Created    10s (x3 over 64s)  kubelet            Created container liveness
  Normal   Started    10s (x3 over 64s)  kubelet            Started container liveness
  Normal   Pulled     10s                kubelet            Successfully pulled image "nginx" in 15.517s (15.517s including waiting)
  Warning  Unhealthy  2s (x8 over 59s)   kubelet            Liveness probe failed: Get "http://192.168.64.152:81/": dial tcp 192.168.64.152:81: connect: connection refused

The probe indeed failed: the address 192.168.64.152:81 cannot be reached.

grpc

liveness-grpc.yaml

apiVersion: v1
kind: Pod
metadata:
  name: liveness-grpc
spec:
  containers:
  - name: etcd
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.1-0
    command: [ "/usr/local/bin/etcd", "--data-dir",  "/var/lib/etcd", "--listen-client-urls", "http://0.0.0.0:2379", "--advertise-client-urls", "http://127.0.0.1:2379", "--log-level", "debug"]
    ports:
    - containerPort: 2379
    livenessProbe:
      grpc:
        port: 2379
      initialDelaySeconds: 10

Apply the manifest:

[root@linux1 yamls]# kubectl create -f liveness-grpc.yaml 
pod/liveness-grpc created
[root@linux1 yamls]# kubectl get -f liveness-grpc.yaml -w
NAME            READY   STATUS    RESTARTS   AGE
liveness-grpc   1/1     Running   0          14s

Because etcd's gRPC port 2379 is serving normally, the probe succeeds.
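The grpc probe also accepts an optional service field, which is passed to the server's health endpoint so that one process can report per-service status (the service name below is hypothetical):

```yaml
livenessProbe:
  grpc:
    port: 2379
    service: my-service   # hypothetical name registered with the gRPC health server
  initialDelaySeconds: 10
```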

tcpSocket

liveness-tcpsocket.yaml

apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket
spec:
  containers:
    - name: liveness
      image: nginx
      livenessProbe:
        tcpSocket:
          port: 80 # a named port may also be used here
        initialDelaySeconds: 3
        periodSeconds: 3

Apply the manifest and watch the status:

[root@linux1 yamls]# kubectl create -f liveness-tcpsocket.yaml 
pod/liveness-tcpsocket created
[root@linux1 yamls]# kubectl get -f liveness-tcpsocket.yaml -w
NAME                 READY   STATUS              RESTARTS   AGE
liveness-tcpsocket   0/1     ContainerCreating   0          7s
liveness-tcpsocket   1/1     Running             0          17s

Readiness Probe

The readiness probe (readinessProbe) indicates whether the container is ready to serve requests. If it fails, the endpoints controller removes the Pod's IP address from the endpoint lists, so the Pod receives no traffic.

readiness-http.yaml

apiVersion: v1
kind: Pod
metadata:
  name: readiness-http
spec:
  containers:
    - name: readiness
      image: nginx
      ports:
        - name: readiness # named port referenced by the probe below
          containerPort: 80
      readinessProbe:
        httpGet:
          port: readiness
          path: /

Apply the manifest:

[root@linux1 yamls]# kubectl create -f readiness-http.yaml 
pod/readiness-http created
[root@linux1 yamls]# kubectl get -f readiness-http.yaml -w
NAME             READY   STATUS              RESTARTS   AGE
readiness-http   0/1     ContainerCreating   0          7s
readiness-http   0/1     Running             0          17s
readiness-http   1/1     Running             0          17s

The READY column shows 1/1, so the Pod is reachable: the readiness probe is passing.

Next, look at the failure case by changing the readiness probe's port to 81:

[root@linux1 yamls]# kubectl delete -f readiness-http.yaml 
pod "readiness-http" deleted
[root@linux1 yamls]# vim readiness-http.yaml 
[root@linux1 yamls]# kubectl create -f readiness-http.yaml 
pod/readiness-http created
[root@linux1 yamls]# kubectl get -f readiness-http.yaml -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP                NODE     NOMINATED NODE   READINESS GATES
readiness-http   0/1     Running   0          44s   192.168.247.143   linux3   <none>           <none>

Although the STATUS is Running, the READY column shows 0/1, so the Pod will not receive traffic.
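The traffic cut-off can be observed through a Service: while the Pod is not ready, its IP is absent from the Service's endpoints. A sketch, assuming the Pod carried an app: readiness-http label (the article's Pod defines no labels, so both the label and the Service name here are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: readiness-demo    # hypothetical Service name
spec:
  selector:
    app: readiness-http   # assumes the Pod carries this label
  ports:
    - port: 80
      targetPort: 80
```

With the readiness probe failing, kubectl get endpoints readiness-demo would list no addresses; once the probe passes, the Pod IP reappears.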

Startup Probe

The startup probe (startupProbe) indicates whether the application in the container has started. When a startup probe is provided, all other probes are disabled until it succeeds. If the startup probe fails, the kubelet kills the container, and the restart policy determines whether it is restarted.

Why is a startup probe needed on top of the other two? If you have used Spring Boot, you know that during startup the port is opened first, while some potentially slow initialization still follows, so the application is not truly up yet. A startup probe covers exactly that window: only after it succeeds do the other probes begin to run.
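For such slow-starting applications, the startup probe is usually given a generous budget: the kubelet allows up to failureThreshold × periodSeconds for startup to complete. The values below are illustrative; the example that follows relies on the defaults:

```yaml
startupProbe:
  httpGet:
    port: 80
    path: /
  failureThreshold: 30  # tolerate up to 30 failed checks...
  periodSeconds: 10     # ...10s apart: up to 300s for the app to start
```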

startup-http.yaml

apiVersion: v1
kind: Pod
metadata:
  name: startupprobe-http
spec:
  containers:
    - name: startupprobe
      image: nginx
      readinessProbe:
        httpGet:
          port: 80
          path: /
      startupProbe:
        httpGet:
          port: 80
          path: /

Apply the manifest:

[root@linux1 yamls]# kubectl get -o wide -w -f startup-http.yaml 
NAME                READY   STATUS              RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
startupprobe-http   0/1     ContainerCreating   0          12s   <none>   linux2   <none>           <none>
startupprobe-http   0/1     Running             0          18s   192.168.64.157   linux2   <none>           <none>
startupprobe-http   0/1     Running             0          20s   192.168.64.157   linux2   <none>           <none>
startupprobe-http   1/1     Running             0          21s   192.168.64.157   linux2   <none>           <none>