Finding a Pod name from a process PID in a k8s cluster

Published: 2023-06-15 11:48:27  Author: yuhaohao

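Overview

In practice, when a process shows abnormal resource usage or a service misbehaves, we need to trace that process back to the Pod it belongs to. This post describes a quick way to find the Pod name from a PID.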

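Procedure

Find the process PID

We use a GPU task as the example. The Processes table at the end of the nvidia-smi output shows that the process occupying GPU 0 has PID 8241.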

[root@centos ~]# nvidia-smi
Thu Jun 15 11:34:39 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 4xx.xx.xx    Driver Version: 4xx.xx.xx    CUDA Version: xx.x     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:26:00.0 Off |                    0 |
| N/A   30C    P0    65W / 400W |   6474MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  A100-SXM4-40GB      On   | 00000000:2C:00.0 Off |                    0 |
| N/A   30C    P0    55W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  A100-SXM4-40GB      On   | 00000000:65:00.0 Off |                    0 |
| N/A   29C    P0    53W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  A100-SXM4-40GB      On   | 00000000:6A:00.0 Off |                    0 |
| N/A   29C    P0    52W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  A100-SXM4-40GB      On   | 00000000:A2:00.0 Off |                    0 |
| N/A   28C    P0    55W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  A100-SXM4-40GB      On   | 00000000:A7:00.0 Off |                    0 |
| N/A   29C    P0    52W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  A100-SXM4-40GB      On   | 00000000:E1:00.0 Off |                    0 |
| N/A   30C    P0    53W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  A100-SXM4-40GB      On   | 00000000:E7:00.0 Off |                    0 |
| N/A   28C    P0    55W / 400W |      3MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      8241      C   python                           6471MiB |
+-----------------------------------------------------------------------------+
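
When only the PIDs are needed, the process list can be queried as CSV instead of being read off the full report. The following is a minimal sketch, assuming the installed nvidia-smi supports `--query-compute-apps`; the helper name `gpu_pids` is hypothetical, and it reads the CSV on stdin so it can be tried without a GPU.

```shell
# gpu_pids: print the first CSV column (the PID) of each line read from stdin.
# Hypothetical helper; the input is expected to look like "8241, python, 6471 MiB".
gpu_pids() {
    awk -F',' 'NF > 0 { gsub(/[[:space:]]/, "", $1); print $1 }'
}

# On a GPU node (requires the NVIDIA driver):
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader | gpu_pids
```

Each printed PID can then be fed into the steps that follow.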

Look up the container ID from the PID

[root@centos ~]# cat /proc/8241/cgroup
11:hugetlb:/kubepods/burstable/pod6bb2e0f4-1f90-4699-b397-d24a617ceaad/2d680a961895ee47f4b1aeca3965766480752d906d208c746d599e202391f89c
10:memory:/kubepods/burstable/pod6bb2e0f4-1f90-4699-b397-d24a617ceaad/2d680a961895ee47f4b1aeca3965766480752d906d208c746d599e202391f89c
9:blkio:/kubepods/burstable/pod6bb2e0f4-1f90-4699-b397-d24a617ceaad/2d680a961895ee47f4b1aeca3965766480752d906d208c746d599e202391f89c
8:freezer:/kubepods/burstable/pod6bb2e0f4-1f90-4699-b397-d24a617ceaad/2d680a961895ee47f4b1aeca3965766480752d906d208c746d599e202391f89c
7:perf_event:/kubepods/burstable/pod6bb2e0f4-1f90-4699-b397-d24a617ceaad/2d680a961895ee47f4b1aeca3965766480752d906d208c746d599e202391f89c
6:net_prio,net_cls:/kubepods/burstable/pod6bb2e0f4-1f90-4699-b397-d24a617ceaad/2d680a961895ee47f4b1aeca3965766480752d906d208c746d599e202391f89c
5:pids:/kubepods/burstable/pod6bb2e0f4-1f90-4699-b397-d24a617ceaad/2d680a961895ee47f4b1aeca3965766480752d906d208c746d599e202391f89c
4:cpuset:/kubepods/burstable/pod6bb2e0f4-1f90-4699-b397-d24a617ceaad/2d680a961895ee47f4b1aeca3965766480752d906d208c746d599e202391f89c
3:devices:/kubepods/burstable/pod6bb2e0f4-1f90-4699-b397-d24a617ceaad/2d680a961895ee47f4b1aeca3965766480752d906d208c746d599e202391f89c
2:cpuacct,cpu:/kubepods/burstable/pod6bb2e0f4-1f90-4699-b397-d24a617ceaad/2d680a961895ee47f4b1aeca3965766480752d906d208c746d599e202391f89c
1:name=systemd:/kubepods/burstable/pod6bb2e0f4-1f90-4699-b397-d24a617ceaad/2d680a961895ee47f4b1aeca3965766480752d906d208c746d599e202391f89c
# The container ID for this process is 2d680a961....; it is too long, so we keep only the first 8 characters
[root@centos ~]# cat /proc/8241/cgroup |awk -F '/' '{print $5}' |head -n 1 |cut -b 1-8
2d680a96
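
The `awk` field index above depends on the cgroup path layout shown in this output. On nodes that use the systemd cgroup driver, the path usually looks more like `/kubepods.slice/.../cri-containerd-<id>.scope`, in which case field 5 would carry a prefix and a suffix. A sketch that matches the 64-character hexadecimal ID directly, assuming the runtime embeds the full container ID in the path; the function name `container_id_from_cgroup` is hypothetical.

```shell
# container_id_from_cgroup: print the first 64-character hex string found in
# the cgroup data read from stdin (hypothetical helper).
container_id_from_cgroup() {
    grep -oE '[0-9a-f]{64}' | head -n 1
}

# Usage on a node:
#   container_id_from_cgroup < /proc/8241/cgroup
```

The full ID can be passed to crictl as is, or shortened with `cut -b 1-8` as above.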

Look up the Pod name from the container ID

[root@centos ~]# crictl inspect -o go-template --template='{{index .status.labels "io.kubernetes.pod.name"}}' 2d680a96
gputask-64c5557974-kff4j
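
The steps above can be combined into one function. This is a minimal sketch rather than a definitive tool: it assumes it runs as root on the node, that crictl is configured for the node's container runtime, and that the cgroup path contains the full 64-character container ID. The name `pod_for_pid` and the `PROC_ROOT` variable are hypothetical; `PROC_ROOT` exists only so the function can be tried against a fake /proc tree. It also prints the namespace, read from the `io.kubernetes.pod.namespace` label.

```shell
# pod_for_pid: print "<namespace>/<pod name>" for a host PID (hypothetical helper).
pod_for_pid() {
    local pid proc_root cid
    pid="$1"
    proc_root="${PROC_ROOT:-/proc}"   # assumption: overridable for testing only

    # Take the first 64-character hex string in the cgroup file as the container ID.
    cid="$(grep -oE '[0-9a-f]{64}' "${proc_root}/${pid}/cgroup" | head -n 1)"
    if [ -z "$cid" ]; then
        echo "PID ${pid} does not appear to belong to a container" >&2
        return 1
    fi

    # Read the Pod namespace and name from the labels on the container.
    crictl inspect -o go-template \
        --template='{{index .status.labels "io.kubernetes.pod.namespace"}}/{{index .status.labels "io.kubernetes.pod.name"}}' \
        "$cid"
}

# Usage on a node:
#   pod_for_pid 8241
```

With the namespace known, the Pod can be inspected directly with `kubectl -n <namespace> describe pod <pod name>`.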