k8s nodeport 端口故障排查记录-526互联

有反馈K8S集群nodeport 端口经常不通，不断重试后可能恢复，现场可复现。nginx-ingress 服务也用的是nodeport模式，上机测试，确认问题存在。

故障现象:

1、在集群外telnet ingress 端口，偶然性出现超时。

2、集群ingress pod 不停在重启。(在10.10.1.13，10.10.1.31节点上正常)。

排查记录:

因刚完成集群版本从v1.15 到 v1.18 版本升级，首先想到是否升级过程异常？

1、查看kubelet 日志发现存在一些报错

2、查看ingres pod 状态，一直在重启。。

root@xx-10-10-1-17 ~]# kubectl get pods -o wide -n ingress-nginx
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-ingress-controller-54956f969c-6295x 1/1 Running 1 5m19s 10.20.2.163 xx-10-10-1-14 <none> <none>
nginx-ingress-controller-54956f969c-6nmkq 1/1 Running 1 118m 10.20.0.133 xx-10-10-1-11 <none> <none>
nginx-ingress-controller-54956f969c-ddsrk 1/1 Running 0 118m 10.20.1.120 xx-10-10-1-31 <none> <none>
nginx-ingress-controller-54956f969c-h7pvc 1/1 Running 0 118m 10.20.0.255 xx-10-10-1-13 <none> <none>
nginx-ingress-controller-54956f969c-pvqtf 1/1 Running 2 9m28s 10.20.1.23 xx-10-10-1-15 <none> <none>
nginx-ingress-controller-54956f969c-wfsx4 0/1 Running 2 12m 10.20.2.250 xx-10-10-1-21 <none> <none>