etcd: Log Errors Caused by an Offline Member That Was Never Removed

Published: 2023-12-27 19:23:39 | Author: 善战者求之于势

Background

The containerized etcd cluster originally had three nodes: etcd-0, etcd-1, and etcd-2. After etcd-2 was taken offline, two nodes remained, etcd-0 and etcd-1:

# kubectl get pod -n apisix
NAME                                         READY   STATUS    RESTARTS   AGE
etcd-0                                       1/1     Running   0          108m
etcd-1                                       1/1     Running   0          109m

The logs of every remaining node kept reporting the following warning:

{"level":"warn","ts":"2023-12-27T09:35:15.756Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"5355ca4c835fd788","rtt":"0s","error":"dial tcp: lookup etcd-2.etcd-headless.apisix.svc.cluster.local on 10.44.12.155:53: no such host"}
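  • At first I suspected that the etcd orchestration manifests still contained leftover etcd-2 configuration, but after checking and searching I found no persisted configuration related to etcd-2.
  • Next I suspected a CoreDNS fault, but investigation ruled out a DNS problem.
  • Finally, I queried the etcd cluster from the command line and saw that etcd-2 was still listed as a member. That confirmed the cause: etcd-2 had been taken offline without being removed from the cluster membership.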
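The warning itself already names the peer that cannot be reached: its "remote-peer-id" field is the same ID that `member list` prints in its first column. As a rough sketch, the peer ID and the unresolved host name can be pulled out of the log with sed. The function name `unresolved_peers` is made up for this illustration, and the pattern assumes the warning keeps the layout shown in the log line above.

```shell
#!/bin/sh
# Hypothetical helper (not part of etcd or kubectl): reads etcd JSON log
# lines on stdin and prints "<peer-id> <hostname>" for every prober
# warning whose error is a failed DNS lookup ("no such host").
unresolved_peers() {
  sed -n 's/.*"remote-peer-id":"\([0-9a-f]*\)".*lookup \([^ ]*\) on .*no such host.*/\1 \2/p' | sort -u
}

# The warning line quoted above, used here as sample input.
log_line='{"level":"warn","ts":"2023-12-27T09:35:15.756Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"5355ca4c835fd788","rtt":"0s","error":"dial tcp: lookup etcd-2.etcd-headless.apisix.svc.cluster.local on 10.44.12.155:53: no such host"}'

peers=$(printf '%s\n' "$log_line" | unresolved_peers)
echo "$peers"
```

Against a live pod the same function could be fed from the pod log, for example `kubectl logs -n apisix etcd-0 | unresolved_peers`.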

List the etcd members

Check whether the offline etcd-2 is still part of the cluster:

$ etcdctl --endpoints=http://etcd-0.etcd-headless.apisix.svc.cluster.local:2379,http://etcd-1.etcd-headless.apisix.svc.cluster.local:2379 member list
549aed3ff392fe0, started, etcd-0, http://etcd-0.etcd-headless.apisix.svc.cluster.local:2380, http://etcd-0.etcd-headless.apisix.svc.cluster.local:2379, false
5355ca4c835fd788, started, etcd-2, http://etcd-2.etcd-headless.apisix.svc.cluster.local:2380, http://etcd-2.etcd-headless.apisix.svc.cluster.local:2379, false
7cbbec80dc91e205, started, etcd-1, http://etcd-1.etcd-headless.apisix.svc.cluster.local:2380, http://etcd-1.etcd-headless.apisix.svc.cluster.local:2379, false

The output shows that etcd-2 was never removed from the cluster, so it has to be removed manually.
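`member remove` takes the member ID, not the member name, so the ID has to be read from the first column of the listing. A minimal sketch of that lookup, assuming the default comma-separated output format shown above; the function name `member_id_by_name` is invented for this example:

```shell
#!/bin/sh
# Hypothetical helper: reads `etcdctl member list` output (default
# "simple" format, fields separated by ", ") on stdin and prints the ID
# of the member whose name (third field) equals $1.
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Sample input copied from the member list output above.
sample='549aed3ff392fe0, started, etcd-0, http://etcd-0.etcd-headless.apisix.svc.cluster.local:2380, http://etcd-0.etcd-headless.apisix.svc.cluster.local:2379, false
5355ca4c835fd788, started, etcd-2, http://etcd-2.etcd-headless.apisix.svc.cluster.local:2380, http://etcd-2.etcd-headless.apisix.svc.cluster.local:2379, false
7cbbec80dc91e205, started, etcd-1, http://etcd-1.etcd-headless.apisix.svc.cluster.local:2380, http://etcd-1.etcd-headless.apisix.svc.cluster.local:2379, false'

stale_id=$(printf '%s\n' "$sample" | member_id_by_name etcd-2)
echo "$stale_id"
```

In practice the function would read from the real command instead of the sample, i.e. the `member list` invocation above piped into `member_id_by_name etcd-2`.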

Remove etcd-2 from the cluster

$ etcdctl --endpoints=http://etcd-0.etcd-headless.apisix.svc.cluster.local:2379 member remove  5355ca4c835fd788 --user="root" --password="TZ7dVhdjabmpiRJz"
Member 5355ca4c835fd788 removed from cluster 6bcee157055be989

Verify that etcd-2 was removed

$ etcdctl --endpoints=http://etcd-0.etcd-headless.apisix.svc.cluster.local:2379 member list
549aed3ff392fe0, started, etcd-0, http://etcd-0.etcd-headless.apisix.svc.cluster.local:2380, http://etcd-0.etcd-headless.apisix.svc.cluster.local:2379, false
7cbbec80dc91e205, started, etcd-1, http://etcd-1.etcd-headless.apisix.svc.cluster.local:2380, http://etcd-1.etcd-headless.apisix.svc.cluster.local:2379, false

A check of the logs confirms that the error no longer appears.
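The log check can also be scripted. The sketch below is only an illustration: `no_prober_warnings` is a made-up name, and it simply tests whether the warning text from the log line quoted earlier still occurs in whatever is piped into it.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 only if stdin contains no
# "prober detected unhealthy status" warnings.
no_prober_warnings() {
  ! grep -q 'prober detected unhealthy status'
}

# Two tiny sample inputs: one with the warning, one without.
before='{"level":"warn","msg":"prober detected unhealthy status","remote-peer-id":"5355ca4c835fd788"}'
after='{"level":"info","msg":"ready to serve client requests"}'

if printf '%s\n' "$before" | no_prober_warnings; then before_clean=yes; else before_clean=no; fi
if printf '%s\n' "$after" | no_prober_warnings; then after_clean=yes; else after_clean=no; fi
echo "before: $before_clean, after: $after_clean"
```

To check only entries written after the removal, recent log output can be piped in, for example `kubectl logs -n apisix etcd-0 --since=5m | no_prober_warnings && echo clean`.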