k8s metrics-server

Published 2023-09-21 15:05:15 Author: 潇潇暮鱼鱼

I.

1. Error:

Running kubectl top nodes reports the error: Metrics-Server API is not found

2. Troubleshooting:

kubectl get pods -n kube-system | grep metrics shows that the metrics-server pod exists.

kubectl get apiservice -n kube-system | grep metrics shows that no metrics APIService is registered.

kubectl logs --tail=3000 -f <podname> -n kube-system shows only two lines of output; the pod is not doing any actual work.

Conclusion: metrics-server has not registered the metrics.k8s.io APIService.
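A minimal consolidation of these checks (replace <metrics-server-pod-name> with the pod name returned by the first command):

# Is the metrics-server pod running?
kubectl get pods -n kube-system | grep metrics

# Is a metrics.k8s.io APIService registered? (APIService objects are cluster-scoped, so -n can be omitted)
kubectl get apiservice | grep metrics

# Is the pod actually doing any work?
kubectl logs --tail=3000 -f <metrics-server-pod-name> -n kube-system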

3. Solution:

Reinstall metrics-server so that the APIService gets registered.

Metrics Server   Metrics API group/version   Supported Kubernetes versions
0.6.x            metrics.k8s.io/v1beta1      1.19+
0.5.x            metrics.k8s.io/v1beta1      *1.8+
0.4.x            metrics.k8s.io/v1beta1      *1.8+
0.3.x            metrics.k8s.io/v1beta1      1.8-1.21


This cluster runs Kubernetes 1.20.4. The previously installed metrics-server was 0.3.6, a release that can no longer be found in the GitHub repository, so version 0.6.2 is installed this time.

Download: https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.2/components.yaml
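For example, the manifest can be fetched with wget (curl -LO works just as well; keeping the filename components.yaml matches the apply command used later):

wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.2/components.yaml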

Two main changes need to be made to the manifest:

Line 134: add the argument "--kubelet-insecure-tls" to disable kubelet certificate verification.
Line 140: change the image to the Aliyun mirror "registry.cn-hangzhou.aliyuncs.com/chenby/metrics-server:v0.6.2".

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --kubelet-insecure-tls
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        image: registry.cn-hangzhou.aliyuncs.com/chenby/metrics-server:v0.6.2
        #image: k8s.gcr.io/metrics-server/metrics-server:v0.6.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

Run kubectl apply -f components.yaml. Once metrics-server has been deployed successfully, kubectl top nodes displays node resource usage.
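A quick verification sequence (the rollout-status and APIService checks are optional extras, not part of the original steps):

kubectl apply -f components.yaml
kubectl -n kube-system rollout status deployment/metrics-server
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes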

II.

1. Error:

Running kubectl top nodes reports the error: Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

2. Troubleshooting:

kubectl get pods -n kube-system | grep metrics shows that the metrics-server pod exists.

kubectl get apiservice -n kube-system | grep metrics shows that the metrics APIService is registered.

kubectl logs --tail=3000 -f <podname> -n kube-system shows normal log output.
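Besides the checks above, the APIService status itself usually explains why the aggregated API is unavailable; this extra check is a suggestion, not part of the original write-up:

# The Available condition and its message show why requests to metrics.k8s.io fail
kubectl describe apiservice v1beta1.metrics.k8s.io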

3. Solution:

Try reinstalling metrics-server, following the same procedure as above. After the reinstall, kubectl top nodes displays node resource usage again.
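A reinstall can be done, for example, by deleting and re-applying the manifest edited earlier (assuming it is still saved as components.yaml):

kubectl delete -f components.yaml
kubectl apply -f components.yaml
kubectl top nodes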