Custom Prometheus Monitoring Metrics on k8s

Published: 2023-09-08 18:00:50 Author: 小吉猫

prometheus-adapter

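Prometheus is not an aggregated API server for Kubernetes, so its PromQL interface cannot serve directly as a source of custom metrics. An intermediate layer is needed to convert the results of PromQL queries into the format expected by the Kubernetes aggregated API. The converted metrics are then served through the custom.metrics.k8s.io or external.metrics.k8s.io API to clients such as HPA v2. The most popular intermediate layer today is the prometheus-adapter project hosted on GitHub; kube-metrics-adapter is another option.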

Configuring the adapter

The adapter decides which metrics to expose, and how, in the following four steps:

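Discovery

Defines how the adapter finds, for the current rule, the metrics in Prometheus that it should expose. The seriesQuery field specifies the series selector passed to Prometheus, and seriesFilters can narrow the result further. The selector below matches every http_requests_total series that carries non-empty kubernetes_namespace and kubernetes_pod_name labels, that is, the metric on all Pods in every namespace. kubernetes_namespace holds the namespace name and kubernetes_pod_name holds the Pod name. Both are ordinary Prometheus labels added by the scrape (relabeling) configuration rather than names built into the adapter, so use whatever label names your Prometheus actually attaches.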
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
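
As a minimal sketch of seriesFilters, the rule below widens the selector to every metric whose name starts with http_requests_ and then filters by name. The wider selector and the two patterns are assumptions made for illustration; an `is` filter keeps series whose name matches the regular expression, and an `isNot` filter drops them.

```yaml
rules:
# hypothetical rule: discover all http_requests_* series, then filter by name
- seriesQuery: '{__name__=~"^http_requests_.*",kubernetes_namespace!="",kubernetes_pod_name!=""}'
  seriesFilters:
  # keep only names ending in _total
  - is: "_total$"
  # and drop any names ending in _seconds_total
  - isNot: "_seconds_total$"
```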

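Association

Defines which Kubernetes resources the discovered metrics can be attached to, that is, for which resources the metrics are exposed. It is configured with the resources field, which supports two forms. The template form is a Go template that derives the label name from the resource, with Group standing for the API group and Resource for the resource type. The overrides form maps a specific Prometheus label to a Kubernetes resource type.
The example below maps the kubernetes_namespace label to the resource type namespace (namespaces also works) and the kubernetes_pod_name label to the resource type pod (pods also works). Both resources belong to the core group, so no group name is needed.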
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
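
For comparison, here is a sketch of the template form. It assumes the series carry labels named exactly after the resource (namespace, pod), which differs from the kubernetes_* labels used elsewhere in this article. The microservice label and its mapping to Deployments are hypothetical, included only to show that an override can name a group and can be combined with a template.

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    # a label named after a resource (pod, namespace, ...) maps to that resource
    template: "<<.Resource>>"
    overrides:
      # hypothetical label mapped to Deployments in the apps group
      microservice: {group: "apps", resource: "deployment"}
```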

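Naming

Defines how a Prometheus metric name is converted into the custom metric name. It is configured with the name field: the nested matches field is a regular expression selecting the names to convert (default ".*"), and the as field gives the resulting name. The as field can refer to capture groups of the regular expression, for example $1 or ${1}. The example below rewrites metric names ending in _total so that they end in _per_second instead.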
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
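
The as field can also be left out. According to the adapter's documentation it then defaults to $1 when matches contains exactly one capture group, and to $0 otherwise. A minimal sketch of that case, reusing the rule above with an end anchor added to the pattern:

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    # one capture group and no `as`: the exposed name is the captured text,
    # so http_requests_total is exposed as http_requests
    matches: "^(.*)_total$"
```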

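Querying

Defines the query actually sent to Prometheus. It is written in the metricsQuery field as a Go template, which is rendered with information about the requested objects to produce the final PromQL. Within the template, Series refers to the metric name found by discovery. LabelMatchers refers to a comma-separated list of label matchers selecting the requested objects; currently this is the label for the target resource plus, for namespaced resources, the namespace label, so cluster-scoped resources get no namespace matcher. GroupBy refers to a comma-separated list of grouping labels, currently the label for the target resource. The query below selects the series of the given metric that match those labels, computes their per-second rate over two minutes, and sums the rates per object: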
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
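
To make the rendering concrete, the comments below sketch what each template value would become for one request. The Pod names frontend-a and frontend-b and the namespace production are hypothetical, and the order of the matchers in the rendered query may differ.

```yaml
# Request: http_requests_per_second for Pods frontend-a and frontend-b
# in namespace production (hypothetical names)
#   <<.Series>>        -> http_requests_total
#   <<.LabelMatchers>> -> kubernetes_namespace="production",kubernetes_pod_name=~"frontend-a|frontend-b"
#   <<.GroupBy>>       -> kubernetes_pod_name
# Rendered PromQL:
#   sum(rate(http_requests_total{kubernetes_namespace="production",kubernetes_pod_name=~"frontend-a|frontend-b"}[2m])) by (kubernetes_pod_name)
metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```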

Sample configuration

rules:
# Each rule represents some naming and discovery logic.
# Each rule is executed independently of the others, so
# take care to avoid overlap.  As an optimization, rules
# with the same `seriesQuery` but different
# `name` or `seriesFilters` will use only one query to
# Prometheus for discovery.

# some of these rules are taken from the "default" configuration, which
# can be found in pkg/config/default.go

# this rule matches cumulative cAdvisor metrics measured in seconds
- seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
  resources:
    # skip specifying generic resource<->label mappings, and just
    # attach only pod and namespace resources by mapping label names to group-resources
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  # specify that the `container_` and `_seconds_total` suffixes should be removed.
  # this also introduces an implicit filter on metric family names
  name:
    # we use the value of the capture group implicitly as the API name
    # we could also explicitly write `as: "$1"`
    matches: "^container_(.*)_seconds_total$"
  # specify how to construct a query to fetch samples for a given series
  # This is a Go template where the `.Series` and `.LabelMatchers` string values
  # are available, and the delimiters are `<<` and `>>` to avoid conflicts with
  # the prometheus query language
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[2m])) by (<<.GroupBy>>)'

# this rule matches cumulative cAdvisor metrics not measured in seconds
- seriesQuery: '{__name__=~"^container_.*_total",container!="POD",namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  seriesFilters:
  # since this is a superset of the query above, we introduce an additional filter here
  - isNot: "^container_.*_seconds_total$"
  name: {matches: "^container_(.*)_total$"}
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[2m])) by (<<.GroupBy>>)'

# this rule matches cumulative non-cAdvisor metrics
- seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
  name: {matches: "^(.*)_total$"}
  resources:
    # specify a generic mapping between resources and labels.  This
    # is a template, like the `metricsQuery` template, except with the `.Group`
    # and `.Resource` strings available.  It will also be used to match labels,
    # so avoid using template functions which truncate the group or resource.
    # Group will be converted to a form acceptable for use as a label automatically.
    template: "<<.Resource>>"
    # if we wanted to, we could also specify overrides here
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[2m])) by (<<.GroupBy>>)'

# this rule matches only a single metric, explicitly naming it something else
# Its series query *must* return only a single metric family
- seriesQuery: 'cheddar{sharp="true"}'
  # this metric will appear as "cheesy_goodness" in the custom metrics API
  name: {as: "cheesy_goodness"}
  resources:
    overrides:
      # this should still resolve in our cluster
      brand: {group: "cheese.io", resource: "brand"}
  metricsQuery: 'count(cheddar{sharp="true"})'

# external rules are not tied to a Kubernetes resource and can reference any metric
# https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects
externalRules:
- seriesQuery: '{__name__="queue_consumer_lag",name!=""}'
  metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (name)
- seriesQuery: '{__name__="queue_depth",topic!=""}'
  metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (name)
  # Kubernetes metric queries include a namespace in the query by default
  # but you can explicitly disable namespaces if needed with "namespaced: false"
  # this is useful if you have an HPA with an external metric in namespace A
  # but want to query for metrics from namespace B
  resources:
    namespaced: false

# TODO: should we be able to map to a constant instance of a resource
# (e.g. `resources: {constant: [{resource: "namespace", name: "kube-system"}]}`)?

Custom rules

prometheus-adapter-values.yaml

# The Prometheus address must match your actual environment
prometheus:
  url: http://prom-prometheus-server.monitoring.svc.cluster.local
  port: 80
  path: ""

replicas: 1

metricsRelistInterval: 1m

listenPort: 6443

service:
  annotations: {}
  port: 443
  type: ClusterIP

rules:
  default: true   # whether to load the default rules
  custom:
#  - seriesQuery: '{__name__=~"^http_requests_.*",kubernetes_namespace!="",kubernetes_pod_name!=""}'
#    resources:
#      overrides:
#        kubernetes_namespace: {resource: "namespace"}
#        kubernetes_pod_name: {resource: "pod"}
#    metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
  - seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
    resources:
      overrides:
        kubernetes_namespace: {resource: "namespace"}
        kubernetes_pod_name: {resource: "pod"}
    name:
      matches: "^(.*)_total"
      as: "${1}_per_second"
    metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
  existing:
  external: []

tls:
  enable: false
  ca: |-
    # Public CA file that signed the APIService
  key: |-
    # Private key of the APIService
  certificate: |-
    # Public key of the APIService

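Applying the custom metrics

The prometheus-adapter chart in the prometheus-community Helm repository deploys the adapter. Usually the only values that need customizing are those that point at the backend Prometheus service.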
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prometheus-adapter -f prometheus-adapter-values.yaml prometheus-community/prometheus-adapter

Listing the available metrics

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/http_requests_total",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": ["get"]
    },
    {
      "name": "namespaces/http_requests_total",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": ["get"]
    }
  ]
}

Querying a custom metric

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "name": "frontend-server-abcd-0123",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2018-08-07T17:45:22Z",
      "value": "16m"
    },
    {
      "describedObject": {
        "kind": "Pod",
        "name": "frontend-server-abcd-4567",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2018-08-07T17:45:22Z",
      "value": "22m"
    }
  ]
}
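
Once the metric is served, a client such as HPA v2 can consume it. The manifest below is a minimal sketch rather than part of the original setup: the Deployment name frontend-server, the replica bounds, and the target value are all assumptions. Values use Kubernetes quantity notation, so 16m in the output above means 0.016 requests per second and 500m below means 0.5.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-server          # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend-server        # hypothetical Deployment owning the Pods
  minReplicas: 2                 # assumed bounds
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        # the name exposed by the adapter after renaming
        name: http_requests_per_second
      target:
        type: AverageValue
        # scale so that the per-Pod average stays near 0.5 requests/second
        averageValue: 500m
```

The Pods metric type averages the value across the Pods of the scale target, which fits a per-Pod rate like this one.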

References

https://github.com/kubernetes-sigs/prometheus-adapter