prometheus-adapter

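Prometheus is not an aggregated API server for Kubernetes, so its PromQL interface cannot serve directly as a source of custom metrics. A dedicated intermediate layer is needed to convert PromQL query results into metrics in the format the Kubernetes aggregated API expects. These custom metrics are then served through the custom.metrics.k8s.io or external.metrics.k8s.io API to clients such as HPA v2. The most popular such layer today is the prometheus-adapter project hosted on GitHub; kube-metrics-adapter is an alternative.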

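Configuring the adapter

The adapter decides which metrics to expose, and how, through rules. Each rule has four parts:

Discovery

Specifies how the adapter finds, in Prometheus, the metrics to expose for the current rule. The seriesQuery field gives the query condition passed to Prometheus, and seriesFilters can narrow the result further. The condition below selects the http_requests_total metric for every Pod in every namespace. Here kubernetes_namespace is the label carrying the namespace name and kubernetes_pod_name is the label carrying the Pod name. Both are labels on the Prometheus series, usually attached by the scrape relabeling configuration, rather than variables built into the adapter, so they must match the label names used in your environment.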

rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
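
When seriesQuery alone is too broad, seriesFilters can trim the discovered series by name. The fragment below is a minimal sketch showing only the discovery fields: it widens the query to every http_requests_* series and then filters by regular expression. The http_requests_internal_ prefix is a hypothetical name used purely for illustration.

```yaml
rules:
# Discover every series whose name starts with http_requests_ ...
- seriesQuery: '{__name__=~"^http_requests_.*",kubernetes_namespace!="",kubernetes_pod_name!=""}'
  seriesFilters:
  # ... keep only cumulative counters, whose names end in _total ...
  - is: "_total$"
  # ... and drop a (hypothetical) internal series that should not be exposed.
  - isNot: "^http_requests_internal_"
```

Each filter entry carries either is or isNot, and a series must pass every entry to be kept.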

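Association

Specifies which Kubernetes resources the discovered metrics can be attached to, that is, for which resources the metrics are exposed. The association is defined in the resources field, which supports two forms. The template field is a Go template that derives the label name from the target resource, with Group standing for the resource's API group and Resource for the resource type. The overrides field maps specific Prometheus labels to Kubernetes resource types.

The example below maps the label holding the namespace name to the resource type namespace (namespaces also works) and the label holding the Pod name to the resource type pod (or pods). Both belong to the core group, so no group name is needed.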

rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
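
For comparison, the same association can be sketched with the template form. This assumes the Prometheus labels follow a kubernetes_<resource> naming pattern, which holds for kubernetes_namespace but not for kubernetes_pod_name, so the Pod label still needs an explicit override.

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    # Generic mapping: resource "namespace" <-> label "kubernetes_namespace",
    # resource "service" <-> label "kubernetes_service", and so on.
    template: "kubernetes_<<.Resource>>"
    # The Pod label does not follow the pattern, so it is mapped explicitly.
    overrides:
      kubernetes_pod_name: {resource: "pod"}
```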

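Naming

Specifies how a Prometheus metric name is converted into the custom metric name. It is defined in the name field: the nested matches field is a regular expression that selects the metrics to rename (default ".*"), and the as field gives the new name. The as field supports references to regular-expression capture groups, such as $0 for the whole match or ${1} for the first group. The example below replaces the _total suffix of every matched metric name with _per_second.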

rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
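
The as field can also be omitted. As a small sketch of that variant: when matches contains a single capture group, the captured text is used as the API name, as if as: "$1" had been written.

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    # One capture group and no `as`: equivalent to `as: "$1"`,
    # so http_requests_total is exposed as http_requests.
    matches: "^(.*)_total$"
```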

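Querying

Specifies the actual PromQL query sent to Prometheus. It is written in the metricsQuery field as a Go template, which is rendered with information about the target objects at query time. In the template, Series refers to the metric name found by discovery. LabelMatchers refers to the list of label matchers for the target resources; by default these match the resource type and its namespace, so cluster-scoped resources get no namespace matcher. GroupBy refers to the list of grouping labels, which defaults to the label of the resource type. The template delimiters are << and >>. For example, the query below selects the series of the discovered metric that satisfy the label matchers, computes their per-second rate over two minutes, and sums the result per group: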

rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
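
To make the template rendering concrete, the annotated copy of the rule below walks through one request. The Pod names pod1 and pod2 and the namespace production are illustrative assumptions, and the exact order of the rendered matchers may differ.

```yaml
rules:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  # Request: pods/http_requests_per_second for pod1 and pod2 in namespace production.
  #   Series        -> http_requests_total
  #   LabelMatchers -> kubernetes_namespace="production",kubernetes_pod_name=~"pod1|pod2"
  #   GroupBy       -> kubernetes_pod_name
  # Rendered PromQL:
  #   sum(rate(http_requests_total{kubernetes_namespace="production",kubernetes_pod_name=~"pod1|pod2"}[2m])) by (kubernetes_pod_name)
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

Grouping by the Pod label means the result contains one value per Pod, which is what the custom metrics API returns for a pods/* request.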

Sample configuration

rules:
# Each rule represents some naming and discovery logic.
# Each rule is executed independently of the others, so
# take care to avoid overlap.  As an optimization, rules
# with the same `seriesQuery` but different
# `name` or `seriesFilters` will use only one query to
# Prometheus for discovery.

# some of these rules are taken from the "default" configuration, which
# can be found in pkg/config/default.go

# this rule matches cumulative cAdvisor metrics measured in seconds
- seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
  resources:
    # skip specifying generic resource<->label mappings, and just
    # attach only pod and namespace resources by mapping label names to group-resources
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  # specify that the `container_` prefix and `_seconds_total` suffix should be removed.
  # this also introduces an implicit filter on metric family names
  name:
    # we use the value of the capture group implicitly as the API name
    # we could also explicitly write `as: "$1"`
    matches: "^container_(.*)_seconds_total$"
  # specify how to construct a query to fetch samples for a given series
  # This is a Go template where the `.Series` and `.LabelMatchers` string values
  # are available, and the delimiters are `<<` and `>>` to avoid conflicts with
  # the prometheus query language
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[2m])) by (<<.GroupBy>>)'

# this rule matches cumulative cAdvisor metrics not measured in seconds
- seriesQuery: '{__name__=~"^container_.*_total",container!="POD",namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  seriesFilters:
  # since this is a superset of the query above, we introduce an additional filter here
  - isNot: "^container_.*_seconds_total$"
  name: {matches: "^container_(.*)_total$"}
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[2m])) by (<<.GroupBy>>)'

# this rule matches cumulative non-cAdvisor metrics
- seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
  name: {matches: "^(.*)_total$"}
  resources:
    # specify a generic mapping between resources and labels.  This
    # is a template, like the `metricsQuery` template, except with the `.Group`
    # and `.Resource` strings available.  It will also be used to match labels,
    # so avoid using template functions which truncate the group or resource.
    # Group will be converted to a form acceptable for use as a label automatically.
    template: "<<.Resource>>"
    # if we wanted to, we could also specify overrides here
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[2m])) by (<<.GroupBy>>)'

# this rule matches only a single metric, explicitly naming it something else
# Its series query *must* return only a single metric family
- seriesQuery: 'cheddar{sharp="true"}'
  # this metric will appear as "cheesy_goodness" in the custom metrics API
  name: {as: "cheesy_goodness"}
  resources:
    overrides:
      # this should still resolve in our cluster
      brand: {group: "cheese.io", resource: "brand"}
  metricsQuery: 'count(cheddar{sharp="true"})'

# external rules are not tied to a Kubernetes resource and can reference any metric
# https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects
externalRules:
- seriesQuery: '{__name__="queue_consumer_lag",name!=""}'
  metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (name)
- seriesQuery: '{__name__="queue_depth",topic!=""}'
  metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (name)
  # Kubernetes metric queries include a namespace in the query by default
  # but you can explicitly disable namespaces if needed with "namespaced: false"
  # this is useful if you have an HPA with an external metric in namespace A
  # but want to query for metrics from namespace B
  resources:
    namespaced: false

# TODO: should we be able to map to a constant instance of a resource
# (e.g. `resources: {constant: [{resource: "namespace", name: "kube-system"}]}`)?

Custom rules

prometheus-adapter-values.yaml

# The Prometheus address must match your actual environment
prometheus:
  url: http://prom-prometheus-server.monitoring.svc.cluster.local
  port: 80
  path: ""

replicas: 1

metricsRelistInterval: 1m

listenPort: 6443

service:
  annotations: {}
  port: 443
  type: ClusterIP

rules:
  default: true   # whether to load the default rules
  custom:
#  - seriesQuery: '{__name__=~"^http_requests_.*",kubernetes_namespace!="",kubernetes_pod_name!=""}'
#    resources:
#      overrides:
#        kubernetes_namespace: {resource: "namespace"}
#        kubernetes_pod_name: {resource: "pod"}
#    metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
  - seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
    resources:
      overrides:
        kubernetes_namespace: {resource: "namespace"}
        kubernetes_pod_name: {resource: "pod"}
    name:
      matches: "^(.*)_total"
      as: "${1}_per_second"
    metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
  existing:
  external: []

tls:
  enable: false
  ca: |-
    # Public CA file that signed the APIService
  key: |-
    # Private key of the APIService
  certificate: |-
    # Public key of the APIService

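Applying custom metrics

The prometheus-community Helm repository provides a chart named prometheus-adapter for deploying the adapter. In most deployments the only values that need customizing are those describing the backend Prometheus service.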

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prometheus-adapter -f prometheus-adapter-values.yaml prometheus-community/prometheus-adapter

Listing the available metrics

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/http_requests_total",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": ["get"]
    },
    {
      "name": "namespaces/http_requests_total",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": ["get"]
    }
  ]
}

Querying a custom metric

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "name": "frontend-server-abcd-0123",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2018-08-07T17:45:22Z",
      "value": "16m"
    },
    {
      "describedObject": {
        "kind": "Pod",
        "name": "frontend-server-abcd-4567",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2018-08-07T17:45:22Z",
      "value": "22m"
    }
  ]
}
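
Once the metric is served by the custom metrics API, an HPA v2 object can scale on it. The manifest below is a minimal sketch rather than a recommended setup: the Deployment name frontend-server, the replica bounds, and the 500m target (0.5 requests per second per Pod on average) are all assumptions chosen for illustration.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-server          # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend-server        # hypothetical Deployment backing the Pods above
  minReplicas: 2                 # illustrative bounds
  maxReplicas: 10
  metrics:
  # "Pods" metrics are read from custom.metrics.k8s.io and averaged across Pods.
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 500m       # illustrative target
```

The HPA compares the average of the per-Pod values with averageValue and adjusts the replica count to bring the two together.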