Argo Rollouts AnalysisTemplate CRD

Published 2023-12-16 14:44:01, author: 小吉猫

AnalysisTemplate CRD

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:                   # Template arguments; reference them inside the template as "{{args.NAME}}" and assign values when the template is invoked
  - name: <string>
    value: <string>
    valueFrom: 
      secretKeyRef:
        name: <string>
        key: <string>
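  metrics:                # Required. Metrics used to analyze the result of the delivery
  - name: <string>        # Required. Name of the metric
    initialDelay: 5m      # Delay before the first measurement of this metric
    interval: 5m          # Interval between measurements when more than one is taken
    consecutiveErrorLimit: <Object>  # Maximum number of consecutive measurement errors tolerated
    count: <Object>       # Total number of measurements to take
    failureCondition: result[0] < 0.95  # Expression under which a measurement counts as "failed"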
    # NOTE: prometheus queries return results in the form of a vector.
    # So it is common to access the index 0 of the returned array to obtain the value
    successCondition: result[0] >= 0.95     # Expression under which a measurement counts as "successful"
    failureLimit: 3       # Maximum number of failed measurements allowed
    provider:             # Metric provider. Supported: web, wavefront, skywalking, prometheus, plugin, newRelic, kayenta, job, influxdb, graphite, datadog, cloudWatch
      prometheus:
        # Address of the Prometheus service
        address: http://prometheus.example.com:9090
        # Query (PromQL) sent to the Prometheus service
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
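  dryRun:                        # Metrics that run in dryRun mode; their results do not affect the final analysis outcome
  - metricName: <string>         # Name of the metric
  measurementRetention:          # How many measurement results to keep per metric; also supported for metrics in dryRun mode
  - metricName: <string>         # Name of the metric
    limit: <integer>             # Number of results to keep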
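
The two condition fields can be combined. The following is a minimal sketch, assuming a hypothetical template name and illustrative thresholds that are not taken from the original article: when both successCondition and failureCondition are set and a measurement satisfies neither, Argo Rollouts treats that measurement as inconclusive rather than as a success or a failure.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate-with-gap   # hypothetical name, for illustration only
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 5m
    count: 4
    # A success rate of 0.90 or higher counts as a successful measurement
    successCondition: result[0] >= 0.90
    # A success rate below 0.50 counts as a failed measurement;
    # values in between meet neither condition and are inconclusive
    failureCondition: result[0] < 0.50
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
```

Leaving a gap between the two thresholds is a design choice: it gives a band of results where the rollout pauses for a human decision instead of being promoted or aborted automatically.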

ClusterAnalysisTemplate CRD

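A Rollout can reference a cluster-scoped AnalysisTemplate, called a ClusterAnalysisTemplate. This is useful when you want to share one AnalysisTemplate across multiple Rollouts in different namespaces and avoid duplicating the same template in every namespace. Set the field clusterScope: true to reference a ClusterAnalysisTemplate instead of an AnalysisTemplate.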
apiVersion: argoproj.io/v1alpha1
kind: ClusterAnalysisTemplate
metadata:
  name: success-rate
spec:
  args:                   # Template arguments; reference them inside the template as "{{args.NAME}}" and assign values when the template is invoked
  - name: <string>
    value: <string>
    valueFrom: 
      secretKeyRef:
        name: <string>
        key: <string>
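  metrics:                # Required. Metrics used to analyze the result of the delivery
  - name: <string>        # Required. Name of the metric
    interval: 5m          # Interval between measurements when more than one is taken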
    # NOTE: prometheus queries return results in the form of a vector.
    # So it is common to access the index 0 of the returned array to obtain the value
    successCondition: result[0] >= 0.95     # Expression under which a measurement counts as "successful"
    failureLimit: 3       # Maximum number of failed measurements allowed
    provider:             # Metric provider. Supported: web, wavefront, skywalking, prometheus, plugin, newRelic, kayenta, job, influxdb, graphite, datadog, cloudWatch
      prometheus:
        # Address of the Prometheus service
        address: http://prometheus.example.com:9090
        # Query (PromQL) sent to the Prometheus service
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
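  dryRun:                        # Metrics that run in dryRun mode; their results do not affect the final analysis outcome
  - metricName: <string>         # Name of the metric
  measurementRetention:          # How many measurement results to keep per metric; also supported for metrics in dryRun mode
  - metricName: <string>         # Name of the metric
    limit: <integer>             # Number of results to keep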
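
To show the referencing side, here is a fragment of a Rollout that points at the cluster-scoped template. It is a sketch only: the Rollout name, step values, and service address are illustrative assumptions, and the remaining Rollout fields (replicas, selector, pod template) are omitted.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: guestbook               # illustrative name
spec:
  # other Rollout fields (replicas, selector, template) omitted
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 5m}
      - analysis:
          templates:
          - templateName: success-rate
            # Look the template up as a ClusterAnalysisTemplate
            # instead of a namespaced AnalysisTemplate
            clusterScope: true
          args:
          - name: service-name
            value: guestbook-svc.default.svc.cluster.local
```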

AnalysisRun CRD

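The configuration format is largely the same as that of AnalysisTemplate. The difference is that an AnalysisRun invokes and instantiates an analysis template.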
apiVersion: argoproj.io/v1alpha1
kind: AnalysisRun
metadata:
  name: success-rate
spec:
  args:                   # Template arguments; reference them inside the template as "{{args.NAME}}" and assign values when the template is invoked
  - name: <string>
    value: <string>
    valueFrom: 
      secretKeyRef:
        name: <string>
        key: <string>
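  metrics:                # Required. Metrics used to analyze the result of the delivery
  - name: <string>        # Required. Name of the metric
    interval: 5m          # Interval between measurements when more than one is taken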
    # NOTE: prometheus queries return results in the form of a vector.
    # So it is common to access the index 0 of the returned array to obtain the value
    successCondition: result[0] >= 0.95     # Expression under which a measurement counts as "successful"
    failureLimit: 3       # Maximum number of failed measurements allowed
    provider:             # Metric provider. Supported: web, wavefront, skywalking, prometheus, plugin, newRelic, kayenta, job, influxdb, graphite, datadog, cloudWatch
      prometheus:
        # Address of the Prometheus service
        address: http://prometheus.example.com:9090
        # Query (PromQL) sent to the Prometheus service
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
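  dryRun:                        # Metrics that run in dryRun mode; their results do not affect the final analysis outcome
  - metricName: <string>         # Name of the metric
  measurementRetention:          # How many measurement results to keep per metric; also supported for metrics in dryRun mode
  - metricName: <string>         # Name of the metric
    limit: <integer>             # Number of results to keep
  terminate: <boolean>           # Set to true to stop the run before it completes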
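
To make "instantiating" concrete, the following is a minimal sketch of what an AnalysisRun might look like once the template argument has been given a value. The run name and the service address are assumptions chosen for illustration. In normal use the Rollout controller creates AnalysisRun objects, so a hand-written one like this is mainly useful for experimenting.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisRun
metadata:
  name: success-rate-manual     # hypothetical name
spec:
  args:
  # Unlike the template, the run carries a concrete value for each argument
  - name: service-name
    value: guestbook-svc.default.svc.cluster.local
  metrics:
  - name: success-rate
    interval: 5m
    count: 4
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
```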

AnalysisTemplate example

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: guestbook
spec:
...
  strategy:
    canary:
      analysis:
        templates:
        - templateName: success-rate
          # Uncomment to reference a ClusterAnalysisTemplate instead
          # clusterScope: true
        startingStep: 2 # delay starting analysis run until setWeight: 40%
        args:
        - name: service-name
          value: guestbook-svc.default.svc.cluster.local
      steps:
      - setWeight: 20
      - pause: {duration: 10m}
      - setWeight: 40
      - pause: {duration: 10m}
      - setWeight: 60
      - pause: {duration: 10m}
      - setWeight: 80
      - pause: {duration: 10m}
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 5m
    successCondition: result[0] >= 0.95
    failureLimit: 3
    count: 4
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate
spec:
  args:
  - name: service-name
  measurementRetention:
  - metricName: total-5xx-errors
    limit: 20
  dryRun:
  - metricName: total-5xx-errors
  metrics:
  - name: total-5xx-errors
    interval: 5m
    failureCondition: result[0] >= 10
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code=~"5.*"}[5m]
          ))
  - name: total-4xx-errors
    interval: 5m
    failureCondition: result[0] >= 10
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code=~"4.*"}[5m]
          ))

References

https://argoproj.github.io/argo-rollouts/features/analysis/