Monitoring RocketMQ with Prometheus

Published: 2023-10-17 10:12:00 | Author: 記憶や空白

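This article uses the officially provided RocketMQ Exporter to monitor a RocketMQ cluster, covering:

1. Broker TPS/QPS monitoring
2. Message backlog monitoring
3. Consumer group consumption latency monitoring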

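The final Grafana dashboard looks like this:

[Screenshot: Grafana dashboard for RocketMQ]

My RocketMQ environment is a cluster of three masters and three slaves (the exporter only needs to be deployed on one of the machines).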

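Configuration steps

1. Install RocketMQ Exporter

RocketMQ officially provides an exporter (official link: https://github.com),
but it does not publish a prebuilt image. You need to download the source code and build the image yourself with mvn package -Dmaven.test.skip=true docker:build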

The following prebuilt image can be used directly:
docker pull sawyerlan/rocketmq-exporter:latest
#https://hub.docker.com/repository/docker/sawyerlan/rocketmq-exporter
Docker start command:
docker run --name rocketmq-exporter --restart=always -p 5557:5557 -d sawyerlan/rocketmq-exporter --rocketmq.config.namesrvAddr="10.249.1.58:9876;10.249.1.123:9876;10.249.1.6:9876"
To verify that data is being exported, visit http://<your-ip>:5557/metrics
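The check above can also be scripted. The sketch below counts the distinct `rocketmq_*` metric names in the exporter's response, which helps distinguish "the port answers" from "the exporter is actually reporting RocketMQ data". The function name `count_rocketmq_metrics` is invented for this illustration; it only assumes the standard Prometheus text exposition format.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints how many distinct metric names start with "rocketmq_".
count_rocketmq_metrics() {
  grep -v '^#' \
    | grep -oE '^rocketmq_[A-Za-z0-9_]+' \
    | sort -u \
    | wc -l \
    | tr -d ' '
}

# Usage (replace <your-ip> with the exporter host):
#   curl -s "http://<your-ip>:5557/metrics" | count_rocketmq_metrics
```

A result of 0 suggests the exporter started but cannot reach the name servers given in --rocketmq.config.namesrvAddr.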

2. Configure Prometheus

  - job_name: 'wms-rocketmq'
    static_configs:
    - targets: ['10.249.1.6:5557']
      labels:
        env: prod_wms
        app: rocketmq
        instance: 10.249.1.123:9876
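Reload Prometheus so the configuration takes effect (the /-/reload endpoint is only available when Prometheus is started with --web.enable-lifecycle):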
curl -X POST http://localhost:9090/-/reload
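A malformed configuration file makes the reload fail, so it can be worth validating the file first. The sketch below wraps both steps in one function. `promtool` ships with Prometheus and `promtool check config` is its config validator; the function name `reload_prometheus`, the default URL, and the config path in the usage line are assumptions for illustration.

```shell
# Hypothetical helper: validate the config file, then trigger a reload.
#   $1 = path to prometheus.yml
#   $2 = Prometheus base URL (optional, defaults to http://localhost:9090)
reload_prometheus() {
  local config_file="$1"
  local base_url="${2:-http://localhost:9090}"

  # Stop here if promtool reports a problem; nothing is reloaded.
  promtool check config "$config_file" || return 1

  # -f makes curl return a non-zero status on an HTTP error response.
  curl -fsS -X POST "${base_url}/-/reload"
}

# Usage (the path is an assumption, adjust to your installation):
#   reload_prometheus /etc/prometheus/prometheus.yml
```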

3. Configure alerting rules

groups:
- name: rocketmq
  rules:
  - alert: RocketMQ Exporter is Down 
    expr: up{job="wms-rocketmq"} == 0
    for: 20s
    labels: 
      severity: disaster
    annotations:
      summary: RocketMQ {{ $labels.instance }} is down
  - alert: RocketMQ message backlog
    expr: (sum(irate(rocketmq_producer_offset[1m])) by (topic)  - on(topic) group_right sum(irate(rocketmq_consumer_offset[1m])) by (group,topic)) > 5
    for: 5m
    labels: 
      severity: warning
    annotations:
      summary: RocketMQ (group={{ $labels.group }} topic={{ $labels.topic }}) backlog growing at {{ .Value }} msg/s
  - alert: GroupGetLatencyByStoretime consumer group consumption latency too high
    expr: rocketmq_group_get_latency_by_storetime/1000  > 10 and rate(rocketmq_group_get_latency_by_storetime[5m]) >0
    for: 3m
    labels:
      severity: warning
    annotations:
      description: 'consumer {{$labels.group}} on {{$labels.broker}}, {{$labels.topic}}: consume time lags behind message store time
        (lag value is {{$value}}).'
      summary: consumer group consumption latency too high
  - alert: RocketMQClusterProduceHigh cluster TPS >= 20
    expr: sum(rocketmq_producer_tps) by (cluster) >= 20
    for: 3m
    labels:
      severity: warning
    annotations:
      description: '{{$labels.cluster}} send TPS too high. Current TPS = {{ .Value }}'
      summary: cluster send TPS too high
Reload so the configuration takes effect:
curl -X POST http://localhost:9090/-/reload
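Before relying on a rule, it can help to run its expression by hand and see which series it currently returns. The sketch below sends an expression to the Prometheus instant-query endpoint (`/api/v1/query`). The function name `promql_query` and the default URL are assumptions for illustration; `--data-urlencode` takes care of the spaces, braces, and parentheses in PromQL.

```shell
# Hypothetical helper: run a PromQL instant query and print the JSON response.
#   $1 = PromQL expression
#   $2 = Prometheus base URL (optional, defaults to http://localhost:9090)
promql_query() {
  local expr="$1"
  local base_url="${2:-http://localhost:9090}"

  # -G sends the url-encoded data as a query string on a GET request.
  curl -fsS -G "${base_url}/api/v1/query" --data-urlencode "query=${expr}"
}

# Usage: preview the cluster TPS rule from the rules above.
#   promql_query 'sum(rocketmq_producer_tps) by (cluster) >= 20'
```

An empty `result` array in the response means no series currently satisfies the condition, so the alert would not be pending or firing.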

4. Configure Grafana

Import dashboard template 14612 directly.