61. Prometheus-Consul Distributed Cluster Deployment

Published: 2023-04-12 11:32:41  Author: 小粉优化大师

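1. Introduction

1.1. About Consul

Consul is an open-source tool written in Go that provides service registration, service discovery, and configuration management for distributed, service-oriented systems. Its features include service registration and discovery, health checks, key/value storage, multi-datacenter support, and distributed consistency guarantees. Prometheus can use Consul to discover and maintain its monitoring targets automatically. Because Consul also supports distributed cluster deployment, which greatly improves stability, combining Prometheus with a Consul cluster makes target maintenance efficient while keeping the system stable.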

2. Consul deployment

2.1. Environment preparation

2.1.1. Prepare three hosts

Three hosts with the following IP addresses are used here:
192.168.10.34
192.168.10.30
192.168.10.29

2.1.2. Download Consul on all three hosts

https://releases.hashicorp.com/consul/1.8.0/consul_1.8.0_linux_amd64.zip

2.1.3. Extract the software

unzip consul_1.8.0_linux_amd64.zip -d /usr/local/bin/
mkdir -p /data/

2.2. Start the Consul service

2.2.1. 192.168.10.34

]# nohup consul agent -server -bootstrap-expect=3 -data-dir=/data/consul -node=192.168.10.34 -bind=192.168.10.34 -client=0.0.0.0 -datacenter=consulManager -ui &


]# tail -f nohup.out 
    2023-04-12T10:37:13.129+0800 [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
    2023-04-12T10:37:13.129+0800 [INFO]  agent.server.raft: entering follower state: follower="Node at 192.168.10.34:8300 [Follower]" leader=
    2023-04-12T10:37:13.130+0800 [INFO]  agent.server: Adding LAN server: server="192.168.10.34 (Addr: tcp/192.168.10.34:8300) (DC: consulmanager)"
    2023-04-12T10:37:13.130+0800 [INFO]  agent.server: Handled event for server in area: event=member-join server=192.168.10.34.consulmanager area=wan
    2023-04-12T10:37:13.130+0800 [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
    2023-04-12T10:37:13.130+0800 [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
    2023-04-12T10:37:13.130+0800 [INFO]  agent: started state syncer
==> Consul agent running!
    2023-04-12T10:37:20.180+0800 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"

2.2.2. 192.168.10.30

]# nohup consul agent -server -bootstrap-expect=3 -data-dir=/data/consul -node=192.168.10.30 -bind=192.168.10.30 -client=0.0.0.0 -datacenter=consulManager -ui &

]# tail -f nohup.out
    2023-04-12T10:37:45.562+0800 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: 192.168.10.30 192.168.10.30
    2023-04-12T10:37:45.562+0800 [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
    2023-04-12T10:37:45.562+0800 [INFO]  agent.server.raft: entering follower state: follower="Node at 192.168.10.30:8300 [Follower]" leader=
    2023-04-12T10:37:45.563+0800 [INFO]  agent.server: Adding LAN server: server="192.168.10.30 (Addr: tcp/192.168.10.30:8300) (DC: consulmanager)"
    2023-04-12T10:37:45.563+0800 [INFO]  agent.server: Handled event for server in area: event=member-join server=192.168.10.30.consulmanager area=wan
    2023-04-12T10:37:45.563+0800 [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
    2023-04-12T10:37:45.563+0800 [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
    2023-04-12T10:37:45.563+0800 [INFO]  agent: started state syncer
==> Consul agent running!
    2023-04-12T10:37:51.172+0800 [WARN]  agent.server.raft: no known peers, aborting election
    2023-04-12T10:37:52.598+0800 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    2023-04-12T10:37:57.242+0800 [INFO]  agent: Newer Consul version available: new_version=1.15.2 current_version=1.8.0
    2023-04-12T10:38:18.841+0800 [ERROR] agent: Coordinate update error: error="No cluster leader"

2.2.3. 192.168.10.29

]# nohup consul agent -server -bootstrap-expect=3 -data-dir=/data/consul -node=192.168.10.29 -bind=192.168.10.29 -client=0.0.0.0 -datacenter=consulManager -ui &


]# tail -f nohup.out 
    2023-04-12T10:37:37.124+0800 [INFO]  agent.server.raft: entering follower state: follower="Node at 192.168.10.29:8300 [Follower]" leader=
    2023-04-12T10:37:37.124+0800 [INFO]  agent.server: Adding LAN server: server="192.168.10.29 (Addr: tcp/192.168.10.29:8300) (DC: consulmanager)"
    2023-04-12T10:37:37.124+0800 [INFO]  agent.server: Handled event for server in area: event=member-join server=192.168.10.29.consulmanager area=wan
    2023-04-12T10:37:37.124+0800 [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
    2023-04-12T10:37:37.125+0800 [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
    2023-04-12T10:37:37.125+0800 [INFO]  agent: started state syncer
==> Consul agent running!
    2023-04-12T10:37:43.767+0800 [WARN]  agent.server.raft: no known peers, aborting election
    2023-04-12T10:37:44.168+0800 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"

2.2.4. Check the listening ports

]# netstat -tunlp | grep consul
tcp        0      0 192.168.10.34:8300      0.0.0.0:*               LISTEN      28148/consul        
tcp        0      0 192.168.10.34:8301      0.0.0.0:*               LISTEN      28148/consul        
tcp        0      0 192.168.10.34:8302      0.0.0.0:*               LISTEN      28148/consul        
tcp6       0      0 :::8500                 :::*                    LISTEN      28148/consul        
tcp6       0      0 :::8600                 :::*                    LISTEN      28148/consul        
udp        0      0 192.168.10.34:8301      0.0.0.0:*                           28148/consul        
udp        0      0 192.168.10.34:8302      0.0.0.0:*                           28148/consul        
udp6       0      0 :::8600                 :::*                                28148/consul 

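2.2.5. Notes

At this point the three machines have not joined one another, so they do not yet form a cluster. Consul cannot work properly on any of them because no leader has been elected.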
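
The reason no leader can be elected yet is that Raft needs a majority (quorum) of the voting servers, and with -bootstrap-expect=3 the servers wait until three servers know about each other before bootstrapping. A minimal sketch of the quorum arithmetic follows; the function names quorum and fault_tolerance are illustrative only and are not part of Consul.

```shell
#!/bin/sh
# Raft quorum: the smallest majority of N voting servers.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Fault tolerance: how many servers can be lost while a majority remains.
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 5; do
    echo "servers=$n quorum=$(quorum "$n") tolerates=$(fault_tolerance "$n")"
done
```

For the three-server cluster in this article, the quorum is 2, so the cluster keeps working if one server is lost.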

2.3. Join the nodes into a cluster

Run the following on any one of the Consul hosts.

2.3.1. Join the other nodes from host 192.168.10.34

]# consul join 192.168.10.29
Successfully joined cluster by contacting 1 nodes.
]# consul join 192.168.10.30
Successfully joined cluster by contacting 1 nodes.
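
As an alternative to running consul join by hand, each server can be started with the -retry-join flag pointing at the other servers, so that it keeps trying to join until it succeeds. The sketch below only prints the command line for a given host. The helper name consul_server_cmd and the SERVERS variable are assumptions made for illustration; the remaining flags are copied from section 2.2.

```shell
#!/bin/sh
# The three servers used in this article.
SERVERS="192.168.10.34 192.168.10.30 192.168.10.29"

# Print the agent command for one server, adding -retry-join
# for every other server in the list.
consul_server_cmd() {
    self=$1
    cmd="consul agent -server -bootstrap-expect=3 -data-dir=/data/consul"
    cmd="$cmd -node=$self -bind=$self -client=0.0.0.0 -datacenter=consulManager -ui"
    for peer in $SERVERS; do
        if [ "$peer" != "$self" ]; then
            cmd="$cmd -retry-join=$peer"
        fi
    done
    echo "$cmd"
}

consul_server_cmd 192.168.10.34
```

With this approach the three servers can be started in any order and no separate join step is needed.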

2.3.2. Analyze the logs

   ...2023-04-12T10:43:58.685+0800 [INFO]  agent.server: New leader elected: payload=192.168.10.34
    2023-04-12T10:43:58.685+0800 [INFO]  agent.server.raft: pipelining replication: peer="{Voter c2f9e827-afcf-54c3-b702-9ec3111491d9 192.168.10.30:8300}"
    2023-04-12T10:43:58.686+0800 [WARN]  agent.server.raft: appendEntries rejected, sending older logs: peer="{Voter 696892ba-c77e-c5ad-5709-2cd4bdbc06dc 192.168.10.29:8300}" next=1
    2023-04-12T10:43:58.687+0800 [INFO]  agent.server.raft: pipelining replication: peer="{Voter 696892ba-c77e-c5ad-5709-2cd4bdbc06dc 192.168.10.29:8300}"
    2023-04-12T10:43:58.687+0800 [INFO]  agent.leader: started routine: routine="federation state anti-entropy"
    2023-04-12T10:43:58.687+0800 [INFO]  agent.leader: started routine: routine="federation state pruning"
    2023-04-12T10:43:58.687+0800 [INFO]  agent.leader: started routine: routine="CA root pruning"
    2023-04-12T10:43:58.687+0800 [INFO]  agent.server: member joined, marking health alive: member=192.168.10.34
    2023-04-12T10:43:58.691+0800 [INFO]  agent.server: member joined, marking health alive: member=192.168.10.29
    2023-04-12T10:43:58.692+0800 [INFO]  agent.server: federation state anti-entropy synced
    2023-04-12T10:43:58.692+0800 [INFO]  agent: Synced node info
    2023-04-12T10:43:58.692+0800 [INFO]  agent.server: member joined, marking health alive: member=192.168.10.30
# At this point a leader has been elected and all members have joined.

2.4. Cluster checks

2.4.1. View the cluster state

]# consul operator raft list-peers
Node           ID                                    Address             State     Voter  RaftProtocol
192.168.10.29  696892ba-c77e-c5ad-5709-2cd4bdbc06dc  192.168.10.29:8300  follower  true   3
192.168.10.30  c2f9e827-afcf-54c3-b702-9ec3111491d9  192.168.10.30:8300  follower  true   3
192.168.10.34  5edfe595-4f0e-b507-bd8d-5f04fd1109d2  192.168.10.34:8300  leader    true   3
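
For scripted checks, the same output can be summarized with awk. This is a rough sketch that assumes the column layout shown above (Node, ID, Address, State, Voter, RaftProtocol); raft_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Summarize `consul operator raft list-peers` output read from stdin.
# Prints the leader node and the number of voters; exits non-zero
# when no leader is listed.
raft_summary() {
    awk 'NR > 1 {
             if ($4 == "leader") leader = $1
             if ($5 == "true") voters++
         }
         END {
             printf "leader=%s voters=%d\n", (leader == "" ? "none" : leader), voters
             exit (leader == "" ? 1 : 0)
         }'
}

# Example with the output captured above.
raft_summary <<'EOF'
Node           ID                                    Address             State     Voter  RaftProtocol
192.168.10.29  696892ba-c77e-c5ad-5709-2cd4bdbc06dc  192.168.10.29:8300  follower  true   3
192.168.10.30  c2f9e827-afcf-54c3-b702-9ec3111491d9  192.168.10.30:8300  follower  true   3
192.168.10.34  5edfe595-4f0e-b507-bd8d-5f04fd1109d2  192.168.10.34:8300  leader    true   3
EOF
```

On a live server it would be used as `consul operator raft list-peers | raft_summary`.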

2.4.2. View member status

]# consul members
Node           Address             Status  Type    Build  Protocol  DC             Segment
192.168.10.29  192.168.10.29:8301  alive   server  1.8.0  2         consulmanager  <all>
192.168.10.30  192.168.10.30:8301  alive   server  1.8.0  2         consulmanager  <all>
192.168.10.34  192.168.10.34:8301  alive   server  1.8.0  2         consulmanager  <all>

2.4.3. Cluster test

# Set a value
]# consul kv put name cyc
Success! Data written to: name

# Get the value
]# consul kv get name
cyc

Reading this key on the other two machines also returns cyc, which shows that the key has been synchronized across the cluster.
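
The check on all three machines can be run from a single host by pointing the CLI at each server with -http-addr. The following is a sketch under the assumption that port 8500 is reachable on every server; kv_consistent and CONSUL_HOSTS are names introduced here for illustration.

```shell
#!/bin/sh
# Servers to query; defaults to the three hosts in this article.
CONSUL_HOSTS=${CONSUL_HOSTS:-"192.168.10.34 192.168.10.30 192.168.10.29"}

# Read one key through every server and compare it with the expected value.
# Returns non-zero on the first server that answers differently.
kv_consistent() {
    key=$1
    expected=$2
    for host in $CONSUL_HOSTS; do
        value=$(consul kv get -http-addr="http://$host:8500" "$key" 2>/dev/null)
        if [ "$value" != "$expected" ]; then
            echo "MISMATCH on $host: got '$value', expected '$expected'"
            return 1
        fi
        echo "OK on $host: $key=$value"
    done
}

# Usage: kv_consistent name cyc
```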

2.4.4. Web UI

http://192.168.10.34:8500/
http://192.168.10.30:8500/
http://192.168.10.29:8500/

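3. Integrating Prometheus with Consul

3.1. How it works

1. Services (the monitoring targets) are registered with or deregistered from Consul.
2. Prometheus continuously watches the Consul services. When a matching service changes in Consul, Prometheus updates its monitored targets.

3.2. Prepare a new node_exporter

node_exporter is already installed at 192.168.10.30:9100.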

3.3. Registering and deregistering with Consul

3.3.1. Register the node_exporter service with Consul

curl -X PUT -d '{"id": "node-exporter-30","name":"node-exporter","address": "192.168.10.30","port": 9100,"tags":["linux","prome"],"checks": [{"http": "http://192.168.10.30:9100/metrics","interval": "5s"}]}' http://192.168.10.34:8500/v1/agent/service/register
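
The one-line JSON above is hard to read and easy to mistype. A small sketch that generates the same payload from an IP address is shown below. The helper names node_exporter_payload and register_node_exporter and the CONSUL_ADDR variable are assumptions for illustration; the id scheme node-exporter-<last octet> follows the example above.

```shell
#!/bin/sh
# Build the registration payload for one node_exporter instance.
# $1 = host IP, $2 = port (defaults to 9100).
node_exporter_payload() {
    ip=$1
    port=${2:-9100}
    id="node-exporter-${ip##*.}"   # last octet of the IP, e.g. 30
    cat <<EOF
{
  "id": "$id",
  "name": "node-exporter",
  "address": "$ip",
  "port": $port,
  "tags": ["linux", "prome"],
  "checks": [{"http": "http://$ip:$port/metrics", "interval": "5s"}]
}
EOF
}

# Send the payload to a Consul agent (default: 192.168.10.34:8500).
register_node_exporter() {
    node_exporter_payload "$1" |
        curl -sS -X PUT --data-binary @- \
            "http://${CONSUL_ADDR:-192.168.10.34:8500}/v1/agent/service/register"
}

# Usage: register_node_exporter 192.168.10.30
```

A service registered through /v1/agent/service/register belongs to the agent that received the request, so it should be deregistered through that same agent.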

 

3.3.2. Deregister the node_exporter service from Consul

curl -X PUT http://192.168.10.34:8500/v1/agent/service/deregister/node-exporter-30

3.4. Add Consul to the Prometheus configuration

3.4.1. Configure prometheus.yml

]# vi /data/server/prometheus/etc/prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'node_discovery_by_consul'
    metrics_path: /metrics
    scheme: http
    consul_sd_configs:
    - server: 192.168.10.29:8500
      services:
      - node-exporter
    - server: 192.168.10.30:8500
      services:
      - node-exporter
    - server: 192.168.10.34:8500
      services:
      - node-exporter
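
The tags set at registration time can also be used to filter targets. The sketch below prints a variant of the job that keeps only services tagged prome and copies the Consul service id into a label. __meta_consul_tags and __meta_consul_service_id are meta labels provided by Prometheus Consul service discovery; the target label name consul_service_id and the helper consul_job_fragment are hypothetical, and a single server is listed only to keep the example short.

```shell
#!/bin/sh
# Print a scrape job fragment; paste the output under scrape_configs.
consul_job_fragment() {
    cat <<'EOF'
  - job_name: 'node_discovery_by_consul'
    consul_sd_configs:
    - server: 192.168.10.34:8500
      services:
      - node-exporter
    relabel_configs:
    # __meta_consul_tags holds the tags joined by commas, e.g. ",linux,prome,".
    - source_labels: [__meta_consul_tags]
      regex: .*,prome,.*
      action: keep
    # Expose the Consul service id (e.g. node-exporter-30) as a label.
    - source_labels: [__meta_consul_service_id]
      target_label: consul_service_id
EOF
}

consul_job_fragment
```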

3.4.2. Check the syntax

]# promtool check config /data/server/prometheus/etc/prometheus.yml 
Checking /data/server/prometheus/etc/prometheus.yml
  SUCCESS: 1 rule files found
 SUCCESS: /data/server/prometheus/etc/prometheus.yml is valid prometheus config file syntax

Checking /data/server/prometheus/rules/metrics_request_rules.yaml
  SUCCESS: 2 rules found

3.4.3. Restart the Prometheus service

systemctl restart prometheus

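3.4.4. Check in the Prometheus web UI

The new node appears as a monitored target, which shows that it was added to monitoring successfully.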

 

 

3.5. Add one more node_exporter

3.5.1. Register one more node_exporter with Consul

curl -X PUT -d '{"id": "node-exporter-29","name":"node-exporter","address": "192.168.10.29","port": 9100,"tags":["linux","prome"],"checks": [{"http": "http://192.168.10.29:9100/metrics","interval": "5s"}]}' http://192.168.10.34:8500/v1/agent/service/register

3.5.2. Check the registration in Consul
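
Besides the web UI, the registration can be checked from the command line through the Consul catalog API. This is a minimal sketch; list_service_instances, service_ids and CONSUL_ADDR are names introduced here, and the grep/sed extraction is a rough substitute for a JSON parser that assumes ids contain no double quotes.

```shell
#!/bin/sh
# Fetch all instances of a service from the Consul catalog as JSON.
list_service_instances() {
    service=${1:-node-exporter}
    curl -sS "http://${CONSUL_ADDR:-192.168.10.34:8500}/v1/catalog/service/$service"
}

# Print one registered service id per line.
service_ids() {
    list_service_instances "$1" |
        grep -o '"ServiceID": *"[^"]*"' |
        sed 's/.*: *"\(.*\)"/\1/'
}

# Usage: service_ids node-exporter
```

After both registrations in this article, the ids node-exporter-30 and node-exporter-29 would be expected in the list.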

3.5.3. Check in the Prometheus web UI

3.5.4. Query with PromQL
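
A PromQL query can also be run through the Prometheus HTTP API instead of the web UI. The sketch below assumes Prometheus listens on localhost:9090, as in the static target configured earlier; promql and PROM_ADDR are hypothetical names.

```shell
#!/bin/sh
# Run an instant PromQL query through the Prometheus HTTP API.
# $1 = PromQL expression; the JSON response is printed to stdout.
promql() {
    curl -sS -G --data-urlencode "query=$1" \
        "http://${PROM_ADDR:-localhost:9090}/api/v1/query"
}

# A target that Prometheus scrapes successfully reports up == 1.
# Usage: promql 'up{job="node_discovery_by_consul"}'
```

If both node_exporter instances were discovered through Consul, the result should contain one up series per instance for this job.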