Deploying an etcd-Backed CoreDNS Cluster - 526互联

Introduction

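We need a private DNS service for the company. All internal servers will use its address so they can resolve our own internal domain names. The setup is CoreDNS + etcd: CoreDNS and etcd each run three instances, etcd in cluster mode, and nginx load-balances UDP DNS traffic across the three CoreDNS instances so that no single machine becomes a bottleneck. Prometheus monitors both CoreDNS and etcd.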

In this article etcd, CoreDNS and Prometheus all run as plain binaries; Docker containers would work as well.

Environment

IP	OS	Software	Notes
192.168.0.10	CentOS 7.9 x86_64	Nginx 1.21	UDP load balancing
192.168.0.11	CentOS 7.9 x86_64	coredns v1.10.0, etcd v3.5.4
192.168.0.12	CentOS 7.9 x86_64	coredns v1.10.0, etcd v3.5.4
192.168.0.13	CentOS 7.9 x86_64	coredns v1.10.0, etcd v3.5.4
192.168.0.14	CentOS 7.9 x86_64	prometheus

Steps

1. Deploy etcd

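Download the etcd binary tarball from the official site and copy the binaries from the extracted directory into /usr/local/bin.
On each etcd node, run the startup script below from an empty directory, after changing the etcd server IPs to your own. The script uses Python to detect the local IP and then starts the etcd member that matches it. (PS: the start function is really written three times here; a single function that takes parameters would do.)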

#!/bin/bash

set -u
script_dir=$(cd $(dirname $0) && pwd)

# Change these IPs to match your environment
etcd1IP='192.168.0.11'
etcd2IP='192.168.0.12'
etcd3IP='192.168.0.13'
etcdClusterToken='etcd-cluster-1'

function getLocalIP(){
    # Get the local IP via Python. If your Linux distribution ships without python2 by default, change "python" below to "python3"
    cat > /tmp/getLocalIP.py <<EOF
#!/usr/bin/env python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("8.8.8.8", 53))
print(s.getsockname()[0])
EOF
    IP=$(python /tmp/getLocalIP.py)
}

# Run etcd on node 1
function startNode1() {
    nohup etcd --name etcd1 \
        --listen-client-urls http://${etcd1IP}:2379 \
        --advertise-client-urls http://${etcd1IP}:2379 \
        --listen-peer-urls http://${etcd1IP}:2380 \
        --initial-advertise-peer-urls http://${etcd1IP}:2380 \
        --initial-cluster-token ${etcdClusterToken} \
        --initial-cluster="etcd1=http://${etcd1IP}:2380,etcd2=http://${etcd2IP}:2380,etcd3=http://${etcd3IP}:2380" \
        --initial-cluster-state 'new' \
        --enable-pprof \
        --logger 'zap' \
        --log-outputs=stderr \
        --data-dir="${script_dir}/data" > ${script_dir}/logs/app.log 2>&1 &
}

# Run etcd on node 2
function startNode2() {
    nohup etcd --name etcd2 \
        --listen-client-urls http://${etcd2IP}:2379 \
        --advertise-client-urls http://${etcd2IP}:2379 \
        --listen-peer-urls http://${etcd2IP}:2380 \
        --initial-advertise-peer-urls http://${etcd2IP}:2380 \
        --initial-cluster-token ${etcdClusterToken} \
        --initial-cluster="etcd1=http://${etcd1IP}:2380,etcd2=http://${etcd2IP}:2380,etcd3=http://${etcd3IP}:2380" \
        --initial-cluster-state 'new' \
        --enable-pprof \
        --logger 'zap' \
        --log-outputs=stderr \
        --data-dir="${script_dir}/data" > ${script_dir}/logs/app.log 2>&1 &
}

# Run etcd on node 3
function startNode3() {
    nohup etcd --name etcd3 \
        --listen-client-urls http://${etcd3IP}:2379 \
        --advertise-client-urls http://${etcd3IP}:2379 \
        --listen-peer-urls http://${etcd3IP}:2380 \
        --initial-advertise-peer-urls http://${etcd3IP}:2380 \
        --initial-cluster-token ${etcdClusterToken} \
        --initial-cluster="etcd1=http://${etcd1IP}:2380,etcd2=http://${etcd2IP}:2380,etcd3=http://${etcd3IP}:2380" \
        --initial-cluster-state 'new' \
        --enable-pprof \
        --logger 'zap' \
        --log-outputs=stderr \
        --data-dir="${script_dir}/data" > ${script_dir}/logs/app.log 2>&1 &
}

function main() {
    # Create data/ and logs/ next to the script, where the start functions expect them
    mkdir -p "${script_dir}"/{data,logs}
    getLocalIP
    case $IP in
        $etcd1IP)
            startNode1
            ;;
        $etcd2IP)
            startNode2
            ;;
        $etcd3IP)
            startNode3
            ;;
        *)
            echo "Unknown node IP: ${IP}" >&2
            exit 1
    esac
}

main
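As the PS above says, the three start functions differ only in the member name and IP. Below is a minimal sketch of the single parameterized version, assuming the same IPs, cluster token and flags as the script above. The function names (initialCluster, startNode, startLocal) and the ETCD_BIN override are hypothetical, introduced only for this sketch; node selection is a table lookup instead of a hard-coded case statement.

```shell
#!/bin/bash
# Sketch: one parameterized start function instead of startNode1..startNode3.
# Names, IPs, token and flags mirror the original script; ETCD_BIN is a
# hypothetical override (defaults to "etcd" from /usr/local/bin).
set -u

script_dir=$(cd "$(dirname -- "$0")" && pwd)
ETCD_BIN=${ETCD_BIN:-etcd}
etcdClusterToken='etcd-cluster-1'
# Member names and their IPs, in the same order
names=(etcd1 etcd2 etcd3)
ips=(192.168.0.11 192.168.0.12 192.168.0.13)

# Build the --initial-cluster value from the two arrays
initialCluster() {
    local i out=""
    for i in "${!names[@]}"; do
        out+="${out:+,}${names[$i]}=http://${ips[$i]}:2380"
    done
    echo "$out"
}

# startNode NAME IP: start one etcd member in the background
startNode() {
    local name="$1" ip="$2"
    mkdir -p "${script_dir}"/{data,logs}
    nohup "$ETCD_BIN" --name "$name" \
        --listen-client-urls "http://${ip}:2379" \
        --advertise-client-urls "http://${ip}:2379" \
        --listen-peer-urls "http://${ip}:2380" \
        --initial-advertise-peer-urls "http://${ip}:2380" \
        --initial-cluster-token "$etcdClusterToken" \
        --initial-cluster "$(initialCluster)" \
        --initial-cluster-state new \
        --enable-pprof \
        --logger zap \
        --log-outputs stderr \
        --data-dir "${script_dir}/data" > "${script_dir}/logs/app.log" 2>&1 &
}

# startLocal LOCAL_IP: start the member whose IP equals LOCAL_IP
startLocal() {
    local localIP="$1" i
    for i in "${!ips[@]}"; do
        if [[ "${ips[$i]}" == "$localIP" ]]; then
            startNode "${names[$i]}" "$localIP"
            return 0
        fi
    done
    echo "Unknown node IP: ${localIP}" >&2
    return 1
}
```

Call startLocal with the local IP, for example the $IP value that getLocalIP sets in the original script. Because --initial-cluster is derived from the same arrays that select the local member, the two cannot drift apart when the IPs are edited.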

Check that the cluster is healthy:

# If the cluster is healthy, every endpoint reports "is healthy: successfully committed proposal"
etcdctl --endpoints http://192.168.0.11:2379,http://192.168.0.12:2379,http://192.168.0.13:2379 endpoint health
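etcdctl has to be installed wherever you run that check. A lighter alternative is to query etcd's /health HTTP endpoint on the client port with curl. A sketch, assuming curl is available and the client URLs are plain http as in the startup script; check_etcd_health and is_healthy_body are hypothetical helper names:

```shell
#!/bin/bash
# Sketch: poll each etcd member's /health endpoint with curl.
# etcd answers with a compact JSON body such as {"health":"true","reason":""}.
set -u

# is_healthy_body BODY -> 0 if the JSON body reports "health":"true"
is_healthy_body() {
    [[ "$1" == *'"health":"true"'* ]]
}

# check_etcd_health URL... -> prints one line per endpoint, returns 1 if any is unhealthy
check_etcd_health() {
    local ep body failed=0
    for ep in "$@"; do
        # --max-time keeps an unreachable member from hanging the check
        body=$(curl -s --max-time 3 "${ep}/health") || body=""
        if is_healthy_body "$body"; then
            echo "${ep} healthy"
        else
            echo "${ep} UNHEALTHY: ${body:-no response}" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Usage: check_etcd_health http://192.168.0.11:2379 http://192.168.0.12:2379 http://192.168.0.13:2379. The non-zero return code makes it easy to call from cron or another monitoring script.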

2. Deploy CoreDNS

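Download the binary release from the official GitHub repository. I usually put the coredns binary in coredns/bin, with the configuration in coredns/conf and logs in coredns/logs; the start script below assumes this layout.
Write the configuration file coredns/conf/Corefile. Remember to change the local IP and the etcd endpoints on each node.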

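.:53 {
    # Bind to this host's IP
    bind 192.168.0.11
    etcd {
        path /coredns
        endpoint http://192.168.0.11:2379 http://192.168.0.12:2379 http://192.168.0.13:2379
        # Names with no record in etcd are passed on to the next plugin (forward)
        fallthrough
    }
    # Everything else is forwarded to the upstream DNS servers listed in the forwards file
    forward . /home/apps/coredns/conf/forwards
    # Maximum cache TTL, in seconds
    cache 1800
    # How often to check the Corefile for changes and reload it
    reload 6s
    # Log every query (disabled)
    #log
    # Log errors
    errors
    # Expose Prometheus metrics
    prometheus 192.168.0.11:19097
}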

The forwards file (/home/apps/coredns/conf/forwards, in resolv.conf format) contains:

nameserver 223.6.6.6
nameserver 223.5.5.5
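Since the Corefile differs between nodes only in the bind address and the prometheus listen address, it can be generated from the node IP. A minimal sketch, assuming the paths, etcd endpoints and port 19097 used above; render_corefile is a hypothetical helper:

```shell
#!/bin/bash
# Sketch: render a per-node Corefile. Only the bind address and the
# prometheus listen address change between nodes.
set -u

ETCD_ENDPOINTS="http://192.168.0.11:2379 http://192.168.0.12:2379 http://192.168.0.13:2379"
FORWARDS_FILE="/home/apps/coredns/conf/forwards"

# render_corefile NODE_IP -> prints the Corefile for that node on stdout
render_corefile() {
    local ip="$1"
    cat <<EOF
.:53 {
    bind ${ip}
    etcd {
        path /coredns
        endpoint ${ETCD_ENDPOINTS}
        fallthrough
    }
    forward . ${FORWARDS_FILE}
    cache 1800
    reload 6s
    errors
    prometheus ${ip}:19097
}
EOF
}
```

For example, on the second node: render_corefile 192.168.0.12 > /home/apps/coredns/conf/Corefile. With reload enabled, CoreDNS picks up a regenerated file without a restart.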

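Start CoreDNS with the following script. Put it in coredns/bin next to the binary: it locates the Corefile and the log directory relative to its own location.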

#!/bin/bash
# description: start CoreDNS

set -u

scriptDir=$(cd $(dirname $0) && pwd)
baseDir=$(cd ${scriptDir}/.. && pwd)
pidFile=${baseDir}/logs/app.pid

function prepare(){
    # Make sure we run as root (binding to port 53 requires it)
    if [[ $(whoami) != "root" ]]; then
        echo "please use root privilege"
        exit 1
    fi

    # Exit with an error if the Corefile is missing
    if [[ ! -f ${baseDir}/conf/Corefile ]]; then
        echo "${baseDir}/conf/Corefile not found"
        exit 1
    fi
    
    # Create the log directory if it does not exist
    if [[ ! -d ${baseDir}/logs ]]; then
        mkdir -p ${baseDir}/logs
    fi
    
    # Exit if coredns is already running
    ps -ef | grep -v grep | grep ${scriptDir}/coredns > /dev/null
    if [[ $? -eq 0 ]]; then
        echo "coredns is running"
        exit 1
    fi
}

function startApp(){
    nohup ${scriptDir}/coredns --conf ${baseDir}/conf/Corefile \
        -pidfile ${pidFile} > ${baseDir}/logs/start.log 2>&1 &
}

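function check(){
    # Give coredns a moment to start, then check that the process is alive
    for i in $(seq 2); do
        echo "checking whether coredns is running ..."
        sleep 1
    done
    ps -ef | grep -v grep | grep ${scriptDir}/coredns > /dev/null
    if [[ $? -eq 0 ]]; then
        echo "coredns is running"
    else
        echo "coredns failed to start, see ${baseDir}/logs/start.log"
        exit 1
    fi
}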

function main(){
    prepare
    startApp
    check
}

main
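The start script passes -pidfile, so CoreDNS writes its PID to coredns/logs/app.pid. A minimal sketch of a matching stop helper, assuming that pid file location; stop_by_pidfile is a hypothetical name and the 10-second grace period is an arbitrary choice:

```shell
#!/bin/bash
# Sketch: stop the process recorded in a pid file (e.g. coredns/logs/app.pid).
# Sends SIGTERM, waits up to 10 seconds, then removes the pid file.
set -u

# stop_by_pidfile PID_FILE -> 0 if the process is stopped (or was already gone)
stop_by_pidfile() {
    local pid_file="$1" pid i
    if [[ ! -f "$pid_file" ]]; then
        echo "pid file ${pid_file} not found, is coredns running?" >&2
        return 1
    fi
    pid=$(cat "$pid_file")
    # kill -0 only tests whether the process exists
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process ${pid} is not running, removing stale pid file" >&2
        rm -f "$pid_file"
        return 0
    fi
    kill -TERM "$pid"
    for i in $(seq 10); do
        if ! kill -0 "$pid" 2>/dev/null; then
            rm -f "$pid_file"
            echo "stopped process ${pid}"
            return 0
        fi
        sleep 1
    done
    echo "process ${pid} did not exit within 10s" >&2
    return 1
}
```

With the layout used in this article that would be stop_by_pidfile /home/apps/coredns/logs/app.pid. Using the pid file avoids the ps | grep matching that the start script relies on.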

3. Configure UDP load balancing in nginx

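PS: to be clear, nginx has long supported layer-4 TCP and UDP proxying through its stream module (UDP since version 1.9.13). Don't trust outdated tutorials claiming nginx can only proxy layer-7 HTTP.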

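UDP forwarding is configured in nginx's stream block, which sits at the top level of nginx.conf alongside (not inside) the http block. An example: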

stream {
    upstream coredns {
        server 192.168.0.11:53;
        server 192.168.0.12:53;
        server 192.168.0.13:53;
    }

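    server {
        listen 53 udp;
        # Each DNS query expects exactly one response datagram, so close the UDP
        # session after it instead of holding it until proxy_timeout (10m by default)
        proxy_responses 1;
        proxy_pass coredns;
    }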
}
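The stream block only works if nginx was built with the stream module, so a quick check before reloading can save some confusion. A sketch, assuming the configure arguments format that nginx -V prints (on stderr); has_stream_module is a hypothetical helper. For a dynamic build (--with-stream=dynamic), nginx.conf additionally needs a load_module directive for ngx_stream_module.so.

```shell
#!/bin/bash
# Sketch: detect the stream module in "nginx -V" output.
# A static build lists --with-stream, a dynamic one --with-stream=dynamic;
# options such as --with-stream_ssl_module alone do not count.
set -u

# has_stream_module CONFIGURE_OUTPUT -> 0 if the stream module was built
has_stream_module() {
    grep -qE -- '--with-stream( |=dynamic|$)' <<< "$1"
}
```

Usage: has_stream_module "$(nginx -V 2>&1)" && nginx -t && nginx -s reload, which only reloads nginx when the module is present and the configuration passes the syntax check.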

4. Test DNS resolution

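Add a DNS record with etcdctl. Here zhangsan.com is resolved to 192.168.0.10. The key is the domain name with its labels reversed, under the /coredns prefix set by path in the Corefile; the last segment (x1) is just a label, so several keys (x1, x2, ...) can hold several records for the same name.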

etcdctl --endpoints http://192.168.0.11:2379,http://192.168.0.12:2379,http://192.168.0.13:2379 put /coredns/com/zhangsan/x1 '{"host":"192.168.0.10", "ttl": 60}'
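Reversing domain labels by hand is easy to get wrong, so a small helper can build the key and the JSON value. A minimal sketch, assuming the /coredns prefix and the {"host","ttl"} value format used above; domain_to_key and a_record_value are hypothetical helpers:

```shell
#!/bin/bash
# Sketch: build the etcd key and value for an A record.
# ETCD_PREFIX must match "path /coredns" in the Corefile.
set -u

ETCD_PREFIX=/coredns

# domain_to_key DOMAIN LABEL -> /coredns/<labels reversed>/LABEL
domain_to_key() {
    local domain="${1%.}" label="$2" key="" part   # strip a trailing dot
    local -a parts
    IFS='.' read -r -a parts <<< "$domain"
    for part in "${parts[@]}"; do
        key="/${part}${key}"   # prepend, so the last label ends up first
    done
    echo "${ETCD_PREFIX}${key}/${label}"
}

# a_record_value IP TTL -> JSON value in the format used above
a_record_value() {
    printf '{"host":"%s","ttl":%d}\n' "$1" "$2"
}
```

With these, put "$(domain_to_key zhangsan.com x1)" "$(a_record_value 192.168.0.10 60)" passed to the same etcdctl command writes the key /coredns/com/zhangsan/x1 with the value {"host":"192.168.0.10","ttl":60}.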

Test with nslookup. If the command is not found, install bind-utils on CentOS or dnsutils on Ubuntu.

nslookup zhangsan.com 192.168.0.10

If this command returns the expected answer, the CoreDNS cluster is up. From now on, new servers only need their DNS server set to the nginx address.
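Querying only through nginx can hide a broken CoreDNS instance behind a healthy one. A sketch that asks every CoreDNS node directly and then the nginx address, assuming dig is installed (it ships in the same bind-utils / dnsutils package) and the IPs from the environment table; check_record and check_dns_all are hypothetical helpers:

```shell
#!/bin/bash
# Sketch: verify that every CoreDNS node and the nginx load balancer
# return the expected A record.
set -u

COREDNS_NODES=(192.168.0.11 192.168.0.12 192.168.0.13)
NGINX_LB=192.168.0.10

# check_record SERVER NAME EXPECTED_IP -> 0 if SERVER answers NAME with EXPECTED_IP
check_record() {
    local server="$1" name="$2" expected="$3" answer
    answer=$(dig +short +time=2 +tries=1 @"$server" "$name" A)
    # dig +short prints one address per line; require an exact line match
    if grep -qxF "$expected" <<< "$answer"; then
        echo "OK   ${server}: ${name} -> ${expected}"
    else
        echo "FAIL ${server}: ${name} -> '${answer}'" >&2
        return 1
    fi
}

# check_dns_all NAME EXPECTED_IP -> checks each node, then the load balancer
check_dns_all() {
    local server failed=0
    for server in "${COREDNS_NODES[@]}" "$NGINX_LB"; do
        check_record "$server" "$1" "$2" || failed=1
    done
    return "$failed"
}
```

For the record added above: check_dns_all zhangsan.com 192.168.0.10.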

5. Configure Prometheus monitoring

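Installing and using Prometheus is out of scope here. Below is only the scrape configuration for etcd and CoreDNS, using file-based service discovery; these jobs go under scrape_configs in prometheus.yml.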

  - job_name: "etcd"
    file_sd_configs:
    - files: ['/home/apps/prometheus/sd_configs/etcd/*.yaml']
      refresh_interval:  10s
  - job_name: "coredns"
    file_sd_configs:
    - files: ['/home/apps/prometheus/sd_configs/coredns/*.yaml']
      refresh_interval:  10s

Contents of sd_configs/etcd/nodes.yaml (etcd serves its metrics at /metrics on the client port, 2379):

- targets: ['192.168.0.11:2379']
  labels:
    instance: 192.168.0.11

- targets: ['192.168.0.12:2379']
  labels:
    instance: 192.168.0.12

- targets: ['192.168.0.13:2379']
  labels:
    instance: 192.168.0.13

Contents of sd_configs/coredns/nodes.yaml (port 19097 matches the prometheus line in the Corefile):

- targets: ['192.168.0.11:19097']
  labels:
    instance: 192.168.0.11

- targets: ['192.168.0.12:19097']
  labels:
    instance: 192.168.0.12

- targets: ['192.168.0.13:19097']
  labels:
    instance: 192.168.0.13
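Both target files follow the same pattern, so they can be generated from one list of node IPs. A sketch, assuming the sd_configs directory and ports shown above; render_targets, write_sd_files and the SD_DIR override are hypothetical names:

```shell
#!/bin/bash
# Sketch: generate the file_sd target files for etcd (2379) and CoreDNS (19097).
set -u

NODES=(192.168.0.11 192.168.0.12 192.168.0.13)
SD_DIR=${SD_DIR:-/home/apps/prometheus/sd_configs}

# render_targets PORT -> file_sd YAML with one target group per node
render_targets() {
    local port="$1" ip
    for ip in "${NODES[@]}"; do
        printf -- "- targets: ['%s:%s']\n  labels:\n    instance: %s\n\n" "$ip" "$port" "$ip"
    done
}

# write_sd_files -> writes etcd/nodes.yaml and coredns/nodes.yaml under SD_DIR
write_sd_files() {
    mkdir -p "${SD_DIR}/etcd" "${SD_DIR}/coredns"
    # Write to *.yaml.tmp first (not matched by the *.yaml glob), then rename,
    # so Prometheus never reads a half-written file
    render_targets 2379 > "${SD_DIR}/etcd/nodes.yaml.tmp" &&
        mv "${SD_DIR}/etcd/nodes.yaml.tmp" "${SD_DIR}/etcd/nodes.yaml"
    render_targets 19097 > "${SD_DIR}/coredns/nodes.yaml.tmp" &&
        mv "${SD_DIR}/coredns/nodes.yaml.tmp" "${SD_DIR}/coredns/nodes.yaml"
}
```

Prometheus re-reads the files on its own (refresh_interval is 10s above), so no reload is needed after running write_sd_files.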

Finally, find a suitable dashboard on the Grafana website and import it.
