Deploying a k8s cluster with kubeasz

Published 2023-05-31 15:20:19  Author: ZANAN

Official repo: https://github.com/easzlab/kubeasz

Architecture diagram

Passwordless SSH login

ssh-keygen 
ssh-copy-id 172.16.251.4
ssh-copy-id 172.16.251.5
ssh-copy-id 172.16.251.6
ssh-copy-id 172.16.251.7
ssh-copy-id 172.16.251.8
ssh-copy-id 172.16.251.9
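
The same step can be scripted; a minimal sketch, assuming the key pair generated by ssh-keygen above and root access to all six nodes:

# push the deploy node's public key to every cluster node
for host in 172.16.251.4 172.16.251.5 172.16.251.6 172.16.251.7 172.16.251.8 172.16.251.9; do
  ssh-copy-id "root@${host}"
done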

Download the ezdown script

root@k8s-deploy:~# export release=3.3.1
root@k8s-deploy:~# wget https://github.com/easzlab/kubeasz/releases/download/${release}/ezdown

root@k8s-deploy:~# chmod +x ./ezdown


root@k8s-deploy:~# cat ezdown
#!/bin/bash
#--------------------------------------------------
# This script is used for:
# 1. to download the scripts/binaries/images needed for installing a k8s cluster with kubeasz
# 2. to run kubeasz in a container (recommended way to run 'ezctl')
# @author:   gjmzj
# @usage:    ./ezdown
# @repo:     https://github.com/easzlab/kubeasz
# @ref:      https://github.com/kubeasz/dockerfiles
#--------------------------------------------------
set -o nounset
set -o errexit
#set -o xtrace

# default settings, can be overridden by cmd line options, see usage
DOCKER_VER=20.10.16
KUBEASZ_VER=3.3.1
K8S_BIN_VER=v1.24.2
EXT_BIN_VER=1.2.0
SYS_PKG_VER=0.4.3
HARBOR_VER=v2.1.3
REGISTRY_MIRROR=CN

# images downloaded by default(with '-D')
calicoVer=v3.19.4
dnsNodeCacheVer=1.21.1
corednsVer=1.9.3
dashboardVer=v2.5.1
dashboardMetricsScraperVer=v1.0.8
metricsVer=v0.5.2
pauseVer=3.7

# images not downloaded by default(only download  with '-X')
ciliumVer=1.11.6
flannelVer=v0.15.1
nfsProvisionerVer=v4.0.2
promChartVer=35.5.1

# images not downloaded
kubeRouterVer=v0.3.1
kubeOvnVer=v1.5.3

Download the kubeasz code, binaries, and default container images (run ./ezdown to see more of its options)

# inside mainland China
./ezdown -D
# outside mainland China
#./ezdown -D -m standard
[Optional] Download extra container images (cilium, flannel, prometheus, etc.)

./ezdown -X
[Optional] Download offline system packages (for hosts that cannot reach a yum/apt repository)

./ezdown -P
After the scripts above have run successfully, all files (the kubeasz code, binaries, and offline images) are laid out under the directory /etc/kubeasz
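
A quick sanity check, assuming the default ezdown layout (the exact contents depend on which ezdown options were used):

# inspect the kubeasz workspace and the images pulled by ezdown
ls /etc/kubeasz/
ls /etc/kubeasz/down/
docker images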

Create a new cluster k8s-cluster1

./ezctl new k8s-cluster1

2021-01-19 10:48:23 DEBUG generate custom cluster files in /etc/kubeasz/clusters/k8s-01
2021-01-19 10:48:23 DEBUG set version of common plugins
2021-01-19 10:48:23 DEBUG cluster k8s-01: files successfully created.
2021-01-19 10:48:23 INFO next steps 1: to config '/etc/kubeasz/clusters/k8s-01/hosts'
2021-01-19 10:48:23 INFO next steps 2: to config '/etc/kubeasz/clusters/k8s-01/config.yml'
Then, as prompted, configure '/etc/kubeasz/clusters/k8s-01/hosts' and '/etc/kubeasz/clusters/k8s-01/config.yml': edit the hosts file and the cluster's key options according to the node plan from earlier; the remaining cluster component settings can be changed in config.yml.

The hosts file explained

root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster1# cat hosts
# 'etcd' cluster should have odd member(s) (1,3,5,...)
[etcd]
172.16.251.5
172.16.251.6
#172.16.251.7

# master node(s)
[kube_master]
172.16.251.5
172.16.251.6

# work node(s)
[kube_node]
172.16.251.8


# [optional] harbor server, a private docker registry
# 'NEW_INSTALL': 'true' to install a harbor server; 'false' to integrate with existed one
[harbor]
172.16.251.4 NEW_INSTALL=true

# [optional] loadbalance for accessing k8s from outside
[ex_lb]
172.16.251.8 LB_ROLE=backup EX_APISERVER_VIP=172.16.251.10 EX_APISERVER_PORT=8443
172.16.251.9 LB_ROLE=master EX_APISERVER_VIP=172.16.251.10 EX_APISERVER_PORT=8443

# [optional] ntp server for the cluster
[chrony]
172.16.251.5

[all:vars]
# --------- Main Variables ---------------
# Secure port for apiservers
SECURE_PORT="6443"

# Cluster container-runtime supported: docker, containerd
# if k8s version >= 1.24, docker is not supported
CONTAINER_RUNTIME="containerd" # container runtime (CRI) in use

# Network plugins supported: calico, flannel, kube-router, cilium, kube-ovn
CLUSTER_NETWORK="calico" # network plugin used by the cluster

# Service proxy mode of kube-proxy: 'iptables' or 'ipvs'
PROXY_MODE="ipvs" # kube-proxy mode

# K8S Service CIDR, not overlap with node(host) networking
SERVICE_CIDR="10.68.0.0/16" # Service CIDR

# Cluster CIDR (Pod CIDR), not overlap with node(host) networking
CLUSTER_CIDR="172.20.0.0/16"  # Pod CIDR

# NodePort Range
NODE_PORT_RANGE="30000-32767" # port range exposed for NodePort services

# Cluster DNS Domain
CLUSTER_DNS_DOMAIN="cluster.local" # cluster DNS domain suffix

# -------- Additional Variables (don't change the default value right now) ---
# Binaries Directory
bin_dir="/usr/bin" # directory for binaries

# Deploy Directory (kubeasz workspace)
base_dir="/etc/kubeasz"

# Directory for a specific cluster
cluster_dir="{{ base_dir }}/clusters/k8s-cluster1"

# CA and other components cert/key Directory
ca_dir="/etc/kubernetes/ssl" # directory for certificates

config.yml

root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster1# cat config.yml
############################
# prepare
############################
# optional: install system packages offline (offline|online)
INSTALL_SOURCE: "online" 

# optional: OS security hardening, see github.com/dev-sec/ansible-collection-hardening
OS_HARDEN: false

# it configures:

# remove unused yum repositories and enable GPG key checking
# remove packages with known issues
# configure pam for strong password checking
# install and configure auditd
# disable core dumps via soft limits
# set a restrictive umask
# configure executable permissions for files in system paths
# harden access to the shadow and passwd files
# disable unused filesystems
# disable rhosts
# configure secure ttys
# configure kernel parameters via sysctl
# enable selinux on EL-based systems
# remove unneeded SUID and SGID bits
# configure login and password settings for system accounts

############################
# role:deploy
############################
# default: ca will expire in 100 years
# default: certs issued by the ca will expire in 50 years
CA_EXPIRY: "876000h"
CERT_EXPIRY: "438000h"

# kubeconfig settings: cluster name and context
CLUSTER_NAME: "cluster1"
CONTEXT_NAME: "context-{{ CLUSTER_NAME }}"

# k8s version
K8S_VER: "1.24.2"

############################
# role:etcd
############################
# a separate wal directory avoids disk I/O contention and improves performance
ETCD_DATA_DIR: "/var/lib/etcd"
ETCD_WAL_DIR: ""


############################
# role:runtime [containerd,docker]
############################
# ------------------------------------------- containerd
# [.] enable container registry mirrors
ENABLE_MIRROR_REGISTRY: true

# [containerd] sandbox (pause) base image
SANDBOX_IMAGE: "easzlab.io.local:5000/easzlab/pause:3.7"

# [containerd] container persistent storage directory
CONTAINERD_STORAGE_DIR: "/var/lib/containerd"

# ------------------------------------------- docker
# [docker] container storage directory
DOCKER_STORAGE_DIR: "/var/lib/docker"

# [docker] enable the remote RESTful API
ENABLE_REMOTE_API: false

# [docker] trusted insecure (HTTP) registries
INSECURE_REG: '["http://easzlab.io.local:5000"]'


############################
# role:kube-master
############################
# cert SANs for the k8s master nodes; extra IPs and domain names may be added (e.g. a public IP and domain)
MASTER_CERT_HOSTS:
  - "172.16.251.10"
  - "172.16.251.5"
  - "172.16.251.6"
  - "172.16.251.7"
  - "k8s.easzlab.io"
  #- "www.test.com"

# pod subnet mask length on each node (determines the maximum number of pod IPs per node)
# if flannel runs with --kube-subnet-mgr, it reads this value to assign a pod subnet to each node
# https://github.com/coreos/flannel/issues/847
NODE_CIDR_LEN: 24


############################
# role:kube-node
############################
# kubelet root directory
KUBELET_ROOT_DIR: "/var/lib/kubelet"

# maximum number of pods per node
MAX_PODS: 110

# resources reserved for kube components (kubelet, kube-proxy, dockerd, etc.)
# see templates/kubelet-config.yaml.j2 for the values
KUBE_RESERVED_ENABLED: "no"

# upstream k8s advises against enabling system-reserved casually, unless long-term monitoring has shown you the system's resource usage;
# the reservation also needs to grow as the system keeps running, see templates/kubelet-config.yaml.j2 for the values
# the system reservation assumes a 4c/8g VM with a minimal set of system services; increase it on high-performance physical machines
# note that apiserver and friends briefly use a lot of resources during cluster installation, so reserving at least 1g of memory is recommended
SYS_RESERVED_ENABLED: "no"


############################
# role:network [flannel,calico,cilium,kube-ovn,kube-router]
############################
# ------------------------------------------- flannel
# [flannel] flannel backend: "host-gw", "vxlan", etc.
FLANNEL_BACKEND: "vxlan"
DIRECT_ROUTING: false

# [flannel] flanneld_image: "quay.io/coreos/flannel:v0.10.0-amd64"
flannelVer: "v0.15.1"
flanneld_image: "easzlab.io.local:5000/easzlab/flannel:{{ flannelVer }}"

# ------------------------------------------- calico
# [calico] setting CALICO_IPV4POOL_IPIP: "off" improves network performance; see docs/setup/calico.md for the constraints
CALICO_IPV4POOL_IPIP: "Always"

# [calico] host IP used by calico-node; BGP peering is established over this address; set it manually or let it be auto-detected
IP_AUTODETECTION_METHOD: "can-reach={{ groups['kube_master'][0] }}"

# [calico] calico networking backend: brid, vxlan, none
CALICO_NETWORKING_BACKEND: "brid"

# [calico] whether calico uses route reflectors
# recommended when the cluster has more than 50 nodes
CALICO_RR_ENABLED: false

# CALICO_RR_NODES sets the route reflector nodes; defaults to the cluster master nodes when unset
# CALICO_RR_NODES: ["192.168.1.1", "192.168.1.2"]
CALICO_RR_NODES: []

# [calico] supported calico versions: [v3.3.x] [v3.4.x] [v3.8.x] [v3.15.x]
calico_ver: "v3.19.4"

# [calico] calico major.minor version
calico_ver_main: "{{ calico_ver.split('.')[0] }}.{{ calico_ver.split('.')[1] }}"

# ------------------------------------------- cilium
# [cilium] image version
cilium_ver: "1.11.6"
cilium_connectivity_check: true
cilium_hubble_enabled: false
cilium_hubble_ui_enabled: false

# ------------------------------------------- kube-ovn
# [kube-ovn] node for the OVN DB and OVN Control Plane; defaults to the first master node
OVN_DB_NODE: "{{ groups['kube_master'][0] }}"

# [kube-ovn] offline image tarball version
kube_ovn_ver: "v1.5.3"

# ------------------------------------------- kube-router
# [kube-router] public clouds impose restrictions and usually need ipinip kept on; in your own environment this can be set to "subnet"
OVERLAY_TYPE: "full"

# [kube-router] NetworkPolicy support switch
FIREWALL_ENABLE: true

# [kube-router] kube-router image version
kube_router_ver: "v0.3.1"
busybox_ver: "1.28.4"


############################
# role:cluster-addon
############################
# install coredns automatically
dns_install: "yes"
corednsVer: "1.9.3"
ENABLE_LOCAL_DNS_CACHE: true
dnsNodeCacheVer: "1.21.1"
# local dns cache address
LOCAL_DNS_CACHE: "169.254.20.10"

# install metrics server automatically
metricsserver_install: "yes"
metricsVer: "v0.5.2"

# install dashboard automatically
dashboard_install: "yes"
dashboardVer: "v2.5.1"
dashboardMetricsScraperVer: "v1.0.8"

# install prometheus automatically
prom_install: "no"
prom_namespace: "monitor"
prom_chart_ver: "35.5.1"

# install nfs-provisioner automatically
nfs_provisioner_install: "no"
nfs_provisioner_namespace: "kube-system"
nfs_provisioner_ver: "v4.0.2"
nfs_storage_class: "managed-nfs-storage"
nfs_server: "192.168.1.10"
nfs_path: "/data/nfs"

# install network-check automatically
network_check_enabled: false
network_check_schedule: "*/5 * * * *"

############################
# role:harbor
############################
# harbor version (full version string)
HARBOR_VER: "v2.1.3"
HARBOR_DOMAIN: "harbor.easzlab.io.local"
HARBOR_TLS_PORT: 8443

# if set 'false', you need to put certs named harbor.pem and harbor-key.pem in directory 'down'
HARBOR_SELF_SIGNED_CERT: true

# install extra component
HARBOR_WITH_NOTARY: false
HARBOR_WITH_TRIVY: false
HARBOR_WITH_CLAIR: false
HARBOR_WITH_CHARTMUSEUM: true

ezctl

root@k8s-deploy:/etc/kubeasz# ./ezctl
Usage: ezctl COMMAND [args]
-------------------------------------------------------------------------------------
Cluster setups:
    list                 to list all of the managed clusters
    checkout    <cluster>            to switch default kubeconfig of the cluster
    new         <cluster>            to start a new k8s deploy with name 'cluster'
    setup       <cluster>  <step>    to setup a cluster, also supporting a step-by-step way
    start       <cluster>            to start all of the k8s services stopped by 'ezctl stop'
    stop        <cluster>            to stop all of the k8s services temporarily
    upgrade     <cluster>            to upgrade the k8s cluster
    destroy     <cluster>            to destroy the k8s cluster
    backup      <cluster>            to backup the cluster state (etcd snapshot)
    restore     <cluster>            to restore the cluster state from backups
    start-aio                 to quickly setup an all-in-one cluster with 'default' settings

Cluster ops:
    add-etcd    <cluster>  <ip>      to add a etcd-node to the etcd cluster
    add-master  <cluster>  <ip>      to add a master node to the k8s cluster
    add-node    <cluster>  <ip>      to add a work node to the k8s cluster
    del-etcd    <cluster>  <ip>      to delete a etcd-node from the etcd cluster
    del-master  <cluster>  <ip>      to delete a master node from the k8s cluster
    del-node    <cluster>  <ip>      to delete a work node from the k8s cluster

Extra operation:
    kcfg-adm    <cluster>  <args>    to manage client kubeconfig of the k8s cluster

Use "ezctl help <command>" for more information about a given command.
Command group 1: cluster setup operations

list: show all currently managed clusters
checkout: switch the default cluster (kubeconfig)
new: create a new cluster configuration
setup: install a new cluster
start: start a temporarily stopped cluster
stop: temporarily stop a cluster (including the pods running in it)
upgrade: upgrade the cluster's k8s component versions
destroy: delete the cluster
backup: back up the cluster (etcd data only; PV data and application data are not included)
restore: restore the cluster from a backup
start-aio: create an all-in-one single-node cluster (similar to minikube)
Command group 2: cluster node operations

add-etcd: add an etcd node
add-master: add a master node
add-node: add a worker node
del-etcd: delete an etcd node
del-master: delete a master node
del-node: delete a worker node
Command group 3: extra operations

kcfg-adm: manage client kubeconfigs

setup

./ezctl setup k8s-cluster1 01
# prepare the cluster nodes: kernel tuning, system hardening, issuing certificates (/etc/kubeasz/clusters/k8s-cluster1/ssl), generating config files (/etc/kubeasz/clusters/k8s-cluster1/)
./ezctl setup k8s-cluster1 02
# set up the etcd nodes
./ezctl setup k8s-cluster1 03
# install the container runtime
./ezctl setup k8s-cluster1 04
# set up the master nodes
#./ezctl setup k8s-cluster1 all   # runs all steps at once
....

available steps:
    01  prepare            to prepare CA/certs & kubeconfig & other system settings
    02  etcd               to setup the etcd cluster
    03  container-runtime  to setup the container runtime(docker or containerd)
    04  kube-master        to setup the master nodes
    05  kube-node          to setup the worker nodes
    06  network            to setup the network plugin
    07  cluster-addon      to setup other useful plugins
    90  all                to run 01~07 all at once
    10  ex-lb              to install external loadbalance for accessing k8s from outside
    11  harbor             to install a new harbor server or to integrate with an existed one
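
The remaining steps can be run the same way to finish the deployment; a sketch following the step list above:

./ezctl setup k8s-cluster1 05   # set up the worker nodes
./ezctl setup k8s-cluster1 06   # set up the network plugin (calico, per the hosts file)
./ezctl setup k8s-cluster1 07   # set up the cluster add-ons (coredns, metrics-server, dashboard, ...)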

Verify the etcd cluster status

systemctl status etcd    # check the service status
journalctl -u etcd       # view the service logs

Run the following on any etcd cluster node:

# set the shell variable $NODE_IPS according to the [etcd] group in the hosts file
export NODE_IPS="172.16.251.5 172.16.251.6"
for ip in ${NODE_IPS}; do
  ETCDCTL_API=3 etcdctl \
  --endpoints=https://${ip}:2379  \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/etcd.pem \
  --key=/etc/kubernetes/ssl/etcd-key.pem \
  endpoint health; done

Expected result:

https://192.168.1.1:2379 is healthy: successfully committed proposal: took = 2.210885ms
https://192.168.1.2:2379 is healthy: successfully committed proposal: took = 2.784043ms
https://192.168.1.3:2379 is healthy: successfully committed proposal: took = 3.275709ms
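
Beyond endpoint health, the cluster membership can be listed with the same certificates; a sketch run from any etcd node:

# list the etcd cluster members
ETCDCTL_API=3 etcdctl \
  --endpoints=https://172.16.251.5:2379 \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/etcd.pem \
  --key=/etc/kubernetes/ssl/etcd-key.pem \
  member list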

Configure docker/containerd to trust the harbor certificate

Because our harbor registry uses a self-signed certificate, docker/containerd clients must be configured to trust the harbor certificate before pulling images from the self-hosted harbor registry; otherwise errors like the following occur:

docker

$ docker pull harbor.test.lo/pub/hello:v0.1.4

Error response from daemon: Get https://harbor.test.lo/v1/_ping: x509: certificate signed by unknown authority

containerd

$ crictl pull harbor.test.lo/pub/hello:v0.1.4
FATA[0000] pulling image failed: rpc error: code = Unknown desc = failed to resolve image "harbor.test.lo/pub/hello:v0.1.4": no available registry endpoint: failed to do request: Head https://harbor.test.lo/v2/pub/hello/manifests/v0.1.4: x509: certificate signed by unknown authority

The project playbook 11.harbor.yml already configures docker/containerd on every node of the k8s cluster to trust the self-signed harbor certificate; if you cannot run that playbook, use the manual configuration below (with a properly trusted certificate, i.e. SELF_SIGNED_CERT=no, this can be skipped).

Configure trust for the harbor certificate

Perform the following on every node in the cluster (a scripted sketch follows the list):

  • Create the directory /etc/docker/certs.d/harbor.test.lo/ (harbor.test.lo is your harbor domain)
  • Copy the CA certificate from the harbor installation into that directory and rename it to ca.crt
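
A scripted version of the two steps above, as a sketch (harbor.test.lo, the harbor host name, and the certificate path are placeholders to adapt):

# run on each cluster node
mkdir -p /etc/docker/certs.d/harbor.test.lo/
# fetch the harbor CA certificate and store it as ca.crt
scp root@harbor:/usr/local/src/harbor/certs/ca.crt /etc/docker/certs.d/harbor.test.lo/ca.crt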

docker client

On the docker client, the harbor certificate must be synced over, and the directory /etc/docker/certs.d/<yourdomain.com> must be created.

Sync the harbor certificate

harbor# scp /usr/local/src/harbor/certs/myharbor.crt  DockerClientHost:/etc/docker/certs.d/yourdomain.com

containerd client

Edit the configuration file:

/etc/containerd/config.toml

 [plugins."io.containerd.grpc.v1.cri".registry.mirrors]   # add the following below this line
 [plugins."io.containerd.grpc.v1.cri".registry.mirrors."myharbor.com"]   # the harbor server is myharbor.com
  endpoint = ["https://myharbor.com"]
 [plugins."io.containerd.grpc.v1.cri".registry.configs."myharbor.com".tls]   # skip certificate verification
  insecure_skip_verify = true
 [plugins."io.containerd.grpc.v1.cri".registry.configs."myharbor.com".auth]   # log in with username and password
  username = "admin"
  password = "Harbor12345"
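
containerd only reads config.toml at startup, so restart it and retry the pull (the image path is illustrative, adjusted to the myharbor.com registry used above):

systemctl restart containerd
crictl pull myharbor.com/pub/hello:v0.1.4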