K8s Installation (Verifying That a Single-Node Deployment Works)

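This article draws on "How to Install a Kubernetes Cluster on Debian 11 with Kubeadm", "Installing Single-Node Kubernetes with kubeadm", "Using the Aliyun Mirror to Fix Image Pull Failures When Installing k8s", "Fixing crictl's Inability to Tag Images", "k8s: Installing from Mirrors in China", and "Installing Docker".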

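Introduction

Kubernetes (k8s for short) is a container cluster management system that Google open-sourced in June 2014. Written in Go, it manages containerized applications running across multiple hosts on a cloud platform. Its goal is to make deploying containerized applications simple and efficient, and it provides a complete feature set covering resource scheduling, deployment management, service discovery, scaling, monitoring, and maintenance. It aims to be a platform for automatically deploying, scaling, and running application containers across clusters of hosts, and it supports a range of container tools, including Docker.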

Prerequisites

Debian 12 installed
2 CPUs / vCPUs
2 GB RAM
20 GB of free disk space
A user with administrator privileges
A stable network connection

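This guide focuses on a single-node installation; the cluster steps are based only on online tutorials and were not tested here. Steps not marked (cluster) are needed for both single-node and cluster installs; steps marked (cluster) apply to clusters only.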

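For the cluster steps, the example uses three nodes running Debian 11, configured as follows (from "Installing Single-Node Kubernetes with kubeadm"):

Control-plane node (k8s-master): 192.168.1.236
Worker node 1 (k8s-worker1): 192.168.1.237
Worker node 2 (k8s-worker2): 192.168.1.238
Without further ado, let's get to the installation steps.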

Installation

Set the hostname and update /etc/hosts (cluster)

On the control-plane and worker nodes, set the hostname with hostnamectl:

hostnamectl set-hostname "k8s-master"       # run on the control-plane node
hostnamectl set-hostname "k8s-worker1"      # run on worker node 1
hostnamectl set-hostname "k8s-worker2"      # run on worker node 2

On every node, append the following lines to the end of /etc/hosts:

192.168.1.236       k8s-master
192.168.1.237       k8s-worker1
192.168.1.238       k8s-worker2
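If these steps are re-run, appending by hand can leave duplicate entries. A minimal sketch that appends the three example entries only when a hostname is not already mapped; `add_hosts_entries` is a hypothetical helper name, and the file path is a parameter so the logic can be tried on a copy before touching /etc/hosts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the example cluster entries to a hosts file,
# skipping any hostname that already appears on a non-comment line.
add_hosts_entries() {
  local hosts_file="$1"   # normally /etc/hosts
  local entry ip name
  for entry in \
    "192.168.1.236 k8s-master" \
    "192.168.1.237 k8s-worker1" \
    "192.168.1.238 k8s-worker2"; do
    ip="${entry%% *}"     # text before the first space
    name="${entry##* }"   # text after the last space
    # Exact field comparison, so k8s-worker1 does not match k8s-worker10.
    if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                         END { exit !found }' "$hosts_file"; then
      continue   # already mapped; leave the existing line alone
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
  done
}
```

Run it as root on each node with `add_hosts_entries /etc/hosts`; running it a second time changes nothing.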

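Disable swap on all nodes

By default, kubelet refuses to start while swap is enabled, so disable it (leaving it on tends to cause problems). Run the following on all nodes; the sed command comments out the swap entries in /etc/fstab so swap stays off after a reboot: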

swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Check swap: the Swap row should show 0
free -m
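The sed one-liner above only matches ` swap ` with a space on each side, so it could miss entries whose columns are separated by tabs. A hedged alternative sketch, assuming a standard fstab layout where the third column is the filesystem type; `disable_swap_in_fstab` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out active swap entries in an fstab-style file.
# Matches "swap" as the 3rd (filesystem type) field whether columns are
# separated by spaces or tabs, and skips lines that are already commented,
# so running it twice is harmless.
disable_swap_in_fstab() {
  local fstab="$1"   # normally /etc/fstab
  local tmp
  tmp="$(mktemp)"
  awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' "$fstab" > "$tmp" \
    && cat "$tmp" > "$fstab"   # cat (not mv) keeps the original file's permissions
  rm -f "$tmp"
}
```

Run it as root with `disable_swap_in_fstab /etc/fstab` after `swapoff -a`.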

Disable the firewall

The blunt option is to stop and disable firewalld entirely, if it is installed:

systemctl stop firewalld
systemctl disable firewalld

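On systems such as Ubuntu Server that use ufw, open the required ports instead (10257 and 10259 are the secure ports of kube-controller-manager and kube-scheduler; the 10251/10252 ports in older guides were removed in recent versions).
On the control-plane node, run:

ufw allow 6443/tcp
ufw allow 2379/tcp
ufw allow 2380/tcp
ufw allow 10250/tcp
ufw allow 10257/tcp
ufw allow 10259/tcp
ufw allow 10255/tcp
ufw reload

On the worker nodes, run: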

ufw allow 10250/tcp
ufw allow 30000:32767/tcp
ufw reload

Note that a minimal Debian installation does not ship with a firewall, so this step can usually be skipped there.
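To keep the port lists in one place, a sketch that prints the ufw commands for a node role instead of running them. The port numbers follow the upstream kubeadm "Ports and Protocols" list as I understand it (the optional 10255 is left out); `k8s_ufw_commands` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the ufw rules for a node role.
k8s_ufw_commands() {
  local role="$1"   # "control-plane" or "worker"
  local ports p
  case "$role" in
    control-plane) ports="6443/tcp 2379:2380/tcp 10250/tcp 10257/tcp 10259/tcp" ;;
    worker)        ports="10250/tcp 30000:32767/tcp" ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  for p in $ports; do
    echo "ufw allow $p"
  done
  echo "ufw reload"
}
```

Review the output first, then apply it as root, for example `k8s_ufw_commands worker | sh`.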

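Install containerd or Docker on all nodes

Installing containerd

containerd is an industry-standard container runtime; every node needs a container runtime.

First, configure the following kernel modules and parameters on all nodes, then install containerd.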

# Load these modules at boot
cat <<EOF | tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

# Load the modules now
modprobe overlay
modprobe br_netfilter

# Kernel parameters for bridged traffic and IP forwarding
cat <<EOF | tee /etc/sysctl.d/99-kubernetes-k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

# Apply the settings
sysctl --system
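After `sysctl --system`, it is worth confirming the values actually took effect, since the net.bridge.* keys only exist once br_netfilter is loaded. A minimal sketch: each key maps to a file under /proc/sys with dots replaced by slashes, and the root directory is a parameter only so it can be exercised against a fake tree. `check_k8s_sysctls` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the three sysctls from the file above are 1.
check_k8s_sysctls() {
  local root="${1:-/proc/sys}"
  local key path value rc=0
  for key in net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward; do
    path="$root/${key//.//}"   # net.ipv4.ip_forward -> net/ipv4/ip_forward
    if [ ! -r "$path" ]; then
      echo "MISSING $key"      # net.bridge.* is missing until br_netfilter is loaded
      rc=1
      continue
    fi
    value="$(cat "$path")"
    if [ "$value" = "1" ]; then
      echo "OK      $key"
    else
      echo "WRONG   $key = $value"
      rc=1
    fi
  done
  return "$rc"
}
```

With no argument, `check_k8s_sysctls` checks the live /proc/sys and exits non-zero if anything is missing or not set to 1.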

Now install containerd on all nodes with apt:

apt update
apt -y install containerd

Generate the default containerd configuration on all nodes:

mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml >/dev/null

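On all nodes, set the cgroup driver to systemd: edit /etc/containerd/config.toml, find the [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] section, and change SystemdCgroup = false to SystemdCgroup = true.
(For background on this setting, see the "Container Runtimes" page of the Kubernetes documentation.)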

vi /etc/containerd/config.toml

Save and exit the file.
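The same change can be made non-interactively with sed, which is handy when configuring several nodes. This sketch assumes the file came from `containerd config default`, where the runc options contain a single `SystemdCgroup = false` line; `set_containerd_systemd_cgroup` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: flip SystemdCgroup to true in a containerd config.toml
# (GNU sed syntax, as on Debian), keeping the original indentation.
set_containerd_systemd_cgroup() {
  local cfg="${1:-/etc/containerd/config.toml}"
  sed -i -E 's/^([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*)false/\1true/' "$cfg"
  # Return non-zero if the setting is still not enabled afterwards.
  grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$cfg"
}
```

Run it as root, then restart containerd as shown next.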

Restart and enable the containerd service on all nodes:

systemctl restart containerd
systemctl enable containerd

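A small bug

At this point, crictl info fails because crictl is not yet pointed at containerd's socket: without a configured endpoint it falls back to a list of default endpoints, some of which (such as dockershim) no longer exist. Set the endpoint explicitly: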

crictl config runtime-endpoint unix:///var/run/containerd/containerd.sock
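`crictl config` stores its settings in /etc/crictl.yaml, so the same result can be had by writing that file directly and setting the image endpoint at the same time. A sketch; the socket path matches the command above, and `write_crictl_config` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a crictl.yaml that points both the runtime and
# the image service at containerd's socket.
write_crictl_config() {
  local out="${1:-/etc/crictl.yaml}"
  local sock="unix:///var/run/containerd/containerd.sock"
  cat > "$out" <<EOF
runtime-endpoint: $sock
image-endpoint: $sock
EOF
}
```

Run `write_crictl_config` as root, then `crictl info` should connect without warnings about default endpoints.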

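Installing Docker

Note: Kubernetes 1.24 removed dockershim, so the kubelet cannot use Docker Engine directly. With the Docker packages below, the kubelet can still use the containerd.io runtime they install (configured as in the containerd section; the config.toml shipped with containerd.io disables the CRI plugin, so regenerate it with containerd config default), or Docker itself through the separate cri-dockerd adapter.

Install Docker automatically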

curl -fsSL https://get.docker.com -o get-docker.sh
DOWNLOAD_URL=https://mirrors.ustc.edu.cn/docker-ce sh get-docker.sh

Install Docker manually

# Remove old versions
apt-get remove docker docker-engine docker.io

# Install prerequisites
apt update
apt install -y apt-transport-https ca-certificates curl gnupg lsb-release

# Add the GPG key
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/debian/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# Add the repository
## Mirror in mainland China (Aliyun)
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://mirrors.aliyun.com/docker-ce/linux/debian $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null


## Official repository
#  echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install `Docker`
apt update
apt install -y docker-ce docker-ce-cli containerd.io

# Install `Docker-compose` (optional, not required)
curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

chmod +x /usr/local/bin/docker-compose

# Start `Docker`
systemctl enable docker
systemctl start docker

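On cgroup v1 hosts, Docker's default cgroup driver is cgroupfs, and it should be changed to systemd to match the kubelet (on cgroup v2 hosts, the Debian 12 default, recent Docker releases already use systemd).
Check it with docker info: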

docker info | grep 'Cgroup Driver'

If it shows cgroupfs, change it by creating /etc/docker/daemon.json with the following content:

# Edit
nano /etc/docker/daemon.json

# Add the following
{
   "exec-opts": ["native.cgroupdriver=systemd"]
}

# Restart `Docker`
systemctl restart docker
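A non-interactive version of the daemon.json step, as a sketch. It overwrites any existing daemon.json, so merge by hand if you already have one; `write_docker_daemon_json` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a daemon.json that selects the systemd cgroup driver.
# Overwrites the target file.
write_docker_daemon_json() {
  local out="${1:-/etc/docker/daemon.json}"
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
}
```

After `systemctl restart docker`, `docker info --format '{{.CgroupDriver}}'` should print systemd.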

Add the Kubernetes apt repository

Install the dependencies:

apt update
apt install -y apt-transport-https ca-certificates curl gnupg gnupg2 software-properties-common

Add the Kubernetes apt repository with the following commands.
Mirror in mainland China (Aliyun):

# Add and trust the repository signing key (apt-key is deprecated and prints a warning on Debian 12)
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -

# Add the repository
add-apt-repository "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main"

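Official repository

Note: the legacy apt.kubernetes.io / packages.cloud.google.com repository used below was frozen in September 2023 and has since been shut down, so these commands may fail on a new install; the upstream docs now use the community-owned pkgs.k8s.io repository.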

# Add and trust the repository signing key
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | gpg --dearmour -o /etc/apt/trusted.gpg.d/cgoogle.gpg

# Add the repository
apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
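The legacy apt.kubernetes.io repository is frozen, and the upstream install docs now use the community-owned pkgs.k8s.io repository. A sketch of that setup: each Kubernetes minor version has its own repository, `K8S_MINOR=v1.28` is an assumption chosen to match the versions in this article, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Assumption: v1.28 matches the versions used in this article; override via env.
K8S_MINOR="${K8S_MINOR:-v1.28}"
KEYRING=/etc/apt/keyrings/kubernetes-apt-keyring.gpg

# Hypothetical helper: the apt source line for the chosen minor version.
k8s_apt_source_line() {
  echo "deb [signed-by=$KEYRING] https://pkgs.k8s.io/core:/stable:/$K8S_MINOR/deb/ /"
}

# Hypothetical helper: fetch the signing key and register the repository (run as root).
add_k8s_apt_repo() {
  mkdir -p /etc/apt/keyrings
  curl -fsSL "https://pkgs.k8s.io/core:/stable:/$K8S_MINOR/deb/Release.key" \
    | gpg --dearmor -o "$KEYRING"
  k8s_apt_source_line > /etc/apt/sources.list.d/kubernetes.list
  apt-get update
}
```

As root, source the snippet and call `add_k8s_apt_repo`, then continue with the apt install step below. Because the repository is per minor version, upgrading to a new minor release means changing `K8S_MINOR` and re-running it.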

Install kubelet, kubectl, and kubeadm on all nodes

On all nodes, install the Kubernetes components kubelet, kubectl, and kubeadm with apt:

apt update
apt install kubelet kubeadm kubectl -y
# Hold the packages so they are not upgraded or removed
apt-mark hold kubelet kubeadm kubectl

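Create the Kubernetes cluster with kubeadm

Single-node installation

Install Kubernetes with kubeadm.
Using the default (overseas) image registry:

kubeadm init

Using a mirror in mainland China (this has a bug; see below):

kubeadm init --image-repository=registry.aliyuncs.com/google_containers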

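When the installation completes, first let kubectl use the cluster's admin credentials (as root: export KUBECONFIG=/etc/kubernetes/admin.conf, as the kubeadm init output suggests), then run

kubectl get nodes

If it returns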

NAME   STATUS     ROLES           AGE   VERSION
k8s    NotReady   control-plane   57m   v1.28.2

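the installation succeeded (STATUS stays NotReady until a Pod network plugin such as Calico is installed, as described below).

If it fails, reset and try again: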

kubeadm reset
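As an alternative to command-line flags, kubeadm can read a configuration file, which keeps the mirror setting in one place for both `kubeadm config images pull` and `kubeadm init`. A sketch, assuming kubeadm 1.28 (which reads the kubeadm.k8s.io/v1beta3 API); the file name and the `write_kubeadm_config` helper are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a kubeadm config that pulls control-plane images
# from the Aliyun mirror. kubernetesVersion is omitted, so kubeadm uses its default.
write_kubeadm_config() {
  local out="${1:-kubeadm-config.yaml}"
  cat > "$out" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/google_containers
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# kubeadm 1.22+ already defaults to systemd; stated here to make it explicit.
cgroupDriver: systemd
EOF
}
```

Usage: `write_kubeadm_config && kubeadm config images pull --config kubeadm-config.yaml && kubeadm init --config kubeadm-config.yaml`.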

Bug with the mainland China mirror

When the mirror is used, kubectl get nodes may report:

E1101 17:40:41.670644    3144 memcache.go:265] couldn't get current server API group list: Get "https://172.25.10.129:6443/api?timeout=32s": dial tcp 172.25.10.129:6443: connect: connection refused

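A likely cause is that the images are not available under the names kubeadm expects, so each image has to be retagged by hand (tedious).
The expected image names are: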

registry.k8s.io/kube-apiserver:v1.28.3
registry.k8s.io/kube-controller-manager:v1.28.3
registry.k8s.io/kube-scheduler:v1.28.3
registry.k8s.io/kube-proxy:v1.28.3
registry.k8s.io/pause:3.9
registry.k8s.io/etcd:3.5.9-0
registry.k8s.io/coredns/coredns:v1.10.1

Retag the images (crictl cannot tag images, so containerd users need ctr with the k8s.io namespace; the targets use registry.k8s.io to match the names listed above, since k8s.gcr.io is the old, frozen registry)

# containerd
ctr -n k8s.io i tag registry.aliyuncs.com/google_containers/coredns:v1.10.1 registry.k8s.io/coredns/coredns:v1.10.1

ctr -n k8s.io i tag registry.aliyuncs.com/google_containers/etcd:3.5.9-0 registry.k8s.io/etcd:3.5.9-0

ctr -n k8s.io i tag registry.aliyuncs.com/google_containers/kube-apiserver:v1.28.3 registry.k8s.io/kube-apiserver:v1.28.3

ctr -n k8s.io i tag registry.aliyuncs.com/google_containers/kube-controller-manager:v1.28.3 registry.k8s.io/kube-controller-manager:v1.28.3

ctr -n k8s.io i tag registry.aliyuncs.com/google_containers/kube-proxy:v1.28.3 registry.k8s.io/kube-proxy:v1.28.3

ctr -n k8s.io i tag registry.aliyuncs.com/google_containers/kube-scheduler:v1.28.3 registry.k8s.io/kube-scheduler:v1.28.3

ctr -n k8s.io i tag registry.aliyuncs.com/google_containers/pause:3.9 registry.k8s.io/pause:3.9


# docker
docker tag registry.aliyuncs.com/google_containers/coredns:v1.10.1 registry.k8s.io/coredns/coredns:v1.10.1

docker tag registry.aliyuncs.com/google_containers/etcd:3.5.9-0 registry.k8s.io/etcd:3.5.9-0

docker tag registry.aliyuncs.com/google_containers/kube-apiserver:v1.28.3 registry.k8s.io/kube-apiserver:v1.28.3

docker tag registry.aliyuncs.com/google_containers/kube-controller-manager:v1.28.3 registry.k8s.io/kube-controller-manager:v1.28.3

docker tag registry.aliyuncs.com/google_containers/kube-proxy:v1.28.3 registry.k8s.io/kube-proxy:v1.28.3

docker tag registry.aliyuncs.com/google_containers/kube-scheduler:v1.28.3 registry.k8s.io/kube-scheduler:v1.28.3

docker tag registry.aliyuncs.com/google_containers/pause:3.9 registry.k8s.io/pause:3.9
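The retag commands above all follow one pattern, so they can be generated from a single list. A sketch that only prints the commands (pipe the output to sh as root to run them); the image list is the one from this section, and `kubeadm config images list` prints the names your own kubeadm version expects. `retag_commands` is a hypothetical helper:

```shell
#!/usr/bin/env bash
MIRROR=registry.aliyuncs.com/google_containers
IMAGES="kube-apiserver:v1.28.3 kube-controller-manager:v1.28.3 kube-scheduler:v1.28.3 kube-proxy:v1.28.3 pause:3.9 etcd:3.5.9-0 coredns:v1.10.1"

# Hypothetical helper: print one retag command per image for ctr or docker.
retag_commands() {
  local tool="${1:-ctr}"   # "ctr" (containerd) or "docker"
  local img target
  for img in $IMAGES; do
    # registry.k8s.io keeps CoreDNS under an extra "coredns/" path segment.
    case "$img" in
      coredns:*) target="registry.k8s.io/coredns/$img" ;;
      *)         target="registry.k8s.io/$img" ;;
    esac
    case "$tool" in
      ctr)    echo "ctr -n k8s.io images tag $MIRROR/$img $target" ;;
      docker) echo "docker tag $MIRROR/$img $target" ;;
      *)      echo "unknown tool: $tool" >&2; return 1 ;;
    esac
  done
}
```

For example, `retag_commands ctr | sh` on a containerd node. Updating `IMAGES` from the output of `kubeadm config images list` keeps the list in step with a different kubeadm version.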

Reset k8s

kubeadm reset

Install k8s again

kubeadm init

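Cluster installation

Now create the Kubernetes cluster by running the following on the control-plane node:

kubeadm init --control-plane-endpoint=k8s-master

To start interacting with the cluster, run the following on the control-plane node: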

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

Run the following kubectl commands to get information about the nodes and the cluster:

kubectl get nodes
kubectl cluster-info

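Join the two worker nodes to the cluster with kubeadm join.
Note: copy the complete command from the output of kubeadm init (the token expires after 24 hours by default). In this example, the command was: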

# Run on each worker node
kubeadm join k8s-master:6443 --token ta622t.enl212euq7z87mgj \
  --discovery-token-ca-cert-hash sha256:2be58f54458d0e788c96b8841f811069019161f9a3dd8502a38c773e5c6ead17

On the control-plane node, check the node status:

kubectl get nodes
NAME          STATUS     ROLES           AGE     VERSION
k8s-master    NotReady   control-plane   23m     v1.25.0
k8s-worker1   NotReady   <none>          9m27s   v1.25.0
k8s-worker2   NotReady   <none>          2m19s   v1.25.0
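With more than a couple of nodes, it helps to filter the NotReady ones out of `kubectl get nodes`. A small sketch that reads that output on stdin and prints every node whose STATUS is not exactly Ready (so cordoned nodes shown as Ready,SchedulingDisabled are listed too); `not_ready_nodes` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print nodes whose STATUS column is not "Ready";
# exits 1 if any were found, 0 otherwise. Skips the header row.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1; bad = 1 } END { exit bad }'
}
```

Usage: `kubectl get nodes | not_ready_nodes`. To block until nodes become Ready instead, `kubectl wait --for=condition=Ready nodes --all --timeout=300s` does this directly.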

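Install the Calico Pod network plugin

A Pod network (CNI) plugin is required: without one, nodes stay NotReady with "network plugin is not ready: cni config uninitialized" (see the Debug section). On the control-plane node, install Calico: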

kubectl apply -f https://projectcalico.docs.tigera.io/manifests/calico.yaml

On all nodes, allow Calico's ports through the firewall (179/tcp for BGP, 4789/udp for VXLAN, 51820-51821/udp for WireGuard):

ufw allow 179/tcp
ufw allow 4789/udp
ufw allow 51820/udp
ufw allow 51821/udp
ufw reload

Check Calico's status:

kubectl get pods -n kube-system

# Output similar to:
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-7ddc4f45bc-bh8g2   1/1     Running   0          12m
calico-node-l4kj2                          1/1     Running   0          12m
coredns-5dd5756b68-6vrfv                   1/1     Running   0          14m
coredns-5dd5756b68-zhhrp                   1/1     Running   0          14m
etcd-k8s                                   1/1     Running   28         14m
kube-apiserver-k8s                         1/1     Running   27         14m
kube-controller-manager-k8s                1/1     Running   18         14m
kube-proxy-sn6wm                           1/1     Running   0          14m
kube-scheduler-k8s                         1/1     Running   19         14m
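Similarly, a sketch that flags Pods in `kubectl get pods -n kube-system` output whose STATUS is not Running or whose READY counts differ (such as 0/1). It assumes the column layout shown above, so it does not work with `-A`, which adds a NAMESPACE column; `unhealthy_pods` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Pods that are not Running or not fully ready;
# exits 1 if any were found. Expects NAME READY STATUS ... columns.
unhealthy_pods() {
  awk 'NR > 1 {
         split($2, r, "/")   # "1/1" -> r[1]=1, r[2]=1
         if ($3 != "Running" || r[1] != r[2]) { print $1; bad = 1 }
       }
       END { exit bad }'
}
```

Usage: `kubectl get pods -n kube-system | unhealthy_pods`; an empty result with exit status 0 means every Pod listed is Running and ready.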

Perfect! Now check the node status again:

kubectl get nodes

# STATUS Ready means success
NAME   STATUS   ROLES           AGE   VERSION
k8s    Ready    control-plane   15m   v1.28.2

Verify the Kubernetes installation

To check that the cluster works, deploy an Nginx-based application as a Deployment and expose it through a NodePort Service. Run:

kubectl create deployment nginx-app --image=nginx --replicas 2
kubectl expose deployment nginx-app --name=nginx-web-svc --type NodePort --port 80 --target-port 80
kubectl describe svc nginx-web-svc

# Output
$ kubectl create deployment nginx-app --image=nginx --replicas 2
deployment.apps/nginx-app created
$ kubectl expose deployment nginx-app --name=nginx-web-svc --type NodePort --port 80 --target-port 80
service/nginx-web-svc exposed
$ kubectl describe svc nginx-web-svc
Name:                     nginx-web-svc
Namespace:                default
Labels:                   app=nginx-app
Annotations:              <none>
Selector:                 app=nginx-app
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.97.255.96
IPs:                      10.97.255.96
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  31228/TCP
Endpoints:                <none>
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

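Access the nginx application with curl through the NodePort shown above (31228 in this output; Kubernetes picks it from the 30000-32767 range unless one is specified). Endpoints: <none> only means no Pod was ready yet when describe ran; if it stays empty, check the Pods (see the taint issue in the Debug section).
Note: in a cluster, the hostname of either worker node works in the curl command; on a single node, use that node's hostname or IP.

curl http://k8s-worker1:31228   # replace 31228 with your NodePort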

# Result
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

This output shows that the nginx application is reachable.
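Rather than copying the NodePort by hand, it can be read from the `kubectl describe svc` output shown earlier. A sketch with a hypothetical `nodeport_from_describe` helper; for scripts, `kubectl get svc nginx-web-svc -o jsonpath='{.spec.ports[0].nodePort}'` returns the same number from structured output and is less fragile:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl describe svc` text on stdin and print the
# first NodePort number ("NodePort:  <unset>  31228/TCP" -> 31228).
nodeport_from_describe() {
  awk '$1 == "NodePort:" { split($NF, p, "/"); print p[1]; exit }'
}
```

Usage: `port=$(kubectl describe svc nginx-web-svc | nodeport_from_describe); curl "http://k8s-worker1:$port"`.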

Common K8s commands

kubectl apply -f <yaml_file> - create or update the resources in a YAML file
kubectl delete -f <yaml_file> - delete the resources defined in a YAML file
kubectl get all - show all resources in the default namespace; --all-namespaces (-A) shows every namespace
kubectl get namespaces - list namespaces
kubectl get svc - list Services with their name, type, cluster IP, ports, and age; -n <namespace> selects a namespace, -A shows all namespaces
kubectl get nodes - list nodes and their status
kubectl get pods / deployment - list Pods / Deployments in the default namespace; -n <namespace> selects a namespace, -A shows all namespaces
kubectl cluster-info - show cluster information
kubectl describe pod / deployment <name> - show Pod / Deployment details; -n <namespace> selects a namespace
kubectl describe nodes <name> - show node details
kubectl exec -it <pod-name> -- <shell> - open an interactive shell in a Pod
kubectl config view - show the kubeconfig settings

Debug

node NotReady

After installation, kubectl get nodes shows the node as NotReady.
kubectl describe nodes shows the output below; the problem is network plugin is not ready: cni config uninitialized

...
Conditions:
 Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
 ----             ------  -----------------                 ------------------                ------                       -------
 MemoryPressure   False   Sat, 04 Sep 2021 10:16:25 +0800   Sat, 04 Sep 2021 09:15:45 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
 DiskPressure     False   Sat, 04 Sep 2021 10:16:25 +0800   Sat, 04 Sep 2021 09:15:45 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
 PIDPressure      False   Sat, 04 Sep 2021 10:16:25 +0800   Sat, 04 Sep 2021 09:15:45 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
 Ready            False   Sat, 04 Sep 2021 10:16:25 +0800   Sat, 04 Sep 2021 09:15:45 +0800   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
...

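Solution: install a Pod network plugin. The fix used here was Weave Net, following Weave's "Integrating Kubernetes via the Addon" guide. Install only one plugin: if Calico from the section above is deployed, troubleshoot it instead of adding Weave. The cloud.weave.works URL that guide originally used is no longer available; the Weave Net manifest is published on GitHub:

kubectl apply -f https://github.com/weaveworks/weave/releases/download/v2.8.1/weave-daemonset-k8s.yaml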

# Check the nodes again
kubectl get nodes
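"cni config uninitialized" means the container runtime found no network configuration in its CNI config directory, /etc/cni/net.d by default; the network plugin's Pod writes a file there once it starts. A sketch to check for one, with the directory as a parameter for testing; `has_cni_config` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the CNI config directory contains at least
# one .conf or .conflist file. Unmatched globs stay literal and fail -e.
has_cni_config() {
  local dir="${1:-/etc/cni/net.d}"
  local f
  for f in "$dir"/*.conf "$dir"/*.conflist; do
    [ -e "$f" ] && return 0
  done
  return 1
}
```

If `has_cni_config || echo missing` still prints missing a few minutes after the plugin was applied, look at the plugin's Pods in kube-system.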

node(s) had taints that the pod didn’t tolerate

This problem appears when creating a Pod:

kubectl describe pod <pod-name>

# The events show
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  31s   default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..

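By default, kubeadm taints the control-plane (master) node so that ordinary Pods are not scheduled on it. On a single node, remove the taint to allow scheduling: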

kubectl taint nodes --all node-role.kubernetes.io/master-

If this command reports not found:

error: taint "node-role.kubernetes.io/master" not found

check the actual taint key with

kubectl get no -o yaml | grep taint -A 5

# Look for the key
    taints:
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
  status:
    addresses:
    - address: 172.25.10.129

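Here the key is node-role.kubernetes.io/control-plane (since Kubernetes 1.25, kubeadm no longer applies the old master taint), so the correct command is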

kubectl taint nodes --all node-role.kubernetes.io/control-plane-

# Output
node/k8s untainted

Check the Pod details again; the Pod is now scheduled and its image is being pulled:

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  2m4s  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled         8s    default-scheduler  Successfully assigned default/nginx-deployment-54bbf55b54-zbp5f to k8s
  Normal   Pulling           8s    kubelet            Pulling image "nginx:1.7.9"