[Kubernetes] Calico CrossSubnet Mode in Practice

Published: 2023-11-09 09:43:42 | Author: 技术颜良

Network environment

Hostname       Host IP address
k8s-master1    192.168.3.241
k8s-master2    192.168.3.242
k8s-master3    192.168.3.243
k8s-node1      192.168.32.105

Note: k8s-node1 is on the 192.168.32.0/24 subnet, while the other three nodes are on 192.168.3.0/24.

Deploying VXLAN CrossSubnet

Edit calico.yaml: set CALICO_IPV4POOL_IPIP to Never and CALICO_IPV4POOL_VXLAN to CrossSubnet.

- name: CALICO_IPV4POOL_IPIP
  value: "Never"
- name: CALICO_IPV4POOL_VXLAN
  value: "CrossSubnet"

Run kubectl apply -f calico.yaml, then check the resulting IP pool.

root@k8s-master1:~# kubectl describe ippool default-ipv4-ippool
...
...
Spec:
  Allowed Uses:
    Workload
    Tunnel
  Block Size:     26
  Cidr:           172.16.0.0/16
  Ipip Mode:      Never
  Nat Outgoing:   true
  Node Selector:  all()
  Vxlan Mode:     CrossSubnet # CrossSubnet mode is enabled
Events:           <none>
root@k8s-master1:~#
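
The Cidr and Block Size values above determine how Pod addresses are carved into per-node blocks, which is why the routing tables further down contain /26 (255.255.255.192) routes. The following is a minimal Python sketch of that arithmetic, not Calico's own allocation code; the function name block_for is hypothetical, and the Pod address used is the one that appears in the packet captures later in this article.

```python
import ipaddress

POOL = ipaddress.ip_network("172.16.0.0/16")  # "Cidr" from the IPPool output
BLOCK_SIZE = 26                                # "Block Size" from the IPPool output


def block_for(pod_ip: str) -> ipaddress.IPv4Network:
    """Return the /26 block of the pool that contains pod_ip."""
    addr = ipaddress.ip_address(pod_ip)
    if addr not in POOL:
        raise ValueError(f"{pod_ip} is outside the pool {POOL}")
    # strict=False masks off the host bits instead of raising an error.
    return ipaddress.ip_network(f"{pod_ip}/{BLOCK_SIZE}", strict=False)


# How many blocks fit in the pool, and how many addresses each block holds.
blocks_in_pool = 2 ** (BLOCK_SIZE - POOL.prefixlen)
addresses_per_block = 2 ** (32 - BLOCK_SIZE)

print(block_for("172.16.159.145"))  # Pod-1 on k8s-master1
print(blocks_in_pool, addresses_per_block)
```

Under these assumptions a /16 pool with /26 blocks yields 1024 blocks of 64 addresses each, and Pod-1 falls into the block that k8s-master1 owns in the routing tables.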

Once calico-node is running, a vxlan.calico interface is created on every node.

Let's look at the vxlan.calico IP address and the routing table on each node.

# k8s-master1
root@k8s-master1:~# ip addr show vxlan.calico
11: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 66:04:84:48:ab:6a brd ff:ff:ff:ff:ff:ff
    inet 172.16.159.128/32 scope global vxlan.calico
       valid_lft forever preferred_lft forever
    inet6 fe80::6404:84ff:fe48:ab6a/64 scope link 
       valid_lft forever preferred_lft forever
root@k8s-master1:~#
root@k8s-master1:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.3.1     0.0.0.0         UG    0      0        0 eth0
172.16.36.64    172.16.36.64    255.255.255.192 UG    0      0        0 vxlan.calico
172.16.135.192  192.168.3.243   255.255.255.192 UG    0      0        0 eth0
172.16.159.128  0.0.0.0         255.255.255.192 U     0      0        0 *
172.16.159.145  0.0.0.0         255.255.255.255 UH    0      0        0 cali64d6bf8fbc2
172.16.159.146  0.0.0.0         255.255.255.255 UH    0      0        0 cali65229b3901f
172.16.159.147  0.0.0.0         255.255.255.255 UH    0      0        0 cali371c0d785a0
172.16.159.148  0.0.0.0         255.255.255.255 UH    0      0        0 cali260c8237869
172.16.224.0    192.168.3.242   255.255.255.192 UG    0      0        0 eth0
192.168.3.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.32.0    192.168.3.130   255.255.255.0   UG    0      0        0 eth0
root@k8s-master1:~#

# k8s-master2
root@k8s-master2:~# ip addr show vxlan.calico
11: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 66:6c:9e:74:3f:25 brd ff:ff:ff:ff:ff:ff
    inet 172.16.224.0/32 scope global vxlan.calico
       valid_lft forever preferred_lft forever
    inet6 fe80::646c:9eff:fe74:3f25/64 scope link 
       valid_lft forever preferred_lft forever
root@k8s-master2:~#
root@k8s-master2:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.3.1     0.0.0.0         UG    0      0        0 eth0
172.16.36.64    172.16.36.64    255.255.255.192 UG    0      0        0 vxlan.calico
172.16.135.192  192.168.3.243   255.255.255.192 UG    0      0        0 eth0
172.16.159.128  192.168.3.241   255.255.255.192 UG    0      0        0 eth0
172.16.224.0    0.0.0.0         255.255.255.192 U     0      0        0 *
172.16.224.15   0.0.0.0         255.255.255.255 UH    0      0        0 cali1efce61525b
172.16.224.16   0.0.0.0         255.255.255.255 UH    0      0        0 calib842f790f49
172.16.224.17   0.0.0.0         255.255.255.255 UH    0      0        0 cali274c832c1f9
172.16.224.18   0.0.0.0         255.255.255.255 UH    0      0        0 cali40e7bc80fa0
172.16.224.19   0.0.0.0         255.255.255.255 UH    0      0        0 calid91ffb2ed56
192.168.3.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.32.0    192.168.3.130   255.255.255.0   UG    0      0        0 eth0
root@k8s-master2:~#

# k8s-master3
root@k8s-master3:~# ip addr show vxlan.calico
8: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 66:71:16:28:d9:6c brd ff:ff:ff:ff:ff:ff
    inet 172.16.135.192/32 scope global vxlan.calico
       valid_lft forever preferred_lft forever
    inet6 fe80::6471:16ff:fe28:d96c/64 scope link 
       valid_lft forever preferred_lft forever
root@k8s-master3:~#
root@k8s-master3:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.3.1     0.0.0.0         UG    0      0        0 eth0
172.16.36.64    172.16.36.64    255.255.255.192 UG    0      0        0 vxlan.calico
172.16.135.192  0.0.0.0         255.255.255.192 U     0      0        0 *
172.16.135.201  0.0.0.0         255.255.255.255 UH    0      0        0 calidd1fb919b68
172.16.135.202  0.0.0.0         255.255.255.255 UH    0      0        0 cali70c7ede3931
172.16.159.128  192.168.3.241   255.255.255.192 UG    0      0        0 eth0
172.16.224.0    192.168.3.242   255.255.255.192 UG    0      0        0 eth0
192.168.3.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.32.0    192.168.3.130   255.255.255.0   UG    0      0        0 eth0
root@k8s-master3:~#

# k8s-node1
root@k8s-node1:~# ip addr show vxlan.calico
10: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 66:8c:73:5a:90:99 brd ff:ff:ff:ff:ff:ff
    inet 172.16.36.64/32 scope global vxlan.calico
       valid_lft forever preferred_lft forever
    inet6 fe80::648c:73ff:fe5a:9099/64 scope link 
       valid_lft forever preferred_lft forever
root@k8s-node1:~# 
root@k8s-node1:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.32.1    0.0.0.0         UG    0      0        0 eth0
172.16.36.64    0.0.0.0         255.255.255.192 U     0      0        0 *
172.16.36.80    0.0.0.0         255.255.255.255 UH    0      0        0 caliedeb416dbd0
172.16.36.81    0.0.0.0         255.255.255.255 UH    0      0        0 cali003b4fb9240
172.16.135.192  172.16.135.192  255.255.255.192 UG    0      0        0 vxlan.calico
172.16.159.128  172.16.159.128  255.255.255.192 UG    0      0        0 vxlan.calico
172.16.224.0    172.16.224.0    255.255.255.192 UG    0      0        0 vxlan.calico
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.3.0     192.168.32.130  255.255.255.0   UG    0      0        0 eth0
192.168.32.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
root@k8s-node1:~#

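k8s-master1, k8s-master2 and k8s-master3 are on the same subnet, and the routing tables show that traffic between their Pod blocks leaves through eth0 with the peer node's host IP as the gateway. k8s-node1 is on a different subnet, so traffic to and from its Pod block goes through vxlan.calico and is VXLAN-encapsulated before being sent out.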
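
To make the routing decision concrete, here is a small Python sketch of a longest-prefix-match lookup over a subset of k8s-master1's routing table shown above. It only imitates what the kernel does; the Route type and the lookup function are hypothetical names introduced for illustration.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    dest: ipaddress.IPv4Network
    gateway: Optional[str]  # None means "no gateway" (0.0.0.0 in route -n)
    iface: str


def r(cidr: str, gateway: Optional[str], iface: str) -> Route:
    return Route(ipaddress.ip_network(cidr), gateway, iface)


# Subset of `route -n` on k8s-master1, with netmasks written as prefix lengths.
MASTER1_ROUTES = [
    r("0.0.0.0/0", "192.168.3.1", "eth0"),
    r("172.16.36.64/26", "172.16.36.64", "vxlan.calico"),  # k8s-node1's block
    r("172.16.135.192/26", "192.168.3.243", "eth0"),       # k8s-master3's block
    r("172.16.159.128/26", None, "*"),                     # local block
    r("172.16.159.145/32", None, "cali64d6bf8fbc2"),       # local Pod-1
    r("172.16.224.0/26", "192.168.3.242", "eth0"),         # k8s-master2's block
]


def lookup(routes: List[Route], dst: str) -> Route:
    """Pick the matching route with the longest prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [route for route in routes if addr in route.dest]
    return max(matches, key=lambda route: route.dest.prefixlen)


for dst in ("172.16.224.15", "172.16.36.80", "172.16.159.145"):
    hit = lookup(MASTER1_ROUTES, dst)
    print(dst, "->", hit.iface, "via", hit.gateway)
```

A Pod on k8s-master2 resolves to eth0 with the peer's host IP as next hop, a Pod on k8s-node1 resolves to vxlan.calico, and a local Pod resolves to its own cali* interface because the /32 route is more specific than the /26 block route.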

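Tip: the routing tables on all nodes are maintained dynamically. If a route is missing on a node it is filled in automatically, and a route that is deleted by hand is restored as well.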

 

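CrossSubnet mode is said to decide by itself, per destination, whether to use VXLAN: traffic that crosses a subnet boundary is VXLAN-encapsulated, while traffic within the same subnet is routed directly without encapsulation, as it would be in BGP mode.

I prefer learning by doing, so let's trace the traffic flow in practice. Following along with tcpdump packet captures is recommended.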
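
The rule can be written down in a few lines of Python. This is only a sketch of the decision as described here, assuming the comparison is made between the host addresses of the two nodes; it is not Calico's implementation, and the function name encapsulation is hypothetical.

```python
import ipaddress


def encapsulation(local_host_cidr: str, remote_host_ip: str) -> str:
    """Return "none" if the remote node is on the local node's subnet, else "vxlan"."""
    local_subnet = ipaddress.ip_interface(local_host_cidr).network
    remote = ipaddress.ip_address(remote_host_ip)
    return "none" if remote in local_subnet else "vxlan"


# k8s-master1 (192.168.3.241/24) talking to k8s-master2 and to k8s-node1.
print(encapsulation("192.168.3.241/24", "192.168.3.242"))
print(encapsulation("192.168.3.241/24", "192.168.32.105"))
```

With the addresses from the network environment table, the first call reports no encapsulation and the second reports VXLAN, which matches the eth0 and vxlan.calico routes seen in the routing tables.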

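Traffic flow within the same subnet

By design, CrossSubnet mode does not VXLAN-encapsulate traffic between nodes on the same subnet, so there is no need to capture on vxlan.calico.

Start the tcpdump captures on k8s-master1 beforehand, then ping from Pod-1 (172.16.159.145) to a Pod on k8s-master2 (172.16.224.15).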

tcpdump -i cali64d6bf8fbc2 -nnet icmp # cali64d6bf8fbc2 is the host-side interface of Pod-1
tcpdump -i eth0 -nnet icmp

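A quick analysis: between Pod-1's interface and the host's eth0 interface the IP addresses stay the same, but the MAC addresses change.

The two MAC addresses seen on eth0 are exactly those of eth0 on k8s-master1 and k8s-master2, as the ip addr output after the captures confirms.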

root@k8s-master1:~# tcpdump -i cali64d6bf8fbc2 -nnet icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cali64d6bf8fbc2, link-type EN10MB (Ethernet), capture size 262144 bytes
5e:62:2b:8c:4d:9d > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 98: 172.16.159.145 > 172.16.224.15: ICMP echo request, id 106, seq 0, length 64
ee:ee:ee:ee:ee:ee > 5e:62:2b:8c:4d:9d, ethertype IPv4 (0x0800), length 98: 172.16.224.15 > 172.16.159.145: ICMP echo reply, id 106, seq 0, length 64

#
root@k8s-master1:~# tcpdump -i eth0 -nnet icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
00:15:5d:03:0b:0a > 00:15:5d:03:0b:0b, ethertype IPv4 (0x0800), length 98: 172.16.159.145 > 172.16.224.15: ICMP echo request, id 106, seq 0, length 64
00:15:5d:03:0b:0b > 00:15:5d:03:0b:0a, ethertype IPv4 (0x0800), length 98: 172.16.224.15 > 172.16.159.145: ICMP echo reply, id 106, seq 0, length 64

# k8s-master1
root@k8s-master1:~# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:03:0b:0a brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.241/24 brd 192.168.3.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.3.240/24 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:fe03:b0a/64 scope link 
       valid_lft forever preferred_lft forever
root@k8s-master1:~# 

# k8s-master2
root@k8s-master2:~# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:03:0b:0b brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.242/24 brd 192.168.3.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:fe03:b0b/64 scope link 
       valid_lft forever preferred_lft forever
root@k8s-master2:~#

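Why can the nodes communicate using nothing but Pod IP addresses? The key is the MAC address. The frame is addressed to the destination node's MAC, so the link layer delivers it to that node. The node strips the Ethernet header, finds the Pod IP as the destination, and hands the packet to the network layer, which consults the routing table and forwards it to the matching interface.

Summary: nodes on the same subnet can communicate using Pod IP addresses alone, without encapsulation.

Caveat: some cloud providers enable a source/destination check on their network interfaces, which prevents servers from talking to each other with Pod IP addresses. The fix is simple: disable that check.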

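Traffic flow across subnets

Traffic between different subnets is VXLAN-encapsulated through vxlan.calico, so the packets passing through vxlan.calico need to be captured as well.

Start the tcpdump captures on k8s-master1 beforehand.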

tcpdump -i cali64d6bf8fbc2 -nnet icmp
tcpdump -i vxlan.calico -nnet icmp
tcpdump -i eth0 -nnet udp

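From Pod-1's interface to vxlan.calico the IP addresses stay the same while the MAC addresses change to the vxlan.calico MACs of the two nodes. On the host's eth0 the packet is then VXLAN-encapsulated: the outer IP addresses are the host addresses (192.168.3.241 and 192.168.32.105), the outer MAC addresses are those of eth0 and its next hop on the physical network, and the packet is sent to UDP destination port 4789. The original packet with the Pod IP addresses travels inside the VXLAN payload.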

root@k8s-master1:~# tcpdump -i cali64d6bf8fbc2 -nnet icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cali64d6bf8fbc2, link-type EN10MB (Ethernet), capture size 262144 bytes
5e:62:2b:8c:4d:9d > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 98: 172.16.159.145 > 172.16.36.80: ICMP echo request, id 111, seq 0, length 64
ee:ee:ee:ee:ee:ee > 5e:62:2b:8c:4d:9d, ethertype IPv4 (0x0800), length 98: 172.16.36.80 > 172.16.159.145: ICMP echo reply, id 111, seq 0, length 64

#
root@k8s-master1:~# tcpdump -i vxlan.calico -nnet icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan.calico, link-type EN10MB (Ethernet), capture size 262144 bytes
66:04:84:48:ab:6a > 66:8c:73:5a:90:99, ethertype IPv4 (0x0800), length 98: 172.16.159.145 > 172.16.36.80: ICMP echo request, id 111, seq 0, length 64
66:8c:73:5a:90:99 > 66:04:84:48:ab:6a, ethertype IPv4 (0x0800), length 98: 172.16.36.80 > 172.16.159.145: ICMP echo reply, id 111, seq 0, length 64

#
root@k8s-master1:~# tcpdump -i eth0 -nnet udp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
00:15:5d:03:0b:0a > 00:15:5d:03:0b:12, ethertype IPv4 (0x0800), length 148: 192.168.3.241.56247 > 192.168.32.105.4789: VXLAN, flags [I] (0x08), vni 4096
66:04:84:48:ab:6a > 66:8c:73:5a:90:99, ethertype IPv4 (0x0800), length 98: 172.16.159.145 > 172.16.36.80: ICMP echo request, id 111, seq 0, length 64
00:15:5d:03:0b:12 > 00:15:5d:03:0b:0a, ethertype IPv4 (0x0800), length 148: 192.168.32.105.49738 > 192.168.3.241.4789: VXLAN, flags [I] (0x08), vni 4096
66:8c:73:5a:90:99 > 66:04:84:48:ab:6a, ethertype IPv4 (0x0800), length 98: 172.16.36.80 > 172.16.159.145: ICMP echo reply, id 111, seq 0, length 64
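
The frame lengths in the eth0 capture (148 bytes outside, 98 bytes inside) and the MTU of 1450 on vxlan.calico both follow from the VXLAN overhead. The Python sketch below works through that arithmetic, assuming an IPv4 underlay without IP options or VLAN tags; the constant and function names are made up for this illustration.

```python
# Header sizes in bytes for VXLAN over IPv4 (no IP options, no VLAN tag).
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14


def outer_frame_length(inner_frame_length: int) -> int:
    """Length of the frame on eth0 for a given inner Ethernet frame."""
    return OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + inner_frame_length


def tunnel_mtu(underlay_mtu: int) -> int:
    """Largest inner IP packet that still fits into one underlay packet."""
    return underlay_mtu - (OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET)


print(outer_frame_length(98))  # inner ICMP frame from the capture
print(tunnel_mtu(1500))        # eth0 MTU from `ip addr show eth0`
```

The 50 bytes of overhead account for both numbers: 98 + 50 gives the 148-byte outer frame, and 1500 - 50 gives the 1450-byte MTU reported for vxlan.calico.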

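When the packet reaches the other side, it is first VXLAN-decapsulated to recover the inner packet with the Pod IP address, and then forwarded to the matching interface according to the routing table.

Summary: communication between nodes on different subnets requires VXLAN encapsulation.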
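
As an illustration of what decapsulation involves, the following Python sketch builds and strips the 8-byte VXLAN header that tcpdump decodes as "flags [I] (0x08), vni 4096". It follows the header layout from RFC 7348 and deals only with the UDP payload; in reality the kernel does this work, and the function names encapsulate and decapsulate are hypothetical.

```python
import struct
from typing import Tuple

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag shown by tcpdump


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend a VXLAN header: flags(8) reserved(24) | VNI(24) reserved(8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit into 24 bits")
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def decapsulate(udp_payload: bytes) -> Tuple[int, bytes]:
    """Return (vni, inner_frame) from the payload of a UDP/4789 datagram."""
    if len(udp_payload) < 8:
        raise ValueError("payload shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", udp_payload[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word1 >> 8, udp_payload[8:]


payload = encapsulate(4096, b"inner ethernet frame")
print(payload[:8].hex())
print(decapsulate(payload))
```

The inner frame returned by decapsulate is the same Ethernet frame that appears on vxlan.calico, which is why the capture on that interface shows Pod IP addresses and the vxlan.calico MAC addresses.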