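Tuning Kernel Parameters on OCP Nodes
Based on OCP 4.11, this article describes how to configure reserved system CPUs, CPU isolation, kernel parameters, and related settings on cluster nodes.
Checks before configuration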
$ oc get tuned -A
$ oc get performanceprofile -A
$ oc get profiles -A
Label the nodes
Label the node, then list the nodes to confirm the new role:
[core@bastion install-ocp-hub]$ oc label node worker4.ocp-hub.openlab.com node-role.kubernetes.io/worker-ht=
[core@bastion install-ocp-hub]$ oc get node
NAME                          STATUS   ROLES                         AGE     VERSION
master1.ocp-hub.openlab.com   Ready    master                        6d21h   v1.24.0+dc5a2fd
master2.ocp-hub.openlab.com   Ready    master                        6d22h   v1.24.0+dc5a2fd
master3.ocp-hub.openlab.com   Ready    master                        6d21h   v1.24.0+dc5a2fd
worker1.ocp-hub.openlab.com   Ready    infra,worker                  6d20h   v1.24.0+dc5a2fd
worker2.ocp-hub.openlab.com   Ready    infra,worker                  6d20h   v1.24.0+dc5a2fd
worker3.ocp-hub.openlab.com   Ready    infra,worker                  6d20h   v1.24.0+dc5a2fd
worker4.ocp-hub.openlab.com   Ready    worker,worker-ht,worker-std   6d20h   v1.24.0+dc5a2fd
worker5.ocp-hub.openlab.com   Ready    worker,worker-ht,worker-std   6d20h   v1.24.0+dc5a2fd
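To confirm the label without scanning the table by eye, the ROLES column can be filtered. This is a minimal sketch that assumes the default `oc get node` column layout shown above; `nodes_with_role` is a hypothetical helper name, not an `oc` feature.

```shell
# Print the names of nodes whose ROLES column (3rd field) contains the given role.
# Reads `oc get node` output on stdin; the header line is skipped.
nodes_with_role() {
  local role="$1"
  awk -v role="$role" 'NR > 1 {
    n = split($3, roles, ",")
    for (i = 1; i <= n; i++) if (roles[i] == role) print $1
  }'
}

# Usage (against a live cluster):
#   oc get node | nodes_with_role worker-ht
```

On the cluster above this would be expected to list worker4 and worker5, the two nodes that carry the worker-ht role.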
Create the MCP (MachineConfigPool)
[core@bastion install-ocp-hub]$ cat mcp-worker-ht.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-ht
  labels:
    machineconfiguration.openshift.io/role: worker-ht
spec:
  machineConfigSelector:
    matchExpressions:
      - {
          key: machineconfiguration.openshift.io/role,
          operator: In,
          values: [worker, worker-ht]
        }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-ht: ""
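The manifest still has to be created on the cluster, a step the transcript does not show. A minimal sketch, assuming the file name `mcp-worker-ht.yaml` from the listing above; `create_and_show_mcp` is a hypothetical wrapper around two standard `oc` commands.

```shell
# Create the MachineConfigPool from a manifest, then show its status.
# $1: manifest file, $2: pool name
create_and_show_mcp() {
  local manifest="$1" pool="$2"
  oc create -f "$manifest" && oc get mcp "$pool"
}

# Usage (against a live cluster):
#   create_and_show_mcp mcp-worker-ht.yaml worker-ht
```

The pool should pick up the nodes labeled `node-role.kubernetes.io/worker-ht` once the Machine Config Operator has reconciled it.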
Check the current values
[core@bastion install-ocp-hub]$ oc debug node/worker4.ocp-hub.openlab.com
Starting pod/worker4ocp-hubopenlabcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 183.62.100.34
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# sysctl net.unix.max_dgram_qlen
net.unix.max_dgram_qlen = 512 # --> here
sh-4.4# sysctl net.core.somaxconn
net.core.somaxconn = 128 #--> here
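The same check can be done without an interactive session by passing the command to `oc debug` after `--`. A minimal sketch; `node_sysctl` is a hypothetical helper name, and it assumes the debug pod can `chroot /host` as in the session above.

```shell
# Read one or more sysctl values from a node through a debug pod.
# $1: node name, remaining arguments: sysctl keys
node_sysctl() {
  local node="$1"
  shift
  oc debug "node/${node}" -- chroot /host sysctl "$@"
}

# Usage (against a live cluster):
#   node_sysctl worker4.ocp-hub.openlab.com net.unix.max_dgram_qlen net.core.somaxconn
```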
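Modify the PP (PerformanceProfile) file
Before creating the profile, you can follow the official documentation and use run-perf-profile-creator.sh to generate an initial configuration file.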
[core@bastion install-ocp-hub-bak]$ cat run-perf-profile-creator.sh
#!/bin/bash

readonly CONTAINER_RUNTIME=${CONTAINER_RUNTIME:-podman}
readonly CURRENT_SCRIPT=$(basename "$0")
readonly CMD="${CONTAINER_RUNTIME} run --entrypoint performance-profile-creator"
readonly IMG_EXISTS_CMD="${CONTAINER_RUNTIME} image exists"
readonly IMG_PULL_CMD="${CONTAINER_RUNTIME} image pull --tls-verify=false"
readonly MUST_GATHER_VOL="/must-gather"

NTO_IMG="mirror.ocp-hub.openlab.com:8443/registry.redhat.io/openshift4/ose-cluster-node-tuning-operator:v4.11"
MG_TARBALL=""
DATA_DIR=""

usage() {
  print "Wrapper usage:"
  print "  ${CURRENT_SCRIPT} [-h] [-p image][-t path] -- [performance-profile-creator flags]"
  print ""
  print "Options:"
  print "  -h  help for ${CURRENT_SCRIPT}"
  print "  -p  Node Tuning Operator image"
  print "  -t  path to a must-gather tarball"

  ${IMG_EXISTS_CMD} "${NTO_IMG}" && ${CMD} "${NTO_IMG}" -h
}

function cleanup {
  [ -d "${DATA_DIR}" ] && rm -rf "${DATA_DIR}"
}
trap cleanup EXIT

exit_error() {
  print "error: $*"
  usage
  exit 1
}

print() {
  echo "$*" >&2
}

check_requirements() {
  ${IMG_EXISTS_CMD} "${NTO_IMG}" || ${IMG_PULL_CMD} "${NTO_IMG}" || \
    exit_error "Node Tuning Operator image not found"

  [ -n "${MG_TARBALL}" ] || exit_error "Must-gather tarball file path is mandatory"
  [ -f "${MG_TARBALL}" ] || exit_error "Must-gather tarball file not found"

  DATA_DIR=$(mktemp -d -t "${CURRENT_SCRIPT}XXXX") || exit_error "Cannot create the data directory"
  tar -zxf "${MG_TARBALL}" --directory "${DATA_DIR}" || exit_error "Cannot decompress the must-gather tarball"
  chmod a+rx "${DATA_DIR}"

  return 0
}

main() {
  while getopts ':hp:t:' OPT; do
    case "${OPT}" in
      h)
        usage
        exit 0
        ;;
      p)
        NTO_IMG="${OPTARG}"
        ;;
      t)
        MG_TARBALL="${OPTARG}"
        ;;
      ?)
        exit_error "invalid argument: ${OPTARG}"
        ;;
    esac
  done
  shift $((OPTIND - 1))

  check_requirements || exit 1

  ${CMD} -v "${DATA_DIR}:${MUST_GATHER_VOL}:z" "${NTO_IMG}" "$@" --must-gather-dir-path "${MUST_GATHER_VOL}"
  echo "" 1>&2
}

main "$@"
Run the script to generate the initial PP file
[core@bastion install-ocp-hub-bak]$ ./run-perf-profile-creator.sh -t must-gather.tar.gz -- --mcp-name=worker-cnf --reserved-cpu-count=2 --rt-kernel=false > my-performance-profile.yaml
level=info msg="Nodes targetted by worker-cnf MCP are: [worker-0.ocp411.lab.upshift.rdu2.redhat.com]"
level=info msg="NUMA cell(s): 1"
level=info msg="NUMA cell 0 : [0 1 2 3]"
level=info msg="CPU(s): 4"
level=info msg="2 reserved CPUs allocated: 0-1 "
level=info msg="2 isolated CPUs allocated: 2-3"
level=info msg="Additional Kernel Args based on configuration: []"
Use the generated file as a starting point and adapt it to the target nodes:
[core@bastion install-ocp-hub]$ cat worker-ht-performance-profile.yaml
---
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: worker-ht-performance-profile
  annotations:
    kubeletconfig.experimental: |
      {"allowedUnsafeSysctls":["net.unix.max_dgram_qlen","net.core.somaxconn"],
      "systemReserved": {"memory": "24Gi"}}
spec:
  cpu:
    isolated: 1-19,21-39,41-59,61-79
    reserved: 0,20,40,60
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
      - size: "1G"
        node: 0
        count: 100
      - size: "1G"
        node: 1
        count: 100
  machineConfigPoolSelector:
    machineconfiguration.openshift.io/role: worker-ht
  nodeSelector:
    node-role.kubernetes.io/worker-ht: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: false
  workloadHints:
    highPowerConsumption: false
    realTime: false
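The reserved and isolated CPU lists in the profile should not share any CPU. Here the 4 reserved and 76 isolated CPUs together cover CPUs 0-79. A minimal sketch for checking this before applying the profile; `expand_cpus` and `cpu_overlap` are hypothetical helper names, and the lists are copied from the profile above.

```shell
# Expand a CPU list such as "0,20,40,60" or "1-19,21-39" into one CPU id per line.
expand_cpus() {
  echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
    # A single id has no upper bound, so fall back to the lower bound.
    seq "$lo" "${hi:-$lo}"
  done
}

# Print every CPU id that appears in both lists (no output means disjoint).
cpu_overlap() {
  { expand_cpus "$1"; expand_cpus "$2"; } | sort -n | uniq -d
}

# Usage:
#   expand_cpus "1-19,21-39,41-59,61-79" | wc -l    # number of isolated CPUs
#   cpu_overlap "0,20,40,60" "1-19,21-39,41-59,61-79"
```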
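Apply the kernel parameters
Creating the PerformanceProfile applies the kernel parameter configuration. The affected nodes reboot during this process.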
[core@bastion install-ocp-hub]# oc create -f worker-ht-performance-profile.yaml
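Because the nodes reboot one after another, it can help to block until the pool reports that the rollout is finished. A minimal sketch, assuming the pool name `worker-ht` from the MCP above; `wait_for_pool` is a hypothetical helper name and the 30 minute default timeout is an arbitrary choice.

```shell
# Wait until the MachineConfigPool reports the Updated condition.
# $1: pool name, $2: optional timeout (default 30m)
wait_for_pool() {
  local pool="$1" timeout="${2:-30m}"
  oc wait "mcp/${pool}" --for=condition=Updated --timeout="${timeout}"
}

# Usage (against a live cluster):
#   wait_for_pool worker-ht 45m
```

Immediately after `oc create`, the pool may still show Updated from its previous state until the operator starts the rollout, so checking `oc get mcp worker-ht` once before waiting is a sensible precaution.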
After the reboot, log in to the node to check the reserved CPUs, isolated CPUs, hugepages, and related settings.
[core@bastion install-ocp-hub]# oc debug node/worker4.ocp-hub.openlab.com
Starting pod/worker4ocp-hubopenlabcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 183.62.100.34
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# cat /etc/kubernetes/kubelet.conf | grep reservedSystemCPUs   # --> reserved system CPUs
"reservedSystemCPUs": "0,20,40,60",
sh-4.4# cat /proc/cmdline   # --> reserved and isolated CPUs
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-07b1c3aee1b4b7007ce2547d70f664dfa7e4684044e36c9c23302bc2cecd445c/vmlinuz-4.18.0-372.26.1.el8_6.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.0/rhcos/07b1c3aee1b4b7007ce2547d70f664dfa7e4684044e36c9c23302bc2cecd445c/0 root=UUID=67975151-1eb9-4a99-92cb-921802f7d355 rw rootflags=prjquota boot=UUID=ee730c51-162c-432b-bab0-11262df86f67 skew_tick=1 nohz=on rcu_nocbs=1-19,21-39,41-59,61-79 tuned.non_isolcpus=10000100,00100001 systemd.cpu_affinity=0,40,20,60 intel_iommu=on iommu=pt isolcpus=managed_irq,1-19,21-39,41-59,61-79 default_hugepagesz=1G +
sh-4.4# grep -i huge /proc/meminfo   # --> hugepages and memory usage
AnonHugePages: 231424 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 200
HugePages_Free: 195
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 209715200 kB
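These figures agree with the profile: 100 pages on each of 2 NUMA nodes gives 200 pages, and 200 × 1048576 kB = 209715200 kB, which matches the Hugetlb line. A minimal sketch that recomputes the total from the meminfo fields; `hugetlb_kb` is a hypothetical helper name.

```shell
# Compute HugePages_Total * Hugepagesize (in kB) from /proc/meminfo-style input on stdin.
hugetlb_kb() {
  awk '/^HugePages_Total:/ { total = $2 }
       /^Hugepagesize:/    { size = $2 }
       END                 { printf "%d\n", total * size }'
}

# Usage (on the node, after chroot /host):
#   hugetlb_kb < /proc/meminfo
```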
Create a pod to verify
[core@bastion install-ocp-hub]# cat sysctl-example-unsafe.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example-unsafe
spec:
  containers:
    - name: podexample
      image: registry.redhat.io/rhel7/rhel
      command: ["/bin/bash", "-c", "sleep INF"]
      securityContext:
        runAsUser: 2000
        runAsGroup: 3000
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    sysctls:
      - name: net.unix.max_dgram_qlen
        value: "1024"
      - name: net.core.somaxconn
        value: "256"
  # Schedule onto the nodes whose kubelet allows these unsafe sysctls (the worker-ht pool).
  nodeSelector:
    node-role.kubernetes.io/worker-ht: ""
Enter the pod and check that the custom kernel parameters were applied.
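A minimal sketch of that check, reading the values from `/proc/sys` inside the pod so it does not depend on a `sysctl` binary in the image. `sysctl_path` and `pod_sysctl` are hypothetical helper names, and the pod is assumed to run in the current project.

```shell
# Map a sysctl key such as net.core.somaxconn to its /proc/sys path.
sysctl_path() {
  echo "/proc/sys/$(echo "$1" | tr '.' '/')"
}

# Read a sysctl value from inside a running pod.
# $1: pod name, $2: sysctl key
pod_sysctl() {
  oc exec "$1" -- cat "$(sysctl_path "$2")"
}

# Usage (against a live cluster):
#   oc create -f sysctl-example-unsafe.yaml
#   pod_sysctl sysctl-example-unsafe net.unix.max_dgram_qlen
#   pod_sysctl sysctl-example-unsafe net.core.somaxconn
```

If the pod was admitted with the sysctls from its manifest, the two reads should return the values requested there (1024 and 256).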
[core@bastion install-ocp-hub]# oc get tuned -A
NAMESPACE                                NAME                                                       AGE
openshift-cluster-node-tuning-operator   configuration-netsocket                                    6d16h
openshift-cluster-node-tuning-operator   configuration-netsocket-infra                              5d17h
openshift-cluster-node-tuning-operator   default                                                    6d23h
openshift-cluster-node-tuning-operator   openshift-node-performance-infra-performance-profile       5d18h
openshift-cluster-node-tuning-operator   openshift-node-performance-worker-ht-performance-profile   6d16h
openshift-cluster-node-tuning-operator   rendered                                                   6d23h
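Configure sysctl kernel parameters with a Tuned CR
The configuration file must reference the Tuned profile generated for the corresponding PerformanceProfile.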
[core@bastion install-ocp-hub-bak]$ cat configuration-netsocket.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: configuration-netsocket
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
    - data: |
        [main]
        summary=Configuration changes profile inherited from performance created tuned
        # include must name the openshift-node-performance-<PP name> profile listed by `oc get tuned -A`
        include=openshift-node-performance-worker-ht-performance-profile
        [sysctl]
        net.core.wmem_max = 412992000
        net.core.rmem_max = 412992000
        net.core.wmem_default = 212992
        net.core.rmem_default = 212992
        net.ipv4.neigh.default.gc_thresh1 = 10240
      name: openshift-configuration-netsocket
  recommend:
    - match:
        - label: node-role.kubernetes.io/worker-ht
        - label: node-role.kubernetes.io/worker-std
      priority: 20
      profile: openshift-configuration-netsocket
Apply the Tuned file so that the kernel parameter changes take effect
[core@bastion install-ocp-hub-bak]$ oc create -f configuration-netsocket.yaml
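To confirm that a node picked up the new profile, the per-node Profile object of the Node Tuning Operator can be queried. A minimal sketch, assuming the Profile resource exposes the applied profile name under `.status.tunedProfile`; `applied_tuned_profile` is a hypothetical helper name.

```shell
# Print the name of the Tuned profile the operator has selected for a node.
# $1: node name (Profile objects are named after their node)
applied_tuned_profile() {
  local node="$1"
  oc get profiles.tuned.openshift.io "$node" \
    -n openshift-cluster-node-tuning-operator \
    -o jsonpath='{.status.tunedProfile}'
}

# Usage (against a live cluster):
#   applied_tuned_profile worker4.ocp-hub.openlab.com
```

For the worker-ht nodes the result should be `openshift-configuration-netsocket` if the recommend rule matched. The individual values can then be read on the node with `sysctl`, as in the earlier debug session.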