01 Configuring Kernel and sysctl Parameters on OCP Nodes

Published: 2023-03-29 10:29:37  Author: 老钱学技术

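How to Tune Kernel Parameters on OCP Nodes

Based on OCP 4.11, this article describes how to configure reserved system CPUs, CPU isolation, and kernel parameters on cluster nodes.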

Checks before configuration

$ oc get tuned -A
$ oc get performanceprofile -A
$ oc get profiles -A

Label the nodes

Label each node that will join the new pool, for example:

[core@bastion install-ocp-hub]$ oc label node worker4.ocp-hub.openlab.com node-role.kubernetes.io/worker-ht=
[core@bastion install-ocp-hub]$ oc get node
NAME                          STATUS   ROLES                         AGE     VERSION
master1.ocp-hub.openlab.com   Ready    master                        6d21h   v1.24.0+dc5a2fd
master2.ocp-hub.openlab.com   Ready    master                        6d22h   v1.24.0+dc5a2fd
master3.ocp-hub.openlab.com   Ready    master                        6d21h   v1.24.0+dc5a2fd
worker1.ocp-hub.openlab.com   Ready    infra,worker                  6d20h   v1.24.0+dc5a2fd
worker2.ocp-hub.openlab.com   Ready    infra,worker                  6d20h   v1.24.0+dc5a2fd
worker3.ocp-hub.openlab.com   Ready    infra,worker                  6d20h   v1.24.0+dc5a2fd
worker4.ocp-hub.openlab.com   Ready    worker,worker-ht,worker-std   6d20h   v1.24.0+dc5a2fd
worker5.ocp-hub.openlab.com   Ready    worker,worker-ht,worker-std   6d20h   v1.24.0+dc5a2fd

Create the MCP (MachineConfigPool)

[core@bastion install-ocp-hub]$ cat mcp-worker-ht.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-ht
  labels:
    machineconfiguration.openshift.io/role: worker-ht
spec:
  machineConfigSelector:
    matchExpressions:
      - {
         key: machineconfiguration.openshift.io/role,
         operator: In,
         values: [worker,worker-ht]
        }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-ht: ""
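
The manifest above still has to be submitted to the cluster. A minimal sketch of that step follows, assuming a logged-in oc session and the file name mcp-worker-ht.yaml from the listing. The function name create_worker_ht_pool is illustrative and not part of the original procedure.

```shell
#!/bin/bash
# Sketch: create the pool, then show the pool and the nodes carrying its label.
create_worker_ht_pool() {
  oc create -f mcp-worker-ht.yaml || return 1
  # MACHINECOUNT in this output should match the number of labelled nodes.
  oc get mcp worker-ht
  # Nodes selected by the pool's nodeSelector (label present, any value).
  oc get nodes -l node-role.kubernetes.io/worker-ht
}

# Usage (needs cluster access):
#   create_worker_ht_pool
```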

Check the current values

[core@bastion install-ocp-hub]$ oc debug node/worker4.ocp-hub.openlab.com
Starting pod/worker4ocp-hubopenlabcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 183.62.100.34
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# sysctl net.unix.max_dgram_qlen
net.unix.max_dgram_qlen = 512    # --> here
sh-4.4# sysctl net.core.somaxconn
net.core.somaxconn = 128     #--> here

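Modify the PP (PerformanceProfile) file

Before creating the profile, you can follow the official documentation and use run-perf-profile-creator.sh to generate an initial configuration file.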

[core@bastion install-ocp-hub-bak]$ cat run-perf-profile-creator.sh
#!/bin/bash

readonly CONTAINER_RUNTIME=${CONTAINER_RUNTIME:-podman}
readonly CURRENT_SCRIPT=$(basename "$0")
readonly CMD="${CONTAINER_RUNTIME} run --entrypoint performance-profile-creator"
readonly IMG_EXISTS_CMD="${CONTAINER_RUNTIME} image exists"
readonly IMG_PULL_CMD="${CONTAINER_RUNTIME} image pull --tls-verify=false"
readonly MUST_GATHER_VOL="/must-gather"

NTO_IMG="mirror.ocp-hub.openlab.com:8443/registry.redhat.io/openshift4/ose-cluster-node-tuning-operator:v4.11"
MG_TARBALL=""
DATA_DIR=""

usage() {
  print "Wrapper usage:"
  print "  ${CURRENT_SCRIPT} [-h] [-p image][-t path] -- [performance-profile-creator flags]"
  print ""
  print "Options:"
  print "   -h                 help for ${CURRENT_SCRIPT}"
  print "   -p                 Node Tuning Operator image"
  print "   -t                 path to a must-gather tarball"

  ${IMG_EXISTS_CMD} "${NTO_IMG}" && ${CMD} "${NTO_IMG}" -h
}

function cleanup {
  [ -d "${DATA_DIR}" ] && rm -rf "${DATA_DIR}"
}
trap cleanup EXIT

exit_error() {
  print "error: $*"
  usage
  exit 1
}

print() {
  echo  "$*" >&2
}

check_requirements() {
  ${IMG_EXISTS_CMD} "${NTO_IMG}" || ${IMG_PULL_CMD} "${NTO_IMG}" || \
      exit_error "Node Tuning Operator image not found"

  [ -n "${MG_TARBALL}" ] || exit_error "Must-gather tarball file path is mandatory"
  [ -f "${MG_TARBALL}" ] || exit_error "Must-gather tarball file not found"

  DATA_DIR=$(mktemp -d -t "${CURRENT_SCRIPT}XXXX") || exit_error "Cannot create the data directory"
  tar -zxf "${MG_TARBALL}" --directory "${DATA_DIR}" || exit_error "Cannot decompress the must-gather tarball"
  chmod a+rx "${DATA_DIR}"

  return 0
}

main() {
  while getopts ':hp:t:' OPT; do
    case "${OPT}" in
      h)
        usage
        exit 0
        ;;
      p)
        NTO_IMG="${OPTARG}"
        ;;
      t)
        MG_TARBALL="${OPTARG}"
        ;;
      ?)
        exit_error "invalid argument: ${OPTARG}"
        ;;
    esac
  done
  shift $((OPTIND - 1))

  check_requirements || exit 1

  ${CMD} -v "${DATA_DIR}:${MUST_GATHER_VOL}:z" "${NTO_IMG}" "$@" --must-gather-dir-path "${MUST_GATHER_VOL}"
  echo "" 1>&2
}

main "$@"

Run the script to generate the initial PP file

[core@bastion install-ocp-hub-bak]$ ./run-perf-profile-creator.sh -t must-gather.tar.gz -- --mcp-name=worker-cnf --reserved-cpu-count=2 --rt-kernel=false > my-performance-profile.yaml
level=info msg="Nodes targetted by worker-cnf MCP are: [worker-0.ocp411.lab.upshift.rdu2.redhat.com]"
level=info msg="NUMA cell(s): 1"
level=info msg="NUMA cell 0 : [0 1 2 3]"
level=info msg="CPU(s): 4"
level=info msg="2 reserved CPUs allocated: 0-1 "
level=info msg="2 isolated CPUs allocated: 2-3"
level=info msg="Additional Kernel Args based on configuration: []"

Modify the generated file as needed:

[core@bastion install-ocp-hub]$ cat worker-ht-performance-profile.yaml
---
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: worker-ht-performance-profile
  annotations:
   kubeletconfig.experimental: |
     {"allowedUnsafeSysctls":["net.unix.max_dgram_qlen","net.core.somaxconn"],
      "systemReserved": {"memory": "24Gi"}}
spec:
  cpu:
    isolated: 1-19,21-39,41-59,61-79
    reserved: 0,20,40,60
  hugepages:
   defaultHugepagesSize: "1G"
   pages:
   - size: "1G"
     node: 0
     count: 100
   - size: "1G"
     node: 1
     count: 100
  machineConfigPoolSelector:
    machineconfiguration.openshift.io/role: worker-ht
  nodeSelector:
    node-role.kubernetes.io/worker-ht: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: false
  workloadHints:
    highPowerConsumption: false
    realTime: false
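
The reserved and isolated CPU lists in a PerformanceProfile should not overlap, and together they normally cover every CPU on the node. The sketch below checks the two lists from the profile above before it is applied. The helper names expand_cpus, count_cpus and cpu_overlap are illustrative, and the total of 80 CPUs is inferred from the lists rather than stated in the article.

```shell
#!/bin/bash
# Expand a CPU list such as "0,20,40-42" into one CPU id per line.
expand_cpus() {
  local part
  for part in $(echo "$1" | tr ',' ' '); do
    # "7" expands to 7..7, "1-19" expands to 1..19.
    seq "${part%-*}" "${part#*-}"
  done
}

# Number of CPUs named by a list.
count_cpus() {
  echo $(( $(expand_cpus "$1" | wc -l) ))
}

# Print CPU ids present in both lists (empty output means no overlap).
# Assumes neither list repeats a CPU id internally.
cpu_overlap() {
  { expand_cpus "$1"; expand_cpus "$2"; } | sort -n | uniq -d
}

reserved="0,20,40,60"
isolated="1-19,21-39,41-59,61-79"

echo "reserved: $(count_cpus "$reserved")"
echo "isolated: $(count_cpus "$isolated")"
overlap=$(cpu_overlap "$reserved" "$isolated")
echo "overlap:  ${overlap:-none}"
```

For the values in this profile the script reports 4 reserved CPUs, 76 isolated CPUs and no overlap, which adds up to 80 CPUs.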

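Configure the kernel parameters

Creating the PerformanceProfile applies the kernel settings. The affected nodes reboot during this process.
[core@bastion install-ocp-hub]# oc create -f worker-ht-performance-profile.yaml
After the reboot, log in to the node to check the reserved system CPUs, CPU isolation, hugepages, and related settings.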

[core@bastion install-ocp-hub]# oc debug node/worker4.ocp-hub.openlab.com
Starting pod/worker4ocp-hubopenlabcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 183.62.100.34
If you don't see a command prompt, try pressing enter.

sh-4.4# chroot /host
sh-4.4# cat /etc/kubernetes/kubelet.conf | grep reservedSystemCPUs     # --> reserved system CPUs
  "reservedSystemCPUs": "0,20,40,60",
sh-4.4# cat /proc/cmdline         # --> reserved system CPUs and CPU isolation
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-07b1c3aee1b4b7007ce2547d70f664dfa7e4684044e36c9c23302bc2cecd445c/vmlinuz-4.18.0-372.26.1.el8_6.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.0/rhcos/07b1c3aee1b4b7007ce2547d70f664dfa7e4684044e36c9c23302bc2cecd445c/0 root=UUID=67975151-1eb9-4a99-92cb-921802f7d355 rw rootflags=prjquota boot=UUID=ee730c51-162c-432b-bab0-11262df86f67 skew_tick=1 nohz=on rcu_nocbs=1-19,21-39,41-59,61-79 tuned.non_isolcpus=10000100,00100001 systemd.cpu_affinity=0,40,20,60 intel_iommu=on iommu=pt isolcpus=managed_irq,1-19,21-39,41-59,61-79 default_hugepagesz=1G +
sh-4.4# grep -i huge /proc/meminfo          # --> hugepage and memory usage
AnonHugePages:    231424 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:     200
HugePages_Free:      195
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        209715200 kB
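
Two values in this output can be cross-checked by hand. The tuned.non_isolcpus argument appears to be a hexadecimal CPU mask of the non-isolated (reserved) CPUs, written as comma-separated 32-bit words with the most significant word first, and Hugetlb should equal HugePages_Total multiplied by Hugepagesize. The sketch below recomputes both. The function name cpus_to_mask is illustrative, it accepts plain CPU ids only (no ranges), and the word count of 2 is chosen to match the command line shown above.

```shell
#!/bin/bash
# Build a CPU mask from a comma-separated list of CPU ids.
# $1: CPU ids, e.g. "0,20,40,60"   $2: number of 32-bit words to print
cpus_to_mask() {
  local cpus=$1 nwords=$2 w cpu val out=""
  w=$((nwords - 1))
  while [ "$w" -ge 0 ]; do
    val=0
    for cpu in $(echo "$cpus" | tr ',' ' '); do
      # CPU n sets bit (n % 32) of word (n / 32).
      if [ $((cpu / 32)) -eq "$w" ]; then
        val=$((val | (1 << (cpu % 32))))
      fi
    done
    out="${out}${out:+,}$(printf '%08x' "$val")"
    w=$((w - 1))
  done
  echo "$out"
}

# Mask for the reserved CPUs of the profile; compare with tuned.non_isolcpus.
cpus_to_mask "0,20,40,60" 2

# 200 hugepages of 1048576 kB each; compare with the Hugetlb line.
echo "$((200 * 1048576)) kB"
```

The first line printed is 10000100,00100001 and the second is 209715200 kB, matching the kernel command line and /proc/meminfo above.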

Create a pod to verify

[core@bastion install-ocp-hub]# cat sysctl-example-unsafe.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example-unsafe
spec:
  containers:
  - name: podexample
    image: registry.redhat.io/rhel7/rhel
    command: ["/bin/bash", "-c", "sleep INF"]
    securityContext:
      runAsUser: 2000
      runAsGroup: 3000
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    sysctls:
    - name: net.unix.max_dgram_qlen
      value: "1024"
    - name: net.core.somaxconn
      value: "256"
  nodeSelector:
    node-role.kubernetes.io/worker-ht: ""
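
One way to create this pod and read the two sysctls from inside it without an interactive shell is sketched below. It assumes a logged-in oc session, the pod name from the manifest, and that the image provides cat. The helper names sysctl_path and check_pod_sysctl are illustrative.

```shell
#!/bin/bash
POD=sysctl-example-unsafe

# Map a sysctl key to its /proc/sys path, e.g.
# net.core.somaxconn -> /proc/sys/net/core/somaxconn
sysctl_path() {
  echo "/proc/sys/$(echo "$1" | tr . /)"
}

# Read one sysctl inside the pod and compare it with the expected value.
check_pod_sysctl() {
  local key=$1 want=$2 got
  got=$(oc exec "$POD" -- cat "$(sysctl_path "$key")") || return 1
  if [ "$got" = "$want" ]; then
    echo "OK   $key = $got"
  else
    echo "FAIL $key = $got (expected $want)"
    return 1
  fi
}

# Usage (needs cluster access):
#   oc create -f sysctl-example-unsafe.yaml
#   oc wait --for=condition=Ready pod/sysctl-example-unsafe --timeout=120s
#   check_pod_sysctl net.unix.max_dgram_qlen 1024
#   check_pod_sysctl net.core.somaxconn 256
```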

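Enter the pod to check how the customized kernel parameters were applied inside it. The Tuned objects in the cluster, including the one generated from the PerformanceProfile, can be listed as follows: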

[core@bastion install-ocp-hub]# oc get tuned -A
NAMESPACE                                NAME                                                       AGE
openshift-cluster-node-tuning-operator   configuration-netsocket                                    6d16h
openshift-cluster-node-tuning-operator   configuration-netsocket-infra                              5d17h
openshift-cluster-node-tuning-operator   default                                                    6d23h
openshift-cluster-node-tuning-operator   openshift-node-performance-infra-performance-profile       5d18h
openshift-cluster-node-tuning-operator   openshift-node-performance-worker-ht-performance-profile   6d16h
openshift-cluster-node-tuning-operator   rendered                                                   6d23h

Configure sysctl kernel parameters with a Tuned CR

The profile must include the TuneD profile generated from the corresponding PerformanceProfile. Use the name reported by oc get tuned -A, which here is openshift-node-performance-worker-ht-performance-profile.

[core@bastion install-ocp-hub-bak]$ cat configuration-netsocket.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: configuration-netsocket
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-worker-ht-performance-profile
      [sysctl]
      net.core.wmem_max = 412992000
      net.core.rmem_max = 412992000
      net.core.wmem_default = 212992
      net.core.rmem_default = 212992
      net.ipv4.neigh.default.gc_thresh1 = 10240
    name: openshift-configuration-netsocket
  recommend:
  - match:
    - label: node-role.kubernetes.io/worker-ht
    - label: node-role.kubernetes.io/worker-std
    priority: 20
    profile: openshift-configuration-netsocket

Apply the Tuned file so that the kernel parameter changes take effect
[core@bastion install-ocp-hub-bak]$ oc create -f configuration-netsocket.yaml
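
Once the operator has rolled the profile out, the values on a node can be compared with the [sysctl] section of the file. The sketch below is one possible way to do that from the bastion. It assumes single-valued settings like the ones in this file and a node name supplied in NODE, and it reads each value through oc debug, which is slow but needs nothing installed on the node. The helper names sysctl_entries, read_node_sysctl and check_sysctls are illustrative.

```shell
#!/bin/bash
# Print "key value" for each entry in the [sysctl] section of the text on stdin.
sysctl_entries() {
  awk '
    /^[ \t]*\[/ { in_sysctl = ($0 ~ /\[sysctl\]/); next }
    in_sysctl && /=/ {
      split($0, kv, "=")
      gsub(/[ \t]/, "", kv[1])
      gsub(/[ \t]/, "", kv[2])
      print kv[1], kv[2]
    }'
}

# Read one sysctl on the node; stdin is detached so the caller's loop input survives.
read_node_sysctl() {
  oc debug "node/$NODE" -- chroot /host sysctl -n "$1" </dev/null 2>/dev/null
}

# Compare every "key value" pair on stdin with the value found on the node.
check_sysctls() {
  local key want got rc=0
  while read -r key want; do
    got=$(read_node_sysctl "$key")
    if [ "$got" = "$want" ]; then
      echo "OK   $key = $got"
    else
      echo "DIFF $key = ${got:-unreadable} (expected $want)"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (needs cluster access):
#   NODE=worker4.ocp-hub.openlab.com
#   sysctl_entries < configuration-netsocket.yaml | check_sysctls
```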