Linux: CPU C-states

Published: 2023-12-08 15:22:53 Author: lubanseven

0. Overview

The CPU has several power modes, selected according to its current usage, which are collectively called “C-states” or “C-modes.” With C-states, an idle CPU can enter a low-power state to reduce energy consumption.

A CPU supports several C-states, and the deeper the state, the more energy it saves. Power is saved by stopping the CPU clock and shutting down parts of its circuitry, so moving from an idle state (Cx) back to the working state (C0) takes time. The deeper the state, the longer the switch takes.

The CPU switches between running and idle automatically. A fast-path (latency-sensitive) application bound to a CPU should therefore keep that CPU busy (for example, with a busy loop) so that it never leaves C0. A slow-path application (one without strict latency requirements) can let the CPU go idle and wake up when there is work to do.

1. CPU C-state

1.1 Enable CPU C-state

C-states are usually enabled or disabled in the BIOS setup. On Linux, the intel_idle driver additionally exposes the deepest state it may use as its max_cstate parameter. For example:

// server 1
# cat /sys/module/intel_idle/parameters/max_cstate
9

// server 2
# cat /sys/module/intel_idle/parameters/max_cstate
0

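Server 1 allows C-states up to a maximum of 9, while server 2 sets max_cstate to 0, which disables the intel_idle driver (the kernel then falls back to another cpuidle driver such as acpi_idle, if one is available).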

Note: This is only one example of configuring C-states; for more detail, refer to the CPU C-states reference.

1.2 CPU C-state latency

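The latency constraint currently in effect can be read from /dev/cpu_dma_latency. The value is a 32-bit little-endian integer in microseconds, so the bytes shown by hexdump must be reversed before conversion: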

// server 1
# hexdump -C /dev/cpu_dma_latency
00000000  00 94 35 77                                       |..5w|
00000004
# echo $(( 0x77359400 ))
2000000000

// server 2
# hexdump -C /dev/cpu_dma_latency
00000000  01 00 00 00                                       |....|
00000004
# echo $(( 0x00000001 ))
1

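Here server 1 reports 2000000000 microseconds (2000 seconds), which is the kernel default and effectively means that no latency limit is set, so deep C-states are allowed. Server 2 reports 1 microsecond, so only states that can return to the C0 running state within 1 microsecond may be used.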
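
The manual byte reversal above can be wrapped in a small helper. This is a minimal sketch, not a standard tool: the function name read_latency_us is made up for illustration, and reading the device usually requires root.

```shell
# Decode the 4-byte little-endian value exposed by /dev/cpu_dma_latency.
# Usage: read_latency_us [FILE]   (FILE defaults to /dev/cpu_dma_latency)
# read_latency_us is a hypothetical helper name, not part of any package.
read_latency_us() {
    file=${1:-/dev/cpu_dma_latency}
    # od prints the first four bytes as unsigned decimals, e.g. "0 148 53 119"
    set -- $(od -An -tu1 -N4 "$file")
    [ "$#" -eq 4 ] || return 1
    # The first byte is the least significant one
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}
```

Called without arguments it reads the live device; passing a file name makes it easy to try on a saved 4-byte dump.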

1.3 CPU C-states monitor

The cpupower monitor command (cpupower-monitor) reports processor frequency and idle statistics, for example:

# cpupower monitor
    | Nehalem                   || Mperf              || Idle_Stats
 CPU| C3   | C6   | PC3  | PC6   || C0   | Cx   | Freq  || POLL | C1   | C1E  | C6
   0|  0.00|  0.00|  0.00|  0.00||  0.53| 99.47|  2084||  0.00|  0.01| 99.52|  0.00
  24|  0.00|  0.00|  0.00|  0.00||  0.26| 99.74|  2252||  0.00|  0.24| 99.51|  0.00
   1|  0.00|  0.00|  0.00|  0.00||  0.33| 99.67|  2422||  0.00|  0.00| 99.67|  0.00
  25|  0.00|  0.00|  0.00|  0.00||  0.90| 99.10|  2580||  0.01|  0.26| 98.91|  0.00
   2|  0.00|  0.00|  0.00|  0.00||  0.20| 99.80|  1810||  0.00|  0.00| 99.81|  0.00
  26|  0.00|  0.00|  0.00|  0.00||  0.84| 99.16|  2867||  0.01|  0.29| 98.88|  0.00
   3|  0.00|  0.00|  0.00|  0.00||  0.83| 99.17|  2686||  0.01|  0.55| 98.66|  0.00
  27|  0.00|  0.00|  0.00|  0.00||  1.47| 98.53|  2979||  0.00|  0.00| 98.53|  0.00
   4|  0.00|  0.00|  0.00|  0.00||  0.40| 99.60|  1914||  0.00|  0.02| 99.66|  0.00
  28|  0.00|  0.00|  0.00|  0.00||  1.61| 98.39|  2995||  0.00|  0.00| 98.39|  0.00
   5|  0.00|  0.00|  0.00|  0.00||  0.73| 99.27|  2527||  0.00|  0.29| 99.03|  0.00

Three monitors (Nehalem, Mperf and Idle_Stats) report on the processors. From the report, we can see that most CPUs spend nearly all of their time in the C1E idle state.

To push the CPUs into the running (C0) state, we can generate load with stress. As the load rises, more and more CPUs move from the C1E idle state to C0; see the CPU C-states reference for details.
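
The same shift can be watched without cpupower by sampling the kernel's per-state counters in sysfs. Each state directory under /sys/devices/system/cpu/cpuN/cpuidle/ has a time file holding the total microseconds the CPU has spent in that state. The sketch below is only an illustration: idle_share_pct is a hypothetical helper name, and the state index in the commented usage is an assumption that should be checked against the state's name file.

```shell
# Percentage of an interval that a CPU spent in one idle state.
# idle_share_pct is a hypothetical helper name.
idle_share_pct() {
    # $1, $2: two readings of a cpuidle "time" counter (microseconds)
    # $3: wall-clock time between the two readings (microseconds)
    echo $(( ($2 - $1) * 100 / $3 ))
}

# Example use on a live system (state2 is an assumption; check its "name" file first):
# f=/sys/devices/system/cpu/cpu0/cpuidle/state2/time
# t0=$(cat "$f"); sleep 1; t1=$(cat "$f")
# idle_share_pct "$t0" "$t1" 1000000
```

Under load the reported share for the idle state should drop, since the CPU spends more of each interval in C0. The result is truncated to a whole percent.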

1.4 CPU C-states driver

To use the CPU C-states, the kernel needs a cpuidle driver, either acpi_idle or intel_idle.

  • "acpi_idle" cpuidle driver: The acpi_idle cpuidle driver retrieves available sleep states (C-states) from the ACPI BIOS tables (from the _CST ACPI function on recent platforms or from the FADT BIOS table on older ones). The C1 state is not retrieved from ACPI tables. If the C1 state is entered, the kernel will call the hlt instruction (or mwait on Intel).
  • "intel_idle" cpuidle driver: In kernel 2.6.36 the intel_idle driver was introduced. It only serves recent Intel CPUs (Nehalem, Westmere, Sandybridge, Atoms or newer). On older Intel CPUs the acpi_idle driver is still used (if the BIOS provides C-state ACPI tables). The intel_idle driver knows the sleep state capabilities of the processor and ignores ACPI BIOS exported processor sleep states tables.

To check which cpuidle driver is in use, read /sys/devices/system/cpu/cpuidle/current_driver:

// server 1
# cat /sys/devices/system/cpu/cpuidle/current_driver
intel_idle

// server 2
# cat /sys/devices/system/cpu/cpuidle/current_driver
acpi_idle
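
Whichever driver is active, the states it offers are exported per CPU under /sys/devices/system/cpu/cpuN/cpuidle/, one stateN directory each, with a name file and a latency file (exit latency in microseconds). The helper below is a minimal sketch for listing them; list_idle_states is a made-up name, and the actual set of states depends on the CPU and driver.

```shell
# List the idle states the active cpuidle driver exposes for one CPU.
# Usage: list_idle_states [DIR]
# DIR defaults to CPU 0's cpuidle directory in sysfs.
# list_idle_states is a hypothetical helper name.
list_idle_states() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    for state in "$dir"/state*; do
        [ -r "$state/name" ] || continue
        # name: state label (POLL, C1, C1E, ...); latency: exit latency in microseconds
        printf '%s %s us\n' "$(cat "$state/name")" "$(cat "$state/latency")"
    done
}
```

Deeper states should show larger exit latencies, which matches the trade-off described in the overview. Note that the glob sorts names as text, so a state10 directory would be listed before state2.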

2. Reference