Transparent Hugepage Support (translated by ChatGPT)

Published 2023-12-01 19:33:39, author: 摩斯电码

Original: https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html

Objective

Performance critical computing applications dealing with large memory working sets are already running on top of libhugetlbfs and in turn hugetlbfs. Transparent Hugepage Support (THP) is an alternative means of backing virtual memory with huge pages, supporting the automatic promotion and demotion of page sizes, without the shortcomings of hugetlbfs.

Currently THP only works for anonymous memory mappings and tmpfs/shmem. But in the future it can expand to other filesystems.

Note
in the examples below we presume that the basic page size is 4K and the huge page size is 2M, although the actual numbers may vary depending on the CPU architecture.

The reason applications run faster is because of two factors. The first factor consists in taking a single page fault for each 2M virtual region touched by userland (so reducing the enter/exit kernel frequency by a factor of 512). This only matters the first time the memory is accessed for the lifetime of a memory mapping, so it is almost completely irrelevant and not of significant interest, and it also has the downside of requiring a larger clear-page/copy-page in page faults, which is a potentially negative effect. The second, long lasting and much more important factor will affect all subsequent accesses to the memory for the whole runtime of the application. The second factor consists of two components:

  1. the TLB miss will run faster (especially with virtualization using nested pagetables but almost always also on bare metal without virtualization)

  2. a single TLB entry will be mapping a much larger amount of virtual memory, in turn reducing the number of TLB misses. With virtualization and nested pagetables, the TLB can map a larger size only if both KVM and the Linux guest are using hugepages, but a significant speedup already happens if only one of the two is using hugepages, just because the TLB miss is going to run faster.

My understanding: huge pages bring two benefits. One is that they reduce the number of page faults, which needs no further explanation. The other, easier to overlook, is that they reduce TLB misses: when a TLB miss does occur, the MMU has to access memory to walk the page tables, which increases latency, and this is even more pronounced in virtualization scenarios.

THP can be enabled system wide or restricted to certain tasks or even memory ranges inside a task's address space. Unless THP is completely disabled, there is a khugepaged daemon that scans memory and collapses sequences of basic pages into huge pages.

The THP behaviour is controlled via the sysfs interface and via the madvise(2) and prctl(2) system calls.
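
As a minimal C sketch (an illustration, not part of the original document) of the per-task control path, the program below uses prctl(PR_SET_THP_DISABLE) to opt the calling task out of THP; the per-range madvise(MADV_HUGEPAGE) path is shown further below.

#include <stdio.h>
#include <sys/prctl.h>

int main(void)
{
    /* Disable THP for this task; the flag is inherited by children
     * created with fork() and preserved across execve(). */
    if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_THP_DISABLE)");
        return 1;
    }

    /* PR_GET_THP_DISABLE returns the current flag (1 = disabled). */
    printf("THP disabled for this task: %d\n",
           prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0));
    return 0;
}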

Transparent Hugepage Support maximizes the usefulness of free memory compared to the reservation approach of hugetlbfs by allowing all unused memory to be used as cache or other movable (or even unmovable) entities. It doesn't require reservation to prevent hugepage allocation failures from being noticeable from userland. It allows paging and all other advanced VM features to be available on the hugepages. It requires no modifications for applications to take advantage of it.

Applications however can be further optimized to take advantage of this feature, as for example they've been optimized before to avoid a flood of mmap system calls for every malloc(4k). Optimizing userland is by no means mandatory, and khugepaged can already take care of long lived page allocations even for hugepage unaware applications that deal with large amounts of memory.

In certain cases when hugepages are enabled system wide, an application may end up allocating more memory resources. An application may mmap a large region but only touch 1 byte of it; in that case a 2M page might be allocated instead of a 4k page for no good reason. This is why it's possible to disable hugepages system-wide and to only have them inside MADV_HUGEPAGE madvise regions.

Embedded systems should enable hugepages only inside madvise regions to eliminate any risk of wasting any precious byte of memory and to only run faster.

Applications that get a lot of benefit from hugepages and that don't risk losing memory by using hugepages should use madvise(MADV_HUGEPAGE) on their critical mmapped regions.
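
A minimal sketch of that advice, assuming a 2M huge page size (the hpage_pmd_size file described below gives the real, architecture-dependent value):

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)   /* assumed huge page size */

int main(void)
{
    size_t len = 64 * HPAGE_SIZE;        /* 128M working set */

    /* Anonymous private mappings are THP candidates. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Mark only this critical region as a huge page candidate; the rest
     * of the address space keeps the system-wide policy. */
    if (madvise(buf, len, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");

    /* The first touch of each 2M extent can now be served by one THP. */
    for (size_t off = 0; off < len; off += HPAGE_SIZE)
        ((char *)buf)[off] = 1;

    munmap(buf, len);
    return 0;
}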

sysfs

Global THP controls

Transparent Hugepage Support for anonymous memory can be entirely disabled (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE regions (to avoid the risk of consuming more memory resources) or enabled system wide. This can be achieved with one of:

echo always >/sys/kernel/mm/transparent_hugepage/enabled
echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
echo never >/sys/kernel/mm/transparent_hugepage/enabled

It's also possible to limit the defrag efforts in the VM to generate anonymous hugepages, in case they're not immediately free, to madvise regions only, or to never try to defrag memory and simply fall back to regular pages unless hugepages are immediately available. Clearly if we spend CPU time to defrag memory, we would expect to gain even more by the fact we use hugepages later instead of regular pages. This isn't always guaranteed, but it may be more likely in case the allocation is for a MADV_HUGEPAGE region.

echo always >/sys/kernel/mm/transparent_hugepage/defrag
echo defer >/sys/kernel/mm/transparent_hugepage/defrag
echo defer+madvise >/sys/kernel/mm/transparent_hugepage/defrag
echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
echo never >/sys/kernel/mm/transparent_hugepage/defrag
  • always
    means that an application requesting THP will stall on allocation failure and directly reclaim pages and compact memory in an effort to allocate a THP immediately. This may be desirable for virtual machines that benefit heavily from THP use and are willing to delay the VM start to utilise them.

  • defer
    means that an application will wake kswapd in the background to reclaim pages and wake kcompactd to compact memory so that THP is available in the near future. It's the responsibility of khugepaged to then install the THP pages later.

  • defer+madvise
    will enter direct reclaim and compaction like always, but only for regions that have used madvise(MADV_HUGEPAGE); all other regions will wake kswapd in the background to reclaim pages and wake kcompactd to compact memory so that THP is available in the near future.

  • madvise
    will enter direct reclaim like always, but only for regions that have used madvise(MADV_HUGEPAGE). This is the default behaviour.

  • never
    should be self-explanatory.

By default the kernel tries to use the huge zero page on a read page fault to an anonymous mapping. It's possible to disable the huge zero page by writing 0 or enable it back by writing 1:

echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page

Some userspace (such as a test program, or an optimized memory allocation library) may want to know the size (in bytes) of a transparent hugepage:

cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
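
For example, a C program could read this file instead of hard-coding 2M; a minimal sketch:

#include <stdio.h>

int main(void)
{
    unsigned long hpage_pmd_size = 0;
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", "r");

    if (f && fscanf(f, "%lu", &hpage_pmd_size) == 1)
        printf("THP size: %lu bytes\n", hpage_pmd_size);
    else
        fprintf(stderr, "could not read hpage_pmd_size\n");
    if (f)
        fclose(f);
    return 0;
}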

khugepaged will be automatically started when transparent_hugepage/enabled is set to "always" or "madvise", and it'll be automatically shut down if it's set to "never".

Khugepaged controls

khugepaged usually runs at low frequency, so while one may not want to invoke defrag algorithms synchronously during page faults, it should be worth invoking defrag at least in khugepaged. However it's also possible to disable defrag in khugepaged by writing 0 or enable defrag in khugepaged by writing 1:

echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag

You can also control how many pages khugepaged should scan at each pass:

/sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan

and how many milliseconds to wait in khugepaged between each pass (you can set this to 0 to run khugepaged at 100% utilization of one core):

/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs

and how many milliseconds to wait in khugepaged if there's a hugepage allocation failure, to throttle the next allocation attempt:

/sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs
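
These knobs are ordinary sysfs text files, so they can be written from a program as well as with echo; a minimal sketch, where the values are purely illustrative (not tuning advice) and root privileges are assumed:

#include <stdio.h>

/* Write a decimal value to one khugepaged sysfs file (needs root). */
static int write_knob(const char *name, const char *value)
{
    char path[256];
    snprintf(path, sizeof(path),
             "/sys/kernel/mm/transparent_hugepage/khugepaged/%s", name);

    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    fprintf(f, "%s\n", value);
    return fclose(f);
}

int main(void)
{
    write_knob("pages_to_scan", "4096");          /* pages per pass */
    write_knob("scan_sleep_millisecs", "10000");  /* 10s between passes */
    write_knob("alloc_sleep_millisecs", "60000"); /* 60s back-off on failure */
    return 0;
}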

The khugepaged progress can be seen in the number of pages collapsed (note that this counter may not be an exact count of the number of pages collapsed, since "collapsed" could mean multiple things: (1) A PTE mapping being replaced by a PMD mapping, or (2) All 4K physical pages replaced by one 2M hugepage. Each may happen independently, or together, depending on the type of memory and the failures that occur. As such, this value should be interpreted roughly as a sign of progress, and counters in /proc/vmstat consulted for more accurate accounting):

/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed

for each pass:

/sys/kernel/mm/transparent_hugepage/khugepaged/full_scans

max_ptes_none specifies how many extra small pages (that are not already mapped) can be allocated when collapsing a group of small pages into one large page:

/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none

A higher value leads to programs using additional memory; a lower value leads to less THP performance gain. The CPU time cost of max_ptes_none is negligible, so you can ignore it.

max_ptes_swap specifies how many pages can be brought in from swap when collapsing a group of pages into a transparent huge page:

/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap

A higher value can cause excessive swap IO and waste memory. A lower value can prevent THPs from being collapsed, resulting in fewer pages being collapsed into THPs and lower memory access performance.

max_ptes_shared specifies how many pages can be shared across multiple processes. Exceeding the number would block the collapse:

/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared

A higher value may increase memory footprint for some workloads.

Boot parameter

You can change the sysfs boot time defaults of Transparent Hugepage Support by passing the parameter transparent_hugepage=always or transparent_hugepage=madvise or transparent_hugepage=never to the kernel command line.

Hugepages in tmpfs/shmem

You can control the hugepage allocation policy in tmpfs with the mount option huge=. It can have the following values:

  • always
    Attempt to allocate huge pages every time we need a new page;

  • never
    Do not allocate huge pages;

  • within_size
    Only allocate huge page if it will be fully within i_size. Also respect fadvise()/madvise() hints;

  • advise
    Only allocate huge pages if requested with fadvise()/madvise();

The default policy is never.

mount -o remount,huge= /mountpoint works fine after mount: remounting huge=never will not attempt to break up huge pages at all, just stop more from being allocated.
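
The same huge= option can also be passed through mount(2) from a program; a minimal sketch, where /mnt/thp-tmpfs is a hypothetical mount point that must already exist and CAP_SYS_ADMIN is required:

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* Equivalent to: mount -t tmpfs -o size=1G,huge=within_size none /mnt/thp-tmpfs */
    if (mount("none", "/mnt/thp-tmpfs", "tmpfs", 0,
              "size=1G,huge=within_size") != 0) {
        perror("mount");
        return 1;
    }
    return 0;
}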

There's also a sysfs knob to control the hugepage allocation policy for the internal shmem mount: /sys/kernel/mm/transparent_hugepage/shmem_enabled. The mount is used for SysV SHM, memfds, shared anonymous mmaps (of /dev/zero or MAP_ANONYMOUS), GPU drivers' DRM objects, Ashmem.

In addition to policies listed above, shmem_enabled allows two further values:

  • deny
    For use in emergencies, to force the huge option off from all mounts;

  • force
    Force the huge option on for all - very useful for testing;

Need of application restart

The transparent_hugepage/enabled values and tmpfs mount option only affect future behavior. So to make them effective you need to restart any application that could have been using hugepages. This also applies to the regions registered in khugepaged.

Monitoring usage

The number of anonymous transparent huge pages currently used by the system is available by reading the AnonHugePages field in /proc/meminfo. To identify what applications are using anonymous transparent huge pages, it is necessary to read /proc/PID/smaps and count the AnonHugePages fields for each mapping.

The number of file transparent huge pages mapped to userspace is available by reading ShmemPmdMapped and ShmemHugePages fields in /proc/meminfo. To identify what applications are mapping file transparent huge pages, it is necessary to read /proc/PID/smaps and count the FileHugeMapped fields for each mapping.

Note that reading the smaps file is expensive and reading it frequently will incur overhead.
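
A minimal sketch of the per-process accounting described above, summing the AnonHugePages fields of one process's smaps (pass a PID as the first argument, or it defaults to self):

#include <stdio.h>

int main(int argc, char **argv)
{
    char path[64], line[256];
    unsigned long kb, total_kb = 0;

    snprintf(path, sizeof(path), "/proc/%s/smaps",
             argc > 1 ? argv[1] : "self");
    FILE *f = fopen(path, "r");
    if (!f) {
        perror(path);
        return 1;
    }

    /* Each mapping reports "AnonHugePages:  <n> kB"; sum them all. */
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "AnonHugePages: %lu kB", &kb) == 1)
            total_kb += kb;
    fclose(f);

    printf("%s: %lu kB of anonymous THP\n", path, total_kb);
    return 0;
}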

There are a number of counters in /proc/vmstat that may be used to monitor how successfully the system is providing huge pages for use.

  • thp_fault_alloc
    is incremented every time a huge page is successfully allocated to handle a page fault.

  • thp_collapse_alloc
    is incremented by khugepaged when it has found a range of pages to collapse into one huge page and has successfully allocated a new huge page to store the data.

  • thp_fault_fallback
    is incremented if a page fault fails to allocate a huge page and instead falls back to using small pages.

  • thp_fault_fallback_charge
    is incremented if a page fault fails to charge a huge page and instead falls back to using small pages even though the allocation was successful.

  • thp_collapse_alloc_failed
    is incremented if khugepaged found a range of pages that should be collapsed into one huge page but failed the allocation.

  • thp_file_alloc
    is incremented every time a file huge page is successfully allocated.

  • thp_file_fallback
    is incremented if a file huge page is attempted to be allocated but fails and instead falls back to using small pages.

  • thp_file_fallback_charge
    is incremented if a file huge page cannot be charged and instead falls back to using small pages even though the allocation was successful.

  • thp_file_mapped
    is incremented every time a file huge page is mapped into user address space.

  • thp_split_page
    is incremented every time a huge page is split into base pages. This can happen for a variety of reasons, but a common reason is that a huge page is old and is being reclaimed. This action implies splitting all PMDs the page is mapped with.

  • thp_split_page_failed
    is incremented if kernel fails to split huge page. This can happen if the page was pinned by somebody.

  • thp_deferred_split_page
    is incremented when a huge page is put onto split queue. This happens when a huge page is partially unmapped and splitting it would free up some memory. Pages on split queue are going to be split under memory pressure.

  • thp_split_pmd
    is incremented every time a PMD is split into a table of PTEs. This can happen, for instance, when an application calls mprotect() or munmap() on part of a huge page. It doesn't split the huge page, only the page table entry.

  • thp_zero_page_alloc
    is incremented every time a huge zero page used for thp is successfully allocated. Note, it doesn't count every map of the huge zero page, only its allocation.

  • thp_zero_page_alloc_failed
    is incremented if kernel fails to allocate huge zero page and falls back to using small pages.

  • thp_swpout
    is incremented every time a huge page is swapped out in one piece without splitting.

  • thp_swpout_fallback
    is incremented if a huge page has to be split before swapout, usually because the kernel failed to allocate some contiguous swap space for the huge page.

As the system ages, allocating huge pages may be expensive as the system uses memory compaction to copy data around memory to free a huge page for use. There are some counters in /proc/vmstat to help monitor this overhead.

  • compact_stall
    is incremented every time a process stalls to run memory compaction so that a huge page is free for use.

  • compact_success
    is incremented if the system compacted memory and freed a huge page for use.

  • compact_fail
    is incremented if the system tries to compact memory but fails.

It is possible to establish how long the stalls were by using the function tracer to record how long was spent in __alloc_pages(), and by using the mm_page_alloc tracepoint to identify which allocations were for huge pages.

Optimizing the applications

To be guaranteed that the kernel will map a 2M page immediately in any memory region, the mmap region has to be hugepage naturally aligned. posix_memalign() can provide that guarantee.
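
A minimal sketch of that optimization, again assuming a 2M huge page size (read hpage_pmd_size for the real value):

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    size_t align = 2UL * 1024 * 1024;   /* assumed THP size; see hpage_pmd_size */
    size_t len = 16 * align;
    void *buf = NULL;

    /* A hugepage-aligned start address lets the first fault in each
     * 2M extent be mapped with a huge page immediately. */
    if (posix_memalign(&buf, align, len) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    /* Also advise the region in case the system policy is "madvise". */
    madvise(buf, len, MADV_HUGEPAGE);

    ((char *)buf)[0] = 1;               /* first touch can fault in a 2M page */
    free(buf);
    return 0;
}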

Hugetlbfs

You can use hugetlbfs on a kernel that has transparent hugepage support enabled just fine as always. No difference can be noted in hugetlbfs other than there will be less overall fragmentation. All usual features belonging to hugetlbfs are preserved and unaffected. libhugetlbfs will also work fine as usual.