Scheduler 54: The RTG Feature of WALT

Note: this article analyzes the Qcom msm-5.4 kernel, Android 12.

I. Overview

RTG stands for related thread group: as the name suggests, a group of threads that are related to each other.

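Consider a scenario with two benchmark threads, thread0 and thread1. thread0 runs for a while, wakes thread1, and then goes to sleep. Likewise, thread1 runs for a while, wakes thread0, and goes to sleep, and the cycle repeats. Suppose thread0 and thread1 run on different CPUs, as in the following diagram: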

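Also assume that thread0 and thread1 run for the same length of time on every turn. CPU0 and CPU1 are then each 50% utilized, which generally does not meet a DCVS ramp-up condition such as 80% CPU utilization. Because frequency selection uses the util of the CPU with the largest util in the cluster, the chosen frequency will not be very high either.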

What if thread0 and thread1 both run on the same CPU, as in the following diagram:

Then CPU0 stays at 100% utilization, and the CPU frequency keeps being raised until it reaches the maximum.

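The same test case produces completely different frequency results simply because the scheduler placed the tasks on different CPUs. A similar situation is an application's main thread and its render thread in the Android display pipeline: there is a dependency between them, but scheduling and DCVS ignore that dependency. This is why RTG was introduced.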
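The arithmetic behind the two placements can be checked with a small sketch. The function names, the 80% threshold constant, and the assumption that the governor looks only at the busiest CPU are simplifications taken from the description above; none of this is kernel code.

```c
#include <stddef.h>

/* Illustrative ramp-up threshold in percent, from the 80% figure above. */
#define RAMP_UP_THRESHOLD_PCT 80

/*
 * Busy percentage of each CPU over one ping-pong period.
 * run_us[i] is how long thread i runs per period and cpu_of[i] is the CPU it
 * runs on. The threads never overlap, so the period is the sum of run times.
 */
static void cpu_busy_pct(const unsigned int *run_us, const int *cpu_of,
                         size_t nr_threads, unsigned int *busy_pct,
                         size_t nr_cpus)
{
    unsigned int period_us = 0;
    size_t i;

    for (i = 0; i < nr_cpus; i++)
        busy_pct[i] = 0;
    for (i = 0; i < nr_threads; i++)
        period_us += run_us[i];
    if (period_us == 0)
        return;
    for (i = 0; i < nr_threads; i++) {
        /* Ignore threads mapped to a CPU outside the cluster. */
        if ((size_t)cpu_of[i] < nr_cpus)
            busy_pct[cpu_of[i]] += run_us[i] * 100 / period_us;
    }
}

/* The cluster frequency is driven by its busiest CPU. */
static unsigned int cluster_max_pct(const unsigned int *busy_pct,
                                    size_t nr_cpus)
{
    unsigned int max = 0;
    size_t i;

    for (i = 0; i < nr_cpus; i++)
        if (busy_pct[i] > max)
            max = busy_pct[i];
    return max;
}

/* Nonzero when the busiest CPU reaches the illustrative threshold. */
static int should_ramp_up(const unsigned int *busy_pct, size_t nr_cpus)
{
    return cluster_max_pct(busy_pct, nr_cpus) >= RAMP_UP_THRESHOLD_PCT;
}
```

With two threads of 5000us each, placing them on CPU0 and CPU1 yields 50% on both CPUs and no ramp-up, while placing both on CPU0 yields 100% on CPU0 and a ramp-up.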

II. Related structures

1. struct walt_task_struct

struct walt_task_struct { //linux/sched.h
    ...
    u32                coloc_demand;
    struct walt_related_thread_group __rcu *grp;
    struct list_head    grp_list;
    u32                unfilter;
};

This structure is embedded in task_struct, one per task. Its RTG-related members are:

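coloc_demand: set in update_history() to the task's average load over its most recent 5 windows; _set_preferred_cluster() accumulates it when computing the aggregated load.
grp_list: in add_task_to_group(), the task is linked onto the grp->tasks list through this member.
grp: set in add_task_to_group() to point to the group the task is attached to; task_in_related_thread_group() treats a non-NULL value as meaning the task is in a group.
unfilter: in update_history(), if the task's load is greater than 35, the task is not filtered for the next 100ms. The values 35 and 100 are configured through the sysctl files sched_min_task_util_for_colocation and sched_task_unfilter_period.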
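The coloc_demand and unfilter updates described above can be sketched in user-space C as follows. This is a simplified illustration, not the kernel's update_history(): the struct, the function name, and the two variables standing in for the sysctl knobs are hypothetical, and the step that counts unfilter down by one window when the load is below the threshold is an assumption about how the 100ms period expires.

```c
#include <stdint.h>

#define HIST_SIZE 5 /* number of history windows averaged, per the text */

/* Hypothetical stand-ins for the two sysctl knobs named above. */
static unsigned int min_task_util_for_colocation = 35;
static unsigned int task_unfilter_period_ns = 100000000; /* 100 ms */

struct task_sketch {
    uint32_t sum_history[HIST_SIZE]; /* per-window runtime samples */
    uint32_t coloc_demand;           /* average of the samples */
    uint32_t unfilter;               /* remaining "do not filter" time, ns */
};

static void update_history_sketch(struct task_sketch *p, uint32_t runtime,
                                  uint32_t demand_scaled, uint32_t window_ns)
{
    uint64_t sum = 0;
    int i;

    /* Drop the oldest window and record the newest one at index 0. */
    for (i = HIST_SIZE - 1; i > 0; i--)
        p->sum_history[i] = p->sum_history[i - 1];
    p->sum_history[0] = runtime;

    /* coloc_demand is the plain average over the last HIST_SIZE windows. */
    for (i = 0; i < HIST_SIZE; i++)
        sum += p->sum_history[i];
    p->coloc_demand = (uint32_t)(sum / HIST_SIZE);

    if (demand_scaled > min_task_util_for_colocation) {
        /* Load above the threshold: (re)start the unfilter period. */
        p->unfilter = task_unfilter_period_ns;
    } else if (p->unfilter) {
        /* Assumption: the period counts down by one window per update. */
        p->unfilter = p->unfilter > window_ns ? p->unfilter - window_ns : 0;
    }
}
```

A task whose load exceeds 35 in one window keeps a nonzero unfilter for the following windows even if its load then drops, which is the behavior the 100ms setting is meant to provide.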

2. struct walt_related_thread_group

struct walt_related_thread_group { //sched/sched.h
    int id;
    raw_spinlock_t lock;
    struct list_head tasks;
    struct list_head list;
    bool skip_min;
    struct rcu_head rcu;
    u64 last_update;
    u64 downmigrate_ts;
    u64 start_ts;
};

This structure describes one RTG group. Its members are:

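id: the unique identifier of each related_thread_group, starting from 1; the system default RTG group, DEFAULT_CGROUP_COLOC_ID, is 1. alloc_related_thread_groups() creates 20 groups in one go and uses the index as the id.
tasks: a list of the tasks that belong to the group, added or removed through add_task_to_group()/remove_task_from_group().
list: once a group is in use and has tasks attached, it is considered active; __sched_set_group_id() links it into the global active_related_thread_groups list through grp->list.
last_update: used in _set_preferred_cluster() to keep that function from running too often; the shortest update interval is limited to 1.6ms (sched_ravg_window / 10).
skip_min: assigned in update_best_cluster()/_set_preferred_cluster(); it appears to mean that the little cores are skipped. Its value is printed by trace_sched_set_preferred_cluster/trace_sched_task_util.
downmigrate_ts: used in update_best_cluster().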
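The rate limit tied to last_update can be sketched as below. This is a minimal illustration of the "at most once per sched_ravg_window / 10" rule described above; the struct and function names are hypothetical, and the real _set_preferred_cluster() does considerably more work after this check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced group: only the field needed for rate limiting. */
struct group_sketch {
    uint64_t last_update; /* timestamp of the last accepted update, ns */
};

/*
 * Returns true when at least window_ns / 10 has elapsed since the last
 * accepted update, and records now_ns as the new last_update. Returns
 * false (and changes nothing) when called again too soon.
 */
static bool preferred_cluster_update_due(struct group_sketch *grp,
                                         uint64_t now_ns, uint64_t window_ns)
{
    if (now_ns - grp->last_update < window_ns / 10)
        return false;
    grp->last_update = now_ns;
    return true;
}
```

With a 16ms window the interval is 1.6ms, so a call 1ms after the previous accepted update is rejected and a call at 1.6ms or later is accepted.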

3. struct walt_task_group

struct walt_task_group {
    bool sched_boost_no_override;
    bool sched_boost_enabled;
    bool colocate;
    bool colocate_update_disabled;
};

This structure is embedded in struct task_group and mainly implements the colocate-related logic of the cpu cgroup. Its members are:

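sched_boost_no_override: initialized to false in walt_init_sched_boost() and changed by writing the file cpu.uclamp.sched_boost_no_override. On the path sched_boosts[CONSERVATIVE_BOOST].enter --> sched_conservative_boost_enter --> update_cgroup_boost_settings(), every tg is visited: if tg->wtg.sched_boost_no_override is true the tg is skipped, otherwise tg->wtg.sched_boost_enabled is set to false.
sched_boost_enabled: initialized to false in walt_init_sched_boost(tg). It is set to false when entering CONSERVATIVE_BOOST as described above, and set to true when leaving CONSERVATIVE_BOOST, on the path [CONSERVATIVE_BOOST].exit --> sched_conservative_boost_exit --> restore_cgroup_boost_settings --> tg->wtg.sched_boost_enabled = true.
colocate: stores the value written to the cpu.uclamp.colocate file; reading the file with cat returns this member directly. When a newly created task is woken, uclamp_task_colocated() adds it to the default RTG group if this value is nonzero for the task's task_group. When a task is attached to a cpu cgroup, it is likewise added to the default RTG group if this value is nonzero for its task_group. Note that the value can only be set once, at initialization.
colocate_update_disabled: initialized to false in walt_init_sched_boost(*tg). When the cpu cgroup's cpu.uclamp.colocate file is written, a true value makes the write fail with a permission error; a false value is set to true and the corresponding logic runs.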
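The write-once behavior of colocate and the enter/exit handling of sched_boost_enabled can be sketched as below. This is a user-space illustration of the logic described above, not the kernel source: the struct and the three function names are hypothetical, and returning -EPERM is an assumption based on the "no permission" result mentioned in the text.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the four flags in struct walt_task_group. */
struct wtg_sketch {
    bool sched_boost_no_override;
    bool sched_boost_enabled;
    bool colocate;
    bool colocate_update_disabled;
};

/* Write handler sketch for cpu.uclamp.colocate: only the first write wins. */
static int colocate_write_sketch(struct wtg_sketch *wtg, unsigned long value)
{
    if (wtg->colocate_update_disabled)
        return -EPERM; /* assumed error code for "no permission" */
    wtg->colocate = !!value;
    wtg->colocate_update_disabled = true;
    return 0;
}

/* Entering CONSERVATIVE_BOOST: disable boost unless the group opted out. */
static void update_boost_settings_sketch(struct wtg_sketch *tgs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tgs[i].sched_boost_no_override)
            continue;
        tgs[i].sched_boost_enabled = false;
    }
}

/* Leaving CONSERVATIVE_BOOST: boost is enabled again for every group. */
static void restore_boost_settings_sketch(struct wtg_sketch *tgs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        tgs[i].sched_boost_enabled = true;
}
```

In this sketch a second write to colocate is rejected and leaves the stored value unchanged, and a group with sched_boost_no_override set keeps its sched_boost_enabled value across the boost entry.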

4. struct walt_sched_cluster

struct walt_sched_cluster {
    ...
    u64            aggr_grp_load;
};

aggr_grp_load: the aggregated load. It is computed in walt_irq_work(), and freq_policy_load() uses it for frequency selection.

5. struct walt_rq
