[Multiprocessing] Choosing the number of processes for CPU-bound tasks in Python multiprocessing

Published 2024-01-05 17:14:15, author: BJFU-VTH

Experiment approach

Sum the integers from 1 to 100000000, once with a single process and once with multiple processes.

Experiment code
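
Because this sum has a closed form, the result of either approach can be checked independently of the timing code. A minimal sketch; the helper name `expected_sum` is introduced here for illustration and is not part of the original code:

```python
def expected_sum(n: int) -> int:
    """Closed-form value of 1 + 2 + ... + n (Gauss's formula)."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    # Cross-check the formula against a brute-force sum on a small input.
    assert expected_sum(100) == sum(range(1, 101))
    print(expected_sum(100_000_000))
```

Comparing both the single-process and the multi-process result against this value catches off-by-one errors in how the range is split.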

from multiprocessing import Pool, Process, Queue
import time


def test_func(left, right):
    res = 0
    for i in range(left, right):
        res += i
    return res


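def join(q):
    """Merge process: add up partial sums until the None sentinel arrives."""
    r = 0
    while True:
        res = q.get()
        if res is None:
            break
        print('merge process is working')
        r += res
    # Hand the total back to the parent through the same queue
    q.put(r)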


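def long_time_task(left, right, queue):
    """
    Worker process: sum one chunk and put the partial result on the queue.
    :param left: start of the chunk (inclusive)
    :param right: end of the chunk (exclusive)
    :param queue: queue shared with the merge process
    :return: None
    """
    res = test_func(left, right)
    print('worker process finished its chunk')
    queue.put(res)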


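def get_split_step(target, thread_num):
    """
    Build the task list; its length always equals the number of processes.
    Each entry is a half-open range [left, right); together they cover 1..target.
    :param target: upper bound of the sum (inclusive)
    :param thread_num: number of worker processes
    :return: list of [left, right] pairs
    """
    step = target // thread_num
    remains = target % thread_num
    res = []
    for i in range(thread_num):
        res.append([i*step + 1, (i+1)*step + 1])
    # The last chunk absorbs the remainder
    res[-1][-1] += remains
    return res


def get_res(target):
    """Single-process baseline: sum 1..target inclusive."""
    r = 0
    for i in range(1, target + 1):
        r += i
    return r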


if __name__ == '__main__':
    multi_start = time.time()
    queue = Queue()
    p_res_merge = Process(target=join, args=(queue, ))
    p_list = []
    thread_num = 8
    target = 100000000
    task_list = get_split_step(target, thread_num)
    for i in range(thread_num):
        p_list.append(Process(target=long_time_task, args=(task_list[i][0], task_list[i][1], queue)))
    for pp in p_list:
        pp.start()
    p_res_merge.start()
    for pp in p_list:
        pp.join()
    # All workers have finished: send the sentinel so the merge process stops
    queue.put(None)
    p_res_merge.join()
    multi_end = time.time()
    print(f"multi-process res: {queue.get()}, cost: {multi_end-multi_start}")
    single_start = time.time()
    single_res = get_res(target)
    single_end = time.time()
    print(f"single-process res: {single_res}, cost: {single_end-single_start}")
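
The same experiment can also be written with `multiprocessing.Pool`, which returns the partial sums directly and so needs neither a queue nor a separate merge process. This is a sketch of an alternative, not the code that was measured here; `sum_range` and `split_range` are names introduced for illustration:

```python
from multiprocessing import Pool


def sum_range(bounds):
    """Sum the integers in the half-open interval [left, right)."""
    left, right = bounds
    total = 0
    for i in range(left, right):
        total += i
    return total


def split_range(target, parts):
    """Split 1..target (inclusive) into `parts` half-open chunks."""
    step, remains = divmod(target, parts)
    chunks = [(i * step + 1, (i + 1) * step + 1) for i in range(parts)]
    # The last chunk absorbs the remainder so no integer is dropped.
    left, right = chunks[-1]
    chunks[-1] = (left, right + remains)
    return chunks


if __name__ == "__main__":
    target, workers = 100_000_000, 8
    with Pool(processes=workers) as pool:
        # map() blocks until every chunk is done and returns results in order.
        partials = pool.map(sum_range, split_range(target, workers))
    print(sum(partials))
```

The pool still has to start its worker processes, so the startup cost does not disappear; it only becomes easier to reuse the same workers across several batches.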

Results on a Mac M1 (8 cores):

Results on Linux (48 cores):

From these results, we infer that roughly 70% of the cost goes to process creation and switching.
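
The post does not say how this percentage was derived. One possible way to estimate it, shown here purely as an assumption, is to compare the measured multi-process time with the ideal time under perfect scaling (single-process time divided by the number of workers). The function name `overhead_fraction` and the timings below are hypothetical:

```python
def overhead_fraction(single_cost: float, multi_cost: float, workers: int) -> float:
    """Share of the multi-process wall time not explained by ideal scaling.

    Ideal scaling assumes the work divides perfectly: single_cost / workers.
    """
    ideal = single_cost / workers
    return 1.0 - ideal / multi_cost


if __name__ == "__main__":
    # Hypothetical timings in seconds, NOT the measurements from this post.
    fraction = overhead_fraction(single_cost=2.4, multi_cost=1.0, workers=8)
    print(f"overhead share: {fraction:.0%}")
```

Note that this remainder lumps together everything that is not useful computation: process startup, scheduling, queue traffic, and uneven chunk sizes. It cannot separate those causes from one another.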

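Conclusion

Setting the maximum number of processes to the number of cores is reasonable, but the number of other processes already running on the system matters too: the more processes compete for the CPU, the less often each one gets scheduled. The right value for a specific application should therefore be determined by experiment. As a rule of thumb, though, the recommendation is maximum processes = number of cores.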
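
The experiment recommended above can be sketched as a small sweep over process counts, starting from `os.cpu_count()`. The helper names (`candidate_counts`, `busy_sum`, `time_with`) and the workload sizes are assumptions chosen for illustration, and the timings will differ from machine to machine:

```python
import os
import time
from multiprocessing import Pool


def candidate_counts(cores):
    """Process counts to try: powers of two below `cores`, plus `cores` itself."""
    counts = []
    n = 1
    while n < cores:
        counts.append(n)
        n *= 2
    counts.append(cores)
    return counts


def busy_sum(n):
    """CPU-bound stand-in task: sum the integers below n."""
    total = 0
    for i in range(n):
        total += i
    return total


def time_with(workers, chunks=48, chunk_size=1_000_000):
    """Wall time to run `chunks` equal tasks on a pool of `workers` processes."""
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(busy_sum, [chunk_size] * chunks)
    return time.perf_counter() - start


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cpu_count() may return None
    for workers in candidate_counts(cores):
        print(f"{workers:>3} processes: {time_with(workers):.2f}s")
```

Running the sweep on the target machine while it carries its usual load shows where adding processes stops paying off, which is the number the rule of thumb only approximates.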