[CMU 15-418] (Lecture4) Parallel Programming Basics

Posted: 2023-04-24 21:54:57  Author: zncleon

This series of posts contains my study notes for CMU 15-418/15-618: Parallel Computer Architecture and Programming, Fall 2018.
Course website: CMU 15-418/15-618: Parallel Computer Architecture and Programming
Reference: CMU 15-418 notes
Related resources and introduction: CMU 15-418/Stanford CS149: Parallel Computing - CS自学指南 (CS self-study guide)


Creating a parallel program

  • Decomposition: most of the time the programmer is responsible for breaking the problem into tasks (compilers are not yet sophisticated enough to do this automatically).
  • Assignment: many languages/runtimes can take on this responsibility. It can be done statically by the programmer (assigning workloads to pthreads by hand), statically by the compiler (ISPC foreach), or dynamically (ISPC tasks).
  • Orchestration: mostly happens at runtime, but must be declared or defined by the programmer (e.g., communication and synchronization).
  • Mapping: may be done by the OS (mapping pthreads to CPU cores), by the compiler (ISPC maps program instances to SIMD vector lanes), or by hardware (mapping CUDA thread blocks to GPU cores).
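The four steps above can be seen on a toy problem: summing a list. This is a minimal Python sketch. The function `parallel_sum` and its parameters are hypothetical and not from the lecture. Because of CPython's global interpreter lock, these threads will not actually speed up pure-Python arithmetic; the point is only to show where each step appears in a program.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(data, num_workers=4, chunk_size=1000):
    # Decomposition: break the problem into independent tasks (input chunks).
    tasks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Assignment (static, interleaved): task k is given to worker k % num_workers.
    assignment = [tasks[w::num_workers] for w in range(num_workers)]

    def worker(my_tasks):
        # Each worker computes a partial sum over the tasks assigned to it.
        return sum(sum(chunk) for chunk in my_tasks)

    # Mapping: the pool's threads are OS threads, scheduled onto cores by the OS.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(worker, my_tasks) for my_tasks in assignment]
        # Orchestration: wait for every worker, then combine the partial sums.
        return sum(f.result() for f in futures)


print(parallel_sum(list(range(10_000))))  # prints 49995000
```

Here the programmer does decomposition, assignment and orchestration by hand, and mapping is left to the OS. This matches the pthreads row of the list above.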

[Figure]


1. Decomposition

[Figure]


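Speedup is limited by the inherently sequential fraction S of the program (Amdahl's law). No matter how many processors are used, speedup ≤ 1/S.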

[Figure]
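This small Python helper evaluates Amdahl's law in its usual form: speedup(P) = 1 / (S + (1 − S) / P) for P processors. The name `amdahl_speedup` is hypothetical and used only for illustration. As P grows, the (1 − S) / P term vanishes and the bound approaches 1/S.

```python
def amdahl_speedup(s, p):
    """Maximum speedup on p processors when a fraction s of the work is
    inherently sequential: 1 / (s + (1 - s) / p)."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be in [0, 1]")
    if p < 1:
        raise ValueError("p must be >= 1")
    return 1.0 / (s + (1.0 - s) / p)


# With 10% sequential work the speedup creeps toward, but never reaches, 10.
for p in (1, 2, 4, 16, 1024):
    print(p, round(amdahl_speedup(0.1, p), 2))
```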


2. Assignment

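Statically: pthreads (the programmer decides which work each thread performs) and ISPC foreach (the abstraction leaves room for dynamic assignment, but the current ISPC implementation is static).
Dynamically: ISPC tasks (the runtime hands tasks to worker threads as they become free).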
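The difference between the two kinds of assignment can be sketched with Python threads. This is neither pthreads nor ISPC. `run_static`, `run_dynamic` and the shared-counter scheme are illustrative assumptions, not the lecture's code.

- In the static version, each worker's set of tasks is fixed before it starts, as when a programmer divides work among pthreads.
- In the dynamic version, workers claim task indices from a lock-protected counter at run time, which is similar in spirit to how a task runtime hands out work.

```python
import threading


def _run_threads(targets):
    # Start one thread per target function, then wait for all of them.
    threads = [threading.Thread(target=fn) for fn in targets]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_static(tasks, num_workers, work):
    """Static assignment: worker w owns tasks w, w+P, w+2P, ... decided up front."""
    results = [None] * len(tasks)

    def make_worker(w):
        def worker():
            for i in range(w, len(tasks), num_workers):
                results[i] = work(tasks[i])
        return worker

    _run_threads([make_worker(w) for w in range(num_workers)])
    return results


def run_dynamic(tasks, num_workers, work):
    """Dynamic assignment: each worker repeatedly claims the next unclaimed
    task from a shared counter, so a worker that finishes early takes more."""
    results = [None] * len(tasks)
    next_index = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_index
        while True:
            with lock:  # claim one task index atomically
                i = next_index
                next_index += 1
            if i >= len(tasks):
                return
            results[i] = work(tasks[i])

    _run_threads([worker] * num_workers)
    return results


def square(x):
    return x * x


print(run_static(list(range(8)), 3, square))   # [0, 1, 4, 9, 16, 25, 36, 49]
print(run_dynamic(list(range(8)), 3, square))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Both versions produce the same results. The dynamic one pays for a lock acquisition per task but adapts when tasks have uneven cost.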

[Figure]


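Considering only communication between processors:
Blocked assignment is better than interleaved assignment because it requires less inter-processor communication. The computation updates each red point using its cross-shaped neighborhood (the point plus its up, down, left and right neighbors). When that neighborhood reaches into rows owned by another processor, data must be communicated between processors (see the lecture slides for details).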
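The communication argument can be made concrete with a short count. This sketch assumes each processor owns whole rows of an n-row grid. Under that assumption, only the vertical arms of the cross (the rows above and below) can cross a processor boundary. The helper names are hypothetical. The code counts adjacent row pairs owned by different processors, which is a rough proxy for how much boundary data must be exchanged per sweep.

```python
def owner_blocked(row, n, p):
    """Blocked assignment: each processor owns one contiguous band of rows."""
    rows_per_proc = -(-n // p)  # ceil(n / p)
    return row // rows_per_proc


def owner_interleaved(row, n, p):
    """Interleaved assignment: row i is owned by processor i % p."""
    return row % p


def cross_processor_pairs(n, p, owner):
    """Count adjacent row pairs (i, i + 1) owned by different processors.

    Each such pair means a cross-shaped update on one of the two rows needs
    a value that lives on another processor.
    """
    return sum(owner(i, n, p) != owner(i + 1, n, p) for i in range(n - 1))


n, p = 16, 4
print("blocked:", cross_processor_pairs(n, p, owner_blocked))          # 3
print("interleaved:", cross_processor_pairs(n, p, owner_interleaved))  # 15
```

With blocked assignment the count is at most p − 1, however large n is. With interleaved assignment (p > 1), every one of the n − 1 adjacent pairs crosses processors.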

[Figure]


3. Orchestration

[Figure]


4. Mapping

[Figure]


Summary

[Figure]