This series of articles contains study notes for CMU 15-418/15-618: Parallel Computer Architecture and Programming, Fall 2018.
Course website: CMU 15-418/15-618: Parallel Computer Architecture and Programming
Reference article: CMU 15-418 notes
Related resources and introduction: CMU 15-418/Stanford CS149: Parallel Computing - CS自学指南 (CS Self-Study Guide)
Creating a parallel program
- Decomposition: In most cases the programmer is responsible, since compilers are not yet sophisticated enough to do this automatically.
- Assignment: Many languages and runtimes can take on this responsibility. It can be done statically by the programmer (e.g. assigning work to pthreads by hand), statically by the compiler (ISPC foreach), or dynamically at runtime (ISPC tasks).
- Orchestration: Mostly happens at runtime, but must be declared or defined by the programmer.
- Mapping: May be done by the OS (mapping pthreads to CPU cores), by the compiler (ISPC assigns program instances to vector lanes), or by the hardware (mapping CUDA thread blocks to GPU cores).
1. Decomposition
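Speedup is limited by the inherently serial fraction S of the program (Amdahl's law): with P processors, speedup ≤ 1 / (S + (1 − S) / P), which approaches 1/S as P grows.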
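As a rough illustration of this limit, the sketch below evaluates the Amdahl's law bound for a few processor counts. The function name `amdahl_speedup` and the example serial fraction of 0.1 are assumptions chosen for illustration; they do not come from the course material.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    # The serial part runs at full cost; the parallel part is divided among processors.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


if __name__ == "__main__":
    # Hypothetical example: 10% of the program is serial.
    for p in (1, 2, 8, 64, 1024):
        print(p, round(amdahl_speedup(0.1, p), 2))
```

However many processors are added, the bound in this example never reaches 1 / 0.1 = 10.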
2. Assignment
Static: pthreads, ISPC foreach (the abstraction leaves room for dynamic assignment, but the current ISPC implementation is static)
Dynamic: ISPC tasks
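The difference between the two policies can be sketched with ordinary Python threads. This is only an analogy, not ISPC or pthreads code: `static_blocked` and `dynamic_run` are hypothetical helper names, and the example shows who decides the work split and when, not real speedup.

```python
import queue
import threading


def static_blocked(num_items, num_workers):
    """Static assignment: each worker's contiguous chunk is fixed before execution starts."""
    chunk = (num_items + num_workers - 1) // num_workers  # ceiling division
    return [
        list(range(w * chunk, min((w + 1) * chunk, num_items)))
        for w in range(num_workers)
    ]


def dynamic_run(items, num_workers, work):
    """Dynamic assignment: each worker grabs the next item from a shared queue at runtime."""
    tasks = queue.Queue()
    for item in items:
        tasks.put(item)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return  # no work left
            value = work(item)
            with lock:
                results[item] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(static_blocked(10, 4))
    print(dynamic_run(range(10), 4, lambda x: x * x))
```

Static assignment has essentially no runtime overhead but can leave workers idle when item costs vary. The shared queue balances load at the price of synchronization on every item.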
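Considering only communication between processors:
Blocked assignment is better than interleaved assignment because it requires less inter-processor communication. (Updating each red cell requires its cross-shaped neighborhood; when the cell lies on the boundary of a processor's region, cross-processor communication is required. See the lecture slides for details.)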
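To make the comparison concrete, the sketch below counts how many neighbor rows must be read from another processor under each scheme. It assumes the grid is partitioned by rows, so only the up and down neighbors can belong to a different processor. All function names are hypothetical and the model is a simplification of the lecture's example.

```python
def blocked_owner(row, num_rows, num_procs):
    """Blocked: each processor owns one contiguous band of rows."""
    rows_per_proc = (num_rows + num_procs - 1) // num_procs  # ceiling division
    return row // rows_per_proc


def interleaved_owner(row, num_rows, num_procs):
    """Interleaved: rows are dealt out round-robin."""
    return row % num_procs


def remote_row_reads(owner, num_rows, num_procs):
    """Count (row, neighbor row) pairs whose two rows belong to different processors."""
    count = 0
    for row in range(num_rows):
        for neighbor in (row - 1, row + 1):
            if 0 <= neighbor < num_rows:
                if owner(neighbor, num_rows, num_procs) != owner(row, num_rows, num_procs):
                    count += 1
    return count


if __name__ == "__main__":
    rows, procs = 12, 4
    print("blocked:", remote_row_reads(blocked_owner, rows, procs))
    print("interleaved:", remote_row_reads(interleaved_owner, rows, procs))
```

With 12 rows and 4 processors this model counts 6 remote row reads for the blocked assignment and 22 for the interleaved one. Under blocked assignment only the rows at band boundaries need remote data, whereas under interleaved assignment every row does.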
3. Orchestration
4. Mapping
Summary