Lambda expression

Published: 2023-09-27 14:10:18 Author: 0x7F

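Lambda structure
[capture list] (parameter list) -> return type { function body }

The (parameter list) and -> return type parts are optional. A lambda that takes no parameters can omit the parentheses (before C++23, only if the return type is omitted too). When the return type is omitted, it is deduced from the return statements in the function body.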
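A minimal host-side C++ sketch of both forms. The function names add_with_offset and answer are illustrative and do not come from the article:

```cpp
// Full form: [capture list] (parameter list) -> return type { function body }
int add_with_offset(int x, int y) {
    int offset = 10;
    // offset is captured by value; a and b are parameters; the return type is explicit.
    auto add = [offset](int a, int b) -> int { return a + b + offset; };
    return add(x, y);
}

// Parameter list and return type omitted: the return type is deduced as int.
int answer() {
    auto f = [] { return 42; };
    return f();
}
```

Here add_with_offset(1, 2) returns 13 (1 + 2 + the captured offset 10), and answer() returns 42.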

Value and reference capture

A notable pitfall arises when a lambda is written in host (CPU) code but executed in device (GPU) code: we need to pay attention to where each captured variable is stored.
For example, suppose we allocate device memory from host code, write the lambda in host code as well, and want the lambda to capture a variable that refers to that memory, as in the following code.

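#include <vector>
#include <cuda_runtime.h>

// Compile with: nvcc --extended-lambda main.cu (required for __device__ lambdas)
template <typename Func>
__global__ void myFunc(int n, Func func) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) func(i);  // std::cout is not available in device code, so just call func
}

int main() {
    int n = 1024, blockSize = 256;
    int gridSize = (n + blockSize - 1) / blockSize;
    std::vector<int, CudaAllocator<int>> arr(n);

    // Deliberately incomplete: with an empty capture list, arr is not visible here.
    myFunc<<<gridSize, blockSize>>>(n, [] __device__ (int i) -> void {
        arr[i] = 0;
    });
    cudaDeviceSynchronize();  // wait for the kernel to finish

    return 0;
}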

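CudaAllocator is a custom allocator class that implements the allocate and deallocate member functions, so that the vector's storage is allocated in CUDA memory instead of on the host heap. In practice this has to be unified memory (for example from cudaMallocManaged), because the std::vector constructor initializes the elements from host code.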
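The article does not show CudaAllocator itself. The sketch below only illustrates the minimal interface std::vector expects from such a class: value_type, allocate, deallocate, a converting constructor and equality. To keep it runnable without a GPU, it is a hypothetical host-only stand-in named HostAllocator that uses std::malloc and std::free. A CUDA version would presumably call cudaMallocManaged and cudaFree at the two marked places instead. make_array is likewise a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical host-only stand-in for CudaAllocator: same interface,
// but backed by std::malloc/std::free so it runs without a GPU.
template <typename T>
struct HostAllocator {
    using value_type = T;

    HostAllocator() = default;
    // Converting constructor, so containers can rebind the allocator to other types.
    template <typename U>
    HostAllocator(const HostAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // A CUDA version would call cudaMallocManaged(&ptr, n * sizeof(T)) here.
        void* ptr = std::malloc(n * sizeof(T));
        if (ptr == nullptr) throw std::bad_alloc();
        return static_cast<T*>(ptr);
    }

    void deallocate(T* ptr, std::size_t /*n*/) noexcept {
        // A CUDA version would call cudaFree(ptr) here.
        std::free(ptr);
    }
};

// Stateless allocators always compare equal.
template <typename T, typename U>
bool operator==(const HostAllocator<T>&, const HostAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const HostAllocator<T>&, const HostAllocator<U>&) { return false; }

// Same shape as the article's std::vector<int, CudaAllocator<int>> arr(n);
std::vector<int, HostAllocator<int>> make_array(std::size_t n) {
    return std::vector<int, HostAllocator<int>>(n);  // n zero-initialized ints
}
```

Because the vector only talks to the allocator through these members, swapping the allocator changes where the elements live without changing any code that uses the vector.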

Note that the code above is deliberately incomplete: with an empty capture list [], arr cannot be used in the lambda body. So what should go inside []?
In fact, using [&] or [=] directly is wrong in both cases. With [&], arr is captured by reference, so device code would go through the address of a vector object that lives in host memory (on the host stack of main), which the GPU generally cannot access. With [=], arr is captured by value, which copies the entire vector, including all of its elements, into the lambda; this costs both time and memory.
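The difference between the two capture modes can be observed in plain host C++. No GPU is involved, so this is only an analogue of the CUDA situation, and the helper names are hypothetical. The sketch checks that [=] gives the closure its own copy of the vector, with a separate buffer holding all 1000 elements, while [&] merely refers back to the original vector object.

```cpp
#include <vector>

// [=] copies the whole vector into the closure: the captured copy owns its
// own buffer, so its data() differs from the original's.
bool value_capture_copies_buffer() {
    std::vector<int> arr(1000, 1);
    auto by_value = [=] { return arr.data(); };  // arr is a const copy inside
    return by_value() != arr.data();
}

// [&] stores only a reference to the original vector object. In the CUDA case
// that object lives in host memory, so device code could not follow it.
bool reference_capture_shares_object() {
    std::vector<int> arr(1000, 1);
    auto by_ref = [&] { return &arr; };
    return by_ref() == &arr;
}
```

Both functions return true: the value capture duplicated every element, and the reference capture points at the very same host object.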

So what is the right way? First, call arr.data() to obtain the raw pointer to the device memory and store it in a local variable. That pointer variable itself still lives in host memory, so next we capture it with [=]: the lambda gets its own copy of the pointer (only an address, not the data), which is passed to the kernel along with the lambda and can be dereferenced by device code. In summary, the first step avoids copying the whole vector, and the second step solves the problem of device code accessing a variable stored in host memory.
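A host-only C++ analogue of the two steps follows. It again runs without a GPU; fill_zero and set_zero are hypothetical names, and the loop stands in for the kernel launch. In the CUDA code, set_zero corresponds to the [=] __device__ lambda passed to myFunc, and arr_data would point to the memory returned by CudaAllocator.

```cpp
#include <cstddef>
#include <vector>

// Capture a raw pointer by value: the closure copies only the address,
// not the elements, yet writes through it still reach the original buffer.
void fill_zero(std::vector<int>& arr) {
    int* arr_data = arr.data();           // step 1: get the raw pointer
    auto set_zero = [=](std::size_t i) {  // step 2: capture the pointer by value
        // Inside a non-mutable lambda the copied pointer is const (int* const),
        // but the int it points to is not, so this write compiles.
        arr_data[i] = 0;
    };
    for (std::size_t i = 0; i < arr.size(); ++i) {
        set_zero(i);  // stands in for each GPU thread calling func(i)
    }
}
```

Because the closure holds only one pointer, passing it to a kernel copies a few bytes rather than the whole array.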