Cannot re-initialize CUDA in forked subprocess.

Published: 2024-01-09 15:30:32 Author: 无左无右

"Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/3-tmp/data_custom.py", line 106, in __getitem__
    flag_same, type_classes, tra, rank_, img, bev_feature_x1 = self.get_from_idx(idx)
  File "/3-tmp/data_custom.py", line 65, in get_from_idx
    bev_feature_x1 = torch.load(path_bev_feature_x1)  # [2, 64, 200, 96]
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/serialization.py", line 853, in _load
    result = unpickler.load()
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/serialization.py", line 845, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/serialization.py", line 834, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/serialization.py", line 157, in _cuda_deserialize
    return obj.cuda(device)
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/_utils.py", line 71, in _cuda
    with torch.cuda.device(device):
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 225, in __enter__
    self.prev_idx = torch._C._cuda_getDevice()
  File "/anconda_install/envs/pytorch1.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 164, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
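
The traceback shows where CUDA gets touched inside the worker: torch.load in get_from_idx restores the saved tensor to the device it was saved from, which goes through _cuda_deserialize and tries to initialize CUDA in the forked process. Besides switching the start method, one possible alternative is to load the features onto the CPU with map_location and move batches to the GPU in the main process. The sketch below assumes the feature files were saved from GPU tensors and that the worker only needs CPU tensors; load_feature_cpu and the in-memory buffer are hypothetical stand-ins for the real file path.

```python
import io

import torch


def load_feature_cpu(path_or_buffer):
    """Load a saved tensor onto the CPU, whatever device it was saved from."""
    # map_location="cpu" remaps every storage to the CPU, so CUDA is never
    # initialized inside the DataLoader worker.
    return torch.load(path_or_buffer, map_location="cpu")


# Stand-in for a saved feature file (shape taken from the comment in the traceback).
buffer = io.BytesIO()
torch.save(torch.zeros(2, 64, 200, 96), buffer)
buffer.seek(0)

feature = load_feature_cpu(buffer)
print(feature.device, tuple(feature.shape))
```

In the dataset from the traceback this would amount to calling torch.load(path_bev_feature_x1, map_location="cpu") inside get_from_idx, then calling .cuda() on the batch in the training loop.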

Solution 1:

Add the following at the top of the script:

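import torch
from torch.multiprocessing import set_start_method

try:
    # Start worker processes with 'spawn' so each one can initialize CUDA itself.
    set_start_method('spawn')
except RuntimeError:
    # The start method was already set earlier in this process.
    pass
torch.cuda.empty_cache()  # release cached GPU memory that is no longer in use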
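
With 'spawn', every worker process re-imports the main script, so code that creates the DataLoader or starts training should sit behind an if __name__ == "__main__": guard. Otherwise each worker would try to run it again on import. A minimal sketch of that layout follows; select_spawn and main are hypothetical names, and the training code is left as a placeholder.

```python
import torch.multiprocessing as mp


def select_spawn():
    """Request the 'spawn' start method and report the method actually in effect."""
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
        # A start method was already fixed earlier in this process; keep it.
        pass
    return mp.get_start_method()


def main():
    method = select_spawn()
    print("multiprocessing start method:", method)
    # Build the Dataset and DataLoader and run the training loop here.


if __name__ == "__main__":
    # Only the parent process runs main(); spawned workers merely import this module.
    main()
```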

Solution 2:

Pass multiprocessing_context='spawn' when constructing the DataLoader:

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size,
                                               shuffle=True,
                                               drop_last=True,
                                               num_workers=num_workers,
                                               multiprocessing_context='spawn')
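
Two details are easy to miss with this approach. DataLoader raises a ValueError if multiprocessing_context is given while num_workers is 0, and under 'spawn' the dataset object has to be picklable, which in practice means defining its class at module level. The sketch below shows one way to handle both; RangeDataset and make_loader are hypothetical names used only for illustration.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RangeDataset(Dataset):
    """Toy dataset; defined at module level so 'spawn' workers can unpickle it."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Return CPU tensors from workers; move batches to the GPU in the main process.
        return torch.tensor(float(idx))


def make_loader(dataset, batch_size, num_workers):
    """Build a DataLoader that uses 'spawn' only when worker processes exist."""
    # multiprocessing_context is only accepted when num_workers > 0.
    context = "spawn" if num_workers > 0 else None
    return DataLoader(dataset,
                      batch_size=batch_size,
                      shuffle=True,
                      drop_last=True,
                      num_workers=num_workers,
                      multiprocessing_context=context)
```

Calling make_loader(train_dataset, batch_size, num_workers) from code guarded by if __name__ == "__main__": keeps the same call working for both num_workers=0 (handy when debugging) and multi-process loading.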