[swin-trans] Debugging distributed training: ValueError: Error initializing torch.distributed using env:// rendezvous: en

Published: 2023-10-21 10:00:12  Author: 咖啡陪你

When calling torch.distributed.init_process_group(backend='nccl', init_method='env://', world_size=world_size, rank=rank), the following errors occurred:
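With init_method='env://', the rendezvous settings are read from environment variables: MASTER_ADDR and MASTER_PORT (the two errors below), plus RANK and WORLD_SIZE when they are not passed as arguments (in the call above they are passed). The sketch below lists which of these are missing, so the problem can be reported before init_process_group raises. The helper name missing_env_rendezvous_vars is hypothetical, not part of PyTorch.

```python
import os
from typing import List, Mapping, Optional


def missing_env_rendezvous_vars(
    env: Optional[Mapping[str, str]] = None,
    rank_passed: bool = True,
    world_size_passed: bool = True,
) -> List[str]:
    """Return the env:// rendezvous variables that are not set.

    Hypothetical helper: MASTER_ADDR and MASTER_PORT are always read from
    the environment; RANK and WORLD_SIZE are only needed there when they are
    not passed to init_process_group as arguments.
    """
    env = os.environ if env is None else env
    required = ["MASTER_ADDR", "MASTER_PORT"]
    if not rank_passed:
        required.append("RANK")
    if not world_size_passed:
        required.append("WORLD_SIZE")
    return [name for name in required if name not in env]
```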

1. ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable MASTER_ADDR expected, but not set

Fix:

Add the following before init_process_group is called:

import os
os.environ['MASTER_ADDR'] = 'localhost'
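'localhost' only works when all processes run on the same machine; on multiple nodes, MASTER_ADDR has to be an address of the rank-0 node that every node can reach. Also, if a launcher such as torchrun has already exported MASTER_ADDR, a plain assignment overwrites it. A gentler variant, sketched below with a hypothetical helper name (ensure_master_addr), only fills in the fallback when the variable is missing:

```python
import os
from typing import MutableMapping, Optional


def ensure_master_addr(
    addr: str = "localhost",
    env: Optional[MutableMapping[str, str]] = None,
) -> str:
    """Set MASTER_ADDR only if it is not defined yet; return the value in effect."""
    env = os.environ if env is None else env
    # setdefault keeps an existing value (e.g. one exported by a launcher)
    # and only writes the fallback address when the key is absent.
    return env.setdefault("MASTER_ADDR", addr)
```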


2. ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable MASTER_PORT expected, but not set

Fix:

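Add the following (any port that is free on the rank-0 machine works; environment variable values must be strings):

os.environ['MASTER_PORT'] = '12345'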
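Both fixes together can be sketched as a minimal single-node setup. Assumptions: the gloo backend is swapped in for nccl so the sketch also runs on a CPU-only machine (switch back to 'nccl' on GPUs); the helper names find_free_port, prepare_rendezvous_env and init_worker are hypothetical; and the OS picks a free port instead of the hard-coded 12345, which can fail if something else is already listening on it.

```python
import os
import socket

import torch.distributed as dist


def find_free_port() -> int:
    """Return a TCP port that was free at the moment of the call.

    Binding to port 0 lets the OS choose an unused port. Another process
    could still take it before rank 0 binds it, so this is a convenience,
    not a guarantee.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("localhost", 0))
        return s.getsockname()[1]


def prepare_rendezvous_env(addr: str = "localhost") -> None:
    """Run once in the launching process, before any workers are spawned,
    so every worker inherits the same MASTER_ADDR / MASTER_PORT.

    setdefault leaves values that a launcher already exported untouched.
    """
    os.environ.setdefault("MASTER_ADDR", addr)
    os.environ.setdefault("MASTER_PORT", str(find_free_port()))


def init_worker(rank: int, world_size: int, backend: str = "gloo") -> None:
    """Join the default process group through the env:// rendezvous."""
    dist.init_process_group(
        backend=backend,
        init_method="env://",
        world_size=world_size,
        rank=rank,
    )
```

For a single process, call prepare_rendezvous_env() and then init_worker(rank=0, world_size=1), and finish with dist.destroy_process_group(). With world_size > 1, the port must be chosen once in the parent: if each worker called find_free_port itself, they would end up waiting on different ports.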