YOLO Series Code Debugging Notes

Environment: Windows 10, Python 3.8.5, torch 1.13.0+cu116, torchvision 0.14.0+cu116

Project: https://github.com/abeardear/pytorch-YOLO-v1

bug1:

    from torchvision.models import ResNet50_Weights  # needed for the weights enum
    # resnet = models.resnet50(pretrained=True)
    resnet = models.resnet50(weights=ResNet50_Weights.DEFAULT)

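Because of the torchvision version, loading the pretrained model with "pretrained=True" raised an error for me (the parameter has been deprecated since torchvision 0.13). Change it to "weights=ResNet50_Weights.DEFAULT" or another member of the same enum, such as "weights=ResNet50_Weights.IMAGENET1K_V1".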

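bug2: multiprocessing (process pool) error

Tracing the error message, the failure is at the line for i, (images, target) in enumerate(train_loader): . On Windows, DataLoader worker processes (num_workers > 0) are started with spawn, which re-imports the main script, so the fix is simply to put the whole iteration loop under the main guard: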

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Fix:

"""
num_iter = 0
# vis = Visualizer()
best_test_loss = np.inf

for epoch in range(num_epochs):
"""
# After the change
if __name__ == '__main__':
    num_iter = 0
    # vis = Visualizer()
    best_test_loss = np.inf

    for epoch in range(num_epochs):
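
The same idea as a small self-contained sketch. The toy tensors, the function name run_training and the batch size are made up for illustration and are not from the project; only the shape of the loop and the main guard mirror the fix above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def run_training(num_epochs: int, num_workers: int) -> int:
    """Iterate over a toy dataset and return the number of batches seen."""
    # Hypothetical stand-in data: 8 fake "images" and 8 fake targets.
    images = torch.zeros(8, 3, 4, 4)
    targets = torch.zeros(8, 1)
    train_loader = DataLoader(
        TensorDataset(images, targets),
        batch_size=2,
        num_workers=num_workers,  # > 0 starts worker processes
    )

    num_iter = 0
    for epoch in range(num_epochs):
        for i, (batch_images, batch_target) in enumerate(train_loader):
            num_iter += 1  # the real training step would go here
    return num_iter


# Everything that creates or iterates the DataLoader runs only in the
# main process, never when a spawned worker re-imports this file.
if __name__ == '__main__':
    print(run_training(num_epochs=2, num_workers=2))
```

Keeping the loop inside a function and calling it from the guard also means that importing the script does not start training.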

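bug3: OpenCV error containing the message "(-215:Assertion failed) dims <= 2 && step[0] > 0 in function 'cv::Mat::locateROI'"

This is a dimension error. If the tensor/array has an extra leading dimension, for example shape [1, 3, 1080, 1920], remove it with torch.squeeze(tensor, dim=0) before the OpenCV call (squeeze only drops a dimension of size 1), and restore it afterwards with torch.unsqueeze(tensor, dim=0).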
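
A minimal sketch of that round trip. It uses a smaller dummy tensor than the real 1080 x 1920 frames, and the variable names are illustrative only:

```python
import torch

batch = torch.zeros(1, 3, 108, 192)       # stand-in for a [1, 3, H, W] input
image = torch.squeeze(batch, dim=0)        # drop the leading size-1 dimension
# ... per-image processing would go here ...
restored = torch.unsqueeze(image, dim=0)   # put the leading dimension back

# squeeze leaves the tensor unchanged when dim 0 is not of size 1
unchanged = torch.squeeze(torch.zeros(2, 3, 108, 192), dim=0)

print(image.shape, restored.shape, unchanged.shape)
```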

bug4: OpenCV error:

  File "D:/PythonCVWorkspace/pytorch-YOLO-v1/yolodataset.py", line 138, in BGR2HSV
    return cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
cv2.error: OpenCV(4.2.0) c:\projects\opencv-python\opencv\modules\imgproc\src\color.simd_helpers.hpp:92: error: (-2:Unspecified error) in 
    function '__cdecl cv::impl::`anonymous-namespace'::CvtHelper<struct cv::impl::`anonymous namespace'::Set<3,4,-1>,struct cv::impl::A0x3b52564f
    ::Set<3,-1,-1>,struct cv::impl::A0x3b52564f::Set<0,5,-1>,2>::CvtHelper(const class cv::_InputArray &,const class cv::_OutputArray &,int)'
> Invalid number of channels in input image:
>     'VScn::contains(scn)'
> where
>     'scn' is 1
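
The message says the input to cv2.cvtColor had one channel ('scn' is 1) where BGR2HSV expects three. One plausible cause, which these notes do not confirm, is a grayscale or H x W x 1 array reaching BGR2HSV. Under that assumption, a guard like the following could be applied to the image first. The helper name ensure_three_channels is hypothetical, and the sketch uses only NumPy:

```python
import numpy as np


def ensure_three_channels(img: np.ndarray) -> np.ndarray:
    """Return an H x W x 3 array, replicating a single gray channel if needed."""
    if img.ndim == 2:                          # H x W grayscale
        img = img[:, :, np.newaxis]
    if img.ndim == 3 and img.shape[2] == 1:    # H x W x 1
        img = np.repeat(img, 3, axis=2)
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError(f"expected a 1- or 3-channel image, got shape {img.shape}")
    return img


gray = np.zeros((4, 6), dtype=np.uint8)
bgr = ensure_three_channels(gray)
print(bgr.shape)
```

The result can then be passed to cv2.cvtColor(img, cv2.COLOR_BGR2HSV). If the image has some other shape, the ValueError shows what actually arrived.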

bug5: GPU out of memory

    return torch.batch_norm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 2.48 GiB already allocated; 0 bytes free; 2.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The GPU is simply too weak: my GeForce GTX card has only 4 GB of memory. Lowering batch_size to 4 was enough.

bug6: numeric type error

    total_loss += loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number

A related error, from a first attempt at the fix:

    total_loss += loss.data.item
TypeError: unsupported operand type(s) for +=: 'float' and 'builtin_function_or_method'

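Fix:

loss.data is a scalar (0-dim) Tensor, e.g. tensor(61.8650, device='cuda:0') of type <class 'torch.Tensor'>, so it cannot be indexed with [0].

loss.data.item is only the method object; the correct call is loss.data.item(), which returns the value as a Python number.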

total_loss += loss.data.item()
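
A small CPU-only sketch of the same accumulation. The loss value is copied from the printout above. Calling .item() directly on the tensor, without going through .data, works as well:

```python
import torch

loss = torch.tensor(61.8650)   # 0-dim tensor standing in for the real loss
total_loss = 0.0

# Indexing a 0-dim tensor with [0] raises IndexError;
# .item() returns the value as a plain Python float.
total_loss += loss.item()

print(type(total_loss), total_loss)
```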

warning 7: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.

Add code to suppress it (note that this filter silences every UserWarning, not only this one):

import warnings
warnings.filterwarnings("ignore", category=UserWarning)
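
As an alternative to suppressing the warning, the mask can be converted to torch.bool before it is used for indexing. This is only a sketch with made-up tensors. It assumes the warning comes from a uint8 mask in the loss code, and that line is not identified here:

```python
import torch

values = torch.tensor([10.0, 20.0, 30.0, 40.0])
byte_mask = torch.tensor([1, 0, 1, 0], dtype=torch.uint8)  # old-style mask

bool_mask = byte_mask.bool()    # convert once, then index with the bool mask
selected = values[bool_mask]

print(selected)
```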

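At this point, apart from the two OpenCV errors in the data augmentation code (the random brightness and random color-space transforms) and a later recurrence of the out-of-memory error, the source code ran end to end on a small dataset of 150 images that I put together myself.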