YOLO Series Code Debugging Notes

Published: 2023-08-04 11:40:37  Author: 倦鸟已归时

Environment: Windows 10, Python 3.8.5, torch 1.13.0+cu116, torchvision 0.14.0+cu116

Project: https://github.com/abeardear/pytorch-YOLO-v1

bug1: loading the pretrained model

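    from torchvision import models
    from torchvision.models import ResNet50_Weights
    # resnet = models.resnet50(pretrained=True)  # deprecated since torchvision 0.13
    resnet = models.resnet50(weights=ResNet50_Weights.DEFAULT)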

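Because of the version change, passing "pretrained=True" when loading the pretrained model raised an error in this setup. Since torchvision 0.13 the "pretrained" argument has been deprecated in favor of "weights", so use "weights=ResNet50_Weights.DEFAULT" or another ResNet50_Weights value instead (for example IMAGENET1K_V1).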
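If the same script also has to run on an older torchvision, one option is to pick the keyword argument from the installed version. This is a sketch under stated assumptions: it assumes the switch happened in torchvision 0.13 (the release that introduced "weights"), the helper name resnet_pretrained_kwargs is hypothetical, and it relies on torchvision accepting the string "DEFAULT" in place of ResNet50_Weights.DEFAULT.

```python
import re


def resnet_pretrained_kwargs(torchvision_version: str) -> dict:
    """Hypothetical helper: kwargs for torchvision.models.resnet50 requesting ImageNet weights.

    Assumes torchvision >= 0.13 uses `weights` and older releases only know `pretrained`.
    """
    # Parse "major.minor" from strings such as "0.14.0+cu116".
    match = re.match(r"(\d+)\.(\d+)", torchvision_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {torchvision_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (0, 13):
        # String form of ResNet50_Weights.DEFAULT; avoids importing torchvision here.
        return {"weights": "DEFAULT"}
    return {"pretrained": True}
```

It would be used as models.resnet50(**resnet_pretrained_kwargs(torchvision.__version__)). Returning a string rather than the enum keeps the helper free of torchvision imports, so the version logic can be checked on its own.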

bug2: process pool error (multiprocessing on Windows)

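Following the traceback, the error is raised at the line "for i, (images, target) in enumerate(train_loader):", which is where the DataLoader starts its worker processes. The fix is simply to put the whole training loop under the main guard. The error was: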

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

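Fix:

# Before: the training loop ran at module top level
# num_iter = 0
# # vis = Visualizer()
# best_test_loss = np.inf
#
# for epoch in range(num_epochs):

# After: the same code moved under the main guard
if __name__ == '__main__':
    num_iter = 0
    # vis = Visualizer()
    best_test_loss = np.inf

    for epoch in range(num_epochs):
        ...  # rest of the original epoch loop, indented one more level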
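Why the guard matters: on Windows, multiprocessing can only start children with "spawn", which launches a fresh interpreter that re-imports the main script. If the loop that creates workers sits at module level, each child re-runs it while still bootstrapping, and Python refuses with the RuntimeError above. The stand-alone sketch below reproduces the pattern with the standard library only; load_sample is a hypothetical stand-in for a dataset's __getitem__, and the pool plays the role of the DataLoader workers.

```python
import multiprocessing as mp


def load_sample(index: int) -> int:
    # Hypothetical stand-in for Dataset.__getitem__; runs inside a worker process.
    return index * index


def main() -> list:
    # Windows only supports the "spawn" start method. Each spawned worker
    # re-imports this file, so starting processes must happen inside main(),
    # never at import time.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(load_sample, range(8))


if __name__ == "__main__":
    mp.freeze_support()  # only matters for frozen executables; harmless otherwise
    print(main())
```

Wrapping the loop in a main() function, as here, is a common alternative to indenting it directly under the guard; both keep process creation out of import time.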

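bug3: OpenCV error containing "(-215:Assertion failed) dims <= 2 && step[0] > 0 in function 'cv::Mat::locateROI'"

This is a dimension error. If the tensor/array has a shape such as [1, 3, 1080, 1920], it has one dimension too many for OpenCV: remove the extra batch dimension with torch.squeeze(tensor, dim=0) before the OpenCV call, and restore it afterwards with torch.unsqueeze(tensor, dim=0). Note that squeeze only removes dimensions of size 1, and that OpenCV expects images in (H, W, C) order, so a (C, H, W) tensor also needs its channels moved last (for example with tensor.permute(1, 2, 0)) before being converted to a NumPy array.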
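A NumPy sketch of that round trip, assuming the data is a single image stored as a (1, C, H, W) batch; both function names are hypothetical. With a torch tensor, the forward step would be tensor.detach().squeeze(0).permute(1, 2, 0).cpu().numpy().

```python
import numpy as np


def batch_to_cv_image(batch: np.ndarray) -> np.ndarray:
    """Hypothetical helper: (1, C, H, W) batch -> (H, W, C) image as OpenCV expects."""
    if batch.ndim != 4 or batch.shape[0] != 1:
        raise ValueError(f"expected shape (1, C, H, W), got {batch.shape}")
    # Drop the batch axis (the torch.squeeze(tensor, dim=0) step), then move
    # channels last. transpose() only returns a strided view, so copy it into
    # C-contiguous memory before handing it to cv2.
    return np.ascontiguousarray(np.squeeze(batch, axis=0).transpose(1, 2, 0))


def cv_image_to_batch(img: np.ndarray) -> np.ndarray:
    """Hypothetical helper: (H, W, C) image -> (1, C, H, W) batch (the unsqueeze step)."""
    return np.expand_dims(img.transpose(2, 0, 1), axis=0)
```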

bug4: OpenCV error:

  File "D:/PythonCVWorkspace/pytorch-YOLO-v1/yolodataset.py", line 138, in BGR2HSV
    return cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
cv2.error: OpenCV(4.2.0) c:\projects\opencv-python\opencv\modules\imgproc\src\color.simd_helpers.hpp:92: error: (-2:Unspecified error) in function '__cdecl cv::impl::`anonymous-namespace'::CvtHelper<struct cv::impl::`anonymous namespace'::Set<3,4,-1>,struct cv::impl::A0x3b52564f::Set<3,-1,-1>,struct cv::impl::A0x3b52564f::Set<0,5,-1>,2>::CvtHelper(const class cv::_InputArray &,const class cv::_OutputArray &,int)'
> Invalid number of channels in input image:
>     'VScn::contains(scn)'
> where
>     'scn' is 1

The image passed to cv2.cvtColor has only 1 channel ('scn' is the number of source channels), but COLOR_BGR2HSV requires a 3- or 4-channel input.
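These notes leave this error unresolved. One possible workaround, assuming the 1-channel input really is a grayscale image rather than a mis-shaped array (see bug3), is to replicate the channel before the color conversion, which matches what cv2.COLOR_GRAY2BGR produces. A NumPy-only sketch with a hypothetical helper name:

```python
import numpy as np


def ensure_three_channels(img: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return an image that COLOR_BGR2HSV accepts (3 or 4 channels).

    A grayscale input is copied into identical B, G, R channels.
    """
    if img.ndim == 2:  # (H, W) grayscale -> (H, W, 1)
        img = img[:, :, np.newaxis]
    if img.ndim != 3:
        raise ValueError(f"expected an (H, W) or (H, W, C) image, got shape {img.shape}")
    channels = img.shape[2]
    if channels == 1:
        return np.repeat(img, 3, axis=2)  # replicate gray into B, G and R
    if channels in (3, 4):
        return img  # already valid for COLOR_BGR2HSV
    raise ValueError(f"unsupported channel count: {channels}")
```

It would then be called as cv2.cvtColor(ensure_three_channels(img), cv2.COLOR_BGR2HSV).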

bug5: out of GPU memory

    return torch.batch_norm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 2.48 GiB already allocated; 0 bytes free; 2.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

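The GPU is simply too weak: my GeForce GTX has only 4 GB of memory, and lowering batch_size to 4 was enough. The log also shows that reserved memory (2.55 GiB) is barely above allocated memory (2.48 GiB), so this is not fragmentation and the max_split_size_mb hint does not apply.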

bug6: numeric type error

    total_loss += loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number

    total_loss += loss.data.item
TypeError: unsupported operand type(s) for +=: 'float' and 'builtin_function_or_method'

Fix:

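loss.data is a scalar (0-dim) Tensor, e.g. tensor(61.8650, device='cuda:0') of type <class 'torch.Tensor'>, so it cannot be indexed with [0]. The loss.data[0] idiom dates from PyTorch versions before 0.4, where a loss was a 1-element tensor.

loss.data.item is the method itself, not its result; it must be called as loss.data.item() to get the number (loss.item() does the same).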

total_loss += loss.data.item()
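For reference, a minimal accumulator that always stores plain Python floats. It is a sketch, not code from the project: the class name RunningLoss is hypothetical, and it accepts anything with an item() method, such as a 0-dim torch tensor or a NumPy scalar.

```python
class RunningLoss:
    """Hypothetical helper: accumulates per-batch losses as Python floats."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, loss) -> None:
        # loss.item() (with parentheses) turns a 0-dim tensor into a Python float.
        # Writing loss.item without () would add a bound method: the TypeError above.
        self.total += float(loss.item())
        self.count += 1

    @property
    def mean(self) -> float:
        # Average loss per batch; 0.0 before any update.
        return self.total / self.count if self.count else 0.0
```

In the training loop this would be meter.update(loss) once per batch, with meter.mean giving the average batch loss for the epoch.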

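warning 7: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.

The direct fix is to convert the mask to bool (mask.bool()) before indexing; the quick workaround used here is to suppress the warning: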

import warnings
# Note: this silences every UserWarning, not only the uint8-indexing one
warnings.filterwarnings("ignore", category=UserWarning)
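Ignoring every UserWarning can also hide unrelated problems. A narrower variant, sketched with a hypothetical helper name, matches only this message: filterwarnings treats message as a regular expression that must match the start of the warning text, case-insensitively.

```python
import warnings

# Start of the text PyTorch emits when a torch.uint8 tensor is used as an index mask.
UINT8_INDEX_MESSAGE = r"indexing with dtype torch\.uint8 is now deprecated"


def ignore_uint8_index_warning() -> None:
    """Hypothetical helper: silence only the uint8-indexing deprecation warning."""
    warnings.filterwarnings("ignore", message=UINT8_INDEX_MESSAGE, category=UserWarning)
```

Calling ignore_uint8_index_warning() once at the top of train.py would leave all other UserWarnings visible.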

 

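With that, the source code ran end to end on a small test dataset of 150 images I made myself, apart from two OpenCV errors in the data augmentation (random brightness and random color-space transforms) and running out of GPU memory again later on.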