Common Tips & Tricks for Training in CV Competitions

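Introduction
1. Image Augmentation
2. Better Models
- Adding more hidden layers after the backbone
- Layer-wise unfreezing
3. Learning Rate and Learning Rate Schedulers
- Learning rate schedulers
  - One Cycle Cosine Scheduling
- Tips for using learning rate schedulers
4. Optimizers
5. Overfitting and Regularization
6. Label Smoothing
7. Knowledge Distillation
8. Pseudo-Labeling
9. Error Analysis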

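Introduction

This article collects tricks for CV competitions, grouped into nine areas:

Image augmentation
Better models
Learning rate and learning rate schedulers
Optimizers
Regularization
Label smoothing
Knowledge distillation
Pseudo-labeling
Error analysis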

1. Image Augmentation

Below are some image augmentation methods that may come in handy.

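Color Augmentation

This augmentation randomly adjusts the hue, saturation, and brightness of an image by multiplying each channel by a randomly chosen coefficient. The coefficients are drawn from the range [0.6, 1.4] so that the resulting image is not overly distorted.

import cv2
import numpy as np

def color_skew(image):
    # Expects an 8-bit HSV image, e.g. from cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    h, s, v = cv2.split(image.astype(np.float32))
    # Scale each channel, then clip to the valid 8-bit HSV ranges used by OpenCV
    h = np.clip(h * np.random.uniform(low=0.6, high=1.4), 0, 179)
    s = np.clip(s * np.random.uniform(low=0.6, high=1.4), 0, 255)
    v = np.clip(v * np.random.uniform(low=0.6, high=1.4), 0, 255)
    return cv2.merge((h, s, v)).astype(np.uint8)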

RGB Norm

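This augmentation standardizes the RGB channels of an image: the channel mean is subtracted from every value in that channel, and the result is divided by the channel standard deviation. This puts the channels on a comparable scale and can improve model performance.

def rgb_norm(image):
    # Cast to float32 so the normalized result is a float image
    r, g, b = cv2.split(image.astype(np.float32))
    eps = 1e-7  # avoids division by zero on constant channels
    r = (r - np.mean(r)) / (np.std(r) + eps)
    g = (g - np.mean(g)) / (np.std(g) + eps)
    b = (b - np.mean(b)) / (np.std(b) + eps)
    return cv2.merge((r, g, b))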

Black and White

This augmentation converts the image to grayscale.

def black_and_white(image):
    return cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

Ben Graham: Grayscale + Gaussian Blur

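This augmentation converts the image to grayscale and applies a Gaussian blur to smooth out noise and fine detail.

def ben_graham(image):
    # Grayscale conversion, as described above
    image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    image = cv2.GaussianBlur(image, (5, 5), 0)
    return image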

Hue,Saturation,Brightness

This augmentation converts the image to the HLS color space, which separates the image into hue, lightness, and saturation channels.

def hsb(image):
    return cv2.cvtColor(image, cv2.COLOR_RGB2HLS)

LUV Color Space

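This augmentation converts the image to the LUV color space, which is designed to be perceptually uniform and allows more accurate color comparisons.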

def luv(image):
    return cv2.cvtColor(image, cv2.COLOR_RGB2LUV)

Alpha Channel

This augmentation adds an alpha channel to the image, which can be used for transparency effects.

def alpha_channel(image):
    return cv2.cvtColor(image, cv2.COLOR_RGB2RGBA)

XYZ Color Space

This augmentation converts the image to the XYZ color space, a device-independent color space that allows more accurate color representation.

def xyz(image):
    return cv2.cvtColor(image, cv2.COLOR_RGB2XYZ)

Luma Chroma

This augmentation converts the image to the YCrCb color space, which separates the image into luma and chroma channels.

def luma_chroma(image):
    return cv2.cvtColor(image, cv2.COLOR_RGB2YCrCb)

CIE Lab

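This augmentation converts the image to the CIE Lab color space, which is designed to be perceptually uniform and allows more accurate color comparisons.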

def cie_lab(image):
    return cv2.cvtColor(image, cv2.COLOR_RGB2Lab)

YUV Color Space

This augmentation converts the image to the YUV color space.

def yuv(image):
    return cv2.cvtColor(image, cv2.COLOR_RGB2YUV)

Center Crop

This augmentation crops a fixed-size region from the center of the image.

transforms.CenterCrop((100, 100))

Flippings

This augmentation flips the image horizontally with a probability of 0.5.

def flippings(image):
    if np.random.uniform() < 0.5:
        image = cv2.flip(image, 1)
    return image

Random Crop

This augmentation crops a rectangular region at a random position in the image.

transforms.RandomCrop((100, 100))

Random Resized Crop

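This augmentation crops a random rectangular region with an aspect ratio in [3/4, 4/3] and an area between 8% and 100% of the original image, then resizes the crop to the given size.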

transforms.RandomResizedCrop((100, 100))

Color Jitter

This augmentation randomly adjusts the brightness, contrast, saturation, and hue of the image.

transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)

Random Affine

This augmentation applies a random affine transform to the image, including rotation, translation, scaling, and shear.

transforms.RandomAffine(degrees=45, translate=(0.1, 0.1), scale=(0.5, 2.0), shear=45)

Random Horizontal Flip

Flips the image horizontally with a probability of 0.5.

transforms.RandomHorizontalFlip()

Random Vertical Flip

Flips the image vertically with a probability of 0.5.

transforms.RandomVerticalFlip()

Random Perspective

Randomly applies a perspective transform to the image.

transforms.RandomPerspective()

Random Rotation

Rotates the image by a random angle within the given range of degrees.

transforms.RandomRotation(degrees=45)

Random Invert

Randomly inverts the colors of the image.

transforms.RandomInvert()

Random Solarize

Randomly applies a solarization effect to the image, in which pixels above a given intensity threshold are inverted.

transforms.RandomSolarize(threshold=128)

Random Autocontrast

Randomly adjusts the contrast of the image by stretching the intensity values to cover the full available range.

transforms.RandomAutocontrast()

Random Equalize

Randomly equalizes the histogram of the image, which increases its contrast.

transforms.RandomEqualize()

Auto Augment

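Uses reinforcement learning to search for the best augmentation policy for a given dataset.

# Import as given in the source; torchvision also provides transforms.AutoAugment
from autoaugment import AutoAugment

auto_augment = AutoAugment()
image = auto_augment(image)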

Fast Autoaugment

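A faster alternative to AutoAugment. Instead of reinforcement learning, it searches for augmentation policies by density matching, which makes the search much cheaper.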

from fast_autoaugment import FastAutoAugment

fast_auto_augment = FastAutoAugment()
image = fast_auto_augment(image)

Augmix

AugMix mixes several augmented versions of an image into a single image that is more diverse while still looking realistic.

from augmix import AugMix

aug_mix = AugMix()
image = aug_mix(image)

Mixup/Cutout

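Mixup combines two images by linearly interpolating their pixel values, and interpolates their labels with the same weight. Cutout removes a random rectangular region from the image.

"You take a picture of a cat and add some 'transparent dog' on top of it. The amount of transparency is a hyperparam."

$x = \lambda x_1 + (1 - \lambda) x_2$

$y = \lambda y_1 + (1 - \lambda) y_2$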
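The two operations can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: drawing the mixing weight from a Beta(alpha, alpha) distribution, the default `alpha=0.4`, the patch `size`, and the function names are all assumptions made for the example. Labels are assumed to be one-hot vectors.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Blend two images and their one-hot labels with the same random weight."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1] (assumed Beta prior)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam

def cutout(image, size=8, rng=None):
    """Zero out one randomly placed square patch (clipped at the image borders)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)  # patch centre
    y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, h)
    x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, w)
    out = image.copy()
    out[y0:y1, x0:x1] = 0
    return out
```

The mixed label is no longer one-hot, so the loss has to accept soft targets.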

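Test Time Augmentation (TTA)

Image augmentation is useful not only during training but also at test time. At test time, augment each test image several times, run the model on every augmented copy, and average the results. This makes the predictions more robust, at the cost of longer inference time. Overly aggressive augmentations are not suitable for the test set; common choices are changing the image scale, cropping different regions, and flipping.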
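As a rough sketch of the averaging step, assume a hypothetical `predict_fn` that takes one H x W x C NumPy image and returns a vector of class probabilities. The choice of views (original plus two flips) is only an example.

```python
import numpy as np

def predict_with_tta(predict_fn, image):
    """Average predictions over the original image and two flipped views."""
    views = [
        image,
        np.flip(image, axis=1),  # horizontal flip
        np.flip(image, axis=0),  # vertical flip
    ]
    preds = np.stack([predict_fn(view) for view in views])
    return preds.mean(axis=0)
```

Inference cost grows linearly with the number of views, so a handful of cheap views is usually the practical limit.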

2. Better Models

EfficientNet V1 and V2 families
SE-ResNeXt
Swin Transformer
BEiT Transformer
ViT (Vision Transformer)

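Adding more hidden layers after the backbone

Adding more layers can help, because the model can learn more high-level features. It can also hurt model performance, so check the effect on validation data.

Layer-wise unfreezing

Unfreeze the layers of the pretrained backbone as training progresses. First add the new layers and freeze the backbone, then gradually unfreeze the backbone parameters so that they take part in training.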

# Weight freezing
for param in model.parameters():
    param.requires_grad = False

# Weight unfreezing
for param in model.parameters():
    param.requires_grad = True
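The sketch below puts the two ideas together: extra layers after the backbone, and unfreezing the backbone block by block. The toy `backbone`, the layer sizes, the helper names, and the "one more block per epoch" schedule are all assumptions for illustration; a real setup would load a pretrained network instead.

```python
import torch
from torch import nn

# Toy stand-in for a pretrained backbone made of three blocks (hypothetical).
backbone = nn.Sequential(
    nn.Sequential(nn.Linear(16, 32), nn.ReLU()),
    nn.Sequential(nn.Linear(32, 32), nn.ReLU()),
    nn.Sequential(nn.Linear(32, 32), nn.ReLU()),
)
# Extra hidden layer plus classifier added after the backbone.
head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 5))
model = nn.Sequential(backbone, head)

def set_trainable(module, flag):
    for param in module.parameters():
        param.requires_grad = flag

def unfreeze_last_blocks(backbone, n_blocks):
    """Freeze the whole backbone, then unfreeze only its last n_blocks blocks."""
    set_trainable(backbone, False)
    blocks = list(backbone.children())
    for block in blocks[len(blocks) - n_blocks:]:
        set_trainable(block, True)

# Assumed schedule: the head trains alone first, then one more block joins each epoch.
for epoch in range(4):
    unfreeze_last_blocks(backbone, n_blocks=min(epoch, 3))
    # ... train for one epoch here ...
```

Unfreezing from the last block backwards keeps the most generic early features fixed the longest, which is the usual motivation for this ordering.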

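3. Learning Rate and Learning Rate Schedulers

The learning rate and its scheduler affect how well a model trains. Changing the learning rate can have a large effect on both final performance and convergence.

Learning rate schedulers

One Cycle Cosine Scheduling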

import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

# optimizer_grouped_parameters, args and num_train_optimization_steps come from the surrounding training script
optimizer = torch.optim.Adam(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
# Cosine annealing is used here
scheduler = CosineAnnealingLR(optimizer, T_max=num_train_optimization_steps)
num_training_steps = num_train_optimization_steps / args.gradient_accumulation_steps
# Update the scheduler
scheduler.step()
# Step the learning rate scheduler exactly once per optimizer step, no more and no less.
# In PyTorch, call scheduler.step() after optimizer.step(), that is, after the gradients have been applied.
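To see what this schedule does to the learning rate, the cosine decay can be written out in plain Python. This is a sketch of the closed-form curve only; the function name and the clamping of `step` at `total_steps` are choices made for the example.

```python
import math

def cosine_annealing_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate after `step` updates of a cosine decay from base_lr to min_lr."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Learning rate at every step of a 100-step run starting from 1e-3
schedule = [cosine_annealing_lr(s, total_steps=100, base_lr=1e-3) for s in range(101)]
```

The rate starts at `base_lr`, passes through the midpoint halfway through training, and reaches `min_lr` at the last step.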

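Tips for using learning rate schedulers

Triangular or One Cycle learning rate schedules can give significant improvements and can mitigate some batch-size problems
Take the time to find the schedule that best suits your task and model; a good schedule helps the model converge
Learning rate schedules can be used to train models with small batch sizes or with multiple learning rates
Try a low learning rate first, then check whether raising it helps or hurts the model
Increasing the learning rate, or using multiple learning rates, late in training can sometimes help the model converge
When using gradient accumulation or multiple learning rates, loss scaling helps reduce loss variance and improve gradient flow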
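For reference, the triangular schedule mentioned in the first tip can be sketched as below. The function name and parameters are illustrative assumptions: the rate ramps linearly from `base_lr` to `max_lr` over `step_size` steps, then back down, and repeats.

```python
def triangular_lr(step, step_size, base_lr, max_lr):
    """Triangular cyclical learning rate with a full cycle of 2 * step_size steps."""
    cycle_pos = step % (2 * step_size)
    # Distance from the peak of the cycle, scaled to [0, 1]
    distance = abs(cycle_pos / step_size - 1.0)
    return base_lr + (max_lr - base_lr) * (1.0 - distance)
```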

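4. Optimizers

Finding the best weight decay value takes a lot of experimentation.

When using Adam or AdamW, keep the following in mind:

beta1 and beta2 are important hyperparameters of the Adam optimizer; the best values depend on your task and data
Do not underestimate the importance of the optimizer's epsilon value
Do not overuse gradient norm clipping
Gradient accumulation can help, for example by emulating a larger batch size when memory is limited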
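The points above map onto a training loop roughly as follows. This is a sketch under assumptions: the linear `model`, the random `batches`, and the specific values of `accumulation_steps` and `max_grad_norm` are placeholders; the `betas`, `eps`, and `weight_decay` values shown are simply PyTorch's defaults for `AdamW`, not tuned recommendations.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),  # beta1, beta2
    eps=1e-8,            # epsilon in the denominator of the update
    weight_decay=1e-2,
)
loss_fn = nn.MSELoss()
accumulation_steps = 4   # effective batch size = 4 x mini-batch size
max_grad_norm = 1.0

batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]  # placeholder data
optimizer_steps = 0
optimizer.zero_grad()
for i, (x, y) in enumerate(batches, start=1):
    # Divide so the accumulated gradient is an average over the group
    loss = loss_fn(model(x), y) / accumulation_steps
    loss.backward()  # gradients add up across mini-batches
    if i % accumulation_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        optimizer.zero_grad()
        optimizer_steps += 1
```

Clipping is applied once per optimizer step, after all mini-batches of the group have contributed their gradients.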

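Some other optimizers

AdamW: a variant of Adam that decouples weight decay from the gradient-based update
Adafactor: designed for low memory usage and scalability; it can work well in multi-GPU training
NovoGrad: one of the optimizers that has been used to train BERT-Large
Ranger: combines RAdam with Lookahead and has performed well in practice
LAMB: a layer-wise adaptive optimizer designed for large-batch training
Lookahead: wraps another optimizer and can give some performance gains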

5. Overfitting and Regularization

Use dropout
Regularization
Multi Validations
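One common way to validate on several splits is K-fold cross-validation; reading "Multi Validations" this way is an assumption. The helper below is a hypothetical NumPy sketch that yields index pairs so that every sample is used for validation exactly once.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs covering all samples across the folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, val_idx
```

Averaging the validation score over the folds gives a less noisy estimate than a single split, which makes overfitting easier to spot.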

6. Label Smoothing

Label smoothing pulls hard 0/1 targets toward 0.5 so that the model is not pushed toward overconfident predictions.

import torch
import torch.nn.functional as F
from torch.nn.modules.loss import _WeightedLoss

class SmoothBCEwLogits(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0, pos_weight=None):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction
        self.pos_weight = pos_weight

    @staticmethod
    def _smooth(targets, n_labels, smoothing=0.0):
        assert 0 <= smoothing < 1
        # 1 -> 1 - smoothing / 2 and 0 -> smoothing / 2
        with torch.no_grad():
            targets = targets * (1.0 - smoothing) + 0.5 * smoothing
        return targets

    def forward(self, inputs, targets):
        targets = SmoothBCEwLogits._smooth(targets, inputs.size(-1), self.smoothing)
        # Keep per-element losses so that self.reduction below decides how to aggregate
        loss = F.binary_cross_entropy_with_logits(inputs, targets, self.weight,
                                                  pos_weight=self.pos_weight, reduction='none')
        if self.reduction == 'sum':
            loss = loss.sum()
        elif self.reduction == 'mean':
            loss = loss.mean()
        return loss
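The arithmetic inside `_smooth` is easy to check by hand. The helper below is a hypothetical plain-Python copy of the same formula, used only to show what happens to the targets.

```python
def smooth_binary_targets(targets, smoothing):
    """Same formula as _smooth, applied to a list of Python floats."""
    return [t * (1.0 - smoothing) + 0.5 * smoothing for t in targets]

# With smoothing=0.1, a target of 1 becomes about 0.95 and a target of 0 becomes about 0.05
smoothed = smooth_binary_targets([1.0, 0.0, 1.0], smoothing=0.1)
```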

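7. Knowledge Distillation

A large teacher network guides the training of a small student network. Steps:

Train the large model: train the large model on the data
Compute soft labels: use the trained large model to compute soft labels, that is, its softened softmax outputs
Train the student model: train the student on the ground-truth labels with an additional soft-label loss based on the teacher's outputs, and interpolate between the two losses to set their relative weight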
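The combined loss in the last step might look like the NumPy sketch below. Several details are assumptions rather than something the text specifies: softening with a temperature, using the KL divergence for the soft-label term, scaling it by the squared temperature, and the default values of `temperature` and `alpha`.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """alpha * soft-label loss + (1 - alpha) * hard-label cross-entropy."""
    # Soft labels: teacher softmax at the given temperature
    soft_targets = softmax(teacher_logits, temperature)
    log_student_soft = np.log(softmax(student_logits, temperature) + 1e-12)
    # KL(teacher || student), scaled by T^2 (a common convention, assumed here)
    kl = (soft_targets * (np.log(soft_targets + 1e-12) - log_student_soft)).sum(axis=-1)
    soft_loss = (temperature ** 2) * kl.mean()
    # Ordinary cross-entropy against the ground-truth class indices
    probs = softmax(student_logits)
    hard_loss = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

Here `alpha` is the interpolation weight between the two losses: 1.0 trains on the teacher's outputs only, 0.0 on the ground-truth labels only.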

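8. Pseudo-Labeling

Use a model to label unlabeled data (for example, the test data), then retrain the model with the newly labeled data. Steps:

Train the teacher model: train a model on the data you have
Compute pseudo-labels: use the trained teacher model to compute soft labels for the unlabeled data
Keep only high-confidence predictions as pseudo-labels
Train the student model: train the student model on the data you have plus the newly labeled data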
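The confidence filter in the third step can be sketched as follows. The function name, the 0.9 threshold, and the probability values are made up for the example; `probs` stands for the teacher's class probabilities on the unlabeled images.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Return indices and hard labels of samples predicted with high confidence."""
    confidence = probs.max(axis=1)
    keep = np.where(confidence >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)

# Teacher probabilities for four unlabeled images over three classes (made-up numbers)
probs = np.array([
    [0.97, 0.02, 0.01],
    [0.40, 0.35, 0.25],
    [0.05, 0.93, 0.02],
    [0.50, 0.49, 0.01],
])
keep, pseudo = select_pseudo_labels(probs, threshold=0.9)
# keep -> [0, 2], pseudo -> [0, 1]
```

A higher threshold gives fewer but cleaner pseudo-labels; the right trade-off depends on how accurate the teacher is.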

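9. Error Analysis

Sort the validation samples by the model's confidence score, then inspect the misclassified samples and those predicted with the lowest confidence.

import numpy as np

# pred: predicted probabilities, target: 0/1 labels (NumPy arrays of equal length)
mistakes_idx = np.array([img_idx for img_idx in range(len(pred)) if int(pred[img_idx] > 0.5) != target[img_idx]], dtype=int)
mistakes_preds = pred[mistakes_idx]
# The 20 misclassified samples with the lowest predicted scores, mapped back to dataset indices
sorted_idx = mistakes_idx[np.argsort(mistakes_preds)[:20]]
# Show the images of the sorted idx here..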