A Brief Code Walkthrough of LSS (Lift, Splat, Shoot), the Pioneering BEV Work, Part 1: Data Loading

compile_data:

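1) Initialize the NuScenes API.

2) Build traindata and valdata as SegmentationData instances (the class that implements __getitem__). Construction mainly runs the NuscData initializer:

a. get_scenes calls create_splits_scenes to obtain the train or val scene names (a list of names such as scene-xxxx) and assigns it to scenes.

b. prepro is called to collect the samples, which are assigned to ixes. A sample is one frame of a scene's video, taken every 0.5 s.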

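c. gen_dx_bx is called: dx = [0.5, 0.5, 20] is the cell size along each axis, bx = [-49.75, -49.75, 0] is the center of the first grid cell, and nx = [200, 200, 1] is the number of cells along each axis.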

The two datasets are then wrapped in DataLoaders, giving trainloader and valloader.
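The grid parameters from step c can be reproduced with a small NumPy sketch. It assumes the bounds xbound = ybound = [-50, 50, 0.5] and zbound = [-10, 10, 20], which are consistent with the dx, bx and nx values quoted above but are not stated in this post. The repository version works on torch tensors, and the rounding used for nx here is a choice made for this sketch.

```python
import numpy as np

def gen_dx_bx(xbound, ybound, zbound):
    """Each bound is [min, max, step]. Returns cell size, first cell center, cell count."""
    rows = [xbound, ybound, zbound]
    dx = np.array([row[2] for row in rows], dtype=float)                  # cell size per axis
    bx = np.array([row[0] + row[2] / 2.0 for row in rows], dtype=float)   # center of the first cell
    nx = np.array([round((row[1] - row[0]) / row[2]) for row in rows])    # number of cells per axis
    return dx, bx, nx

# Assumed bounds (not given in the post)
dx, bx, nx = gen_dx_bx([-50.0, 50.0, 0.5], [-50.0, 50.0, 0.5], [-10.0, 10.0, 20.0])
print(dx, bx, nx)
```

The first cell spans [-50, -49.5] on x and y, so its center is -49.75 on both axes.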

3) Reading training data

a. Index ixes to fetch one sample and assign it to rec.

b. Call choose_cams to randomly pick 5 of the 6 cameras.

c. Call get_image_data with rec and cams: for each camera, read the captured image and the camera intrinsics and extrinsics, then call sample_augmentation.

def sample_augmentation(self):
    H, W = self.data_aug_conf['H'], self.data_aug_conf['W']  # 900, 1600
    fH, fW = self.data_aug_conf['final_dim']  # 128, 352
    if self.is_train:
        resize = np.random.uniform(*self.data_aug_conf['resize_lim'])  # sample uniformly from the (0.193, 0.225) range
        resize_dims = (int(W*resize), int(H*resize))  # size after resizing
        newW, newH = resize_dims
        # compute the crop box
        crop_h = int((1 - np.random.uniform(*self.data_aug_conf['bot_pct_lim']))*newH) - fH
        crop_w = int(np.random.uniform(0, max(0, newW - fW)))
        crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
        flip = False
        if self.data_aug_conf['rand_flip'] and np.random.choice([0, 1]):
            flip = True
        rotate = np.random.uniform(*self.data_aug_conf['rot_lim'])
    else:
        resize = max(fH/H, fW/W)
        resize_dims = (int(W*resize), int(H*resize))
        newW, newH = resize_dims
        crop_h = int((1 - np.mean(self.data_aug_conf['bot_pct_lim']))*newH) - fH
        crop_w = int(max(0, newW - fW) / 2)
        crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
        flip = False
        rotate = 0
    return resize, resize_dims, crop, flip, rotate
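As a worked example, the deterministic (non-training) branch can be evaluated by hand. The sketch below is a standalone copy of that branch, not the repository method. H, W and final_dim come from the comments above, while bot_pct_lim = (0.0, 0.22) is an assumed config value that this post does not state.

```python
import numpy as np

def eval_augmentation(H=900, W=1600, final_dim=(128, 352), bot_pct_lim=(0.0, 0.22)):
    """Standalone copy of the else-branch above (bot_pct_lim is an assumed value)."""
    fH, fW = final_dim
    resize = max(fH / H, fW / W)                 # max(0.142, 0.22) = 0.22
    newW, newH = int(W * resize), int(H * resize)
    # Crop fH rows so the window ends at (1 - mean(bot_pct_lim)) of the image height
    crop_h = int((1 - np.mean(bot_pct_lim)) * newH) - fH
    crop_w = int(max(0, newW - fW) / 2)          # center the crop horizontally
    crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
    return resize, (newW, newH), crop

resize, resize_dims, crop = eval_augmentation()
print(resize, resize_dims, crop)
```

Under these assumptions the 1600x900 image is scaled to 352x198, and the crop box (left, upper, right, lower) = (0, 48, 352, 176) keeps the full width and cuts 48 rows off the top.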

d. Call img_transform with the augmentation parameters. For the principle behind the augmentation, see: https://www.cnblogs.com/jimchen1218/p/17940326

def img_transform(img, post_rot, post_tran,
                  resize, resize_dims, crop,
                  flip, rotate):
    # adjust image
    img = img.resize(resize_dims)  # resize
    img = img.crop(crop)  # crop box is (left, upper, right, lower)
    if flip:
        img = img.transpose(method=Image.FLIP_LEFT_RIGHT)
    img = img.rotate(rotate)

    # post-homography transformation: track the same operations as a 2D affine map
    post_rot *= resize  # scaling
    post_tran -= torch.Tensor(crop[:2])  # shift by the top-left corner of the crop
    if flip:
        # horizontal flip: x -> crop_width - x
        A = torch.Tensor([[-1, 0], [0, 1]])
        b = torch.Tensor([crop[2] - crop[0], 0])
        post_rot = A.matmul(post_rot)
        post_tran = A.matmul(post_tran) + b
    # rotation about the center of the cropped image
    A = get_rot(rotate/180*np.pi)
    b = torch.Tensor([crop[2] - crop[0], crop[3] - crop[1]]) / 2
    b = A.matmul(-b) + b
    post_rot = A.matmul(post_rot)
    post_tran = A.matmul(post_tran) + b

    return img, post_rot, post_tran
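To see what post_rot and post_tran encode, the sketch below repeats the post-homography bookkeeping in NumPy and maps one pixel of the original image to its position in the augmented image. The helper names compose_post_transform and map_pixel are hypothetical, get_rot is re-implemented here from memory of the repository helper, and the resize and crop values are illustrative.

```python
import numpy as np

def get_rot(h):
    # 2D rotation matrix in the convention assumed for the repository helper
    return np.array([[np.cos(h), np.sin(h)], [-np.sin(h), np.cos(h)]])

def compose_post_transform(resize, crop, flip, rotate_deg):
    """Return (post_rot, post_tran) so that p_aug = post_rot @ p_orig + post_tran."""
    post_rot = np.eye(2) * resize                        # scaling
    post_tran = -np.array(crop[:2], dtype=float)         # shift by crop origin
    if flip:                                             # x -> crop_width - x
        A = np.array([[-1.0, 0.0], [0.0, 1.0]])
        b = np.array([crop[2] - crop[0], 0.0])
        post_rot = A @ post_rot
        post_tran = A @ post_tran + b
    A = get_rot(rotate_deg / 180 * np.pi)                # rotate about crop center
    b = np.array([crop[2] - crop[0], crop[3] - crop[1]]) / 2
    b = A @ (-b) + b
    post_rot = A @ post_rot
    post_tran = A @ post_tran + b
    return post_rot, post_tran

def map_pixel(uv, post_rot, post_tran):
    return post_rot @ np.asarray(uv, dtype=float) + post_tran

crop = (0, 48, 352, 176)  # illustrative crop box
R, t = compose_post_transform(0.22, crop, flip=False, rotate_deg=0.0)
print(map_pixel((400, 450), R, t))  # pixel (400, 450) scaled by 0.22, then shifted up 48 rows
```

With scaling and cropping only, pixel (400, 450) lands at (88, 51); enabling the flip mirrors its x coordinate within the 352-pixel-wide crop.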

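e. Call get_binimg with rec to build the training target: the segmentation ground-truth map.

1) Read the ego pose associated with the LiDAR data of rec and take its translation and rotation. Set trans to -translation and rot to the inverse of rotation; these are the parameters of the inverse transform.

2) Build the BEV grid: 200x200.

3) Iterate over the anns field of rec, i.e. each annotated instance. Three coordinate systems are involved: ego, sensor and world. The instance coordinates are in the world frame and must be moved into the ego frame by applying the inverse of the ego pose (R, t). Then take the xy coordinates of the box's bottom corners.

The corners are first computed from the box's wlh parameters with the origin at the box center, then left-multiplied by the rotation matrix and translated to the box center. The corner order is as follows:

bottom_corners takes corners 2, 3, 7, 6 in that order and returns a 3x4 matrix. Because this is BEV, the z coordinate is dropped and only the first two rows are kept, giving a 2x4 matrix, which is then transposed to 4x2. Units are meters.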
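The world-to-ego conversion and the bottom-corner extraction can be sketched in NumPy as follows. This is a simplified illustration rather than the devkit code: rotations are restricted to yaw about the z axis (the dataset stores full quaternions), the corner sign pattern is an assumption about the box class's corner ordering, and the function name bottom_corners_in_ego is hypothetical.

```python
import numpy as np

def yaw_matrix(yaw):
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def bottom_corners_in_ego(center, wlh, box_yaw, ego_translation, ego_yaw):
    """Return the 4 bottom corners (4x2, meters) of a world-frame box in the ego frame."""
    w, l, h = wlh
    # 8 corners with the origin at the box center (assumed ordering)
    x = l / 2 * np.array([1, 1, 1, 1, -1, -1, -1, -1])
    y = w / 2 * np.array([1, -1, -1, 1, 1, -1, -1, 1])
    z = h / 2 * np.array([1, 1, -1, -1, 1, 1, -1, -1])
    corners = np.vstack([x, y, z])                                   # 3x8, box frame
    # Rotate, then translate to the box center (world frame)
    corners = yaw_matrix(box_yaw) @ corners + np.asarray(center, dtype=float).reshape(3, 1)
    # Inverse ego pose: subtract the translation, then apply the inverse rotation
    t = np.asarray(ego_translation, dtype=float).reshape(3, 1)
    corners = yaw_matrix(ego_yaw).T @ (corners - t)
    bottom = corners[:, [2, 3, 7, 6]]                                # 3x4
    return bottom[:2].T                                              # drop z, transpose to 4x2

pts = bottom_corners_in_ego(center=(10, 5, 1), wlh=(2, 4, 1.5), box_yaw=0.0,
                            ego_translation=(8, 5, 0), ego_yaw=0.0)
print(pts)
```

In this example the box center sits 2 m ahead of the ego origin, so a 4 m long, 2 m wide box covers x in [0, 4] and y in [-1, 1] in the ego frame.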

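# convert to grid coordinates on the BEV map
pts = np.round((pts - self.bx[:2] + self.dx[:2]/2.) / self.dx[:2]).astype(np.int32)  # bx: center of the first grid cell
# shapes of the tensors returned for one sample:
# imgs: [5,3,H,W]
# rots: [5,3,3]
# trans: [5,3]
# intrins: [5,3,3]
# post_rots: [5,3,3]
# post_trans: [5,3]
# binimg: [1,200,200]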
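The meters-to-grid conversion can be checked on a few points with a minimal standalone sketch. The function name meters_to_grid is hypothetical, dx and bx are the values from step 2c, and the corner coordinates are illustrative.

```python
import numpy as np

def meters_to_grid(pts, bx, dx):
    """Map ego-frame xy points (meters) to integer BEV grid indices."""
    pts = np.asarray(pts, dtype=float)
    return np.round((pts - bx[:2] + dx[:2] / 2.0) / dx[:2]).astype(np.int32)

dx = np.array([0.5, 0.5, 20.0])
bx = np.array([-49.75, -49.75, 0.0])

# Four bottom corners of a box, in meters (illustrative values)
corners_m = np.array([[4.0, -1.0], [4.0, 1.0], [0.0, 1.0], [0.0, -1.0]])
corners_px = meters_to_grid(corners_m, bx, dx)
print(corners_px)
```

The ego origin (0, 0) maps to index (100, 100), the middle of the 200x200 grid, and each meter spans two cells. The resulting integer polygon is what gets filled into binimg.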
