A Brief Code Walkthrough of LSS (Lift, Splat, Shoot), the Pioneering BEV Work, Part 1: Data Loading

Published 2024-01-02 15:26:50 Author: jimchen1218

 compile_data:

1) Initialize the NuScenes API

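2) Build traindata and valdata from the SegmentationData class (which implements __getitem__); construction mainly runs the NuscData initializer:

   a. get_scenes calls create_splits_scenes to get the scene IDs of the train and val splits, i.e. a list of names such as scene-xxxx, and assigns it to scenes

   b. prepro returns the samples, which are assigned to ixes; a sample is one frame of a scene video, taken every 0.5 s

   c. gen_dx_bx returns dx = [0.5, 0.5, 20], the cell size along each axis; bx = [-49.75, -49.75, 0], the center of the first grid cell; and nx = [200, 200, 1], the number of cells along each axis

   Both datasets are then wrapped in a DataLoader, giving trainloader and valloader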
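
As a rough illustration of step c, the three grid vectors can be derived from per-axis bounds. This is a minimal sketch in plain Python, not the repository's code; the `[lower, upper, step]` layout and the concrete bounds passed in below are assumptions chosen so that the result matches a 200x200 grid of 0.5 m cells.

```python
def gen_dx_bx(xbound, ybound, zbound):
    """Each bound is [lower, upper, step] for one axis (assumed layout)."""
    bounds = [xbound, ybound, zbound]
    dx = [b[2] for b in bounds]                       # cell size per axis
    bx = [b[0] + b[2] / 2.0 for b in bounds]          # center of the first cell
    nx = [int((b[1] - b[0]) / b[2]) for b in bounds]  # number of cells per axis
    return dx, bx, nx


# Assumed bounds: 100 m x 100 m around the ego vehicle, one 20 m slab in z.
dx, bx, nx = gen_dx_bx([-50.0, 50.0, 0.5], [-50.0, 50.0, 0.5], [-10.0, 10.0, 20.0])
print(dx, bx, nx)
```

With these assumed bounds the sketch gives dx = [0.5, 0.5, 20.0], bx = [-49.75, -49.75, 0.0] and nx = [200, 200, 1]: the first cell spans -50 m to -49.5 m, so its center sits at -49.75 m on both x and y.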

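3) Reading the training data

  a. ixes is indexed to fetch one sample, which is assigned to rec

  b. choose_cams picks 5 of the 6 cameras

  c. get_image_data takes rec and cams, loops over each camera, reads the captured image and the camera intrinsics and extrinsics, and calls sample_augmentation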

def sample_augmentation(self):
    H, W = self.data_aug_conf['H'], self.data_aug_conf['W']  # 900, 1600
    fH, fW = self.data_aug_conf['final_dim']  # 128, 352
    if self.is_train:
        resize = np.random.uniform(*self.data_aug_conf['resize_lim'])  # sample uniformly from (0.193, 0.225)
        resize_dims = (int(W*resize), int(H*resize))  # size after resizing
        newW, newH = resize_dims
        # compute the crop box
        crop_h = int((1 - np.random.uniform(*self.data_aug_conf['bot_pct_lim']))*newH) - fH
        crop_w = int(np.random.uniform(0, max(0, newW - fW)))
        crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
        flip = False
        if self.data_aug_conf['rand_flip'] and np.random.choice([0, 1]):
            flip = True
        rotate = np.random.uniform(*self.data_aug_conf['rot_lim'])
    else:
        resize = max(fH/H, fW/W)
        resize_dims = (int(W*resize), int(H*resize))
        newW, newH = resize_dims
        crop_h = int((1 - np.mean(self.data_aug_conf['bot_pct_lim']))*newH) - fH
        crop_w = int(max(0, newW - fW) / 2)
        crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
        flip = False
        rotate = 0
    return resize, resize_dims, crop, flip, rotate
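
To make the deterministic (non-training) branch concrete, the arithmetic can be replayed with the sizes noted in the comments above. This is only a sketch: `eval_augmentation` is a hypothetical standalone helper, and the `bot_pct_lim` value `(0.0, 0.22)` is an assumed setting, not one stated in this article.

```python
def eval_augmentation(H, W, final_dim, bot_pct_lim):
    """Replays the else-branch arithmetic with plain Python numbers."""
    fH, fW = final_dim
    # Smallest scale at which the resized image still covers the final size.
    resize = max(fH / H, fW / W)
    new_w, new_h = int(W * resize), int(H * resize)
    # Cut the mean bottom fraction, then keep the fH rows just above that cut.
    mean_bot = sum(bot_pct_lim) / len(bot_pct_lim)
    crop_h = int((1 - mean_bot) * new_h) - fH
    # Center the crop horizontally.
    crop_w = int(max(0, new_w - fW) / 2)
    crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
    return resize, (new_w, new_h), crop


# 900x1600 input, 128x352 output, assumed bot_pct_lim of (0.0, 0.22).
resize, resize_dims, crop = eval_augmentation(900, 1600, (128, 352), (0.0, 0.22))
print(resize, resize_dims, crop)
```

Under these assumptions the image is scaled by 0.22 to 352x198 and cropped to the box (0, 48, 352, 176): the top 48 rows and the bottom 22 rows are discarded, leaving exactly 128x352.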

  d. img_transform is called with the augmentation parameters; for the principle behind the augmentation see: https://www.cnblogs.com/jimchen1218/p/17940326

def img_transform(img, post_rot, post_tran,
                  resize, resize_dims, crop,
                  flip, rotate):
    # adjust image
    img = img.resize(resize_dims)  # resize, then crop
    img = img.crop(crop)
    if flip:
        img = img.transpose(method=Image.FLIP_LEFT_RIGHT)
    img = img.rotate(rotate)

    # post-homography transformation
    post_rot *= resize
    post_tran -= torch.Tensor(crop[:2])
    if flip:
        A = torch.Tensor([[-1, 0], [0, 1]])
        b = torch.Tensor([crop[2] - crop[0], 0])
        post_rot = A.matmul(post_rot)
        post_tran = A.matmul(post_tran) + b
    A = get_rot(rotate/180*np.pi)
    b = torch.Tensor([crop[2] - crop[0], crop[3] - crop[1]]) / 2
    b = A.matmul(-b) + b
    post_rot = A.matmul(post_rot)
    post_tran = A.matmul(post_tran) + b

    return img, post_rot, post_tran
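
Read together, post_rot and post_tran record the 2D affine map that sends a pixel of the original image to its position in the augmented image: p' = post_rot · p + post_tran. The sketch below mirrors that bookkeeping with plain Python lists so it can be checked by hand. It assumes that post_rot starts as the identity and post_tran as zero, and that get_rot(h) returns [[cos h, sin h], [-sin h, cos h]]; neither is shown in this article, and the helper names are hypothetical.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def compose_post_transform(resize, crop, flip, rotate_deg):
    # Resize scales the pixel coordinates, crop shifts the origin.
    post_rot = [[resize, 0.0], [0.0, resize]]
    post_tran = [-float(crop[0]), -float(crop[1])]
    if flip:
        # Horizontal flip inside the crop: x' = crop_width - x.
        A = [[-1.0, 0.0], [0.0, 1.0]]
        b = [float(crop[2] - crop[0]), 0.0]
        post_rot = matmul(A, post_rot)
        post_tran = [p + q for p, q in zip(matvec(A, post_tran), b)]
    # Rotation about the center c of the crop: p' = A (p - c) + c.
    h = math.radians(rotate_deg)
    A = [[math.cos(h), math.sin(h)], [-math.sin(h), math.cos(h)]]
    c = [(crop[2] - crop[0]) / 2.0, (crop[3] - crop[1]) / 2.0]
    b = [ci - ri for ci, ri in zip(c, matvec(A, c))]
    post_rot = matmul(A, post_rot)
    post_tran = [p + q for p, q in zip(matvec(A, post_tran), b)]
    return post_rot, post_tran


def apply_post_transform(post_rot, post_tran, uv):
    return [p + q for p, q in zip(matvec(post_rot, uv), post_tran)]


rot, tran = compose_post_transform(0.22, (0, 48, 352, 176), flip=False, rotate_deg=0.0)
print(apply_post_transform(rot, tran, (800.0, 450.0)))
```

For the image center (800, 450) of a 1600x900 frame, scaling by 0.22 gives (176, 99) and the crop offset of 48 rows moves it to roughly (176, 51). The rotation term is built so that the crop center stays fixed for any angle.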

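  e. get_binimg takes rec and builds the target for the task: the ground-truth segmentation map

  1) Read the ego pose associated with rec's LiDAR data and take its translation and rotation; set trans to -translation and rot to the inverse of rotation, i.e. the parameters of the inverse transform;

  2) Build the BEV grid: 200x200

  3) Loop over the anns field of rec, i.e. every annotated instance. Three coordinate frames are involved: ego, sensor, and world. Each inst is given in the world frame and has to be moved into the ego frame by applying the inverse of the ego pose's R and t; then take the xy coordinates of the box's bottom corners;

    The box's wlh parameters give the corner coordinates with the origin at the box center; these are left-multiplied by the rotation matrix and then translated to center.

  bottom_corners takes corners 2, 3, 7, 6 in that order and returns a 3x4 matrix; since this is BEV, the z coordinate is dropped and the first two rows are kept, giving a 2x4 matrix, which is then transposed to 4x2, in meters.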
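
The chain from box parameters to ego-frame footprint corners can be sketched in 2D. This is a simplified illustration, not the devkit code: it replaces the full quaternion rotations with yaw angles only, and it assumes a corner convention in which x runs along the box length and y along its width, so that corners 2, 3, 7, 6 are the four bottom corners listed in `local`. The function and parameter names are hypothetical.

```python
import math


def rotate_xy(x, y, yaw):
    c, s = math.cos(yaw), math.sin(yaw)
    return c * x - s * y, s * x + c * y


def bottom_corners_in_ego(center_xy, w, l, box_yaw, ego_xy, ego_yaw):
    # Footprint corners around the box origin, in the assumed order 2, 3, 7, 6.
    local = [(l / 2, -w / 2), (l / 2, w / 2), (-l / 2, w / 2), (-l / 2, -w / 2)]
    corners = []
    for x, y in local:
        # Box frame -> world frame: rotate by the box yaw, then move to the box center.
        wx, wy = rotate_xy(x, y, box_yaw)
        wx, wy = wx + center_xy[0], wy + center_xy[1]
        # World frame -> ego frame: subtract the ego translation, then undo the ego rotation.
        corners.append(rotate_xy(wx - ego_xy[0], wy - ego_xy[1], -ego_yaw))
    return corners  # 4 rows of (x, y), in meters


# A 4 m x 2 m box 10 m ahead and 5 m to the left of an ego vehicle at (100, 50).
pts = bottom_corners_in_ego((110.0, 55.0), w=2.0, l=4.0, box_yaw=0.0,
                            ego_xy=(100.0, 50.0), ego_yaw=0.0)
print(pts)
```

With both yaw angles at zero the result is simply the box center in ego coordinates, (10, 5), plus or minus half the length in x and half the width in y.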

# get the grid coordinates on the BEV map
pts = np.round((pts - self.bx[:2] + self.dx[:2]/2.) / self.dx[:2]).astype(np.int32)  # bx is the center of the first grid cell
# shapes of the tensors returned for one sample (5 cameras):
#imgs: [5,3,H,W]
#rots:[5,3,3]
#trans:[5,3]
#intrins:[5,3,3]
#post_rots:[5,3,3]
#post_trans:[5,3]
#binimg:[1,200,200]
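
The grid-index formula used for pts can be checked by hand with a few metric coordinates. The sketch below is a plain-Python stand-in for the NumPy expression; `to_grid_index` is a hypothetical helper, and the bx and dx values passed in are assumed to be those of a 200x200 grid with 0.5 m cells starting at -50 m.

```python
def to_grid_index(pts, bx, dx):
    """Maps (x, y) in meters to integer BEV cell indices."""
    out = []
    for x, y in pts:
        # Shifting by bx - dx/2 moves the origin to the lower edge of the grid.
        ix = int(round((x - bx[0] + dx[0] / 2.0) / dx[0]))
        iy = int(round((y - bx[1] + dx[1] / 2.0) / dx[1]))
        out.append((ix, iy))
    return out


# Footprint of a 4 m x 2 m box centered at (10, 5) in the ego frame.
corners_m = [(12.0, 4.0), (12.0, 6.0), (8.0, 6.0), (8.0, 4.0)]
idx = to_grid_index(corners_m, bx=(-49.75, -49.75), dx=(0.5, 0.5))
print(idx)
```

Since bx - dx/2 equals -50 m, the expression reduces to (x + 50) / 0.5, so x = 12 m lands at index 124 and y = 4 m at index 108. Python's round and np.round both round exact halves to the nearest even integer, so the two agree on boundary values as well.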