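Comprehensive Design: Multi-Source Heterogeneous Data Collection and Fusion Application Practice
- Basic assignment information
- My contribution
  - Anime update lookup for the ERNIE Bot (文心一言) plugin
  - Multimodal features: image object detection and video category classification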


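Basic assignment information

Course	Data Collection and Fusion Technology
Team name and project overview	<team name, project requirements, project goals, technical roadmap>
Team member student IDs	<list the student IDs of all team members>
Project goals	<describe the specific aspects>
Other references	...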

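My contribution

Anime update lookup for the ERNIE Bot (文心一言) plugin

A crawler collects anime update information from an anime site, and the text is then processed with the ERNIE-Bot large model on Baidu Qianfan. This gives users the latest anime update information and makes the data that ERNIE Bot draws on more up to date.
The anime information is obtained by packet capture, that is, by requesting the JSON endpoint found in the site's network traffic: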

import requests


def get_today_anime_info():
    """Return one line per anime episode airing today on the bilibili timeline."""
    url = "https://api.bilibili.com/pgc/web/timeline?types=1&before=6&after=6"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36 Edg/117.0.2045.31"
    }
    # Time out instead of hanging, and fail loudly on HTTP errors
    response = requests.get(url=url, headers=headers, timeout=10)
    response.raise_for_status()
    data_json = response.json()

    today_anime_info = ""
    # Keep only today's entry and list its episodes
    for day_dic in data_json["result"]:
        if day_dic["is_today"] == 1:
            for episode in day_dic["episodes"]:
                title = episode["title"]
                pub_index = episode["pub_index"]
                pub_time = episode["pub_time"]
                today_anime_info += f"Title: {title}, Episode: {pub_index}, Air time: {pub_time}\n"
    return today_anime_info
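
To check the parsing logic without touching the network, the filtering step can be separated from the request. The following is only a minimal sketch: `format_today_episodes` and the sample payload are hypothetical, and the payload mirrors just the fields used above (`result`, `is_today`, `episodes`, `title`, `pub_index`, `pub_time`), not the full API response.

```python
def format_today_episodes(timeline):
    """Return one line per episode airing today from a parsed timeline payload."""
    lines = []
    for day in timeline.get("result", []):
        # Skip every day except the one flagged as today
        if day.get("is_today") != 1:
            continue
        for ep in day.get("episodes", []):
            lines.append(
                f"Title: {ep['title']}, Episode: {ep['pub_index']}, Air time: {ep['pub_time']}"
            )
    return "\n".join(lines)


# Hypothetical sample payload with the same shape as the fields used above
sample = {
    "result": [
        {"is_today": 0, "episodes": [{"title": "A", "pub_index": "Ep. 1", "pub_time": "10:00"}]},
        {"is_today": 1, "episodes": [{"title": "B", "pub_index": "Ep. 2", "pub_time": "22:30"}]},
    ]
}
print(format_today_episodes(sample))
```

Keeping the formatter free of network calls also makes it easier to reuse the same text when building the prompt that is passed to the large model.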

Gitee folder link:
https://gitee.com/zheng-wanling/crawl_project/blob/master/综合设计——多源异构数据采集与融合应用综合实践/shiwen_fzu.zip

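Multimodal features: image object detection and video category classification

We first train the model on the relevant dataset so that it can recognize 20 object classes, and then extend the model further.
The 20 classes of the first trained model are: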

# Class ID -> class name (renamed from `dict` to avoid shadowing the built-in)
voc_classes = {1:'aeroplane',    2:'bicycle', 3:'bird',   4:'boat',       5:'bottle',
               6:'bus',          7:'car',     8:'cat',    9:'chair',      10:'cow',
               11:'diningTable', 12:'dog',    13:'horse', 14:'motorbike', 15:'person',
               16:'pottedPlant', 17:'sheep',  18:'sofa',  19:'train',     20:'TV'}
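
The IDs start at 1 because, in the usual SSD setup for Pascal VOC, class 0 is reserved for the background; this is also why the detection code further down passes `num_classes=21`. A minimal sketch of mapping a predicted class ID back to a name follows. `class_name` and `VOC_CLASSES` are hypothetical names introduced only for illustration.

```python
# The 20 class names in ID order (ID 1 is the first element)
VOC_CLASSES = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
               'bus', 'car', 'cat', 'chair', 'cow',
               'diningTable', 'dog', 'horse', 'motorbike', 'person',
               'pottedPlant', 'sheep', 'sofa', 'train', 'TV']


def class_name(class_id):
    """Map a predicted class ID (0..20) to its name; 0 is the background."""
    if class_id == 0:
        return 'background'
    if not 1 <= class_id <= len(VOC_CLASSES):
        raise ValueError(f"unknown class ID: {class_id}")
    # IDs are 1-based, list indices are 0-based
    return VOC_CLASSES[class_id - 1]


print(class_name(12))
```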

Preprocess the data:

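import tensorflow as tf  # TensorFlow 1.x API (tf.InteractiveSession, tf.placeholder)
# Assumption: the module path follows the SSD-Tensorflow project layout
from preprocessing import ssd_vgg_preprocessing

# TensorFlow session; allow_growth allocates GPU memory on demand
gpu_options = tf.GPUOptions(allow_growth=True)
config = tf.ConfigProto(log_device_placement=False, gpu_options=gpu_options)
isess = tf.InteractiveSession(config=config)

l_VOC_CLASS = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
               'bus', 'car', 'cat', 'chair', 'cow',
               'diningTable', 'dog', 'horse', 'motorbike', 'person',
               'pottedPlant', 'sheep', 'sofa', 'train', 'TV']

# Network input size
net_shape = (300, 300)
# Placeholder for an input image of arbitrary size (height, width, 3 channels)
img_input = tf.placeholder(tf.uint8, shape=(None, None, 3))
# Channel layout of the input: 'NHWC' means [batch_size, height, width, channel]
data_format = 'NHWC'

# Preprocessing: resize the image fed to img_input to 300x300 as the next step's input.
# labels_pre, bboxes_pre and bbox_img are returned alongside it; bbox_img is used later
# as the detection region.
image_pre, labels_pre, bboxes_pre, bbox_img = ssd_vgg_preprocessing.preprocess_for_eval(
    img_input, None, None, net_shape, data_format,
    resize=ssd_vgg_preprocessing.Resize.WARP_RESIZE)
# Add a batch dimension so the tensor is 4-D, as the network expects
image_4d = tf.expand_dims(image_pre, 0)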

Next we define the SSD model used for object detection:

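# Assumption: the module path follows the SSD-Tensorflow project layout
from nets import ssd_vgg_300

slim = tf.contrib.slim

# Define the SSD model
# Reuse the variables if ssd_net was already built (for example when this code is re-run);
# on the first run this is None
reuse = True if 'ssd_net' in locals() else None
# Create the VGG-based SSD model object; note that this is a custom class
ssd_net = ssd_vgg_300.SSDNet()
# Build the tensors for the predicted classes and predicted box coordinates;
# together they form the network's computation graph
with slim.arg_scope(ssd_net.arg_scope(data_format=data_format)):
    predictions, localisations, _, _ = ssd_net.net(image_4d, is_training=False, reuse=reuse)

# Load the officially released SSD model weights
ckpt_filename = '../checkpoints/ssd_300_vgg.ckpt'
# ckpt_filename = '../checkpoints/VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt'
isess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(isess, ckpt_filename)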

Then we move on to testing the model. The code below performs inference only, using the restored weights:

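import cv2
import matplotlib.cm as mpcm
# Assumption: the module path follows the SSD-Tensorflow project layout
from nets import np_methods

# Anchor boxes for every feature layer at the given input size
ssd_anchors = ssd_net.anchors(net_shape)


# Helper functions for drawing
def colors_subselect(colors, num_classes=21):
    dt = len(colors) // num_classes
    sub_colors = []
    for i in range(num_classes):
        color = colors[i * dt]
        if isinstance(color[0], float):
            sub_colors.append([int(c * 255) for c in color])
        else:
            sub_colors.append([c for c in color])
    return sub_colors


def bboxes_draw_on_img(img, classes, scores, bboxes, colors, thickness=2):
    shape = img.shape
    for i in range(bboxes.shape[0]):
        bbox = bboxes[i]
        color = colors[classes[i]]
        # Draw bounding box (boxes are normalized as ymin, xmin, ymax, xmax)
        p1 = (int(bbox[0] * shape[0]), int(bbox[1] * shape[1]))
        p2 = (int(bbox[2] * shape[0]), int(bbox[3] * shape[1]))
        cv2.rectangle(img, p1[::-1], p2[::-1], color, thickness)
        # Draw text (class name/score); the putText call is currently disabled
        s = '%s/%.3f' % (l_VOC_CLASS[int(classes[i]) - 1], scores[i])
        p1 = (p1[0] - 5, p1[1])
        # cv2.putText(img, s, p1[::-1], cv2.FONT_HERSHEY_DUPLEX, 1.5, color, 3)


colors_plasma = colors_subselect(mpcm.plasma.colors, num_classes=21)


# Main pipeline function
def process_image(img, case, select_threshold=0.15, nms_threshold=.1, net_shape=(300, 300)):
    # select_threshold: score threshold for boxes. The class score predicted for each box is
    #   compared with it; a box scoring above the threshold is treated as containing an object.
    # nms_threshold: overlap threshold. If two boxes for the same object overlap by more than
    #   this value, the de-duplication step below removes one of them.

    # Run the SSD model to get the 4-D input, the class predictions and the box predictions.
    # rbbox_img is the maximum detection region, fixed here to [0, 0, 1, 1], i.e. the whole image.
    rimg, rpredictions, rlocalisations, rbbox_img = isess.run([image_4d, predictions,
                                                               localisations, bbox_img], feed_dict={img_input: img})

    # ssd_bboxes_select() uses each feature layer's class scores, the normalized box coordinates
    # and the anchor box sizes; with the given threshold it yields the objects detected in each
    # feature layer together with their classes and coordinates
    rclasses, rscores, rbboxes = np_methods.ssd_bboxes_select(rpredictions, rlocalisations, ssd_anchors,
                                                              select_threshold=select_threshold,
                                                              img_shape=net_shape,
                                                              num_classes=21, decode=True)

    # Clip boxes that extend beyond the detection region
    rbboxes = np_methods.bboxes_clip(rbbox_img, rbboxes)
    # Sort by score and keep at most the top 400 boxes
    rclasses, rscores, rbboxes = np_methods.bboxes_sort(rclasses, rscores, rbboxes, top_k=400)
    # De-duplicate: drop repeated detections of the same object
    rclasses, rscores, rbboxes = np_methods.bboxes_nms(rclasses, rscores, rbboxes, nms_threshold=nms_threshold)
    # Map the box coordinates back onto the original image (all coordinates above were
    # normalized, so the operation is inverted once)
    rbboxes = np_methods.bboxes_resize(rbbox_img, rbboxes)

    if case == 1:
        # Draw the boxes on the image and return it
        bboxes_draw_on_img(img, rclasses, rscores, rbboxes, colors_plasma, thickness=8)
        return img
    else:
        # Return the raw detections instead
        return rclasses, rscores, rbboxes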
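
To make the roles of `select_threshold` and `nms_threshold` concrete, here is a simplified, pure-Python sketch of score filtering followed by greedy non-maximum suppression, plus the conversion from normalized boxes to pixel coordinates. It is not the `np_methods` implementation: `iou`, `filter_and_nms` and `to_pixels` are hypothetical helpers, and boxes are assumed to be `(ymin, xmin, ymax, xmax)` in the range 0 to 1, as in the drawing code above.

```python
def iou(a, b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(classes, scores, boxes, select_threshold=0.15, nms_threshold=0.1):
    """Return the indices of the boxes kept, best score first."""
    # 1. Drop low-scoring boxes, then visit the rest from best to worst
    order = sorted((i for i in range(len(scores)) if scores[i] > select_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # 2. Keep a box unless an already kept box of the same class overlaps it too much
        if all(classes[i] != classes[j] or iou(boxes[i], boxes[j]) <= nms_threshold
               for j in keep):
            keep.append(i)
    return keep


def to_pixels(box, height, width):
    """Convert a normalized box to integer pixel coordinates."""
    ymin, xmin, ymax, xmax = box
    return (int(ymin * height), int(xmin * width), int(ymax * height), int(xmax * width))


classes = [12, 12, 15, 12, 12]
scores = [0.9, 0.6, 0.8, 0.7, 0.05]
boxes = [(0.0, 0.0, 0.5, 0.5),    # best box of class 12
         (0.0, 0.0, 0.5, 0.75),   # overlaps the first box, same class
         (0.0, 0.0, 0.5, 0.5),    # same place, different class
         (0.5, 0.5, 1.0, 1.0),    # same class, no overlap
         (0.0, 0.0, 1.0, 1.0)]    # score below select_threshold
kept = filter_and_nms(classes, scores, boxes)
print(kept)
print(to_pixels(boxes[3], 480, 640))
```

With a low `nms_threshold` such as 0.1, even modest overlap between two boxes of the same class removes the weaker one, which suits scenes where objects of one class rarely overlap.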

We then run object detection on an image, for example:

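Once the model could label objects at a basic level, our group optimized it further so that it can detect more image classes, and moved step by step from object detection on images to object detection on video. By sampling frames from a video, the model judges the video's category. There are five main categories: entertainment and comedy, daily life, education and knowledge, fitness and sports, and talent showcase.
The main objects detected in videos are: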

names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane',
        5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 
        10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench',
        14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow',
        20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack',
        25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee',
        30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat',
        35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket',
        39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon',
        45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli',
        51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair',
        57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet',
        62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone',
        68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator',
        73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear',
        78: 'hair drier', 79: 'toothbrush'}
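
The write-up does not spell out how per-frame detections become a video category. One plausible approach, sketched here purely under assumptions, is to sample frames at a fixed interval and let the detected object names vote for a category. `sample_frame_indices`, `classify_video` and the `OBJECT_TO_CATEGORY` mapping are all hypothetical and are not taken from the project code.

```python
from collections import Counter

# Hypothetical mapping from detected object names to video categories
OBJECT_TO_CATEGORY = {
    'sports ball': 'fitness and sports',
    'tennis racket': 'fitness and sports',
    'skateboard': 'fitness and sports',
    'book': 'education and knowledge',
    'laptop': 'education and knowledge',
    'couch': 'daily life',
    'cup': 'daily life',
    'dining table': 'daily life',
}


def sample_frame_indices(total_frames, fps, every_seconds=1.0):
    """Indices of the frames to analyse: one frame every `every_seconds`."""
    step = max(1, int(round(fps * every_seconds)))
    return list(range(0, total_frames, step))


def classify_video(detections_per_frame):
    """Majority vote over frames; returns None if no mapped object was seen."""
    votes = Counter()
    for names in detections_per_frame:
        # Each category gets at most one vote per frame
        votes.update({OBJECT_TO_CATEGORY[n] for n in names if n in OBJECT_TO_CATEGORY})
    if not votes:
        return None
    # Ties are broken alphabetically so the result is deterministic
    return max(sorted(votes), key=votes.get)


frames = [['person', 'sports ball'],
          ['person', 'tennis racket', 'sports ball'],
          ['person', 'cup']]
print(sample_frame_indices(100, 25))
print(classify_video(frames))
```

Counting a category once per frame keeps a single crowded frame from dominating the vote; in practice the mapping would have to be tuned for all five categories.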

Gitee folder link (the full folder exceeds 100 MB, so only the main training code is included here):
https://gitee.com/zheng-wanling/crawl_project/blob/master/综合设计——多源异构数据采集与融合应用综合实践/demo_test.py