Comprehensive Design: Integrated Practice in Multi-Source Heterogeneous Data Collection and Fusion Applications - 526互联

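Course	2023 Data Collection and Fusion Technology
Team name and project overview	Team name: Double 20000; Requirement: design a user-friendly small application for collecting and fusing multi-source heterogeneous data; Goal: analyze the sentiment of text, images, video, or audio uploaded through a web page; Technical approach: HTML/CSS/JavaScript front end, Python, FastAPI
Team member student IDs	042101414, 052101230, 102102104, 102102105, 102102108, 102102111, 102102157, 102102158
Project goal	Analyze the sentiment of text, images, video, or audio uploaded through a web page
Other references	[1] Li Hui, Pang Jingwei. Research on multimodal netizen sentiment recognition based on text, image, and audio fusion [J/OL]. Data Analysis and Knowledge Discovery: 1-17 [2023-12-13]. http://kns.cnki.net/kcms/detail/10.1478.g2.20231011.1557.012.html.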

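Project overview:

Project name: Multimodal Sentiment Analysis System

Background: In today's digital era, sentiment analysis is increasingly important in applications such as customer service, market analysis, and social media monitoring. Multimodal sentiment analysis can recognize and analyze sentiment more richly and accurately than any single modality.

Goal: Build a multimodal sentiment analysis system that processes and analyzes text, image, audio, and video data and returns a combined sentiment analysis result.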

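Technical approach:

Front end:
- Build the interface with HTML, CSS, and JavaScript so that users can interact with the system.
- Integrate file upload for text, image, audio, and video files.
Back end:
- Write the back-end logic in Python.
- Use the FastAPI framework to handle front-end requests and data transfer.
Data processing and analysis:
- Text analysis: we first trained our own model, but ERNIE Bot (文心一言) was more accurate, so we switched to its API for text sentiment analysis.
  
  (We could not find ready-made APIs for audio, video, or images, so we trained our own models for those.)
- Audio analysis:
  - Train on the RAVDESS dataset.
  - Extract features from the uploaded audio file and classify its emotion.
- Image analysis:
  - Use a VGG model for image processing.
  - Use the CK+ and FER datasets for emotion classification.
- Video analysis (we could not find a model we could train on video, so in the end we only analyze the extracted audio):
  - Extract the audio track from the video.
  - Analyze the extracted audio with the same method as the audio analysis.
Result output and display: show the analysis results in the front-end interface.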
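
The video step (extract the audio track, then reuse the audio model) could be sketched as follows. This is a minimal sketch rather than the project's actual code: it assumes the `ffmpeg` command-line tool is installed, and the function names, the 16 kHz mono WAV output, and the file names are illustrative assumptions.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video_path: str, audio_path: str) -> List[str]:
    """Build an ffmpeg command that writes the audio track of a video as WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM samples
        "-ar", "16000",          # sample rate (assumed; match the audio model)
        "-ac", "1",              # mono
        audio_path,
    ]


def extract_audio(video_path: str) -> str:
    """Extract the audio next to the video file and return the WAV path."""
    audio_path = str(Path(video_path).with_suffix(".wav"))
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
    return audio_path
```

The returned WAV path can then be passed to whatever function handles uploaded audio files, so the video endpoint needs no model of its own.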

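Final result:

A file is uploaded from the local machine, analyzed, and the result is returned.

My contribution:

I trained the image model, saved it as a .pth file, and used that file to predict emotions in images. I first obtained the CK+ and FER datasets (CK+ is one of the public datasets for facial expression recognition), trained both a VGG and a ResNet model on them, chose the one with the higher accuracy, and saved the best model and parameter files for prediction.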

Saving the model:

import os
import torch

# Excerpt from the training script: opt, path, net, and the accuracy variables are defined elsewhere
if opt.resume:
    # Resume from the checkpoint with the best test accuracy
    print('==> Resuming from checkpoint..')
    assert os.path.isdir(path), 'Error: no checkpoint directory found!'
    checkpoint = torch.load(os.path.join(path, 'Test_model.pth'))

    net.load_state_dict(checkpoint['net'])
    best_Test_acc = checkpoint['best_Test_acc']
    best_Test_acc_epoch = checkpoint['best_Test_acc_epoch']
    start_epoch = best_Test_acc_epoch + 1

# Inside the training loop: save a checkpoint whenever the training accuracy improves
if Train_acc > best_Train_acc:
    print('Saving best train model..')
    print("best_Train_acc: %0.3f" % Train_acc)
    best_Train_acc = Train_acc
    best_Train_acc_epoch = epoch
    state = {
        # Always store the state_dict so the checkpoint loads through
        # load_state_dict whether or not CUDA was used during training
        'net': net.state_dict(),
        'best_Train_acc': best_Train_acc,
        'epoch': epoch,
    }
    model_path = os.path.join(path, 'Best_Train_model.pth')
    os.makedirs(os.path.dirname(model_path), exist_ok=True)
    torch.save(state, model_path)

Training results

Model: VGG19
Test_acc: 94.646%
Model: Resnet18
Test_acc: 94.040%
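
Choosing between the two networks comes down to comparing their test accuracy. A small sketch using the two figures reported above; the dictionary and variable names are illustrative.

```python
# Test accuracy (%) reported above for each architecture
test_acc = {"VGG19": 94.646, "Resnet18": 94.040}

# Pick the architecture with the highest test accuracy
best_model = max(test_acc, key=test_acc.get)

# Gap between the best and the worst result, in percentage points
margin = test_acc[best_model] - min(test_acc.values())

print(f"Best model: {best_model} (ahead by {margin:.3f} points)")
```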

Prediction:

import torch
from torchvision import transforms
from PIL import Image
from models import VGG  # project-local module that defines the network

model = VGG('VGG19')
# map_location lets a checkpoint saved on a GPU load on a CPU-only machine
checkpoint = torch.load(r'CK+_VGG19\1\Best_Train_model(1).pth', map_location='cpu')
# A checkpoint may hold either the whole module or only its state_dict; handle both
if isinstance(checkpoint['net'], torch.nn.Module):
    model = checkpoint['net']
else:
    model.load_state_dict(checkpoint['net'])
model.eval()  # inference mode (affects dropout and batch-norm layers, if present)

def preprocess_image(image_path):
    transform = transforms.Compose([
        transforms.Resize((48, 48)),
        transforms.Grayscale(num_output_channels=3),  # grayscale replicated to 3 channels
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    image = Image.open(image_path)
    image = transform(image).unsqueeze(0)  # add a batch dimension
    return image

def predict(image_path):
    image = preprocess_image(image_path)
    with torch.no_grad():  # disable gradient tracking
        outputs = model(image)
        predicted = outputs.argmax(dim=1)  # index of the highest-scoring class
        return predicted.item()

image_path = r'CK+48\surprise\S046_002_00000004.png'
prediction = predict(image_path)
print(f'Predicted class: {prediction}')
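
`predict` returns a bare class index. To show a readable label and a confidence instead, the raw scores can be passed through a softmax and looked up in a label list. The sketch below uses only the standard library. The label order is an assumption (alphabetical CK+48 folder names) and would need to be replaced with the order actually used by the training data loader; the function names are illustrative.

```python
import math
from typing import List, Tuple

# ASSUMPTION: alphabetical CK+48 folder names. The real index order depends on
# how the training data loader assigned labels.
LABELS = ["anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"]


def softmax(scores: List[float]) -> List[float]:
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def describe(scores: List[float]) -> Tuple[str, float]:
    """Return the most likely label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up scores for illustration; with the script above they could come from
# model(image)[0].tolist()
label, confidence = describe([0.1, -1.2, 0.3, 0.0, 0.5, -0.4, 2.7])
print(f"{label}: {confidence:.2f}")
```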

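However, the model is probably overfitted: it is accurate on images from the training set but much less accurate on other images. Because of time constraints there was no chance to tune the parameters, so the real-world prediction accuracy is not high.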

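Calling the API from the front end: two of us completed this part together. Following the API documentation, the front end sends requests to the code already deployed on the server, which connects the front end to the back end.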

// Text analysis
document.getElementById('analyze-text').addEventListener('click', function() {
    const text = document.getElementById('text-input').value;
    // The text is sent as the text_word query parameter of a POST request
    const url = `http://1.92.69.178:8000/text?text_word=${encodeURIComponent(text)}`;

    fetch(url, {
        method: 'POST',
        headers: {
            'accept': 'application/json'
        }
    })
    .then(response => response.json())
    .then(data => {
        document.getElementById('text-analysis-result').textContent = "Predicted class: " + data.response;
    })
    .catch(error => {
        console.error('Error:', error);
    });
});

// Image analysis
document.getElementById('analyze-image').addEventListener('click', function() {
    const fileInput = document.getElementById('image-input');
    const file = fileInput.files[0];
    if (file) {
        const formData = new FormData();
        formData.append('file', file);

        fetch('http://1.92.69.178:8000/predict-emotion', {
            method: 'POST',
            body: formData
        })
        .then(response => response.json())
        .then(data => {
            // "预测类别:" is the key name the back end uses in its JSON response
            document.getElementById('image-analysis-result').textContent = "Predicted class: " + data["预测类别:"];
        })
        .catch(error => {
            console.error('Error:', error);
        });
    } else {
        console.log("No image selected");
    }
});

// Audio analysis
document.getElementById('analyze-audio').addEventListener('click', function() {
    const fileInput = document.getElementById('audio-input');
    const file = fileInput.files[0];
    if (file) {
        const formData = new FormData();
        formData.append('audio_file', file);

        fetch('http://1.92.69.178:8000/audio', {
            method: 'POST',
            body: formData
        })
        .then(response => response.json())
        .then(data => {
            // translateEmotion is defined elsewhere in the front-end code
            const translatedEmotion = translateEmotion(data.prediction);
            document.getElementById('audio-analysis-result').textContent = "Prediction: " + translatedEmotion;
        })
        .catch(error => {
            console.error('Error:', error);
        });
    } else {
        console.log("No audio file selected");
    }
});

// Video analysis
document.getElementById('analyze-video').addEventListener('click', function() {
    const fileInput = document.getElementById('video-input');
    const file = fileInput.files[0];
    if (file) {
        const formData = new FormData();
        formData.append('file', file);

        fetch('http://1.92.69.178:8000/video', {
            method: 'POST',
            body: formData
        })
        .then(response => response.json())
        .then(data => {
            // translateEmotion is defined elsewhere in the front-end code
            const translatedEmotion = translateEmotion(data.prediction);
            document.getElementById('video-analysis-result').textContent = "Prediction: " + translatedEmotion;
        })
        .catch(error => {
            console.error('Error:', error);
        });
    } else {
        console.log("No video file selected");
    }
});
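
The same endpoints can also be called outside the browser, which is handy for checking the deployed back end. The sketch below mirrors the text and file handlers using only Python's standard library. The server address, the `text_word` query parameter, and the `file` and `audio_file` form field names come from the front-end code above; the helper names and the multipart construction are illustrative assumptions, and nothing is assumed about the response beyond it being JSON.

```python
import json
import uuid
from urllib import parse, request

BASE_URL = "http://1.92.69.178:8000"  # server address used by the front end


def build_text_request(text: str) -> request.Request:
    """POST /text with the text passed as the text_word query parameter."""
    url = f"{BASE_URL}/text?text_word={parse.quote(text, safe='')}"
    return request.Request(url, method="POST", headers={"accept": "application/json"})


def build_file_request(path: str, filename: str, content: bytes,
                       field: str = "file") -> request.Request:
    """POST one file as multipart/form-data, similar to FormData in the browser."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return request.Request(f"{BASE_URL}{path}", data=body, method="POST", headers=headers)


def send(req: request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_text_request("I am very happy today"))` would exercise the text endpoint, and `build_file_request("/audio", "clip.wav", data, field="audio_file")` matches the field name the audio handler uses.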

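Deploying the front end to the server: with the back end already running on the server, the front-end code was deployed to the same server. At first the front-end code contained Chinese paths, which could not be resolved after deployment, so the Chinese paths had to be renamed to English.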
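
A quick check before deploying can catch this kind of problem early. The sketch below walks a directory and lists every file or folder whose name contains non-ASCII characters; the function name and the `frontend` directory are hypothetical.

```python
import os
from typing import List


def find_non_ascii_paths(root: str) -> List[str]:
    """Return paths (relative to root) whose names contain non-ASCII characters."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not name.isascii():
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                offenders.append(rel)
    return sorted(offenders)


if __name__ == "__main__":
    # "frontend" is a hypothetical directory name; point it at the real one
    for path in find_non_ascii_paths("frontend"):
        print("rename before deploying:", path)
```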

Gitee folder