Lecture Notes 1: June 28, CCF Guangzhou Pearl River Forum 29, the 4th Frontier Forum on Natural Language Processing

Qu Weiguang: Language Information Processing Grounded in Real Language Use

Joint work between the School of Literature and the School of Computer and Electronics (计电学院)

CA-CAMR semantic representation framework

To do: read his PhD students' papers

Zhou Dong: Robust Representation Learning for Natural Language Text and Its Applications

Representation learning is a family of methods for extracting useful features from raw data

Static recipe retrieval, cross-modal molecule retrieval

Manifold-learning-based refinement of static word representations

Locally Linear Embedding (LLE), isometric mapping (Isomap), local tangent space

Spearman's rank correlation coefficient is used for evaluation
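A minimal sketch of how Spearman's rank correlation could be computed when comparing model similarity scores against human ratings. The word-pair scores below are made-up illustration data, not numbers from the talk, and the function names are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: human similarity ratings vs. cosine similarities
# produced by some word representation for the same five word pairs.
human_scores = [9.1, 7.5, 3.2, 1.0, 6.0]
model_scores = [0.82, 0.75, 0.30, 0.35, 0.55]
rho = spearman(human_scores, model_scores)
print(round(rho, 3))
```

Because only the rank order matters, the metric does not care that human ratings and cosine similarities live on different scales.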

Refinement of distilled word representations based on local tangent information: pooling

Tensor-decomposition-based cross-modal recipe retrieval: the semantic gap between heterogeneous modalities, semantic space transformation

Adversarial-network-based cross-modal molecule retrieval: cf. graph convolutional networks

Quan Xiaojun: Research on Knowledge Distillation for Pre-trained Language Models

The era of large language models (LLMs):

Challenges: resources, cost, latency, adaptation, privacy

Model compression: pruning, quantization, knowledge distillation (extracting a small model from a large one)

Why not train a small model directly? Answer: small models are hard to train.

“Distilling the Knowledge in a Neural Network” (Hinton et al., 2015)
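A minimal sketch of a soft-target distillation loss in the spirit of that paper, written in plain Python for a single example. It uses one common formulation (temperature-softened KL divergence scaled by T², mixed with hard-label cross-entropy); the function names, the default `temperature` and `alpha`, and the logits are illustrative assumptions, not values from the talk.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: how far the student's softened distribution
    # is from the teacher's softened distribution.
    soft = sum(pt * (math.log(pt) - math.log(ps))
               for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard-label term: ordinary cross-entropy at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard

# Hypothetical logits for a 3-class problem; the true class is index 2.
teacher = [1.0, 2.0, 4.0]
student = [0.5, 2.5, 3.0]
loss = distillation_loss(student, teacher, label=2)
print(round(loss, 4))
```

A higher temperature flattens both distributions, so the student also learns from the relative probabilities the teacher assigns to wrong classes.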

Distillation strategies: offline distillation, online distillation, self-distillation (the first is used most often)

What to match between student and teacher?

Output logits
Intermediate weights
Intermediate features
Gradients
Relational information
Attributes
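As one illustration of the list above, here is a minimal sketch of matching intermediate features: the student's hidden vector is linearly projected to the teacher's dimension and compared with mean squared error. The projection matrix `weight` would normally be learned; here it is fixed, and all names and numbers are hypothetical.

```python
def project(vector, weight):
    """Apply a linear map; weight has shape (teacher_dim, student_dim)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]

def feature_matching_loss(student_hidden, teacher_hidden, weight):
    """Mean squared error between projected student and teacher features."""
    projected = project(student_hidden, weight)
    squared = [(p - t) ** 2 for p, t in zip(projected, teacher_hidden)]
    return sum(squared) / len(teacher_hidden)

# Hypothetical sizes: 2-dimensional student state, 3-dimensional teacher state.
student_hidden = [1.0, 2.0]
weight = [[1.0, 0.0],
          [0.0, 1.0],
          [1.0, 1.0]]
perfect_match = feature_matching_loss(student_hidden, [1.0, 2.0, 3.0], weight)
mismatch = feature_matching_loss(student_hidden, [1.0, 2.0, 5.0], weight)
print(perfect_match, mismatch)
```

The projection is what lets a narrow student be compared with a wider teacher layer; which student layer is paired with which teacher layer is a separate design choice not shown here.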

DistilBERT

Patient-KD: Patient Knowledge Distillation

TinyBERT

AD-KD: Attribution-Driven Knowledge Distillation

ChatGPT:

No access to internal parameters/features

“Steal” data from LLMs and instruction-finetune a small model: Alpaca, Vicuna, Baize

How to derive a student model from ChatGPT?

Chain of thought (CoT)

Instruction fine-tuning focuses on general planning ability; CoT is mainly about ability on specific data

Planning longer-term projects (six months) helps avoid overlapping too much with other people's work before publication

The value of traditional cross-lingual projects: use ChatGPT to assist research on traditional projects

Data obtained through the API must be verified before it is used to tune a small model
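A minimal sketch of what a first automatic screening pass over API-collected instruction data might look like before fine-tuning. The record fields (`instruction`, `output`), the refusal markers, the length limit, and the sample records are all hypothetical, and checks like these do not replace manual verification of correctness.

```python
import json

# Hypothetical phrases that suggest the API declined to answer.
REFUSAL_MARKERS = ("as an ai language model", "i cannot help")

def is_valid(record, max_chars=2000):
    """Reject records that are empty, overly long, or look like refusals."""
    instruction = (record.get("instruction") or "").strip()
    output = (record.get("output") or "").strip()
    if not instruction or not output:
        return False
    if len(output) > max_chars:
        return False
    lowered = output.lower()
    return not any(marker in lowered for marker in REFUSAL_MARKERS)

def filter_records(records):
    """Keep valid records, dropping repeated instructions (case-insensitive)."""
    seen = set()
    kept = []
    for record in records:
        if not is_valid(record):
            continue
        key = record["instruction"].strip().lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept

# Made-up responses standing in for data collected from an LLM API.
raw_records = [
    {"instruction": "Translate 'hello' into French.", "output": "Bonjour"},
    {"instruction": "translate 'hello' into french. ", "output": "Salut"},
    {"instruction": "Summarize the article.", "output": ""},
    {"instruction": "Explain overfitting.",
     "output": "As an AI language model, I cannot do that."},
    {"instruction": "Name a prime number.", "output": "7"},
]
kept = filter_records(raw_records)
# One JSON object per line, a common layout for fine-tuning data.
jsonl = "\n".join(json.dumps(r, ensure_ascii=False) for r in kept)
print(jsonl)
```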
