High-Efficiency Lossy Image Coding Through Adaptive Neighborhood Information Aggregation

Introduction
- Contributions
Content
- Entropy Coding Using Multistage Context Model
Model Structure

Introduction

Contributions

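Integrated Convolution and Self-Attention (ICSA) unit
The paper proposes a unit that integrates convolution and self-attention to form a content-adaptive transform.
Multistage Context Model (MCM)
A multistage context model that follows a pre-arranged spatial-channel ordering to perform accurate probability estimation in parallel.

This yields decoding that is 60 times faster than commonly used methods.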
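
To make the source of the speedup more concrete, here is a toy sketch that counts sequential decoding steps. It assumes a raster-scan autoregressive context model needs one step per spatial position, whereas a staged model decodes every element of a stage in parallel. The grouping used here (equal channel chunks crossed with a 2x2 spatial pattern) and the names `stage_of`, `stage_sizes`, and `raster_scan_steps` are assumptions made for illustration; they are not the paper's actual ordering.

```python
from collections import Counter


def stage_of(c, h, w, channels, channel_groups):
    """Map latent element (c, h, w) to a stage index (hypothetical ordering)."""
    if channels % channel_groups != 0:
        raise ValueError("channels must be divisible by channel_groups")
    group_size = channels // channel_groups
    channel_stage = c // group_size          # which channel chunk
    spatial_stage = (h % 2) * 2 + (w % 2)    # position inside a 2x2 pattern
    return channel_stage * 4 + spatial_stage


def stage_sizes(channels, height, width, channel_groups):
    """Count how many latent elements fall into each stage."""
    counts = Counter()
    for c in range(channels):
        for h in range(height):
            for w in range(width):
                counts[stage_of(c, h, w, channels, channel_groups)] += 1
    return dict(sorted(counts.items()))


def raster_scan_steps(height, width):
    """Sequential steps when one spatial position is decoded per step."""
    return height * width


if __name__ == "__main__":
    sizes = stage_sizes(channels=8, height=16, width=16, channel_groups=2)
    print("staged sequential steps:", len(sizes))
    print("raster-scan sequential steps:", raster_scan_steps(16, 16))
```

In this toy setting the number of sequential steps depends only on the number of stages, not on the image size, which is the property that makes staged context models attractive for fast decoding.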

Content

This paper focuses on the transform and entropy coding stages of image compression.

Entropy Coding Using Multistage Context Model

Model Structure

My impression is that the design combines ideas from two papers: <uneven channel context model> and <Swin Transformer>.

Residual Neighborhood Attention Block (RNAB)

The RNAB is designed much like the block in Swin: with residual connections, a normalization layer + attention layer + normalization layer + MLP.
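
The layer order described above can be sketched in a few lines of plain Python. This is a toy 1-D version written under several assumptions: tokens attend only to neighbors within `radius` positions, queries, keys, and values are the normalized tokens themselves (no learned projections), and the MLP weights `w1` and `w2` are passed in by hand. The paper's block operates on 2-D feature maps, and its exact details are not reproduced here.

```python
import math


def layer_norm(v, eps=1e-5):
    """Normalize one token to zero mean and unit variance."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def gelu(x):
    """Exact GELU: x times the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def neighborhood_attention(tokens, radius):
    """Each token attends only to tokens within `radius` positions (1-D toy)."""
    dim = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        lo, hi = max(0, i - radius), min(len(tokens), i + radius + 1)
        neigh = tokens[lo:hi]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in neigh]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, neigh)) / total
                    for d in range(dim)])
    return out


def mlp(v, w1, w2):
    """Two-layer MLP with GELU; w1 is hidden x dim, w2 is dim x hidden."""
    hidden = [gelu(sum(a * b for a, b in zip(v, row))) for row in w1]
    return [sum(a * b for a, b in zip(hidden, row)) for row in w2]


def rnab(tokens, w1, w2, radius=1):
    """Norm -> attention -> residual add, then norm -> MLP -> residual add."""
    attn = neighborhood_attention([layer_norm(t) for t in tokens], radius)
    tokens = [[a + b for a, b in zip(t, o)] for t, o in zip(tokens, attn)]
    ff = [mlp(layer_norm(t), w1, w2) for t in tokens]
    return [[a + b for a, b in zip(t, o)] for t, o in zip(tokens, ff)]


if __name__ == "__main__":
    x = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.0]]
    w1 = [[0.1, -0.2], [0.3, 0.4], [0.0, 0.5]]
    w2 = [[0.2, 0.1, -0.3], [0.0, 0.4, 0.1]]
    print(rnab(x, w1, w2, radius=1))
```

The two residual additions keep the output the same shape as the input, so blocks of this kind can be stacked.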

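Activation function: Gaussian Error Linear Unit (GELU)

The RNAB uses the GELU activation function, which is also used in Google's BERT and OpenAI's GPT-2.
\(\mathrm{GELU}(x) = x \cdot P(X \le x) = x \cdot \Phi(x),\ X \sim \mathcal{N}(0,1)\), where \(\Phi\) is the cumulative distribution function of the standard normal distribution. In code, it is commonly approximated by \(\mathrm{GELU}(x) \approx 0.5x\left(1+\tanh\left(\sqrt{2/\pi}\left(x+0.044715x^{3}\right)\right)\right)\).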
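
As a quick check of the two forms, the sketch below computes the exact GELU through the error function and compares it with the tanh approximation. It uses only the Python standard library; the function names `gelu_exact` and `gelu_tanh` are chosen here for illustration.

```python
import math


def gelu_exact(x: float) -> float:
    """x * Phi(x), where Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation: 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3)))."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        exact, approx = gelu_exact(x), gelu_tanh(x)
        print(f"x={x:+.1f}  exact={exact:+.6f}  tanh={approx:+.6f}  "
              f"diff={abs(exact - approx):.2e}")
```

The two curves stay close over the usual activation range, which is why the cheaper tanh form is often used in practice.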