RoFormer: Enhanced Transformer with Rotary Position Embedding (paper notes)
Let's first look at the description given in the Hugging Face documentation:
https://hf-mirror.com/docs/transformers/model_doc/roformer
RoPE comes with valuable properties such as flexibility of being expand to any sequence lengths, decaying inter-token dependency with increasing relative distances, and capability of equipping the linear self-attention with relative position encoding.
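Properties: the position encoding can be extended to any sequence length; the dependency between tokens decays as their relative distance grows; and it can equip linear self-attention with relative position encoding.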
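To make the "relative position" part of these properties concrete, here is a minimal sketch in plain Python. It is not code from the paper or from the Hugging Face implementation: the helper names `rope` and `dot`, the toy vectors, and the chosen positions are all assumptions for illustration. It rotates each consecutive pair of dimensions by an angle proportional to the token position, using the frequency schedule `base ** (-2i/d)` with `base = 10000`, and checks that the query-key score depends only on the offset between the two positions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle pos * base**(-2i/d).

    Hypothetical helper for illustration; `base` and the pairing of
    adjacent dimensions are assumptions of this sketch.
    """
    d = len(vec)
    assert d % 2 == 0, "dimension must be even"
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (made-up values).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (3) at two different absolute positions.
score_a = dot(rope(q, 2), rope(k, 5))
score_b = dot(rope(q, 40), rope(k, 43))

# The two scores agree up to floating-point error.
print(abs(score_a - score_b) < 1e-9)
```

Because each pair is only rotated, the vector norm is unchanged, and nothing in the function depends on a maximum sequence length, which is why the encoding can be applied at any position. The sketch does not demonstrate the distance-decay property; that is argued in the paper itself.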
The rest of that page is usage documentation and is not very helpful here, so let's go straight to the paper.