Attention-526互联

Reference: https://blog.csdn.net/weixin_52668444/article/details/115288690

We use traditional machine translation as an example to show why Attention is needed.

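The encoder works much like an RNN. After the word vectors are fed into the encoder, we take the output of its last hidden state as the encoder's output and call it the context. The context can be understood as the encoder's understanding of the current input sentence. The context is then fed into the decoder, and the output of each hidden state in the decoder is the word the decoder predicts for the current position.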

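Note the asymmetry between the two halves. The encoder's first hidden state is randomly initialized, and we care only about the output of its last hidden state. The decoder's initial hidden state, by contrast, is the encoder's output, and we care about the output of every hidden state in the decoder.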
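The data flow described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation from the referenced post: the sizes (`EMBED`, `HIDDEN`, `VOCAB`), the vanilla `tanh` RNN cell, the random untrained weights, the greedy `argmax` decoding, and the zero vector used as a start token are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED, HIDDEN, VOCAB = 4, 8, 10  # hypothetical sizes, chosen for illustration


def rnn_step(x, h, W_x, W_h):
    """One vanilla RNN step: new hidden state from input x and previous state h."""
    return np.tanh(W_x @ x + W_h @ h)


def encode(inputs, W_x, W_h, h0):
    """Run the encoder over every word vector and keep only the last hidden state."""
    h = h0
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h)
    return h  # this single vector is the context


def decode(context, steps, W_x, W_h, W_out, embed):
    """Start from the context and read a predicted token id off every hidden state."""
    h = context              # decoder's initial hidden state is the encoder output
    x = np.zeros(EMBED)      # assumed stand-in for a start-of-sentence embedding
    tokens = []
    for _ in range(steps):
        h = rnn_step(x, h, W_x, W_h)
        token = int(np.argmax(W_out @ h))  # greedy choice, an assumption
        tokens.append(token)
        x = embed[token]     # feed the predicted word back in
    return tokens


# Random, untrained parameters: the output is meaningless, only the flow matters.
enc_Wx, enc_Wh = rng.normal(size=(HIDDEN, EMBED)), rng.normal(size=(HIDDEN, HIDDEN))
dec_Wx, dec_Wh = rng.normal(size=(HIDDEN, EMBED)), rng.normal(size=(HIDDEN, HIDDEN))
W_out = rng.normal(size=(VOCAB, HIDDEN))
embed = rng.normal(size=(VOCAB, EMBED))

sentence = rng.normal(size=(5, EMBED))   # five word vectors
h0 = rng.normal(size=HIDDEN)             # randomly initialized first encoder state
context = encode(sentence, enc_Wx, enc_Wh, h0)
tokens = decode(context, steps=3, W_x=dec_Wx, W_h=dec_Wh, W_out=W_out, embed=embed)
```

Whatever the length of `sentence`, `context` has the same fixed shape `(HIDDEN,)`, and it is the only thing the decoder receives from the encoder.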

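In this setup the decoder sees the input sentence only through that single context vector, so the last hidden state has to carry everything about the sentence, however long it is. Attention was proposed to address this need.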
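As a rough illustration of the idea, the sketch below lets a decoder state look at every encoder hidden state instead of only the last one. The source does not say which attention variant it has in mind, so the dot-product scoring, the softmax weighting, and the function name `attention_context` are assumptions chosen for simplicity.

```python
import numpy as np


def attention_context(decoder_state, encoder_states):
    """Weight all encoder hidden states by their similarity to the decoder state.

    decoder_state:  shape (H,)
    encoder_states: shape (T, H), one row per input position
    """
    scores = encoder_states @ decoder_state      # (T,) dot-product scores (assumed)
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    context = weights @ encoder_states           # (H,) weighted sum of the states
    return context, weights


rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))   # hidden states for five input words
decoder_state = rng.normal(size=8)         # current decoder hidden state
attn_context, attn_weights = attention_context(decoder_state, encoder_states)
```

Here `attn_weights` holds one non-negative weight per input position and sums to 1, so each decoder step gets its own context built from all encoder states rather than a single shared vector.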
