16X16

《AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE》阅读笔记

论文标题 《AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE》 谷歌论文起名越来越写意了,“一幅图像值16X16个单词” 是什么玩意儿。 AT SCALE:说明适合大规模的图片识别,也许小规模的不好使 ......
IMAGE TRANSFORMERS RECOGNITION 笔记 16X16

An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale

模型如下图所示: 将H×W×C的图像reshape成了N×(P2×C),其中(H,W)是图像的原始分辨率,C是通道数,(P,P)是每个图像块的分辨率,N=H×W/P2为图像块的数量,将一个图像块使用可学习的线性层映射到维度为D的隐藏向量,如式(1)所示,线性映射的输出称为patch embeddin ......
Image Transformers Recognition 16x16 Worth
共2篇  :1/1页 首页上一页1下一页尾页