transformers retentive networks vision
Bottleneck Transformers for Visual Recognition
Bottleneck Transformers for Visual Recognition * Authors: [[Aravind Srinivas]], [[Tsung-Yi Lin]], [[Niki Parmar]], [[Jonathon Shlens]], [[Pieter Abbee ......
U-Net: Convolutional Networks for Biomedical Image Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation * Authors: [[Olaf Ronneberger]], [[Philipp Fischer]], [[Thomas Brox]] Local library First read ......
Non-local Neural Networks: the first application of self-attention to CV
Non-local Neural Networks * Authors: [[Xiaolong Wang]], [[Ross Girshick]], [[Abhinav Gupta]], [[Kaiming He]] Local library First-read impressions comment:: (NonLocal) Previously ......
SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation
SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation * Authors: [[Qiang Wan]], [[Zilong Huang]], [[Jiachen Lu]], [[Gang Yu]] ......
RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation
RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation * Authors: [[Guosheng Lin]], [[Anton Milan]], [[Chunhua Shen]], [[ ......
Expectation-Maximization Attention Networks for Semantic Segmentation: attention using the EM algorithm
Expectation-Maximization Attention Networks for Semantic Segmentation * Authors: [[Xia Li]], [[Zhisheng Zhong]], [[Jianlong Wu]], [[Yibo Yang]], [[Zho ......
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery * Authors: [[Libo Wang]], [[Rui Li]], [[ ......
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network * Authors: [[Wenzhe Shi]], [[Jose Caballer ......
Pyramid Scene Parsing Network
Pyramid Scene Parsing Network * Authors: [[Hengshuang Zhao]], [[Jianping Shi]], [[Xiaojuan Qi]], [[Xiaogang Wang]], [[Jiaya Jia]] DOI: 10.1109/CVPR.20 ......
SegViT: Semantic Segmentation with Plain Vision Transformers
SegViT: Semantic Segmentation with Plain Vision Transformers * Authors: [[Bowen Zhang]], [[Zhi Tian]], [[Quan Tang]], [[Xiangxiang Chu]], [[Xiaolin We ......
Asymmetric Non-Local Neural Networks for Semantic Segmentation: asymmetric attention
Asymmetric Non-Local Neural Networks for Semantic Segmentation * Authors: [[Zhen Zhu]], [[Mengdu Xu]], [[Song Bai]], [[Tengteng Huang]], [[Xiang Bai]] ......
PIDNet: A Real-time Semantic Segmentation Network Inspired by PID Controllers
PIDNet: A Real-time Semantic Segmentation Network Inspired by PID Controllers * Authors: [[Jiacong Xu]], [[Zixiang Xiong]], [[Shankar P. Bhattacharyya ......
PSANet: Point-wise Spatial Attention Network for Scene Parsing: bidirectional attention
PSANet: Point-wise Spatial Attention Network for Scene Parsing * Authors: [[Hengshuang Zhao]], [[Yi Zhang]], [[Shu Liu]], [[Jianping Shi]], [[Chen Cha ......
Object Tracking Network Based on Deformable Attention Mechanism
Object Tracking Network Based on Deformable Attention Mechanism Local library First-read impressions comment:: (DeTrack) Feature interaction is performed by combining an encoder module based on deformable attention with an encoder module based on self-attention. Based on ......
BiFormer: Vision Transformer with Bi-Level Routing Attention: a lightweight ViT using super tokens
alias: Zhu2023a tags: super-token attention rating: ⭐ share: false ptype: article BiFormer: Vision Transformer with Bi-Level Routing Attention * Authors: [[Lei Zhu] ......
Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images
Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images * Authors: [[Bowei Du]], [[Yecheng ......
A Deformable Attention Network for High-Resolution Remote Sensing Images Semantic Segmentation: deformable attention
A Deformable Attention Network for High-Resolution Remote Sensing Images Semantic Segmentation * Authors: [[Renxiang Zuo]], [[Guangyun Zhang]], [[Rong ......
2021-CVPR-Transformer Tracking
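Transformer Tracking Correlation plays a key role in tracking, especially in the recently popular Siamese trackers. The correlation operation is a simple way of fusing features that considers the similarity between the template and the search region. However, correlation is itself a local linear matching process, so it loses semantic information and easily falls into local optima, which may be the bottleneck in designing high-accuracy tracking algorithms. Is there a better feature ......
Rethinking and Improving Relative Position Encoding for Vision Transformer: position encoding in ViT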
Rethinking and Improving Relative Position Encoding for Vision Transformer * Authors: [[Kan Wu]], [[Houwen Peng]], [[Minghao Chen]], [[Jianlong Fu]], ......
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition: simplifying attention with large-kernel convolutional modulation
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition * Authors: [[Qibin Hou]], [[Cheng-Ze Lu]], [[Ming-Ming Cheng]], [[Jiashi Feng]] ......
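Swin Transformer: Hierarchical Vision Transformer using Shifted Windows: a detailed explanation
First-read impressions comment:: (Swin-transformer) Code: https://github.com/microsoft/Swin-Transformer Motivation: bring the Transformer, which is mainstream in NLP, over to CV. The following difficulties arise: in NLP a word token is a basic unit, but visual elements, in terms of scale, have large ......
SiReN Sign-Aware Recommendation Using Graph Neural Networks: paper reading notes
Abstract Current GNN-based recommender systems mainly exploit positive user-item interactions with high ratings. How to use low ratings to represent user preferences is a challenge, because low ratings can still provide useful information. This paper therefore proposes SiReN, a sign-aware recommender system based on GNN models. SiReN has three key components: constructing a signed bipartite graph to represent user preferences more precisely, split into two ......
Fully Attentional Network for Semantic Segmentation: FLANet
Fully Attentional Network for Semantic Segmentation * Authors: [[Qi Song]], [[Jie Li]], [[Chenghong Li]], [[Hao Guo]], [[Rui Huang]] First-read impressions comment:: (F ......
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation; OCRNet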
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation * Authors: [[Yuhui Yuan]], [[Xiaokang Chen]], [[Xilin Chen]], [[ ......
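From Sliding Windows to YOLO and Transformer: Technical Innovation in Object Detection
This article comprehensively reviews the evolution of object detection, from early sliding-window and feature-extraction methods, through the rise of deep learning, to the innovations of the YOLO series and Transformers. By analyzing the techniques of each stage in depth, it shows the development trends and future potential of computer vision. Follow TechLead for AI knowledge across the board. The author has 10+ years of experience in internet service architecture and AI product R&D ......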
Instruction-Following Agents with Multimodal Transformer
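Overview Proposes InstructRL, which consists of a multimodal transformer that encodes visual observations and language instructions, and a transformer-based policy that outputs actions from the encoded representations. The former is trained on 1M image-text pairs and natural-language text ......
Notes on using UIView transform
Code first: let tView = UIView() override func viewDidLoad() { tView.backgroundColor = .orange view.addSubview(tView) } override func viewWillLayoutSubViews() ......
Computer vision course project
Implementing a language-guided model like this in Matlab involves two main tasks: natural language processing and image generation. Below is a simple example showing how to do language-guided image generation in Matlab. First, install and load the necessary toolboxes, such as the Computer Vision Toolbox and the Deep Learning Toolbox. % Load the Computer Vision Toolbox and the Deep Learning Toolbox addpath('path ......
Applying Transformers to Diffusion Models: AI-Generated Video Reaches Photorealism
Preface In video generation, using a Transformer as the denoising backbone of a diffusion model has been shown to work by Fei-Fei Li and other researchers. This can be counted as a major success for Transformers in video generation. This article is reposted from 机器之心 for academic sharing only; please contact us for removal in case of infringement. Follow the public account CV技术指南, which focuses on technical summaries of computer vision, tracking of the latest techniques ......
Peak Performance of Pure-Convolution BEV Models | BEVENet Delivers the Mass-Production Future of ADAS Without Transformers (repost)
In recent years, 3D object detection in bird's-eye-view (BEV) space has emerged as a prevalent approach in autonomous driving. Although BEV methods improve accuracy and velocity estimation compared with perspective-view methods, deploying BEV techniques in real autonomous vehicles remains challenging. This is mainly because they rely on architectures based on the vision Transformer (ViT), which makes them, relative to ......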