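CVPR 2024 Papers and Open-Source Projects (Papers with Code)
CVPR 2024 decisions are now available on OpenReview!
Note 1: Everyone is welcome to open an issue to share CVPR 2024 papers and open-source projects!
Note 2: For top-tier CV conference papers from previous years, as well as other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
Join the [CVer Academic Exchange Group], the largest computer vision AI Knowledge Planet community! It is updated daily and is the first to share the latest learning materials on computer vision, AI painting, image processing, deep learning, autonomous driving, medical imaging, AIGC, and more. Start learning!
[CVPR 2024 Papers with Code: Table of Contents]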
- 3DGS (Gaussian Splatting)
- Avatars
- Backbone
- CLIP
- MAE
- Embodied AI
- GAN
- GNN
- Multimodal Large Language Models (MLLM)
- Large Language Models (LLM)
- NAS
- OCR
- NeRF
- DETR
- Prompt
- Diffusion Models
- ReID (Re-identification)
- Long-Tail Distribution
- Vision Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Object Detection
- Anomaly Detection
- Visual Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image
- Medical Image Segmentation
- Video Object Segmentation
- Video Instance Segmentation
- Referring Image Segmentation
- Image Matting
- Image Editing
- Low-level Vision
- Super-Resolution
- Denoising
- Deblur
- Autonomous Driving
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Semantic Scene Completion
- 3D Registration
- 3D Human Pose Estimation
- 3D Human Mesh Estimation
- Image Generation
- Video Generation
- 3D Generation
- Video Understanding
- Action Detection
- Text Detection
- Knowledge Distillation
- Model Pruning
- Image Compression
- 3D Reconstruction
- Depth Estimation
- Trajectory Prediction
- Lane Detection
- Image Captioning
- Visual Question Answering
- Sign Language Recognition
- Video Prediction
- Novel View Synthesis
- Zero-Shot Learning
- Stereo Matching
- Feature Matching
- Scene Graph Generation
- Implicit Neural Representations
- Image Quality Assessment
- Video Quality Assessment
- Datasets
- New Tasks
- Others
3DGS (Gaussian Splatting)
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
- Homepage: https://city-super.github.io/scaffold-gs/
- Paper: https://arxiv.org/abs/2312.00109
- Code: https://github.com/city-super/Scaffold-GS
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
- Homepage: https://shunyuanzheng.github.io/GPS-Gaussian
- Paper: https://arxiv.org/abs/2312.02155
- Code: https://github.com/ShunyuanZheng/GPS-Gaussian
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
- Homepage: https://ingra14m.github.io/Deformable-Gaussians/
- Paper: https://arxiv.org/abs/2309.13101
- Code: https://github.com/ingra14m/Deformable-3D-Gaussians
SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
- Homepage: https://yihua7.github.io/SC-GS-web/
- Paper: https://arxiv.org/abs/2312.14937
- Code: https://github.com/yihua7/SC-GS
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
- Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/
- Paper: https://arxiv.org/abs/2312.16812
- Code: https://github.com/oppo-us-research/SpacetimeGaussians
DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
- Homepage: https://fictionarry.github.io/DNGaussian/
- Paper: https://arxiv.org/abs/2403.06912
- Code: https://github.com/Fictionarry/DNGaussian
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
Avatars
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
OneLLM: One Framework to Align All Modalities with Language
Large Language Models (LLM)
VTimeLLM: Empower LLM to Grasp Video Moments
NAS
ReID (Re-identification)
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Noisy-Correspondence Learning for Text-to-Image Person Re-identification
Diffusion Models
InstanceDiffusion: Instance-level Control for Image Generation
Residual Denoising Diffusion Models
DeepCache: Accelerating Diffusion Models for Free
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
SVGDreamer: Text Guided SVG Generation with Diffusion Model
InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model
MMA-Diffusion: MultiModal Attack on Diffusion Models
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
- Homepage: https://video-motion-customization.github.io/
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization
Vision Transformer
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
RepViT: Revisiting Mobile CNN From ViT Perspective
A General and Efficient Training for Transformer via Token Expansion
Vision-Language
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
FairCLIP: Harnessing Fairness in Vision-Language Learning
Object Detection
DETRs Beat YOLOs on Real-time Object Detection
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
- Paper: https://arxiv.org/abs/2312.01220
- Code: https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation
YOLO-World: Real-Time Open-Vocabulary Object Detection
Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement
Anomaly Detection
Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection
Object Tracking
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
- Paper: https://arxiv.org/abs/2403.04700
- Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT
Semantic Segmentation
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Medical Image
Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
Medical Image Segmentation
Autonomous Driving
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
Memory-based Adapters for Online 3D Scene Perception
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
A Real-world Large-scale Dataset for Roadside Cooperative Perception
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
- Paper: https://arxiv.org/abs/240
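3D Generation
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
- Homepage: https://haozhexie.com/project/city-dreamer/
- Paper: https://arxiv.org/abs/2309.00610
- Code: https://github.com/hzxie/city-dreamer
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching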
Video Understanding
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
- Paper: https://arxiv.org/abs/2311.17005
- Code: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2
Knowledge Distillation
Logit Standardization in Knowledge Distillation
Efficient Dataset Distillation via Minimax Diffusion
Stereo Matching
Neural Markov Random Field for Stereo Matching
Scene Graph Generation
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
- Homepage: https://zhangce01.github.io/HiKER-SGG/
- Paper: https://arxiv.org/abs/2403.12033
- Code: https://github.com/zhangce01/HiKER-SGG
Video Quality Assessment
KVQ: Kwai Video Quality Assessment for Short-form Videos
Datasets
A Real-world Large-scale Dataset for Roadside Cooperative Perception
Traffic Scene Parsing through the TSP6K Dataset
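Others
Object Recognition as Next Token Prediction
ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks
Seamless Human Motion Composition with Blended Positional Encodings
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
MoMask: Generative Masked Modeling of 3D Human Motions
Amodal Ground Truth and Completion in the Wild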
- Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/
- Paper: https://arxiv.org/abs/2312.17247
- Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild
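Improved Visual Grounding through Self-Consistent Explanations
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
- Homepage: https://chenshuang-zhang.github.io/imagenet_d/
- Paper: https://arxiv.org/abs/2403.18775
- Code: https://github.com/chenshuang-zhang/imagenet_d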
Learning from Synthetic Human Group Activities
- Homepage: https://cjerry1243.github.io/M3Act/
- Paper: https://arxiv.org/abs/2306.16772
- Code: https://github.com/cjerry1243/M3Act
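MindBridge: A Cross-Subject Brain Decoding Framework
- Homepage: https://littlepure2333.github.io/MindBridge/
- Paper: https://arxiv.org/abs/2404.07850
- Code: https://github.com/littlepure2333/MindBridge
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Contrastive Mean-Shift Learning for Generalized Category Discovery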