ECCV 2024 Papers and Open-Source Projects (Papers with Code)
ECCV 2024 decisions are now available!
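Note 1: Issues sharing ECCV 2024 papers and open-source projects are welcome!
Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision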
[ECCV 2024 Open-Source Papers: Table of Contents]
- 3DGS (Gaussian Splatting)
- Mamba / SSM
- Avatars
- Backbone
- CLIP
- MAE
- Embodied AI
- GAN
- GNN
- Multimodal Large Language Models (MLLM)
- Large Language Models (LLM)
- NAS
- OCR
- NeRF
- DETR
- Prompt
- Diffusion Models
- ReID (Re-Identification)
- Long-Tail
- Vision Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Object Detection
- Anomaly Detection
- Visual Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image
- Medical Image Segmentation
- Video Object Segmentation
- Video Instance Segmentation
- Referring Image Segmentation
- Image Matting
- Image Editing
- Low-level Vision
- Super-Resolution
- Denoising
- Deblur
- Autonomous Driving
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Semantic Scene Completion
- 3D Registration
- 3D Human Pose Estimation
- 3D Human Mesh Estimation
- Image Generation
- Video Generation
- 3D Generation
- Video Understanding
- Action Recognition
- Action Detection
- Text Detection
- Knowledge Distillation
- Model Pruning
- Image Compression
- 3D Reconstruction
- Depth Estimation
- Trajectory Prediction
- Lane Detection
- Image Captioning
- Visual Question Answering
- Sign Language Recognition
- Video Prediction
- Novel View Synthesis
- Zero-Shot Learning
- Stereo Matching
- Feature Matching
- Scene Graph Generation
- Counting
- Implicit Neural Representations
- Image Quality Assessment
- Video Quality Assessment
- Datasets
- New Tasks
- Others
3DGS (Gaussian Splatting)
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
- Project: https://donydchen.github.io/mvsplat
- Paper: https://arxiv.org/abs/2403.14627
- Code: https://github.com/donydchen/mvsplat
CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
- Project: https://zehaozhu.github.io/FSGS/
- Paper: https://arxiv.org/abs/2312.00451
- Code: https://github.com/VITA-Group/FSGS
Mamba / SSM
VideoMamba: State Space Model for Efficient Video Understanding
ZIGMA: A DiT-style Zigzag Mamba Diffusion Model
- Paper: https://arxiv.org/abs/2403.13802
- Project: https://taohu.me/zigma/
Avatars
Backbone
CLIP
MAE
Embodied AI
GAN
OCR
Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
Occupancy
Fully Sparse 3D Occupancy Prediction
NeRF
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
- Project: https://nerf-mae.github.io/
- Paper: https://arxiv.org/pdf/2404.01300
- Code: https://github.com/zubair-irshad/NeRF-MAE
DETR
Prompt
Multimodal Large Language Models (MLLM)
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
ControlCap: Controllable Region-level Captioning
Large Language Models (LLM)
NAS
ReID (Re-Identification)
Diffusion Models
ZIGMA: A DiT-style Zigzag Mamba Diffusion Model
- Paper: https://arxiv.org/abs/2403.13802
- Project: https://taohu.me/zigma/
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
- Project: https://ut-mao.github.io/noise.github.io/
- Paper: https://arxiv.org/abs/2312.08872
- Code: https://github.com/UT-Mao/Initial-Noise-Construction
Vision Transformer
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Vision-Language
GalLoP: Learning Global and Local Prompts for Vision-Language Models
Object Detection
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
- Paper: https://arxiv.org/abs/2407.11699v1
- Code: https://github.com/xiuqhou/Relation-DETR
- Dataset: https://huggingface.co/datasets/xiuqhou/SA-Det-100k
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
- Project: http://yuqianfu.com/CDFSOD-benchmark/
- Paper: https://arxiv.org/pdf/2402.03094
- Code: https://github.com/lovelyqian/CDFSOD-benchmark
Anomaly Detection
Object Tracking
Semantic Segmentation
Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
Medical Image
Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging
FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
- Project: https://ophai.hms.harvard.edu/datasets/harvard-fairdomain20k
- Paper: https://arxiv.org/abs/2407.08813
- Dataset: https://drive.google.com/drive/u/1/folders/1huH93JVeXMj9rK6p1OZRub868vv0UK0O
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain
Medical Image Segmentation
ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
- Project: https://scribbleprompt.csail.mit.edu/
- Paper: https://arxiv.org/abs/2312.07381
- Code: https://github.com/halleewong/ScribblePrompt
AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures
- Paper: https://arxiv.org/abs/2407.14754
- Code: https://github.com/cbmi-group/FFM-Multi-Decoder-Network
Video Object Segmentation
DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries
- Project: https://zhang-tao-whu.github.io/projects/DVIS_DAQ/
- Paper: https://arxiv.org/abs/2404.00086
- Code: https://github.com/zhang-tao-whu/DVIS_Plus
Autonomous Driving
Fully Sparse 3D Occupancy Prediction
milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
4D Contrastive Superflows are Dense 3D Representation Learners
3D Point Cloud
3D Object Detection
3D Small Object Detection with Dynamic Spatial Pruning
- Project: https://xuxw98.github.io/DSPDet3D/
- Paper: https://arxiv.org/abs/2305.03716
- Code: https://github.com/xuxw98/DSPDet3D
Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection
3D Semantic Segmentation
Image Editing
Image Inpainting
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
- Project: https://tencentarc.github.io/BrushNet/
- Paper: https://arxiv.org/abs/2403.06976
- Code: https://github.com/TencentARC/BrushNet
Video Editing
Low-level Vision
Restoring Images in Adverse Weather Conditions via Histogram Transformer
OneRestore: A Universal Restoration Framework for Composite Degradation
- Project: https://gy65896.github.io/projects/ECCV2024_OneRestore
- Paper: https://arxiv.org/abs/2407.04621
- Code: https://github.com/gy65896/OneRestore
Super-Resolution
Denoising
Image Denoising
3D Human Pose Estimation
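Image Generation
Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization
- Project: https://kaminyou.com/Dense-Normalization/
- Paper: https://arxiv.org/abs/2407.04245
- Code: https://github.com/Kaminyou/Dense-Normalization
ZIGMA: A DiT-style Zigzag Mamba Diffusion Model
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation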
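Video Generation
VideoStudio: Generating Consistent-Content and Multi-Scene Videos
3D Generation
Video Understanding
VideoMamba: State Space Model for Efficient Video Understanding
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Action Recognition
SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders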
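Knowledge Distillation
Image Compression
Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation
Stereo Matching
Scene Graph Generation
Counting
Zero-shot Object Counting with Good Exemplars
Video Quality Assessment
Datasets
Others
Multi-branch Collaborative Learning Network for 3D Visual Grounding
PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
- Project: https://fraunhoferhhi.github.io/spvloc/
- Paper: https://arxiv.org/abs/2404.10527
- Code: https://github.com/fraunhoferhhi/spvloc
REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices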