ECCV 2024 Papers and Open-Source Projects (Papers with Code)

ECCV 2024 decisions are now available!

Note 1: Contributions are welcome. Open an issue to share ECCV 2024 papers and open-source projects!

Note 2: For papers from previous top CV conferences, other high-quality CV papers, and survey roundups, see: https://github.com/amusi/daily-paper-computer-vision

CVPR 2024

ECCV 2022

ECCV 2020

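To follow ECCV 2024 and the latest top-conference work, scan the QR code to join the CVer Academic Exchange Group, the largest computer vision AI community on Knowledge Planet (知识星球)! It is updated daily and shares the newest learning material on computer vision, deep learning, autonomous driving, medical imaging, AIGC, and related areas.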

[ECCV 2024 Open-Source Paper Index]

3DGS(Gaussian Splatting)
Mamba / SSM
Avatars
Backbone
CLIP
MAE
Embodied AI
GAN
GNN
Multimodal Large Language Models (MLLM)
Large Language Models (LLM)
NAS
OCR
NeRF
DETR
Prompt
Diffusion Models
ReID (Re-Identification)
Long-Tail
Vision Transformer
Vision-Language
Self-supervised Learning
Data Augmentation
Object Detection
Anomaly Detection
Object Tracking
Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Medical Image
Medical Image Segmentation
Video Object Segmentation
Video Instance Segmentation
Referring Image Segmentation
Image Matting
Image Editing
Low-level Vision
Super-Resolution
Denoising
Deblur
Autonomous Driving
3D Point Cloud
3D Object Detection
3D Semantic Segmentation
3D Object Tracking
3D Semantic Scene Completion
3D Registration
3D Human Pose Estimation
3D Human Mesh Estimation
Image Generation
Video Generation
3D Generation
Video Understanding
Action Recognition
Action Detection
Text Detection
Knowledge Distillation
Model Pruning
Image Compression
3D Reconstruction
Depth Estimation
Trajectory Prediction
Lane Detection
Image Captioning
Visual Question Answering
Sign Language Recognition
Video Prediction
Novel View Synthesis
Zero-Shot Learning
Stereo Matching
Feature Matching
Scene Graph Generation
Counting
Implicit Neural Representations
Image Quality Assessment
Video Quality Assessment
Datasets
New Tasks
Others

3DGS(Gaussian Splatting)

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Project: https://donydchen.github.io/mvsplat
Paper: https://arxiv.org/abs/2403.14627
Code: https://github.com/donydchen/mvsplat

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

Paper: https://arxiv.org/abs/2404.01133
Code: https://github.com/DekuLiuTesla/CityGaussian

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

Project: https://zehaozhu.github.io/FSGS/
Paper: https://arxiv.org/abs/2312.00451
Code: https://github.com/VITA-Group/FSGS

Mamba / SSM

VideoMamba: State Space Model for Efficient Video Understanding

Paper: https://arxiv.org/abs/2403.06977
Code: https://github.com/OpenGVLab/VideoMamba

ZIGMA: A DiT-style Zigzag Mamba Diffusion Model

Paper: https://arxiv.org/abs/2403.13802
Code: https://taohu.me/zigma/

Avatars

Backbone

CLIP

MAE

Embodied AI

GAN

OCR

Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors

Paper: https://arxiv.org/pdf/2312.05286
Code: https://github.com/SJTU-DeepVisionLab/FreeReal

PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer

Paper: https://arxiv.org/abs/2407.07764
Code: https://github.com/SJTU-DeepVisionLab/PosFormer

Occupancy

Fully Sparse 3D Occupancy Prediction

Paper: https://arxiv.org/abs/2312.17118
Code: https://github.com/MCG-NJU/SparseOcc

NeRF

NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

Project: https://nerf-mae.github.io/
Paper: https://arxiv.org/pdf/2404.01300
Code: https://github.com/zubair-irshad/NeRF-MAE

DETR

Prompt

Multimodal Large Language Models (MLLM)

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Paper: https://arxiv.org/abs/2403.11299
Code: https://github.com/heliossun/SQ-LLaVA

ControlCap: Controllable Region-level Captioning

Paper: https://arxiv.org/abs/2401.17910
Code: https://github.com/callsys/ControlCap

Large Language Models (LLM)

NAS

ReID (Re-Identification)

Diffusion Models

ZIGMA: A DiT-style Zigzag Mamba Diffusion Model

Paper: https://arxiv.org/abs/2403.13802
Code: https://taohu.me/zigma/

Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation

Paper: https://arxiv.org/abs/2403.16394
Code: https://github.com/zdxdsw/skewed_relations_T2I

The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization

Project: https://ut-mao.github.io/noise.github.io/
Paper: https://arxiv.org/abs/2312.08872
Code: https://github.com/UT-Mao/Initial-Noise-Construction

Vision Transformer

GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper: https://arxiv.org/abs/2403.09394
Code: https://github.com/Haiyang-W/GiT

Vision-Language

GalLoP: Learning Global and Local Prompts for Vision-Language Models

Paper: https://arxiv.org/abs/2407.01400

Object Detection

Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

Paper: https://arxiv.org/abs/2407.11699v1
Code: https://github.com/xiuqhou/Relation-DETR
Dataset: https://huggingface.co/datasets/xiuqhou/SA-Det-100k

Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

Project: http://yuqianfu.com/CDFSOD-benchmark/
Paper: https://arxiv.org/pdf/2402.03094
Code: https://github.com/lovelyqian/CDFSOD-benchmark

Anomaly Detection

Object Tracking

Semantic Segmentation

Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

Paper: https://arxiv.org/abs/2405.06228
Code: https://github.com/nizhenliang/CGRSeg

Medical Image

Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging

Paper: https://arxiv.org/abs/2311.16914
Code: https://github.com/peirong26/Brain-ID

FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

Medical Image Segmentation

ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image

Project: https://scribbleprompt.csail.mit.edu/
Paper: https://arxiv.org/abs/2312.07381
Code: https://github.com/halleewong/ScribblePrompt

AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking

Paper: https://arxiv.org/abs/2407.06468
Code: https://github.com/ricklisz/AnatoMask

Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures

Video Object Segmentation

DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries

Project: https://zhang-tao-whu.github.io/projects/DVIS_DAQ/
Paper: https://arxiv.org/abs/2404.00086
Code: https://github.com/zhang-tao-whu/DVIS_Plus

Autonomous Driving

Fully Sparse 3D Occupancy Prediction

Paper: https://arxiv.org/abs/2312.17118
Code: https://github.com/MCG-NJU/SparseOcc

milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing

Paper: https://arxiv.org/abs/2306.17010
Code: https://github.com/Toytiny/milliFlow/

4D Contrastive Superflows are Dense 3D Representation Learners

Paper: https://arxiv.org/abs/2407.06190
Code: https://github.com/Xiangxu-0103/SuperFlow

3D Point Cloud

3D Object Detection

3D Small Object Detection with Dynamic Spatial Pruning

Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection

Paper: https://arxiv.org/abs/2402.03634
Code: https://github.com/LiewFeng/RayDN

3D Semantic Segmentation

Image Editing

Image Inpainting

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

Project: https://tencentarc.github.io/BrushNet/
Paper: https://arxiv.org/abs/2403.06976
Code: https://github.com/TencentARC/BrushNet

Video Editing

Low-level Vision

Restoring Images in Adverse Weather Conditions via Histogram Transformer

Paper: https://arxiv.org/abs/2407.10172
Code: https://github.com/sunshangquan/Histoformer

OneRestore: A Universal Restoration Framework for Composite Degradation

Project: https://gy65896.github.io/projects/ECCV2024_OneRestore
Paper: https://arxiv.org/abs/2407.04621
Code: https://github.com/gy65896/OneRestore

Super-Resolution

Denoising

Image Denoising

3D Human Pose Estimation

Image Generation

Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models

Paper: https://arxiv.org/abs/2404.07389
Code: https://github.com/YasminZhang/EBAMA

Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization

Project: https://kaminyou.com/Dense-Normalization/
Paper: https://arxiv.org/abs/2407.04245
Code: https://github.com/Kaminyou/Dense-Normalization

ZIGMA: A DiT-style Zigzag Mamba Diffusion Model

Paper: https://arxiv.org/abs/2403.13802
Code: https://taohu.me/zigma/

Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation

Paper: https://arxiv.org/abs/2403.16394
Code: https://github.com/zdxdsw/skewed_relations_T2I

Video Generation

VideoStudio: Generating Consistent-Content and Multi-Scene Videos

Project: https://vidstudio.github.io/
Code: https://github.com/FuchenUSTC/VideoStudio

3D Generation

Video Understanding

VideoMamba: State Space Model for Efficient Video Understanding

Paper: https://arxiv.org/abs/2403.06977
Code: https://github.com/OpenGVLab/VideoMamba

C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

Paper: https://arxiv.org/abs/2407.06113
Code: https://github.com/RongchangLi/ZSCAR_C2C

Action Recognition

SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders

Paper: https://arxiv.org/abs/2407.13460
Code: https://github.com/pha123661/SA-DVAE

Knowledge Distillation

Image Compression

Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation

Code: https://github.com/qingshi9974/ECCV2024-AdpatICMH
Paper: http://arxiv.org/abs/2407.09853

Stereo Matching

Scene Graph Generation

Counting

Zero-shot Object Counting with Good Exemplars

Paper: https://arxiv.org/abs/2407.04948
Code: https://github.com/HopooLinZ/VA-Count

Video Quality Assessment

Datasets

Others

Multi-branch Collaborative Learning Network for 3D Visual Grounding

Paper: https://arxiv.org/abs/2407.05363v2
Code: https://github.com/qzp2018/MCLN

PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers

Code: https://github.com/ananthu-aniraj/pdiscoformer
Paper: https://arxiv.org/abs/2407.04538

SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments

Project: https://fraunhoferhhi.github.io/spvloc/
Paper: https://arxiv.org/abs/2404.10527
Code: https://github.com/fraunhoferhhi/spvloc

REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices

Project: https://xdimlab.github.io/REFRAME/
Paper: https://arxiv.org/abs/2403.16481
Code: https://github.com/MARVELOUSJI/REFRAME
