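XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
New VOS project: Putting the Object Back into Video Object Segmentation: https://github.com/hkchengrex/Cutie
New project: open-world video segmentation using XMem: https://github.com/hkchengrex/Tracking-Anything-with-DEVA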
Ho Kei Cheng, Alexander Schwing
University of Illinois Urbana-Champaign
Demo
Handling long-term occlusion:
https://user-images.githubusercontent.com/7107196/177921527-7a1bd593-2162-4598-9adf-f2112763fccf.mp4
Very long video; masked layer insertion:
https://user-images.githubusercontent.com/7107196/179089789-3d69adea-0405-4c83-ac28-45f59fe1e1c1.mp4
Source: https://www.youtube.com/watch?v=q5Xr0F4a0iU
Out-of-domain case:
https://user-images.githubusercontent.com/7107196/177920383-161f1da1-33f9-48b3-b8b2-09e450432e2b.mp4
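Source: かぐや様は告らせたい ~天才たちの恋愛頭脳戦~ Ep.3; A-1 Pictures
[Failure cases]
Features
- Handles very long videos with limited GPU memory usage.
- Quite fast. Expect about 20 FPS even on long videos (hardware dependent).
- Comes with a GUI (modified from MiVOS).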
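Introduction
We frame Video Object Segmentation (VOS), first and foremost, as a memory problem. Prior works mostly use a single type of feature memory. This can take the form of network weights (i.e., online learning), last-frame segmentation (e.g., MaskTrack), a spatial hidden representation (e.g., Conv-RNN-based methods), spatial-attentional features (e.g., STM, STCN, AOT), or some kind of long-term compact features (e.g., AFB-URR).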
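Methods with a short memory span are not robust to changes, while those with a large memory bank suffer a sharp increase in computation and GPU memory usage. Attempts at long-term attentional VOS, such as AFB-URR, compress features as soon as they are generated, which loses feature resolution.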
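To make the memory-growth problem concrete, here is a back-of-the-envelope sketch that counts how many spatial feature vectors an attention-based memory bank would hold if it kept memorizing frames without any bound. The function name, the 480x854 resolution, the feature stride of 16, and the "memorize every 5th frame" interval are all illustrative assumptions, not settings taken from XMem.

```python
def memory_bank_tokens(num_frames, height=480, width=854, stride=16, every_n=5):
    """Count feature vectors stored by an unbounded memory bank.

    All defaults are illustrative assumptions: a 480x854 video, a feature
    stride of 16, and one memorized frame out of every `every_n` frames.
    """
    # Each memorized frame contributes one feature vector per stride-sized cell.
    tokens_per_frame = (height // stride) * (width // stride)
    memorized_frames = num_frames // every_n
    return tokens_per_frame * memorized_frames


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"{n:>6} frames -> {memory_bank_tokens(n):>9} memory elements")
```

The count grows linearly with video length. Because an attention readout compares every query location against every memory element, the per-frame cost grows along with it, which is why an unbounded bank becomes impractical on long videos.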
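Our method is inspired by the Atkinson-Shiffrin human memory model, which has a sensory memory, a working memory, and a long-term memory. These memory stores operate at different temporal scales and complement each other in our memory reading mechanism. The method performs well on both short-term and long-term video datasets, and handles videos with more than 10,000 frames with ease.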
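The three-store idea can be illustrated with a toy data structure. This is a minimal sketch of the general concept only, not XMem's implementation: the class name `ToyMultiStoreMemory`, the capacities, the averaging rule used for consolidation, and the first-in-first-out eviction are all hypothetical placeholders chosen to keep the example short.

```python
from collections import deque


class ToyMultiStoreMemory:
    """Toy three-store memory (hypothetical, for illustration only)."""

    def __init__(self, working_capacity=4, long_term_capacity=8):
        self.sensory = None      # overwritten on every frame
        self.working = deque()   # recent entries kept at full detail
        self.long_term = []      # compact, consolidated entries
        self.working_capacity = working_capacity
        self.long_term_capacity = long_term_capacity

    def observe(self, frame_idx, feature):
        """Record one frame's feature vector (a list of floats)."""
        self.sensory = (frame_idx, feature)
        self.working.append((frame_idx, feature))
        if len(self.working) > self.working_capacity:
            self._consolidate()

    def _consolidate(self):
        # Placeholder rule: average the older half of working memory into
        # one prototype and move it to long-term memory.
        n_old = len(self.working) // 2
        old = [self.working.popleft() for _ in range(n_old)]
        dim = len(old[0][1])
        prototype = [sum(f[d] for _, f in old) / n_old for d in range(dim)]
        self.long_term.append((old[0][0], prototype))
        # Placeholder eviction: drop the oldest prototype when full.
        if len(self.long_term) > self.long_term_capacity:
            self.long_term.pop(0)

    def size(self):
        """Total number of stored entries across the three stores."""
        sensory_count = 0 if self.sensory is None else 1
        return sensory_count + len(self.working) + len(self.long_term)


if __name__ == "__main__":
    memory = ToyMultiStoreMemory()
    for i in range(10_000):
        memory.observe(i, [float(i), 1.0])
    print("entries stored after 10,000 frames:", memory.size())
```

The point of the sketch is that the total number of stored entries stays bounded no matter how long the video is, while the most recent frames remain available at full detail.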
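Training/inference
First, install the required Python packages and datasets by following GETTING_STARTED.md.
For training, see TRAINING.md.
For inference, see INFERENCE.md.
Related projects/extensions:
Citation
Please cite our paper if you find this repo useful!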
@inproceedings{cheng2022xmem,
title={{XMem}: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model},
author={Cheng, Ho Kei and Schwing, Alexander G.},
booktitle={ECCV},
year={2022}
}
Related projects that this paper builds upon:
@inproceedings{cheng2021stcn,
title={Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation},
author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
booktitle={NeurIPS},
year={2021}
}
@inproceedings{cheng2021mivos,
title={Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion},
author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
booktitle={CVPR},
year={2021}
}
We use f-BRS in the interactive demo: https://github.com/saic-vul/fbrs_interactive_segmentation
If you want to cite the datasets:
@article{shi2015hierarchicalECSSD,
title={Hierarchical image saliency detection on extended CSSD},
author={Shi, Jianping and Yan, Qiong and Xu, Li and Jia, Jiaya},
journal={TPAMI},
year={2015},
}
@inproceedings{wang2017DUTS,
title={Learning to Detect Salient Objects with Image-level Supervision},
author={Wang, Lijun and Lu, Huchuan and Wang, Yifan and Feng, Mengyang
and Wang, Dong and Yin, Baocai and Ruan, Xiang},
booktitle={CVPR},
year={2017}
}
@inproceedings{FSS1000,
title = {FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation},
author = {Li, Xiang and Wei, Tianhan and Chen, Yau Pun and Tai, Yu-Wing and Tang, Chi-Keung},
booktitle={CVPR},
year={2020}
}
@inproceedings{zeng2019towardsHRSOD,
title = {Towards High-Resolution Salient Object Detection},
author = {Zeng, Yi and Zhang, Pingping and Zhang, Jianming and Lin, Zhe and Lu, Huchuan},
booktitle = {ICCV},
year = {2019}
}
@inproceedings{cheng2020cascadepsp,
title={{CascadePSP}: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement},
author={Cheng, Ho Kei and Chung, Jihoon and Tai, Yu-Wing and Tang, Chi-Keung},
booktitle={CVPR},
year={2020}
}
@inproceedings{xu2018youtubeVOS,
title={{YouTube-VOS}: A large-scale video object segmentation benchmark},
author={Xu, Ning and Yang, Linjie and Fan, Yuchen and Yue, Dingcheng and Liang, Yuchen and Yang, Jianchao and Huang, Thomas},
booktitle = {ECCV},
year={2018}
}
@inproceedings{perazzi2016benchmark,
title={A benchmark dataset and evaluation methodology for video object segmentation},
author={Perazzi, Federico and Pont-Tuset, Jordi and McWilliams, Brian and Van Gool, Luc and Gross, Markus and Sorkine-Hornung, Alexander},
booktitle={CVPR},
year={2016}
}
@inproceedings{denninger2019blenderproc,
title={BlenderProc},
author={Denninger, Maximilian and Sundermeyer, Martin and Winkelbauer, Dominik and Zidan, Youssef and Olefir, Dmitry and Elbadrawy, Mohamad and Lodhi, Ahsan and Katam, Harinandan},
booktitle={arXiv:1911.01911},
year={2019}
}
@inproceedings{shapenet2015,
title = {{ShapeNet: An Information-Rich 3D Model Repository}},
author = {Chang, Angel Xuan and Funkhouser, Thomas and Guibas, Leonidas and Hanrahan, Pat and Huang, Qixing and Li, Zimo and Savarese, Silvio and Savva, Manolis and Song, Shuran and Su, Hao and Xiao, Jianxiong and Yi, Li and Yu, Fisher},
booktitle = {arXiv:1512.03012},
year = {2015}
}