sam2-hiera-tiny - an open-source foundation model for image and video segmentation

Overview: sam2-hiera-tiny

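sam2-hiera-tiny is a checkpoint of SAM 2 (Segment Anything Model 2), a model for promptable visual segmentation in images and videos developed by FAIR (Meta's Fundamental AI Research team, formerly Facebook AI Research). SAM 2 is a foundation model that brings "Segment Anything" (segmenting any object given a prompt such as a click or a box) from still images to video. This checkpoint is the smallest SAM 2 variant and uses a Hiera-Tiny image encoder. For details, see the SAM 2 paper.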

Usage

The code is publicly available in the GitHub repository. Below are minimal examples of running prediction on images and on videos.

Image prediction

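To segment an image, first install the sam2 package from the repository, then load the model with the provided predictor class, set the image, and call predict() with your prompts: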

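import numpy as np
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Load the pretrained checkpoint from the Hugging Face Hub
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-tiny")

# Hypothetical inputs: an RGB image file and one foreground click at (x, y) = (500, 375)
image = np.array(Image.open("your_image.jpg").convert("RGB"))
point_coords = np.array([[500, 375]], dtype=np.float32)
point_labels = np.array([1], dtype=np.int32)  # 1 = foreground, 0 = background

# Run inference without gradient tracking, using bfloat16 autocast on the GPU
with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)  # compute the image embedding once per image
    masks, scores, low_res_logits = predictor.predict(
        point_coords=point_coords, point_labels=point_labels
    )  # masks: candidate masks; scores: predicted IoU for each candidate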
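The prompts passed to `predict()` are plain NumPy arrays. Per the predictor's docstring, `point_coords` is an N×2 array of (x, y) pixel coordinates, `point_labels` is a length-N array with 1 for foreground and 0 for background, and `box` is a length-4 array in XYXY format; with the default `multimask_output=True`, `predict()` returns several candidate masks plus a predicted IoU score for each. The helpers below (`make_point_prompts`, `make_box_prompt`, `select_best_mask`) are hypothetical conveniences, not part of the sam2 package, and the example runs them on fake masks so no model or GPU is needed.

```python
import numpy as np


def make_point_prompts(points, labels):
    """Pack (x, y) clicks and their labels into the shapes predict() expects (hypothetical helper)."""
    coords = np.asarray(points, dtype=np.float32).reshape(-1, 2)  # (N, 2) pixel coordinates
    lbls = np.asarray(labels, dtype=np.int32).reshape(-1)  # (N,) labels, 1 = fg, 0 = bg
    if coords.shape[0] != lbls.shape[0]:
        raise ValueError("each point needs exactly one label")
    if not np.isin(lbls, (0, 1)).all():
        raise ValueError("labels must be 1 (foreground) or 0 (background)")
    return coords, lbls


def make_box_prompt(x0, y0, x1, y1):
    """Return a box as a length-4 XYXY array, normalizing the corner order (hypothetical helper)."""
    return np.array(
        [min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)], dtype=np.float32
    )


def select_best_mask(masks, scores):
    """Pick the candidate mask with the highest predicted IoU score (hypothetical helper)."""
    best = int(np.argmax(scores))
    return masks[best], float(scores[best])


# One foreground click, one background click, and a box given with swapped corners
coords, lbls = make_point_prompts([(500, 375), (120, 80)], [1, 0])
box = make_box_prompt(600, 400, 100, 50)

# Fake stand-ins for predict() outputs: three 4x4 candidate masks and their scores
fake_masks = np.zeros((3, 4, 4), dtype=bool)
fake_masks[1, 1:3, 1:3] = True
fake_scores = np.array([0.61, 0.93, 0.72], dtype=np.float32)

mask, score = select_best_mask(fake_masks, fake_scores)
print(coords.shape, lbls.tolist(), box.tolist())
print(int(mask.sum()), round(score, 2))
```

Taking the highest-scoring candidate is a simple way to resolve the ambiguity of a single click; when several prompts already pin down the object, passing `multimask_output=False` returns a single mask instead.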

Video prediction

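Video prediction takes a few more steps because the model tracks objects across frames: initialize an inference state for the video, add prompts on one or more frames, then propagate them through the rest of the video. Example: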

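import numpy as np
import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

# Load the pretrained checkpoint from the Hugging Face Hub
predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2-hiera-tiny")

# Run inference without gradient tracking, using bfloat16 autocast on the GPU
with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # Hypothetical input: a directory of JPEG frames (00000.jpg, 00001.jpg, ...)
    state = predictor.init_state("your_video_frames_dir")  # initialize the video state

    # Add a foreground click for object 1 on frame 0 and get that frame's output immediately
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(
        state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),  # hypothetical (x, y) click
        labels=np.array([1], dtype=np.int32),  # 1 = foreground, 0 = background
    )

    # Propagate the prompts through the whole video to get results for every frame
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...  # masks holds one mask-logit map per object in object_ids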
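`propagate_in_video()` yields, for each frame, the frame index, the list of object IDs, and mask logits with one entry per object; the repository's demo notebook turns them into binary masks by thresholding at 0.0. The sketch below assumes that convention and collects the results into a `{frame_idx: {obj_id: mask}}` dictionary. `collect_video_masks` and the stand-in generator `fake_propagation` are hypothetical names; the fake data is NumPy so the example runs without a checkpoint, and because the function only uses `>`, it should also accept the PyTorch tensors the real predictor yields.

```python
import numpy as np


def collect_video_masks(propagation, threshold=0.0):
    """Turn (frame_idx, obj_ids, mask_logits) tuples into {frame_idx: {obj_id: bool mask}}."""
    segments = {}
    for frame_idx, obj_ids, mask_logits in propagation:
        # Logits above the threshold count as "inside the object" (0.0 follows the demo notebook)
        segments[frame_idx] = {
            obj_id: mask_logits[i] > threshold for i, obj_id in enumerate(obj_ids)
        }
    return segments


def fake_propagation(num_frames=3, height=4, width=4):
    """Hypothetical stand-in for propagate_in_video(): yields logits shaped (num_objs, 1, H, W)."""
    for f in range(num_frames):
        logits = np.full((2, 1, height, width), -5.0, dtype=np.float32)
        logits[0, 0, 0, f] = 3.0  # object 1: a single pixel in row 0 that moves right each frame
        logits[1, 0, 2:, :] = 1.0  # object 2: the bottom two rows in every frame
        yield f, [1, 2], logits


segments = collect_video_masks(fake_propagation())
areas = {f: {o: int(m.sum()) for o, m in objs.items()} for f, objs in segments.items()}
print(areas)
```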

For further details and examples, see the demo notebooks in the repository.

Citation

To cite the paper, model, or software, please use the following BibTeX entry:

@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint arXiv:2408.00714},
  url={https://arxiv.org/abs/2408.00714},
  year={2024}
}

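The project aims to make image and video segmentation much simpler, faster, and more accurate across a wide range of practical applications. Hopefully it will see broader adoption and further improvement.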