sam2-hiera-small: Next-Generation Promptable Visual Segmentation for Images and Videos

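Introduction to the sam2-hiera-small Project

Background

sam2-hiera-small is a project from the FAIR (Facebook AI Research) team. At its core is SAM 2, a foundation model for promptable visual segmentation in images and videos: given a prompt, the model identifies and segments the corresponding object or region in an image or video. See the SAM 2 paper for details.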

Code Repository

The official code for sam2-hiera-small is published on GitHub, where users can find all related resources.

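Features

SAM 2 accepts different kinds of prompts and uses them to segment both images and videos. This makes automated recognition and processing of visual data more efficient in applications such as automatic annotation and large-scale image data processing.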

Usage

Image Prediction

The following Python code runs prediction on an image:

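import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Load the pretrained checkpoint by its model ID
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-small")

# Run without gradient tracking, using bfloat16 autocast on a CUDA device
with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)  # placeholder: replace with your image
    masks, _, _ = predictor.predict(<input_prompts>)  # placeholder: replace with your prompts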

This code uses PyTorch to run the pretrained model on an image and generate the segmentation masks that correspond to the input prompts.
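The <input_prompts> placeholder above leaves open what a prompt looks like. As a minimal sketch, the helper below packs clicked points into keyword arguments for predict. The helper build_point_prompt is hypothetical and not part of the sam2 package. The keyword names point_coords and point_labels, and the convention that label 1 marks foreground and 0 marks background, are assumptions that should be checked against the official repository. Plain lists are used here to keep the sketch dependency-free; in practice they would likely be converted to arrays before being passed to the predictor.

```python
# Hypothetical helper, not part of the sam2 package.
def build_point_prompt(points, image_width, image_height):
    """Pack click prompts into keyword arguments.

    points: list of (x, y, is_foreground) tuples in pixel coordinates.
    Assumed convention: label 1 = foreground click, 0 = background click.
    """
    coords, labels = [], []
    for x, y, is_foreground in points:
        # Reject clicks that fall outside the image bounds.
        if not (0 <= x < image_width and 0 <= y < image_height):
            raise ValueError(f"point ({x}, {y}) lies outside the image")
        coords.append([float(x), float(y)])
        labels.append(1 if is_foreground else 0)
    # Key names are assumed to match the predictor's keyword arguments.
    return {"point_coords": coords, "point_labels": labels}


# One foreground click and one background click on a 640x480 image.
prompt = build_point_prompt([(210, 350, True), (400, 120, False)], 640, 480)
```

Under these assumptions, the resulting dictionary could be unpacked into the call with predictor.predict(**prompt) once its values are converted to arrays.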

Video Prediction

Video segmentation follows the same pattern:

import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2-hiera-small")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...

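After initializing an inference state from the video, users can add prompts and immediately get the segmentation result on the prompted frame, then propagate those prompts to the remaining frames to obtain masks across the whole video.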
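The propagation loop above leaves its body as an ellipsis. One possible way to fill it in is to collect the per-frame outputs into a lookup table keyed by frame and object. The sketch below is illustrative only: collect_masklets is a hypothetical helper, and it assumes the yielded masks are logits in which positive values mark pixels inside the object. Nested lists stand in for the tensors the real predictor returns, and a hand-written list stands in for predictor.propagate_in_video(state).

```python
# Hypothetical helper, not part of the sam2 package.
def collect_masklets(propagation, threshold=0.0):
    """Group per-frame outputs into {frame_idx: {object_id: binary_mask}}.

    propagation: iterable of (frame_idx, object_ids, masks) tuples.
    Assumption: each mask holds logits, and values above `threshold`
    mark pixels that belong to the object.
    """
    segments = {}
    for frame_idx, object_ids, masks in propagation:
        segments[frame_idx] = {
            obj_id: [[value > threshold for value in row] for row in mask]
            for obj_id, mask in zip(object_ids, masks)
        }
    return segments


# Stand-in data: two frames, one object (id 1), 2x2 logit masks.
fake_propagation = [
    (0, [1], [[[2.5, -1.0], [0.3, -4.2]]]),
    (1, [1], [[[1.1, 0.7], [-0.2, -3.0]]]),
]
segments = collect_masklets(fake_propagation)
```

Keying the result by frame index and then by object ID makes it straightforward to look up the mask of a given object on a given frame after propagation has finished.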

Citation

To cite the paper, model, or software, use the following entry:

@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and others},
  journal={arXiv preprint arXiv:2408.00714},
  url={https://arxiv.org/abs/2408.00714},
  year={2024}
}

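Conclusion

sam2-hiera-small gives users an automated way to segment and process large volumes of image and video data. The model supports researchers exploring visual data and gives industry a practical tool for large-scale data analysis. For more code details and usage examples, see the demo notebooks.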