ese_vovnet39b.ra_in1k: an efficient, real-time VoVNet-v2 image classification model

About ese_vovnet39b.ra_in1k

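ese_vovnet39b.ra_in1k is an image classification model from the second-generation VoVNet (VoVNet-v2) family. It was pretrained on the ImageNet-1k dataset in the timm library by Ross Wightman, using the RandAugment (RA) recipe. That recipe is related to the "B" recipe from the ResNet Strikes Back paper.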

Model details

Model type:
An image classification model that can also serve as a feature backbone.

Model statistics:

Parameters: 24.6 M
Compute: 7.1 GMACs (billions of multiply-accumulate operations)
Activations: 6.7 M
Image size: 224 x 224 for training, 288 x 288 for testing
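
As a rough illustration of what these statistics imply, the sketch below converts the parameter count into an approximate weight size and the MAC count into FLOPs. It assumes 4 bytes per float32 weight, 2 bytes per float16 weight, and the common convention that one multiply-accumulate counts as two floating-point operations. The helper name weight_megabytes is made up for this example, and the result covers the weights only, not activations or framework overhead.

```python
# Figures quoted in the model statistics.
params = 24.6e6   # parameter count
gmacs = 7.1       # multiply-accumulates per forward pass, in billions


def weight_megabytes(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the weights alone, in decimal megabytes."""
    return num_params * bytes_per_param / 1e6


fp32_mb = weight_megabytes(params, 4)  # 4 bytes per float32 weight
fp16_mb = weight_megabytes(params, 2)  # 2 bytes per float16 weight

# Assumed convention: 1 MAC is counted as 2 FLOPs.
gflops = 2 * gmacs

print(f"fp32 weights: ~{fp32_mb:.1f} MB")
print(f"fp16 weights: ~{fp16_mb:.1f} MB")
print(f"approx. compute: ~{gflops:.1f} GFLOPs per image")
```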

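Papers:
An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection (Lee et al., CVPR Workshops 2019); see the citation at the end of this card.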

Dataset:
The model was pretrained on ImageNet-1k.

Original code:
The full code is available in Hugging Face's pytorch-image-models project.

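Model usage

Image classification

A few lines of Python are enough to classify an image with this model: load the image, apply the model's preprocessing transforms, then run it through the ese_vovnet39b.ra_in1k model provided by timm.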

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('ese_vovnet39b.ra_in1k', pretrained=True)
model = model.eval()

# Get the model-specific transforms (normalization, resizing)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze the single image into a batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
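
To make the last line of the example concrete, here is a minimal pure-Python sketch of the same idea: softmax turns raw logits into probabilities, multiplying by 100 gives percentages, and top-k keeps the largest entries along with their class indices. The toy logits and the helper names softmax and topk are invented for illustration; they are not real model output and not part of timm or PyTorch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


# Hypothetical logits for a 6-class toy problem (not real model output).
toy_logits = [0.5, 2.0, -1.0, 3.0, 0.0, 1.0]
percentages = [p * 100 for p in softmax(toy_logits)]
top3_values, top3_indices = topk(percentages, k=3)

for idx, pct in zip(top3_indices, top3_values):
    print(f"class {idx}: {pct:.1f}%")
```

With the real model, top5_class_indices holds ImageNet-1k class indices, which still need to be mapped to human-readable labels.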

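Feature map extraction

The model can also return intermediate feature maps for an image, which is useful when it serves as a backbone for downstream tasks.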

model = timm.create_model(
    'ese_vovnet39b.ra_in1k',
    pretrained=True,
    features_only=True,
)
model = model.eval()

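# Reuse the transforms from the classification example;
# with features_only=True the output is a list of feature maps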
output = model(transforms(img).unsqueeze(0))

for o in output:
    print(o.shape)
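
The shapes printed above depend on the input resolution. As a hedged sketch, the snippet below estimates the spatial side length of each feature map for a square input, assuming the stages downsample by factors of 2, 4, 8, 16 and 32. Those factors are an assumption, not something this card states; on a model created with features_only=True, model.feature_info.reduction() reports the actual values.

```python
def expected_spatial_sizes(input_size, reductions):
    """Spatial side length of each feature map for a square input."""
    return [input_size // r for r in reductions]


# Assumed per-stage downsampling factors; not stated in this card.
ASSUMED_REDUCTIONS = [2, 4, 8, 16, 32]

for size in (224, 288):  # train and test resolutions from the model statistics
    sides = expected_spatial_sizes(size, ASSUMED_REDUCTIONS)
    print(size, [f"{s}x{s}" for s in sides])
```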

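Image embeddings

The model can also produce a feature vector (embedding) for an image, which can then be fed into further machine learning tasks.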

model = timm.create_model(
    'ese_vovnet39b.ra_in1k',
    pretrained=True,
    num_classes=0,  # remove the classifier layer
)
model = model.eval()

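# With num_classes=0 the model returns pooled features,
# a (batch_size, num_features) shaped tensor
output = model(transforms(img).unsqueeze(0))

# Or equivalently (without needing to set num_classes=0):
# forward_features returns the unpooled feature map,
output = model.forward_features(transforms(img).unsqueeze(0))
# and forward_head with pre_logits=True pools it into the same feature vector
output = model.forward_head(output, pre_logits=True)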
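
One common downstream use of such embeddings is similarity search. The sketch below computes cosine similarity between embedding vectors in plain Python. The three 4-dimensional vectors are hypothetical stand-ins and cosine_similarity is a helper written for this example; with the real model, each vector could be obtained from the tensor above, for example via output.squeeze(0).tolist().

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings (real ones have many more dimensions).
emb_query = [0.2, 0.9, 0.1, 0.4]
emb_similar = [0.25, 0.8, 0.0, 0.5]
emb_different = [0.9, 0.0, 0.7, 0.1]

sim_close = cosine_similarity(emb_query, emb_similar)
sim_far = cosine_similarity(emb_query, emb_different)

print(f"query vs similar:   {sim_close:.3f}")
print(f"query vs different: {sim_far:.3f}")
```

Values closer to 1 indicate embeddings that point in nearly the same direction, which is usually read as the images being more alike.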

Citation

Use the following BibTeX entry to cite the research behind this model:

@inproceedings{lee2019energy,
  title = {An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection},
  author = {Lee, Youngwan and Hwang, Joong-won and Lee, Sangrok and Bae, Yuseok and Park, Jongyoul},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops},
  year = {2019}
}

This entry makes it easy for researchers to cite the model and the research behind it in their own papers.