levit_256.fb_dist_in1k - a LeViT convolutional image classification model with fast inference

Project overview: levit_256.fb_dist_in1k

levit_256.fb_dist_in1k is an image classification model that uses convolutional mode (built on nn.Conv2d and nn.BatchNorm2d). It was pretrained by the paper authors on the ImageNet-1k dataset using distillation.

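Model details

Model type: Image classification / feature backbone
Model statistics:
- Parameters (M): 18.9
- GMACs: 1.1
- Activations (M): 4.2
- Image size: 224 x 224
Related paper:
- LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference: paper link
Original authors' page: GitHub link
Dataset: ImageNet-1k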
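
For a rough sense of scale, the sketch below converts the parameter count listed above into an approximate weight footprint and the GMACs figure into FLOPs. It assumes 4 bytes per parameter for float32 (2 for float16) and the common convention that one multiply-accumulate counts as 2 FLOPs. The helper names are illustrative, and the estimate ignores activations, buffers, and framework overhead.

```python
# Rough size estimates derived from the model statistics above.
PARAMS_M = 18.9   # parameters, in millions (from the model card)
GMACS = 1.1       # multiply-accumulates per 224 x 224 image, in billions


def weight_megabytes(params_millions: float, bytes_per_param: int) -> float:
    """Approximate weight storage in MB (10**6 bytes); hypothetical helper."""
    # millions of parameters * bytes per parameter = millions of bytes
    return params_millions * bytes_per_param


def approx_gflops(gmacs: float) -> float:
    """Assumes 1 MAC = 2 FLOPs, a common but not universal convention."""
    return 2.0 * gmacs


fp32_mb = weight_megabytes(PARAMS_M, 4)
fp16_mb = weight_megabytes(PARAMS_M, 2)
gflops = approx_gflops(GMACS)

print(f"float32 weights: ~{fp32_mb:.1f} MB")
print(f"float16 weights: ~{fp16_mb:.1f} MB")
print(f"compute per image: ~{gflops:.1f} GFLOPs")
```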

Model usage

Image classification

The following example shows how to classify an image with this model:

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(
    urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))

model = timm.create_model('levit_256.fb_dist_in1k', pretrained=True)
model = model.eval()

data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
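
To make the last line of the example concrete, here is a dependency-free sketch of what a softmax followed by a top-k selection computes. It uses a made-up five-element logit vector rather than real model output; the function names and the logits are illustrative assumptions, and in the example above `torch.topk` performs the selection on tensors.

```python
import math


def softmax_percent(logits):
    """Convert raw logits to percentages that sum to 100."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def top_k(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


# Hypothetical logits for a 5-class toy problem (not real model output).
logits = [0.5, 2.0, -1.0, 3.0, 0.0]
probs = softmax_percent(logits)
top_probs, top_indices = top_k(probs, k=3)

for p, i in zip(top_probs, top_indices):
    print(f"class {i}: {p:.2f}%")
```

With the real model, the indices returned by `torch.topk` refer to ImageNet-1k class positions, so they still need to be mapped to label names separately.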

Image embeddings

The model can also be used to extract image embeddings, as in the following example:

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(
    urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))

model = timm.create_model(
    'levit_256.fb_dist_in1k',
    pretrained=True,
    num_classes=0  # remove the nn.Linear classifier
)
model = model.eval()

data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

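# With num_classes=0 the model returns pooled features, shape (batch_size, num_features)
output = model(transforms(img).unsqueeze(0))
# Equivalent two-step route (works without num_classes=0): unpooled features first,
output = model.forward_features(transforms(img).unsqueeze(0))
# then pooling without the classifier, again giving (batch_size, num_features)
output = model.forward_head(output, pre_logits=True)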
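
A common use of such embeddings is comparing images by cosine similarity. The sketch below shows the computation on short made-up vectors; with the model above, each vector would instead be one row of `output` converted to a list (for example `output[0].tolist()`). The vectors and the helper name are illustrative assumptions, not part of timm.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for image embeddings (real ones are much longer).
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]    # same direction as emb_a
emb_c = [-3.0, 0.0, 1.0]   # orthogonal to emb_a

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```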

Model comparison

The table below compares the LeViT family of models:

Model	Top-1 Accuracy (%)	Top-5 Accuracy (%)	Parameter Count (M)	Image Size (px)
levit_384.fb_dist_in1k	82.596	96.012	39.13	224
levit_conv_384.fb_dist_in1k	82.596	96.012	39.13	224
levit_256.fb_dist_in1k	81.512	95.48	18.89	224
levit_conv_256.fb_dist_in1k	81.512	95.48	18.89	224
levit_conv_192.fb_dist_in1k	79.86	94.792	10.95	224
levit_192.fb_dist_in1k	79.858	94.792	10.95	224
levit_128.fb_dist_in1k	78.474	94.014	9.21	224
levit_conv_128.fb_dist_in1k	78.474	94.02	9.21	224
levit_128s.fb_dist_in1k	76.534	92.864	7.78	224
levit_conv_128s.fb_dist_in1k	76.532	92.864	7.78	224
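
One way to read the table is to pick the most accurate variant that fits a parameter budget. The sketch below does this with the rows copied from the table; the function name and the budget values are illustrative assumptions, and ties on top-1 accuracy resolve to whichever row is listed first.

```python
# (model name, top-1 %, top-5 %, parameters in millions), copied from the table above
LEVIT_MODELS = [
    ("levit_384.fb_dist_in1k", 82.596, 96.012, 39.13),
    ("levit_conv_384.fb_dist_in1k", 82.596, 96.012, 39.13),
    ("levit_256.fb_dist_in1k", 81.512, 95.48, 18.89),
    ("levit_conv_256.fb_dist_in1k", 81.512, 95.48, 18.89),
    ("levit_conv_192.fb_dist_in1k", 79.86, 94.792, 10.95),
    ("levit_192.fb_dist_in1k", 79.858, 94.792, 10.95),
    ("levit_128.fb_dist_in1k", 78.474, 94.014, 9.21),
    ("levit_conv_128.fb_dist_in1k", 78.474, 94.02, 9.21),
    ("levit_128s.fb_dist_in1k", 76.534, 92.864, 7.78),
    ("levit_conv_128s.fb_dist_in1k", 76.532, 92.864, 7.78),
]


def best_under_budget(models, max_params_m):
    """Return the row with the highest top-1 accuracy within the budget, or None."""
    candidates = [row for row in models if row[3] <= max_params_m]
    if not candidates:
        return None
    # max() keeps the first row among equal top-1 scores
    return max(candidates, key=lambda row: row[1])


for budget in (5.0, 10.0, 20.0):
    choice = best_under_budget(LEVIT_MODELS, budget)
    name = choice[0] if choice else "no model fits"
    print(f"budget {budget:>4.1f} M params -> {name}")
```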

Citation

To cite the paper or related work behind this model, use the following BibTeX entries:

@InProceedings{Graham_2021_ICCV,
  author    = {Graham, Benjamin and El-Nouby, Alaaeldin and Touvron, Hugo and Stock, Pierre and Joulin, Armand and Jegou, Herve and Douze, Matthijs},
  title     = {LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {12259-12269}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
}