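eca_halonext26ts.c1_in1k Project Introduction
Project Overview
eca_halonext26ts.c1_in1k is an image classification model based on the HaloNet architecture. It combines Efficient Channel Attention (ECA) with a ResNeXt-style design and was trained by Ross Wightman in the timm library on the ImageNet-1k dataset, with the goal of providing an efficient image classifier.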
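To make the channel-attention part concrete, here is a minimal NumPy sketch of an ECA-style gate, assuming the formulation from the ECA-Net paper: global average pooling per channel, a 1D convolution across neighbouring channels with an adaptively chosen odd kernel size, and a sigmoid that rescales each channel. The kernel weights are random placeholders rather than trained values, the helper names are hypothetical, and timm's own ECA module may differ in details.

```python
import math
import numpy as np

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size: (log2(C) + b) / gamma, truncated, then made odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply an ECA-style channel gate to a (C, H, W) feature map."""
    c, k = x.shape[0], kernel.shape[0]
    pooled = x.mean(axis=(1, 2))                    # global average pool -> (C,)
    padded = np.pad(pooled, (k - 1) // 2)           # zero-pad along the channel axis
    # 1D convolution over each channel and its neighbours (no bias)
    mixed = np.array([padded[i:i + k] @ kernel for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-mixed))             # sigmoid -> weights in (0, 1)
    return x * gate[:, None, None]                  # rescale each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 8, 8))                # one (C, H, W) feature map
k = eca_kernel_size(x.shape[0])                     # 5 for 256 channels
y = eca(x, rng.standard_normal(k))                  # random placeholder kernel
print(k, y.shape)                                   # 5 (256, 8, 8)
```

Because the gate only mixes k neighbouring channels, it adds k weights per module instead of the two dense layers of a squeeze-and-excitation block.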
Model Features
The model has the following notable features:
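- Flexible architecture: built on timm's BYOBNet (Bring-Your-Own-Blocks Network) framework, which allows flexible configuration of the network structure, attention mechanisms, and other components.
- Efficient attention: uses Efficient Channel Attention (ECA), a lightweight channel-attention module intended to improve accuracy at little extra cost.
- Optimized training recipe: based on the C recipe from the "ResNet Strikes Back" paper, using the SGD optimizer with Nesterov momentum and adaptive gradient clipping (AGC).
- Learning-rate schedule: cosine annealing with warmup.
- Common timm features: stochastic depth, gradient checkpointing, layer-wise learning-rate decay, and per-stage feature extraction.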
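The warmup-plus-cosine schedule can be sketched as a plain function of the epoch. This is an illustrative sketch, not timm's scheduler: the peak LR, warmup length, warmup starting LR and minimum LR below are hypothetical values, since the exact c1 hyperparameters are not given here.

```python
import math

def lr_at(epoch: float, base_lr: float, total_epochs: int, warmup_epochs: int,
          warmup_lr: float = 1e-6, min_lr: float = 0.0) -> float:
    """Linear warmup to base_lr, then cosine annealing down to min_lr."""
    if epoch < warmup_epochs:
        # linear ramp from warmup_lr towards base_lr
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # fraction of the post-warmup phase completed, in [0, 1]
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Hypothetical settings: 100 epochs, 5 warmup epochs, peak LR 0.1
schedule = [lr_at(e, base_lr=0.1, total_epochs=100, warmup_epochs=5) for e in range(100)]
print(f"{schedule[0]:.1e} {schedule[5]:.3f}")
```

Warmup avoids large, unstable updates while the randomly initialised weights settle; the cosine phase then decays the LR smoothly instead of in discrete steps.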
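Model Details
eca_halonext26ts.c1_in1k has the following statistics:
- Parameters: 10.8 million
- GMACs: 2.4
- Activations: 11.5 million
- Input image size: 256 x 256
These figures describe a compact model; its accuracy and runtime relative to other models are reported in the timm model results.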
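A quick back-of-the-envelope reading of these numbers, assuming one MAC counts as two FLOPs and that weights and activations are stored as 4-byte float32 values. Real memory use depends on the framework, batch size and which activations are kept for backpropagation, so treat the results as rough orders of magnitude.

```python
# Figures taken from the model statistics above
params_m = 10.8        # parameters, millions
gmacs = 2.4            # multiply-accumulates per 256 x 256 image, billions
activations_m = 11.5   # activation count, millions

# Assumption: one MAC ~ two FLOPs (one multiply + one add)
gflops = 2 * gmacs

# Assumption: float32 storage, 4 bytes per value; results in decimal MB
bytes_per_value = 4
weights_mb = params_m * 1e6 * bytes_per_value / 1e6
activations_mb_per_image = activations_m * 1e6 * bytes_per_value / 1e6

print(f"~{gflops:.1f} GFLOPs, ~{weights_mb:.1f} MB of fp32 weights, "
      f"~{activations_mb_per_image:.0f} MB of fp32 activations per image")
```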
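Use Cases
The model is mainly suited to the following tasks:
- Image classification: recognizing and classifying images directly.
- Feature extraction: as a backbone network, extracting multi-scale image features for downstream tasks such as object detection and image segmentation.
- Image embeddings: producing high-dimensional vector representations of images for tasks such as image retrieval and similarity computation.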
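For retrieval and similarity search, embeddings are commonly compared with cosine similarity. The NumPy sketch below ranks a small gallery against a query; the random 2048-dimensional vectors are stand-ins for real embeddings (2048 matches the final feature width in the Image Embeddings example), and cosine_rank is a hypothetical helper name.

```python
import numpy as np

def cosine_rank(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Return gallery row indices ordered from most to least similar to query."""
    q = query / np.linalg.norm(query)                              # unit-length query
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)   # unit-length rows
    scores = g @ q                                                 # cosine similarities
    return np.argsort(-scores)                                     # descending order

rng = np.random.default_rng(0)
gallery = rng.standard_normal((5, 2048))                 # stand-in embeddings, one per image
query = gallery[3] + 0.01 * rng.standard_normal(2048)    # near-duplicate of gallery item 3
print(cosine_rank(query, gallery))                       # item 3 is ranked first
```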
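Usage
The model is straightforward to use: the timm library loads the pretrained weights and runs inference, and the model can be used for image classification, feature-map extraction, or image embeddings as needed.
Summary
eca_halonext26ts.c1_in1k combines HaloNet local self-attention, Efficient Channel Attention and a ResNeXt-style design with the ResNet Strikes Back training recipe in a compact model (10.8M parameters, 2.4 GMACs). It can be used directly for image classification or as a feature extractor for other computer vision tasks.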
Model card for eca_halonext26ts.c1_in1k
A HaloNet image classification model (with Efficient Channel Attention, based on the ResNeXt architecture). Trained on ImageNet-1k in timm by Ross Wightman.
NOTE: this model did not adhere to any specific paper configuration, it was tuned for reasonable training times and reduced frequency of self-attention blocks.
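HaloNet's blocked local self-attention, from the Scaling Local Self-Attention paper listed under Model Details, splits the feature map into non-overlapping query blocks, and each block attends to its own pixels plus a surrounding "halo" of neighbours. The NumPy sketch below is a single-head simplification under stated assumptions: random projection matrices, no relative position embeddings or multi-head split, zero padding at the borders with padded positions masked out, and illustrative block and halo sizes rather than the ones this model uses.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def halo_attention(x, wq, wk, wv, block=4, halo=1):
    """Single-head blocked local self-attention on an (H, W, C) feature map."""
    H, W, _ = x.shape
    assert H % block == 0 and W % block == 0, "H and W must be divisible by block"
    # Zero-pad so every block has a full (block + 2*halo)^2 key/value window
    xp = np.pad(x, ((halo, halo), (halo, halo), (0, 0)))
    valid = np.pad(np.ones((H, W), dtype=bool), halo)   # False on padded border
    q, k, v = x @ wq, xp @ wk, xp @ wv                   # queries only from real pixels
    d, win = q.shape[-1], block + 2 * halo
    out = np.zeros((H, W, v.shape[-1]))
    for i in range(0, H, block):
        for j in range(0, W, block):
            qb = q[i:i + block, j:j + block].reshape(-1, d)      # block^2 queries
            kb = k[i:i + win, j:j + win].reshape(-1, d)          # win^2 keys (block + halo)
            vb = v[i:i + win, j:j + win].reshape(-1, v.shape[-1])
            logits = qb @ kb.T / np.sqrt(d)
            logits[:, ~valid[i:i + win, j:j + win].reshape(-1)] = -np.inf  # mask padding
            out[i:i + block, j:j + block] = (softmax(logits) @ vb).reshape(block, block, -1)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
wq, wk, wv = (rng.standard_normal((16, 16)) for _ in range(3))
y = halo_attention(x, wq, wk, wv, block=4, halo=1)
print(y.shape)   # (8, 8, 16)
```

Each query block looks at (block + 2*halo)^2 keys instead of all H*W positions, which is what keeps the cost local; with halo=0 and a single block covering the whole map the sketch reduces to ordinary global self-attention.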
Recipe details:
- Based on ResNet Strikes Back C recipes
  - SGD (w/ Nesterov) optimizer and AGC (adaptive gradient clipping).
  - Cosine LR schedule with warmup
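Adaptive gradient clipping (introduced with NFNets) limits each output unit's gradient norm relative to that unit's weight norm instead of using one global threshold. A minimal NumPy sketch follows; clip_factor=0.01 and eps=1e-3 are illustrative values, not necessarily those used for c1, and timm ships its own implementation that this sketch does not reproduce line for line.

```python
import numpy as np

def unitwise_norm(x: np.ndarray):
    """L2 norm per output unit (axis 0); biases and gains get a single norm."""
    if x.ndim <= 1:
        return np.linalg.norm(x)
    return np.sqrt((x ** 2).sum(axis=tuple(range(1, x.ndim)), keepdims=True))

def adaptive_clip(grad: np.ndarray, weight: np.ndarray,
                  clip_factor: float = 0.01, eps: float = 1e-3) -> np.ndarray:
    """Rescale unit i's gradient whenever ||g_i|| > clip_factor * max(||w_i||, eps)."""
    max_norm = clip_factor * np.maximum(unitwise_norm(weight), eps)
    grad_norm = unitwise_norm(grad)
    clipped = grad * (max_norm / np.maximum(grad_norm, 1e-6))   # avoid division by zero
    return np.where(grad_norm > max_norm, clipped, grad)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3, 3, 3))      # conv weight with 4 output channels
g = rng.standard_normal((4, 3, 3, 3))      # gradient far larger than 1% of the weights
g_clipped = adaptive_clip(g, w)
ratio = unitwise_norm(g_clipped).ravel() / unitwise_norm(w).ravel()
print(ratio)                               # each entry is ~0.01 after clipping
```

Gradients already below the threshold pass through unchanged, so AGC only intervenes on units whose update would be large relative to their current weights.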
This model architecture is implemented using timm's flexible BYOBNet (Bring-Your-Own-Blocks Network).
BYOB (with BYOANet attention specific blocks) allows configuration of:
- block / stage layout
- block-type interleaving
- stem layout
- output stride (dilation)
- activation and norm layers
- channel and spatial / self-attention layers
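Block-type interleaving, together with the NOTE above about a reduced frequency of self-attention blocks, can be pictured as a per-stage list of block types. The helper and the four-stage layout below are hypothetical illustrations, not timm's configuration API and not the exact eca_halonext26ts layout.

```python
from typing import List, Sequence

def interleave_block_types(depth: int, every: int,
                           types: Sequence[str] = ("bottle", "self_attn")) -> List[str]:
    """Hypothetical helper: every `every`-th block uses types[1], the rest types[0]."""
    return [types[1] if (i + 1) % every == 0 else types[0] for i in range(depth)]

# Hypothetical 4-stage layout: convolutional bottlenecks early, attention later
layout = {
    "stage1": ["bottle"] * 2,
    "stage2": ["bottle"] * 2,
    "stage3": interleave_block_types(4, every=2),
    "stage4": ["self_attn"] * 2,
}
n_attn = sum(t == "self_attn" for blocks in layout.values() for t in blocks)
print(layout["stage3"], n_attn)   # ['bottle', 'self_attn', 'bottle', 'self_attn'] 4
```

Raising every (for example every=4) lowers the share of self-attention blocks in a stage, which is the kind of trade-off the NOTE describes for keeping training time reasonable.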
...and also includes timm features common to many other architectures, including:
- stochastic depth
- gradient checkpointing
- layer-wise LR decay
- per-stage feature extraction
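Stochastic depth randomly skips a residual branch for individual samples during training and rescales the surviving samples so the expected output is unchanged. In timm this is exposed through the drop_path_rate argument; the NumPy sketch below is an independent illustration with hypothetical helper names, and the linear per-block ramp of drop probabilities is an assumed, commonly used schedule.

```python
import numpy as np

def drop_path(x: np.ndarray, drop_prob: float, rng: np.random.Generator,
              training: bool = True) -> np.ndarray:
    """Stochastic depth on a residual branch output of shape (N, ...)."""
    if not training or drop_prob == 0.0:
        return x                                   # identity at inference time
    keep_prob = 1.0 - drop_prob
    # one keep/drop decision per sample, broadcast over the remaining dims
    mask = rng.random((x.shape[0],) + (1,) * (x.ndim - 1)) < keep_prob
    return x * mask / keep_prob                    # rescale so the expectation is unchanged

def linear_drop_rates(max_rate: float, num_blocks: int) -> list:
    """Assumed schedule: drop probability rises linearly from 0 to max_rate."""
    return [max_rate * i / max(num_blocks - 1, 1) for i in range(num_blocks)]

rng = np.random.default_rng(0)
branch_out = np.ones((8, 4, 2, 2))                 # batch of 8 residual-branch outputs
y = drop_path(branch_out, 0.5, rng)
print(np.unique(y))                                # each entry is 0.0 (dropped) or 2.0 (kept, rescaled)
print(linear_drop_rates(0.1, 5))
```

Inside a residual block this would be applied as out = x + drop_path(branch(x), p, rng); deeper blocks get larger p under the assumed linear ramp.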
Model Details
- Model Type: Image classification / feature backbone
- Model Stats:
  - Params (M): 10.8
  - GMACs: 2.4
  - Activations (M): 11.5
  - Image size: 256 x 256
- Papers:
  - Scaling Local Self-Attention for Parameter Efficient Visual Backbones: https://arxiv.org/abs/2103.12731
  - ResNet strikes back: An improved training procedure in timm: https://arxiv.org/abs/2110.00476
- Dataset: ImageNet-1k
Model Usage
Image Classification
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below
img = Image.open(urlopen(
'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
model = timm.create_model('eca_halonext26ts.c1_in1k', pretrained=True)
model = model.eval()
# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0)) # unsqueeze single image into batch of 1
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
Feature Map Extraction
from urllib.request import urlopen
from PIL import Image
import timm
img = Image.open(urlopen(
'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
model = timm.create_model(
'eca_halonext26ts.c1_in1k',
pretrained=True,
features_only=True,
)
model = model.eval()
# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0)) # unsqueeze single image into batch of 1
for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 64, 128, 128])
    #  torch.Size([1, 256, 64, 64])
    #  torch.Size([1, 512, 32, 32])
    #  torch.Size([1, 1024, 16, 16])
    #  torch.Size([1, 2048, 8, 8])
    print(o.shape)
Image Embeddings
from urllib.request import urlopen
from PIL import Image
import timm
img = Image.open(urlopen(
'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
model = timm.create_model(
'eca_halonext26ts.c1_in1k',
pretrained=True,
num_classes=0, # remove classifier nn.Linear
)
model = model.eval()
# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0)) # output is (batch_size, num_features) shaped tensor
# or equivalently (without needing to set num_classes=0)
output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 2048, 8, 8) shaped tensor
output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
Model Comparison
Explore the dataset and runtime metrics of this model in timm model results.
Citation
@misc{rw2019timm,
author = {Ross Wightman},
title = {PyTorch Image Models},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
doi = {10.5281/zenodo.4414861},
howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
@article{Vaswani2021ScalingLS,
title={Scaling Local Self-Attention for Parameter Efficient Visual Backbones},
author={Ashish Vaswani and Prajit Ramachandran and A. Srinivas and Niki Parmar and Blake A. Hechtman and Jonathon Shlens},
journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2021},
pages={12889-12899}
}
@inproceedings{wightman2021resnet,
title={ResNet strikes back: An improved training procedure in timm},
author={Wightman, Ross and Touvron, Hugo and Jegou, Herve},
booktitle={NeurIPS 2021 Workshop on ImageNet: Past, Present, and Future}
}