Transformers-Tutorials
Hi there!
This repository contains demos I made with the Transformers library by 🤗 HuggingFace. Currently, all of them are implemented in PyTorch.
NOTE: if you are not familiar with HuggingFace and/or Transformers, I highly recommend checking out our free course, which introduces several Transformer architectures (such as BERT, GPT-2, T5 and BART) and gives an overview of the HuggingFace libraries, including Transformers, Tokenizers, Datasets, Accelerate and the hub.
For an overview of the HuggingFace ecosystem for computer vision (June 2022), refer to this notebook and its corresponding video.
Currently, it contains the following demos:
- Audio Spectrogram Transformer (paper):
- performing inference with `ASTForAudioClassification` to classify audio.
- BERT (paper):
- fine-tuning `BertForTokenClassification` on a named entity recognition (NER) dataset.
- fine-tuning `BertForSequenceClassification` for multi-label text classification.
- BEiT (paper):
- understanding `BeitForMaskedImageModeling`
- CANINE (paper):
- fine-tuning `CanineForSequenceClassification` on IMDb
- CLIPSeg (paper):
- performing zero-shot image segmentation with `CLIPSeg`
- Conditional DETR (paper):
- performing inference with `ConditionalDetrForObjectDetection`
- fine-tuning `ConditionalDetrForObjectDetection` on a custom dataset (balloon)
- ConvNeXT (paper):
- fine-tuning (and performing inference with) `ConvNextForImageClassification`
- DINO (paper):
- visualizing self-attention of Vision Transformers trained using the DINO method
- DETR (paper):
- performing inference with `DetrForObjectDetection`
- fine-tuning `DetrForObjectDetection` on a custom object detection dataset
- evaluating `DetrForObjectDetection` on the COCO detection 2017 validation set
- performing inference with `DetrForSegmentation`
- fine-tuning `DetrForSegmentation` on COCO panoptic 2017
- DPT (paper):
- performing inference with DPT for monocular depth estimation
- performing inference with DPT for semantic segmentation
- Deformable DETR (paper):
- performing inference with `DeformableDetrForObjectDetection`
- DiT (paper):
- performing inference with DiT for document image classification
- Donut (paper):
- performing inference with Donut for document image classification
- fine-tuning Donut for document image classification
- performing inference with Donut for document visual question answering (DocVQA)
- performing inference with Donut for document parsing
- fine-tuning Donut for document parsing with PyTorch Lightning
- GIT (paper):
- performing inference with GIT for image/video captioning and image/video question-answering
- fine-tuning GIT on a custom image captioning dataset
- GLPN (paper):
- performing inference with `GLPNForDepthEstimation` to illustrate monocular depth estimation
- GPT-J-6B (repository):
- performing inference with `GPTJForCausalLM` to illustrate few-shot learning and code generation
- GroupViT (repository):
- performing inference with `GroupViTModel` to illustrate zero-shot semantic segmentation
- ImageGPT (blog post):
- (un)conditional image generation with `ImageGPTForCausalLM`
- linear probing with ImageGPT
- LUKE (paper):
- fine-tuning `LukeForEntityPairClassification` on a custom relation extraction dataset using PyTorch Lightning
- LayoutLM (paper):
- fine-tuning `LayoutLMForTokenClassification` on the FUNSD dataset
- fine-tuning `LayoutLMForSequenceClassification` on the RVL-CDIP dataset
- adding image embeddings to LayoutLM during fine-tuning on the FUNSD dataset
- LayoutLMv2 (paper):
- fine-tuning `LayoutLMv2ForSequenceClassification` on RVL-CDIP
- fine-tuning `LayoutLMv2ForTokenClassification` on FUNSD
- fine-tuning `LayoutLMv2ForTokenClassification` on FUNSD using the 🤗 Trainer
- performing inference with `LayoutLMv2ForTokenClassification` on FUNSD [![Open In