Transformers for Natural Language Processing and Computer Vision: Take Generative AI and LLMs to the next level with Hugging Face, Google Vertex AI, ChatGPT, GPT-4V, and DALL-E 3, 3rd Edition
by Denis Rothman
This repo is continually updated and upgraded.
Last updated: July 22, 2024
Look for 🐬 to explore new bonus notebooks such as Midjourney's API, Google Vertex AI Gemini's API, and boosting the speed of OpenAI GPT models with asynchronous batch API calls!
Look for 🎏 to explore existing notebooks for the latest model or platform releases, such as OpenAI's latest GPT-4o and GPT-4o-mini models.
Look for 🛠 to find existing notebooks updated for new dependency versions and platform API constraints.
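The asynchronous batch-call speed-up mentioned above can be illustrated with a minimal sketch. This is not the repository's notebook code: the hypothetical `fetch_completion` stands in for a real OpenAI API call, and `asyncio.sleep` simulates network latency.

```python
import asyncio
import time

async def fetch_completion(prompt: str) -> str:
    """Hypothetical stand-in for an OpenAI API call; sleep simulates latency."""
    await asyncio.sleep(0.2)
    return f"completion for: {prompt}"

async def batch(prompts):
    # asyncio.gather issues all requests concurrently instead of one by one
    return await asyncio.gather(*(fetch_completion(p) for p in prompts))

prompts = [f"prompt {i}" for i in range(10)]
start = time.perf_counter()
results = asyncio.run(batch(prompts))
elapsed = time.perf_counter() - start
# Ten 0.2 s "requests" finish in roughly 0.2 s total, not 2 s
print(len(results), round(elapsed, 1))
```

With a real client you would await the SDK's async chat-completion method inside `fetch_completion`; the concurrency pattern stays the same.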
🚩 If you see anything that doesn't run as expected, raise an issue, and we'll work on it!
Transformers-for-NLP-and-Computer-Vision-3rd-Edition
This is the code repository for Transformers for Natural Language Processing and Computer Vision, published by Packt.
Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3
About the book
Transformers for Natural Language Processing and Computer Vision, Third Edition, explores Large Language Model (LLM) architectures, applications, and various platforms (Hugging Face, OpenAI, and Google Vertex AI) used for Natural Language Processing (NLP) and Computer Vision (CV).
Dive into generative vision transformers and multimodal model architectures and build applications, such as image and video-to-text classifiers. Go further by combining different models and platforms and learning about AI agent replication.
What you will learn
- Pretrain and fine-tune LLMs
- Work with multiple platforms, such as Hugging Face, OpenAI, and Google Vertex AI
- Understand different tokenizers and best practices for preprocessing language data
- Implement Retrieval Augmented Generation and rule bases to mitigate hallucinations
- Visualize transformer model activity for deeper insights using BertViz, LIME, and SHAP
- Create and implement cross-platform chained models, such as HuggingGPT
- Go in-depth into vision transformers with CLIP, DALL-E 2, DALL-E 3, and GPT-4V
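As a taste of the Retrieval Augmented Generation topic listed above, here is a minimal sketch of the retrieval step, in pure Python with bag-of-words cosine similarity. It is illustrative only, not the book's implementation; a real pipeline would use dense embeddings and pass the retrieved context to an LLM for generation.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list, k: int = 1) -> list:
    """Return the k documents most similar to the query."""
    q = Counter(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = [
    "Transformers use self-attention to model token relationships.",
    "Vision transformers split an image into patches.",
    "Retrieval grounds LLM answers in external documents.",
]
# The retrieved context would be prepended to the prompt before generation
context = retrieve("how does retrieval help an LLM", docs)
print(context[0])
```

Grounding the prompt in retrieved documents is what mitigates hallucinations: the model is asked to answer from the supplied context rather than from its parametric memory alone.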
Table of Contents
Chapters
- What Are Transformers?
- Getting Started with the Architecture of the Transformer Model
- Emergent vs Downstream Tasks: The Unseen Depths of Transformers
- Advancements in Translations with Google Trax, Google Translate, and Gemini
- Diving into Fine-Tuning through BERT
- Pretraining a Transformer from Scratch through RoBERTa
- The Generative AI Revolution with ChatGPT
- Fine-Tuning OpenAI GPT Models
- Shattering the Black Box with Interpretable Tools
- Investigating the Role of Tokenizers in Shaping Transformer Models
- Leveraging LLM Embeddings as an Alternative to Fine-Tuning
- Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4
- Summarization with T5 and ChatGPT
- Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2
- Guarding the Giants: Mitigating Risks in Large Language Models
- Beyond Text: Vision Transformers in the Dawn of Revolutionary AI
- Transcending the Image-Text Boundary with Stable Diffusion
- Hugging Face AutoTrain: Training Vision Models without Coding
- On the Road to Functional AGI with HuggingGPT and its Peers
- Beyond Human-Designed Prompts with Generative Ideation
Appendix
Appendix: Answers to the Questions
Platforms
You can run the notebooks directly from the table below: