- Provides insights into the latest models, including parameter counts, fine-tuning datasets and techniques, and hardware specifications.
- Practical guides to LLM alignment post-training, including instruction datasets, benchmark datasets, and efficient training libraries and techniques, plus a short overview of pre-trained LLMs.
- Explore everything from pre-trained models to post-trained models.
Catalog
- Catalog
- Pre-trained Base Models
- Licences
- Open Source Aligned LLMs
- Instruction and Conversational Datasets
- Pre-training Datasets
- Efficient Training
- Evaluation Benchmark
- Multi-Modal LLMs
- Tool Learning
- Star History
Pre-trained Base Models
Simple Version
- OpenAI: GPT-1, GPT-2, GPT-3, InstructGPT, Code-davinci-002, GPT-3.5, GPT-4(-8k/32k)
- Anthropic: Claude-v1, Claude Instant
- Meta: OPT, Galactica, LLaMA
- huggingface BigScience: BLOOM (176B), BLOOMZ, mT0
- EleutherAI: GPT-Neo, GPT-J (6B), GPT-NeoX (20B), Pythia
- TogetherCompute: GPT-JT, RedPajama-7B, RedPajama-INCITE
- Berkeley: OpenLLaMA
- MosaicML: MPT-7B, MPT-7B-Instruct/Chat
- TII: Falcon-7/40B-(instruct)
- BlinkDL: RWKV-4-Pile, RWKV-4-PilePlus
- Tsinghua THUDM: GLM-130B, ChatGLM-6B
- Cerebras: Cerebras-GPT
- Google: T5, mT5, LaMDA, Pathways, PaLM, UL2, Flan-T5, Flan-UL2, Bard, PaLM-E, PaLM 2, MoE, Switch Transformer, GLaM, ST-MoE, MoE Routing
- DeepMind: Gopher, Chinchilla, Sparrow
- Nvidia: Megatron-Turing NLG (530B)
- AI21 Studio: Jurassic-1, Jurassic-2
- OpenAI
- 2018/06, GPT-1 (117M)
- 2019/02, GPT-2 (1.5B)
- 2020/06, GPT-3 (175B): ada (350M), babbage (1.3B), curie (6.7B), davinci (175B); details here
- 2022/01, InstructGPT-3: text-ada(350M), text-babbage(1.3B), text-curie(6.7B), text-davinci-001(175B)
- 2022/02, Code-davinci-002
- GPT-3.5 (175B): text-davinci-002 (2022/03), text-davinci-003 (2022/11), ChatGPT (2022/11), gpt-3.5-turbo (2023/03)
- 2023/03, GPT-4(-8k/32k)
- Anthropic
- Claude-v1: 2023/03, a state-of-the-art, high-performance model; context window 9k/100k tokens
- Claude Instant: 2023/03, a lighter, less expensive, and much faster option; context window 9k/100k tokens
- Meta
- OPT (125M/350M/1.3B/2.7B/6.7B/13B/30B/66B/175B): 2022/03, pre-trained on the datasets used in RoBERTa, the Pile, and PushShift.io Reddit using metaseq; trained with 1/7th the carbon footprint of GPT-3 by combining Meta’s open-source Fully Sharded Data Parallel (FSDP) API with NVIDIA’s tensor parallel abstraction within Megatron-LM; the data is predominantly English, with a small amount of non-English text from CommonCrawl; released under a noncommercial license.
- OPT-IML (30B/175B): 2022/12, introduces OPT-IML Bench, a large benchmark for Instruction Meta-Learning (IML) covering 2000 NLP tasks, and trains OPT-IML, instruction-tuned versions of OPT.
- Galactica (125M/1.3B/6.7B/30B/120B): 2022/11, facebook/galactica models are designed to perform scientific tasks; prompts are included in pre-training alongside the general corpora; released under a non-commercial CC BY-NC 4.0 license.
- LLaMA (7B/13B/33B/65B): 2023/02, LLaMA 65B/33B trained on 1.4 trillion tokens and LLaMA 7B on one trillion tokens, using text from the 20 languages with the most speakers; the weights were leaked; code under the GPL-3.0 license, weights under a non-commercial license.
- huggingface BigScience
- BLOOM (176B): 2022/07/11, a multilingual LLM trained on the ROOTS corpus (a composite collection of 498 Hugging Face datasets), with a 250k-token vocabulary and seq-len 2048 (for smaller models, search here); released under the commercial-friendly BigScience Responsible AI License.
- BLOOMZ & mT0: 2022/11, BLOOM and mT5 fine-tuned on the crosslingual task instruction-following mixture xP3; released under the commercial-friendly bigscience-bloom-rail-1.0 License.
- EleutherAI
- The Pile: 2020/12/31, a 300B-token (207B after deduplication) open-source English language modelling dataset; download here.
- GPT-Neo (125M/1.3B/2.7B)(Deprecated): 2021/03/21, a set of decoder-only LLMs trained on the Pile; MIT license.
- GPT-J (6B): 2021/06/04, EleutherAI/gpt-j-6b, English language model trained on the Pile using mesh-transformer-jax library, seq-len 2048, Apache-2.0 license.
- GPT-NeoX (20B): 2022/02/10, EleutherAI/gpt-neox-20b, English language model trained on the Pile using GPT-NeoX library, seq-len 2048, Apache-2.0 license.
- Pythia (70M/160M/410M/1B/1.4B/2.8B/6.9B/12B): 2023/02/13, a suite of 8 model sizes, each trained on two datasets (the Pile and the deduplicated Pile) using the gpt-neox library; seq-len 2048; Apache-2.0 license.
- TogetherCompute
- GPT-JT (6B): 2022/11/29, a fork of GPT-J-6B fine-tuned on 3.53 billion tokens with open-source datasets and techniques; outperforms most 100B+ parameter models at classification.
- RedPajama-Pythia-7B: 2023/04/17, released RedPajama-Data-1T to reproduce the LLaMA foundation models in a fully open-source way; RedPajama-Pythia-7B, trained on 40% of RedPajama-Data-1T, achieves a higher HELM score than Pythia-7B (trained on the Pile) and StableLM-7B, but is still weaker than LLaMA-7B for now; for details see blog1, blog2 and Card.
- OpenChatKit: 2023/03/10, fine-tuned for chat from EleutherAI’s GPT-NeoX-20B on the OIG-43M instruction dataset, contributing to a growing corpus of open instruction-following data.
- RedPajama-INCITE (3B/7B): 2023/05/05, open-source 3B models (base/chat/instruct) trained on 800B tokens and then fine-tuned, described as the strongest model in its class and able to bring LLMs to a wide variety of hardware; a 7B model trained on 80% of the data (800B tokens) beats same-class GPT-J/Pythia/LLaMA models on HELM and lm-evaluation-harness; RedPajama v2 is announced with 2T tokens (mixing the Pile into RedPajama and adding more code, e.g. from the Stack); Apache 2.0 license.
- Berkeley/OpenLLaMA: an open-source reproduction of Meta AI’s LLaMA 7B/3B trained on the RedPajama dataset; provides PyTorch and JAX weights; Apache-2.0 license.
- MosaicML
- MPT (MosaicML Pretrained Transformer, 7B(6.7B)): 2023/05/05, a GPT-style decoder-only transformer trained from scratch on 1T tokens of text and code (RedPajama, mC4, C4, the Stack Dedup, Semantic Scholar ORC) in 9.5 days at a cost of ~$200k; uses ALiBi (handling inputs up to 65k tokens) and other optimization techniques; matches the quality of LLaMA-7B; open source for commercial use under the Apache-2.0 License.
- MPT-7B-Instruct/Chat: MPT-7B fine-tuned on an instruction-following dataset and a dialogue-generation dataset, respectively; releases the mosaicml/dolly_hhrlhf dataset, derived from Databricks Dolly-15k and Anthropic’s Helpful and Harmless datasets; CC-By-SA-3.0 (commercially usable) / CC-By-NC-SA-4.0 (non-commercial use only).
- TII (Technology Innovation Institute)
- Falcon-7/40B-(instruct): 2023/05/26, pretrained on 1,500B/1,000B tokens of RefinedWeb (Apache-2.0) enhanced with curated corpora; the instruct versions are fine-tuned on a mixture of chat/instruct datasets such as Baize; ranked No. 1 on the Hugging Face Open LLM Leaderboard at the end of May 2023; license changed to Apache 2.0 on June 1.
- BlinkDL
- RWKV-4-Pile (169M/430M/1.5B/3B/7B/14B): 2023/04, from "RWKV: Reinventing RNNs for the Transformer Era"; an RNN with a linear attention mechanism, trained on the Pile, with a sequence length that is in principle unbounded; Weights.
- RWKV-4-PilePlus (7B/14B): 2023/04, fine-tuned on RedPajama plus part of Pile v2 (1.7T tokens in total).
- Tsinghua THUDM
- GLM-130B: 2022/10, an open bilingual (English and Chinese) pre-trained model trained on 400B text tokens using the GLM library; Apache-2.0 license.
- ChatGLM-6B: 2023/03, trained on 1T Chinese and English tokens and fine-tuned on Chinese instruction-following QA and dialogue datasets; released under the Apache-2.0 license, authorization needed.
- Cerebras
- Cerebras-GPT: 2023/03, a family of seven GPT models ranging from 111M to 13B parameters, trained on the EleutherAI Pile dataset following the Chinchilla formula (about 20 tokens per parameter); released under the Apache 2.0 license.
- Google
- 2019/10/23, T5, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- 2020/01/28, LaMDA: blog-2021/03/18, blog-2020/01/28, blog-2022/01/21
- 2021/10/28, Introducing Pathways: A next-generation AI architecture
- 2022/04/04, Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
- 2022/05/11, Language Models Perform Reasoning via Chain of Thought
- 2022/10/14, UL2 20B: An Open Source Unified Language Learner
- 2023/02/01,