📱🦙 MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Michael Felsberg, Timothy Baldwin, Eric Xing and Fahad Khan

Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE and Linköping University, Sweden

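📢 Latest Updates

Feb 26, 2024 - arXiv preprint released!
Feb 25, 2024 - Code (training and evaluation scripts) released!
Feb 25, 2024 - Final pre-trained models (including intermediate checkpoints), chat version, and online demo links released!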

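Overview

"Bigger is better" has been the predominant trend in recent Large Language Model (LLM) development. However, LLMs are not well suited to scenarios that require on-device processing, energy efficiency, a low memory footprint, and fast responses. These requirements are crucial for privacy, security, and sustainable deployment. This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices. Our primary contribution is an accurate and fully transparent open-source 0.5 billion (0.5B) parameter SLM, named MobiLlama, which caters to the specific needs of resource-constrained computing, with an emphasis on enhanced performance with reduced resource demands. MobiLlama is an SLM design that starts from a larger model and applies a careful parameter-sharing scheme to reduce both pre-training and deployment cost.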

⚡ Model Download

Model Name	Download Link
MobiLlama-05B	HuggingFace
MobiLlama-08B	HuggingFace
MobiLlama-1B	HuggingFace
MobiLlama-05B-Chat	HuggingFace
MobiLlama-1B-Chat	HuggingFace

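Generation with MobiLlama

Model Description

Model type: Language model designed using the architecture of LLaMA-7B
Language(s) (NLP): English
License: Apache 2.0
Resources for more information: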

Loading MobiLlama

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 0.5B model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True)

model.to('cuda')  # requires a CUDA-capable GPU
text = "I was walking towards the river when "
input_ids = tokenizer(text, return_tensors="pt").to('cuda').input_ids
outputs = model.generate(input_ids, max_length=1000, repetition_penalty=1.2, pad_token_id=tokenizer.eos_token_id)
# Decode only the continuation: skip the prompt tokens and drop the final generated token.
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
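
The slice in the final print call keeps only the newly generated tokens. The sketch below illustrates the same indexing with plain Python lists instead of tensors. The token ids and the end-of-sequence id of 2 are made-up values for illustration, not taken from the MobiLlama tokenizer.

```python
# Illustrative token ids only; real ids come from the tokenizer.
EOS_ID = 2  # assumed end-of-sequence id for this example
prompt_ids = [101, 102, 103]                        # tokens of the input prompt
output_ids = prompt_ids + [201, 202, 203, EOS_ID]   # generate() returns prompt + continuation

# Same slice as outputs[:, input_ids.shape[1]:-1], applied to a single sequence:
# skip the prompt tokens, then drop the last token.
continuation = output_ids[len(prompt_ids):-1]
print(continuation)  # [201, 202, 203]
```

If generation stops at max_length rather than at an end-of-sequence token, the trailing -1 drops a regular token instead. Decoding the full continuation with skip_special_tokens=True may be a safer alternative in that case.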

Load Intermediate Checkpoints

model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-05B", revision="ckpt_352", trust_remote_code=True)

All intermediate checkpoints, from ckpt_100 to ckpt_358, are available.
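
To iterate over the checkpoints, the revision names can be generated rather than typed by hand. This is a minimal sketch that assumes the numbering is contiguous between the two endpoints given above; checkpoint_revisions is a hypothetical helper, not part of the repository.

```python
FIRST_CKPT, LAST_CKPT = 100, 358  # endpoints stated in the text above


def checkpoint_revisions(first=FIRST_CKPT, last=LAST_CKPT):
    """Return revision names, assuming contiguous 'ckpt_<n>' numbering."""
    return [f"ckpt_{n}" for n in range(first, last + 1)]


revisions = checkpoint_revisions()
print(len(revisions), revisions[0], revisions[-1])  # 259 ckpt_100 ckpt_358
```

Each name can then be passed as the revision argument of from_pretrained, as in the ckpt_352 example above.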

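Dataset

Download the preprocessed Amber data from Hugging Face. The full training data consists of 360 chunks with a total size of roughly 8 TB. The Amber dataset contains 1.2 trillion tokens in total, gathered from the data sources shown below.

Subset	Tokens (Billion)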
Arxiv	30.00
Book	28.86
C4	197.67
Refined-Web	665.01
StarCoder	291.92
StackExchange	21.75
Wikipedia	23.90
Total	1259.13
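
As a quick sanity check of the data mix, the share of each subset can be computed from the table. The sketch below copies the token counts from the table; the rounded entries sum to 1259.11, which is 0.02 below the reported total, presumably because of rounding in the individual rows.

```python
# Token counts in billions, copied from the table above.
subsets = {
    "Arxiv": 30.00, "Book": 28.86, "C4": 197.67, "Refined-Web": 665.01,
    "StarCoder": 291.92, "StackExchange": 21.75, "Wikipedia": 23.90,
}
reported_total = 1259.13

computed_total = round(sum(subsets.values()), 2)
# Percentage of the reported total contributed by each subset.
shares = {name: 100 * count / reported_total for name, count in subsets.items()}

print(f"computed total: {computed_total} (reported: {reported_total})")
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<14}{share:5.1f}%")
```

Refined-Web accounts for a little over half of the tokens, followed by StarCoder and C4.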

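Installation

First install PyTorch according to the instructions specific to your operating system.

To install from source (recommended for training/fine-tuning), run: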

conda create -n mobillama python=3.10
conda activate mobillama
git clone https://github.com/mbzuai-oryx/MobiLlama.git
cd MobiLlama
pip install -r requirements.txt

Pretraining

For MobiLlama (using 20 nodes of A100 80GB GPUs)

sbatch pretrain.sh

For large-base, use main_largebase.py in line 11 of pretrain.sh

🔎 Evaluation

We used Analysis-360 to evaluate our model on different LLM benchmarks.

📊 Results

Model Name	#Params	HellaSwag	Truthfulqa	MMLU	Arc_C	CrowsPairs	piqa	race	siqa	winogrande	Average
gpt-neo-125m	0.15B	30.26	45.58	25.97	22.95	61.55	62.46	27.56	40.33	51.78	40.93
tiny-starcoder	0.17B	28.17	47.68	26.79	20.99	49.68	52.55	25.45	38.28	51.22	37.86
cerebras-gpt-256m	0.26B	28.99	45.98	26.83	22.01	60.52	61.42	27.46	40.53	52.49	40.69
opt-350m	0.35B	36.73	40.83	26.02	23.55	64.12	64.74	29.85	41.55	52.64	42.22
megatron-gpt2-345m	0.38B	39.18	41.51	24.32	24.23	64.82	66.87	31.19	40.28	52.96	42.81
LiteLlama	0.46B	38.47	41.59	26.17	24.91	62.90	67.73	28.42	40.27	49.88	42.26
gpt-sw3-356m	0.47B	37.05	42.55	25.93	23.63	61.59	64.85	32.15	41.56	53.04	42.48
pythia-410m	0.51B	40.85	41.22	27.25	26.19	64.20	67.19	30.71	41.40	53.12	43.57
xglm-564m	0.56B	34.64	40.43	25.18	24.57	62.25	64.85	29.28	42.68	53.03	41.87
Lamini-GPT-LM	0.59B	31.55	40.72	25.53	24.23	63.09	63.87	29.95	40.78	47.75	40.83
MobiLlama (Ours)	0.5B	52.52	38.05	26.45	29.52	64.03	72.03	33.68	40.22	57.53	46.00
Lamini-GPT-LM	0.77B	43.83	40.25	26.24	27.55	66.12	69.31	37.12	42.47	56.59	45.49
MobiLlama (Ours)	0.8B	54.09	38.48	26.92	30.20	64.82	73.17	33.37	41.60	57.45	46.67

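The table compares existing sub-1B models with MobiLlama on nine LLM benchmarks. The 0.5B and 0.8B MobiLlama configurations reach the highest average scores in the table (46.00 and 46.67), with the largest gains on HellaSwag, Arc_C, piqa, and winogrande, while scoring below the other models on Truthfulqa.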
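
The Average column can be cross-checked against the per-benchmark scores. The sketch below assumes the average is the unweighted mean of the nine benchmark columns, which the table does not state explicitly; the scores are copied from the two MobiLlama rows.

```python
from statistics import mean

# Per-benchmark scores copied from the two MobiLlama rows of the table above,
# in column order: HellaSwag, Truthfulqa, MMLU, Arc_C, CrowsPairs, piqa, race, siqa, winogrande.
scores = {
    "MobiLlama 0.5B": [52.52, 38.05, 26.45, 29.52, 64.03, 72.03, 33.68, 40.22, 57.53],
    "MobiLlama 0.8B": [54.09, 38.48, 26.92, 30.20, 64.82, 73.17, 33.37, 41.60, 57.45],
}
reported = {"MobiLlama 0.5B": 46.00, "MobiLlama 0.8B": 46.67}

# Assumption: "Average" is the plain mean over the nine benchmarks.
averages = {name: mean(vals) for name, vals in scores.items()}
for name, avg in averages.items():
    print(f"{name}: computed {avg:.3f}, reported {reported[name]:.2f}")
```

Under this assumption both computed means agree with the reported values to within 0.01, which suggests the column is a simple mean that may have been truncated rather than rounded.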

Model	#Params	HellaSwag	Truthfulqa	MMLU	Arc_C	CrowsPairs	piqa	race	siqa	winogrande	Average
Boomer	1B	31.62	39.42	25.42	22.26	61.26	57.99	28.99	40.32	50.98	39.80
Pythia-Dedup	1B	49.63	38.92	24.29	29.09	67.11	70.23	32.44	42.63	53.98	45.36
Falcon-RW	1B	63.12	35.96	25.36	35.06	69.04	74.10	36.07	40.23	61.88	48.98
TinyLlama	1.1B	60.22	37.59	26.11	33.61	70.60	73.28	36.45	41.65	59.18	48.74
OLMo	1.2B	62.50	32.94	25.86	34.45	69.59	73.70	36.74	41.14	58.90	48.42
Cerebras-GPT	1.3B	38.51	42.70	26.66	26.10	63.67	66.75	30.33	42.42	53.59	43.41
Lamini	1.3B	38.05	36.43	28.47	26.62	64.62	67.89	33.39	43.19	50.59	43.25
OPT	1.3B	54.50	38.67	24.63	29.60	70.70	72.47	34.16	42.47	59.74	47.43
GPT-NEO	1.3B	48.49	39.61	24.82	31.31	65.67	71.05	34.06	41.81	57.06	45.98
Pythia-Deduped	1.4B	55.00	38.63	25.45	32.59	67.33	72.68	34.64	42.68	56.90	47.32
large-base	1.2B	62.99	35.90	24.79	34.55	68.49	75.57	35.31	41.96	62.03	49.06

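Comprehensive comparison with existing fully open-source LLMs with fewer than 2B parameters on nine benchmarks. Our 1.2B "large-base" model, pre-trained on 1.2T tokens, outperforms the recent OLMo 1.17B and TinyLlama 1.1B models, both of which were pre-trained on a substantially larger 3T tokens.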
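
To put the comparison in numbers, the sketch below takes the average scores from the table and the token counts from the caption, then reports the margin of large-base over each of the two models together with the fraction of the token budget it used. The dictionary keys are labels chosen for this example.

```python
# (average score from the table above, pre-training tokens in trillions from the caption)
models = {
    "large-base (1.2B)": (49.06, 1.2),
    "TinyLlama (1.1B)": (48.74, 3.0),
    "OLMo (1.2B)": (48.42, 3.0),
}
base_avg, base_tokens = models["large-base (1.2B)"]

margins = {}       # average-score margin of large-base over each other model
token_ratios = {}  # large-base token budget as a fraction of the other model's
for name, (avg, tokens) in models.items():
    if name.startswith("large-base"):
        continue
    margins[name] = round(base_avg - avg, 2)
    token_ratios[name] = base_tokens / tokens
    print(f"vs {name}: +{margins[name]:.2f} average points, "
          f"{token_ratios[name]:.0%} of the token budget")
```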

📱 MobiLlama on Android

To run our model in an Android app, download and install the APK from here.

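🙏 Acknowledgements

We thank LLM-360 for their fully transparent and open-source implementation of their language model. The MobiLlama repository is built using LLM-360.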

📜 Citation

@misc{thawakar2024mobillama,
      title={MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT}, 
      author={Omkar Thawakar and Ashmal Vayani and Salman Khan and Hisham Cholakkal and Rao Muhammad Anwer and Michael Felsberg and Timothy Baldwin and Eric P. Xing and Fahad Shahbaz Khan},
      year={2024},
      eprint={2402.16840},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}