MiniCPM: Unveiling the Potential of End-side Large Language Models


MiniCPM Technical Blog | MiniCPM Paper | MiniCPM-V Repo | Join our Discord and WeChat group

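MiniCPM is a series of end-side large language models open-sourced jointly by ModelBest (面壁智能) and the Natural Language Processing Lab at Tsinghua University. The main language model, MiniCPM-2B, has only 2.4 billion (2.4B) non-embedding parameters and 2.7B parameters in total.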
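
The gap between the two parameter counts above is the token embedding table. The following is a minimal sketch of that arithmetic. The vocabulary size and hidden size are assumptions that this README does not state (check the model's config.json before relying on them), and the sketch also assumes the input and output embeddings are shared, so the table is counted once.

```python
# Assumed values, not stated in this README: verify against config.json.
VOCAB_SIZE = 122_753   # assumption: tokenizer vocabulary size
HIDDEN_SIZE = 2_304    # assumption: model hidden dimension

# Stated in the README: non-embedding parameter count.
NON_EMBEDDING_PARAMS = 2.4e9

# One embedding row of HIDDEN_SIZE values per vocabulary entry
# (assumes tied input/output embeddings, so the table is counted once).
embedding_params = VOCAB_SIZE * HIDDEN_SIZE
total_params = NON_EMBEDDING_PARAMS + embedding_params

print(f"embedding: {embedding_params / 1e9:.2f}B, total: {total_params / 1e9:.1f}B")
```

Under these assumptions the embedding table accounts for roughly 0.3B parameters, which is consistent with the 2.4B and 2.7B figures quoted above.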

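After SFT, MiniCPM-2B performs on par with Mistral-7B on public comprehensive benchmarks (with better Chinese, math, and coding ability) and outperforms Llama2-13B, MPT-30B, Falcon-40B, and other models overall.
After DPO, MiniCPM-2B also outperforms many representative open-source LLMs, including Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha, on MTBench, currently the benchmark closest to real user experience.
MiniCPM-V 2.0, an end-side multimodal LLM built on MiniCPM-2B, achieves the best performance among models under 7B on multiple benchmarks and surpasses larger models such as Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on the OpenCompass leaderboard. MiniCPM-V 2.0 also shows leading OCR capability, approaching Gemini Pro in scene-text recognition.
After Int4 quantization, MiniCPM can be deployed for inference on mobile phones, with streaming output slightly faster than human speech. MiniCPM-V has likewise been deployed successfully on mobile phones as a multimodal model.
A single 1080/2080 GPU is enough for parameter-efficient fine-tuning, a single 3090/4090 GPU is enough for full-parameter fine-tuning, and a single machine can continuously train MiniCPM, so the cost of secondary development is relatively low.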
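
To give a rough sense of why Int4 quantization makes phone deployment feasible, here is a back-of-the-envelope sketch of the memory needed for the weights alone. It uses only the 2.7B total parameter count stated above. The helper `weight_memory_gib` is a hypothetical name introduced for illustration, and the estimate ignores quantization metadata (scales, zero points), activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB (hypothetical helper)."""
    return num_params * bits_per_param / 8 / 1024**3


TOTAL_PARAMS = 2.7e9  # total parameter count stated in this README

bf16_gib = weight_memory_gib(TOTAL_PARAMS, 16)  # 16 bits per weight
int4_gib = weight_memory_gib(TOTAL_PARAMS, 4)   # 4 bits per weight

print(f"bf16: {bf16_gib:.2f} GiB, int4: {int4_gib:.2f} GiB")
```

The Int4 weights take a quarter of the bf16 footprint, which is the main reason the quantized model fits within a phone's memory budget.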

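We fully open-source the model parameters of the MiniCPM series for academic research and limited commercial use. Specifically, we have released the following so far; see the Model Download section for links.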

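MiniCPM-2B-SFT/DPO: instruction-tuned and human-preference-aligned versions of MiniCPM-2B.
MiniCPM-V 2.0: a multimodal model based on MiniCPM-2B.
MiniCPM-2B-SFT/DPO-Int4: Int4-quantized versions of MiniCPM-2B-SFT/DPO.
MiniCPM-2B-128k: a 128k long-context version of MiniCPM-2B.
MiniCPM-MoE-8x2B: an MoE version of MiniCPM-2B.
MiniCPM-1B-SFT: an instruction-tuned version of the lighter-weight MiniCPM-1B.
Mobile apps for MiniCPM built on MLC-LLM and LLMFarm; both the text and multimodal models can run inference on a phone.
30 checkpoints from the training of MiniCPM-2B, released for research on model mechanisms.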

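Limitations:

Due to the limited model size, the model may hallucinate. Because the DPO model tends to generate longer responses, it is more prone to hallucination. We will continue to iterate on and improve MiniCPM.
To keep the model general-purpose for academic research, we did not perform any identity training. Because part of the training data comes from the open-source ShareGPT corpus, the model may output identity information similar to that of GPT-series models.
Due to the limited model size, the output is strongly influenced by the prompt, and repeated attempts may produce inconsistent results.
Due to the limited model capacity, the model's knowledge recall is not very accurate. We plan to combine it with RAG to strengthen this ability.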

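Quick Navigation

The table below gives quick access to commonly used engineering modules. For broader and more detailed tutorials, see the tutorial.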

Inference	Fine-tuning	Mobile deployment	Quantization
Transformers	Transformers	MLC deployment	GPTQ
vLLM	mlx_finetune	llama.cpp	AWQ
llama.cpp	LLaMA-Factory		bnb
ollama			Quantization test
fastllm
mlx_lm
powerinfer

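Changelog

2024/04/11 Open-sourced MiniCPM-V-2.0, MiniCPM-2B-128k, MiniCPM-MoE-8x2B, and MiniCPM-1B! Click here to read the technical blog.
2024/03/16 More than 30 intermediate checkpoints of MiniCPM-2B are now available! HuggingFace link
2024/02/13 Added support for llama.cpp.
2024/02/09 Added an open-source community section to the README to collect examples of community support for MiniCPM.
2024/02/08 Released llama-format model weights so that the models are quicker and easier to use.
2024/02/01 Initial release.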

Model Download

Language models

HuggingFace	ModelScope	WiseModel
MiniCPM-2B-sft-bf16	MiniCPM-2B-sft-bf16	MiniCPM-2B-sft-bf16
MiniCPM-2B-dpo-bf16	MiniCPM-2B-dpo-bf16	MiniCPM-2B-dpo-bf16
MiniCPM-2B-128k	MiniCPM-2B-128k
MiniCPM-MoE-8x2B	MiniCPM-MoE-8x2B
MiniCPM-1B-sft-bf16	MiniCPM-1B-sft-bf16

Note: more model versions are listed here.

Multimodal models

HuggingFace	ModelScope	WiseModel
MiniCPM-V 2.0	MiniCPM-V 2.0
MiniCPM-V	MiniCPM-V	MiniCPM-V
OmniLMM-12B	OmniLMM-12B	OmniLMM-12B

Quick Start

Online demo

Colab

Hugging Face models

MiniCPM-2B

After installing transformers>=4.36.0 and accelerate, run the following code:

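from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Fix the random seed so that sampling is repeatable across runs.
torch.manual_seed(0)

path = 'openbmb/MiniCPM-2B-dpo-bf16'
tokenizer = AutoTokenizer.from_pretrained(path)
# trust_remote_code=True is needed because chat() is defined in the model repository's own code.
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

# Prompt (in Chinese): "Which is the highest mountain in Shandong Province? Is it taller or shorter than Huangshan, and by how much?"
responds, history = model.chat(tokenizer, "山东省最高的山是哪座山, 它比黄山高还是矮？差距多少？", temperature=0.5, top_p=0.8, repetition_penalty=1.02)
print(responds)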

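Expected output (the reply is in Chinese; roughly: "The highest mountain in Shandong Province is Mount Tai, at 1,545 m. Compared with Huangshan (1,864 m), Mount Tai is lower, by about 319 m."):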

山东省最高的山是泰山，海拔1545米。

相对于黄山（海拔1864米），泰山海拔较低，相差约319米。
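
The call above returns both the reply and a `history` object. Below is a minimal sketch of how that history could be threaded through a multi-turn conversation. It assumes that `model.chat` accepts a `history` keyword argument and returns the updated history, which the README does not state explicitly. `run_dialogue` and `EchoModel` are hypothetical names; `EchoModel` is only a stand-in that mimics the assumed call shape so the loop can be run without downloading weights.

```python
def run_dialogue(model, tokenizer, queries, **gen_kwargs):
    """Send queries one by one, feeding each turn's history into the next."""
    history = None  # assumption: None means "start a new conversation"
    replies = []
    for query in queries:
        # assumption: chat() accepts `history` and returns (reply, updated_history)
        reply, history = model.chat(tokenizer, query, history=history, **gen_kwargs)
        replies.append(reply)
    return replies, history


class EchoModel:
    """Stand-in with the assumed chat() shape; it just echoes the query."""

    def chat(self, tokenizer, query, history=None, **kwargs):
        history = list(history or [])
        history.append({"role": "user", "content": query})
        reply = f"echo: {query}"
        history.append({"role": "assistant", "content": reply})
        return reply, history


replies, history = run_dialogue(EchoModel(), tokenizer=None, queries=["hi", "and then?"])
print(replies)
print(len(history), "messages in history")
```

With the real model, `EchoModel()` and `tokenizer=None` would be replaced by the `model` and `tokenizer` loaded above, and sampling options such as `temperature` can be passed through `gen_kwargs`.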

MiniCPM-2B (Llama Format)

We have converted the MiniCPM model weights into a format that Llama code can load directly, so that you can try it out:

import torch
from transformers import LlamaTokenizerFast, LlamaForCausalLM

model_path = "openbmb/MiniCPM-2B-dpo-bf16-llama-format"
tokenizer = LlamaTokenizerFast.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

prompt = "Now you act like a terminal situated within a beginner's C++ practice repository folder, please provide the output for the command: `ls -l`"
# Wrap the prompt in the chat tags the model was trained with: <用户> (user) and <AI>.
input_ids = tokenizer.encode("<用户>{}<AI>".format(prompt), return_tensors='pt', add_special_tokens=True).cuda()
# Note: in transformers, temperature and top_p only take effect when do_sample=True is also passed to generate().
responds = model.generate(input_ids, temperature=0.3, top_p=0.8, repetition_penalty=1.02, max_length=1024)
# generate() returns a batch of token ids; decode the first sequence back to text.
responds = tokenizer.decode(responds[0], skip_special_tokens=True)
print(responds)
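
The Llama-format example builds its input by wrapping the user text in the `<用户>` and `<AI>` tags. As a small sketch, the wrapping and the reverse step (pulling the reply out of the decoded text) could be factored into two helpers. `build_prompt` and `extract_reply` are hypothetical names, not part of the MiniCPM code. `extract_reply` assumes the `<AI>` tag survives decoding, and falls back to returning the whole string if it does not.

```python
USER_TAG = "<用户>"  # user-turn tag, as used in the example above
AI_TAG = "<AI>"      # assistant-turn tag, as used in the example above


def build_prompt(query: str) -> str:
    """Wrap a single user query in the chat tags (single-turn only)."""
    return f"{USER_TAG}{query}{AI_TAG}"


def extract_reply(decoded: str) -> str:
    """Return the text after the last <AI> tag, or the whole string if no tag is present."""
    _, sep, tail = decoded.rpartition(AI_TAG)
    return tail.strip() if sep else decoded.strip()


prompt_text = build_prompt("Which city is the capital of China?")
print(prompt_text)
print(extract_reply(prompt_text + " The capital city of China is Beijing."))
```

In the example above, `build_prompt(prompt)` would take the place of the inline `"<用户>{}<AI>".format(prompt)` call, and `extract_reply` could be applied to the decoded output to drop the echoed prompt.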

MiniCPM-V

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the multimodal model and its tokenizer; the model code ships with the repository.
model = AutoModel.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True)
model.eval().cuda()

# 'xx.jpg' is a placeholder; replace it with the path to your own image.
image = Image.open('xx.jpg').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': question}]

# sampling=True enables random sampling at the given temperature.
res, context, _ = model.chat(
    image=image,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7
)
print(res)

vLLM inference

Install vLLM

pip install "vllm>=0.4.1"

Test example

python inference/inference_vllm.py --model_path <hf_repo_path> --prompt_path prompts/prompt_demo.txt

Expected output

<用户>: Which city is the capital of China?
<AI>:
 The capital city of China is Beijing. Beijing is a major political, cultural, and economic center in China, and it is known for its rich history, beautiful architecture, and vibrant nightlife. It is also home to many of China's most important cultural and historical sites, including the Forbidden City, the Great Wall of China, and the Temple of Heaven. Beijing is a popular destination for tourists from around the world, and it is an important hub for international business and trade.