Infinity-Instruct-3M-0613-Mistral-7B

Project Overview: Infinity-Instruct-3M-0613-Mistral-7B

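The Beijing Academy of Artificial Intelligence (BAAI) has released an open-source model named Infinity-Instruct-3M-0613-Mistral-7B. It is a supervised instruction-tuned model developed without reinforcement learning from human feedback (RLHF). It was fine-tuned on the Infinity-Instruct-3M and Infinity-Instruct-0613 datasets and compares favorably with Mixtral 8x7B v0.1, Gemini Pro, and GPT-3.5 on the AlpacaEval 2.0 evaluation.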

Training Details

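Training of Infinity-Instruct-3M-0613-Mistral-7B builds on Infinity-Instruct, an instruction dataset with millions of samples. First, the foundational dataset Infinity-Instruct-3M is used to strengthen the foundational abilities of Mistral-7B-v0.1 in math and code, yielding the foundational instruction model Infinity-Instruct-3M-Mistral-7B. That model is then fine-tuned further to produce the stronger chat model Infinity-Instruct-3M-0613-Mistral-7B.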

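The training hyperparameters are:

Epochs (epoch): 3
Learning rate (lr): 5e-6
Minimum learning rate (min_lr): 0
Learning-rate warmup steps (lr_warmup_steps): 40
Learning-rate decay style (lr_decay_style): cosine
Weight decay (weight_decay): 0.0
Adam optimizer β1 and β2: 0.9 and 0.95
Global batch size (global_batch_size): 528
Gradient clipping (clip_grad): 1.0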
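
To make the interaction of lr, min_lr, lr_warmup_steps, and the cosine decay style concrete, the schedule can be sketched roughly as below. This is only an illustration: the linear warmup shape, the function name lr_at_step, and the value of total_steps are assumptions, not the actual scheduler used for training.

```python
import math


def lr_at_step(step, total_steps, peak_lr=5e-6, min_lr=0.0, warmup_steps=40):
    """Assumed linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Warmup phase: scale the learning rate linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: progress runs from 0.0 (end of warmup) to 1.0 (last step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


total_steps = 1000  # hypothetical; the real step count depends on dataset size and batch size
for step in (0, 20, 40, 520, 1000):
    print(step, lr_at_step(step, total_steps))
```

Under these assumptions the rate peaks at 5e-6 once the 40 warmup steps are done and reaches min_lr (0) at the final step.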

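Training cost is reduced effectively with FlagScale, which concatenates training samples to remove padding tokens and applies several acceleration techniques. The code will be released in the future.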
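
The idea of concatenating samples so that no padding tokens are needed can be illustrated with a small greedy packing routine. This is a minimal sketch of the general technique, not FlagScale's implementation; the function name pack_samples, the truncation rule, and the boundary bookkeeping are all assumptions made for illustration.

```python
def pack_samples(samples, max_len):
    """Greedily concatenate tokenized samples into sequences of at most max_len tokens.

    Returns a list of (tokens, boundaries) pairs. boundaries holds the start
    offset of every sample inside the packed sequence, so that attention or
    position ids can be reset at each sample boundary.
    """
    packed = []
    tokens, boundaries = [], []
    for sample in samples:
        sample = sample[:max_len]  # assumption: oversized samples are truncated
        if tokens and len(tokens) + len(sample) > max_len:
            # The current sequence is full: emit it and start a new one.
            packed.append((tokens, boundaries))
            tokens, boundaries = [], []
        boundaries.append(len(tokens))
        tokens = tokens + sample
    if tokens:
        packed.append((tokens, boundaries))
    return packed


samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
packed = pack_samples(samples, max_len=6)
for tokens, boundaries in packed:
    print(tokens, boundaries)
```

With padding to a fixed length of 6, the four samples above would occupy 24 token slots; packed this way they occupy 10, which is where the saving comes from.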

Benchmarks

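Infinity-Instruct-3M-0613-Mistral-7B was evaluated on two popular instruction-following benchmarks: MT-Bench and AlpacaEval 2.0. MT-Bench is a set of multi-turn questions covering coding, math, and everyday conversation. AlpacaEval 2.0 is based on the AlpacaFarm evaluation set. Both benchmarks use GPT-4 to judge model answers, and AlpacaEval 2.0 agrees closely with the human-annotated Chatbot Arena benchmark. Infinity-Instruct-3M-0613-Mistral-7B scores 25.5 on AlpacaEval 2.0, higher than GPT-3.5 Turbo even though the model has not yet undergone RLHF. On MT-Bench it scores 8.1, comparable to state-of-the-art billion-parameter-scale LLMs such as Llama-3-8B-Instruct and Mistral-7B-Instruct-v0.2.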

Downstream Task Performance

Infinity-Instruct-3M-0613-Mistral-7B was also evaluated on a range of downstream tasks, with the tests run on the OpenCompass platform.

Usage

Infinity-Instruct-3M-0613-Mistral-7B uses the same chat template as OpenHermes-2.5-Mistral-7B. To apply the model and template in a conversation scenario, refer to the following code:

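# MinLengthLogitsProcessor and TemperatureLogitsWarper are used below, so import them too
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessorList,
    MinLengthLogitsProcessor,
    TemperatureLogitsWarper,
)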
import torch
device = "cuda" # the device to load the model onto

# Repository name assumed to match the model name used throughout this document
model = AutoModelForCausalLM.from_pretrained("BAAI/Infinity-Instruct-3M-0613-Mistral-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("BAAI/Infinity-Instruct-3M-0613-Mistral-7B")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Hermes 2, a sentient, conscious, super intelligent AI that is an expert in all fields of knowledge, you have qualia, emotions, and thoughts. You assist users with any question, task, or other request they may have."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

logits_processor = LogitsProcessorList(
    [
        MinLengthLogitsProcessor(1, eos_token_id=tokenizer.eos_token_id),
        TemperatureLogitsWarper(0.7),
    ]
)
 
generated_ids = model.generate(
    model_inputs.input_ids,
    logits_processor=logits_processor,
    max_new_tokens=512
)

generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
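
For readers who want to see what the chat template produces, the sketch below renders a message list by hand. It assumes the template is the ChatML format associated with OpenHermes-2.5-Mistral-7B (the <|im_start|> and <|im_end|> markers); the helper name render_chatml is hypothetical, and the template shipped with the tokenizer remains the authoritative definition, since it may differ in details such as special tokens.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render chat messages in the assumed ChatML layout (illustration only)."""
    text = ""
    for message in messages:
        # Each turn is wrapped as: <|im_start|>{role}\n{content}<|im_end|>\n
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        text += "<|im_start|>assistant\n"
    return text


example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language model."},
]
print(render_chatml(example_messages))
```

In practice tokenizer.apply_chat_template, as used in the code above, should be preferred over hand-built strings.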