Infinity-Instruct-7M-Gen-Llama3_1-8B: an open-source instruction-tuned model trained with large-scale supervised fine-tuning

Introduction to the Infinity-Instruct-7M-Gen-Llama3_1-8B project

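The Beijing Academy of Artificial Intelligence (BAAI) has released Infinity-Instruct-7M-Gen-Llama3_1-8B, an open-source supervised instruction-tuned model trained without reinforcement learning from human feedback (RLHF). The model is fine-tuned on the Infinity-Instruct-7M and Infinity-Instruct-Gen datasets. In the benchmark table below, the InfInstruct-7M-Llama-3.1-8B row scores 33.9 on AlpacaEval 2.0, above the 30.2 listed for GPT-4-0613 and GPT-4-1106 and slightly below the 35.3 listed for GPT-4-0314.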

News

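August 2, 2024: Released the model weights for InfInstruct-Llama3.1-70B Gen, InfInstruct-Llama3.1-8B Gen, and InfInstruct-Mistral-7B Gen.
August 2, 2024: Released the 7M foundational dataset Infinity-Instruct-7M.
July 9, 2024: Released several sets of model weights, including InfInstruct-Mistral-7B 0625, along with the upgraded chat dataset Infinity-Instruct-0625.
June 28, 2024: Released the InfInstruct-Llama3-8B 0613 model weights.
June 21, 2024: Released the InfInstruct-Mistral-7B 0613 model weights.
June 13, 2024: Shared intermediate results from the data construction process.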

Training details

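Infinity-Instruct-7M-Gen-Llama3.1-8B is tuned on Infinity-Instruct, an instruction dataset with millions of samples, in two stages. First, Infinity-Instruct-7M is used to strengthen the foundational abilities (math and code) of Llama3.1-8B, producing the foundational instruction model Infinity-Instruct-7M-Llama3.1-8B. That model is then fine-tuned further to obtain the stronger chat model Infinity-Instruct-7M-Gen-Llama3.1-8B. Training uses FlagScale, which applies a range of acceleration techniques to reduce training cost.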

The training parameters are as follows:

epoch: 3
lr: 5e-6
min_lr: 0
lr_warmup_steps: 40
lr_decay_style: cosine
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.95
global_batch_size: 528
clip_grad: 1.0
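
The schedule implied by these settings (warmup over 40 steps, then cosine decay from 5e-6 down to 0) can be sketched as below. This is a minimal illustration and not FlagScale's implementation: the helper name `lr_at_step`, the linear shape of the warmup, and the total of 1000 steps are all assumptions, since the source does not state them.

```python
import math

# Values taken from the training parameters listed above.
PEAK_LR = 5e-6
MIN_LR = 0.0
WARMUP_STEPS = 40


def lr_at_step(step: int, total_steps: int) -> float:
    """Learning rate at a 0-indexed optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup up to the peak learning rate.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - WARMUP_STEPS)
    progress = min(1.0, (step - WARMUP_STEPS) / decay_steps)
    # Cosine decay from PEAK_LR down to MIN_LR.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


if __name__ == "__main__":
    TOTAL_STEPS = 1000  # assumption: the source does not give the step count
    for s in (0, 39, 40, 520, 1000):
        print(s, lr_at_step(s, TOTAL_STEPS))
```

With these assumptions the rate peaks at the end of warmup, passes roughly half the peak midway through the decay phase, and reaches `min_lr` at the final step.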

Benchmarks

Model	MT-Bench	AlpacaEval2.0	Arena-hard
GPT-4-0314	9.0	35.3	50.0
GPT-4-0613	9.2	30.2	37.9
GPT-4-1106	9.3	30.2	--
Llama-3-8B-Instruct	9.0	34.4	46.6
Llama-3.1-8B-Instruct	--	20.9	20.6
InfInstruct-7M-Llama-3.1-8B	8.2	33.9	30.4

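Without using RLHF, the Infinity-Instruct-7M-Llama-3.1-8B model scores 33.9 on AlpacaEval 2.0 and 30.4 on Arena-hard, well above the 20.9 and 20.6 listed for Llama-3.1-8B-Instruct.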
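
The two Llama-3.1-8B rows of the table can be compared directly. The short script below recomputes the gap on each benchmark where both rows report a score; the dictionary layout and variable names are illustrative, and the numbers are copied from the table.

```python
# Scores copied from the benchmark table (AlpacaEval2.0 and Arena-hard columns).
scores = {
    "Llama-3.1-8B-Instruct": {"AlpacaEval2.0": 20.9, "Arena-hard": 20.6},
    "InfInstruct-7M-Llama-3.1-8B": {"AlpacaEval2.0": 33.9, "Arena-hard": 30.4},
}

baseline = scores["Llama-3.1-8B-Instruct"]
tuned = scores["InfInstruct-7M-Llama-3.1-8B"]

# Point difference per benchmark, rounded to one decimal to hide float noise.
gains = {name: round(tuned[name] - baseline[name], 1) for name in baseline}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f} points")
```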

Usage

Infinity-Instruct-7M-Gen-Llama3.1-8B uses the same chat template as Llama3-8B-instruct.
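
For reference, the Llama 3 Instruct prompt layout can be written out by hand as shown here. This is a sketch for illustration only: `format_llama3_prompt` is a hypothetical helper, and it assumes the standard Llama 3 Instruct special tokens. In practice the template bundled with the model's tokenizer (`tokenizer.apply_chat_template`) is the authoritative source.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the Llama 3 Instruct layout (hypothetical helper)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: role header, blank line, trimmed content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [{"role": "user", "content": "Hello!"}]
    print(format_llama3_prompt(demo))
```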

The following code example shows how to apply the model and its chat template in a conversational setting:

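from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessorList,
    MinLengthLogitsProcessor,
    TemperatureLogitsWarper,
)
import torch

device = "cuda"  # the tokenized inputs are moved to this device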

model = AutoModelForCausalLM.from_pretrained("BAAI/Infinity-Instruct-7M-Gen-Llama3_1-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("BAAI/Infinity-Instruct-7M-Gen-Llama3_1-8B")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Require at least one new token and rescale the logits with temperature 0.7.
logits_processor = LogitsProcessorList(
    [
        MinLengthLogitsProcessor(1, eos_token_id=tokenizer.eos_token_id),
        TemperatureLogitsWarper(0.7),
    ]
)
 
generated_ids = model.generate(
    model_inputs.input_ids,
    logits_processor=logits_processor,
    max_new_tokens=512
)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
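
The `TemperatureLogitsWarper(0.7)` step in the example rescales the logits before a token is chosen. The effect can be illustrated in plain Python: dividing the logits by a temperature below 1 sharpens the softmax distribution while leaving the most likely token unchanged. This is a standalone sketch with made-up logits, not the transformers implementation, and `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]  # made-up values for illustration
    print(softmax_with_temperature(toy_logits, 1.0))
    print(softmax_with_temperature(toy_logits, 0.7))
```

Because the ordering of the logits is preserved, temperature only changes the outcome when tokens are sampled from the distribution rather than picked by argmax.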

Disclaimer

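The resources of this project, including code, data, and model weights, are restricted to academic research and may not be used commercially. Content generated by Infinity Instruct is subject to randomness and other uncontrollable factors, so its accuracy cannot be guaranteed. The project accepts no legal liability for model outputs, or for any losses arising from the use of these resources and the results they produce.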

Citation

A paper describing the Infinity Instruct dataset and the models fine-tuned on it will be released on arXiv. Stay tuned!

@article{InfinityInstruct2024,
  title={Infinity Instruct},
  author={Beijing Academy of Artificial Intelligence (BAAI)},
  journal={arXiv preprint arXiv:2406.XXXX},
  year={2024}
}