Qwen1.5-110B-Chat - A Multilingual Model with Significantly Improved Human-Preference Alignment

Introduction to Qwen1.5-110B-Chat

Background

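Qwen1.5-110B-Chat is the 110B chat model in Qwen1.5, a series of Transformer-based language models released in multiple sizes. Compared with the previously released Qwen, this update brings the following improvements: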

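9 model sizes: dense models ranging from 0.5B to 110B, plus an MoE model with 14B total parameters, of which 2.7B are activated.
Significantly better chat-model performance in terms of human preference.
Multilingual support in both the base and chat models.
Stable support for a 32K context length across all model sizes.
No need to set trust_remote_code.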
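
As a small illustration of the 32K context length listed above, the sketch below checks whether a prompt plus the requested number of new tokens fits in the window. It assumes "32K" means 32,768 tokens; the helper names fits_in_context and max_new_tokens_allowed are hypothetical and not part of any library.

```python
# Assumption: "32K" is taken to mean 32 * 1024 = 32,768 tokens.
CONTEXT_LENGTH = 32 * 1024


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if the prompt plus the generated tokens fit in the window."""
    return prompt_tokens + max_new_tokens <= context_length


def max_new_tokens_allowed(prompt_tokens: int,
                           context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after the prompt."""
    return max(0, context_length - prompt_tokens)
```

In practice, prompt_tokens would come from the tokenized input, for example the length of the input_ids produced by the tokenizer.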

For more details, please refer to our blog post and GitHub repository.

Model Details

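Qwen1.5 is a series of decoder language models of different sizes. For each size, we release both a base language model and an aligned chat model. Qwen1.5 is based on the Transformer architecture and uses SwiGLU activation, attention QKV bias, grouped-query attention (GQA), and a mixture of sliding-window attention (SWA) and full attention, among other techniques. In addition, an improved tokenizer adapts to multiple natural languages and code. In this beta version, GQA (except for the 32B and 110B models) and the mixture of SWA and full attention are temporarily not included.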
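
To make the SwiGLU activation mentioned above more concrete, here is a minimal pure-Python sketch of a SwiGLU-style gated feed-forward block. It is an illustration under stated assumptions, not the Qwen1.5 implementation: the names w_gate, w_up, and w_down are chosen for this example, biases are omitted, and plain lists stand in for tensors.

```python
import math


def silu(x: float) -> float:
    """SiLU (Swish with beta = 1): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a row-major matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Compute w_down @ (SiLU(w_gate @ x) * (w_up @ x)).

    The gate branch decides, element by element, how much of the
    up-projected signal passes through to the down projection.
    """
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)
```

Compared with a plain two-layer feed-forward block, the gated form uses three weight matrices instead of two, which is why it is sketched here with separate gate and up projections.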

Training Details

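The Qwen1.5 models were pretrained on a large amount of data and then post-trained with both supervised finetuning and direct preference optimization (DPO).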
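
For readers unfamiliar with direct preference optimization, the sketch below computes the standard DPO loss for a single preference pair, as the objective is usually stated in the literature. The training setup actually used for Qwen1.5 is not described here, so the function name, the argument names, and the default beta = 0.1 are all assumptions made for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-logits)
```

When the policy equals the reference model, the loss is log 2; it falls below that value as the policy raises the chosen response relative to the rejected one.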

Requirements

The code for Qwen1.5 is included in the latest Hugging Face transformers. We advise you to install transformers >= 4.37.0; otherwise you may encounter the following error:

KeyError: 'qwen2'
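
One way to catch this before loading the model is to compare the installed version against the minimum. The sketch below uses only the standard library; the helpers release_tuple, meets_minimum, and check_transformers are hypothetical names for this example, and pre-release suffixes such as "rc1" or ".dev0" are simply ignored, which is a simplification.

```python
from importlib import metadata

MIN_TRANSFORMERS = "4.37.0"


def release_tuple(version: str) -> tuple:
    """Turn '4.37.0' or '4.38.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '4.37' so they compare as '4.37.0'.
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Compare versions numerically, so that '4.9.0' is older than '4.37.0'."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_transformers() -> str:
    """Describe whether the installed transformers is recent enough."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed):
        return f"transformers {installed} is new enough"
    return (f"transformers {installed} is older than {MIN_TRANSFORMERS}; "
            "upgrade to avoid KeyError: 'qwen2'")
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "4.9.0" above "4.37.0".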

Quickstart

The following code snippet uses apply_chat_template to show how to load the tokenizer and model and how to generate content.

from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-110B-Chat",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-110B-Chat")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,  # passes input_ids together with attention_mask
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
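
To show roughly what apply_chat_template returns when tokenize=False and add_generation_prompt=True, here is a minimal sketch that renders messages in a ChatML-style layout. It assumes the chat template follows that layout; the authoritative template ships with the tokenizer, and render_chatml is a hypothetical helper written only for this illustration.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render chat messages as ChatML-style text (illustrative only)."""
    text = ""
    for message in messages:
        # Each turn is wrapped in start and end markers, with the role first.
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
]
print(render_chatml(messages))
```

In real code, tokenizer.apply_chat_template should be preferred over a hand-written renderer, so that the prompt format always matches the one the model was trained with.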

Tips

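If you encounter code switching (the model mixing languages in its output) or other undesirable behavior, we advise you to use the hyper-parameters provided in generation_config.json.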
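
As a rough sketch of how those hyper-parameters could be applied explicitly, the code below reads a local copy of generation_config.json and keeps only common sampling settings. Which keys the file actually contains is an assumption, and SAMPLING_KEYS, pick_sampling_kwargs, and load_sampling_kwargs are hypothetical names for this example. Note that transformers normally reads this file itself when the model is loaded, so an explicit step like this is mainly useful for inspecting or overriding values.

```python
import json

# Assumption: these are the sampling-related keys of interest; the real file
# may contain only some of them, plus unrelated entries such as token ids.
SAMPLING_KEYS = ("do_sample", "temperature", "top_p", "top_k", "repetition_penalty")


def pick_sampling_kwargs(config: dict) -> dict:
    """Keep only the sampling settings present in a parsed config."""
    return {key: config[key] for key in SAMPLING_KEYS if key in config}


def load_sampling_kwargs(path: str) -> dict:
    """Read generation_config.json from disk and extract sampling settings."""
    with open(path, encoding="utf-8") as handle:
        return pick_sampling_kwargs(json.load(handle))
```

The resulting dictionary could then be unpacked into the generate call, for example model.generate(**model_inputs, max_new_tokens=512, **load_sampling_kwargs("generation_config.json")).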

Citation

If you find our work helpful, feel free to cite it.

@article{qwen,
  title={Qwen Technical Report},
  author={(long list of authors)},
  journal={arXiv preprint arXiv:2309.16609},
  year={2023}
}