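Qwen1.5-32B-Chat - A multilingual chat model optimized for human interaction

Introduction

Qwen1.5-32B-Chat is a Transformer-based, decoder-only language model for understanding and generating text. Qwen1.5 is the beta version of Qwen2 and improves on the previously released Qwen models in several respects.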

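Model Improvements

Compared with the earlier Qwen release, Qwen1.5 brings the following improvements:

Model sizes: 8 sizes are available, namely dense models at 0.5B, 1.8B, 4B, 7B, 14B, 32B, and 72B, plus one MoE model with 14B total parameters and 2.7B activated.
Performance: chat models score significantly better in human preference evaluations.
Multilingual support: both base models and chat models support multiple languages.
Long context: models of every size stably support a 32K context length.
No trust_remote_code: the models load without trusting remote code.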
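
To make the 32K context figure concrete, the sketch below checks whether a prompt plus the requested number of new tokens fits inside the window. It assumes "32K" means 32 * 1024 = 32768 tokens; the helper names are hypothetical and not part of any Qwen or transformers API.

```python
# Assumption: "32K" is read as 32 * 1024 = 32768 tokens.
CONTEXT_LIMIT = 32 * 1024


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    limit: int = CONTEXT_LIMIT) -> bool:
    """Return True if prompt plus generated tokens stay within the window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= limit


def max_new_tokens_allowed(prompt_tokens: int,
                           limit: int = CONTEXT_LIMIT) -> int:
    """Largest generation budget that still fits after the prompt."""
    return max(0, limit - prompt_tokens)
```

In practice the prompt length would come from the tokenizer, for example the length of the input_ids produced for the prompt.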

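Model Details

Qwen1.5 is a series of decoder-only language models in different sizes. Each size is released as a base language model and an aligned chat model. The models are built on the Transformer architecture and use SwiGLU activation, attention QKV bias, group query attention, and a mixture of sliding window attention and full attention, among other techniques. The tokenizer has also been improved to handle multiple natural languages and code.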
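
As an illustration of the SwiGLU activation named above, here is a minimal pure-Python sketch of a SwiGLU feed-forward block. It follows the commonly used formulation down(SiLU(gate(x)) * up(x)) without bias terms; the weight names w_gate, w_up, and w_down are assumptions for this example, and this is not the Qwen1.5 implementation.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]  # gating branch
    up = matvec(w_up, x)                         # linear branch
    hidden = [g * u for g, u in zip(gate, up)]   # elementwise product
    return matvec(w_down, hidden)
```

The gating branch decides, per hidden unit, how much of the linear branch passes through, which is what distinguishes SwiGLU from a plain two-layer MLP.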

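Training Details

The models were pretrained on a large amount of data and then post-trained with both supervised finetuning and direct preference optimization.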
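
For readers unfamiliar with direct preference optimization (DPO), the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. The value beta=0.1 and the argument names are assumptions chosen for illustration; the source does not describe the actual training code or hyperparameters.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) equals softplus(-margin)
    return softplus(-margin)
```

The loss shrinks as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model.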

Requirements

The Qwen1.5 code is included in recent releases of the Hugging Face transformers library. Install transformers>=4.37.0; otherwise you may encounter the error KeyError: 'qwen2'.
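
A quick way to catch the KeyError: 'qwen2' problem early is to check the installed version before loading the model. The sketch below uses only the standard library; the helper names are hypothetical, and the version parser is deliberately simple (it ignores pre-release suffixes such as rc1 or .dev0).

```python
from importlib import metadata

MIN_VERSION = (4, 37, 0)  # minimum transformers version stated above


def parse_version(text: str) -> tuple:
    """Turn '4.37.0' or '4.38.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def supports_qwen2(version_text: str, minimum: tuple = MIN_VERSION) -> bool:
    """Return True if the version is at least the required minimum."""
    parsed = parse_version(version_text)
    padded = parsed + (0,) * (len(minimum) - len(parsed))
    return padded >= minimum


def installed_transformers_version():
    """Return the installed transformers version string, or None."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None
```

For example, supports_qwen2(installed_transformers_version() or "0") can gate the model-loading code and produce a clearer message than the KeyError.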

Quickstart

The following snippet uses apply_chat_template to show how to load the tokenizer and model and how to generate content.

from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"  # the device to load the model inputs onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-32B-Chat",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-32B-Chat")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
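
To clarify two steps in the snippet above, the sketch below mimics them with plain Python. render_chatml approximates the prompt string that apply_chat_template returns, assuming ChatML-style <|im_start|> and <|im_end|> markers; the real template is defined in the tokenizer configuration and may differ in detail. strip_prompt mirrors the slicing that removes the prompt tokens from each generated sequence. Both function names are hypothetical.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate a ChatML-style prompt (assumed format, for illustration)."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def strip_prompt(input_ids, output_ids):
    """Keep only the newly generated tokens, as in the slicing step above."""
    return output_ids[len(input_ids):]
```

The slicing works because generate returns the prompt tokens followed by the new tokens, so dropping the first len(input_ids) entries leaves only the response.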

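Tips

If you encounter code switching (unintended mixing of languages in the output) or other bad cases, we recommend using the hyperparameters provided in generation_config.json.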
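
As a rough sketch of how those hyperparameters could be applied explicitly, the helper below extracts sampling settings from the text of a generation_config.json file. The key names are standard generate arguments in transformers, but the values in EXAMPLE_CONFIG are placeholders for illustration, not the file's actual contents, and sampling_kwargs is a hypothetical helper.

```python
import json

# Sampling-related keys that generate() accepts as keyword arguments.
SAMPLING_KEYS = ("do_sample", "temperature", "top_p", "top_k",
                 "repetition_penalty")

# Placeholder values for illustration only; read the real file instead.
EXAMPLE_CONFIG = (
    '{"do_sample": true, "temperature": 0.7, "top_p": 0.8, '
    '"transformers_version": "4.37.0"}'
)


def sampling_kwargs(config_text: str) -> dict:
    """Extract sampling hyperparameters from generation_config.json text."""
    config = json.loads(config_text)
    return {key: config[key] for key in SAMPLING_KEYS if key in config}
```

The resulting dictionary could then be passed as keyword arguments to model.generate alongside max_new_tokens. Loading the model with from_pretrained normally picks up generation_config.json already, so this matters mainly when defaults have been overridden elsewhere.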

Citation

If you find our work helpful, feel free to cite it:

@article{qwen,
  title={Qwen Technical Report},
  author={multiple authors},
  journal={arXiv preprint arXiv:2309.16609},
  year={2023}
}