<div align="center">
<img src="https://yellow-cdn.veclightyear.com/835a84d5/38091037-7f8c-4894-9dd8-71aadc7c6346.svg?raw=true" width="60%" alt="DeepSeek-V2" />
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;"> <img alt="Homepage" src="https://yellow-cdn.veclightyear.com/835a84d5/d4c98e46-e0c4-4054-b5d2-4c56abcb1cf7.svg?raw=true" style="display: inline-block; vertical-align: middle;"/> </a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;"> <img alt="Chat" src="https://img.shields.io/badge/🤖%20聊天-DeepSeek%20V2-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;"> <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;"> <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/> </a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;"> <img alt="WeChat" src="https://img.shields.io/badge/微信-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;"> <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-CODE" style="margin: 2px;"> <img alt="Code License" src="https://img.shields.io/badge/代码许可-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL" style="margin: 2px;"> <img alt="Model License" src="https://img.shields.io/badge/模型许可-模型协议-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a>
</div>
<p align="center"> <a href="#2-model-downloads">Model Downloads</a> | <a href="#3-evaluation-results">Evaluation Results</a> | <a href="#5-api-platform">API Platform</a> | <a href="#6-how-to-run-locally">How to Use</a> | <a href="#7-license">License</a> | <a href="#8-citation">Citation</a> </p>
<p align="center"> <a href="https://arxiv.org/pdf/2406.11931"><b>Paper Link</b>👁️</a> </p>

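DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

1. Introduction

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo on code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 on an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially improves on the coding and mathematical reasoning capabilities of DeepSeek-V2 while maintaining comparable performance on general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 shows significant advances across code-related tasks as well as in reasoning and general capabilities. In addition, DeepSeek-Coder-V2 expands the number of supported programming languages from 86 to 338 and extends the context length from 16K to 128K.

In standard benchmark evaluations, DeepSeek-Coder-V2 outperforms closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks. A list of the supported programming languages is available in the project repository.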

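2. Model Downloads

We release DeepSeek-Coder-V2 to the public in two sizes, 16B and 236B total parameters, both built on the DeepSeekMoE framework, with only 2.4B and 21B active parameters respectively. Each size is available as a base model and an instruct model.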

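| Model | Total Params | Active Params | Context Length | Download |
| :-- | :-: | :-: | :-: | :-: |
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Base | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 128k | 🤗 HuggingFace |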
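
To make the gap between total and active parameters concrete, the short sketch below computes the share of parameters that take part in processing a single token. It uses the nominal counts from the table; the function and dictionary names are illustrative only.

```python
# Nominal parameter counts (in billions) taken from the download table.
MODELS = {
    "DeepSeek-Coder-V2-Lite": {"total_b": 16.0, "active_b": 2.4},
    "DeepSeek-Coder-V2": {"total_b": 236.0, "active_b": 21.0},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters that take part in processing a single token."""
    return active_b / total_b


for name, params in MODELS.items():
    share = active_fraction(**params)
    print(f"{name}: {share:.1%} of parameters active per token")
```

All parameters still have to be held in memory; only the per-token compute scales with the active count.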


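3. Evaluation Results

3.1 Code Generation

| Model | Total Params | Active Params | HumanEval | MBPP+ | LiveCodeBench | USACO |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: |
| **Closed-Source Models** | | | | | | |
| Gemini-1.5-Pro | - | - | 83.5 | 74.6 | 34.1 | 4.9 |
| Claude-3-Opus | - | - | 84.2 | 72.0 | 34.6 | 7.8 |
| GPT-4-Turbo-1106 | - | - | 87.8 | 69.3 | 37.1 | 11.1 |
| GPT-4-Turbo-0409 | - | - | 88.2 | 72.2 | 45.7 | 12.3 |
| GPT-4o-0513 | - | - | 91.0 | 73.5 | 43.4 | 18.8 |
| **Open-Source Models** | | | | | | |
| CodeStral | 22B | 22B | 78.1 | 68.2 | 31.0 | 4.6 |
| DeepSeek-Coder-Instruct | 33B | 33B | 79.3 | 70.1 | 22.5 | 4.2 |
| Llama3-Instruct | 70B | 70B | 81.1 | 68.8 | 28.7 | 3.3 |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 81.1 | 68.8 | 24.3 | 6.5 |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 90.2 | 76.2 | 43.4 | 12.1 |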

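3.2 Code Completion

| Model | Total Params | Active Params | RepoBench (Python) | RepoBench (Java) | HumanEval FIM |
| :-- | :-: | :-: | :-: | :-: | :-: |
| CodeStral | 22B | 22B | 46.1 | 45.7 | 83.0 |
| DeepSeek-Coder-Base | 7B | 7B | 36.2 | 43.3 | 86.1 |
| DeepSeek-Coder-Base | 33B | 33B | 39.1 | 44.8 | 86.4 |
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 38.9 | 43.3 | 86.4 |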

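3.3 Code Fixing

| Model | Total Params | Active Params | Defects4J | SWE-Bench | Aider |
| :-- | :-: | :-: | :-: | :-: | :-: |
| **Closed-Source Models** | | | | | |
| Gemini-1.5-Pro | - | - | 18.6 | 19.3 | 57.1 |
| Claude-3-Opus | - | - | 25.5 | 11.7 | 68.4 |
| GPT-4-Turbo-1106 | - | - | 22.8 | 22.7 | 65.4 |
| GPT-4-Turbo-0409 | - | - | 24.3 | 18.3 | 63.9 |
| GPT-4o-0513 | - | - | 26.1 | 26.7 | 72.9 |
| **Open-Source Models** | | | | | |
| CodeStral | 22B | 22B | 17.8 | 2.7 | 51.1 |
| DeepSeek-Coder-Instruct | 33B | 33B | 11.3 | 0.0 | 54.5 |
| Llama3-Instruct | 70B | 70B | 16.2 | - | 49.2 |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 9.2 | 0.0 | 44.4 |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 21.0 | 12.7 | 73.7 |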

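3.4 Mathematical Reasoning

| Model | Total Params | Active Params | GSM8K | MATH | AIME 2024 | Math Odyssey |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: |
| **Closed-Source Models** | | | | | | |
| Gemini-1.5-Pro | - | - | 90.8 | 67.7 | 2/30 | 45.0 |
| Claude-3-Opus | - | - | 95.0 | 60.1 | 2/30 | 40.6 |
| GPT-4-Turbo-1106 | - | - | 91.4 | 64.3 | 1/30 | 49.1 |
| GPT-4-Turbo-0409 | - | - | 93.7 | 73.4 | 3/30 | 46.8 |
| GPT-4o-0513 | - | - | 95.8 | 76.6 | 2/30 | 53.2 |
| **Open-Source Models** | | | | | | |
| Llama3-Instruct | 70B | 70B | 93.0 | 50.4 | 1/30 | 27.9 |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 86.4 | 61.8 | 0/30 | 44.4 |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 94.9 | 75.7 | 4/30 | 53.7 |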

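3.5 General Natural Language

| Benchmark | Domain | DeepSeek-V2-Lite Chat | DeepSeek-Coder-V2-Lite Instruct | DeepSeek-V2 Chat | DeepSeek-Coder-V2 Instruct |
| :-- | :-: | :-: | :-: | :-: | :-: |
| BBH | English | 48.1 | 61.2 | 79.7 | 83.9 |
| MMLU | English | 55.7 | 60.1 | 78.1 | 79.2 |
| ARC-Easy | English | 86.1 | 88.9 | 98.1 | 97.4 |
| ARC-Challenge | English | 73.4 | 77.4 | 92.3 | 92.8 |
| TriviaQA | English | 65.2 | 59.5 | 86.7 | 82.3 |
| NaturalQuestions | English | 35.5 | 30.8 | 53.4 | 47.5 |
| AGIEval | English | 42.8 | 28.7 | 61.4 | 60 |
| CLUEWSC | Chinese | 80.0 | 76.5 | 89.9 | 85.9 |
| C-Eval | Chinese | 60.1 | 61.6 | 78.0 | 79.4 |
| CMMLU | Chinese | 62.5 | 62.7 | 81.6 | 80.9 |
| Arena-Hard | - | 11.4 | 38.1 | 41.6 | 65.0 |
| AlpacaEval 2.0 | - | 16.9 | 17.7 | 38.9 | 36.9 |
| MT-Bench | - | 7.37 | 7.81 | 8.97 | 8.77 |
| Alignbench | - | 6.02 | 6.83 | 7.91 | 7.84 |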

3.6 Context Window

Evaluation results on the Needle In A Haystack (NIAH) tests. DeepSeek-Coder-V2 performs well across all context window lengths up to 128K.
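
For readers unfamiliar with NIAH, the sketch below shows how a single test case can be built: a short fact (the needle) is placed at a chosen relative depth inside filler text, and the model is asked to retrieve it. This is an illustration only and not the evaluation harness behind the results above; the function names are hypothetical, and context size is measured in characters rather than tokens to keep the example dependency-free.

```python
def build_niah_prompt(filler: str, needle: str, question: str,
                      context_chars: int, depth: float) -> str:
    """Embed `needle` at relative `depth` (0.0 = start, 1.0 = end) of a filler context."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    # Repeat the filler text until it covers the requested context size.
    repeats = context_chars // len(filler) + 1
    haystack = (filler * repeats)[:context_chars]
    # Split the haystack at the requested depth and place the needle there.
    cut = int(len(haystack) * depth)
    context = haystack[:cut] + " " + needle + " " + haystack[cut:]
    return f"{context}\n\nQuestion: {question}\nAnswer:"


def is_retrieved(model_answer: str, expected: str) -> bool:
    """Score one trial: does the answer contain the expected fact?"""
    return expected.lower() in model_answer.lower()


prompt = build_niah_prompt(
    filler="The quick brown fox jumps over the lazy dog. ",
    needle="The secret code is 7421.",
    question="What is the secret code?",
    context_chars=2000,
    depth=0.5,
)
```

A full sweep would repeat this over a grid of context lengths and depths and record the retrieval rate for each cell.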

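4. Chat Website

You can chat with DeepSeek-Coder-V2 on DeepSeek's official website: coder.deepseek.com

5. API Platform

We also provide an OpenAI-compatible API on the DeepSeek Platform: platform.deepseek.com. Usage is billed pay-as-you-go at an unbeatable price.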
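
Because the API is OpenAI-compatible, a request should follow the familiar chat-completions shape. The sketch below only builds such a request with the Python standard library and does not send it. The base URL and model name are placeholders, not values taken from this README; look up the real ones in the platform documentation.

```python
import json
import urllib.request

# Assumptions (not stated in this README): the endpoint path follows the
# OpenAI chat-completions convention, and the two values below are
# placeholders to be replaced with values from the platform documentation.
BASE_URL = "https://api.example.com"   # hypothetical base URL
MODEL_NAME = "your-model-name"         # hypothetical model identifier


def build_chat_request(api_key: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request("YOUR_API_KEY", "Write a quick sort algorithm in Python.")
# Sending is left commented out because it needs a real key and endpoint:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

An OpenAI client library pointed at the platform's base URL would likely work as well; the standard-library version is used here only to keep the example free of dependencies.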

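6. How to Run Locally

Here we provide some examples of how to use the DeepSeek-Coder-V2-Lite model. If you want to run inference with the full DeepSeek-Coder-V2 in BF16 format, eight 80GB GPUs are required.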
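
The GPU requirement can be sanity-checked with rough arithmetic. The sketch below estimates the size of the weights alone from the nominal parameter counts, at 2 bytes per BF16 parameter; it deliberately ignores the KV cache, activations and framework overhead, so treat the numbers as a lower bound rather than a measurement.

```python
BYTES_PER_BF16_PARAM = 2  # bfloat16 stores each parameter in 16 bits


def bf16_weight_gb(total_params_billion: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return total_params_billion * 1e9 * BYTES_PER_BF16_PARAM / 1e9


full_model_gb = bf16_weight_gb(236)  # nominal 236B total parameters
lite_model_gb = bf16_weight_gb(16)   # nominal 16B total parameters
cluster_gb = 8 * 80                  # eight 80GB GPUs

print(f"DeepSeek-Coder-V2 weights:      ~{full_model_gb:.0f} GB")
print(f"DeepSeek-Coder-V2-Lite weights: ~{lite_model_gb:.0f} GB")
print(f"8 x 80GB GPUs:                   {cluster_gb} GB")
print(f"Left for KV cache and overhead: ~{cluster_gb - full_model_gb:.0f} GB")
```

Total parameters are used rather than active parameters because every expert of an MoE model must be resident in memory, even though only a subset is used for each token.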

Inference with Huggingface's Transformers

You can use Huggingface's Transformers directly for model inference.

Code Completion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the tokenizer and the base model in bfloat16 on the GPU.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
# The prompt is a code comment that the base model continues.
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# max_length counts the prompt tokens plus the generated tokens.
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Code Insertion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = """<｜fim▁begin｜>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""
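inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# Bound only the generated tokens so the long prompt does not use up the budget.
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, i.e. the text that fills the hole.
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))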
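
The fill-in-the-middle prompt above follows a simple pattern: the code before the gap, a hole marker, then the code after the gap, all wrapped in begin and end markers. A small helper makes that pattern explicit. The helper name is hypothetical; the marker strings are copied from the example above.

```python
# Marker strings copied verbatim from the code insertion example.
FIM_BEGIN = "<｜fim▁begin｜>"
FIM_HOLE = "<｜fim▁hole｜>"
FIM_END = "<｜fim▁end｜>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the gap in fill-in-the-middle markers."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"


fim_prompt = build_fim_prompt(
    prefix="def add(a, b):\n",
    suffix="\n    return result",
)
print(fim_prompt)
```

The model is then expected to generate the text that belongs at the hole, here the line that assigns `result`.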

Chat Completion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
messages=[
    { 'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# tokenizer.eos_token_id is the id of the <｜end▁of▁sentence｜> token.
# With do_sample=False decoding is greedy, so top_k and top_p have no effect.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))

The complete chat template can be found in the tokenizer_config.json file in the huggingface model repository.

An example of the chat template is shown below:

<｜begin▁of▁sentence｜>User: {user_message_1}

A: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

A:

You can also add an optional system message:

<｜begin▁of▁sentence｜>{system_message}

User: {user_message_1}

A: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

A:

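In the last round of the dialogue, note that "Assistant:" has no space after the colon. On the 16B-Lite model, adding a space may cause the following issues:

- English questions receive Chinese responses.
- Responses contain garbled text.
- Responses repeat excessively.

Older versions of Ollama had this bug (see https://github.com/deepseek-ai/DeepSeek-Coder-V2/issues/12), but it has been fixed in the latest version.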
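
If you need to assemble the prompt by hand, for example in a runtime that does not read tokenizer_config.json, the rules above can be sketched as a small builder. This is a minimal illustration under the template shown in this section, not the official implementation; the function name is hypothetical, and `tokenizer.apply_chat_template` remains the authoritative path.

```python
from typing import Dict, List, Optional

BOS = "<｜begin▁of▁sentence｜>"
EOS = "<｜end▁of▁sentence｜>"


def build_chat_prompt(messages: List[Dict[str, str]],
                      system_message: Optional[str] = None) -> str:
    """Render alternating user/assistant messages, ending with a bare 'Assistant:'."""
    prompt = BOS
    if system_message:
        prompt += f"{system_message}\n\n"
    for message in messages:
        if message["role"] == "user":
            # No trailing space: the prompt may end right after the colon.
            prompt += f"User: {message['content']}\n\nAssistant:"
        elif message["role"] == "assistant":
            # The space after the colon appears only when a reply follows.
            prompt += f" {message['content']}{EOS}"
        else:
            raise ValueError(f"unsupported role: {message['role']}")
    return prompt


chat_prompt = build_chat_prompt(
    [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Write a quick sort algorithm in Python."},
    ],
    system_message="You are a helpful coding assistant.",
)
print(repr(chat_prompt))
```

Because the space is attached to the assistant reply rather than to the "Assistant:" label, the final turn cannot end with a stray space.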

Inference with vLLM (recommended)

To use vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 1
model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "write a quick sort algorithm in python."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)

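7. License

This code repository is licensed under the MIT License. Use of the DeepSeek-Coder-V2 Base/Instruct models is subject to the Model License. The DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use.

8. Citation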

@article{zhu2024deepseek,
  title={DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence},
  author={Zhu, Qihao and Guo, Daya and Shao, Zhihong and Yang, Dejian and Wang, Peiyi and Xu, Runxin and Wu, Y and Li, Yukun and Gao, Huazuo and Ma, Shirong and others},
  journal={arXiv preprint arXiv:2406.11931},
  year={2024}
}