Expert-Specialized Fine-Tuning

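This is the official code repository for the paper "Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models" by Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, and Y. Wu.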

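ESFT (Expert-Specialized Fine-Tuning) aims to efficiently customize large language models that use a Mixture-of-Experts (MoE) architecture by tuning only the task-relevant parts, improving efficiency and performance while using fewer resources and less storage.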

📰 News

📅 2024.8.11: We have released the ESFT training code! ✨ You can now try it with your own models and datasets!

🚀 Quick Start

Installation and Setup

git clone https://github.com/deepseek-ai/ESFT.git
cd ESFT

Install required dependencies

pip install transformers torch safetensors accelerate
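
After installing, it can help to confirm that the four packages are visible to the Python interpreter you plan to use. The snippet below is a minimal sketch for that check; the helper name `missing_packages` is hypothetical and not part of this repository.

```python
from importlib.util import find_spec

# Package names taken from the pip command above.
REQUIRED = ["transformers", "torch", "safetensors", "accelerate"]

def missing_packages(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages()
print("missing:", missing if missing else "none")
```

An empty result only means the packages can be located; it does not check versions or GPU support.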

Download necessary adapters

bash scripts/download_adapters.sh

🔧 Key Scripts

eval_multigpu.py This script evaluates the model's performance on various datasets. See scripts/eval.sh for detailed configurations and explanations.

Usage:

python eval_multigpu.py \
    --eval_dataset=translation \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --adapter_dir=all_models/adapters/token/translation \
    --output_path=results/completions/token/translation.jsonl \
    --openai_api_key=YOUR_OPENAI_API_KEY

get_expert_scores.py This script computes a score for each expert based on the evaluation datasets.

Usage:

python scripts/expert/get_expert_scores.py \
    --eval_dataset=translation \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --output_dir=results/expert_scores/translation \
    --n_sample_tokens=131072 \
    --world_size=4 \
    --gpus_per_rank=2
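
To make the idea of a per-expert score concrete, here is a minimal sketch of one plausible token-based score: the share of routing slots that the router assigns to each expert over a sample of tokens. This is an illustration under stated assumptions, not the code in get_expert_scores.py; the function name `token_selection_ratio` and the input layout (one list of chosen expert ids per token) are hypothetical.

```python
from collections import Counter

def token_selection_ratio(routed_experts, n_experts):
    """routed_experts: one list per token with the expert ids the router chose.
    Returns a list whose entry i is the fraction of routing slots given to expert i."""
    counts = Counter(e for token in routed_experts for e in token)
    total_slots = sum(len(token) for token in routed_experts)
    if total_slots == 0:
        return [0.0] * n_experts
    return [counts.get(i, 0) / total_slots for i in range(n_experts)]

# Toy example: 4 tokens, top-2 routing, 4 experts in one layer.
routing = [[0, 1], [0, 2], [0, 1], [3, 0]]
scores = token_selection_ratio(routing, n_experts=4)
print(scores)  # [0.5, 0.25, 0.125, 0.125]
```

In this toy layer, expert 0 receives half of all routing slots, so it would rank as the most relevant expert for the sampled data.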

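generate_expert_config.py This script uses the expert scores to generate the configuration that converts an MoE model so that only the task-relevant experts are trained.

Usage: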

python scripts/expert/generate_expert_config.py \
    --eval_datasets=intent,summary,law,translation \
    --expert_scores_dir=results/expert_scores \
    --output_dir=results/expert_configs \
    --score_function=token \
    --top_p=0.2 # the score function and top_p are hyperparameters
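
The role of top_p can be sketched as a cumulative threshold: rank experts by score and keep the highest-scoring ones until their scores add up to at least top_p. The code below is a hedged illustration of that rule only. The function name `select_experts`, the per-layer dict layout, and the exact stopping condition are assumptions, not taken from generate_expert_config.py.

```python
def select_experts(scores, top_p):
    """Pick the highest-scoring experts until their cumulative score reaches top_p.
    scores: dict mapping expert id -> relevance score (assumed to sum to about 1)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for expert_id, score in ranked:
        if cumulative >= top_p:
            break  # threshold already met, remaining experts are left out
        chosen.append(expert_id)
        cumulative += score
    return sorted(chosen)

# Hypothetical scores for one layer with 4 experts.
layer_scores = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
print(select_experts(layer_scores, top_p=0.2))  # [0]
print(select_experts(layer_scores, top_p=0.6))  # [0, 1]
```

A larger top_p selects more experts per layer, which raises the number of trainable parameters; a smaller value keeps the tuned subset compact.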

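train.py and train_ep.py These scripts train the model using the expert configuration generated by the previous script. train_ep.py uses expert parallelism and is optimized for multi-GPU training.

Usage: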

python train.py \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --expert_config=results/expert_configs/intent.json \
    --train_dataset=intent \
    --train_config=configs/base.yaml \
    --output_dir=results/checkpoints/intent
    
torchrun --nproc-per-node=8 train_ep.py \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --expert_config=results/expert_configs/translation.json \
    --train_dataset=translation \
    --train_config=configs/base.yaml \
    --output_dir=results/checkpoints/translation
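
Conceptually, training with an expert configuration means marking only the selected experts as trainable. The sketch below shows one way to express that decision per parameter name. It assumes that everything outside the selected experts stays frozen, that parameter names follow a "model.layers.<L>.mlp.experts.<E>." pattern, and that the configuration can be viewed as a layer-to-expert-ids mapping; none of these details are verified against train.py, and `is_trainable` is a hypothetical helper.

```python
import re

# Assumed naming scheme for expert parameters; not verified against the repository.
EXPERT_PATTERN = re.compile(r"layers\.(\d+)\.mlp\.experts\.(\d+)\.")

def is_trainable(param_name, selected):
    """selected: dict mapping layer index -> set of expert ids to train (hypothetical shape)."""
    match = EXPERT_PATTERN.search(param_name)
    if match is None:
        return False  # non-expert parameters (e.g. attention) stay frozen in this sketch
    layer, expert = int(match.group(1)), int(match.group(2))
    return expert in selected.get(layer, set())

selected = {1: {0, 5}}
names = [
    "model.layers.1.mlp.experts.0.gate_proj.weight",
    "model.layers.1.mlp.experts.3.gate_proj.weight",
    "model.layers.1.self_attn.q_proj.weight",
]
flags = [is_trainable(n, selected) for n in names]
print(flags)  # [True, False, False]
```

With a PyTorch model, the same rule could be applied by looping over `model.named_parameters()` and setting `param.requires_grad = is_trainable(name, selected)`.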

Contact and Support

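For bug reports, feature requests, and general inquiries, please open an issue on our GitHub Issues page. Include as much detail as possible so that we can address your issue quickly.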

🌟 Todo list

☑️ 📝 Update models, evaluation scripts, and expert selection scripts
☑️ 🔧 Update training scripts
🔲 🚀 More...

📚 Citation

If you find our code or paper useful, please cite:

@article{wang2024letexpertsticklast,
      title={Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models}, 
      author={Zihan Wang and Deli Chen and Damai Dai and Runxin Xu and Zhuoshu Li and Y. Wu},
      year={2024},
      eprint={2407.01906},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.01906}, 
}