WeNet

Roadmap | Docs | Papers | Runtime | Pretrained Models | HuggingFace

We Share Net together.

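Highlights

Production first and production ready: as its core design principle, WeNet provides a full-stack production solution for speech recognition.
Accurate: WeNet achieves state-of-the-art results on many public speech datasets.
Lightweight: WeNet is easy to install and use, well designed, and well documented.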

Install

Install the Python package

pip install git+https://github.com/wenet-e2e/wenet.git

Command-line usage (use -h to list the parameters):

wenet --language chinese audio.wav

Python programming usage:

import wenet

model = wenet.load_model('chinese')
result = model.transcribe('audio.wav')
print(result['text'])
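When several recordings need transcribing, the same two calls can be wrapped in a small helper. The sketch below is hypothetical: `transcribe_files` is not part of WeNet, and it assumes only what the snippet above shows, namely that `model.transcribe(path)` returns a mapping with a `'text'` key.

```python
from typing import Dict, Iterable


def transcribe_files(model, paths: Iterable[str]) -> Dict[str, str]:
    """Transcribe each path with one already-loaded model.

    `model` is assumed to expose transcribe(path) -> {'text': ...},
    as in the usage shown above. This helper is illustrative only.
    """
    results = {}
    for path in paths:
        # Reuse the same model object so it is loaded only once.
        results[path] = model.transcribe(path)['text']
    return results
```

With WeNet installed, `transcribe_files(wenet.load_model('chinese'), ['a.wav', 'b.wav'])` would return a dictionary keyed by file path; the file names here are placeholders.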

For more command-line and Python programming usage, please refer to the Python usage guide.

Install for training and deployment

Clone the repo

git clone https://github.com/wenet-e2e/wenet.git

Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
Create a Conda environment:

conda create -n wenet python=3.10
conda activate wenet
conda install conda-forge::sox

Install CUDA: follow the linked installation guide; CUDA 12.1 is recommended.
Install torch and torchaudio; version 2.2.2+cu121 is recommended:

pip install torch==2.2.2+cu121 torchaudio==2.2.2+cu121 -f https://download.pytorch.org/whl/torch_stable.html
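After this step it can help to confirm that the interpreter sees the expected build. The following is a minimal sketch, not part of WeNet: `describe_torch` is a made-up name, and it relies only on `torch.__version__` and `torch.cuda.is_available()`.

```python
def describe_torch() -> str:
    """Summarize the installed torch build, or report that it is missing."""
    try:
        import torch  # imported lazily so the check also works without torch
    except ImportError:
        return "torch is not installed"
    # torch.__version__ looks like '2.2.2+cu121' for a CUDA 12.1 wheel.
    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"


print(describe_torch())
```

If CUDA is reported as unavailable on a GPU machine, a CPU-only wheel or a driver mismatch are plausible causes worth checking first.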

For Ascend NPU users:

Install CANN: follow the linked guide to install the CANN toolkit and kernels.
Install WeNet with the torch-npu dependencies:

pip install -e .[torch-npu]

Related version table:

Requirement	Minimum	Recommended
CANN	8.0.RC2.alpha003	latest
torch	2.1.0	2.2.0
torch-npu	2.1.0	2.2.0
torchaudio	2.1.0	2.2.0
deepspeed	0.13.2	latest
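The minimum versions in the table can be checked programmatically. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes the pip distribution names match the table entries. CANN is skipped because it is not a pip package, and pre-release suffixes such as `rc1` are ignored by the comparison.

```python
import re
from importlib import metadata

# Minimum versions copied from the table above (CANN omitted: not a pip package).
MINIMUMS = {
    "torch": "2.1.0",
    "torch-npu": "2.1.0",
    "torchaudio": "2.1.0",
    "deepspeed": "0.13.2",
}


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release, e.g. '2.2.2+cu121' -> (2, 2, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric releases, zero-padding so '2.1' equals '2.1.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS) -> dict:
    """Map each package name to its installed version and a status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"below {minimum}"
        report[name] = f"{installed} ({status})"
    return report
```

Calling `check_environment()` inside the activated `wenet` environment returns one entry per package, which makes a missing or outdated dependency easy to spot.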

Install the other Python packages

pip install -r requirements.txt
pre-commit install  # keeps the code clean and tidy

Frequently Asked Questions (FAQ)

# If you run into a sox compatibility issue
RuntimeError: set_buffer_size requires sox extension which is not available.
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel
# conda env
conda install conda-forge::sox

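Build for deployment

Optionally, if you want to use the x86 runtime or a language model (LM), build the runtime as follows. Otherwise, you can skip this step.

# the runtime build requires cmake 3.14 or above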
cd runtime/libtorch
mkdir build && cd build && cmake -DGRAPH_TOOLS=ON .. && cmake --build .
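Since the build needs cmake 3.14 or above, checking the installed version before configuring can save a failed run. The sketch below is illustrative only: the function names are invented, and it assumes `cmake --version` prints a first line of the form `cmake version X.Y.Z`.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

REQUIRED = (3, 14)  # minimum stated in the build comment above


def parse_cmake_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups() if group is not None)


def cmake_is_new_enough(required: Tuple[int, int] = REQUIRED) -> bool:
    """Return True only if cmake is on PATH and at least `required`."""
    exe = shutil.which("cmake")
    if exe is None:
        return False
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    version = parse_cmake_version(result.stdout)
    # Compare only (major, minor) against the requirement.
    return version is not None and version[:2] >= required
```

A `False` result means cmake is either missing from `PATH` or older than required.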

Please see the docs for building the runtime on more platforms and operating systems.

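Discussion & Communication

You can discuss directly on GitHub Issues.

Chinese users can also scan the QR code on the left to follow the official WeNet account. We have also created a WeChat group for better discussion and quicker responses. Please scan the personal QR code on the right; that person is responsible for inviting you to the chat group.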

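Acknowledgements

We borrowed a lot of code from ESPnet for Transformer-based modeling.
We borrowed a lot of code from Kaldi for WFST-based decoding for LM integration.
We referred to EESEN for building the TLG-based graph for LM integration.
We referred to OpenTransformer for Python batch inference of end-to-end models.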

Citations

@inproceedings{yao2021wenet,
  title={WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit},
  author={Yao, Zhuoyuan and Wu, Di and Wang, Xiong and Zhang, Binbin and Yu, Fan and Yang, Chao and Peng, Zhendong and Chen, Xiaoyu and Xie, Lei and Lei, Xin},
  booktitle={Proc. Interspeech},
  year={2021},
  address={Brno, Czech Republic},
  organization={IEEE}
}

@article{zhang2022wenet,
  title={WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit},
  author={Zhang, Binbin and Wu, Di and Peng, Zhendong and Song, Xingchen and Yao, Zhuoyuan and Lv, Hang and Xie, Lei and Yang, Chao and Pan, Fuping and Niu, Jianwei},
  journal={arXiv preprint arXiv:2203.15455},
  year={2022}
}
