Wordcab Transcribe

💬 Speech recognition is now a commodity

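A FastAPI-based API that transcribes audio files with faster-whisper and performs speaker diarization with auto-tuning spectral clustering (based on this GitHub implementation).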

[!IMPORTANT]
If you want to see how well Wordcab-Transcribe performs against the other ASR tools available on the market, check out our benchmarking project: Rate that ASR.

Key features

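⚡ Fast: the faster-whisper library and CTranslate2 make audio processing much faster than other implementations.
🐳 Easy to deploy: you can deploy the project on your workstation or in the cloud using Docker.
🔥 Batch requests: the API implements batch requests, so you can transcribe multiple audio files at once.
💸 Cost-effective: as an open-source solution, you don't have to pay for an expensive ASR platform.
🫶 Easy-to-use API: with just a few lines of code, you can use the API to transcribe audio files or even YouTube videos.
🤗 MIT licensed: you can use the project for commercial purposes without restriction.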

Requirements

Local development

Linux (tested on Ubuntu Server 20.04/22.04)
Python >=3.8, <3.12
Hatch
FFmpeg

Run the API locally 🚀

hatch run runtime:launch

Deployment

Docker (optional, for deployment)
NVIDIA GPU + NVIDIA Container Toolkit (optional, for deployment)

Run the API using Docker

Build the image.

docker build -t wordcab-transcribe:latest .

Run the container.

docker run -d --name wordcab-transcribe \
--gpus all \
--shm-size 1g \
--restart unless-stopped \
-p 5001:5001 \
-v ~/.cache:/root/.cache \
wordcab-transcribe:latest

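You can mount a volume into the container to load local whisper models. If you mount a volume, you need to update the WHISPER_MODEL environment variable in the .env file.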

docker run -d --name wordcab-transcribe \
--gpus all \
--shm-size 1g \
--restart unless-stopped \
-p 5001:5001 \
-v ~/.cache:/root/.cache \
-v /path/to/whisper/models:/app/whisper/models \
wordcab-transcribe:latest

You can enter the container with the following command:

docker exec -it wordcab-transcribe /bin/bash

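This is useful for checking that everything works as expected.

Run the API behind a reverse proxy

You can run the API behind a reverse proxy such as Nginx. We have included an nginx.conf file to help you get started.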

# Create a docker network and connect the api container to it
docker network create transcribe
docker network connect transcribe wordcab-transcribe

# Replace /absolute/path/to/nginx.conf with the absolute path to the nginx.conf
# file on your machine (e.g. /home/user/wordcab-transcribe/nginx.conf).
docker run -d \
--name nginx \
--network transcribe \
-p 80:80 \
-v /absolute/path/to/nginx.conf:/etc/nginx/nginx.conf:ro \
nginx

# Check that everything is working as expected
docker logs nginx

⏱️ Profile the API

You can use py-spy as a profiler to analyze the process execution.

# Launch the container with the cap-add=SYS_PTRACE option
docker run -d --name wordcab-transcribe \
--gpus all \
--shm-size 1g \
--restart unless-stopped \
--cap-add=SYS_PTRACE \
-p 5001:5001 \
-v ~/.cache:/root/.cache \
wordcab-transcribe:latest

# Enter the container
docker exec -it wordcab-transcribe /bin/bash

# Install py-spy
pip install py-spy

# Find the PID of the process to profile
top  # e.g. 28

# Run the profiler
py-spy record --pid 28 --format speedscope -o profile.speedscope.json

# Perform any task on the API to generate some profiling data

# Exit the container and copy the generated file to your local machine
exit
docker cp wordcab-transcribe:/app/profile.speedscope.json profile.speedscope.json

# Go to https://www.speedscope.app/ and upload the file to visualize the profile
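
If you would rather get a quick text summary before opening the file in speedscope, the sketch below counts which function was most often on top of the stack. It is not part of the project: the helper names `load_profile` and `top_leaf_frames` are hypothetical, and the code assumes the file follows the speedscope layout of a `shared.frames` list plus `sampled` profiles whose stacks list the leaf frame last.

```python
import json
from collections import Counter


def load_profile(path="profile.speedscope.json"):
    """Load the JSON file copied out of the container."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def top_leaf_frames(profile, limit=10):
    """Return the `limit` frame names most often found on top of the stack.

    Assumption: each sample is a list of indices into shared.frames and the
    last index is the leaf (the function executing when the sample was taken).
    """
    frames = profile["shared"]["frames"]
    totals = Counter()
    for thread in profile["profiles"]:
        if thread.get("type") != "sampled":
            continue  # only sampled profiles carry a samples/weights pair
        weights = thread.get("weights", [])
        for index, stack in enumerate(thread["samples"]):
            if not stack:
                continue
            # Fall back to a weight of 1 when no weight is recorded
            weight = weights[index] if index < len(weights) else 1
            totals[frames[stack[-1]]["name"]] += weight
    return totals.most_common(limit)
```

Calling `top_leaf_frames(load_profile())` in the directory that holds `profile.speedscope.json` returns a list of `(function name, weight)` pairs, heaviest first.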

Test the API

Once the container is running, you can test the API. The API documentation is available at http://localhost:5001/docs.
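
The container can take a while to start answering requests after it launches, so a script may want to wait for the documentation page before sending audio. The following is a minimal sketch using only the standard library. The helper name `wait_for_api`, the timeout values, and the idea of treating an HTTP 200 from `/docs` as "ready" are assumptions, not something the project defines.

```python
import time
import urllib.request


def wait_for_api(
    url="http://localhost:5001/docs",
    timeout=60.0,
    interval=2.0,
    opener=urllib.request.urlopen,
):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse.

    `opener` is injectable so the function can be exercised without a server.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError are OSError subclasses: the server is
            # not reachable yet or answered with an error, so keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A script could call `wait_for_api()` once and stop with a clear message if it returns `False`, rather than failing later on the first transcription request.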

Audio file:

import json
import requests

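filepath = "/path/to/audio/file.wav"  # or any other format that ffmpeg can convert
data = {
    "num_speakers": -1,  # leave at -1 to guess the number of speakers
    "diarization": True,  # longer processing time, but segments are attributed to speakers
    "multi_channel": False,  # only for stereo audio files with one speaker per channel
    "source_lang": "en",  # optional, defaults to "en"
    "timestamps": "s",  # optional, defaults to "s". Can be "s", "ms" or "hms".
    "word_timestamps": False,  # optional, defaults to False
}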
}

with open(filepath, "rb") as f:
    files = {"file": f}
    response = requests.post(
        "http://localhost:5001/api/v1/audio",
        files=files,
        data=data,
    )

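response.raise_for_status()  # fail early instead of saving an error body
r_json = response.json()

# Strip only the final extension so dots elsewhere in the path are preserved
filename = filepath.rsplit(".", 1)[0]
with open(f"{filename}.json", "w", encoding="utf-8") as f:
    json.dump(r_json, f, indent=4, ensure_ascii=False)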
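
When the options come from user input, it can help to validate them before posting. The helper below is a hypothetical sketch, not part of the API client: `build_audio_options` is an invented name, its defaults simply mirror the example above, and the rule that `num_speakers` must be -1 or a positive integer is an assumption about what the server accepts.

```python
ALLOWED_TIMESTAMPS = ("s", "ms", "hms")  # formats listed in the example above


def build_audio_options(
    num_speakers=-1,
    diarization=True,
    multi_channel=False,
    source_lang="en",
    timestamps="s",
    word_timestamps=False,
):
    """Return the form data for the audio endpoint after basic validation."""
    if timestamps not in ALLOWED_TIMESTAMPS:
        raise ValueError(
            f"timestamps must be one of {ALLOWED_TIMESTAMPS}, got {timestamps!r}"
        )
    # Assumption: -1 asks the API to guess, any other value must be positive
    if num_speakers != -1 and num_speakers < 1:
        raise ValueError("num_speakers must be -1 or a positive integer")
    return {
        "num_speakers": num_speakers,
        "diarization": diarization,
        "multi_channel": multi_channel,
        "source_lang": source_lang,
        "timestamps": timestamps,
        "word_timestamps": word_timestamps,
    }
```

The returned dictionary can be passed as `data=` to `requests.post` exactly like the hand-written `data` dictionary, for example `build_audio_options(timestamps="hms")`.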

YouTube video:

import json
import requests

headers = {"accept": "application/json", "Content-Type": "application/json"}
params = {"url": "https://youtu.be/JZ696sbfPHs"}
data = {
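    "diarization": True,  # longer processing time, but segments are attributed to speakers
    "source_lang": "en",  # optional, defaults to "en"
    "timestamps": "s",  # optional, defaults to "s". Can be "s", "ms" or "hms".
    "word_timestamps": False,  # optional, defaults to False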
}

response = requests.post(
    "http://localhost:5001/api/v1/youtube",
    headers=headers,
    params=params,
    data=json.dumps(data),
)

response.raise_for_status()  # fail early instead of saving an error body
r_json = response.json()

with open("youtube_video_output.json", "w", encoding="utf-8") as f:
    json.dump(r_json, f, indent=4, ensure_ascii=False)

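Run local models

You can link a local folder path to use a custom model. If you do so, you should either mount the folder as a volume in the docker run command or include the model directory in your Dockerfile to bake it into the image.

Note that for the default tensorrt-llm whisper engine, the simplest way to get a converted model is to launch the server once locally using hatch. Specify WHISPER_MODEL and ALIGN_MODEL in .env, then run hatch run runtime:launch in your terminal. This will download and convert those models.

You will then find the converted models in cloned_wordcab_transcribe_repo/src/wordcab_transcribe/whisper_models. In your Dockerfile, copy the converted models to the /app/src/wordcab_transcribe/whisper_models directory.

Example Dockerfile line for WHISPER_MODEL:
COPY cloned_wordcab_transcribe_repo/src/wordcab_transcribe/whisper_models/large-v3 /app/src/wordcab_transcribe/whisper_models/large-v3

Example Dockerfile line for ALIGN_MODEL:
COPY cloned_wordcab_transcribe_repo/src/wordcab_transcribe/whisper_models/tiny /app/src/wordcab_transcribe/whisper_models/tiny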
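
If you switch models often, the two COPY lines can be generated from the model names instead of being edited by hand. This is an illustrative sketch only: `dockerfile_copy_line` is a hypothetical helper, and it assumes the same source and target layout as the example lines in this section.

```python
from pathlib import PurePosixPath

# Sub-path shared by the cloned repository and the /app directory in the image
MODELS_SUBDIR = PurePosixPath("src/wordcab_transcribe/whisper_models")


def dockerfile_copy_line(
    model_name,
    repo_dir="cloned_wordcab_transcribe_repo",
    app_dir="/app",
):
    """Build the COPY instruction for one converted model directory."""
    source = PurePosixPath(repo_dir) / MODELS_SUBDIR / model_name
    target = PurePosixPath(app_dir) / MODELS_SUBDIR / model_name
    return f"COPY {source} {target}"
```

For instance, `dockerfile_copy_line("large-v3")` produces the WHISPER_MODEL line above and `dockerfile_copy_line("tiny")` produces the ALIGN_MODEL line.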

🚀 Contributing

Getting started

Make sure you have Hatch installed (for example, using pipx):

hatch

Clone the repository

git clone
cd wordcab-transcribe

Install the dependencies and start coding

hatch env create

Run the tests

# Quality checks without modifying the code
hatch run quality:check

# Quality checks and auto-formatting
hatch run quality:format

# Run the tests with coverage
hatch run tests:run

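Workflow

Create an issue for the feature or bug you want to work on.
Create a branch using the left panel on GitHub.
git fetch and git checkout the branch.
Make your changes and commit them.
Push the branch to GitHub.
Create a pull request and ask for a review.
Merge the pull request once it is approved and CI passes.
Delete the branch.
Update your local repo with git fetch and git pull.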