
vocode

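Build voice-based LLM apps in minutes

Vocode is an open-source library that makes it easy to build voice-based LLM apps. With Vocode, you can build real-time streaming conversations with LLMs and deploy them to phone calls, Zoom meetings, and more. You can also build personal assistants or apps such as voice-based chess. Vocode provides simple abstractions and integrations so that everything you need is in a single library.

We are actively looking for community maintainers, so please reach out if you are interested!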

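⭐️ Features

🗣 Start a conversation using your system audio
➡️ 📞 Set up a phone number that is answered by an LLM-based agent
📞 ➡️ Place calls from your phone number, managed by an LLM-based agent
🧑‍💻 Dial into a Zoom call
🤖 Use outbound calls to real phone numbers from within a Langchain agent
Out-of-the-box integrations with:
- Transcription services, including: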
  - AssemblyAI
  - Deepgram
  - Gladia
  - Google Cloud
  - Microsoft Azure
  - RevAI
  - Whisper
  - Whisper.cpp
- LLMs, including:
  - OpenAI
  - Anthropic
- Synthesis services, including:
  - Rime.ai
  - Microsoft Azure
  - Google Cloud
  - Play.ht
  - Eleven Labs
  - Cartesia
  - Coqui (OSS)
  - gTTS
  - StreamElements
  - Bark
  - AWS Polly

Check out our React SDK here!

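🫂 Contribution and Roadmap

We are an open-source project and welcome contributors who want to add new features, integrations, and documentation! Please don't hesitate to reach out and start building with us.

For more information on contributing, see our Contribution Guide.

Check out our Roadmap.

We would love to talk with you on Discord about new ideas and contributions!

🚀 Quickstart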

pip install vocode

import asyncio
import signal

from pydantic_settings import BaseSettings, SettingsConfigDict

from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.logging import configure_pretty_logging
from vocode.streaming.agent.chat_gpt_agent import ChatGPTAgent
from vocode.streaming.models.agent import ChatGPTAgentConfig
from vocode.streaming.models.message import BaseMessage
from vocode.streaming.models.synthesizer import AzureSynthesizerConfig
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
)
from vocode.streaming.streaming_conversation import StreamingConversation
from vocode.streaming.synthesizer.azure_synthesizer import AzureSynthesizer
from vocode.streaming.transcriber.deepgram_transcriber import DeepgramTranscriber

configure_pretty_logging()


class Settings(BaseSettings):
    """
    Settings for the streaming conversation quickstart.
    These parameters can be configured with environment variables.
    """

    openai_api_key: str = "ENTER_YOUR_OPENAI_API_KEY_HERE"
    azure_speech_key: str = "ENTER_YOUR_AZURE_KEY_HERE"
    deepgram_api_key: str = "ENTER_YOUR_DEEPGRAM_API_KEY_HERE"

    azure_speech_region: str = "eastus"

    # This means a .env file can be used to override these settings
    # e.g. "OPENAI_API_KEY=my_key" will override the default openai_api_key above
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
    )


settings = Settings()


async def main():
    (
        microphone_input,
        speaker_output,
    ) = create_streaming_microphone_input_and_speaker_output(
        use_default_devices=False,
    )

    conversation = StreamingConversation(
        output_device=speaker_output,
        transcriber=DeepgramTranscriber(
            DeepgramTranscriberConfig.from_input_device(
                microphone_input,
                endpointing_config=PunctuationEndpointingConfig(),
                api_key=settings.deepgram_api_key,
            ),
        ),
        agent=ChatGPTAgent(
            ChatGPTAgentConfig(
                openai_api_key=settings.openai_api_key,
                initial_message=BaseMessage(text="Hello"),
                prompt_preamble="""The AI is having a pleasant conversation about life""",
            )
        ),
        synthesizer=AzureSynthesizer(
            AzureSynthesizerConfig.from_output_device(speaker_output),
            azure_speech_key=settings.azure_speech_key,
            azure_speech_region=settings.azure_speech_region,
        ),
    )
    await conversation.start()
    print("Conversation started, press Ctrl+C to end")
    signal.signal(signal.SIGINT, lambda _0, _1: asyncio.create_task(conversation.terminate()))
    while conversation.is_active():
        chunk = await microphone_input.get_audio()
        conversation.receive_audio(chunk)


if __name__ == "__main__":
    asyncio.run(main())
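
The loop at the end of the quickstart keeps reading chunks from the microphone and handing them to the conversation until the conversation is no longer active. The sketch below isolates that pattern using only the standard library, so it can be run without API keys or audio devices. `FakeMicrophone`, `FakeConversation`, and `pump` are hypothetical stand-ins invented for illustration; they are not part of Vocode and only mimic the method names used in the quickstart.

```python
import asyncio


class FakeMicrophone:
    """Hypothetical stand-in for a microphone input (not a Vocode class)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size

    async def get_audio(self):
        # Yield control to the event loop, as a real device read would.
        await asyncio.sleep(0)
        return b"\x00" * self.chunk_size


class FakeConversation:
    """Hypothetical stand-in for a conversation (not a Vocode class).

    It becomes inactive after max_chunks chunks, which simulates the
    conversation being terminated.
    """

    def __init__(self, max_chunks):
        self.max_chunks = max_chunks
        self.received = []
        self._active = False

    async def start(self):
        self._active = True

    def is_active(self):
        return self._active

    def receive_audio(self, chunk):
        self.received.append(chunk)
        if len(self.received) >= self.max_chunks:
            self._active = False


async def pump(microphone, conversation):
    """Forward chunks until the conversation is inactive; return the count."""
    await conversation.start()
    count = 0
    while conversation.is_active():
        chunk = await microphone.get_audio()
        conversation.receive_audio(chunk)
        count += 1
    return count


if __name__ == "__main__":
    forwarded = asyncio.run(pump(FakeMicrophone(), FakeConversation(max_chunks=3)))
    print(forwarded)
```

Because `is_active()` is checked once per iteration, the loop exits after the chunk that was in flight when the conversation ended. The quickstart relies on the same check to stop after Ctrl+C triggers `conversation.terminate()`.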