SoundStorm

SoundStorm: Efficient Parallel Audio Generation (work in progress)

An unofficial PyTorch implementation of <a href="https://google-research.github.io/seanet/soundstorm/examples/">SoundStorm</a>, the parallel audio generation model from Google Research.

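For now, we are releasing the first version of our code. It implements SoundStorm directly with mask-based discrete diffusion, which follows the same process as Google's paper. For model details, please refer to our paper, InstructTTS: https://arxiv.org/pdf/2301.13662.pdf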
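As a rough illustration of the corruption step in mask-based discrete diffusion, the sketch below replaces a random subset of token positions with a mask id. It is a minimal pure-Python sketch, not the code in this repository: `MASK_ID`, `mask_tokens`, and the choice of a fixed mask ratio are assumptions made for the example.

```python
import random

# Hypothetical id reserved for the [MASK] token; a real model would use an
# extra index in its vocabulary instead.
MASK_ID = -1


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random fraction of positions with MASK_ID.

    Returns the corrupted sequence and the sorted list of masked positions.
    The model would then be trained to predict the original tokens at
    those positions.
    """
    n = len(tokens)
    num_masked = int(round(mask_ratio * n))
    positions = set(rng.sample(range(n), num_masked))
    corrupted = [MASK_ID if i in positions else t for i, t in enumerate(tokens)]
    return corrupted, sorted(positions)


if __name__ == "__main__":
    rng = random.Random(0)
    acoustic_tokens = [17, 4, 250, 3, 88, 91, 12, 7, 301, 45]
    corrupted, positions = mask_tokens(acoustic_tokens, 0.5, rng)
    print(corrupted)
    print(positions)
```

During training the mask ratio would typically be sampled per example rather than fixed, so the model sees every corruption level.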

We will soon release a second version based on MaskGIT, in line with SoundStorm.
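For readers unfamiliar with MaskGIT-style decoding, the sketch below computes how many tokens remain masked after each parallel decoding step under a cosine schedule. It is an assumption-laden illustration, not the planned second version: the function name `cosine_mask_schedule` and the use of `floor` for rounding are choices made for this example.

```python
import math


def cosine_mask_schedule(num_tokens, num_steps):
    """Number of tokens still masked after each decoding step.

    Uses the cosine schedule gamma(r) = cos(pi/2 * r), with r = step / num_steps.
    Rounding down (an assumption of this sketch) guarantees that nothing
    is left masked after the final step.
    """
    counts = []
    for step in range(1, num_steps + 1):
        ratio = math.cos(math.pi / 2 * step / num_steps)
        counts.append(math.floor(ratio * num_tokens))
    return counts


if __name__ == "__main__":
    # Starting from 100 fully masked tokens, decode in 4 parallel steps.
    print(cosine_mask_schedule(100, 4))
```

At each step the decoder predicts all masked positions at once, keeps the most confident predictions, and re-masks the rest so that the count matches the schedule.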

Overview

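Following the paper, we use HuBERT to extract semantic tokens, then predict all acoustic tokens in parallel, conditioned on those semantic tokens. Unlike SoundStorm, which uses a sum operation to combine multiple codebooks, we use a shallow U-Net to combine the different codebooks. For the audio codec, we use the open-source AcademiCodec: https://github.com/yangdongchao/AcademiCodec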
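To make the contrast concrete, the sketch below shows the summation approach that the text attributes to SoundStorm: each frame carries one token per codebook, and the per-codebook embeddings are added into a single vector. It does not show the shallow U-Net used in this repository. The function name, the toy embedding tables, and the list-based representation are all assumptions for illustration.

```python
def sum_codebook_embeddings(tokens, tables):
    """Combine multi-codebook tokens by summing their embeddings.

    tokens: list of frames, each a list of Q token ids (one per codebook).
    tables: list of Q embedding tables; tables[q][tok] is a vector (list of floats).
    Returns one summed vector per frame.
    """
    dim = len(tables[0][0])
    frames = []
    for frame in tokens:
        vec = [0.0] * dim
        for q, tok in enumerate(frame):
            emb = tables[q][tok]
            vec = [v + e for v, e in zip(vec, emb)]
        frames.append(vec)
    return frames


if __name__ == "__main__":
    # Two toy codebooks, each with 2 entries of dimension 2 (made-up values).
    tables = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[10.0, 10.0], [20.0, 20.0]],
    ]
    # Two frames, each with one token id per codebook.
    tokens = [[0, 1], [1, 0]]
    print(sum_codebook_embeddings(tokens, tables))
```

Summation treats every codebook symmetrically and loses no sequence length; a learned combiner such as a U-Net can instead model interactions between codebooks.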

Preparing the dataset

See the data_sample folder for how to prepare the dataset.

Training

First, prepare your data, then run:
bash start/start.sh

Inference

First, modify evaluation/generate_samples_batch.py to match your model, then run:
python generate_samples_batch.py

References

@article{yang2023instructtts,
  title={InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt},
  author={Yang, Dongchao and Liu, Songxiang and Huang, Rongjie and Lei, Guangzhi and Weng, Chao and Meng, Helen and Yu, Dong},
  journal={arXiv preprint arXiv:2301.13662},
  year={2023}
}

@article{google_soundstorm,
  title={SoundStorm: Efficient Parallel Audio Generation},
  author={Borsos, Zal{\'a}n and Sharifi, Matt and Vincent, Damien and Kharitonov, Eugene and Zeghidour, Neil and Tagliasacchi, Marco},
  journal={arXiv preprint arXiv:2305},
  year={2023}
}

@article{yang2023hifi,
  title={HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec},
  author={Yang, Dongchao and Liu, Songxiang and Huang, Rongjie and Tian, Jinchuan and Weng, Chao and Zou, Yuexian},
  journal={arXiv preprint arXiv:2305.02765},
  year={2023}
}