awesome-speech-recognition-speech-synthesis-papers
Paper List
- Text-to-Audio
- Automatic Speech Recognition (ASR)
- Speaker Verification
- Voice Conversion (VC)
- Speech Synthesis (TTS)
- Language Modelling
- Confidence Estimates
- Music Modelling
- Interesting papers
Text-to-Audio
- AudioLM: a Language Modeling Approach to Audio Generation(2022), Zalán Borsos et al. [pdf]
- AudioLDM: Text-to-Audio Generation with Latent Diffusion Models(2023), Haohe Liu et al. [pdf]
- MusicLM: Generating Music From Text(2023), Andrea Agostinelli et al. [pdf]
- Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion(2023), Flavio Schneider et al. [pdf]
- Noise2Music: Text-conditioned Music Generation with Diffusion Models(2023), Qingqing Huang et al. [pdf]
Automatic Speech Recognition
- An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition(1983), S. E. Levinson et al. [pdf]
- A Maximum Likelihood Approach to Continuous Speech Recognition(1983), Lalit R. Bahl et al. [pdf]
- Heterogeneous Acoustic Measurements and Multiple Classifiers for Speech Recognition(1998), Andrew K. Halberstadt. [pdf]
- Maximum Mutual Information Estimation of Hidden Markov Model Parameters for Speech Recognition(1986), Lalit R. Bahl et al. [pdf]
- A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition(1989), Lawrence R. Rabiner. [pdf]
- Phoneme recognition using time-delay neural networks(1989), Alexander H. Waibel et al. [pdf]
- Speaker-independent phone recognition using hidden Markov models(1989), Kai-Fu Lee et al. [pdf]
- Hidden Markov Models for Speech Recognition(1991), B. H. Juang et al. [pdf]
- Review of TDNN (Time Delay Neural Network) Architectures for Speech Recognition(1991), Masahide Sugiyama et al. [pdf]
- Connectionist Speech Recognition: A Hybrid Approach(1994), Herve Bourlard et al. [pdf]
- A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)(1997), J.G. Fiscus. [pdf]
- Speech recognition with weighted finite-state transducers(2001), M. Mohri et al. [pdf]
- Framewise phoneme classification with bidirectional LSTM and other neural network architectures(2005), Alex Graves et al. [pdf]
- Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks(2006), Alex Graves et al. [pdf]
- The Kaldi speech recognition toolkit(2011), Daniel Povey et al. [pdf]
- Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition(2012), Ossama Abdel-Hamid et al. [pdf]
- Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition(2012), George E. Dahl et al. [pdf]
- Deep Neural Networks for Acoustic Modeling in Speech Recognition(2012), Geoffrey Hinton et al. [pdf]
- Sequence Transduction with Recurrent Neural Networks(2012), Alex Graves et al. [pdf]
- Deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]
- Improving deep neural networks for LVCSR using rectified linear units and dropout(2013), George E. Dahl et al. [pdf]
- Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training(2013), Yajie Miao et al. [pdf]
- Improvements to deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]
- Machine Learning Paradigms for Speech Recognition: An Overview(2013), Li Deng et al. [pdf]
- Recent advances in deep learning for speech research at Microsoft(2013), Li Deng et al. [pdf]
- Speech recognition with deep recurrent neural networks(2013), Alex Graves et al. [pdf]
- Convolutional deep maxout networks for phone recognition(2014), László Tóth et al. [pdf]
- Convolutional Neural Networks for Speech Recognition(2014), Ossama Abdel-Hamid et al. [pdf]
- Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition(2014), László Tóth. [pdf]
- Deep Speech: Scaling up end-to-end speech recognition(2014), Awni Y. Hannun et al. [pdf]
- End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results(2014), Jan Chorowski et al. [pdf]
- First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs(2014), Andrew L. Maas et al. [pdf]
- Long short-term memory recurrent neural network architectures for large scale acoustic modeling(2014), Hasim Sak et al. [pdf]
- Robust CNN-based speech recognition with Gabor filter kernels(2014), Shuo-Yiin Chang et al. [pdf]
- Stochastic pooling maxout networks for low-resource speech recognition(2014), Meng Cai et al. [pdf]
- Towards End-to-End Speech Recognition with Recurrent Neural Networks(2014), Alex Graves et al. [pdf]
- A neural transducer(2015), N. Jaitly et al. [pdf]
- Attention-Based Models for Speech Recognition(2015), Jan Chorowski et al. [pdf]
- Analysis of CNN-based speech recognition system using raw speech as input(2015), Dimitri Palaz et al. [pdf]
- Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks(2015), Tara N. Sainath et al. [pdf]
- Deep convolutional neural networks for acoustic modeling in low resource languages(2015), William Chan et al. [pdf]
- Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition(2015), Chao Weng et al. [pdf]
- EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding(2015), Y. Miao et al. [pdf]
- Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition(2015), Hasim Sak et al. [pdf]
- Lexicon-Free Conversational Speech Recognition with Neural Networks(2015), Andrew L. Maas et al. [pdf]
- Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification(2015), Kyuyeon Hwang et al. [pdf]
- Advances in All-Neural Speech Recognition(2016), Geoffrey Zweig et al. [pdf]
- Advances in Very Deep Convolutional Neural Networks for LVCSR(2016), Tom Sercu et al. [pdf]
- End-to-end attention-based large vocabulary speech recognition(2016), Dzmitry Bahdanau et al. [pdf]
- Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention(2016), Dong Yu et al. [pdf]
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin(2016), Dario Amodei et al. [pdf]
- End-to-end attention-based distant speech recognition with Highway LSTM(2016), Hassan Taherian. [pdf]
- Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning(2016), Suyoun Kim et al. [pdf]
- Listen, attend and spell: A neural network for large vocabulary conversational speech recognition(2016), William Chan et al. [pdf]
- Latent Sequence Decompositions(2016), William Chan et al. [pdf]
- Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks(2016), Tara N. Sainath et al. [pdf]
- Recurrent Models for Auditory Attention in Multi-Microphone Distance Speech Recognition(2016), Suyoun Kim et al. [pdf]
- Segmental Recurrent Neural Networks for End-to-End Speech Recognition(2016), Liang Lu et al.