Awesome Simultaneous Translation
This repository collects the toolkits, common datasets, and paper lists related to research on Simultaneous Translation. It is updated continuously.
It would be a great honor if this repository helps or serves as a reference for your research. If you have any suggestions, feel free to contact me: Shaolei Zhang, zhangshaolei20z@ict.ac.cn.
Toolkits
- Fairseq: a sequence modeling toolkit covering machine translation, speech translation, and simultaneous translation (both text-to-text and speech-to-text).
- SimulEval: a general evaluation framework for simultaneous translation on text and speech.
Datasets
- Conventional text-to-text translation datasets:
- Conventional speech-to-text translation datasets:
- MuST-C: multilingual speech-to-text translation corpus with 8 language pairs. [Link]
- Conventional speech-to-speech translation datasets:
- CVSS: massively multilingual-to-English speech-to-speech translation corpus. [Link]
- Simultaneous interpretation datasets:
Tutorials & Talks
PACLIC 2016: The Challenge of Simultaneous Speech Translation. Anoop Sarkar. [Link]
EMNLP 2020: Simultaneous Translation. Liang Huang, Colin Cherry, Mingbo Ma, Naveen Arivazhagan, and Zhongjun He. [Link]
AMTA 2020: Simultaneous Speech Translation in Google Translate. Jeff Pitman. [Link]
Paper List
This is a paper list on Simultaneous Translation, organized by publication year.
We also maintain a paper list organized by category; refer to Here.
2002 | 2006 | 2007 | 2009 | 2010 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024
2002
- Translation Unit Concerning Timing of Simultaneous Translation. LREC 2002. [PDF]
2006
- Simultaneous English-Japanese Spoken Language Translation Based on Incremental Dependency Parsing and Transfer. ACL 2006. [PDF]
2007
- Simultaneous translation of lectures and speeches. Mach Translat 2007. [PDF]
2009
- End-to-End Evaluation in Simultaneous Translation. EACL 2009. [PDF]
2010
- Stream-based Translation Models for Statistical Machine Translation. NAACL 2010. [PDF]
- Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation. LREC 2010. [PDF]
2012
- Real-time Incremental Speech-to-Speech Translation of Dialogs. NAACL 2012. [PDF]
2013
- Incremental Segmentation and Decoding Strategies for Simultaneous Translation. IJCNLP 2013. [PDF]
2014
- Optimizing Segmentation Strategies for Simultaneous Speech Translation. ACL 2014. [PDF]
- Collection of a Simultaneous Translation Corpus for Comparative Analysis. LREC 2014. [PDF]
- Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. EMNLP 2014. [PDF]
- Towards Simultaneous Interpreting: the Timing of Incremental Machine Translation and Speech Synthesis. IWSLT 2014. [PDF]
- Segmentation Strategies for Streaming Speech Translation. NAACL 2014. [PDF]
2015
- Automated Simultaneous Interpretation: Hints of a Cognitive Framework for Machine Translation. HyTra 2015. [PDF]
- Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents. ACL 2015. [PDF]
- Syntax-based Rewriting for Simultaneous Machine Translation. EMNLP 2015. [PDF]
2016
- An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation. WAT 2016. [PDF]
- Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation. NAACL 2016. [PDF] [Code]
- Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons. LREC 2016. [PDF]
- A Prototype Automatic Simultaneous Interpretation System. COLING 2016. [PDF]
- Simultaneous Machine Translation using Deep Reinforcement Learning. ICML 2016. [PDF]
- Can neural Machine Translation do Simultaneous Translation? Arxiv 2016. [PDF]
2017
- Online and Linear-Time Attention by Enforcing Monotonic Alignments. ICML 2017. [PDF] [Code]
- Learning to Translate in Real-time with Neural Machine Translation. EACL 2017. [PDF] [Code]
2018
- Simultaneous Translation using Optimized Segmentation. AMTA 2018. [PDF]
- Automatic Estimation of Simultaneous Interpreter Performance. ACL 2018. [PDF] [Code]
- Incremental Decoding and Training Methods for Simultaneous Translation in Neural Machine Translation. NAACL 2018. [PDF] [Code]
- Statistical Analysis of Missing Translation in Simultaneous Interpretation Using A Large-scale Bilingual Speech Corpus. LREC 2018. [PDF]
- Prediction Improves Simultaneous Neural Machine Translation. EMNLP 2018. [PDF] [Code]
- KIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning. COLING 2018. [PDF]
2019
- Monotonic Infinite Lookback Attention for Simultaneous Machine Translation. ACL 2019. [PDF]
- STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework. ACL 2019. [PDF]
- Simultaneous Translation with Flexible Policy via Restricted Imitation Learning. ACL 2019. [PDF]
- Lost in Interpretation: Predicting Untranslated Terminology in Simultaneous Interpretation. NAACL 2019. [PDF] [Code]
- Simpler and Faster Learning of Adaptive Policies for Simultaneous Translation. EMNLP 2019. [PDF]
- Speculative Beam Search for Simultaneous Translation. EMNLP 2019. [PDF]
- Thinking Slow about Latency Evaluation for Simultaneous Machine Translation. Arxiv 2019. [PDF]
- DuTongChuan: Context-aware Translation Model for Simultaneous Interpreting. Arxiv 2019. [PDF]
- Simultaneous Neural Machine Translation using Connectionist Temporal Classification. Arxiv 2019. [PDF]
2020
- Towards Multimodal Simultaneous Neural Machine Translation. WMT 2020. [PDF] [Code]
- Opportunistic Decoding with Timely Correction for Simultaneous Translation. ACL 2020. [PDF]
- Simultaneous Translation Policies: From Fixed to Adaptive. ACL 2020. [PDF]
- SimulSpeech: End-to-End Simultaneous Speech to Text Translation. ACL 2020. [PDF]
- Learning Adaptive Segmentation Policy for Simultaneous Translation. EMNLP 2020. [PDF]
- Simultaneous Machine Translation with Visual Context. EMNLP 2020. [PDF] [Code]
- Direct Segmentation Models for Streaming Speech Translation. EMNLP 2020. [PDF] [Code]
- SIMULEVAL: An Evaluation Toolkit for Simultaneous Translation. EMNLP 2020. [PDF] [Code]
- Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework. EMNLP 2020 findings. [PDF]
- Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training. EMNLP 2020 findings. [PDF]
- A General Framework for Adaptation of Neural Machine Translation to Simultaneous Translation. AACL 2020. [PDF]
- SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation. AACL 2020. [PDF] [Code]
- Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation. ICASSP 2020. [PDF]
- Efficient Wait-k Models for Simultaneous Machine Translation. InterSpeech 2020. [PDF] [Code]
- Presenting Simultaneous Translation in Limited Space. Arxiv 2020. [PDF]
- Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS. Arxiv 2020. [PDF]
- Low Latency ASR for Simultaneous Speech Translation. Arxiv 2020. [PDF]
2021
- Monotonic Simultaneous Translation with Chunk-wise Reordering and Refinement. WMT 2021. [PDF]
- Simultaneous Neural Machine Translation with Constituent Label Prediction. WMT 2021. [PDF]
- Future-Guided Incremental Transformer for Simultaneous Translation. AAAI 2021. [PDF]
- Studying The Impact Of Document-level Context On Simultaneous Neural Machine Translation. Machine Translation 2021. [PDF]
- Beyond Sentence-Level End-to-End Speech Translation: Context Helps. ACL 2021. [PDF] [Code]
- RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer. ACL 2021 findings. [PDF]
- Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR. ACL 2021 findings.