MINTZAI: Sistemas de Aprendizaje Profundo E2E para Traducción Automática del Habla
- Thierry Etchegoyhen
- Haritz Arzelus
- Harritxu Gete
- Aitor Álvarez
- Inma Hernáez
- Eva Navas
- Ander González-Docasal
- Jaime Osácar
- Edson Benites
- Igor Ellakuria
- Eusebi Calonge
- Maite Martin
ISSN: 1135-5948
Year of publication: 2020
Issue: 65
Pages: 97-100
Type: Article
Other publications: Procesamiento del lenguaje natural
Abstract
Speech translation consists of translating speech in one language into text or speech in another language. Such systems have numerous applications, particularly in multilingual communities such as the European Union. The standard approach in the field chains separate components for speech recognition, machine translation and speech synthesis. With the advances made possible by artificial neural networks and Deep Learning, the training of end-to-end speech translation systems has attracted intense research and development activity in recent years. In this paper, we review the state of the art and describe the MINTZAI project, which is being carried out in this field.
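The cascade vs. end-to-end contrast described in the abstract can be made concrete with a short sketch. The Python pseudocode below uses hypothetical model objects and method names (`transcribe`, `translate`, `synthesize`; none of them come from the paper or from MINTZAI itself) to show how the standard pipeline chains three separately trained components, while an end-to-end system maps source speech to target output with a single model.

```python
# Minimal sketch of the two architectures discussed in the abstract.
# All model objects and method names are hypothetical placeholders,
# not the MINTZAI implementation.

def cascade_speech_translation(audio, asr, mt, tts):
    """Standard approach: chain separately trained components.

    Errors made by the recognizer propagate into translation and
    synthesis, since each stage only sees the previous stage's output.
    """
    source_text = asr.transcribe(audio)       # speech recognition (ASR)
    target_text = mt.translate(source_text)   # machine translation (MT)
    return tts.synthesize(target_text)        # speech synthesis (TTS)


def end_to_end_speech_translation(audio, st_model):
    """End-to-end approach: a single sequence-to-sequence model maps
    source-language speech directly to target-language text (or speech),
    trained on paired speech/translation data.
    """
    return st_model.translate(audio)
```

The appeal of the end-to-end formulation surveyed in the paper is precisely that it removes the intermediate transcription step where cascade errors accumulate, at the cost of requiring speech paired directly with translations as training data.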
Bibliographic references
- Bahdanau, D., K. Cho, and Y. Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proc. of ICLR.
- Bérard, A., O. Pietquin, C. Servan, and L. Besacier. 2016. Listen and translate: A proof of concept for end-to-end speech-to-text translation. In Proc. of NIPS.
- Casacuberta, F., H. Ney, F. J. Och, E. Vidal, J. M. Vilar, S. Barrachina, I. García-Varea, D. Llorens, C. Martínez, S. Molau, F. Nevado, M. Pastor, D. Picó, A. Sanchis, and C. Tillmann. 2004. Some approaches to statistical and finite-state speech-to-speech translation. Comput. Speech Lang., 18(1):25–47.
- Duong, L., A. Anastasopoulos, D. Chiang, S. Bird, and T. Cohn. 2016. An attentional model for speech translation without transcription. In Proc. of NAACL, pages 949–959.
- Graves, A., A.-r. Mohamed, and G. Hinton. 2013. Speech recognition with deep recurrent neural networks. In Proc. of ICASSP, pages 6645–6649.
- Jia, Y., R. J. Weiss, F. Biadsy, W. Macherey, M. Johnson, Z. Chen, and Y. Wu. 2019. Direct speech-to-speech translation with a sequence-to-sequence model. arXiv:1904.06037.
- Kumar, G., G. Blackwood, J. Trmal, D. Povey, and S. Khudanpur. 2015. A coarse-grained model for optimal coupling of ASR and SMT systems for speech translation. In Proc. of EMNLP, pages 1902–1907.
- Matusov, E., S. Kanthak, and H. Ney. 2005. On the integration of speech recognition and statistical machine translation. In Proc. of Eurospeech 2005, pages 467–474.
- Ney, H. 1999. Speech translation: Coupling of recognition and translation. In Proc. of ICASSP 1999, pages 517–520.
- Niehues, J., R. Cattoni, S. Stüker, M. Negri, M. Turchi, E. Salesky, R. Sanabria, L. Barrault, L. Specia, and M. Federico. 2019. The IWSLT 2019 Evaluation Campaign. In Proc. of IWSLT.
- Vidal, E. 1997. Finite-state speech-to-speech translation. In Proc. of ICASSP, pages 111–114.
- Wang, Y., R. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, Q. Le, Y. Agiomyrgiannakis, R. Clark, and R. A. Saurous. 2017. Tacotron: Towards end-to-end speech synthesis. arXiv:1703.10135.
- Weiss, R. J., J. Chorowski, N. Jaitly, Y. Wu, and Z. Chen. 2017. Sequence-to-sequence models can directly translate foreign speech. arXiv:1703.08581.