Sistema de Conversión Texto a Voz de Código Abierto Para Lenguas Ibéricas

  1. Alonso Burguera, Agustín
  2. Sainz Moncalvillo, Iñaki
  3. Erro Eslava, Daniel
  4. Navas Cordón, Eva
  5. Hernáez Rioja, Inmaculada
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2013

Issue: 51

Pages: 169-175

Type: Article


Abstract

This paper presents a text-to-speech system based on statistical synthesis which, for the first time, can generate speech in any of the four official languages of Spain as well as in English. Taking the AhoTTS system already developed for Spanish and Basque as a starting point, we have added support for Catalan, Galician and English by reusing code from available open-source modules. The resulting system, named multilingual AhoTTS, has also been released as open source and is already being used in real applications.

Bibliographic references

  • Bonafonte, A., L. Aguilar, I. Esquerra, S. Oller, A. Moreno, 2009, "Recent Work on the FESTCAT Database for Speech Synthesis", Proc. SLTECH, pp. 131-132.
  • Erro, D., I. Sainz, I. Luengo, I. Odriozola, J. Sánchez, I. Saratxaga, E. Navas, I. Hernáez, 2010, "HMM-based Speech Synthesis in Basque Language using HTS", Proc. FALA 2010 (VI Jornadas en Tecnología del Habla & II Iberian SLTech), pp. 67-70, (Vigo).
  • Erro, D., I. Sainz, E. Navas, I. Hernáez, 2011, "Improved HNM-based Vocoder for Statistical Synthesizers", Proc. Interspeech, pp. 1809-1812, (Florence).
  • Erro, D., T.C. Zorila, Y. Stylianou, E. Navas, I. Hernáez, 2013, "Statistical Synthesizer with Embedded Prosodic and Spectral Modifications to Generate Highly Intelligible Speech in Noise", Proc. Interspeech, (Lyon).
  • Hernáez, I., 1995, "Conversión de texto a voz para el euskera basada en un sintetizador de formantes", PhD thesis, UPV/EHU.
  • Hunt, A., A. Black, 1996, "Unit selection in a concatenative speech synthesis system using a large speech database", Proc. ICASSP, vol. 1, pp. 373-376.
  • Kominek, J., A. Black, 2004, "The CMU Arctic speech databases", Proc. 5th ISCA Speech Synthesis Workshop, pp. 223-224, (Pittsburgh, PA).
  • Ling, Z.H., L. Qin, H. Lu, Y. Gao, L.R. Dai, R.H. Wang, Y. Jiang, Z.W. Zhao, J.H. Yang, Y.J. Chen, G.P. Hu, 2007, "The USTC and iFlytek speech synthesis systems for Blizzard Challenge 2007", Proc. Blizzard Challenge Workshop, Aug.
  • Navas, E., I. Hernáez, J. Sánchez, 2002, "Basque Intonation Modelling For Text To Speech Conversion", Proc. 7th International Conference on Spoken Language Processing (ICSLP), pp. 2409-2412, (Denver).
  • Navas, E., I. Hernáez, J. Sánchez, 2002, "Modelo de duración para conversión de texto a voz en euskera", Procesamiento del Lenguaje Natural, vol. 29, pp. 147-152.
  • Navas, E., 2003, "Modelado prosódico del euskera batua para conversión de texto a habla", PhD thesis, UPV/EHU.
  • Pérez, J., A. Bonafonte, H.U. Hain, E. Keller, S. Breuer, J. Tian, 2006, "ECESS Inter-Module Interface Specification for Speech Synthesis", Proceedings of LREC Conference.
  • Pitz, M., H. Ney, 2005, "Vocal tract normalization equals linear transformation in cepstral space", IEEE Trans. Speech and Audio Process., vol. 13(5), pp. 930-944.
  • Rodríguez, E., C. García, F. Méndez, M. González, C. Magariños, 2012, "Cotovía: an Open Source Text-to-Speech System for Galician and Spanish", Proc. Iberspeech 2012 (VII Jornadas en Tecnología del Habla & III Iberian SLTech), pp. 308-315, (Madrid).
  • Rodríguez, M.A., J.G. Escalada, D. Torre, 1998, "Conversor multilingüe para castellano, catalán, gallego y euskera", Procesamiento del Lenguaje Natural, no. 23, pp. 19-23.
  • Sainz, I., D. Erro, E. Navas, J. Adell, A. Bonafonte, 2011, "BUCEADOR Hybrid TTS for Blizzard Challenge 2011", Proc. Blizzard Challenge Workshop, (Torino).
  • Sainz, I., D. Erro, E. Navas, I. Hernáez, J. Sánchez, I. Saratxaga, I. Odriozola, I. Luengo, 2010, "Aholab Speech Synthesizers for Albayzin2010", Proc. FALA 2010 (VI Jornadas en Tecnología del Habla & II Iberian SLTech), pp. 343-347, (Vigo).
  • Sainz, I., D. Erro, E. Navas, I. Hernáez, 2011, "A Hybrid TTS Approach for Prosody and Acoustic Modules", Proc. Interspeech, pp. 333-336.
  • Sainz, I., D. Erro, E. Navas, I. Hernáez, J. Sánchez, I. Saratxaga, 2012a, "Aholab Speech Synthesizer for Albayzin 2012 Speech Synthesis Evaluation", Proc. Iberspeech 2012 (VII Jornadas en Tecnología del Habla & III Iberian SLTech), pp. 645-652, (Madrid).
  • Sainz, I., D. Erro, E. Navas, I. Hernáez, J. Sánchez, I. Saratxaga, I. Odriozola, 2012b, "Versatile Speech Databases for High Quality Synthesis for Basque", Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC'12), pp. 3308-3312.
  • Taylor, P., A. Black, R. Caley, 1998, "The architecture of the Festival Speech Synthesis System", Proc. 3rd ESCA Workshop on Speech Synthesis, pp. 147-151, (Jenolan Caves, Australia).
  • Zen, H., T. Nose, J. Yamagishi, S. Sako, T. Masuko, A.W. Black, K. Tokuda, 2007, "The HMM-based speech synthesis system (HTS) version 2.0", Proc. ISCA Workshop on Speech Synthesis (SSW6), pp. 294-299.
  • Zen, H., K. Tokuda, A.W. Black, 2009, "Statistical parametric speech synthesis", Speech Communication, vol. 51(11), pp. 1039-1064.