Language Recognition on Albayzin 2010 LRE using PLLR features

Díez Sánchez, Mireia; Varona Fernández, Amparo; Peñagaricano Badiola, Mikel; Rodríguez Fuentes, Luis Javier; Bordel García, Germán

Journal:

Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2013

Issue: 51

Pages: 153-160

Type: Article

Abstract

The so-called Phone Log-Likelihood Ratios (PLLR) have been introduced as an alternative to MFCC-SDC features for iVector-based Spoken Language Recognition (SLR) systems. In this paper, after a brief description of these features, new evidence of their usefulness for SLR tasks is provided through a new set of experiments on the Albayzin 2010 LRE database, which contains wide-band multi-speaker speech in six languages: Basque, Catalan, Galician, Spanish, Portuguese and English. iVector systems trained on PLLRs yield significant relative improvements over phonotactic systems and over iVector systems trained on MFCC-SDC features, under both clean and noisy speech conditions. Fusing the PLLR systems with the phonotactic and/or MFCC-SDC-based systems provides further performance improvements, which reveals that PLLR features contribute complementary information in both cases.

References

BenZeghiba, M. F., J. L. Gauvain, and L. Lamel. September 2009. Language Score Calibration using Adapted Gaussian Back-end. In Proceedings of Interspeech 2009, pages 2191–2194, Brighton, UK.
Biadsy, Fadi, Julia Hirschberg, and Daniel P. W. Ellis. 2011. Dialect and accent recognition using phonetic-segmentation supervectors. In Interspeech, pages 745–748.
Brümmer, N. and J. du Preez. 2006. Application-Independent Evaluation of Speaker Detection. Computer, Speech and Language, 20(2-3):230–275.
Brümmer, N. and D.A. van Leeuwen. 2006. On calibration of language recognition scores. In Proceedings of Odyssey - The Speaker and Language Recognition Workshop, pages 1–8.
Brümmer, Niko and Edward de Villiers. 2011. The BOSARIS Toolkit: Theory, Algorithms and Code for Surviving the New DCF. In Proceedings of the NIST 2011 Speaker Recognition Workshop, Atlanta (GA), USA, December.
Campbell, W. M., F. Richardson, and D. A. Reynolds. 2007. Language Recognition with Word Lattices and Support Vector Machines. In Proc. IEEE ICASSP, pages 15–20.
Dehak, N., P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet. 2011a. Front-end factor analysis for speaker verification. IEEE Transactions on ASLP, 19(4):788-798, May.
Dehak, N., P. A. Torres-Carrasquillo, D. A. Reynolds, and R. Dehak. 2011b. Language Recognition via i-vectors and Dimensionality Reduction. In Interspeech, pages 857–860.
D'Haro, L.F., O. Glembek, O. Plchot, P. Matejka, M. Soufifar, R. Cordoba, and J. Cernocky. 2012. Phonotactic Language Recognition using i-vectors and Phoneme Posteriogram Counts. In Proceedings of Interspeech 2012, Portland, USA.
Diez, M., A. Varona, M. Penagarikano, L.J. Rodríguez Fuentes, and G. Bordel. 2012. On the Use of Phone Log-Likelihood Ratios as Features in Spoken Language Recognition. In Proc. IEEE Workshop on SLT, Miami, Florida, USA.
Fan, R.E., K.W. Chang, C.J. Hsieh, X.R. Wang, and C.J. Lin. 2008. LIBLINEAR: A Library for Large Linear Classification. J. Machine Learning Research, 9:1871–1874.
Martínez, D., L. Burget, L. Ferrer, and N.S. Scheffer. 2012. iVector-based Prosodic System for Language Identification. In Proceedings of ICASSP, pages 4861–4864, Japan.
Martínez, D., O. Plchot, L. Burget, O. Glembek, and P. Matejka. 2011. Language Recognition in iVectors Space. In Proceedings of Interspeech, pages 861–864, Firenze, Italy.
Penagarikano, M., A. Varona, M. Diez, L.J. Rodriguez Fuentes, and G. Bordel. 2012. Study of Different Backends in a State-of-the-Art Language Recognition System. In Interspeech 2012, Portland, Oregon, USA, 9-13 September.
Penagarikano, M., A. Varona, L.J. Rodriguez-Fuentes, and G. Bordel. 2011. Dimensionality Reduction for Using High-Order n-grams in SVM-Based Phonotactic Language Recognition. In Interspeech, pages 853–856.
Plchot, O., M. Karafiát, N. Brümmer, O. Glembek, P. Matejka, E. de Villiers, and J. Cernocký. 2012. Speaker vectors from Subspace Gaussian Mixture Model as complementary features for Language Identification. In Odyssey: The Speaker and Language Recognition Workshop, pages 330–333.
Rodriguez-Fuentes, L. J., M. Penagarikano, G. Bordel, and A. Varona. 2010. The Albayzin 2008 Language Recognition Evaluation. In Proceedings of Odyssey 2010: The Speaker and Language Recognition Workshop, pages 172–179, Brno, Czech Republic.
Rodriguez-Fuentes, L. J., M. Penagarikano, A. Varona, M. Diez, and G. Bordel. 2011. The Albayzin 2010 Language Recognition Evaluation. In Proceedings of Interspeech, pages 1529–1532, Firenze, Italy.
Rodriguez-Fuentes, L. J., M. Penagarikano, A. Varona, M. Diez, and G. Bordel. 2012. KALAKA-2: a TV broadcast speech database for the recognition of Iberian languages in clean and noisy environments. In Proceedings of the LREC, Istanbul, Turkey.
RTTH. 2006. Spanish Network on Speech Technology. Web (in Spanish): http://lorien.die.upm.es/~lapiz/rtth/.
Schwarz, P. 2008. Phoneme recognition based on long temporal context. Ph.D. thesis, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic.
Singer, E., P. A. Torres-Carrasquillo, T. P. Gleason, W. M. Campbell, and D. A. Reynolds. 2003. Acoustic, Phonetic and Discriminative Approaches to Automatic Language Identification. In Proceedings of Eurospeech (Interspeech), pages 1345–1348, Geneva, Switzerland.
Soufifar, M., S. Cumani, L. Burget, and J. Cernocky. 2012. Discriminative Classifiers for Phonotactic Language Recognition with iVectors. In Proc. IEEE ICASSP, pages 4853–4856.
Stolcke, A. 2002. SRILM - An extensible language modeling toolkit. In Interspeech, pages 257–286.
Varona, Amparo, Mikel Penagarikano, Luis Javier Rodriguez Fuentes, Mireia Diez, and Germán Bordel. 2010. Verification of the four Spanish official languages on TV show recordings. In XXV Congreso de la Sociedad Española para el Procesamiento de Lenguaje Natural (SEPLN), Valencia, Spain, 8-10 September.
Young, S., G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland. 2006. The HTK Book (for HTK Version 3.4). Entropic, Ltd., Cambridge, UK.

Data source: Dialnet