Language Recognition on Albayzin 2010 LRE using PLLR features

  1. Díez Sánchez, Mireia
  2. Varona Fernández, Amparo
  3. Peñagaricano Badiola, Mikel
  4. Rodríguez Fuentes, Luis Javier
  5. Bordel García, Germán
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2013

Issue: 51

Pages: 153-160

Type: Article

Abstract

Phone Log-Likelihood Ratios (PLLR) have recently been proposed as alternative features to MFCC-SDC for iVector Spoken Language Recognition (SLR). In this paper, PLLR features are first described, and then further evidence of their usefulness for SLR tasks is provided through a new set of experiments on the Albayzin 2010 LRE dataset, which features wide-band, multi-speaker TV broadcast speech in six languages: Basque, Catalan, Galician, Spanish, Portuguese and English. iVector systems built using PLLR features, computed by means of three open-source phone decoders, achieved significant relative improvements over the phonotactic and MFCC-SDC iVector systems in both clean and noisy speech conditions. Fusions of PLLR systems with the phonotactic and/or the MFCC-SDC iVector systems led to improved performance, revealing that PLLR features provide complementary information in both cases.

Bibliographic References

  • BenZeghiba, M. F., J. L. Gauvain, and L. Lamel. September 2009. Language Score Calibration using Adapted Gaussian Back-end. In Proceedings of Interspeech 2009, pages 2191–2194, Brighton, UK.
  • Biadsy, Fadi, Julia Hirschberg, and Daniel P. W. Ellis. 2011. Dialect and accent recognition using phonetic-segmentation supervectors. In Interspeech, pages 745–748.
  • Brümmer, N. and J. du Preez. 2006. Application-Independent Evaluation of Speaker Detection. Computer Speech and Language, 20(2-3):230–275.
  • Brümmer, N. and D.A. van Leeuwen. 2006. On calibration of language recognition scores. In Proceedings of Odyssey - The Speaker and Language Recognition Workshop, pages 1–8.
  • Brümmer, Niko and Edward de Villiers. 2011. The BOSARIS Toolkit: Theory, Algorithms and Code for Surviving the New DCF. In Proceedings of the NIST 2011 Speaker Recognition Workshop, Atlanta (GA), USA, December.
  • Campbell, W. M., F. Richardson, and D. A. Reynolds. 2007. Language Recognition with Word Lattices and Support Vector Machines. In Proc. IEEE ICASSP, pages 15–20.
  • Dehak, N., P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet. 2011a. Front-end factor analysis for speaker verification. IEEE Transactions on ASLP, 19(4):788–798, May.
  • Dehak, N., P. A. Torres-Carrasquillo, D. A. Reynolds, and R. Dehak. 2011b. Language Recognition via i-vectors and Dimensionality Reduction. In Interspeech, pages 857–860.
  • D'Haro, L.F., O. Glembek, O. Plchot, P. Matejka, M. Soufifar, R. Cordoba, and J. Cernocky. 2012. Phonotactic Language Recognition using i-vectors and Phoneme Posteriogram Counts. In Proceedings of Interspeech 2012, Portland, USA.
  • Diez, M., A. Varona, M. Penagarikano, L.J. Rodríguez Fuentes, and G. Bordel. 2012. On the Use of Phone Log-Likelihood Ratios as Features in Spoken Language Recognition. In Proc. IEEE Workshop on SLT, Miami, Florida, USA.
  • Fan, R.E., K.W. Chang, C.J. Hsieh, X.R. Wang, and C.J. Lin. 2008. LIBLINEAR: A Library for Large Linear Classification. J. Machine Learning Research, 9:1871–1874.
  • Martínez, D., L. Burget, L. Ferrer, and N.S. Scheffer. 2012. iVector-based Prosodic System for Language Identification. In Proceedings of ICASSP, pages 4861–4864, Japan.
  • Martínez, D., O. Plchot, L. Burget, O. Glembek, and P. Matejka. 2011. Language Recognition in iVectors Space. In Proceedings of Interspeech, pages 861–864, Firenze, Italy.
  • Penagarikano, M., A. Varona, M. Diez, L.J. Rodriguez Fuentes, and G. Bordel. 2012. Study of Different Backends in a State-of-the-Art Language Recognition System. In Interspeech 2012, Portland, Oregon, USA, 9-13 September.
  • Penagarikano, M., A. Varona, L.J. Rodriguez-Fuentes, and G. Bordel. 2011. Dimensionality Reduction for Using High-Order n-grams in SVM-Based Phonotactic Language Recognition. In Interspeech, pages 853–856.
  • Plchot, O., M. Karafiát, N. Brümmer, O. Glembek, P. Matejka, E. de Villiers, and J. Cernocký. 2012. Speaker vectors from Subspace Gaussian Mixture Model as complementary features for Language Identification. In Odyssey: The Speaker and Language Recognition Workshop, pages 330–333.
  • Rodriguez-Fuentes, L. J., M. Penagarikano, G. Bordel, and A. Varona. 2010. The Albayzin 2008 Language Recognition Evaluation. In Proceedings of Odyssey 2010: The Speaker and Language Recognition Workshop, pages 172–179, Brno, Czech Republic.
  • Rodriguez-Fuentes, L. J., M. Penagarikano, A. Varona, M. Diez, and G. Bordel. 2011. The Albayzin 2010 Language Recognition Evaluation. In Proceedings of Interspeech, pages 1529–1532, Firenze, Italy.
  • Rodriguez-Fuentes, L. J., M. Penagarikano, A. Varona, M. Diez, and G. Bordel. 2012. KALAKA-2: a TV broadcast speech database for the recognition of Iberian languages in clean and noisy environments. In Proceedings of the LREC, Istanbul, Turkey.
  • RTTH. 2006. Spanish Network on Speech Technology. Web (in Spanish): http://lorien.die.upm.es/~lapiz/rtth/.
  • Schwarz, P. 2008. Phoneme recognition based on long temporal context. Ph.D. thesis, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic.
  • Singer, E., P. A. Torres-Carrasquillo, T. P. Gleason, W. M. Campbell, and D. A. Reynolds. 2003. Acoustic, Phonetic and Discriminative Approaches to Automatic Language Identification. In Proceedings of Eurospeech (Interspeech), pages 1345–1348, Geneva, Switzerland.
  • Soufifar, M., S. Cumani, L. Burget, and J. Cernocky. 2012. Discriminative Classifiers for Phonotactic Language Recognition with iVectors. In Proc. IEEE ICASSP, pages 4853–4856.
  • Stolcke, A. 2002. SRILM - An extensible language modeling toolkit. In Interspeech, pages 257–286.
  • Varona, Amparo, Mikel Penagarikano, Luis Javier Rodriguez Fuentes, Mireia Diez, and Germán Bordel. 2010. Verification of the four Spanish official languages on TV show recordings. In XXV Congreso de la Sociedad Española para el Procesamiento de Lenguaje Natural (SEPLN), Valencia, Spain, 8-10 September.
  • Young, S., G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland. 2006. The HTK Book (for HTK Version 3.4). Entropic, Ltd., Cambridge, UK.