A Spoken Document Retrieval System for TV Broadcast News in Spanish and Basque

  1. Varona Fernández, Amparo
  2. Nieto, Silvia
  3. Rodríguez Fuentes, Luis Javier
  4. Peñagaricano Badiola, Mikel
  5. Bordel García, Germán
  6. Díez Sánchez, Mireia
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2011

Issue: 47

Pages: 75-83

Type: Article

This paper presents Hearch, a spoken document retrieval system that looks like a conventional search tool and retrieves audio/video segments based on automatic transcriptions of their speech content. The system consists of a back-end that captures, processes and indexes audio/video resources, and a front-end that lets users search content, configure various modules and display performance statistics through a web interface. An early version of the tool is available (http://gtts.ehu.es/Hearch/); it searches and retrieves segments from repositories of TV broadcast news in Spanish and Basque. System performance was evaluated on six manually transcribed TV news broadcasts in Spanish and seven in Basque. An approach that extends the query with so-called friendly terms is proposed and evaluated as a way to reduce the effect of errors introduced by the Automatic Speech Recognition module. This approach led to slight performance improvements.
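
The idea of retrieving time-stamped segments from automatic transcriptions, with the query extended by friendly terms, can be illustrated with a minimal Python sketch. It is not the Hearch implementation. The abstract does not define how friendly terms are obtained, so the sketch assumes they come as a hand-written mapping from a query term to the forms the recognizer tends to output instead. The `Segment` layout, the function names, the toy transcripts and the `FRIENDLY` table are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A time-stamped piece of an automatic transcription (hypothetical layout)."""
    doc_id: str
    start_s: float
    end_s: float
    text: str


def build_index(segments):
    """Map each lowercased token to the positions of the segments containing it."""
    index = defaultdict(set)
    for pos, seg in enumerate(segments):
        for token in seg.text.lower().split():
            index[token].add(pos)
    return index


def expand_query(terms, friendly_terms):
    """Return the query terms plus any friendly terms listed for them."""
    expanded = []
    for term in terms:
        term = term.lower()
        expanded.append(term)
        expanded.extend(friendly_terms.get(term, []))
    return expanded


def search(query, segments, index, friendly_terms=None):
    """Return segments matching any (expanded) query term, most matches first."""
    terms = expand_query(query.split(), friendly_terms or {})
    hits = defaultdict(int)
    for term in terms:
        for pos in index.get(term, ()):
            hits[pos] += 1
    ranked = sorted(hits, key=lambda pos: (-hits[pos], pos))
    return [segments[pos] for pos in ranked]


# Toy data: the second segment contains a made-up recognition error.
SEGMENTS = [
    Segment("news-01", 0.0, 4.2, "el lehendakari visita bilbao"),
    Segment("news-01", 4.2, 9.0, "el lendakari habla en el parlamento"),
    Segment("news-02", 0.0, 5.1, "prevision del tiempo para manana"),
]
INDEX = build_index(SEGMENTS)
FRIENDLY = {"lehendakari": ["lendakari"]}  # assumed confusion list, not from the paper

plain = search("lehendakari", SEGMENTS, INDEX)
expanded = search("lehendakari", SEGMENTS, INDEX, FRIENDLY)
print(len(plain), len(expanded))
```

In this toy setup the plain query misses the segment whose transcription contains the misrecognized form, while the expanded query also returns it. A production system would more plausibly delegate indexing and ranking to a search library than use a hand-rolled index.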
