Tectogrammar-based machine translation for English-Spanish and English-Basque

  1. Nora Aranberri
  2. Gorka Labaka
  3. Oneka Jauregi
  4. Arantza Díaz de Ilarraza
  5. Iñaki Alegría
  6. Eneko Agirre
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2016

Issue: 56

Pages: 73-80

Type: Article

Abstract

We present the first attempt to build machine translation systems for the English-Spanish and English-Basque language pairs following the tectogrammar approach. Based on the English-Czech system, we describe the language-specific tools added in the analysis and synthesis steps, and the resources for bilingual transfer. Evaluation shows the potential of these systems for new languages and domains.

References

  • Agerri, R., J. Bermudez, and G. Rigau. 2014. IXA pipeline: Efficient and ready to use multilingual NLP tools. In Conference on Language Resources and Evaluation, Reykjavik.
  • Aranberri, N., G. Labaka, A. Díaz de Ilarraza, and K. Sarasola. 2015. Exploiting portability to build an RBMT prototype for a new source language. In Proceedings of EAMT 2015, Antalya.
  • Berger, A., V. Della Pietra, and S. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational linguistics, 22(1):39–71.
  • Brandt, M., H. Loftsson, H. Sigurthórsson, and F. Tyers. 2011. Apertium-icenlp: A rule-based Icelandic to English machine translation system. In Proceedings of EAMT 2011, Leuven, Belgium.
  • Crouse, M., R. Nowak, and R. Baraniuk. 1998. Wavelet-based statistical signal processing using hidden Markov models. IEEE Transactions on Signal Processing, 46(4):886–902.
  • Dušek, O. and F. Jurčíček. 2013. Robust multilingual statistical morphological generation models. ACL 2013, page 158.
  • Dušek, O., Z. Žabokrtský, M. Popel, M. Majliš, M. Novák, and D. Mareček. 2012. Formemes in English-Czech deep syntactic MT. In Proceedings of WMT7, pages 267–274.
  • Hajič, J., J. Panevová, E. Hajičová, P. Sgall, P. Pajas, J. Štěpánek, J. Havelka, M. Mikulová, Z. Žabokrtský, and M. Ševčíková Razímová. 2006. Prague dependency treebank 2.0. CD-ROM, Linguistic Data Consortium, LDC Catalog No.: LDC2006T01, Philadelphia, 98.
  • Hajičová, E. 2000. Dependency-based underlying-structure tagging of a very large Czech corpus. TAL. Traitement automatique des langues, 41(1):57–78.
  • Mareček, D., M. Popel, and Z. Žabokrtský. 2010. Maximum entropy translation model in dependency-based MT framework. In Proceedings of WMT5 and MetricsMATR, pages 201–206. ACL.
  • Mayor, A., I. Alegria, A. Díaz de Ilarraza, G. Labaka, M. Lersundi, and K. Sarasola. 2011. Matxin, an open-source rule-based machine translation system for Basque. Machine translation, 25(1):53–82.
  • Popel, M. 2009. Ways to improve the quality of English-Czech machine translation. Master’s thesis, Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic.
  • Popel, M. and Z. Žabokrtský. 2010. TectoMT: modular NLP framework. In Advances in natural language processing. Springer, pages 293–304.
  • Sgall, P. 1967. Functional sentence perspective in a generative description. Prague studies in mathematical linguistics, 2:203–225.
  • Žabokrtský, Z. 2010. From treebanking to machine translation. Habilitation thesis, Charles University, Prague, Czech Republic.
  • Zeman, D. 2008. Reusable tagset conversion using tagset drivers. In Proceedings of LREC, pages 213–218.
  • Zeman, D., O. Dušek, D. Mareček, M. Popel, L. Ramasamy, J. Štěpánek, Z. Žabokrtský, and J. Hajič. 2014. HamleDT: Harmonized multi-language dependency treebank. Language Resources and Evaluation, 48(4):601–637.