Tectogrammar-based machine translation for English-Spanish and English-Basque
ISSN: 1135-5948
Year of publication: 2016
Issue: 56
Pages: 73-80
Type: Article
Other publications in: Procesamiento del lenguaje natural
Abstract
We present the first attempt to build machine translation systems for the English-Spanish and English-Basque language pairs following the tectogrammar approach. Based on the English-Czech system, we describe the language-specific tools added in the analysis and synthesis steps, and the resources for bilingual transfer. Evaluation shows the potential of these systems for new languages and domains.
References
- Agerri, R., J. Bermudez, and G. Rigau. 2014. IXA pipeline: Efficient and ready to use multilingual NLP tools. In Conference on Language Resources and Evaluation, Reykjavik.
- Aranberri, N., G. Labaka, A. Díaz de Ilarraza, and K. Sarasola. 2015. Exploiting portability to build an RBMT prototype for a new source language. In Proceedings of EAMT 2015, Antalya.
- Berger, A., V. Della Pietra, and S. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational linguistics, 22(1):39–71.
- Brandt, M., H. Loftsson, H. Sigurthórsson, and F. Tyers. 2011. Apertium-icenlp: A rule-based Icelandic to English machine translation system. In Proceedings of EAMT 2011, Leuven, Belgium.
- Crouse, M., R. Nowak, and R. Baraniuk. 1998. Wavelet-based statistical signal processing using hidden Markov models. Signal Processing, IEEE Transactions, 46(4):886–902.
- Dušek, O. and F. Jurčíček. 2013. Robust multilingual statistical morphological generation models. ACL 2013, page 158.
- Dušek, O., Z. Žabokrtský, M. Popel, M. Majliš, M. Novák, and D. Mareček. 2012. Formemes in English-Czech deep syntactic MT. In Proceedings of WMT7, pages 267–274.
- Hajič, J., J. Panevová, E. Hajičová, P. Sgall, P. Pajas, J. Štěpánek, J. Havelka, M. Mikulová, Z. Žabokrtský, and M. Ševčíková Razímová. 2006. Prague dependency treebank 2.0. CD-ROM, Linguistic Data Consortium, LDC Catalog No.: LDC2006T01, Philadelphia, 98.
- Hajičová, E. 2000. Dependency-based underlying-structure tagging of a very large Czech corpus. TAL. Traitement automatique des langues, 41(1):57–78.
- Mareček, D., M. Popel, and Z. Žabokrtský. 2010. Maximum entropy translation model in dependency-based MT framework. In Proceedings of WMT5 and MetricsMATR, pages 201–206. ACL.
- Mayor, A., I. Alegria, A. Díaz de Ilarraza, G. Labaka, M. Lersundi, and K. Sarasola. 2011. Matxin, an open-source rule-based machine translation system for Basque. Machine translation, 25(1):53–82.
- Popel, M. 2009. Ways to improve the quality of English-Czech machine translation. Master’s thesis, Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic.
- Popel, M. and Z. Žabokrtský. 2010. TectoMT: modular NLP framework. In Advances in natural language processing. Springer, pages 293–304.
- Sgall, P. 1967. Functional sentence perspective in a generative description. Prague studies in mathematical linguistics, 2(203-225).
- Žabokrtský, Z. 2010. From treebanking to machine translation. Habilitation thesis, Charles University, Prague, Czech Republic.
- Zeman, D. 2008. Reusable tagset conversion using tagset drivers. In Proceedings of LREC, pages 213–218.
- Zeman, D., O. Dušek, D. Mareček, M. Popel, L. Ramasamy, J. Štěpánek, Z. Žabokrtský, and J. Hajič. 2014. HamleDT: Harmonized multi-language dependency treebank. Language Resources and Evaluation, 48(4):601–637.