Adapting NMT to caption translation in Wikimedia Commons for low-resource languages

  1. Labaka Intxauspe, Gorka
  2. Alegría Loinaz, Iñaki
  3. Poncelas, Alberto
  4. Sarasola Gabiola, Kepa
  5. Dowling, Meghan
  6. Way, Andy
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2019

Issue: 63

Pages: 33-40

Type: Article



This article presents a successful domain adaptation of a neural machine translation (NMT) system using a bilingual corpus built from the image captions used in Wikimedia Commons, for the Spanish-Basque and English-Irish language pairs.

References

  • Alegria, I., U. Cabezon, U. F. de Betono, G. Labaka, A. Mayor, K. Sarasola, and A. Zubiaga. 2013. Reciprocal enrichment between Basque Wikipedia and machine translation. In The People’s Web Meets NLP. Springer, pages 101–118.
  • Bahdanau, D., K. Cho, and Y. Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.
  • Bojar, O., R. Chatterjee, C. Federmann, Y. Graham, B. Haddow, S. Huang, M. Huck, P. Koehn, Q. Liu, V. Logacheva, et al. 2017. Findings of the 2017 conference on machine translation (WMT17). In Proceedings of the Second Conference on Machine Translation, pages 169–214, Copenhagen, Denmark.
  • Crego, J., J. Kim, G. Klein, A. Rebollo, K. Yang, J. Senellart, E. Akhanov, P. Brunelle, A. Coquard, Y. Deng, et al. 2016. Systran’s pure neural machine translation systems. arXiv preprint arXiv:1610.05540.
  • Doddington, G. 2002. Automatic evaluation of machine translation quality using n-gram cooccurrence statistics. In Proceedings of the Second International Conference on Human Language Technology Research, pages 138–145, San Diego, CA.
  • Dowling, M., L. Cassidy, E. Maguire, T. Lynn, A. Srivastava, and J. Judge. 2015. Tapadóir: Developing a statistical machine translation engine and associated resources for Irish. In Proceedings of the Fourth LRL Workshop: Language Technologies in support of Less-Resourced Languages, pages 314–318, Poznan, Poland.
  • Dowling, M., T. Lynn, A. Poncelas, and A. Way. 2018. SMT versus NMT: Preliminary comparisons for Irish. In Technologies for MT of Low Resource Languages (LoResMT 2018), pages 12–20, Boston, USA.
  • Etchegoyhen, T., E. M. Garcia, A. Azpeitia, G. Labaka, I. Alegria, I. C. Etxabe, A. J. Carrera, I. E. Santos, M. Martin, and E. Calonge. 2018. Neural machine translation of Basque. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation (EAMT), pages 139–148, Alicante, Spain.
  • Hochreiter, S. and J. Schmidhuber. 1997. Long short-term memory. Neural Computation, 9:1735–1780.
  • Klein, G., Y. Kim, Y. Deng, J. Senellart, and A. M. Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 67–72, Vancouver, Canada.
  • Koehn, P. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 388–395, Barcelona, Spain.
  • Labaka, G., I. Alegria, and K. Sarasola. 2016. Domain adaptation in MT using titles in Wikipedia as a parallel corpus: Resources and evaluation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 2209–2213, Portorož, Slovenia.
  • Labaka, G., C. España-Bonet, L. Màrquez, and K. Sarasola. 2014. A hybrid machine translation architecture guided by syntax. Machine Translation, 28(2):91–125.
  • Luong, M.-T. and C. D. Manning. 2015. Stanford neural machine translation systems for spoken language domains. In Proceedings of the International Workshop on Spoken Language Translation, pages 76–79, Da Nang, Vietnam.
  • Nothman, J., N. Ringland, W. Radford, T. Murphy, and J. R. Curran. 2013. Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence, 194:151–175.
  • Otero, P. G. and I. G. López. 2010. Wikipedia as multilingual source of comparable corpora. In Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, LREC, pages 21–25, Valletta, Malta.
  • Papineni, K., S. Roukos, T. Ward, and W.-J. Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA.
  • Poncelas, A., G. M. de Buy Wenniger, and A. Way. 2018a. Data selection with feature decay algorithms using an approximated target side. In 15th International Workshop on Spoken Language Translation, pages 173–180, Bruges, Belgium.
  • Poncelas, A., G. M. de Buy Wenniger, and A. Way. 2018b. Feature decay algorithms for neural machine translation. In 21st Annual Conference of the European Association for Machine Translation, pages 239–248, Alicante, Spain.
  • Poncelas, A., A. Way, and K. Sarasola. 2018. The ADAPT system description for the IWSLT 2018 Basque to English translation task. In Proceedings of the 15th International Workshop on Spoken Language Translation, pages 76–82, Bruges, Belgium.
  • Sennrich, R., B. Haddow, and A. Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany.
  • Snover, M., B. Dorr, R. Schwartz, L. Micciulla, and J. Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, pages 223–231, Cambridge, Massachusetts, USA.
  • Wu, Y., M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.