Coreference Resolution for Morphologically Rich Languages. Adaptation of the Stanford System to Basque.

  1. Ander Soraluze
  2. Olatz Arregi
  3. Xabier Arregi
  4. Arantza Díaz de Ilarraza
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2015

Issue: 55

Pages: 23-30

Type: Article


Abstract

This article presents the process of adapting the Stanford coreference resolution system to Basque, an agglutinative, head-final, pro-drop language. The system has been integrated into a linguistic analysis pipeline, so it takes as input texts that have already been processed and analysed for Basque. We show that exploiting the linguistic features of the language improves coreference resolution. For agglutinative languages, using morphosyntactic features clearly improves system performance: CoNLL F1 rises by 5 points with automatic mentions and by 7.87 points with gold mentions.
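
For orientation, the CoNLL F1 figure quoted in the abstract is conventionally the unweighted mean of the MUC, B-cubed and CEAF-e F1 scores (the CoNLL-2011/2012 shared-task convention). The sketch below assumes that convention applies here; the function name and the input values are hypothetical and are not results from the article.

```python
def conll_f1(muc_f1: float, b_cubed_f1: float, ceaf_e_f1: float) -> float:
    """Return the CoNLL score: the unweighted mean of three F1 scores.

    Assumption: inputs are F1 values on the same scale (e.g. 0-100).
    """
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


if __name__ == "__main__":
    # Hypothetical per-metric scores, for illustration only.
    baseline = conll_f1(60.0, 55.0, 50.0)
    improved = conll_f1(65.0, 60.0, 55.0)
    # Gains are reported as the difference in CoNLL F1 points.
    print(f"baseline={baseline:.2f} improved={improved:.2f} gain={improved - baseline:.2f}")
```

A gain "in CoNLL F1 points" is then simply the difference between two such averages computed on the same test set.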

References

  • Aduriz, I., M. Aranzabe, J. M. Arriola, M. Atutxa, A. Díaz de Ilarraza, N. Ezeiza, K. Gojenola, M. Oronoz, A. Soroa, and R. Urizar. 2006. Methodology and Steps towards the Construction of EPEC, a Corpus of Written Basque Tagged at Morphological and Syntactic Levels for the Automatic Processing. Rodopi. Book series: Language and Computers, pages 1–15.
  • Alegria, I., O. Ansa, X. Artola, N. Ezeiza, K. Gojenola, and R. Urizar. 2004. Representation and Treatment of Multiword Expressions in Basque. In ACL workshop on Multiword Expressions, pages 48–55.
  • Alegria, I., M. Aranzabe, N. Ezeiza, A. Ezeiza, and R. Urizar. 2002. Using Finite State Technology in Natural Language Processing of Basque. In Implementation and Application of Automata, volume 2494 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, pages 1–12.
  • Alegria, I., X. Artola, K. Sarasola, and M. Urkia. 1996. Automatic Morphological Analysis of Basque. Literary & Linguistic Computing, 11(4):193–203.
  • Alegria, I., N. Ezeiza, I. Fernandez, and R. Urizar. 2003. Named Entity Recognition and Classification for texts in Basque. In II Jornadas de Tratamiento y Recuperación de Información, (JOTRI 2003), pages 198–203, Madrid, Spain.
  • Bagga, A. and B. Baldwin. 1998. Algorithms for Scoring Coreference Chains. In The First International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference, pages 563–566.
  • Bengoetxea, K. and K. Gojenola. 2010. Application of Different Techniques to Dependency Parsing of Basque. In Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, pages 31–39, Stroudsburg, PA, USA.
  • Björkelund, A. and R. Farkas. 2012. Data-driven Multilingual Coreference Resolution Using Resolver Stacking. In Joint Conference on EMNLP and CoNLL Shared Task, pages 49–55, Jeju Island, Korea, July. Association for Computational Linguistics.
  • Broscheit, S., M. Poesio, S. P. Ponzetto, K. J. Rodriguez, L. Romano, O. Uryupina, Y. Versley, and R. Zanoli. 2010. BART: A multilingual Anaphora Resolution System. In Proceedings of the 5th International Workshop on Semantic Evaluation, (SemEval 2010), pages 104–107, Stroudsburg, PA, USA.
  • Chen, C. and V. Ng. 2012. Combining the Best of Two Worlds: A Hybrid Approach to Multilingual Coreference Resolution. In Joint Conference on EMNLP and CoNLL: Proceedings of the Shared Task, pages 56–63.
  • Chomsky, N. 1981. Lectures on Government and Binding. Studies in generative grammar. Foris publications, Dordrecht, Cinnaminson (R.I.).
  • Fernandes, E. R., C. N. dos Santos, and R. L. Milidiú. 2012. Latent Structure Perceptron with Feature Induction for Unrestricted Coreference Resolution. In Joint Conference on EMNLP and CoNLL Shared Task, CoNLL ’12, pages 41–48, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Goenaga, I., O. Arregi, K. Ceberio, A. Díaz de Ilarraza, and A. Jimeno. 2012. Automatic Coreference Annotation in Basque. In 11th International Workshop on Treebanks and Linguistic Theories, Lisbon, Portugal.
  • Kobdani, H. and H. Schütze. 2010. SUCRE: A Modular System for Coreference Resolution. In Proceedings of the 5th International Workshop on Semantic Evaluation, (SemEval 2010), pages 92–95, Stroudsburg, PA, USA.
  • Kopeć, M. and M. Ogrodniczuk. 2012. Creating a Coreference Resolution System for Polish. In Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey. European Language Resources Association (ELRA).
  • Laka, I. 1996. A Brief Grammar of Euskara, the Basque Language. http://www.ehu.es/grammar. University of the Basque Country.
  • Lee, H., A. Chang, Y. Peirsman, N. Chambers, M. Surdeanu, and D. Jurafsky. 2013. Deterministic Coreference Resolution Based on Entity-centric, Precision-ranked Rules. Computational Linguistics, 39(4):885–916.
  • Luo, X. 2005. On Coreference Resolution Performance Metrics. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 25–32, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Müller, C. and M. Strube. 2006. Multilevel Annotation of Linguistic Data with MMAX2. In Sabine Braun, Kurt Kohn, and Joybrato Mukherjee, editors, Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods. Peter Lang, Frankfurt a.M., Germany, pages 197–214.
  • Nivre, J., J. Hall, J. Nilsson, A. Chanev, G. Eryigit, S. Kübler, S. Marinov, and E. Marsi. 2007. MaltParser: A language-independent System for Data-driven Dependency Parsing. Natural Language Engineering, 13(2):95–135.
  • Pradhan, S., E. Hovy, M. Marcus, M. Palmer, L. Ramshaw, and R. Weischedel. 2007. OntoNotes: A Unified Relational Semantic Representation. In Proceedings of the International Conference on Semantic Computing, (ICSC 2007), pages 517–526, Washington, DC, USA. IEEE Computer Society.
  • Pradhan, S., X. Luo, M. Recasens, E. Hovy, V. Ng, and M. Strube. 2014. Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 30–35. Association for Computational Linguistics.
  • Pradhan, S., A. Moschitti, N. Xue, O. Uryupina, and Y. Zhang. 2012. CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes. In Proceedings of the Sixteenth Conference on Computational Natural Language Learning (CoNLL 2012), Jeju, Korea.
  • Pradhan, S., L. Ramshaw, M. Marcus, M. Palmer, R. Weischedel, and N. Xue. 2011. CoNLL-2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, (CoNLL 2011), pages 1–27, Stroudsburg, PA, USA.
  • Recasens, M. and E. Hovy. 2011. BLANC: Implementing the Rand index for coreference evaluation. Natural Language Engineering, 17(4):485–510.
  • Recasens, M., L. Màrquez, E. Sapena, M. A. Martí, M. Taulé, V. Hoste, M. Poesio, and Y. Versley. 2010. SemEval-2010 task 1: Coreference Resolution in Multiple Languages. In Proceedings of the 5th International Workshop on Semantic Evaluation, (SemEval 2010), pages 1–8, Stroudsburg, PA, USA.
  • Soraluze, A., I. Alegria, O. Ansa, O. Arregi, and X. Arregi. 2011. Recognition and Classification of Numerical Entities in Basque. In RANLP, pages 764–769, Hissar, Bulgaria.
  • Soraluze, A., O. Arregi, X. Arregi, K. Ceberio, and A. Díaz de Ilarraza. 2012. Mention Detection: First Steps in the Development of a Basque Correference Resolution System. In KONVENS 2012, The 11th Conference on Natural Language Processing, Vienna, Austria.
  • Uryupina, O. 2008. Error Analysis for Learning-based Coreference Resolution. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, May. European Language Resources Association (ELRA).
  • Uryupina, O. 2010. Corry: A System for Coreference Resolution. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 100–103, Uppsala, Sweden, July. Association for Computational Linguistics.
  • Vilain, M., J. Burger, J. Aberdeen, D. Connolly, and L. Hirschman. 1995. A Model-theoretic Coreference Scoring Scheme. In Proceedings of the 6th Conference on Message Understanding, MUC6 ’95, pages 45–52, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Zhekova, D. and S. Kübler. 2010. UBIU: A Language-independent System for Coreference Resolution. In Proceedings of the 5th International Workshop on Semantic Evaluation, (SemEval 2010), pages 96–99, Stroudsburg, PA, USA.