Automatic analysis of causal coherence relations in Basque scientific abstracts (Kausazko koherentzia-erlazioen azterketa automatikoa euskarazko laburpen zientifikoetan)

  1. Mikel Iruskieta 1
  2. Maria Jesus Aranzabe 1
  3. Arantza Diaz de Ilarraza 1
  4. Mikel Lersundi 1
  1. 1 Universidad del País Vasco/Euskal Herriko Unibertsitatea
    Lejona, Spain

    ROR https://ror.org/000xsnr85

Journal:
Gogoa: Euskal Herriko Unibersitateko hizkuntza, ezagutza, komunikazio eta ekintzari buruzko aldizkaria

ISSN: 1577-9424

Year of publication: 2016

Issue title: Xabier Arrazola Gogoan (1962-2015)

Issue: 14

Pages: 45-77

Type: Article

DOI: 10.1387/GOGOA.15628 (open access)

Abstract

Automatically detecting the causal relations in a text can be useful in question answering and event information extraction. The aim of this paper is to study how to detect coherence relations of the cause subgroup (CAUSE, RESULT and PURPOSE). To achieve this aim, we used Rhetorical Structure Theory (RST) and automatically obtained linguistic information from several tools developed by the IXA Group. We used the Basque RST Treebank (Iruskieta et al., 2013), a corpus of 60 scientific abstracts from three domains: science, medicine and terminology. One linguist annotated all the signals in the corpus and described the most important problems of the task. To assess the reliability of this annotator, two linguists annotated the signals of the cause subgroup, and all the annotations were compared and evaluated. After that, a superannotator harmonized all the signals of those cause relations. Finally, we present the most important signals for these relations.
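
As an illustration of how annotations from two linguists might be compared, the following minimal sketch computes Cohen's kappa over relation labels. The abstract does not state which agreement measure the authors used, so the choice of kappa, the function name cohen_kappa, the NONE label and the toy label sequences are all assumptions made for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which the annotators coincide.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data, not taken from the Basque RST Treebank.
annotator_1 = ["CAUSE", "RESULT", "PURPOSE", "CAUSE", "NONE", "PURPOSE"]
annotator_2 = ["CAUSE", "RESULT", "PURPOSE", "RESULT", "NONE", "PURPOSE"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (for instance CAUSE) is much more frequent than the others.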

Bibliographic references

  • ALDABE, I. (2011). Automatic exercise generation based on corpora and natural language processing techniques. Unpublished doctoral dissertation, UPV/EHU, Donostia, Basque Country.
  • ALDABE, I., GONZALEZ-DIOS, I., LOPEZ-GAZPIO, I., MADRAZO, I. and MARITXALAR, M. (2013). Two approaches to generate questions in Basque. Procesamiento del lenguaje natural, 51:101-108.
  • ALDEZABAL, I., ANSA, O., ARRIETA, B., ARTOLA, X., EZEIZA, A., HERNÁNDEZ, G. and LERSUNDI, M. (2001). EDBL: a general lexical basis for the automatic processing of Basque. In IRCS Workshop on linguistic databases. Philadelphia (USA).
  • ARTOLA, X., DÍAZ DE ILARRAZA, A., EZEIZA, N., GOJENOLA, K., LABAKA, G., SOLOGAISTOA, A. and SOROA, A. (2005). A framework for representing and managing linguistic annotations based on typed feature structures. In RANLP 2005.
  • ASHER, N. and LASCARIDES, A. (2003). Logics of conversation. Cambridge Univ Pr, Cambridge.
  • BENAMARA, F. and TABOADA, M. (2015). Mapping different rhetorical relation annotations: A proposal. In Proceedings of the 4th Joint Conference on Lexical and Computational Semantics (*SEM). Denver (USA).
  • BENGOETXEA, K. and GOJENOLA, K. (2007). Desarrollo de un analizador sintáctico estadístico basado en dependencias para el euskera. Procesamiento del lenguaje natural, 39:5-12.
  • DA CUNHA, I. (2013). A symbolic corpus-based approach to detect and solve the ambiguity of discourse markers. In 14th International Conference on Intelligent Text Processing and Computational Linguistics, Samos, Greece.
  • DA CUNHA, I., SANJUAN, E., TORRES-MORENO, J.-M., CABRÉ, M. and SIERRA, G. (2012). A symbolic approach for automatic detection of nuclearity and rhetorical relations among intra-sentence discourse segments in Spanish. Computational Linguistics and Intelligent Text Processing, 7181:462-474.
  • DAS, D., TABOADA, M. and MCFETRIDGE, P. (2015). RST Signalling Corpus. LDC2015T10. Distributed through the Linguistic Data Consortium.
  • DUQUE, E. (2014). Signaling causal coherence relations. Discourse Studies, 16(1):25-46.
  • EUSKALTZAINDIA (1990). Euskal gramatika. Lehen urratsak III (Lokailuak). Euskaltzaindia, Bilbo.
  • EUSKALTZAINDIA (1999). Euskal gramatika. Lehen urratsak V (Mendeko perpausak I). Euskaltzaindia, Bilbo.
  • EUSKARA INSTITUTUA, E. (2015). Helburu perpausen bereizgarriak. Sareko Euskal Gramatika (SEG), www.ehu.eus/seg. UPV/EHU, ISBN 978-84-693-9891-3.
  • GEORG, G., HERNAULT, H., CAVAZZA, M., PRENDINGER, H. and ISHIZUKA, M. (2009). From rhetorical structures to document structure: shallow pragmatic analysis for document engineering. In 9th ACM symposium on Document engineering, pages 185-192, Munich, Germany. ACM.
  • GIRJU, R. (2003). Automatic detection of causal relations for question answering. In Proceedings of the ACL 2003 workshop on Multilingual summarization and question answering-Volume 12, pages 76-83. Association for Computational Linguistics.
  • GÓMEZ, I. (1996). Euskararen zatiketa informazionalaren eredu baterantz. Anuario del Seminario de Filología Vasca «Julio de Urquijo», 30(1):195-218.
  • IRUSKIETA, M. (2014). Pragmatikako erlaziozko diskurtso-egitura: deskribapena eta bere ebaluazioa hizkuntzalaritza konputazionalean (A description of pragmatics rhetorical structure and its evaluation in computational linguistics). PhD thesis, Euskal Herriko Unibertsitatea, Donostia. http://ixa2.si.ehu.es/~jibquirm/tesia/tesi_txostena.pdf.
  • IRUSKIETA, M., ARANZABE, M.J., DIAZ DE ILARRAZA, A., GONZALEZ, I., LERSUNDI, M. and LOPEZ DE LA CALLE, O. (2013). The RST Basque Treebank: an online search interface to check rhetorical relations. In 4th Workshop «RST and Discourse Studies», Brasil.
  • IRUSKIETA, M., DIAZ DE ILARRAZA, A., LABAKA, G. and LERSUNDI, M. (2015a). The detection of central units in Basque scientific abstracts. In 5th Workshop «RST and Discourse Studies». Alicante, Spain.
  • IRUSKIETA, M., DIAZ DE ILARRAZA, A. and LERSUNDI, M. (2015b). Koherentziazko erlazioak: marko teorikoa eta corpusaren deskribapena, pages 345-361. Ibon Sarasola, Gorazarre. Homenatge, Homenaje. UPV/EHUren Argitalpen Zerbitzua. Bilbo.
  • IRUSKIETA, M., DIAZ DE ILARRAZA, A. and LERSUNDI, M. (2009). Correlaciones en euskera entre las relaciones retóricas y los marcadores del discurso. In Proceedings of 27th AESLA International Conference, pages 963-971, Ciudad Real, Spain.
  • IRUSKIETA, M. and ZAPIRAIN, B. (2015). EusEduSeg: a dependency-based EDU segmentation for Basque. Procesamiento del Lenguaje Natural, 55:41-48.
  • KHOO, C.S., KORNFILT, J., MYAENG, S.H. and ODDY, R. (1998). Automatic extraction of cause-effect information from newspaper text without knowledge-based inferencing. Literary and Linguistic Computing, 13(4):177.
  • KNOTT, A. and DALE, R. (1994). Using linguistic phenomena to motivate a set of coherence relations. Discourse Processes, 18:35-35.
  • KNOTT, A. and SANDERS, T.J. (1998). The classification of coherence relations and their linguistic markers: An exploration of two languages. Journal of Pragmatics, 30(2):135-175.
  • LONGACRE, R.E. (1985). Sentences as combinations of clauses. Language typology and syntactic description, 2:235-286.
  • LOPEZ-GAZPIO, I. and MARICHALAR ANGLADA, M. (2013). Web application for reading practice. In IADAT-e2013: Proceedings of the 6th IADAT International Conference on Education, pages pp-74. IADAT-e2013. ISBN: 978-84-935915-3-3.
  • MANN, W.C. and THOMPSON, S.A. (1987). Rhetorical Structure Theory: A Theory of Text Organization. Text, 8(3):243-281.
  • MANN, W.C. and THOMPSON, S.A. (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text-Interdisciplinary Journal for the Study of Discourse, 8(3):243-281.
  • MARCU, D. (2000). The theory and practice of discourse parsing and summarization. The MIT press, Cambridge.
  • NEDJALKOV, V. and SILNICKIJ, G. (1973). The topology of causative constructions. Folia Lingüistica, (6):273-290.
  • OATES, S.L. (1999). State of the art report on discourse markers and relations. Technical report, University of Brighton, Information Technology Research Institute.
  • ORMAZABAL, J. (2007). Kausatibo aldizkatzeak euskaraz eta inguruko hizkuntzetan, pages 643-654. Gramatika Jaietan. Patxi Goenagaren omenez. ASJUren gehigarriak. UPV/EHU. Bilbo.
  • OSINAGA, M.S. (2001). Azalpenezko testu entziklopedikoaren azterketa eta didaktika. Erein.
  • OYHARÇABAL, B. (2002). Kausazio aldizkatzea euskal aditzetan. Lapurdum, 7(VII):271-294.
  • PARDO, T.A.S. (2005). Métodos para análise discursiva automática. Master’s thesis.
  • PARDO, T.A.S. and NUNES, M.G.V. (2004). Relações Retóricas e seus Marcadores Superficiais: Análise de um Corpus de Textos Científicos em Português do Brasil. Technical Report NILC-TR-04-03.
  • PARDO, T.A.S. and SENO, E.R.M. (2005). Rhetalho: um corpus de referência anotado retoricamente. In Anais do V Encontro de Corpora, São Carlos, Brazil.
  • RISSELADA, R. and SPOOREN, W. (1998). Introduction: Discourse markers and coherence relations. Journal of Pragmatics, 30:131-134.
  • TABOADA, M. (2006). Discourse markers as signals (or not) of rhetorical relations. Journal of Pragmatics, 38(4):567-592.
  • TABOADA, M. and MANN, W.C. (2006). Rhetorical Structure Theory: looking back and moving ahead. Discourse Studies, 8(3):423-459.
  • VAN DIJK, T.A. (1998). Texto y contexto: semántica y pragmática del discurso. Cátedra.
  • WEBBER, B.L., STONE, M., JOSHI, A. and KNOTT, A. (2003). Anaphora and discourse structure. Computational Linguistics, 29(4):545-587.
  • ZABALA, I. (2000). Hitz-hurrenkera euskara tekniko-zientifikoan. Ekaia: Euskal Herriko Unibertsitateko zientzi eta teknologi aldizkaria, (12):143-166.