Transforming Complex Sentences using Dependency Trees for Automatic Text Simplification in Basque

  1. Aranzabe Urruzola, María Jesús
  2. Díaz de Ilarraza Sánchez, Arantza
  3. González Dios, Itziar
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2013

Issue: 50

Pages: 61-68

Type: Article



In this paper we present a module of the Text Simplification architecture that we are implementing. Specifically, we describe the module that splits sentences into clauses. This module is built on general-coverage tools: we have adapted the clause identifier and added an algorithm based on dependency trees to split the sentences. The result is a set of simple sentences.
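As a rough illustration of what splitting a sentence along its dependency tree can look like, the sketch below groups each token under its nearest clause-heading ancestor. It is not the authors' algorithm: the `Token` structure, the `CLAUSE_RELS` label set (Universal Dependencies style labels rather than the Basque tagset), and the English example sentence are all assumptions chosen for readability.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    form: str    # surface form
    head: int    # idx of the governing token, 0 for the root
    deprel: str  # dependency label


# Assumed labels that mark the head of an embedded clause (hypothetical choice).
CLAUSE_RELS = {"advcl", "ccomp", "acl"}


def split_into_clauses(tokens):
    """Group tokens by their nearest clause head and return one string per clause."""
    by_idx = {t.idx: t for t in tokens}

    def clause_head(tok):
        # Climb towards the root until reaching the root or a clause-heading token.
        while tok.head != 0 and tok.deprel not in CLAUSE_RELS:
            tok = by_idx[tok.head]
        return tok.idx

    groups = defaultdict(list)
    for tok in tokens:
        groups[clause_head(tok)].append(tok)

    # Keep surface order inside each clause, then order clauses by first token.
    clauses = [sorted(g, key=lambda t: t.idx) for g in groups.values()]
    clauses.sort(key=lambda g: g[0].idx)
    return [" ".join(t.form for t in g) for g in clauses]


# Hand-annotated example: "She left because it rained"
SENTENCE = [
    Token(1, "She", 2, "nsubj"),
    Token(2, "left", 0, "root"),
    Token(3, "because", 5, "mark"),
    Token(4, "it", 5, "nsubj"),
    Token(5, "rained", 2, "advcl"),
]

print(split_into_clauses(SENTENCE))
```

The sketch only separates the clauses. Turning each one into a grammatical simple sentence (for example, handling the subordinator or restoring elided elements) would need further, language-specific steps that are not shown here.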
