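Reusing the Basque Dependency Treebank to Build the Gold Standard for Surface Syntax (original title: Reutilización del Treebank de Dependencias del Euskera para la Construcción del Gold Standard de la Sintaxis Superficial)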

  1. Arriola Egurrola, José María
  2. Aranzabe Urruzola, María Jesús
  3. Goenaga Azcarate, Iakes
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2013

Issue: 51

Pages: 83-90

Type: Article

Abstract

This work aims to take advantage of the existing dependency treebank EPEC-DEP (BDT) to build a gold standard for the surface syntax of Basque. As a first step, we carry out a comparative study of the two formalisms that have been applied to the corpus: Constraint Grammar (CG) and Dependency Grammar (DP). Based on this study, we establish criteria for automatically deriving the CG-style syntactic function tags. These criteria were implemented and evaluated; in 75% of the cases we are able to derive the CG-style syntactic function tags needed to build the gold standard.

Bibliographic References

  • Aduriz I., Arriola J. M., Artola X., Díaz de Ilarraza A., Gojenola K. and Maritxalar M. 1997. Morphosyntactic disambiguation for Basque based on the Constraint Grammar Formalism. Proceedings of Recent Advances in NLP (RANLP97), pp. 282-288. Tzigov Chark, Bulgaria.
  • Aduriz I. and Díaz de Ilarraza A. 2003. Morphosyntactic disambiguation and shallow parsing in Computational Processing of Basque. Inquiries into the lexicon-syntax relations in Basque. Bernard Oyharçabal (Ed.). University of the Basque Country.
  • Aduriz I., Arriola J. M., Artola X., Díaz de Ilarraza A., Gojenola K., Maritxalar M. and Urkia M. 2000. Euskararako murriztapen-gramatika: mapaketak, erregela morfosintaktikoak eta sintaktikoak. UPV/EHU/LSI/TR12-2000.
  • Aduriz I., Aranzabe M. J., Arriola J. M., Atutxa A., Díaz de Ilarraza A., Ezeiza N., Gojenola K., Oronoz M., Soroa A. and Urizar R. 2006. Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing. Andrew Wilson, Paul Rayson and Dawn Archer (Eds.), Corpus Linguistics Around the World. Book series: Language and Computers. Vol. 56, pp. 1-15. Rodopi, Netherlands.
  • Aldezabal I., Aranzabe M. J., Díaz de Ilarraza A. and Fernández K. 2008. From Dependencies to Constituents in the Reference Corpus for the Processing of Basque. Procesamiento del Lenguaje Natural, no. 41 (2008), pp. 147-154.
  • Aldezabal I., Aranzabe M. J., Arriola J. M. and Díaz de Ilarraza A. 2009. Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues. Corpus Linguistics and Linguistic Theory 5-2, pp. 241-269. Mouton de Gruyter. Berlin-New York.
  • Aranzabe, M. J. 2008. Dependentzia-ereduan oinarritutako baliabide sintaktikoak: zuhaitzbankua eta gramatika konputazionala. [Syntactic resources based on Dependency Grammar: treebank and computational grammar]. PhD Thesis, Euskal Filologia Saila (UPV/EHU).
  • Carroll J., Briscoe T. and Sanfilippo A. 1998. Parser evaluation: A survey and a new proposal. International Conference on Language Resources and Evaluation, University of Granada (Spain).
  • Gelbukh A., Torres S. and Calvo H. 2005. Transforming a Constituency Treebank into a Dependency Treebank. Procesamiento del Lenguaje Natural, (35), pp. 145-152.
  • Karlsson F., Voutilainen A., Heikkilä J. and Anttila A. 1995. Constraint grammar: A language-independent system for parsing unrestricted text. Berlin & New York: Mouton de Gruyter.
  • Mille S., Burga A., Vidal V. and Wanner L. 2009. Towards a Rich Dependency Annotation of Spanish Corpora. In Proceedings of SEPLN, San Sebastian.
  • Nilsson J. and Hall J. 2005. Reconstruction of the Swedish Treebank Talbanken. MSI report 05067, Växjö University: School of Mathematics and Systems Engineering.
  • Uria L., Estarrona A., Aldezabal I., Aranzabe M. J., Díaz de Ilarraza A. and Iruskieta M. 2009. Evaluation of the Syntactic Annotation in EPEC, the Reference Corpus for the Processing of Basque. Lecture Notes in Computer Science (LNCS) no. 5449, Alexander Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing, pp. 72-85. Mexico City, Mexico.
  • Voutilainen A., Purtonen T. K. and Muhonen K. 2012. Outsourcing Parsebanking: The FinnTreeBank Project. Diana Sousa, Krister Lindén and Wanjiku Nganga (Eds.), Shall we Play the Festschrift Game?: Essays on the Occasion of Lauri Carlson's 60th Birthday, pp. 117-131. Springer Verlag.