Compilación del corpus académico de noveles en euskera HARTAeus y su explotación para el estudio de la fraseología académica

  1. Aranzabe Urruzola, María Jesús
  2. Gurrutxaga, Antton
  3. Zabala, Igone
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2022

Issue: 69

Pages: 95-103

Type: Article


A corpus of academic texts by novice writers was compiled for Basque, comparable to the Spanish corpus HARTA-noveles. Lists of Basque academic vocabulary, collocations, and formulas were extracted from the corpus and then assigned discursive functions. The ultimate objective of the HARTAes-vas project, within which this work is framed, is to design an academic writing aid for Basque and Spanish that focuses on academic lexical combinations and integrates lexicographic information and corpora.
