Weba euskarazko corpus gisa (The web as a Basque corpus)

  1. Igor Leturia
  2. Xabier Arregi
  3. Kepa Sarasola
Ekaia: Euskal Herriko Unibertsitateko zientzi eta teknologi aldizkaria

ISSN: 0214-9001

Publication date: 2014

Issue: 27

Pages: 281-296

Type: Article


The Basque language, just like any other, needs text corpora to survive in the modern world and to be used normally. But Basque corpora are few and small compared to those of major languages. This is because those languages have made use of the "Web-as-Corpus" approach, which consists of using the web as a corpus or as a source of texts for corpora. In this paper, we describe the research carried out by the first author in his PhD thesis, under the supervision of the other two authors, on using the web and automatic methods for Basque corpus building, together with the tools developed and the results obtained. From these we can conclude that the "Web-as-Corpus" approach is a valid way to improve the state of Basque corpora, since with the developed tools we have collected quality corpora of different types (very large general corpora, specialized corpora, comparable corpora, etc.) and built a service to query the web as a Basque corpus. Many of these tools and services have already been placed online for public use.