Weba euskarazko corpus gisa (The Web as a Basque Corpus)
- Igor Leturia
- Xabier Arregi
- Kepa Sarasola
ISSN: 0214-9001
Year of publication: 2014
Issue: 27
Pages: 281-296
Type: Article
Other publications in: Ekaia: Euskal Herriko Unibertsitateko zientzi eta teknologi aldizkaria
Abstract
The Basque language, just like any other, needs text corpora to survive in the modern world and to be used normally. But Basque corpora are few and small compared to those of major languages. This is because those languages have made use of the "Web-as-Corpus" approach, which consists of using the web as a corpus or as a source of texts for corpora. In this paper, we describe the research carried out by the first author in his PhD thesis, under the supervision of the other two authors, on using the web and automatic methods for Basque corpus building, along with the tools developed and the results obtained. From these we conclude that the "Web-as-Corpus" approach is a valid way to improve the state of Basque corpora, since with the developed tools we have collected quality corpora of different types (very large general corpora, specialized corpora, comparable corpora...) and built a service to query the web as a Basque corpus. Many of these tools and services have already been placed online for public use.