Errores ortográficos y de competencia en textos de la web en euskera [Spelling and competence errors in Basque web texts]

  1. Alegría Loinaz, Iñaki
  2. Etxeberria Uztarroz, Izaskun
  3. Leturia Azkarate, Igor
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2010

Issue: 45

Pages: 137-144

Type: Article



The objective of the work presented in this paper is to estimate the quality of corpora retrieved from the Basque Web. The methodology is similar to that used for English and German by Ringlstetter et al. (2006). The main difference is that we reuse spelling checkers to detect errors. We believe this approach yields higher error coverage and that the method can be applied to other languages with practically no manual work, provided such tools are available for them. The results can be used to improve the quality of corpora obtained from the web by eliminating documents whose errors exceed a given threshold.
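The threshold-based filtering mentioned in the abstract could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names `error_rate` and `filter_corpus`, the tokenization rule, and the `max_error_rate` default are all assumptions, and the toy lexicon stands in for a real Basque spelling checker, which the abstract does not name.

```python
import re
from typing import Callable, Iterable, List

# Assumption: a word token is a maximal run of Unicode letters.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def error_rate(text: str, is_correct: Callable[[str], bool]) -> float:
    """Return the fraction of word tokens that the checker rejects."""
    tokens = TOKEN_RE.findall(text)
    if not tokens:
        # A document with no word tokens has no detectable errors.
        return 0.0
    errors = sum(1 for token in tokens if not is_correct(token))
    return errors / len(tokens)


def filter_corpus(
    docs: Iterable[str],
    is_correct: Callable[[str], bool],
    max_error_rate: float = 0.05,  # hypothetical threshold, not from the paper
) -> List[str]:
    """Keep only documents whose error rate does not exceed the threshold."""
    return [doc for doc in docs if error_rate(doc, is_correct) <= max_error_rate]


# Toy stand-in for a spelling checker: a tiny hand-made word list.
TOY_LEXICON = {"kaixo", "mundua", "etxea", "handia", "da"}


def toy_checker(word: str) -> bool:
    return word.lower() in TOY_LEXICON


docs = ["Kaixo mundua", "Etxea handia da", "Kaixo mundoa etxia"]
clean = filter_corpus(docs, toy_checker, max_error_rate=0.2)
```

Here the third document has two rejected tokens out of three, so it falls above the 0.2 threshold and is dropped. In practice `toy_checker` would be replaced by a call into whatever spelling checker is available for the target language, and the threshold would be tuned on the corpus at hand.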

