Lessons from the development of a named entity recognizer for Basque

  1. Alegría Loinaz, Iñaki
  2. Arregi Uriarte, Olatz
  3. Ezeiza Ramos, Nerea
  4. Fernández, Ignacio
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2006

Issue: 36

Pages: 25-38

Type: Article

Abstract

This paper presents the conclusions drawn from developing a named entity recognition system for written Basque. To build the recognizer, we worked with several types of classifiers: one based on linguistic information and others built with machine learning methods. Taking these classifiers as a starting point, we first describe the attempts made with each single method using different information sources, and then present the experiments in which we combined the single methods to improve performance and obtain a more robust system. Finally, we discuss the conclusions and lessons learned from these experiments, which are especially useful for named entity recognition in languages other than English.