Selección de unidades léxicas para reconocimiento automático del habla continua en euskera
- Lopez de Ipiña Peña, Miren Karmele
- Ezeiza Ramos, Aitzol
- Graña, Fernando Manuel
- Zulueta Guerrero, Ekaitz
ISSN: 1135-5948
Year of publication: 2003
Issue: 31
Pages: 115-122
Type: Article
More publications in: Procesamiento del lenguaje natural
Abstract
Basque is an agglutinative language, so the vocabulary of a corpus cannot practically be defined in terms of whole words: the number of word forms grows combinatorially, which makes medium- and large-vocabulary tasks intractable. Pseudo-morphemes, generated with an automatic segmentation tool, are an alternative for building the lexicon and the language model, because they notably reduce the vocabulary size. However, Basque has many short, acoustically very similar morphemes. This must be taken into account, because the acoustic-phonetic decoding process can affect the continuous speech recognition (CSR) task, increasing the likelihood of confusion and insertion of certain lexical units (very short units with high rates of acoustic confusion). A feasible way to deal with this problem is to avoid segmenting those units. The next step in improving the Basque CSR system is to use a language model to guide the recognition process.
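The trade-off described in the abstract can be illustrated with a toy sketch. It is not the authors' segmentation tool: the stem list, the suffix list, the `segment` helper, and the `MIN_UNIT_LEN` threshold are all hypothetical, chosen only to show how splitting words into stem and suffix shrinks the vocabulary, and how leaving very short suffixes attached gives back part of that reduction.

```python
# Toy illustration only: stems, suffixes, and the length threshold are
# hypothetical and do not come from the paper.
STEMS = ["etxe", "mendi", "kale"]
SUFFIXES = ["a", "ak", "an", "tik", "ra", "aren", "arekin"]
MIN_UNIT_LEN = 3  # assumed threshold: shorter suffixes stay attached to the stem


def segment(word, stems, min_len=None):
    """Split a word into [stem, '+suffix'] when it starts with a known stem.

    If min_len is given and the suffix is shorter than min_len, the word is
    returned unsegmented, so that very short units never enter the lexicon.
    """
    for stem in sorted(stems, key=len, reverse=True):
        if word.startswith(stem) and len(word) > len(stem):
            suffix = word[len(stem):]
            if min_len is not None and len(suffix) < min_len:
                return [word]
            return [stem, "+" + suffix]
    return [word]


# Every stem combined with every suffix: the word-level vocabulary.
words = [stem + suffix for stem in STEMS for suffix in SUFFIXES]

word_vocab = set(words)
full_vocab = {unit for w in words for unit in segment(w, STEMS)}
guarded_vocab = {unit for w in words for unit in segment(w, STEMS, MIN_UNIT_LEN)}

print(len(word_vocab), len(full_vocab), len(guarded_vocab))
```

With these toy lists the script prints `21 10 18`: 21 whole-word forms, 10 units when every word is segmented, and 18 units when suffixes shorter than three characters are kept attached. In this sketch the guarded lexicon is larger than the fully segmented one, but it contains no very short units of the kind the abstract identifies as prone to confusion and insertion.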