Machine learning techniques for word sense disambiguation
- Escudero Bakx, Gerard
- Lluís Márquez Villodre, Supervisor
- Germán Rigau Claramunt, Supervisor
Defense university: Universitat Politècnica de Catalunya (UPC)
Defense date: 13 July 2006
- Horacio Rodríguez Hontoria, Chair
- Lluís Padró Cirera, Secretary
- Mark W. Stevenson, Committee member
- Eneko Agirre Bengoa, Committee member
- Walter Daelemans, Committee member
Type: Thesis
Abstract
In the Natural Language Processing (NLP) community, Word Sense Disambiguation (WSD) has been described as the task of selecting the appropriate meaning (sense) of a given word in a text or discourse, where this meaning is distinguishable from the other senses potentially attributable to that word. These senses can be seen as the target labels of a classification problem, so Machine Learning (ML) seems to be a possible way to tackle it. This work studies the application of algorithms and techniques from the Machine Learning field to the WSD task. The first issue addressed is the adaptation of alternative ML algorithms to deal with word senses as classes. These methods are then compared under the same conditions. The evaluation measures used to compare their performance are the usual precision and recall, together with agreement rates and kappa statistics. The second topic explored is the cross-corpora application of supervised Machine Learning systems for WSD, in order to test their generalisation ability across corpora and domains. The results obtained are very disappointing and seriously question the possibility of constructing a sufficiently general training corpus (labelled or unlabelled), as well as the way its examples should be used to develop a general-purpose Word Sense Tagger. The use of unlabelled data to train classifiers for Word Sense Disambiguation is a very challenging line of research towards a really robust, complete and accurate Word Sense Tagger. For this reason, the next topic addressed in this work is the application of two bootstrapping approaches to WSD: Transductive Support Vector Machines and the Greedy Agreement bootstrapping algorithm by Steven Abney. During this research we have been interested in the construction and evaluation of several WSD systems. We have participated in the last two editions of the English Lexical Sample task of Sen