Automated machine learning to support model selection in supervised traffic forecasting
- Angarita Zapata, Juan S.
- Antonio David Masegosa Arredondo, Supervisor
- Isaac Triguero Velázquez, Supervisor
Defending university: Universidad de Deusto
Date of defense: 25 November 2020
- David Alejandro Pelta Mochcovsky, Chair
- Enrique Onieva Caracuel, Secretary
- Alberto Cano, Committee member
Type: Thesis
Abstract
Intelligent Transportation Systems (ITS) generate vast amounts of traffic data that are hard to manage, which motivates the use of data-driven approaches, and Machine Learning (ML) in particular, to analyze them. ITS data can be used by different applications, such as Traffic Forecasting (TF) schemes. TF has recently gained relevance because it helps to deal with traffic congestion by forecasting future states of different traffic measures (e.g. travel time). TF poses two main challenges to the ML paradigm. First, traffic data can be collected in multiple formats (e.g. traffic-counting measures, GPS tracks) and under different transportation circumstances (e.g. urban, freeway). These characteristics influence the performance of ML methods, and choosing the most competitive method from a set of candidates costs human effort and time. Second, raw traffic data usually needs to be preprocessed before it is analyzed. Hence, deciding on the most suitable combination of data preprocessing techniques and ML method is a time-consuming task that demands specialized ML knowledge. Automated Machine Learning (AutoML) arises as a promising approach to these issues in problem domains, such as TF, where expert ML knowledge is not always available or affordable. AutoML methods have been broadly used in other areas; however, they remain underexplored in TF. This raises the question of whether general-purpose AutoML can deliver competitive results while reducing the human time cost of ML in TF. Moreover, current AutoML approaches suffer from issues that can affect their performance in TF as well as in other ML problems. The optimization process used to find competitive pipelines is complicated and computationally expensive because of the diversity of the search space and the high evaluation cost of the objective function. Alternative learning approaches (e.g. meta-learning) have been designed to overcome these issues, but they may not work properly on diverse datasets such as those found in TF. Therefore, this thesis focuses on the development of new AutoML approaches that are better suited to specific problem domains and can also offer competitive results in TF. We present a new AutoML method for supervised problems, such as TF, whose search strategy is based on the construction of ensembles from a portfolio of multiple classifiers. This AutoML mechanism can adapt better to specific problem domains by using their data preprocessing techniques, ML methods and raw data. The proposed method can lead to better or competitive results, with respect to the state of the art, both in the general-purpose field and in TF. This is accomplished by taking advantage of the automated generation of ensembles from a predefined set of ML pipelines. The use of these multiple classifier systems significantly speeds up the AutoML process, and it also opens the path towards AutoML frameworks based on ensemble strategies.
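The idea of building an ensemble from a predefined portfolio of pipelines can be illustrated with a minimal sketch. It is not the method developed in the thesis, which the abstract does not specify. It assumes a greedy forward selection with replacement over already-fitted portfolio members, scored by accuracy on a validation set. The member names (`knn`, `tree`, `svm`), the function `greedy_ensemble` and the toy probabilities are hypothetical and serve only as illustration.

```python
# Illustrative sketch only: greedy forward ensemble selection over a fixed
# portfolio. The portfolio members, their outputs and all names are
# hypothetical; this is not the algorithm proposed in the thesis.

def ensemble_labels(prob_lists, threshold=0.5):
    """Average the class-1 probabilities of the members, then threshold."""
    n_models = len(prob_lists)
    return [int(sum(probs) / n_models >= threshold) for probs in zip(*prob_lists)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def greedy_ensemble(portfolio, y_val, max_size=5):
    """Add, one at a time, the member that most improves validation accuracy.

    portfolio maps a member name to its class-1 probabilities on the
    validation set. Members may be picked more than once (selection with
    replacement). The search stops when no candidate strictly improves the
    score or when max_size members have been chosen.
    """
    selected, best_score = [], 0.0
    for _ in range(max_size):
        best_candidate = None
        for name, probs in portfolio.items():
            trial = [portfolio[n] for n in selected] + [probs]
            score = accuracy(y_val, ensemble_labels(trial))
            if score > best_score:
                best_score, best_candidate = score, name
        if best_candidate is None:
            break
        selected.append(best_candidate)
    return selected, best_score


# Toy validation labels (0 = free flow, 1 = congested) and made-up outputs
# of three hypothetical, already-fitted pipelines.
y_val = [0, 1, 1, 0, 1, 0, 1, 1]
portfolio = {
    "knn":  [0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.7, 0.4],
    "tree": [0.1, 0.8, 0.9, 0.6, 0.7, 0.2, 0.4, 0.9],
    "svm":  [0.6, 0.7, 0.8, 0.2, 0.4, 0.1, 0.9, 0.8],
}

selected, score = greedy_ensemble(portfolio, y_val)
print(selected, score)
```

In this toy case each member alone misclassifies two of the eight samples, while the averaged pair chosen by the greedy search classifies all of them correctly. Because the portfolio members are trained once and only their stored predictions are combined, the search over ensembles is cheap compared with optimizing full pipelines from scratch, which is the kind of speed-up the abstract alludes to.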