Title:
Feature selection using simple and efficient machine learning models. Case studies and software tools.
Authors:
Amato, Federico¹ (AUTHOR) federico.amato@unil.ch; Guignard, Fabian¹ (AUTHOR); Kanevski, Mikhail¹ (AUTHOR)
Source:
Geophysical Research Abstracts. 2019, Vol. 21, p1-1. 1p.
Database:
Academic Search Index

Feature selection (FS) of relevant variables from the original input space is a crucial issue in machine learning research, especially in environmental data mining. Besides reducing the dimensionality, removing irrelevant information, increasing learning accuracy and improving the interpretability of the results, feature selection is also often used to optimize data collection, as it identifies which kinds of data are most important to gather. There are three fundamental classes of FS methods: filters, wrappers and embedded methods [1]. FS can be considered either as a pre-processing step or as a dynamic process integrated into the modelling procedure, which helps to reduce the prediction error and uncertainty.

The present research deals with an experimental study of FS using both simulated data and monthly wind speed data in Switzerland for the year 2008, collected by the MeteoSwiss meteorological network (118 stations). The raw data were embedded into a thirteen-dimensional input feature space generated from the Digital Elevation Model [2]. To identify the relevant features to be used in the prediction of wind speed, two efficient and fast machine learning models, namely the Extreme Learning Machine (ELM) [3] and the General Regression Neural Network (GRNN) [4], have been implemented. An exhaustive search over the thirteen-dimensional space, giving rise to 8191 (2^13 - 1) possible models, has been performed with the two algorithms for the twelve monthly datasets. The best models were selected according to the smallest root mean squared error (RMSE). Subsequently, the obtained results were independently tested by applying a Random Forest model and an Anisotropic General Regression Neural Network (AGRNN). The results obtained on the wind speed dataset confirm the idea that the best subset of features changes with the studied month/season, which agrees with a physical understanding of the phenomenon.
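
The exhaustive subset search described above can be illustrated with a short, dependency-free Python sketch. This is not the authors' implementation. The following are all assumptions made for illustration:

- the helper names (`grnn_predict`, `loo_rmse`, `exhaustive_search`)
- a single shared Gaussian bandwidth `sigma`
- leave-one-out validation as the way the RMSE is computed
- the toy data

The ELM and Random Forest steps are not shown. The GRNN here is the Nadaraya-Watson kernel-weighted average of training targets. Giving `grnn_predict` a different bandwidth per feature yields an anisotropic kernel in the spirit of AGRNN.

```python
import itertools
import math

# Hypothetical helpers illustrating the procedure; not the authors' code.


def grnn_predict(X_train, y_train, x, sigma):
    """GRNN (Nadaraya-Watson kernel regression) prediction at point x.

    `sigma` holds one Gaussian bandwidth per feature: equal values give the
    isotropic GRNN, distinct values an anisotropic (AGRNN-style) kernel.
    """
    num = den = 0.0
    for xi, yi in zip(X_train, y_train):
        d2 = sum(((a - b) / s) ** 2 for a, b, s in zip(xi, x, sigma))
        w = math.exp(-0.5 * d2)
        num += w * yi
        den += w
    # Fall back to the training mean if every kernel weight underflows to 0.
    return num / den if den > 0 else sum(y_train) / len(y_train)


def loo_rmse(X, y, features, sigma):
    """Leave-one-out RMSE of a GRNN restricted to the given feature indices."""
    Xs = [[row[j] for j in features] for row in X]
    bandwidths = [sigma] * len(features)  # assumption: one shared bandwidth
    sq = 0.0
    for i in range(len(Xs)):
        # Predict point i from all other points, then accumulate squared error.
        pred = grnn_predict(Xs[:i] + Xs[i + 1:], y[:i] + y[i + 1:], Xs[i], bandwidths)
        sq += (pred - y[i]) ** 2
    return math.sqrt(sq / len(Xs))


def exhaustive_search(X, y, sigma):
    """Score every non-empty feature subset; return (rmse, subset) pairs, best first."""
    n_features = len(X[0])
    results = []
    for k in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), k):
            results.append((loo_rmse(X, y, subset, sigma), subset))
    results.sort()  # smallest RMSE first
    return results


# With 13 candidate features there are 2**13 - 1 non-empty subsets.
n_models = sum(math.comb(13, k) for k in range(1, 14))  # 8191

# Toy data (made up for illustration): y depends only on feature 0.
X_toy = [[i / 10, (7 * i % 10) / 10, (3 * i % 10) / 10] for i in range(30)]
y_toy = [2.0 * row[0] for row in X_toy]
ranking = exhaustive_search(X_toy, y_toy, sigma=0.2)
best_rmse, best_subset = ranking[0]
```

On the toy data, the target depends only on feature 0, and the best-ranked subset includes that feature. The number of candidate subsets doubles with every added feature: 13 features already give 8191 subsets, each requiring a full leave-one-out pass. Exhaustive search therefore becomes impractical in higher-dimensional spaces.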

Future research will consider an extension of the approach to higher-dimensional spaces and to forward and backward FS techniques.

The models were implemented in Python. The newly developed AGRNN code is compatible with the most widespread Python libraries, such as pandas, NumPy and SciPy, and is fully integrated with scikit-learn, the most widely used Python library in machine learning.

References
[1] I. Guyon, A. Elisseeff. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 3, 1157-1182, 2003.
[2] S. Robert, L. Foresti, M. Kanevski. Spatial prediction of monthly wind speeds in complex terrain with adaptive general regression neural networks. International Journal of Climatology 33 (7), 1793-1804, 2013.
[3] M. Leuenberger, M. Kanevski. Extreme Learning Machines for spatial environmental data. Computers & Geosciences 85, 64-73, 2015.
[4] M. Kanevski, A. Pozdnoukhov, V. Timonin. Machine Learning for Spatial Environmental Data: Theory, Applications and Software. EPFL Press, 2009.

[ABSTRACT FROM AUTHOR]