gStab: An R Package for Measuring Stability of Feature Selection Methods
Abdul Wahid, Department of Mathematics and Statistics, Institute of Southern Punjab Multan, Pakistan.
Dost Muhammad Khan, Department of Statistics, Abdul Wali Khan University Mardan, Pakistan.
Corresponding Author:
Abdul Wahid (ab.wahid1996@gmail.com)
Abstract:
In this article, we present gStab, a new R package for evaluating the stability of feature selection methods. Its main characteristic is that it measures stability both within each selected subset and among the subsets selected by a feature selection method across different subsampling experiments. Hence, the gStab package is applicable in general scenarios, whether the feature selection method returns a constant number of features or the number of features chosen is not pre-determined by the user. First, the Least Absolute Shrinkage and Selection Operator (LASSO) is applied for feature selection on real-world and simulated datasets. Second, the average stability of the LASSO feature selection is computed using the gStab package. The value of the LASSO hyperparameter can then be tuned to yield higher stability. An important conclusion is that stability can potentially be optimized without a significant loss of accuracy, and that doing so, using the R package gStab, can help recognize the true underlying set of features.
Keywords:
Stability; Feature Selection; High-dimensional Data; Subsampling; gStab
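
The subsampling workflow summarized in the abstract can be illustrated with a minimal base R sketch. It does not use gStab or reproduce its stability measure: the average pairwise Jaccard similarity below is a generic stand-in, and the helper names (jaccard, mean_pairwise_jaccard, subsample_selections, top_k_correlation) are hypothetical and introduced only for illustration. The correlation-based selector likewise stands in for LASSO; a LASSO fit (for example from the glmnet package) could be passed as the selector instead.

```r
# Jaccard similarity between two sets of selected feature indices.
jaccard <- function(a, b) {
  u <- length(union(a, b))
  if (u == 0) return(1)  # assumption: two empty selections count as identical
  length(intersect(a, b)) / u
}

# Average Jaccard similarity over all pairs of selected subsets.
# This is a generic stability score, not the measure implemented in gStab.
mean_pairwise_jaccard <- function(subsets) {
  m <- length(subsets)
  stopifnot(m >= 2)
  pairs <- combn(m, 2)  # each column holds one pair of subset positions
  sims <- apply(pairs, 2, function(p) jaccard(subsets[[p[1]]], subsets[[p[2]]]))
  mean(sims)
}

# Repeat feature selection on random subsamples of the rows of x.
# `selector` is any function(x, y) returning indices of the selected columns.
subsample_selections <- function(x, y, selector, n_rep = 50, frac = 0.8) {
  n <- nrow(x)
  lapply(seq_len(n_rep), function(i) {
    idx <- sample(n, size = floor(frac * n))
    selector(x[idx, , drop = FALSE], y[idx])
  })
}

# Hypothetical stand-in for LASSO: keep the k columns most correlated with y.
top_k_correlation <- function(k) {
  function(x, y) {
    scores <- abs(cor(x, y))
    sort(order(scores, decreasing = TRUE)[seq_len(k)])
  }
}

# Simulated data in which only the first two features carry signal.
set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100)
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(100)

subsets <- subsample_selections(x, y, top_k_correlation(3), n_rep = 20)
stability <- mean_pairwise_jaccard(subsets)
print(stability)
```

Under these assumptions, a score near 1 means that nearly the same features are selected in every subsample, and a score near 0 means the selections barely overlap. Repeating the computation over a grid of hyperparameter values is one way to look for a setting that yields higher stability.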