Novel and Efficient Randomized Algorithms for Feature Selection

Feature selection is a crucial problem in efficient machine learning, and it also greatly contributes to the explainability of machine-driven decisions. Methods like decision trees and the Least Absolute Shrinkage and Selection Operator (LASSO) can select features during training. However, these embed...

Bibliographic Details
Main Authors: Zigeng Wang, Xia Xiao, Sanguthevar Rajasekaran
Format: Article
Language:English
Published: Tsinghua University Press 2020-09-01
Series:Big Data Mining and Analytics
Subjects: feature selection; randomized algorithms; efficient selection
Online Access:https://www.sciopen.com/article/10.26599/BDMA.2020.9020005
collection DOAJ
description Feature selection is a crucial problem in efficient machine learning, and it also greatly contributes to the explainability of machine-driven decisions. Methods like decision trees and the Least Absolute Shrinkage and Selection Operator (LASSO) can select features during training. However, these embedded approaches can be applied only to a small subset of machine learning models. Wrapper-based methods can select features independently of the machine learning model, but they often suffer from a high computational cost. To enhance their efficiency, many randomized algorithms have been designed. In this paper, we propose automatic breadth searching and attention searching adjustment approaches to further speed up randomized wrapper-based feature selection. We conduct a theoretical computational complexity analysis and further explain our algorithms' generic parallelizability. We conduct experiments on both synthetic and real datasets with different machine learning base models. Results show that, compared with existing approaches, our proposed techniques can locate a more meaningful set of features with higher efficiency.
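The abstract contrasts embedded methods with wrapper-based selection, where a base model is trained and evaluated on candidate feature subsets. As a generic illustration of randomized wrapper-based feature selection (not the paper's breadth searching or attention searching adjustment algorithms, which the record does not detail), here is a minimal sketch; the synthetic data, the 1-NN base model, and all function names are assumptions made for the example:

```python
import random

random.seed(0)

# Synthetic data: y depends only on features 0 and 1; the rest are noise.
n, d = 160, 8
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [3 * x[0] - 2 * x[1] + random.gauss(0, 0.1) for x in X]
train, val = range(120), range(120, n)

def wrapper_error(features):
    """Score a feature subset by the validation MSE of a 1-NN
    regressor (the "wrapped" base model) restricted to those features."""
    err = 0.0
    for i in val:
        # nearest training point, measuring distance only on the subset
        j = min(train, key=lambda t: sum((X[i][f] - X[t][f]) ** 2 for f in features))
        err += (y[i] - y[j]) ** 2
    return err / len(val)

def randomized_wrapper_search(trials=200, max_size=3):
    """Randomized wrapper feature selection: repeatedly score random
    subsets with the base model and keep the best-scoring one."""
    best, best_err = None, float("inf")
    for _ in range(trials):
        k = random.randint(1, max_size)
        subset = tuple(sorted(random.sample(range(d), k)))
        e = wrapper_error(subset)
        if e < best_err:
            best, best_err = subset, e
    return best, best_err

subset, err = randomized_wrapper_search()
print(subset)  # a good run recovers the informative features 0 and 1
```

The high cost the abstract mentions is visible here: every trial retrains/re-evaluates the base model, which is exactly what randomized strategies try to spend more wisely.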
id doaj-art-84ef381e566e41bebf74107fc29eb08a
institution Kabale University
issn 2096-0654
spelling Big Data Mining and Analytics (Tsinghua University Press), ISSN 2096-0654, Vol. 3, No. 3 (2020-09-01), pp. 208-224, DOI 10.26599/BDMA.2020.9020005
Author affiliations: Zigeng Wang, Xia Xiao, and Sanguthevar Rajasekaran are all with the Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.
topic feature selection
randomized algorithms
efficient selection