Novel and Efficient Randomized Algorithms for Feature Selection
Feature selection is a crucial problem in efficient machine learning, and it also greatly contributes to the explainability of machine-driven decisions. Methods such as decision trees and the Least Absolute Shrinkage and Selection Operator (LASSO) can select features during training. However, these embedded approaches can only be applied to a small subset of machine learning models. Wrapper-based methods can select features independently of machine learning models, but they often suffer from a high computational cost. To enhance their efficiency, many randomized algorithms have been designed. In this paper, we propose automatic breadth searching and attention searching adjustment approaches to further speed up randomized wrapper-based feature selection. We conduct a theoretical computational complexity analysis and further explain our algorithms' generic parallelizability. We conduct experiments on both synthetic and real datasets with different machine learning base models. Results show that, compared with existing approaches, our proposed techniques can locate a more meaningful set of features with high efficiency.
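To make the wrapper-based setting concrete, the sketch below shows a generic randomized wrapper loop: sample a random feature subset, score it with a swappable base model, and keep the best subset found. This is a minimal illustration of the family of methods the abstract discusses, not the paper's automatic breadth searching or attention searching adjustment algorithms; the function name, its parameters, and the scikit-learn base model are illustrative assumptions.

```python
# Hypothetical sketch of a generic randomized wrapper-based feature
# selector (NOT the paper's algorithms): evaluate random feature
# subsets with an arbitrary base model and keep the best one.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def random_wrapper_select(X, y, subset_size, n_trials=100, seed=0):
    """Return (best feature indices, CV score) over random subsets."""
    rng = np.random.default_rng(seed)
    best_score, best_subset = -np.inf, None
    for _ in range(n_trials):
        # Sample a candidate subset uniformly at random.
        subset = rng.choice(X.shape[1], size=subset_size, replace=False)
        # Wrapper step: the base model itself scores the subset,
        # so the selector works with any model, unlike embedded methods.
        model = LogisticRegression(max_iter=1000)  # stand-in base model
        score = cross_val_score(model, X[:, subset], y, cv=3).mean()
        if score > best_score:
            best_score, best_subset = score, subset
    return np.sort(best_subset), best_score
```

Because each trial is independent, such loops parallelize trivially across trials, which is one reason randomized wrapper methods are often described as generically parallelizable, as the abstract notes for the proposed algorithms.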
Main Authors: | Zigeng Wang, Xia Xiao, Sanguthevar Rajasekaran |
---|---|
Format: | Article |
Language: | English |
Published: | Tsinghua University Press, 2020-09-01 |
Series: | Big Data Mining and Analytics |
Subjects: | feature selection; randomized algorithms; efficient selection |
Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2020.9020005 |
---|---|
author | Zigeng Wang; Xia Xiao; Sanguthevar Rajasekaran |
collection | DOAJ |
description | Feature selection is a crucial problem in efficient machine learning, and it also greatly contributes to the explainability of machine-driven decisions. Methods such as decision trees and the Least Absolute Shrinkage and Selection Operator (LASSO) can select features during training. However, these embedded approaches can only be applied to a small subset of machine learning models. Wrapper-based methods can select features independently of machine learning models, but they often suffer from a high computational cost. To enhance their efficiency, many randomized algorithms have been designed. In this paper, we propose automatic breadth searching and attention searching adjustment approaches to further speed up randomized wrapper-based feature selection. We conduct a theoretical computational complexity analysis and further explain our algorithms' generic parallelizability. We conduct experiments on both synthetic and real datasets with different machine learning base models. Results show that, compared with existing approaches, our proposed techniques can locate a more meaningful set of features with high efficiency. |
format | Article |
id | doaj-art-84ef381e566e41bebf74107fc29eb08a |
institution | Kabale University |
issn | 2096-0654 |
language | English |
publishDate | 2020-09-01 |
publisher | Tsinghua University Press |
record_format | Article |
series | Big Data Mining and Analytics |
spelling | Big Data Mining and Analytics, Vol. 3, No. 3, pp. 208-224, 2020-09-01. ISSN 2096-0654. doi: 10.26599/BDMA.2020.9020005. Affiliations: all three authors are with the Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA. |
title | Novel and Efficient Randomized Algorithms for Feature Selection |
topic | feature selection; randomized algorithms; efficient selection |
url | https://www.sciopen.com/article/10.26599/BDMA.2020.9020005 |