Feature Selection with Graph Mining Technology

Many real-world applications suffer from high dimensionality, which existing algorithms cannot overcome. Feature selection is a critical data preprocessing step, and its lack of scalability degrades both the efficiency and performance of big data applications. In this research...

Bibliographic Details
Main Authors: Thosini Bamunu Mudiyanselage, Yanqing Zhang
Format: Article
Language: English
Published: Tsinghua University Press 2019-06-01
Series: Big Data Mining and Analytics
Subjects: graph mining; network embedding; big data analysis; feature selection; high-dimensional data
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2018.9020032
collection DOAJ
description Many real-world applications suffer from high dimensionality, which existing algorithms cannot overcome. Feature selection is a critical data preprocessing step, and its lack of scalability degrades both the efficiency and performance of big data applications. In this research, we developed a new algorithm that reduces the dimensionality of a problem using graph-based analysis while retaining the physical meaning of the original high-dimensional feature space. Most existing feature-selection methods rest on the strong assumption that features are independent of each other. However, if a feature-selection algorithm ignores the interdependencies within the feature space, the selected data fail to represent the original data correctly. We developed a new feature-selection method to address this challenge. Our aim was to examine the dependencies between features and select the optimal feature set with respect to the original data structure. Another important property of the proposed method is that it works even in the absence of class labels. This is a more difficult problem that many feature-selection algorithms fail to address; in this case, they rely only on wrapper techniques, which require a learning algorithm to select features. Our experimental results indicate that the proposed simple ranking method performs better than other methods, independent of the particular learning algorithm used.
id doaj-art-51216d57c7ae4f39aa729063d30bc32f
institution Kabale University
issn 2096-0654
spelling doaj-art-51216d57c7ae4f39aa729063d30bc32f; 2025-02-02T06:50:33Z; eng; Tsinghua University Press; Big Data Mining and Analytics; ISSN 2096-0654; 2019-06-01; vol. 2, no. 2, pp. 73-82; doi:10.26599/BDMA.2018.9020032; Feature Selection with Graph Mining Technology; Thosini Bamunu Mudiyanselage (Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA); Yanqing Zhang (Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA); https://www.sciopen.com/article/10.26599/BDMA.2018.9020032; graph mining; network embedding; big data analysis; feature selection; high-dimensional data
topic graph mining
network embedding
big data analysis
feature selection
high-dimensional data
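
The description above speaks of ranking features by their interdependencies in a graph, without class labels. The record gives no algorithmic detail, so the following is only a minimal sketch of that general idea and not the authors' method: it swaps in an absolute Pearson correlation graph, weighted-degree centrality as the ranking score, and a greedy redundancy filter. The function name `rank_features_by_graph` and the `redundancy_threshold` parameter are hypothetical, and the sketch assumes no feature column is constant.

```python
import numpy as np


def rank_features_by_graph(X, k, redundancy_threshold=0.9):
    """Pick up to k feature indices from an unlabeled matrix X (samples x features).

    Illustrative stand-in only: features are nodes, edge weights are absolute
    Pearson correlations, and nodes are ranked by weighted degree.
    May return fewer than k indices if too many features are redundant.
    """
    # Edge weights of the feature graph (assumes no constant columns).
    weights = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(weights, 0.0)  # no self-loops

    # Weighted degree centrality: how strongly a feature ties to all others.
    centrality = weights.sum(axis=1)
    order = np.argsort(-centrality)  # most central first

    selected = []
    for j in order:
        # Skip a feature that nearly duplicates one already selected.
        if all(weights[j, s] < redundancy_threshold for s in selected):
            selected.append(int(j))
        if len(selected) == k:
            break
    return selected


# Toy data: columns 0 and 1 are perfectly correlated, column 2 is unrelated.
a = np.arange(100.0)
X = np.column_stack([a, 2.0 * a + 1.0, np.where(a % 2 == 0, 1.0, -1.0)])
picked = rank_features_by_graph(X, k=2)
```

On this toy matrix only one of the two duplicated columns should survive alongside the alternating column. Because the score uses no labels and no downstream learner, it is a filter-style ranking rather than a wrapper technique.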