Random k conditional nearest neighbor for high-dimensional data
Main Authors: | Jiaxuan Lu, Hyukjun Gweon |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc., 2025-01-01 |
Series: | PeerJ Computer Science |
Subjects: | K nearest neighbor; High-dimensional data; Nonparametric classification |
Online Access: | https://peerj.com/articles/cs-2497.pdf |
author | Jiaxuan Lu, Hyukjun Gweon |
---|---|
collection | DOAJ |
description | The k nearest neighbor (kNN) approach is a simple and effective algorithm for classification, and a number of variants have been proposed based on the kNN algorithm. One of the limitations of kNN is that the method may be less effective when data contains many noisy features due to their non-informative influence in calculating distance. Additionally, information derived from nearest neighbors may be less meaningful in high-dimensional data. To address the limitation of nearest-neighbor-based approaches in high-dimensional data, we propose to extend the k conditional nearest neighbor (kCNN) method, which is an effective variant of kNN. The proposed approach aggregates multiple kCNN classifiers, each constructed from a randomly sampled feature subset. We also develop a score metric to weigh individual classifiers based on the level of separation of the feature subsets. We investigate the properties of the proposed method using simulation. Moreover, the experiments on gene expression datasets show that the proposed method is promising in terms of predictive classification performance. |
format | Article |
id | doaj-art-f9bf41fdef6e435a825d66a560d146a6 |
institution | Kabale University |
issn | 2376-5992 |
language | English |
publishDate | 2025-01-01 |
publisher | PeerJ Inc. |
series | PeerJ Computer Science |
doi | 10.7717/peerj-cs.2497 |
volume | 11, article e2497 |
affiliation | University of Western Ontario, London, ON, Canada (both authors) |
title | Random k conditional nearest neighbor for high-dimensional data |
topic | K nearest neighbor; High-dimensional data; Nonparametric classification |
url | https://peerj.com/articles/cs-2497.pdf |
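The abstract describes the method only at a high level: several kCNN classifiers, each built on a randomly sampled feature subset, are combined using weights derived from a separation score. The Python sketch below illustrates that ensemble structure under stated assumptions, not the authors' implementation. It assumes a single kCNN classifier scores each class by the inverse of the distance from the query to its k-th nearest training point within that class, normalized to sum to one. The paper's separation-based score metric is not reproduced; `weight_fn` is a hypothetical hook that defaults to equal weights. The names `kcnn_scores` and `random_kcnn_predict` and the parameters `n_models` and `subset_size` are illustrative only.

```python
import math
import random
from collections import defaultdict


def kcnn_scores(X_train, y_train, x, k, features):
    """Normalized class scores from one kCNN classifier on a feature subset.

    Assumption: each class c is scored by 1 / d_kc, where d_kc is the Euclidean
    distance (over `features` only) from x to its k-th nearest training point
    of class c.
    """
    dists_by_class = defaultdict(list)
    for xi, yi in zip(X_train, y_train):
        d = math.sqrt(sum((xi[j] - x[j]) ** 2 for j in features))
        dists_by_class[yi].append(d)

    inv = {}
    for c, dists in dists_by_class.items():
        dists.sort()
        # k-th nearest neighbor within class c (capped at the class size)
        d_k = dists[min(k, len(dists)) - 1]
        inv[c] = 1.0 / (d_k + 1e-12)  # epsilon guards against d_k == 0
    total = sum(inv.values())
    return {c: v / total for c, v in inv.items()}


def random_kcnn_predict(X_train, y_train, x, k=3, n_models=50,
                        subset_size=5, weight_fn=None, seed=0):
    """Aggregate kCNN scores over randomly sampled feature subsets.

    weight_fn(X_train, y_train, features) is a hypothetical stand-in for the
    paper's separation score; None means every classifier gets weight 1.
    """
    rng = random.Random(seed)
    p = len(x)
    totals = defaultdict(float)
    for _ in range(n_models):
        # Draw a feature subset without replacement for this ensemble member
        features = rng.sample(range(p), min(subset_size, p))
        w = 1.0 if weight_fn is None else weight_fn(X_train, y_train, features)
        for c, s in kcnn_scores(X_train, y_train, x, k, features).items():
            totals[c] += w * s
    # Predict the class with the largest weighted aggregate score
    return max(totals, key=totals.get)
```

Sampling a separate feature subset for each member means that, when most features are noise, some members still see mostly informative features; a separation-based weight would then let those members dominate the vote, which is the role the abstract assigns to its score metric.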