Random k conditional nearest neighbor for high-dimensional data

Bibliographic Details
Main Authors: Jiaxuan Lu, Hyukjun Gweon
Format: Article
Language: English
Published: PeerJ Inc. 2025-01-01
Series: PeerJ Computer Science
Subjects: K nearest neighbor; High-dimensional data; Nonparametric classification
Online Access: https://peerj.com/articles/cs-2497.pdf
author Jiaxuan Lu
Hyukjun Gweon
collection DOAJ
description The k nearest neighbor (kNN) approach is a simple and effective classification algorithm, and a number of variants based on it have been proposed. One limitation of kNN is that it may be less effective when the data contain many noisy features, because non-informative features still influence the distance calculation. Additionally, information derived from nearest neighbors may be less meaningful in high-dimensional data. To address this limitation of nearest-neighbor-based approaches in high-dimensional data, we propose an extension of the k conditional nearest neighbor (kCNN) method, an effective variant of kNN. The proposed approach aggregates multiple kCNN classifiers, each constructed from a randomly sampled feature subset. We also develop a score metric to weight individual classifiers based on the level of separation of their feature subsets. We investigate the properties of the proposed method using simulation. Moreover, experiments on gene expression datasets show that the proposed method is promising in terms of predictive classification performance.
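The aggregation idea in the description above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the record does not give the kCNN scoring rule or the separation score, so both are stand-ins chosen here. `class_scores` assumes an inverse distance rule based on the k-th nearest neighbor within each class, and `separation_weight` is a hypothetical centroid-based ratio. All function names and parameters (`k`, `n_models`, `n_features`, `seed`) are assumptions made for the example.

```python
import numpy as np

def kth_class_distances(X_train, y_train, x, k, classes):
    """Distance from x to its k-th nearest training point within each class."""
    out = np.empty(len(classes))
    for j, c in enumerate(classes):
        d = np.sort(np.linalg.norm(X_train[y_train == c] - x, axis=1))
        out[j] = d[min(k, len(d)) - 1]  # fall back to the farthest point if a class has < k points
    return out

def class_scores(X_train, y_train, x, k, classes, eps=1e-12):
    """Assumed scoring rule (not from the source): normalized inverse distances."""
    inv = 1.0 / (kth_class_distances(X_train, y_train, x, k, classes) + eps)
    return inv / inv.sum()

def separation_weight(X_sub, y, classes, eps=1e-12):
    """Hypothetical weight: spread of class centroids divided by within-class spread."""
    centroids = np.array([X_sub[y == c].mean(axis=0) for c in classes])
    between = np.linalg.norm(centroids - centroids.mean(axis=0), axis=1).mean()
    within = np.mean([
        np.linalg.norm(X_sub[y == c] - centroids[j], axis=1).mean()
        for j, c in enumerate(classes)
    ])
    return between / (within + eps)

def random_subspace_predict(X_train, y_train, X_test, k=3,
                            n_models=50, n_features=5, seed=0):
    """Sum weighted class scores over classifiers built on random feature subsets."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y_train)
    total = np.zeros((len(X_test), len(classes)))
    for _ in range(n_models):
        # Each base classifier sees only a random subset of the features.
        cols = rng.choice(X_train.shape[1], size=n_features, replace=False)
        X_sub = X_train[:, cols]
        w = separation_weight(X_sub, y_train, classes)
        for i, x in enumerate(X_test):
            total[i] += w * class_scores(X_sub, y_train, x[cols], k, classes)
    return classes[total.argmax(axis=1)]
```

Under these assumptions, feature subsets that separate the classes poorly receive small weights, so classifiers built mostly on noisy features contribute less to the final vote.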
format Article
id doaj-art-f9bf41fdef6e435a825d66a560d146a6
institution Kabale University
issn 2376-5992
language English
publishDate 2025-01-01
publisher PeerJ Inc.
record_format Article
series PeerJ Computer Science
spelling doaj-art-f9bf41fdef6e435a825d66a560d146a6; 2025-01-26T15:05:15Z; eng; PeerJ Inc.; PeerJ Computer Science; 2376-5992; 2025-01-01; 11; e2497; 10.7717/peerj-cs.2497; Random k conditional nearest neighbor for high-dimensional data; Jiaxuan Lu (University of Western Ontario, London, ON, Canada); Hyukjun Gweon (University of Western Ontario, London, ON, Canada); https://peerj.com/articles/cs-2497.pdf; K nearest neighbor; High-dimensional data; Nonparametric classification
topic K nearest neighbor
High-dimensional data
Nonparametric classification
url https://peerj.com/articles/cs-2497.pdf