Random k conditional nearest neighbor for high-dimensional data

Bibliographic Details
Main Authors:	Jiaxuan Lu, Hyukjun Gweon
Format:	Article
Language:	English
Published:	PeerJ Inc. 2025-01-01
Series:	PeerJ Computer Science
Subjects:	K nearest neighbor; High-dimensional data; Nonparametric classification
Online Access:	https://peerj.com/articles/cs-2497.pdf
Affiliation:	University of Western Ontario, London, ON, Canada (both authors)
Volume:	11
Article number:	e2497
DOI:	10.7717/peerj-cs.2497
ISSN:	2376-5992
Collection:	DOAJ

Description

The k nearest neighbor (kNN) approach is a simple and effective algorithm for classification, and a number of variants have been proposed based on it. One limitation of kNN is that it may be less effective when the data contain many noisy features, because these non-informative features still influence the distance calculation. In addition, information derived from nearest neighbors may be less meaningful in high-dimensional data. To address this limitation of nearest-neighbor-based approaches in high-dimensional data, we propose to extend the k conditional nearest neighbor (kCNN) method, an effective variant of kNN. The proposed approach aggregates multiple kCNN classifiers, each constructed from a randomly sampled feature subset. We also develop a score metric to weight individual classifiers based on the level of separation of their feature subsets. We investigate the properties of the proposed method using simulation. Moreover, experiments on gene expression datasets show that the proposed method is promising in terms of predictive classification performance.
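The description above outlines the method only at a high level: train many kCNN classifiers, each on a random subset of features, and combine them using weights derived from how well each subset separates the classes. The following is a minimal, non-authoritative sketch of that idea in Python with NumPy. Several parts are assumptions not stated in this record: the kCNN rule used here (the probability of each class is taken as proportional to the inverse distance from the query to its k-th nearest neighbor within that class) is our reading of kCNN rather than the authors' code; `separation_score` is a hypothetical Fisher-style stand-in, since the paper's actual score metric is not defined here; and all function names and defaults (`kcnn_proba`, `separation_score`, `random_kcnn_predict`, `n_models`, `subset_size`) are illustrative.

```python
import numpy as np


def kcnn_proba(X_train, y_train, X_test, k=1, eps=1e-12):
    """kCNN-style class probabilities (ASSUMED form, not the paper's code).

    For each class c, find the distance from every test point to its k-th
    nearest training point *within class c*; the probability of c is
    assumed to be proportional to the inverse of that distance.
    """
    classes = np.unique(y_train)
    inv = np.empty((X_test.shape[0], classes.size))
    for j, c in enumerate(classes):
        Xc = X_train[y_train == c]
        # Euclidean distances: rows = test points, columns = class-c training points
        d = np.linalg.norm(X_test[:, None, :] - Xc[None, :, :], axis=2)
        kk = min(k, Xc.shape[0])            # guard against classes with fewer than k points
        dk = np.sort(d, axis=1)[:, kk - 1]  # k-th smallest within-class distance
        inv[:, j] = 1.0 / (dk + eps)
    return classes, inv / inv.sum(axis=1, keepdims=True)


def separation_score(X, y):
    """HYPOTHETICAL Fisher-style score: between-class vs. within-class scatter."""
    between, within = 0.0, 0.0
    overall = X.mean(axis=0)
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        between += Xc.shape[0] * np.sum((mu - overall) ** 2)
        within += np.sum((Xc - mu) ** 2)
    return between / (within + 1e-12)


def random_kcnn_predict(X_train, y_train, X_test, n_models=50,
                        subset_size=10, k=1, seed=0):
    """Weighted ensemble of kCNN classifiers built on random feature subsets."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y_train)
    p = X_train.shape[1]
    m = min(subset_size, p)
    weights, probas = [], []
    for _ in range(n_models):
        feats = rng.choice(p, size=m, replace=False)  # random feature subset
        _, proba = kcnn_proba(X_train[:, feats], y_train, X_test[:, feats], k=k)
        weights.append(separation_score(X_train[:, feats], y_train))
        probas.append(proba)
    w = np.asarray(weights)
    if w.sum() <= 0:  # fall back to equal weights if no subset separates the classes
        w = np.ones_like(w)
    # Weighted average of the per-subset (n_test x n_classes) probability matrices
    agg = np.tensordot(w / w.sum(), np.stack(probas), axes=1)
    return classes[np.argmax(agg, axis=1)], agg


if __name__ == "__main__":
    # Synthetic data: only the first 2 of 50 features carry class signal
    rng = np.random.default_rng(1)
    n, p = 200, 50
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, p))
    X[:, :2] += 2.0 * y[:, None]
    pred, proba = random_kcnn_predict(X[:150], y[:150], X[150:],
                                      n_models=30, subset_size=5)
    print("test accuracy:", np.mean(pred == y[150:]))
```

Weighting each subset by a separation score lets subsets that happen to contain informative features dominate the vote, while subsets made up mostly of noise features contribute little. This is the intuition the description gives for why the ensemble should cope better with many noisy features than plain kNN on all features.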