Linear Dimensionality Reduction: What Is Better?

Bibliographic Details
Main Authors: Mohit Baliyan, Evgeny M. Mirkes
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Data
Online Access: https://www.mdpi.com/2306-5729/10/5/70
Description
Summary: This paper addresses dimensionality reduction, a major subproblem in any data processing operation. Dimensionality reduction based on principal components is the most widely used methodology. The paper examines three heuristics, namely Kaiser’s rule, the broken stick, and the conditional number rule, for selecting informative principal components when principal component analysis is used to reduce high-dimensional data to lower dimensions. The study uses 22 classification datasets and three classifiers, namely Fisher’s discriminant classifier, logistic regression, and k-nearest neighbors, to test the effectiveness of the three heuristics. The results show that there is no universal answer as to the best intrinsic dimension, but the conditional number heuristic performs better on average. This makes the conditional number heuristic the best candidate of the three for automatic data pre-processing.
ISSN: 2306-5729
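
Illustration: the three component-selection heuristics named in the summary could be sketched roughly as follows. This is a minimal sketch using common textbook formulations, not the authors' code. The record does not define the rules, so the function names, the covariance-based eigenvalue computation, and in particular the threshold max_cond=10.0 are assumptions made for illustration only.

```python
import numpy as np


def pca_eigenvalues(X):
    """Eigenvalues of the sample covariance matrix, largest first."""
    cov = np.cov(X, rowvar=False)          # np.cov centers the columns itself
    eig = np.linalg.eigvalsh(cov)[::-1]    # eigvalsh returns ascending order
    return np.clip(eig, 0.0, None)         # guard against tiny negative values


def kaiser_rule(eig):
    """Keep components whose eigenvalue exceeds the mean eigenvalue."""
    eig = np.asarray(eig, dtype=float)
    return int(np.sum(eig > eig.mean()))


def broken_stick(eig):
    """Keep leading components whose variance share beats the broken-stick
    expectation b_i = (1/d) * sum_{j=i..d} 1/j; stop at the first failure."""
    eig = np.asarray(eig, dtype=float)
    d = len(eig)
    share = eig / eig.sum()
    expected = np.cumsum(1.0 / np.arange(d, 0, -1))[::-1] / d
    k = 0
    for s, b in zip(share, expected):
        if s <= b:
            break
        k += 1
    return k


def conditional_number_rule(eig, max_cond=10.0):
    """Keep components with eigenvalue >= largest eigenvalue / max_cond.
    max_cond=10.0 is an assumed threshold, not taken from the record."""
    eig = np.asarray(eig, dtype=float)
    return int(np.sum(eig >= eig[0] / max_cond))


# Usage on synthetic data (hypothetical example, not one of the 22 datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) * np.array([3.0, 2.0, 1.0, 0.5, 0.3, 0.2])
eig = pca_eigenvalues(X)
print("Kaiser:", kaiser_rule(eig))
print("Broken stick:", broken_stick(eig))
print("Conditional number:", conditional_number_rule(eig))
```

Each function returns only a number of components to retain. The three rules can disagree on the same spectrum, which is consistent with the summary's remark that there is no universal answer to the best intrinsic dimension.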