Predicting financial distress in high-dimensional imbalanced datasets: a multi-heterogeneous self-paced ensemble learning framework

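Abstract: Financial distress prediction (FDP) is a critical area of study for researchers, industry stakeholders, and regulatory authorities. However, FDP tasks present several challenges, including high-dimensional datasets, class imbalance, and complex parameter optimization. These issues often hinder a predictive model’s ability to accurately identify companies at high risk of financial distress. To mitigate these challenges, we introduce FinMHSPE, a novel multi-heterogeneous self-paced ensemble (MHSPE) learning framework for FDP. The proposed model uses pairwise comparisons of data from multiple time frames, combined with the maximum relevance and minimum redundancy (mRMR) method, to select an optimal subset of features, which addresses the high-dimensionality issue. The framework then uses the MHSPE model to iteratively identify the most informative majority-class samples, which addresses the class imbalance issue. The model’s parameters are optimized with the particle swarm optimization (PSO) algorithm. The robustness of the proposed model is validated through extensive experiments on a financial dataset of Chinese listed companies. The empirical results demonstrate that the proposed model outperforms competing FDP models: FinMHSPE achieves the highest performance, with an area under the curve (AUC) of 0.9574, considerably surpassing the other methods evaluated. A comparative analysis of AUC values further shows that FinMHSPE outperforms state-of-the-art approaches that rely on financial features as inputs. In addition, our investigation identifies several features that enhance FDP model performance, notably those associated with a company’s information and growth potential.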
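The abstract describes selecting features with the maximum relevance and minimum redundancy (mRMR) method. As a rough illustration only, here is a minimal sketch of the common greedy "difference" form of mRMR: at each step it picks the feature whose mutual information with the label, minus its mean mutual information with the features already chosen, is largest. It assumes features have already been discretized (continuous financial ratios would need binning first) and does not reproduce the paper's multi-time-frame pairwise comparisons; `mutual_information`, `mrmr_select`, and the toy feature names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information I(X;Y) in nats between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum of p(a,b) * log(p(a,b) / (p(a) * p(b))) over observed pairs.
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr_select(features: dict[str, Sequence[int]], target: Sequence[int], k: int) -> list[str]:
    """Greedy mRMR, difference form: relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected: list[str] = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)
        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with `x1_dup` an exact copy of `x1`, `mrmr_select({"x1": [0, 0, 1, 1], "x1_dup": [0, 0, 1, 1], "x2": [0, 1, 0, 1]}, [0, 1, 1, 1], k=2)` keeps `x1` and `x2`: the duplicate is equally relevant but fully redundant, so the redundancy term pushes its score below zero.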

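For class imbalance, the abstract says the MHSPE model iteratively identifies the most informative majority-class samples, but it does not spell out the procedure. The sketch below shows only the sampling step of a generic self-paced ensemble (SPE) undersampler, which is an assumption about what MHSPE builds on, not the authors' method. Majority samples are split into equal-width bins by hardness (assumed non-negative, e.g. the current ensemble's predicted distress probability for a healthy firm); each bin gets weight 1 / (mean bin hardness + alpha), with self-paced factor alpha = tan(pi * i / (2n)) in round i of n; and roughly as many majority samples as there are minority samples are drawn. The function name, its parameters, and the omission of the heterogeneous base learners are all hypothetical simplifications.

```python
import math
import random
from typing import Sequence


def self_paced_undersample(hardness: Sequence[float], n_minority: int,
                           iteration: int, n_iterations: int,
                           n_bins: int = 10, seed: int = 0) -> list[int]:
    """Pick majority-class indices for one self-paced round (SPE-style sketch).

    hardness[j] is assumed to be the current ensemble's non-negative error on
    majority sample j, e.g. its predicted probability of the distress class.
    """
    if not 1 <= iteration <= n_iterations:
        raise ValueError("iteration must be in 1..n_iterations")
    lo, hi = min(hardness), max(hardness)
    width = (hi - lo) / n_bins or 1.0  # all-equal hardness: everything lands in bin 0
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for idx, h in enumerate(hardness):
        bins[min(int((h - lo) / width), n_bins - 1)].append(idx)

    # Self-paced factor: small in early rounds, very large in the final round.
    alpha = math.tan(math.pi * iteration / (2 * n_iterations))
    weights = [
        1.0 / (sum(hardness[i] for i in members) / len(members) + alpha) if members else 0.0
        for members in bins
    ]
    total = sum(weights)

    rng = random.Random(seed)
    keep: list[int] = []
    for members, weight in zip(bins, weights):
        # Quota proportional to the bin weight, capped by the bin's population.
        quota = min(len(members), round(n_minority * weight / total))
        keep.extend(rng.sample(members, quota))
    return sorted(keep)
```

With 90 easy (hardness 0.0) and 10 hard (hardness 0.9) majority samples and 20 minority samples, round 1 of 10 keeps 17 easy and 3 hard samples, while round 10 keeps 10 of each. As alpha grows, the bin weights even out, so the scarce hard samples make up a larger share of later training subsets. Because of rounding and bin caps, the total kept can differ slightly from `n_minority`.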

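The abstract states that particle swarm optimization (PSO) tunes the model's parameters but gives no swarm settings. Here is a minimal global-best PSO sketch, assuming box-bounded continuous parameters and an objective to minimize. The inertia weight `w`, acceleration coefficients `c1` and `c2`, swarm size, and iteration count are illustrative defaults, not values from the paper, and `pso_minimize` is a hypothetical name.

```python
import random
from typing import Callable, Sequence


def pso_minimize(objective: Callable[[list[float]], float],
                 bounds: Sequence[tuple[float, float]],
                 n_particles: int = 20, n_iters: int = 100,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> tuple[list[float], float]:
    """Global-best PSO: returns (best position, best objective value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For hyperparameter tuning, `objective` would typically wrap model training and return something like the negative validation AUC, with integer parameters rounded inside the objective. That wiring is an assumption about how such a setup could look, not the paper's configuration. On a simple test function such as `sum(v * v for v in x)` over `[(-5.0, 5.0), (-5.0, 5.0)]`, the swarm converges close to the origin.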

Bibliographic Details
Main Authors:	Ruize Gao, Shaoze Cui, Yu Wang, Wei Xu
Format:	Article
Language:	English
Published:	SpringerOpen 2025-01-01
Series:	Financial Innovation
Subjects:	Financial distress prediction; Feature selection; Imbalanced data; Ensemble learning; Particle swarm optimization
Online Access:	https://doi.org/10.1186/s40854-024-00745-w

collection	DOAJ
description	Financial distress prediction (FDP) is a critical area of study for researchers, industry stakeholders, and regulatory authorities. However, FDP tasks present several challenges, including high-dimensional datasets, class imbalance, and complex parameter optimization. These issues often hinder a predictive model’s ability to accurately identify companies at high risk of financial distress. To mitigate these challenges, we introduce FinMHSPE, a novel multi-heterogeneous self-paced ensemble (MHSPE) learning framework for FDP. The proposed model uses pairwise comparisons of data from multiple time frames, combined with the maximum relevance and minimum redundancy (mRMR) method, to select an optimal subset of features, which addresses the high-dimensionality issue. The framework then uses the MHSPE model to iteratively identify the most informative majority-class samples, which addresses the class imbalance issue. The model’s parameters are optimized with the particle swarm optimization (PSO) algorithm. The robustness of the proposed model is validated through extensive experiments on a financial dataset of Chinese listed companies. The empirical results demonstrate that the proposed model outperforms competing FDP models: FinMHSPE achieves the highest performance, with an area under the curve (AUC) of 0.9574, considerably surpassing the other methods evaluated. A comparative analysis of AUC values further shows that FinMHSPE outperforms state-of-the-art approaches that rely on financial features as inputs. In addition, our investigation identifies several features that enhance FDP model performance, notably those associated with a company’s information and growth potential.
id	doaj-art-0bfa2dc32ff94535ab5a4167fdf4f7ee
institution	Kabale University
issn	2199-4730
affiliations	Tsinghua University; Beijing Institute of Technology; Chongqing University; Jiangnan University

