Predicting financial distress in high-dimensional imbalanced datasets: a multi-heterogeneous self-paced ensemble learning framework
Abstract Financial distress prediction (FDP) is a critical area of study for researchers, industry stakeholders, and regulatory authorities. However, FDP tasks present several challenges, including high-dimensional datasets, class imbalances, and the complexity of parameter optimization. These issue...
Main Authors: | Ruize Gao, Shaoze Cui, Yu Wang, Wei Xu |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2025-01-01 |
Series: | Financial Innovation |
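Subjects: | Financial distress prediction; Feature selection; Imbalanced data; Ensemble learning; Particle swarm optimization |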
Online Access: | https://doi.org/10.1186/s40854-024-00745-w |
_version_ | 1832594456740626432 |
---|---|
author | Ruize Gao Shaoze Cui Yu Wang Wei Xu |
author_sort | Ruize Gao |
collection | DOAJ |
description | Abstract Financial distress prediction (FDP) is a critical area of study for researchers, industry stakeholders, and regulatory authorities. However, FDP tasks present several challenges, including high-dimensional datasets, class imbalance, and the complexity of parameter optimization. These issues often hinder a predictive model’s ability to accurately identify companies at high risk of financial distress. To mitigate these challenges, we introduce FinMHSPE, a novel multi-heterogeneous self-paced ensemble (MHSPE) learning framework for FDP. The proposed model uses pairwise comparisons of data from multiple time frames, combined with the maximum relevance and minimum redundancy method, to select an optimal subset of features, addressing the high-dimensionality issue. Furthermore, the framework incorporates the MHSPE model to iteratively identify the most informative majority-class samples, addressing the class imbalance issue. To optimize the model’s parameters, we use the particle swarm optimization algorithm. The robustness of the proposed model is validated through extensive experiments on a financial dataset of Chinese listed companies. The empirical results show that the proposed model outperforms existing competing models in the field of FDP. Specifically, the FinMHSPE framework achieves the highest performance, with an area under the curve (AUC) value of 0.9574, considerably surpassing all existing methods. A comparative analysis of AUC values further reveals that FinMHSPE outperforms state-of-the-art approaches that rely on financial features as inputs. Finally, our investigation identifies several valuable features for enhancing FDP model performance, notably those associated with a company’s information and growth potential. |
format | Article |
id | doaj-art-0bfa2dc32ff94535ab5a4167fdf4f7ee |
institution | Kabale University |
issn | 2199-4730 |
language | English |
publishDate | 2025-01-01 |
publisher | SpringerOpen |
record_format | Article |
series | Financial Innovation |
spelling | doaj-art-0bfa2dc32ff94535ab5a4167fdf4f7ee; 2025-01-19T12:36:08Z; eng; SpringerOpen; Financial Innovation; 2199-4730; 2025-01-01; 111134; 10.1186/s40854-024-00745-w; Predicting financial distress in high-dimensional imbalanced datasets: a multi-heterogeneous self-paced ensemble learning framework; Ruize Gao 0; Shaoze Cui 1; Yu Wang 2; Wei Xu 3; Tsinghua University; Beijing Institute of Technology; Chongqing University; Jiangnan University; https://doi.org/10.1186/s40854-024-00745-w; Financial distress prediction; Feature selection; Imbalanced data; Ensemble learning; Particle swarm optimization |
title | Predicting financial distress in high-dimensional imbalanced datasets: a multi-heterogeneous self-paced ensemble learning framework |
topic | Financial distress prediction Feature selection Imbalanced data Ensemble learning Particle swarm optimization |
url | https://doi.org/10.1186/s40854-024-00745-w |
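
The abstract reports model quality as an area under the curve (AUC) value. As a rough illustration of what that number measures on an imbalanced sample, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive case receives the higher score, counting ties as half. The function name `roc_auc`, the labels, and the scores are hypothetical and are not taken from the article; this is not the authors' evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive score and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample from each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical imbalanced sample: 2 distressed firms (label 1), 8 healthy (label 0).
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.4, 0.35, 0.1, 0.3, 0.05, 0.15, 0.25, 0.12]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")  # 15 of 16 pairs are ranked correctly -> 0.9375
```

Because the measure depends only on how positives rank against negatives, it is not inflated by the majority class the way plain accuracy can be, which is one reason it is commonly reported for imbalanced problems. For real data, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently than this quadratic pairwise loop.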