Machine Learning Algorithms Analysis of Synthetic Minority Oversampling Technique (SMOTE): Application to Credit Default Prediction
Main Authors:
Format: Article
Language: English
Published: Accademia Piceno Aprutina dei Velati, 2024-12-01
Series: Ratio Mathematica
Subjects:
Online Access: http://eiris.it/ojs/index.php/ratiomathematica/article/view/1601
Summary: Credit default prediction is an important problem in financial risk management: it aims to estimate the likelihood that borrowers will default on their loan commitments. However, the datasets that guide machine-learning modelling for such data-driven support typically suffer from class imbalance, that is, an uneven distribution of classes within the dataset. The problem arises in classification tasks whenever the distribution of classes or labels is not uniform. A common remedy is to resample the data, either by adding entries to the minority class or by removing entries from the majority class.
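As a minimal sketch of the three balancing approaches the abstract names, the resampling step might look as follows with the imbalanced-learn library; the bank's dataset is not public, so a synthetic imbalanced dataset from scikit-learn stands in for it, and all names here are illustrative rather than the study's own code:

```python
# Sketch of the three balancing approaches, assuming imbalanced-learn.
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Synthetic stand-in: 5% defaulters (minority class), 95% non-defaulters.
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)

# Under-sampling: drop majority-class rows until the classes match.
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)

# Oversampling: duplicate minority-class rows until the classes match.
X_over, y_over = RandomOverSampler(random_state=0).fit_resample(X, y)

# SMOTE: synthesize new minority rows by interpolating between
# a minority sample and its nearest minority-class neighbors.
X_smote, y_smote = SMOTE(random_state=0).fit_resample(X, y)
```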
The present study examines the efficacy of classification algorithms under various data balancing approaches. The dataset was collected from a well-known commercial bank in Ghana. To resolve the imbalance, three data balancing approaches were used: under-sampling, oversampling, and the synthetic minority oversampling technique (SMOTE). With the exception of the SMOTE dataset, XGBoost consistently beat the other classifiers across the other datasets in terms of AUC. Random forest, decision tree, and logistic regression also performed well and could serve as alternatives to XGBoost for building credit scoring models. The findings demonstrate that classifiers trained on balanced datasets achieve higher sensitivity scores than those trained on the original skewed dataset, while retaining their capacity to differentiate between defaulters and non-defaulters. This demonstrates the value of data balancing strategies in improving a model's ability to detect minority-class occurrences. Hence, the major finding, that oversampling outperforms under-sampling across classifiers and evaluation measures, is affirmed.
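The classifier comparison the abstract describes could likewise be sketched as below, under stated assumptions: scikit-learn's GradientBoostingClassifier stands in for XGBoost to keep the sketch dependency-light, the data is synthetic, and the printed scores are illustrative, not the study's results:

```python
# Illustrative comparison loop: train each classifier on a SMOTE-balanced
# training split and score AUC and sensitivity (defaulter-class recall)
# on the untouched, still-imbalanced test split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, recall_score
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Balance only the training split; the test split keeps the real skew.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    sensitivity = recall_score(y_test, model.predict(X_test))
    print(f"{name}: AUC={auc:.3f}, sensitivity={sensitivity:.3f}")
```

Balancing only the training split is the usual design choice here: it lets the models learn from an even class distribution while the evaluation still reflects the imbalanced conditions the deployed scorecard would face.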
ISSN: 1592-7415; 2282-8214