Machine Learning Algorithms Analysis of Synthetic Minority Oversampling Technique (SMOTE): Application to Credit Default Prediction

Bibliographic Details
Main Authors:	Emmanuel de-Graft Johnson Owusu-Ansah, Richard Doamekpor, Richard Kodzo Avuglah, Yaa Kyere Adwubi
Format:	Article
Language:	English
Published:	Accademia Piceno Aprutina dei Velati 2024-12-01
Series:	Ratio Mathematica
Subjects:	credit scoring, smote, oversampling, undersampling, class imbalance, machine learning algorithms
Online Access:	http://eiris.it/ojs/index.php/ratiomathematica/article/view/1601

collection	DOAJ
description	Credit default prediction is an important problem in financial risk management: it aims to estimate the likelihood that borrowers will fail to meet their loan commitments. However, the datasets used to build machine learning models for data-driven decision support often suffer from class imbalance, that is, an unequal distribution of classes within a dataset. The problem commonly arises in classification tasks when the classes or labels are not uniformly distributed. A common remedy is to resample the data by adding entries to the minority class or removing entries from the majority class. The present study examines the efficacy of classification algorithms under various data balancing approaches, using a dataset collected from a well-known commercial bank in Ghana. Three data balancing approaches were used to resolve the imbalance: under-sampling, oversampling, and the synthetic minority oversampling technique (SMOTE). With the exception of the SMOTE-balanced dataset, XGBoost consistently outperformed the other classifiers across the remaining datasets in terms of AUC. Random forest, decision tree, and logistic regression all performed well and could serve as alternatives to XGBoost for developing credit scoring models. The findings show that classifiers trained on balanced datasets achieve higher sensitivity than those trained on the original skewed dataset, while retaining their ability to discriminate between defaulters and non-defaulters. This demonstrates the value of data balancing strategies in improving models' ability to predict minority-class occurrences. The main finding is that oversampling outperforms under-sampling across classifiers and evaluation measures.
format	Article
id	doaj-art-d6235b7b2ad543cb9c9ddf2b882cb5db
institution	Kabale University
issn	1592-7415 2282-8214
language	English
publishDate	2024-12-01
publisher	Accademia Piceno Aprutina dei Velati
record_format	Article
series	Ratio Mathematica
doi	10.23755/rm.v53i0.1601
affiliation	Department of Statistics and Actuarial Science, Kwame Nkrumah University of Science and Technology, Kumasi (all four authors)
title	Machine Learning Algorithms Analysis of Synthetic Minority Oversampling Technique (SMOTE): Application to Credit Default Prediction
topic	credit scoring, smote, oversampling, undersampling, class imbalance, machine learning algorithms
url	http://eiris.it/ojs/index.php/ratiomathematica/article/view/1601
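The record names three balancing approaches (under-sampling, oversampling, and SMOTE) without giving implementation details. As an illustration only, the following is a minimal NumPy sketch of random under-sampling, random oversampling, and the standard SMOTE interpolation rule (a synthetic row is placed on the line segment between a minority row and one of its k nearest minority-class neighbours). It is not the authors' code: the function names, the convention that the minority (defaulter) label is `1`, the default `k=5`, and the use of Euclidean distance are all assumptions made for this sketch.

```python
import numpy as np

# Hypothetical helpers for illustration; binary labels, minority label assumed to be 1.


def random_undersample(X, y, minority=1, rng=None):
    """Drop majority-class rows at random until both classes have equal counts."""
    rng = np.random.default_rng(rng)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    idx = np.concatenate([min_idx, keep_maj])
    return X[idx], y[idx]


def random_oversample(X, y, minority=1, rng=None):
    """Duplicate minority rows (sampled with replacement) until classes are equal."""
    rng = np.random.default_rng(rng)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
    idx = np.concatenate([maj_idx, min_idx, extra])
    return X[idx], y[idx]


def smote(X, y, minority=1, k=5, rng=None):
    """Append synthetic minority rows until classes are equal. Each new row is
    x + u * (neighbour - x), with u ~ U[0, 1) and the neighbour drawn from the
    k nearest minority rows of x (Euclidean distance, x itself excluded)."""
    rng = np.random.default_rng(rng)
    X_min = X[y == minority]
    n_new = int(np.sum(y != minority) - len(X_min))
    if n_new <= 0:
        return X, y
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    k = min(k, len(X_min) - 1)
    # Dense pairwise distances: acceptable for a small minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)            # seed rows
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]     # one neighbour each
    gap = rng.random((n_new, 1))                               # interpolation weight
    synthetic = X_min[base] + gap * (X_min[nbr] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])
    return X_out, y_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))        # toy features, not the paper's data
    y = np.array([1] * 10 + [0] * 90)    # 10% minority ("defaulters")
    for name, fn in [("under", random_undersample),
                     ("over", random_oversample),
                     ("smote", smote)]:
        Xb, yb = fn(X, y, rng=0)
        print(name, Xb.shape, np.bincount(yb))
```

On the toy data, under-sampling shrinks the set to 10 rows per class, while both oversampling variants grow it to 90 rows per class; random oversampling repeats existing defaulter rows, whereas SMOTE creates new points between them. In a typical workflow such resampling is applied to the training split only, so that evaluation metrics such as sensitivity and AUC are computed on data with the original class distribution.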

Similar Items