Machine Learning Algorithms Analysis of Synthetic Minority Oversampling Technique (SMOTE): Application to Credit Default Prediction

Credit default prediction is an important problem in financial risk management. It aims to estimate the likelihood that borrowers will default on their loan commitments. However, the datasets used to guide machine learning modeling for data-driven decision support often suffer from class imbalance. Class imbalance in...

Bibliographic Details
Main Authors: Emmanuel de-Graft Johnson Owusu-Ansah, Richard Doamekpor, Richard Kodzo Avuglah, Yaa Kyere Adwubi
Format: Article
Language: English
Published: Accademia Piceno Aprutina dei Velati 2024-12-01
Series: Ratio Mathematica
Subjects: credit scoring, SMOTE, oversampling, undersampling, class imbalance, machine learning algorithms
Online Access: http://eiris.it/ojs/index.php/ratiomathematica/article/view/1601
author Emmanuel de-Graft Johnson Owusu-Ansah
Richard Doamekpor
Richard Kodzo Avuglah
Yaa Kyere Adwubi
collection DOAJ
description Credit default prediction is an important problem in financial risk management. It aims to estimate the likelihood that borrowers will default on their loan commitments. However, the datasets used to guide machine learning modeling for data-driven decision support often suffer from class imbalance. Class imbalance in machine learning is an uneven distribution of classes within a dataset. The problem arises in classification tasks whenever the class labels are not uniformly distributed. A common remedy is to resample the data, either by adding entries to the minority class or by removing entries from the majority class. The present study examines the efficacy of classification algorithms under various data balancing approaches. The dataset was collected from a well-known commercial bank in Ghana. To resolve the imbalance, three data balancing approaches were used: under-sampling, oversampling, and the synthetic minority oversampling technique (SMOTE). The findings show that XGBoost consistently outperformed the other classifiers in terms of AUC on every dataset except the SMOTE dataset. Random forest, decision tree, and logistic regression all performed well and might be used as alternatives to XGBoost for developing credit scoring models. Classifiers trained on balanced datasets achieved higher sensitivity scores than those trained on the original skewed dataset, while maintaining their capacity to differentiate between defaulters and non-defaulters. This demonstrates the value of data balancing strategies in improving a model's ability to predict minority class occurrences. Hence, the main finding, that oversampling outperforms under-sampling across classifiers and evaluation measures, is affirmed.
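The three balancing approaches named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`random_undersample`, `random_oversample`, `smote_like`), the toy two-feature rows, and the choice of `k` are all assumptions made for illustration, and `smote_like` only mimics the core idea of SMOTE (interpolating between a minority row and one of its nearest minority neighbours) for a binary problem.

```python
import random
from collections import Counter


def group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, rng):
    """Randomly keep only as many rows per class as the smallest class has."""
    groups = group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    new_rows, new_labels = [], []
    for label, members in groups.items():
        new_rows += rng.sample(members, target)
        new_labels += [label] * target
    return new_rows, new_labels


def random_oversample(rows, labels, rng):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    groups = group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    new_rows, new_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        new_rows += members + extra
        new_labels += [label] * target
    return new_rows, new_labels


def smote_like(rows, labels, minority, k, rng):
    """Simplified SMOTE-style step for two classes (illustrative assumption):
    create synthetic minority rows on the line segment between a minority row
    and one of its k nearest minority neighbours, until the classes balance."""
    minority_rows = [r for r, c in zip(rows, labels) if c == minority]
    n_new = (len(rows) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return list(rows) + synthetic, list(labels) + [minority] * len(synthetic)


# Toy data (hypothetical): 8 non-defaulters (label 0) and 3 defaulters (label 1).
rng = random.Random(0)
rows = [(float(i), float(i % 3)) for i in range(8)]
rows += [(10.0, 10.0), (11.0, 12.0), (12.0, 11.0)]
labels = [0] * 8 + [1] * 3

under_rows, under_labels = random_undersample(rows, labels, rng)
over_rows, over_labels = random_oversample(rows, labels, rng)
smote_rows, smote_labels = smote_like(rows, labels, minority=1, k=2, rng=rng)

print(Counter(under_labels), Counter(over_labels), Counter(smote_labels))
```

In practice, resampling of this kind is usually applied to the training split only, so that the evaluation data keeps the original class distribution.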
format Article
id doaj-art-d6235b7b2ad543cb9c9ddf2b882cb5db
institution Kabale University
issn 1592-7415
2282-8214
language English
publishDate 2024-12-01
publisher Accademia Piceno Aprutina dei Velati
record_format Article
series Ratio Mathematica
spelling doaj-art-d6235b7b2ad543cb9c9ddf2b882cb5db
2025-02-01T06:51:01Z
eng
Accademia Piceno Aprutina dei Velati
Ratio Mathematica
1592-7415
2282-8214
2024-12-01
53
0
10.23755/rm.v53i0.1601
949
Machine Learning Algorithms Analysis of Synthetic Minority Oversampling Technique (SMOTE): Application to Credit Default Prediction
Emmanuel de-Graft Johnson Owusu-Ansah (0)
Richard Doamekpor (1)
Richard Kodzo Avuglah (2)
Yaa Kyere Adwubi (3)
(0) Department of Statistics and Actuarial Science, Kwame Nkrumah University of Science and Technology, Kumasi
(1) Department of Statistics and Actuarial Science, Kwame Nkrumah University of Science and Technology, Kumasi
(2) Department of Statistics and Actuarial Science, Kwame Nkrumah University of Science and Technology, Kumasi
(3) Department of Statistics and Actuarial Science, Kwame Nkrumah University of Science and Technology, Kumasi
http://eiris.it/ojs/index.php/ratiomathematica/article/view/1601
credit scoring, smote, oversampling, undersampling, class imbalance, machine learning algorithms
title Machine Learning Algorithms Analysis of Synthetic Minority Oversampling Technique (SMOTE): Application to Credit Default Prediction
topic credit scoring, smote, oversampling, undersampling, class imbalance, machine learning algorithms
url http://eiris.it/ojs/index.php/ratiomathematica/article/view/1601