Machine Learning Algorithms Analysis of Synthetic Minority Oversampling Technique (SMOTE): Application to Credit Default Prediction
Main Authors:
Format: Article
Language: English
Published: Accademia Piceno Aprutina dei Velati, 2024-12-01
Series: Ratio Mathematica
Subjects:
Online Access: http://eiris.it/ojs/index.php/ratiomathematica/article/view/1601
Summary: Credit default prediction is an important problem in financial risk management: it aims to estimate the likelihood that borrowers will default on their loan commitments. However, the datasets that guide machine-learning modelling for such data-driven support typically suffer from class imbalance, that is, an uneven distribution of classes within the dataset. The problem arises in classification tasks whenever the distribution of classes or labels is not uniform. A common remedy is to resample the data, either by adding entries to the minority class or by removing entries from the majority class.
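As a minimal sketch of the three balancing approaches the abstract names, the resampling step might look as follows with the imbalanced-learn library; the bank's dataset is not public, so a synthetic imbalanced dataset from scikit-learn stands in for it, and all names here are illustrative rather than the study's own code:

```python
# Sketch of the three balancing approaches, assuming imbalanced-learn.
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Synthetic stand-in: 5% defaulters (minority class), 95% non-defaulters.
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)

# Under-sampling: drop majority-class rows until the classes match.
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)

# Oversampling: duplicate minority-class rows until the classes match.
X_over, y_over = RandomOverSampler(random_state=0).fit_resample(X, y)

# SMOTE: synthesize new minority rows by interpolating between
# a minority sample and its nearest minority-class neighbors.
X_smote, y_smote = SMOTE(random_state=0).fit_resample(X, y)
```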
The present study examines the efficacy of classification algorithms under various data balancing approaches. The dataset was collected from a well-known commercial bank in Ghana. To resolve the imbalance, three data balancing approaches were used: under-sampling, oversampling, and the synthetic minority oversampling technique (SMOTE). With the exception of the SMOTE dataset, XGBoost consistently beat the other classifiers across the other datasets in terms of AUC. Random forest, decision tree, and logistic regression also performed well and could serve as alternatives to XGBoost for building credit scoring models. The findings demonstrate that classifiers trained on balanced datasets achieve higher sensitivity scores than those trained on the original skewed dataset, while retaining their capacity to differentiate between defaulters and non-defaulters. This demonstrates the value of data balancing strategies in improving a model's ability to detect minority-class occurrences. Hence, the major finding, that oversampling outperforms under-sampling across classifiers and evaluation measures, is affirmed.
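The classifier comparison the abstract describes could likewise be sketched as below, under stated assumptions: scikit-learn's GradientBoostingClassifier stands in for XGBoost to keep the sketch dependency-light, the data is synthetic, and the printed scores are illustrative, not the study's results:

```python
# Illustrative comparison loop: train each classifier on a SMOTE-balanced
# training split and score AUC and sensitivity (defaulter-class recall)
# on the untouched, still-imbalanced test split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, recall_score
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Balance only the training split; the test split keeps the real skew.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    sensitivity = recall_score(y_test, model.predict(X_test))
    print(f"{name}: AUC={auc:.3f}, sensitivity={sensitivity:.3f}")
```

Balancing only the training split is the usual design choice here: it lets the models learn from an even class distribution while the evaluation still reflects the imbalanced conditions the deployed scorecard would face.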
ISSN: 1592-7415; 2282-8214