Smell-ML: A Machine Learning Framework for Detecting Rarely Studied Code Smells
Code smells are design flaws that reduce software quality and maintainability. Machine learning classification models have been used to detect a range of code smells. However, existing studies examine some smells in depth while leaving others under-explored, even though those smells have a signif...
Main Authors: | Esraa Hamouda, Abeer El-Korany, Soha Makady |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2025-01-01 |
Series: | IEEE Access |
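Subjects: | Code smell detection, machine learning, data balancing, ensemble learning, multi-level classification |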
Online Access: | https://ieeexplore.ieee.org/document/10844273/ |
_version_ | 1832586896796024832 |
---|---|
author | Esraa Hamouda, Abeer El-Korany, Soha Makady |
collection | DOAJ |
description | Code smells are design flaws that reduce software quality and maintainability. Machine learning classification models have been used to detect a range of code smells. However, existing studies examine some smells in depth while leaving others under-explored, even though those smells have a significant impact on source code quality. Recent surveys have highlighted a group of code smells that researchers have rarely studied. Furthermore, some machine learning classification models were evaluated on only a subset of the source code features, ignoring features that are significant for classification. This paper proposes a novel approach, called Smell-ML, for detecting five rarely studied code smells: Middle Man (MM), Class Data Should Be Private (CDSBP), Inappropriate Intimacy (II), Refused Bequest (RB), and Speculative Generality (SG). The novelty of this approach stems from improvements in both the data preparation and classification phases. During data preparation, Smell-ML relies on data balancing and an extended source code feature list to improve accuracy. In the classification phase, different classifiers were assessed, including traditional, ensemble, and multi-level classifiers. We evaluated Smell-ML on a dataset composed of 13 open source Java projects with 125 versions per project. The results show that Smell-ML’s detection F1-score values surpass those of previous studies, with significant improvements across various code smells. The F1-score of the 11 machine learning classifiers improved after using the extended feature list. Data balancing and multi-level classification notably boosted accuracy. |
format | Article |
id | doaj-art-e74ef17004cd4af98cda57b4103b9f19 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-e74ef17004cd4af98cda57b4103b9f19; 2025-01-25T00:01:46Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 12966; 12980; 10.1109/ACCESS.2025.3530927; 10844273; Smell-ML: A Machine Learning Framework for Detecting Rarely Studied Code Smells; Esraa Hamouda (https://orcid.org/0009-0001-3921-1206), Abeer El-Korany (https://orcid.org/0000-0003-3626-7850), Soha Makady (https://orcid.org/0000-0002-3330-6204); Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Orman, Egypt; https://ieeexplore.ieee.org/document/10844273/; Code smell detection, machine learning, data balancing, ensemble learning, multi-level classification |
title | Smell-ML: A Machine Learning Framework for Detecting Rarely Studied Code Smells |
topic | Code smell detection, machine learning, data balancing, ensemble learning, multi-level classification |
url | https://ieeexplore.ieee.org/document/10844273/ |
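
The description in this record names two ideas that a small example can make concrete: data balancing before training, and the F1-score used for evaluation. The sketch below is illustrative only and is not taken from the paper. It assumes random oversampling as the balancing technique (the record does not say which one Smell-ML uses), and the function names `oversample` and `f1_score` as well as the toy metric values are hypothetical.

```python
import random
from collections import Counter


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until all classes are equally frequent.

    Assumption: random oversampling stands in for whatever balancing
    method the paper actually applies.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (smelly) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical class-level metrics; label 1 marks a smelly class.
features = [[12, 3], [40, 9], [7, 1], [55, 14]]
smelly = [0, 0, 0, 1]
balanced_rows, balanced_labels = oversample(features, smelly)
print(Counter(balanced_labels))
print(f1_score([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```

In practice, balancing of this kind is normally applied to the training split only, so that the F1-score is measured on data with the original class distribution.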