Smell-ML: A Machine Learning Framework for Detecting Rarely Studied Code Smells

Code smells are design flaws that reduce software quality and maintainability. Machine learning classification models have been used to detect various code smells. However, existing studies have examined some code smells in depth while leaving others under-explored, even though those smells have a signif...

Bibliographic Details
Main Authors: Esraa Hamouda, Abeer El-Korany, Soha Makady
Format: Article
Language: English
Published: IEEE 2025-01-01
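Series: IEEE Access
Subjects: Code smell detection; machine learning; data balancing; ensemble learning; multi-level classification
Online Access: https://ieeexplore.ieee.org/document/10844273/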
_version_ 1832586896796024832
author Esraa Hamouda
Abeer El-Korany
Soha Makady
collection DOAJ
description Code smells are design flaws that reduce software quality and maintainability. Machine learning classification models have been used to detect various code smells. However, existing studies have examined some code smells in depth while leaving others under-explored, even though those smells significantly affect source code quality. Recent surveys have highlighted a group of code smells that researchers have rarely studied. Furthermore, some machine learning classification models were evaluated on only a subset of the source code features, ignoring features that are significant for classification. This paper proposes a novel approach, called Smell-ML, for detecting five rarely studied code smells: Middle Man (MM), Class Data Should Be Private (CDSBP), Inappropriate Intimacy (II), Refused Bequest (RB), and Speculative Generality (SG). The novelty of this approach stems from improvements in both the data preparation and classification phases. During data preparation, Smell-ML relies on data balancing and an extended source code feature list to improve accuracy. In the classification phase, different classifiers were assessed, including traditional, ensemble, and multi-level classifiers. We evaluated Smell-ML on a dataset composed of 13 open-source Java projects with 125 versions per project. The results show that Smell-ML’s F1-scores surpass those of previous studies, with significant improvements across various code smells. The F1-scores of the 11 machine learning classifiers improved after using the extended feature list. Data balancing and multi-level classification notably boosted accuracy.
format Article
id doaj-art-e74ef17004cd4af98cda57b4103b9f19
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-e74ef17004cd4af98cda57b4103b9f19
2025-01-25T00:01:46Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
13
12966
12980
10.1109/ACCESS.2025.3530927
10844273
Smell-ML: A Machine Learning Framework for Detecting Rarely Studied Code Smells
Esraa Hamouda (0), https://orcid.org/0009-0001-3921-1206, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Orman, Egypt
Abeer El-Korany (1), https://orcid.org/0000-0003-3626-7850, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Orman, Egypt
Soha Makady (2), https://orcid.org/0000-0002-3330-6204, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Orman, Egypt
https://ieeexplore.ieee.org/document/10844273/
Code smell detection; machine learning; data balancing; ensemble learning; multi-level classification
title Smell-ML: A Machine Learning Framework for Detecting Rarely Studied Code Smells
topic Code smell detection
machine learning
data balancing
ensemble learning
multi-level classification
url https://ieeexplore.ieee.org/document/10844273/
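
The description above mentions data balancing and F1-score evaluation without naming the specific techniques. As a rough illustration only, the sketch below uses random oversampling as a stand-in balancing method (an assumption, not necessarily what Smell-ML uses) and computes the F1-score for a binary smelly / not-smelly label. The function names `random_oversample` and `f1_score` are hypothetical and do not come from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (assumed balancing method)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of the original data that belong to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy metric rows: four non-smelly classes (0) and one smelly class (1).
    rows = [[1], [2], [3], [4], [5]]
    labels = [0, 0, 0, 0, 1]
    balanced_rows, balanced_labels = random_oversample(rows, labels)
    print(Counter(balanced_labels))
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

In practice, balancing would be applied to the training split only, so that duplicated rows do not leak into the data used to compute the F1-score.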