Smell-ML: A Machine Learning Framework for Detecting Rarely Studied Code Smells

Bibliographic Details
Main Authors:	Esraa Hamouda, Abeer El-Korany, Soha Makady
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Code smell detection; machine learning; data balancing; ensemble learning; multi-level classification
Online Access:	https://ieeexplore.ieee.org/document/10844273/

_version_	1832586896796024832
author	Esraa Hamouda; Abeer El-Korany; Soha Makady
collection	DOAJ
description	Code smells are design flaws that reduce software quality and maintainability. Machine learning classification models have been used to detect various code smells. However, prior studies examined some code smells in depth while leaving others under-explored, even though those smells have a significant impact on source code quality. Recent surveys have highlighted a group of code smells that has rarely been studied by researchers. Furthermore, some machine learning classification models were evaluated on only a subset of the source code features, ignoring features that are significant for classification. This paper proposes a novel approach, called Smell-ML, for detecting five rarely studied code smells: Middle Man (MM), Class Data Should Be Private (CDSBP), Inappropriate Intimacy (II), Refused Bequest (RB), and Speculative Generality (SG). The novelty of this approach stems from improvements in both the data preparation and classification phases. During data preparation, Smell-ML relies on data balancing and an extended source code feature list to improve accuracy. In the classification phase, different classifiers were assessed, including traditional, ensemble, and multi-level classifiers. We evaluated Smell-ML on a dataset composed of 13 open source Java projects with 125 versions per project. The results show that Smell-ML’s detection F1-score values surpass those of previous studies, with significant improvements across various code smells. The F1-scores of the 11 machine learning classifiers improved after using the extended feature list. Data balancing and multi-level classification notably boosted accuracy.
format	Article
id	doaj-art-e74ef17004cd4af98cda57b4103b9f19
institution	Kabale University
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-e74ef17004cd4af98cda57b4103b9f19; 2025-01-25T00:01:46Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 12966; 12980; 10.1109/ACCESS.2025.3530927; 10844273; Smell-ML: A Machine Learning Framework for Detecting Rarely Studied Code Smells; Esraa Hamouda (https://orcid.org/0009-0001-3921-1206), Abeer El-Korany (https://orcid.org/0000-0003-3626-7850), Soha Makady (https://orcid.org/0000-0002-3330-6204); Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Orman, Egypt (all three authors); https://ieeexplore.ieee.org/document/10844273/; Code smell detection; machine learning; data balancing; ensemble learning; multi-level classification
title	Smell-ML: A Machine Learning Framework for Detecting Rarely Studied Code Smells
topic	Code smell detection; machine learning; data balancing; ensemble learning; multi-level classification
url	https://ieeexplore.ieee.org/document/10844273/
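
The description above mentions data balancing during data preparation and reports results as F1-scores. The record does not say which balancing technique Smell-ML uses, so the following is only a minimal sketch that assumes plain random oversampling of the minority class; the function names and the toy feature rows are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class (assumed balancing technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: each row is (method count, public field count),
# label 1 marks a smelly class and 0 a clean one.
rows = [(3, 0), (5, 1), (4, 0), (6, 0), (2, 9)]
labels = [0, 0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Oversampling would normally be applied to the training split only, so that duplicated rows never leak into the data used to compute the F1-score.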
