Smell-ML: A Machine Learning Framework for Detecting Rarely Studied Code Smells
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10844273/ |
Summary: | Code smells are design flaws that reduce software quality and maintainability. Machine learning classification models have been used to detect a range of code smells. However, such studies have examined certain code smells in depth while leaving others under-explored, even though the latter have a significant impact on source code quality. Recent surveys have highlighted a group of code smells that researchers have rarely studied. Furthermore, some machine learning classification models were evaluated on only a subset of the source code features, ignoring features that are significant for classification. This paper proposes a novel approach, called Smell-ML, for detecting five rarely studied code smells: Middle Man (MM), Class Data Should Be Private (CDSBP), Inappropriate Intimacy (II), Refused Bequest (RB), and Speculative Generality (SG). The novelty of this approach stems from improvements in both the data preparation and classification phases. During data preparation, Smell-ML relies on data balancing and an extended source code feature list to improve accuracy. In the classification phase, different classifiers were assessed, including traditional, ensemble, and multi-level classifiers. We evaluated Smell-ML on a dataset composed of 13 open-source Java projects with 125 versions per project. The results show that Smell-ML’s F1-scores surpass those of previous studies, with significant improvements across various code smells. The F1-scores of the 11 machine learning classifiers improved after the extended feature list was used. Data balancing and multi-level classification notably boosted accuracy. |
---|---|
ISSN: | 2169-3536 |
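
The summary mentions data balancing and the F1-score without naming the balancing technique. As a rough illustration only, the sketch below assumes simple random oversampling of minority classes, which may differ from what Smell-ML actually does, and computes the F1-score from its standard definition. The function names `f1_score` and `random_oversample` and the toy labels are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one.

    Assumption: random oversampling stands in for whatever balancing
    method the paper uses; the summary does not say which one it is.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: 1 marks a class with the smell, 0 a class without it.
balanced_rows, balanced_labels = random_oversample(
    [[1.0], [2.0], [3.0], [4.0]], [0, 0, 0, 1]
)
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

In practice, balancing would be applied to the training split only, so that duplicated rows do not leak into the data used to compute the F1-score.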