IC-GraF: An Improved Clustering with Graph-Embedding-Based Features for Software Defect Prediction
Main Authors: | Xuanye Wang, Lu Lu, Qingyan Tian, Haishan Lin |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2024-01-01 |
Series: | IET Software |
Online Access: | http://dx.doi.org/10.1049/2024/8027037 |
_version_ | 1832568851627245568 |
---|---|
author | Xuanye Wang, Lu Lu, Qingyan Tian, Haishan Lin |
collection | DOAJ |
description | Software defect prediction (SDP) has been a prominent area of research in software engineering. Previous SDP methods often struggled in industrial applications, primarily because they require sufficient historical data. Clustering-based unsupervised defect prediction (CUDP) and cross-project defect prediction (CPDP) emerged to address this challenge. However, the former struggled to capture semantic and structural features, while the latter was constrained by differences in data distribution across projects. We therefore introduce a novel framework, improved clustering with graph-embedding-based features (IC-GraF), for SDP that does not rely on historical data. First, a preprocessing step extracts program dependence graphs (PDGs) and marks the distinct dependency relationships within them. Second, the improved deep graph infomax (IDGI) model, an extension of the DGI model designed specifically for SDP, generates graph-level representations of the PDGs. Finally, a heuristic-based k-means clustering algorithm classifies the features generated by IDGI. To validate the efficacy of IC-GraF, we conduct experiments on 24 releases of the PROMISE dataset, using F-measure and G-measure as evaluation criteria. The findings indicate that IC-GraF achieves 5.0% to 42.7% higher F-measure, 5% to 39.4% higher G-measure, and 2.5% to 11.4% higher AUC than existing CUDP methods. Even compared with eight supervised learning-based SDP methods, IC-GraF maintains a competitive edge. |
format | Article |
id | doaj-art-7f6140e09f5b4919aab667d60254d1fb |
institution | Kabale University |
issn | 1751-8814 |
language | English |
publishDate | 2024-01-01 |
publisher | Wiley |
record_format | Article |
series | IET Software |
spelling | doaj-art-7f6140e09f5b4919aab667d60254d1fb (record updated 2025-02-03T00:10:09Z). Author affiliations: Xuanye Wang and Lu Lu, School of Computer Science and Engineering; Qingyan Tian and Haishan Lin, Guangdong Provincial Key Laboratory of Tunnel Safety and Emergency Support Technology and Equipment. |
title | IC-GraF: An Improved Clustering with Graph-Embedding-Based Features for Software Defect Prediction |
url | http://dx.doi.org/10.1049/2024/8027037 |
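
The description names the deep graph infomax (DGI) model as the basis of IDGI. The sketch below shows only the forward objective of the original DGI formulation: a one-layer GCN encoder, a corruption that shuffles node features, a sigmoid-of-mean readout that yields a graph-level summary, and a bilinear discriminator scored with binary cross-entropy. It is not IDGI, whose SDP-specific changes are not described in this record. It assumes NumPy, collapses all PDG dependency types into a single unweighted adjacency matrix (IC-GraF instead marks distinct relationships), omits gradient-based training, and uses illustrative names such as `dgi_loss`.

```python
import numpy as np


def normalize_adj(a: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = a + np.eye(len(a))  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def encode(a_norm: np.ndarray, x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """One-layer GCN encoder: node embeddings H = ReLU(A_norm @ X @ Theta)."""
    return np.maximum(a_norm @ x @ theta, 0.0)


def dgi_loss(
    a: np.ndarray,
    x: np.ndarray,
    theta: np.ndarray,
    w: np.ndarray,
    rng: np.random.Generator,
) -> tuple[float, np.ndarray]:
    """Forward DGI objective for one graph; returns (loss, summary vector s).

    Assumption: a single adjacency matrix stands in for the typed PDG edges.
    """
    a_norm = normalize_adj(a)
    h_pos = encode(a_norm, x, theta)  # embeddings of the real graph
    # Corruption: shuffle node features across nodes, keep the edges.
    h_neg = encode(a_norm, x[rng.permutation(len(x))], theta)
    s = sigmoid(h_pos.mean(axis=0))  # readout: graph-level summary
    d_pos = sigmoid(h_pos @ w @ s)  # discriminator scores, real nodes
    d_neg = sigmoid(h_neg @ w @ s)  # discriminator scores, corrupted nodes
    eps = 1e-12  # guards against log(0)
    # Binary cross-entropy: real pairs labeled 1, corrupted pairs labeled 0.
    loss = -0.5 * (np.log(d_pos + eps).mean() + np.log(1.0 - d_neg + eps).mean())
    return float(loss), s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 4-node dependence chain with 3 input features per node.
    a = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
    x = rng.normal(size=(4, 3))
    theta = rng.normal(size=(3, 8))
    w = rng.normal(size=(8, 8))
    loss, s = dgi_loss(a, x, theta, w, rng)
    print(loss, s.shape)
```

In DGI, minimizing this loss trains the encoder so that real node embeddings agree with the graph summary while corrupted ones do not; the summary vector `s` is the kind of graph-level feature a downstream clustering step could consume.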
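
The final step of IC-GraF clusters the IDGI features with a heuristic-based k-means algorithm. The heuristic itself is not given in this record, so the following is a minimal sketch under stated assumptions: plain Lloyd's k-means with k = 2 over embedding vectors, followed by a hypothetical labeling rule that predicts the smaller cluster as defective (defective modules are usually the minority). It assumes NumPy; `kmeans` and `label_defective` are illustrative names.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int = 2, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Lloyd's k-means; returns a cluster index for each row of x."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct data points chosen at random.
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the previous center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def label_defective(labels: np.ndarray) -> np.ndarray:
    """Hypothetical heuristic: predict the smaller cluster as defective (1)."""
    counts = np.bincount(labels, minlength=2)
    return (labels == counts.argmin()).astype(int)


if __name__ == "__main__":
    # Hypothetical 2-D graph embeddings: six clean-looking, two outlying.
    x = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
                  [0.05, 0.05], [0.02, 0.08], [5, 5], [5.1, 4.9]])
    print(label_defective(kmeans(x)))  # [0 0 0 0 0 0 1 1]
```

The minority-cluster rule is only one plausible choice; a real heuristic might instead inspect cluster centroids or other properties of the embeddings.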
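
The evaluation reports F-measure and G-measure. The sketch below computes both from binary labels (1 = defective). F-measure is the harmonic mean of precision and recall; for G-measure it assumes the definition common in SDP studies, the harmonic mean of recall (pd) and 1 - pf, where pf is the false-positive rate. Whether the paper uses exactly this variant is not stated in the record, and the function names are illustrative.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) with 1 = defective and 0 = clean."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def g_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Assumed SDP definition: harmonic mean of pd and (1 - pf)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    pd = tp / (tp + fn) if tp + fn else 0.0  # probability of detection (recall)
    pf = fp / (fp + tn) if fp + tn else 0.0  # probability of false alarm
    spec = 1.0 - pf
    return 2 * pd * spec / (pd + spec) if pd + spec else 0.0


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    print(round(f_measure(y_true, y_pred), 4), round(g_measure(y_true, y_pred), 4))  # 0.6667 0.6667
```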