Cross-Community Question Relevance Prediction for Stack Overflow and GitHub

Bibliographic Details
Main Authors: Song Yu, Bugao Jiang, Danni Zhang, Zhifang Liao
Format: Article
Language: English
Published: Graz University of Technology 2025-01-01
Series: Journal of Universal Computer Science
Subjects:
Online Access: https://lib.jucs.org/article/119772/download/pdf/
collection DOAJ
description As the open-source community has evolved, Stack Overflow (SO) has come into widespread use. The question-and-answer community’s mechanism for recommending related questions helps users discover more content relevant to their current problems, which speeds up issue resolution. However, recommending relevant questions within a single community limits both the amount and the diversity of available content, and the results depend heavily on that community’s existing knowledge. Stack Overflow still has a substantial number of unresolved questions. To address this, the paper proposes a cross-community question relevance prediction model, CCQRP, which predicts the relevance of Stack Overflow questions to GitHub (GH) issues and recommends relevant GitHub issues. CCQRP aims to help developers resolve problems effectively and improve development efficiency. We design an embedding layer that incorporates BERTOverflow and Bi-LSTM, and devise a weighted attention matrix based on the named entity types of tokens. This matrix assigns different weights to tokens of different named entity types during prediction, capturing the information most critical to predicting the relevance of SO questions and GH issues. Because no suitable dataset exists, we construct the Question-Issue dataset (QI), which consists of Stack Overflow questions, GitHub issues, and the corresponding question-issue relevance labels. It contains 240,000 related SO question-GH issue pairs and 470,000 unrelated pairs. We evaluate the effectiveness of CCQRP on QI. Compared to the latest models (MQDD, CodeBERT, ASIM), CCQRP improves the F1-score by 0.60% to 10.86% and exhibits robust generalization capabilities.
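The description above says the weighted attention matrix gives tokens different weights according to their named entity type, but it does not say how the weights are applied. The following is a minimal sketch of one possible reading, not the authors' implementation: each raw attention score is shifted by the log of a per-type weight before a softmax, so a token's share of attention scales with its type weight. The entity type names, the weight values, the example tokens, and the function names are all hypothetical.

```python
import math

# Hypothetical per-entity-type weights; the record does not give real values.
ENTITY_TYPE_WEIGHTS = {"O": 1.0, "LIBRARY": 2.0, "ERROR_NAME": 3.0}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entity_weighted_attention(scores, entity_types, type_weights=ENTITY_TYPE_WEIGHTS):
    """Bias raw attention scores by entity-type weight, then normalize.

    Adding log(weight) to a score before the softmax multiplies that
    token's unnormalized attention mass by the weight. Unknown entity
    types fall back to a neutral weight of 1.0.
    """
    if len(scores) != len(entity_types):
        raise ValueError("scores and entity_types must have the same length")
    biased = [
        score + math.log(type_weights.get(entity_type, 1.0))
        for score, entity_type in zip(scores, entity_types)
    ]
    return softmax(biased)


# Hypothetical tokens from a question title, with assumed entity tags.
tokens = ["getting", "NullPointerException", "in", "Retrofit"]
types = ["O", "ERROR_NAME", "O", "LIBRARY"]
raw_scores = [0.0, 0.0, 0.0, 0.0]  # equal raw scores isolate the effect of the weights
attention = entity_weighted_attention(raw_scores, types)
```

With equal raw scores, the resulting attention is proportional to the type weights alone, so the token tagged as an error name receives the largest share and the untagged tokens the smallest.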
id doaj-art-34cbf7d1500f4782ba34ccf8ed0f962d
institution Kabale University
issn 0948-6968
doi 10.3897/jucs.119772
affiliation Central South University (all four authors)
topic Relevant Question
Relevance Prediction
Stack Ove