News Topic Detection Based on Capsule Semantic Graph

Most news topic detection methods use word-based methods, which easily ignore the relationship among words and have semantic sparsity, resulting in low topic detection accuracy. In addition, the current mainstream probability methods and graph analysis methods for topic detection have high time comp...

Bibliographic Details
Main Authors: Shuang Yang, Yan Tang
Format: Article
Language: English
Published: Tsinghua University Press 2022-06-01
Series: Big Data Mining and Analytics
Subjects: news topic detection; capsule semantic graph; graph kernel
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2021.9020023
author Shuang Yang
Yan Tang
collection DOAJ
description Most news topic detection methods are word-based; they tend to ignore the relationships among words and suffer from semantic sparsity, which lowers topic detection accuracy. In addition, the current mainstream probabilistic and graph analysis methods for topic detection have high time complexity. To address these problems, we present a news topic detection model based on the capsule semantic graph (CSG). Keywords that co-occur in the same text are modeled as a keyword graph, which is divided into multiple subgraphs through community detection. Each subgraph contains a group of closely related keywords and is used as a vertex of the CSG. The semantic relationship among vertices is obtained by calculating the similarity of the average word vectors of the vertices. The news texts are clustered with an incremental clustering method in which each text uses the CSG; that is, the similarity among texts is calculated by a graph kernel. The relationship between vertices and edges is also considered when calculating the similarity. Experimental results on three standard datasets show that CSG achieves higher precision, recall, and F1 values than several recent methods. Experimental results on large-scale news datasets show that the time complexity of CSG is lower than that of probabilistic methods and other graph analysis methods.
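The pipeline outlined in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: connected components stand in for the community detection step, a Jaccard overlap of the capsules each text touches stands in for the graph kernel, the clustering is a simple single-pass scheme with a hypothetical threshold, and the documents and 2-dimensional word vectors are toy data. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math

def keyword_graph(docs, min_cooccur=1):
    """Count keyword pairs that co-occur in the same text."""
    counts = defaultdict(int)
    for kws in docs:
        for a, b in combinations(sorted(set(kws)), 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_cooccur}

def communities(edges):
    """ASSUMPTION: connected components (union-find) replace community detection."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)

def mean_vector(words, vectors):
    """Average the word vectors of one capsule (one CSG vertex)."""
    vs = [vectors[w] for w in words if w in vectors]
    return [sum(col) / len(vs) for col in zip(*vs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def text_similarity(kws_a, kws_b, capsules):
    """ASSUMPTION: Jaccard overlap of touched capsules replaces the graph kernel."""
    def hit(kws):
        return {i for i, c in enumerate(capsules) if c & set(kws)}
    a, b = hit(kws_a), hit(kws_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def incremental_cluster(docs, capsules, threshold=0.5):
    """Single pass: join the most similar cluster or open a new one."""
    clusters = []  # lists of document indices; the first index is the representative
    for i, kws in enumerate(docs):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = text_similarity(kws, docs[cluster[0]], capsules)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

# Toy keyword lists and toy word vectors (hypothetical data).
docs = [
    ["election", "vote", "senate"],
    ["vote", "senate", "ballot"],
    ["match", "goal", "striker"],
    ["goal", "striker", "league"],
]
vectors = {
    "election": [1.0, 0.0], "vote": [0.9, 0.1], "senate": [0.8, 0.2], "ballot": [1.0, 0.0],
    "goal": [0.0, 1.0], "match": [0.1, 0.9], "striker": [0.2, 0.8], "league": [0.0, 1.0],
}

capsules = communities(keyword_graph(docs))
edge_sim = cosine(mean_vector(capsules[0], vectors), mean_vector(capsules[1], vectors))
clusters = incremental_cluster(docs, capsules)
print(capsules, round(edge_sim, 3), clusters)
```

In this sketch the capsules play the role of CSG vertices and `edge_sim` the role of a semantic edge weight between two vertices. A real graph kernel would also use those edge weights when comparing texts, which the Jaccard stand-in ignores.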
format Article
id doaj-art-0f29e6bae1bb40b2bd24506ec9964adf
institution Kabale University
issn 2096-0654
language English
publishDate 2022-06-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
topic news topic detection
capsule semantic graph
graph kernel
url https://www.sciopen.com/article/10.26599/BDMA.2021.9020023