An Adaptive Methodology for Constructing Domain-Specific Sentiment Lexicons Based on Chinese Social Media Data

Bibliographic Details
Main Authors:	Xue Xu, Haidong Liu, Lei Liu
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Domain-specific sentiment lexicon; graph convolutional network (GCN); BERT; Chinese social media; public opinion
Online Access:	https://ieeexplore.ieee.org/document/11008636/

Description
Summary:	Many current methods for automatically constructing domain-specific sentiment lexicons rely on knowledge bases and domain-specific corpora. These methods often lose accuracy because of data sparsity, and inferring the polarity of new domain-specific sentiment words from a limited set of labeled seed words lacks precision. Chinese social media texts typically exhibit a high degree of randomness and noise and contain many informal sentiment words, which makes constructing domain-specific sentiment lexicons even harder. To address these challenges, we propose an adaptive framework for constructing domain-specific sentiment lexicons from Chinese social media data and apply it to develop a sentiment lexicon for public opinion during public health emergencies (PHEPO-SentiLex). We first fine-tune Bidirectional Encoder Representations from Transformers (BERT) in a multi-task framework on a domain-specific corpus and a small number of sentiment-annotated Weibo datasets, so that gradient sharing lets the model encode both domain semantics and sentiment-related contextual patterns in its word embeddings. The embeddings are then used in three places: to calculate the Sentiment Attraction Degree (SAD) during seed word filtering, to compute cosine similarity during domain-specific sentiment word selection, and to construct the domain-specific corpus-sentiment word graph (SentiGraph). Next, we propose SentiGraph-GCN, a method for determining sentiment word polarity that integrates the semantic, sentiment, co-occurrence frequency, and global structural information embedded in the corpus. Experimental results demonstrate that SentiGraph-GCN significantly outperforms existing methods in determining sentiment word polarity. Furthermore, PHEPO-SentiLex is more accurate and more stable in relevant scenarios than general-purpose sentiment lexicons.
ISSN:	2169-3536
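
The summary mentions two generic building blocks: cosine similarity between word embeddings and a graph convolutional network over a word graph. The sketch below illustrates both in NumPy, using the standard symmetric-normalized GCN propagation rule. It is not the authors' SentiGraph-GCN. The toy embeddings, the similarity threshold, and the function names (`cosine_similarity_matrix`, `build_adjacency`, `gcn_layer`) are assumptions made for illustration, and the record does not say how SentiGraph edges are actually defined.

```python
import numpy as np


def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between the rows of `emb`."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def build_adjacency(emb, threshold=0.5):
    """Hypothetical edge rule: link two words when their cosine
    similarity reaches `threshold` (an assumption, not from the paper)."""
    sim = cosine_similarity_matrix(emb)
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # self-loops are added inside gcn_layer
    return adj


def gcn_layer(adj, features, weights):
    """One standard GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)


# Toy data: four "words" with 2-dimensional embeddings (made up).
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
adj = build_adjacency(emb, threshold=0.8)

rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 2))             # untrained, random weights
hidden = gcn_layer(adj, emb, weights)
```

In a real pipeline the embeddings would come from the fine-tuned BERT model, the weights would be learned from labeled seed words, and a final layer would map each node to a polarity class.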


Similar Items