Exploring Topic Coherence With PCC-LDA and BERT for Contextual Word Generation


Bibliographic Details
Main Authors: Sandeep Kumar Rachamadugu, T. P. Pushphavathi, Surbhi Bhatia Khan, Mohammad Alojail
Format: Article
Language:English
Published: IEEE 2024-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10713309/
collection DOAJ
description In the field of natural language processing (NLP), topic modeling and word generation are crucial for understanding and producing human-like text. Key-phrase extraction is an essential task that aids document summarization, information retrieval, and topic classification, and topic modeling significantly enhances our understanding of the latent structure of textual data. Latent Dirichlet Allocation (LDA) is a popular topic-modeling algorithm, which assumes that every document is a mixture of several topics and that each topic is characterized by a distribution over words. Probabilistic Correlated Clustering Latent Dirichlet Allocation (PCC-LDA), an improved variant of LDA, was recently introduced. BERT, by contrast, is an advanced bidirectional pre-trained language model that interprets each word in a sentence from its full context and can therefore generate more precise, contextually correct words. Topic modeling is a useful way to discover hidden themes within a collection of documents, and combining it with BERT aims to extract better topics from the corpus and enhance the topic-modeling implementation. The experiments indicated a significant improvement in performance when using this combined approach. Coherence criteria are utilized to judge whether the words in each topic accord with prior knowledge, ensuring that topics are interpretable and meaningful. Topic-level analysis indicates that PCC-LDA produces more coherent topics than LDA and NMF (non-negative matrix factorization) by at least 15.4% and 12.9% ($k = 5$) and up to nearly 12.5% and 11.8% ($k = 10$), respectively, where k represents the number of topics.
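The abstract's coherence evaluation can be illustrated with a small sketch. PCC-LDA is the paper's own model and is not publicly available, so this only shows the evaluation side: a minimal UMass-style coherence score, which rates a topic's top words by how often they co-occur in the same documents. The toy corpus and word lists below are invented for demonstration and are not from the paper.

```python
from math import log

def umass_coherence(topic_words, documents):
    """UMass-style coherence: sum of log((D(wi, wj) + 1) / D(wj))
    over word pairs, where D is document (co-)occurrence count.

    topic_words: top words of a topic, most probable first (each
    conditioning word is assumed to occur in at least one document).
    documents: list of token lists.
    Higher (less negative) means more coherent.
    """
    doc_sets = [set(d) for d in documents]

    def df(*words):
        # Number of documents containing all the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for i, wi in enumerate(topic_words[1:], start=1):
        for wj in topic_words[:i]:
            score += log((df(wi, wj) + 1) / df(wj))
    return score

# Toy corpus: two loose themes (topic modeling vs. BERT).
docs = [
    "topic model latent dirichlet allocation".split(),
    "topic model coherence evaluation".split(),
    "bert contextual word generation".split(),
    "bert language model context".split(),
]

coherent = ["topic", "model", "coherence"]    # words that co-occur
incoherent = ["topic", "bert", "allocation"]  # words that rarely co-occur

print(umass_coherence(coherent, docs) > umass_coherence(incoherent, docs))
```

A topic whose top words keep appearing together scores higher than one mixing words from unrelated themes, which is exactly the "accord with prior knowledge" check the abstract describes.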
id doaj-art-bed42e2dfaed49968b31040cc7658126
institution Kabale University
issn 2169-3536
doi 10.1109/ACCESS.2024.3477992
pages 175252-175267 (IEEE Access, vol. 12, 2024)
affiliations:
Sandeep Kumar Rachamadugu (https://orcid.org/0000-0003-3116-5417): Department of Computer Science and Engineering, M. S. Ramaiah University of Applied Sciences, Bengaluru, Karnataka, India
T. P. Pushphavathi: Department of Computer Science and Engineering, M. S. Ramaiah University of Applied Sciences, Bengaluru, Karnataka, India
Surbhi Bhatia Khan (https://orcid.org/0000-0003-3097-6568): School of Science, Engineering and Environment, University of Salford, Manchester, U.K.
Mohammad Alojail: Management Information System Department, College of Business Administration, King Saud University, Riyadh, Saudi Arabia
keywords BERT; key phrases; LDA; topic coherence; topic modeling