Topic based document modeling for information filtering

Bibliographic Details
Main Author: Tran Diem Hanh Nguyen
Format: Article
Language: English
Published: Can Tho University Publisher 2023-10-01
Series:CTU Journal of Innovation and Sustainable Development
Subjects: Information filtering, information retrieval, topic models, topic modelling
Online Access:http://web2010.thanhtoan/index.php/ctujs/article/view/715
collection DOAJ
description Information Filtering (IF), which has been widely studied in recent years, is one of the areas that apply document retrieval techniques to deal with the huge amount of available information. In IF systems, modelling the user's interests and filtering relevant documents are the major components. Various approaches have been proposed for the first of these, user-interest modelling. In this study, we used a topic-modelling technique, Latent Dirichlet Allocation (LDA), to model user interests for IF systems. In particular, we proposed an extension of LDA for representing user interests, named Latent Dirichlet Topic Modelling with high Frequency Occurrences (LDA_HF), with the aim of improving the retrieval performance of IF systems. The new model was compared with existing user-interest modelling methods, including BM25, pLSA, and LDA_IF, on two large benchmark datasets, RCV1 and R8. Extensive experiments showed that the proposed model outperformed all the state-of-the-art baseline user models, including BM25, pLSA, and LDA_IF, on four major evaluation metrics: Top20, B/P, MAP, and F1. Hence, LDA_HF is a promising and reliable method for improving the performance of IF systems.
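The abstract reports results on four metrics (Top20, B/P, MAP, and F1) without defining them. Information filtering evaluation commonly uses these standard ranking measures. The minimal Python sketch below shows how they are usually computed for one topic's ranked document list. It is not code from the article, and it rests on several assumptions. Top20 is taken as precision over the 20 highest-ranked documents. B/P (break-even point) is taken as precision at rank R, where R is the number of relevant documents, because precision and recall coincide at that cutoff. F1 is computed at a fixed cutoff k. All function names and document IDs are hypothetical.

```python
from typing import AbstractSet, Callable, List, Sequence, Tuple

Topic = Tuple[Sequence[str], AbstractSet[str]]  # (ranked doc IDs, relevant doc IDs)


def _hits(ranking: Sequence[str], relevant: AbstractSet[str], k: int) -> int:
    """Count relevant documents among the top-k ranked documents."""
    return sum(1 for doc in ranking[:k] if doc in relevant)


def precision_at_k(ranking: Sequence[str], relevant: AbstractSet[str], k: int) -> float:
    """Precision of the top-k documents; Top20 corresponds to k = 20.
    If the ranking is shorter than k, missing positions count as non-relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return _hits(ranking, relevant, k) / k


def recall_at_k(ranking: Sequence[str], relevant: AbstractSet[str], k: int) -> float:
    """Fraction of all relevant documents found in the top k."""
    if not relevant:
        return 0.0
    return _hits(ranking, relevant, k) / len(relevant)


def break_even_point(ranking: Sequence[str], relevant: AbstractSet[str]) -> float:
    """Assumed B/P: precision at rank R = |relevant|, where precision equals recall."""
    if not relevant:
        return 0.0
    return precision_at_k(ranking, relevant, len(relevant))


def average_precision(ranking: Sequence[str], relevant: AbstractSet[str]) -> float:
    """Sum of precision at each rank holding a relevant document, divided by the
    total number of relevant documents (unretrieved relevant docs contribute 0)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def f1_at_k(ranking: Sequence[str], relevant: AbstractSet[str], k: int) -> float:
    """Assumed F1: harmonic mean of precision and recall at cutoff k."""
    p = precision_at_k(ranking, relevant, k)
    r = recall_at_k(ranking, relevant, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def mean_over_topics(metric: Callable[[Sequence[str], AbstractSet[str]], float],
                     topics: List[Topic]) -> float:
    """Average a per-topic metric over all topics; with average_precision this is MAP."""
    if not topics:
        raise ValueError("no topics given")
    return sum(metric(ranking, relevant) for ranking, relevant in topics) / len(topics)


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4", "d5"]
    relevant = {"d1", "d3", "d6"}
    print(round(precision_at_k(ranking, relevant, 20), 4))  # Top20: 0.1
    print(round(break_even_point(ranking, relevant), 4))    # B/P: 0.6667
    print(round(average_precision(ranking, relevant), 4))   # AP: 0.5556
    print(round(f1_at_k(ranking, relevant, 5), 4))          # F1@5: 0.5
    topics = [(ranking, relevant), (["a", "b"], {"b"})]
    print(round(mean_over_topics(average_precision, topics), 4))  # MAP: 0.5278
```

Treating positions beyond the end of a short ranking as non-relevant keeps Top20 comparable across topics. The article may use a different convention or cutoff, particularly for F1.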
id doaj-art-a40f7f4df40f4e28a99253d25e874bb5
institution OA Journals
issn 2588-1418, 2815-6412
affiliation Tra Vinh University
volume 15
issue Special issue: ISDS
topic Information filtering, information retrieval, topic models, topic modelling