Investigating Maps of Science Using Contextual Proximity of Citations Based on Deep Contextualized Word Representation

Citation intent extraction and classification have long been studied because citation intent is a good measure of relevance. Different approaches have classified citations into different classes, including weak and strong, positive and negative, and important and unimportant. Others have gone further from binary c...

Bibliographic Details
Main Authors: Muhammad Roman, Abdul Shahid, Shafiullah Khan, Lisu Yu, Muhammad Asif, Yazeed Yasin Ghadi
Format: Article
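Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects: Citation intent classification; citation reason; citation context; transformers model; research paper similarity
Online Access: https://ieeexplore.ieee.org/document/9737031/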
collection DOAJ
description Citation intent extraction and classification have long been studied because citation intent is a good measure of relevance. Different approaches have classified citations into different classes, including weak and strong, positive and negative, and important and unimportant. Others have gone beyond binary classification to multiple classes, including extension, use, background, or comparison. Researchers have utilized various elements of the information, including both the metadata and the content of the paper. The actual context of any referenced article lies within the citation context, where the paper is cited. Various attempts have been made to study the citation context to capture the citation intent, but very few have encoded the words into their contextual representations. For automated classification, we need to train deep learning models that take the citation context as input and provide the reason for citing a paper. Deep neural models work on numeric data; therefore, we must convert the text to a numeric representation. Natural languages are much more complex than computer languages. Computer languages have a predefined, fixed syntax in which each word has a unique meaning. In contrast, a word in natural language may have different meanings and is best understood by considering its position, the previous discussion, and the neighboring words. This extra information provides the context of a word within a sentence. We have therefore used contextual word representations, which are trained through deep neural networks. Deep models require massive data to generalize; however, the existing state-of-the-art datasets do not provide enough information for the models to generalize. Therefore, we have developed our own scholarly dataset, Citation Context Dataset with Intent (C2D-I), an extension of the C2D dataset. We used a transformer-based model to capture the contextual representation of words. Our proposed method outperformed the existing benchmark methods with an F1 score of 89%.
format Article
id doaj-art-43c12b2eb677453a9a07744ea07b0dd9
institution Kabale University
issn 2169-3536
language English
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-43c12b2eb677453a9a07744ea07b0dd9; 2025-01-30T00:00:21Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2022-01-01; vol. 10, pp. 31397-31419; DOI 10.1109/ACCESS.2022.3159980; document 9737031
Investigating Maps of Science Using Contextual Proximity of Citations Based on Deep Contextualized Word Representation
Muhammad Roman (https://orcid.org/0000-0002-9035-2426), Institute of Computing, Kohat University of Science and Technology, Kohat, Pakistan
Abdul Shahid (https://orcid.org/0000-0002-6291-2641), Institute of Computing, Kohat University of Science and Technology, Kohat, Pakistan
Shafiullah Khan (https://orcid.org/0000-0001-8363-2051), Institute of Computing, Kohat University of Science and Technology, Kohat, Pakistan
Lisu Yu (https://orcid.org/0000-0001-8637-852X), School of Information Engineering, Nanchang University, Nanchang, China
Muhammad Asif (https://orcid.org/0000-0003-1839-2527), Department of Computer Science, National Textile University, Faisalabad, Pakistan
Yazeed Yasin Ghadi (https://orcid.org/0000-0002-7121-495X), Department of Software Engineering and Computer Science, Al Ain University, Abu Dhabi, United Arab Emirates
title Investigating Maps of Science Using Contextual Proximity of Citations Based on Deep Contextualized Word Representation
topic Citation intent classification
citation reason
citation context
transformers model
research paper similarity
url https://ieeexplore.ieee.org/document/9737031/