Comprehensive Analysis of Masking Techniques in Molecular Graph Representation Learning

Bibliographic Details
Main Authors: Bonyou Koo, Sunyoung Kwon
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10844080/
ISSN: 2169-3536
Volume: 13
Pages: 14290–14303
DOI: 10.1109/ACCESS.2025.3531302
Affiliation (both authors): Department of Information Convergence Engineering, Pusan National University, Yangsan-si, South Korea
ORCID: Bonyou Koo, https://orcid.org/0009-0007-9008-9772; Sunyoung Kwon, https://orcid.org/0000-0003-3433-1409
Keywords: Graph neural network; masking; molecular graph; representation learning; machine learning

Abstract
Molecule representation learning is a primary area of focus in drug discovery and molecular property prediction. In previous studies, molecules have been modeled as graphs, enabling graph neural networks (GNNs) to capture essential structural information. Recent approaches have enhanced molecular representations by introducing advanced masking strategies, such as extending the masking granularity from nodes to subgraphs, shifting masking locations, and applying masking during downstream tasks. However, comprehensive analyses of these strategies remain limited. In this study, we systematically evaluate masking techniques across various phases, granularities, locations, feature types, and ratios. Our findings reveal that node feature masking during pre-training achieves high performance, that rich input features may reduce these gains, and that the commonly used 25% masking ratio is not universally optimal: alternative ratios perform better depending on the dataset. Our study provides deeper insights into the benefits of masking techniques in molecular graphs and highlights their potential to improve semantic understanding and predictive accuracy in graph-based learning.
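
This record does not describe the authors' implementation. What follows is therefore only a generic sketch of two of the masking axes named in the abstract. The first is granularity: masking individual nodes versus a connected subgraph. The second is the masking ratio, applied to node features before pre-training. The sketch makes three assumptions. Each atom is encoded by a single integer type id (here its atomic number), whereas real featurizations usually carry several features per atom. Masked atoms are replaced by a hypothetical reserved id, MASK_TOKEN. The number of masked nodes is round(ratio * num_nodes), clamped to at least one. The helper names mask_budget, node_mask and subgraph_mask are illustrative and do not come from the paper.

```python
import random
from collections import deque

# Hypothetical reserved id for a masked atom (one past the 118 known elements).
MASK_TOKEN = 119


def mask_budget(num_nodes, ratio):
    """Assumed rule: mask round(ratio * num_nodes) nodes, at least 1 and at most num_nodes."""
    return max(1, min(num_nodes, round(ratio * num_nodes)))


def _apply_mask(atom_types, masked_idx):
    """Replace masked atom types with MASK_TOKEN; return the originals as reconstruction targets."""
    masked = list(atom_types)  # copy so the input graph is not mutated
    targets = []
    for i in masked_idx:
        targets.append(masked[i])
        masked[i] = MASK_TOKEN
    return masked, masked_idx, targets


def node_mask(atom_types, ratio, rng):
    """Node-level granularity: mask independently chosen nodes anywhere in the graph."""
    k = mask_budget(len(atom_types), ratio)
    masked_idx = sorted(rng.sample(range(len(atom_types)), k))
    return _apply_mask(atom_types, masked_idx)


def subgraph_mask(atom_types, edges, ratio, rng):
    """Subgraph-level granularity: mask a connected set of nodes grown by BFS from a random seed.

    If the seed's connected component is smaller than the budget, fewer nodes are masked.
    """
    n = len(atom_types)
    k = mask_budget(n, ratio)
    adj = {i: [] for i in range(n)}
    for u, v in edges:  # undirected bonds
        adj[u].append(v)
        adj[v].append(u)
    seed = rng.randrange(n)
    chosen, seen, queue = [seed], {seed}, deque([seed])
    while queue and len(chosen) < k:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen and len(chosen) < k:
                seen.add(v)
                chosen.append(v)
                queue.append(v)
    return _apply_mask(atom_types, sorted(chosen))


if __name__ == "__main__":
    rng = random.Random(0)
    # Heavy atoms of phenol as atomic numbers: six ring carbons plus one oxygen.
    phenol_atoms = [6, 6, 6, 6, 6, 6, 8]
    phenol_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
    for ratio in (0.15, 0.25, 0.5):  # sweep the masking ratio
        masked, idx, targets = node_mask(phenol_atoms, ratio, rng)
        print("node", ratio, idx, masked)
        masked, idx, targets = subgraph_mask(phenol_atoms, phenol_edges, ratio, rng)
        print("subgraph", ratio, idx, masked)
```

In a pre-training loop under these assumptions, a GNN would receive the masked feature list. It would be trained to predict targets at the positions in masked_idx. The ratio argument corresponds to the setting the abstract discusses when it reports that 25% is not universally optimal. Python's round() rounds halves to even, so the budget for a given ratio can differ by one from other rounding conventions.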