A graph neural network approach for hierarchical mapping of breast cancer protein communities
Abstract. Background: Comprehensively mapping the hierarchical structure of breast cancer protein communities and identifying potential biomarkers from them is a promising direction for breast cancer research. Existing approaches are subjective and fail to take information from protein sequences into consid...
Main Authors: | Xiao Zhang, Qian Liu |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2025-01-01 |
Series: | BMC Bioinformatics |
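Subjects: | Protein communities, Hierarchical clustering, Graph neural network, Group LASSO, Breast cancer, Biomarker |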
Online Access: | https://doi.org/10.1186/s12859-024-06015-x |
_version_ | 1832585338104578048 |
---|---|
author | Xiao Zhang Qian Liu |
collection | DOAJ |
description | Abstract. Background: Comprehensively mapping the hierarchical structure of breast cancer protein communities and identifying potential biomarkers from them is a promising direction for breast cancer research. Existing approaches are subjective and fail to take information from protein sequences into consideration. Deep learning can automatically learn features from protein sequences and protein–protein interactions for hierarchical clustering. Results: Using a large amount of publicly available proteomics data, we created a hierarchical tree of breast cancer protein communities with a novel hierarchical graph neural network, supervised by gene ontology terms and assisted by a pre-trained deep contextual language model. A group-lasso algorithm was then applied to identify protein communities that are under both mutation burden and survival burden, undergo significant alterations when targeted by specific drug molecules, and show cancer-dependent perturbations. The resulting hierarchical map of protein communities shows how gene-level mutations and survival information converge on protein communities at different scales. Internal validity of the model was established through the convergence on BRCA2 as a breast cancer hotspot. Further overlaps with breast cancer cell dependencies revealed SUPT6H and RAD21, along with their respective protein systems, HOST:37 and HOST:861, as potential biomarkers. Using gene-level perturbation data of the HOST:37 and HOST:861 gene sets, three FDA-approved drugs with high therapeutic value (mercaptopurine, pioglitazone, and colchicine) were selected as potential treatments to be further evaluated. Conclusion: The proposed graph neural network approach to analyzing breast cancer protein communities in a hierarchical structure provides a novel perspective on breast cancer prognosis and treatment. By targeting entire gene sets, we were able to evaluate the prognostic and therapeutic value of genes (or gene sets) at different levels, from gene-level to system-level biology. Cancer-specific gene dependencies provide additional context for pinpointing cancer-related systems, and drug-induced alterations can highlight potential therapeutic targets. These identified protein communities, in conjunction with other protein communities under strong mutation and survival burdens, can potentially be used as clinical biomarkers for breast cancer. |
format | Article |
id | doaj-art-1d449ef4420e482c8af24ddceabfcee9 |
institution | Kabale University |
issn | 1471-2105 |
language | English |
publishDate | 2025-01-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj-art-1d449ef4420e482c8af24ddceabfcee9; 2025-01-26T12:54:54Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-01-01; 26; 1; 1; 18; 10.1186/s12859-024-06015-x; A graph neural network approach for hierarchical mapping of breast cancer protein communities; Xiao Zhang (0); Qian Liu (1); (0) Department of Applied Computer Science, University of Winnipeg; (1) Department of Applied Computer Science, University of Winnipeg; https://doi.org/10.1186/s12859-024-06015-x; Protein communities; Hierarchical clustering; Graph neural network; Group LASSO; Breast cancer; Biomarker |
title | A graph neural network approach for hierarchical mapping of breast cancer protein communities |
topic | Protein communities Hierarchical clustering Graph neural network Group LASSO Breast cancer Biomarker |
url | https://doi.org/10.1186/s12859-024-06015-x |