MethylBERT enables read-level DNA methylation pattern identification and tumour deconvolution using a Transformer-based model

Bibliographic Details
Main Authors: Yunhee Jeong, Clarissa Gerhäuser, Guido Sauter, Thorsten Schlomm, Karl Rohr, Pavlo Lutsik
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-55920-z
collection DOAJ
description Abstract DNA methylation (DNAm) is a key epigenetic mark that shows profound alterations in cancer. Read-level methylomes enable more in-depth analyses, due to their broad genomic coverage and preservation of rare cell-type signals, compared to summarized data such as 450K/EPIC microarrays. Here, we propose MethylBERT, a Transformer-based model for read-level methylation pattern classification. MethylBERT identifies tumour-derived sequence reads based on their methylation patterns and local genomic sequence, and estimates tumour cell fractions within bulk samples. In our evaluation, MethylBERT outperforms existing deconvolution methods and demonstrates high accuracy regardless of methylation pattern complexity, read length and read coverage. Moreover, we show its applicability to cell-type deconvolution as well as non-invasive early cancer diagnostics using liquid biopsy samples. MethylBERT represents a significant advancement in read-level methylome analysis and enables accurate tumour purity estimation. The broad applicability of MethylBERT will enhance studies on both tumour and non-cancerous bulk methylomes.
id doaj-art-cfa8ddeed7fa44d38f1b8984d80e3675
institution Kabale University
issn 2041-1723
timestamp 2025-01-19T12:29:54Z
volume 16
issue 1
pages 1-14
doi 10.1038/s41467-025-55920-z
affiliations Yunhee Jeong: Division of Cancer Epigenomics, German Cancer Research Center (DKFZ)
Clarissa Gerhäuser: Division of Cancer Epigenomics, German Cancer Research Center (DKFZ)
Guido Sauter: Institute for Pathology, University Medical Center Hamburg-Eppendorf
Thorsten Schlomm: Department of Urology, Charité – Universitätsmedizin Berlin
Karl Rohr: Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University
Pavlo Lutsik: Division of Cancer Epigenomics, German Cancer Research Center (DKFZ)