The role of chromatin state in intron retention: A case study in leveraging large scale deep learning models.

Complex deep learning models trained on very large datasets have become key enabling tools for current research in natural language processing and computer vision. By providing pre-trained models that can be fine-tuned for specific applications, they enable researchers to create accurate models with minimal effort and computational resources. Large scale genomics deep learning models come in two flavors: the first are large language models of DNA sequences trained in a self-supervised fashion, similar to the corresponding natural language models; the second are supervised learning models that leverage large scale genomics datasets from ENCODE and other sources. We argue that these models are the equivalent of foundation models in natural language processing in their utility, as they encode within them chromatin state in its different aspects, providing useful representations that allow quick deployment of accurate models of gene regulation. We demonstrate this premise by leveraging the recently created Sei model to develop simple, interpretable models of intron retention, and demonstrate their advantage over models based on the DNA language model DNABERT-2. Our work also demonstrates the impact of chromatin state on the regulation of intron retention. Using representations learned by Sei, our model is able to discover the involvement of transcription factors and chromatin marks in regulating intron retention, providing better accuracy than a recently published custom model developed for this purpose.
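
As a concrete illustration of the workflow the abstract describes, the sketch below trains a simple, interpretable classifier on features produced by a pre-trained genomics model. This is a minimal sketch, not the authors' code: the input file names, their shapes, and the choice of scikit-learn's L1-penalized logistic regression are assumptions made for illustration; in the paper the representations come from the Sei model.

    # Hypothetical sketch: fit an interpretable model of intron retention
    # on chromatin-profile features from a pre-trained model (Sei-style).
    # Feature extraction is assumed to have been done already.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    # Assumed inputs (illustrative file names, not from the paper):
    #   X: (n_introns, n_profiles) matrix of chromatin-profile scores
    #      predicted by the pre-trained model for each intron's sequence
    #   y: binary labels, 1 = retained intron, 0 = constitutively spliced
    X = np.load("pretrained_features.npy")
    y = np.load("retention_labels.npy")

    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    # An L1 penalty keeps the classifier sparse and interpretable:
    # nonzero weights point at individual profiles (transcription
    # factors, chromatin marks) associated with retention.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X_tr, y_tr)

    print("test AUC:", roc_auc_score(y_te, clf.decision_function(X_te)))

    # Rank profiles by absolute weight; mapping indices to profile
    # names requires the pre-trained model's target metadata.
    top = np.argsort(-np.abs(clf.coef_[0]))[:10]
    print("top feature indices:", top)

The design point this sketch mirrors is the abstract's central claim: once a large pre-trained model supplies chromatin-aware representations, a small linear model is enough to get accurate, inspectable predictions.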

Bibliographic Details
Main Authors: Ahmed Daoud, Asa Ben-Hur
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2025-01-01
Series: PLoS Computational Biology, Vol. 21, No. 1, Article e1012755
ISSN: 1553-734X, 1553-7358
DOI: 10.1371/journal.pcbi.1012755
Online Access: https://doi.org/10.1371/journal.pcbi.1012755