The role of chromatin state in intron retention: A case study in leveraging large scale deep learning models.

Complex deep learning models trained on very large datasets have become key enabling tools for current research in natural language processing and computer vision. By providing pre-trained models that can be fine-tuned for specific applications, they enable researchers to create accurate models with minimal effort and computational resources. Large scale genomics deep learning models come in two flavors: the first are large language models of DNA sequences trained in a self-supervised fashion, similar to the corresponding natural language models; the second are supervised learning models that leverage large scale genomics datasets from ENCODE and other sources. We argue that these models are the equivalent of foundation models in natural language processing in their utility, as they encode within them chromatin state in its different aspects, providing useful representations that allow quick deployment of accurate models of gene regulation. We demonstrate this premise by leveraging the recently created Sei model to develop simple, interpretable models of intron retention, and demonstrate their advantage over models based on the DNA language model DNABERT-2. Our work also demonstrates the impact of chromatin state on the regulation of intron retention. Using representations learned by Sei, our model is able to discover the involvement of transcription factors and chromatin marks in regulating intron retention, providing better accuracy than a recently published custom model developed for this purpose.
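
As a concrete illustration of the workflow the abstract describes, the sketch below trains a simple, interpretable classifier on features produced by a pre-trained genomics model. This is a minimal sketch, not the authors' code: the input file names, their shapes, and the choice of scikit-learn's L1-penalized logistic regression are assumptions made for illustration; in the paper the representations come from the Sei model.

    # Hypothetical sketch: fit an interpretable model of intron retention
    # on chromatin-profile features from a pre-trained model (Sei-style).
    # Feature extraction is assumed to have been done already.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    # Assumed inputs (illustrative file names, not from the paper):
    #   X: (n_introns, n_profiles) matrix of chromatin-profile scores
    #      predicted by the pre-trained model for each intron's sequence
    #   y: binary labels, 1 = retained intron, 0 = constitutively spliced
    X = np.load("pretrained_features.npy")
    y = np.load("retention_labels.npy")

    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    # An L1 penalty keeps the classifier sparse and interpretable:
    # nonzero weights point at individual profiles (transcription
    # factors, chromatin marks) associated with retention.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X_tr, y_tr)

    print("test AUC:", roc_auc_score(y_te, clf.decision_function(X_te)))

    # Rank profiles by absolute weight; mapping indices to profile
    # names requires the pre-trained model's target metadata.
    top = np.argsort(-np.abs(clf.coef_[0]))[:10]
    print("top feature indices:", top)

The design point this sketch mirrors is the abstract's central claim: once a large pre-trained model supplies chromatin-aware representations, a small linear model is enough to get accurate, inspectable predictions.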

Bibliographic Details
Main Authors: Ahmed Daoud, Asa Ben-Hur
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2025-01-01
Series: PLoS Computational Biology, Vol. 21, No. 1, Article e1012755
ISSN: 1553-734X, 1553-7358
DOI: 10.1371/journal.pcbi.1012755
Online Access: https://doi.org/10.1371/journal.pcbi.1012755