The role of chromatin state in intron retention: A case study in leveraging large scale deep learning models.
Complex deep learning models trained on very large datasets have become key enabling tools for current research in natural language processing and computer vision. By providing pre-trained models that can be fine-tuned for specific applications, they enable researchers to create accurate models with minimal effort and computational resources. Large scale genomics deep learning models come in two flavors: the first are large language models of DNA sequences trained in a self-supervised fashion, similar to the corresponding natural language models; the second are supervised learning models that leverage large scale genomics datasets from ENCODE and other sources. We argue that these models are the equivalent of foundation models in natural language processing in their utility, as they encode within them chromatin state in its different aspects, providing useful representations that allow quick deployment of accurate models of gene regulation. We demonstrate this premise by leveraging the recently created Sei model to develop simple, interpretable models of intron retention, and demonstrate their advantage over models based on the DNA language model DNABERT-2. Our work also demonstrates the impact of chromatin state on the regulation of intron retention. Using representations learned by Sei, our model is able to discover the involvement of transcription factors and chromatin marks in regulating intron retention, providing better accuracy than a recently published custom model developed for this purpose.
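To illustrate the workflow the abstract describes, here is a minimal sketch (not the authors' code) of training a simple, interpretable classifier on features produced by a pre-trained genomics model. The feature matrix `X`, labels `y`, and `feature_names` below are synthetic stand-ins: in practice `X` would hold precomputed Sei chromatin-profile scores (one row per intron) and `y` would mark retained versus spliced introns.

```python
# Sketch: interpretable intron-retention model on pre-trained-model features.
# X, y, and feature_names are random stand-ins, NOT real Sei outputs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_introns, n_features = 1000, 50                 # toy sizes for illustration
X = rng.normal(size=(n_introns, n_features))     # stand-in for Sei scores
y = rng.integers(0, 2, size=n_introns)           # stand-in retention labels
feature_names = [f"chromatin_feature_{i}" for i in range(n_features)]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# An L1 penalty keeps the model sparse, so the surviving coefficients
# are easy to read off as candidate regulatory features.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X_train, y_train)

print("test AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

# Interpretability: the largest-magnitude coefficients point to the
# chromatin features (e.g., TF binding or histone-mark tracks) most
# associated with retention under this toy setup.
top = np.argsort(-np.abs(clf.coef_[0]))[:5]
for i in top:
    print(feature_names[i], round(clf.coef_[0][i], 3))
```

The design choice mirrors the abstract's argument: the heavy lifting (encoding chromatin state) is delegated to the pre-trained model, so the downstream model can stay simple enough to inspect directly.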
Main Authors: | Ahmed Daoud, Asa Ben-Hur |
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2025-01-01 |
Series: | PLoS Computational Biology |
Online Access: | https://doi.org/10.1371/journal.pcbi.1012755 |
collection | DOAJ |
id | doaj-art-dff7467de0a846f4ab459de973091ba8 |
institution | Kabale University |
issn | 1553-734X; 1553-7358 |