Application of alternative <i>de novo</i> motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites

The most popular model for the search of ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences in different site positions. Currently, two recently proposed models,...

Bibliographic Details
Main Authors: A. V. Tsukanov, V. G. Levitsky, T. I. Merkulova
Format: Article
Language: English
Published: Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders 2021-03-01
Series: Вавиловский журнал генетики и селекции (Vavilov Journal of Genetics and Breeding)
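Subjects: transcription factor binding sites (TFBS); TFBS <i>de novo</i> searching; ChIP-seq; heterogeneity of TFBS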
Online Access: https://vavilov.elpub.ru/jour/article/view/2911
_version_ 1832575110911885312
author A. V. Tsukanov
V. G. Levitsky
T. I. Merkulova
collection DOAJ
description The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences at different site positions. Two recently proposed models, BaMM and InMoDe, can account for such dependencies. However, these models have usually been applied only to compare their recognition accuracy with that of PWMs, and no analysis of the co-prediction and relative positioning of hits from different models within peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. The pipeline includes the stages of training the models, assessing their recognition accuracy, scanning ChIP-seq peaks, and classifying the peaks based on the scan results. We applied the pipeline to 22 ChIP-seq datasets for the TF FOXA2 and considered the PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. Combining these four models significantly increased the fraction of recognized peaks compared to that for the PWM model alone: the increase was 26.3 %. The BaMM model made the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS of different models at coinciding positions, the median fractions of peaks containing predictions of only a single model were 1.08, 0.49, 4.15 and 1.73 % for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 BSs were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in the ChIP-seq datasets under study.
format Article
id doaj-art-aa4bdd50d2e24290a92e4795c73400f0
institution Kabale University
issn 2500-3259
language English
publishDate 2021-03-01
publisher Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders
record_format Article
series Вавиловский журнал генетики и селекции
spelling doaj-art-aa4bdd50d2e24290a92e4795c73400f0 | 2025-02-01T09:58:09Z | eng | Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders | Вавиловский журнал генетики и селекции | 2500-3259 | 2021-03-01 | 25 | 1 | 7 | 17 | 10.18699/VJ21.002 | 1127 | A. V. Tsukanov [0] | V. G. Levitsky [1] | T. I. Merkulova [2] | [0] Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences | [1] Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences; Novosibirsk State University | [2] Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences; Novosibirsk State University
title Application of alternative <i>de novo</i> motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites
topic transcription factor binding sites (tfbs)
tfbs <i>de novo</i> searching
chip-seq
heterogeneity of tfbs
url https://vavilov.elpub.ru/jour/article/view/2911