STICI: Split-Transformer with integrated convolutions for genotype imputation

Bibliographic Details
Main Authors: Mohammad Erfan Mowlaei, Chong Li, Oveis Jamialahmadi, Raquel Dias, Junjie Chen, Benyamin Jamialahmadi, Timothy Richard Rebbeck, Vincenzo Carnevale, Sudhir Kumar, Xinghua Shi
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-56273-3
collection DOAJ
description Abstract Despite advances in sequencing technologies, genome-scale datasets often contain missing bases and genomic segments, hindering downstream analyses. Genotype imputation addresses this issue and has been a cornerstone pre-processing step in genetic and genomic studies. Although various methods have been widely adopted for genotype imputation, it remains challenging to impute certain genomic regions and large structural variants. Here, we present a transformer-based framework, named STICI, for accurate genotype imputation. STICI models automatically learn genome-wide patterns of linkage disequilibrium, evidenced by much higher imputation accuracy in regions with highly linked variants. Our imputation results on the human 1000 Genomes Project and non-human genomes show that STICI can achieve high imputation accuracy comparable to the state-of-the-art genotype imputation methods, with the additional capability to impute multi-allelic variants and various types of genetic variants. STICI can be trained for any collection of genomes automatically using self-supervision. Moreover, STICI shows excellent performance without needing any special presuppositions about the underlying patterns in collections of non-human genomes, pointing to adaptability and applications of STICI to impute missing genotypes in any species.
format Article
id doaj-art-dcd7ac33d3ef481a9763a66e811c2b55
institution Kabale University
issn 2041-1723
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj-art-dcd7ac33d3ef481a9763a66e811c2b55; 2025-02-02T12:33:29Z; eng; Nature Portfolio; Nature Communications; ISSN 2041-1723; 2025-01-01; volume 16, issue 1, pages 1-14; 10.1038/s41467-025-56273-3
author_affiliation Mohammad Erfan Mowlaei: Computer & Information Sciences, College of Science and Technology, Temple University
Chong Li: Computer & Information Sciences, College of Science and Technology, Temple University
Oveis Jamialahmadi: Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, Wallenberg Laboratory, University of Gothenburg
Raquel Dias: Department of Microbiology and Cell Science, University of Florida
Junjie Chen: School of Computer Science and Technology, Harbin Institute of Technology
Benyamin Jamialahmadi: David R. Cheriton School of Computer Science, University of Waterloo
Timothy Richard Rebbeck: Division of Population Sciences, Dana-Farber Cancer Institute
Vincenzo Carnevale: Institute for Genomics and Evolutionary Medicine, Temple University
Sudhir Kumar: Computer & Information Sciences, College of Science and Technology, Temple University
Xinghua Shi: Computer & Information Sciences, College of Science and Technology, Temple University
title STICI: Split-Transformer with integrated convolutions for genotype imputation
url https://doi.org/10.1038/s41467-025-56273-3