X-Mapper: fast and accurate sequence alignment via gapped x-mers

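Abstract: Sequence alignment is foundational to many bioinformatic analyses. Many aligners start by splitting sequences into contiguous, fixed-length seeds, called k-mers. Alignment is faster with longer, unique seeds, but more accurate with shorter seeds avoiding mutations. Here, we introduce X-Mapper, aiming to offer high speed and accuracy via dynamic-length seeds containing gaps, called gapped x-mers. We observe 11–24-fold fewer suboptimal alignments analyzing a human reference and 3–579-fold lower inconsistency across bacterial references than other aligners, improving on 53% and 30% of reads aligned to non-target strains and species, respectively. Other seed-based analysis algorithms might benefit from gapped x-mers too.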
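The seed-length trade-off described in the abstract can be illustrated with a small sketch. This is not X-Mapper's code or algorithm: the toy sequences, the function names (kmers, surviving_seeds, gapped_seed) and the fixed mask are all hypothetical, chosen only to show how one substitution disrupts contiguous k-mers of different lengths and how a seed with an ignored position can tolerate it.

```python
def kmers(seq, k):
    """Return every contiguous k-mer of seq, in order of offset."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def surviving_seeds(read, reference, k):
    """Count read k-mers that still occur exactly somewhere in the reference."""
    ref_index = set(kmers(reference, k))
    return sum(1 for seed in kmers(read, k) if seed in ref_index)


def gapped_seed(seq, start, mask):
    """Build a seed from seq[start:], keeping bases where mask is '1'.

    Positions where mask is '0' are ignored (the "gap"). The fixed mask is an
    illustrative assumption; X-Mapper's x-mers are described as dynamic-length.
    """
    window = seq[start:start + len(mask)]
    return "".join(base for base, m in zip(window, mask) if m == "1")


# Toy data (hypothetical): the read differs from the reference by a single
# substitution at 0-based position 9 (G -> C).
reference = "ACGTTGCATGACCTGAAGTC"
read = "ACGTTGCATCACCTGAAGTC"

for k in (4, 8, 12):
    total = len(read) - k + 1
    print(f"k={k}: {surviving_seeds(read, reference, k)}/{total} seeds still match")

# A 9-base window starting at offset 5 covers the substitution, so the
# contiguous seeds differ, but the gapped seeds agree because the mask
# ignores the substituted position.
mask = "111101111"
print(reference[5:14] == read[5:14])
print(gapped_seed(reference, 5, mask) == gapped_seed(read, 5, mask))
```

On this toy input, 13 of 17 4-mers, 5 of 13 8-mers and 0 of 9 12-mers still match, and the gapped seeds agree even though the contiguous 9-base windows do not. Longer seeds are more specific but are lost to a single mutation more often, which is the tension the abstract describes.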

Bibliographic Details
Main Authors: Jeffry M. Gaston, Eric J. Alm, An-Ni Zhang
Format: Article
Language: English
Published: BMC 2025-01-01
Series: Genome Biology
Subjects: Bioinformatics; Sequence alignment algorithms; K-mer; Microbial sequencing
Online Access: https://doi.org/10.1186/s13059-024-03473-7
collection DOAJ
description Abstract Sequence alignment is foundational to many bioinformatic analyses. Many aligners start by splitting sequences into contiguous, fixed-length seeds, called k-mers. Alignment is faster with longer, unique seeds, but more accurate with shorter seeds avoiding mutations. Here, we introduce X-Mapper, aiming to offer high speed and accuracy via dynamic-length seeds containing gaps, called gapped x-mers. We observe 11–24-fold fewer suboptimal alignments analyzing a human reference and 3–579-fold lower inconsistency across bacterial references than other aligners, improving on 53% and 30% of reads aligned to non-target strains and species, respectively. Other seed-based analysis algorithms might benefit from gapped x-mers too.
id doaj-art-1800fe219b634d5c96529b77404f556b
institution Kabale University
issn 1474-760X
spelling doaj-art-1800fe219b634d5c96529b77404f556b
2025-02-02T12:27:06Z
eng
BMC
Genome Biology
1474-760X
2025-01-01
261127
10.1186/s13059-024-03473-7
X-Mapper: fast and accurate sequence alignment via gapped x-mers
Jeffry M. Gaston (0)
Eric J. Alm (1)
An-Ni Zhang (2)
Google
Department of Biological Engineering, Massachusetts Institute of Technology
Department of Biological Engineering, Massachusetts Institute of Technology
Abstract Sequence alignment is foundational to many bioinformatic analyses. Many aligners start by splitting sequences into contiguous, fixed-length seeds, called k-mers. Alignment is faster with longer, unique seeds, but more accurate with shorter seeds avoiding mutations. Here, we introduce X-Mapper, aiming to offer high speed and accuracy via dynamic-length seeds containing gaps, called gapped x-mers. We observe 11–24-fold fewer suboptimal alignments analyzing a human reference and 3–579-fold lower inconsistency across bacterial references than other aligners, improving on 53% and 30% of reads aligned to non-target strains and species, respectively. Other seed-based analysis algorithms might benefit from gapped x-mers too.
https://doi.org/10.1186/s13059-024-03473-7
Bioinformatics
Sequence alignment algorithms
K-mer
Microbial sequencing
topic Bioinformatics
Sequence alignment algorithms
K-mer
Microbial sequencing