$$$\textsc {McDag}$$ M C D A G : indexing maximal common subsequences for k strings$

$$\textsc {McDag}$$ M C D A G : indexing maximal common subsequences for k strings

Abstract Analyzing and comparing sequences of symbols is among the most fundamental problems in computer science, possibly even more so in bioinformatics. Maximal Common Subsequences (MCSs), i.e., inclusion-maximal sequences of non-contiguous symbols common to two or more strings, have only recently...

Full description

Saved in:

Bibliographic Details
Main Authors:	Giovanni Buzzega, Alessio Conte, Roberto Grossi, Giulia Punzi
Format:	Article
Language:	English
Published:	BMC 2025-04-01
Series:	Algorithms for Molecular Biology
Subjects:	Common subsequences Inclusion maximality Directed acyclic graph Index data structure
Online Access:	https://doi.org/10.1186/s13015-025-00271-z
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1850146419627786240
author	Giovanni Buzzega Alessio Conte Roberto Grossi Giulia Punzi
author_facet	Giovanni Buzzega Alessio Conte Roberto Grossi Giulia Punzi
author_sort	Giovanni Buzzega
collection	DOAJ
description	Abstract Analyzing and comparing sequences of symbols is among the most fundamental problems in computer science, possibly even more so in bioinformatics. Maximal Common Subsequences (MCSs), i.e., inclusion-maximal sequences of non-contiguous symbols common to two or more strings, have only recently received attention in this area, despite being a basic notion and a natural generalization of more common tools like Longest Common Substrings/Subsequences. In this paper we simplify and engineer recent advancements in MCSs into a practical tool called $$\textsc {McDag}$$ M C D A G , the first publicly available tool that can index MCSs of real genomic data, and show that its definition can be generalized to multiple strings. We demonstrate that our tool can index pairs of sequences exceeding 10,000 base pairs within minutes, utilizing only 4-7% more than the minimum required nodes. For three or more sequences, we observe experimentally that the minimum index may exhibit a significant increase in the number of nodes.
format	Article
id	doaj-art-fbb0cec12c0a42f9a6e5ab9746b2a31c
institution	OA Journals
issn	1748-7188
language	English
publishDate	2025-04-01
publisher	BMC
record_format	Article
series	Algorithms for Molecular Biology
spelling	doaj-art-fbb0cec12c0a42f9a6e5ab9746b2a31c2025-08-20T02:27:50ZengBMCAlgorithms for Molecular Biology1748-71882025-04-0120112110.1186/s13015-025-00271-z$$\textsc {McDag}$$ M C D A G : indexing maximal common subsequences for k stringsGiovanni Buzzega0Alessio Conte1Roberto Grossi2Giulia Punzi3Dipartimento di Informatica, Università di PisaDipartimento di Informatica, Università di PisaDipartimento di Informatica, Università di PisaDipartimento di Informatica, Università di PisaAbstract Analyzing and comparing sequences of symbols is among the most fundamental problems in computer science, possibly even more so in bioinformatics. Maximal Common Subsequences (MCSs), i.e., inclusion-maximal sequences of non-contiguous symbols common to two or more strings, have only recently received attention in this area, despite being a basic notion and a natural generalization of more common tools like Longest Common Substrings/Subsequences. In this paper we simplify and engineer recent advancements in MCSs into a practical tool called $$\textsc {McDag}$$ M C D A G , the first publicly available tool that can index MCSs of real genomic data, and show that its definition can be generalized to multiple strings. We demonstrate that our tool can index pairs of sequences exceeding 10,000 base pairs within minutes, utilizing only 4-7% more than the minimum required nodes. For three or more sequences, we observe experimentally that the minimum index may exhibit a significant increase in the number of nodes.https://doi.org/10.1186/s13015-025-00271-zCommon subsequencesInclusion maximalityDirected acyclic graphIndex data structure
spellingShingle	Giovanni Buzzega Alessio Conte Roberto Grossi Giulia Punzi $$\textsc {McDag}$$ M C D A G : indexing maximal common subsequences for k strings Algorithms for Molecular Biology Common subsequences Inclusion maximality Directed acyclic graph Index data structure
title	$$\textsc {McDag}$$ M C D A G : indexing maximal common subsequences for k strings
title_full	$$\textsc {McDag}$$ M C D A G : indexing maximal common subsequences for k strings
title_fullStr	$$\textsc {McDag}$$ M C D A G : indexing maximal common subsequences for k strings
title_full_unstemmed	$$\textsc {McDag}$$ M C D A G : indexing maximal common subsequences for k strings
title_short	$$\textsc {McDag}$$ M C D A G : indexing maximal common subsequences for k strings
title_sort	textsc mcdag m c d a g indexing maximal common subsequences for k strings
topic	Common subsequences Inclusion maximality Directed acyclic graph Index data structure
url	https://doi.org/10.1186/s13015-025-00271-z
work_keys_str_mv	AT giovannibuzzega textscmcdagmcdagindexingmaximalcommonsubsequencesforkstrings AT alessioconte textscmcdagmcdagindexingmaximalcommonsubsequencesforkstrings AT robertogrossi textscmcdagmcdagindexingmaximalcommonsubsequencesforkstrings AT giuliapunzi textscmcdagmcdagindexingmaximalcommonsubsequencesforkstrings

$$\textsc {McDag}$$ M C D A G : indexing maximal common subsequences for k strings

Similar Items