The Inconsistency of the Algorithms of Jaro–Winkler and Needleman–Wunsch Applied to DNA Chain Similarity Results

There are many different algorithms for calculating the distances between DNA chains. Different algorithms for determining such distances give different results. This paper does not consider issues related to which of the classical algorithms is better, but shows the inconsistency of two classical a...

Bibliographic Details
Main Author: Boris Melnikov
Format: Article
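Language: English
Published: MDPI AG 2025-01-01
Series: Mathematics
Subjects: heuristic algorithms; DNA chains; distance matrix; Jaro–Winkler algorithm; Needleman–Wunsch algorithm; pair correlation
Online Access: https://www.mdpi.com/2227-7390/13/2/263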
_version_ 1832588042597040128
author Boris Melnikov
collection DOAJ
description Many algorithms exist for calculating distances between DNA chains, and different algorithms give different results. This paper does not address which of the classical algorithms is better; instead, it shows the inconsistency of two classical algorithms, those of Jaro–Winkler and Needleman–Wunsch. To do this, we consider distance matrices based on both algorithms. We explain that, ideally, the triangle formed by each triple of distances in the distance matrix should be acute-angled isosceles. In reality, this condition is violated, and we can determine the badness of each such triangle. Two algorithms for determining distances are consistent when their sequences of badness are arranged in the same order; the greater the deviation from this order, the less consistent they are. We consider the distance matrices for the two algorithms, calculated for the mitochondrial DNA of 32 species of monkeys belonging to different genera. In both matrices, 4960 triangles are formed, and we calculate the rank correlation between the two badness sequences. The resulting values are very small (for every method of calculating the rank correlation used, the value does not exceed 0.14), which indicates the inconsistency of the two algorithms under consideration.
format Article
id doaj-art-19c525c40d1d4651af08468313281dbc
institution Kabale University
issn 2227-7390
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj-art-19c525c40d1d4651af08468313281dbc | 2025-01-24T13:39:56Z | eng | MDPI AG | Mathematics | 2227-7390 | 2025-01-01 | vol. 13, no. 2, art. 263 | doi:10.3390/math13020263 | Boris Melnikov, Department of Computational Mathematics and Cybernetics, Shenzhen MSU–BIT University, 1 International University Park Road, Dayun New Town, Longgang District, Shenzhen 518172, China
title The Inconsistency of the Algorithms of Jaro–Winkler and Needleman–Wunsch Applied to DNA Chain Similarity Results
topic heuristic algorithms
DNA chains
distance matrix
Jaro–Winkler algorithm
Needleman–Wunsch algorithm
pair correlation
url https://www.mdpi.com/2227-7390/13/2/263
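
The comparison outlined in the description can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the record does not define badness, so `triangle_badness` uses a hypothetical measure (the relative gap between the two longest sides, which is zero exactly when the two longest sides are equal, in which case the triangle is isosceles and acute). Spearman's coefficient stands in for the unspecified rank-correlation methods, and all function names are invented for this example.

```python
from itertools import combinations
from math import comb


def triangle_badness(d, i, j, k):
    """Hypothetical badness (an assumption, not the paper's definition):
    relative gap between the two longest sides of triangle (i, j, k)."""
    a, b, c = sorted((d[i][j], d[i][k], d[j][k]))
    return 0.0 if c == 0 else (c - b) / c


def badness_sequence(d):
    """Badness of every triangle, in a fixed order over index triples."""
    n = len(d)
    return [triangle_badness(d, i, j, k)
            for i, j, k in combinations(range(n), 3)]


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        average = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            result[idx] = average
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one sequence is constant
    return cov / (vx * vy) ** 0.5


def consistency(d_first, d_second):
    """Rank correlation between the badness sequences of two matrices."""
    return spearman(badness_sequence(d_first), badness_sequence(d_second))


# 32 species give C(32, 3) triangles, matching the 4960 in the description.
assert comb(32, 3) == 4960
```

Given two 32 × 32 symmetric distance matrices for the same species in the same order, `consistency(d_first, d_second)` returns one coefficient in [-1, 1]; values near zero, like the 0.14 bound reported in the description, would indicate that the two methods rank the triangles very differently.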