The Inconsistency of the Algorithms of Jaro–Winkler and Needleman–Wunsch Applied to DNA Chain Similarity Results

Bibliographic Details
Main Author: Boris Melnikov
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/13/2/263
Description
Summary: Many algorithms exist for calculating distances between DNA chains, and different algorithms give different results. This paper does not address which of the classical algorithms is better; instead, it demonstrates the inconsistency of two of them, the Jaro–Winkler and Needleman–Wunsch algorithms. To do this, we consider the distance matrices produced by both algorithms. We explain that, ideally, the triangle formed by each triple of distances in a distance matrix should be acute-angled and isosceles. In practice this condition is violated, and we can quantify the badness of each such triangle. Two distance algorithms are fully consistent when their sequences of badness values are in the same order; the more the two orderings differ, the less consistent the algorithms are. We examine the distance matrices of the two algorithms, calculated for the mitochondrial DNA of 32 species of monkeys belonging to different genera. Each matrix yields 4960 triangles, and we calculate the rank correlation between the two resulting sequences of badness values. The correlation values are very small (they do not exceed 0.14, whichever method of calculating the rank correlation is used), which indicates that the two algorithms are inconsistent.
ISSN: 2227-7390
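
The procedure outlined in the summary (enumerate every triangle in a distance matrix, score its badness, then rank-correlate the two badness sequences) can be illustrated with a minimal Python sketch. The record does not give the paper's badness formula, so `triangle_badness` below is a hypothetical stand-in: it assumes the ideal triangle has its two longest sides equal (which makes it isosceles and acute) and measures the relative gap between those two sides. Spearman's coefficient is used as one possible rank correlation; the function names and the tie handling are likewise assumptions, not the author's implementation.

```python
from itertools import combinations
from math import sqrt


def triangle_badness(a, b, c):
    """Hypothetical badness score (NOT the paper's formula).

    Relative gap between the two longest sides; 0.0 means they are equal,
    i.e. the triangle is isosceles with the shortest side as its base.
    """
    longest, middle, _ = sorted((a, b, c), reverse=True)
    return 0.0 if longest == 0 else (longest - middle) / longest


def badness_sequence(dist):
    """Badness of every triangle (i, j, k) in a symmetric distance matrix."""
    n = len(dist)
    return [
        triangle_badness(dist[i][j], dist[i][k], dist[j][k])
        for i, j, k in combinations(range(n), 3)
    ]


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one sequence is constant
    return cov / sqrt(vx * vy)
```

Given two 32 x 32 distance matrices, one per algorithm, `badness_sequence` returns C(32, 3) = 4960 values for each, matching the triangle count in the summary, and `spearman` applied to the two sequences gives a single consistency score between -1 and 1.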