Hamming Diversification Index: A New Clustering-Based Metric to Understand and Visualize Time Evolution of Patterns in Multi-Dimensional Datasets

Bibliographic Details
Main Authors: Sarthak Pattnaik, Eugene Pinsky
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/14/7760
Description
Summary: One of the most challenging problems in data analysis is visualizing patterns and extracting insights from multi-dimensional datasets that vary over time. The complexity of the data and variations in the correlations between different features add further difficulty to the analysis. In this paper, we provide a framework to analyze the temporal dynamics of such datasets. We use machine learning clustering techniques and examine the time evolution of data patterns by constructing the corresponding cluster trajectories. These trajectories allow us to visualize the patterns and the changing nature of correlations over time. The similarity and correlations of features are reflected in common cluster membership, whereas the historical dynamics are described by a trajectory in the corresponding (cluster, time) space. This allows an effective visualization of multi-dimensional data over time. We introduce several statistical metrics to measure the duration, volatility, and inertia of changes in patterns. Using the Hamming distance of trajectories over multiple time periods, we propose a novel metric, the Hamming diversification index, to measure the spread between trajectories. The novel metric is easy to compute, has a simple machine learning implementation, and provides additional insights into the temporal dynamics of data. This parsimonious diversification index can be used to examine changes in pattern similarities over aggregated time periods. We demonstrate the efficacy of our approach by analyzing a complex multi-year dataset of multiple worldwide economic indicators.
ISSN: 2076-3417
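
The summary describes comparing cluster trajectories by their Hamming distance. The sketch below illustrates that idea only. This record does not give the paper's exact definition of the Hamming diversification index, so the function `mean_pairwise_hamming` is a hypothetical stand-in (the average normalized Hamming distance over all pairs of trajectories), and the feature names and cluster labels are made-up illustration data. It also assumes that cluster labels are comparable from one time period to the next.

```python
from itertools import combinations


def hamming_distance(traj_a, traj_b):
    """Count the time periods in which two trajectories sit in different clusters."""
    if len(traj_a) != len(traj_b):
        raise ValueError("trajectories must cover the same time periods")
    return sum(a != b for a, b in zip(traj_a, traj_b))


def mean_pairwise_hamming(trajectories):
    """Average normalized Hamming distance over all pairs of trajectories.

    ASSUMPTION: an illustrative spread measure, not the paper's definition
    of the Hamming diversification index.
    Returns a value in [0, 1]: 0 means all trajectories are identical,
    1 means every pair differs in every time period.
    """
    names = sorted(trajectories)
    pairs = list(combinations(names, 2))
    if not pairs:
        return 0.0
    n_periods = len(trajectories[names[0]])
    if n_periods == 0:
        return 0.0
    total = sum(
        hamming_distance(trajectories[a], trajectories[b]) for a, b in pairs
    )
    return total / (len(pairs) * n_periods)


# Hypothetical cluster labels for three features over four time periods.
trajectories = {
    "feature_a": [0, 0, 1, 1],
    "feature_b": [0, 0, 1, 2],
    "feature_c": [1, 2, 2, 2],
}

spread = mean_pairwise_hamming(trajectories)
print(f"mean pairwise normalized Hamming distance: {spread:.3f}")
```

In this toy data, feature_a and feature_b share a cluster in three of the four periods, so their trajectories are close, while feature_c follows a different path and raises the overall spread.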