JET: Fast Estimation of Hierarchical Time Series Clustering

Bibliographic Details
Main Authors: Phillip Wenig, Mathias Höfgen, Thorsten Papenbrock
Format: Article
Language: English
Published: MDPI AG 2024-07-01
Series: Engineering Proceedings
Online Access: https://www.mdpi.com/2673-4591/68/1/37
Description
Summary: Clustering is an effective, unsupervised classification approach for time series analysis applications that suffer from a natural lack of training data. One such application is the development of jet engines, which involves numerous test runs and failure detection processes. While effective data mining algorithms exist for the detection of anomalous and structurally conspicuous test recordings, these algorithms do not perform any semantic labeling. Data analysts therefore spend many hours connecting the large numbers of automatically extracted observations to their underlying root causes. The complexity, number, and variety of extracted time series make this task hard not only for humans, but also for existing time series clustering algorithms. These algorithms either require training data for supervised learning, cannot deal with varying time series lengths, or suffer from exceptionally long runtimes. In this paper, we propose JET, an unsupervised, highly efficient clustering algorithm for large numbers of variable-length time series. The main idea is to transform the input time series into a metric space and then apply a very fast conventional clustering algorithm to obtain an effective but rather coarse-grained pre-clustering of the data. This pre-clustering subsequently serves to estimate the more accurate but also more costly shape-based distances between the time series and thus enables JET to apply a highly effective hierarchical clustering algorithm to the entire input time series collection. Our experiments demonstrate that JET is highly accurate and much faster than its competitors.
ISSN: 2673-4591
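
The pipeline outlined in the summary (feature embedding, fast pre-clustering, estimated shape-based distances, hierarchical clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The feature set, the use of k-means for pre-clustering, dynamic time warping (DTW) as the shape-based distance, the representative-based estimate for cross-cluster distances, and all function names (`features`, `kmeans`, `dtw`, `agglomerate`, `estimate_and_cluster`) are assumptions chosen only to make the idea concrete.

```python
import math
import random

def features(ts):
    """Embed a variable-length series as a fixed-length vector (assumed features)."""
    mean = sum(ts) / len(ts)
    std = math.sqrt(sum((x - mean) ** 2 for x in ts) / len(ts))
    return (mean, std, min(ts), max(ts))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; stands in for the 'very fast conventional' step."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster runs empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def dtw(a, b):
    """Dynamic time warping distance, O(len(a) * len(b)); handles unequal lengths."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[-1]

def agglomerate(dist, n_clusters):
    """Naive average-linkage agglomerative clustering on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

def estimate_and_cluster(series, n_pre, n_clusters):
    """Exact DTW inside a pre-cluster, estimated DTW across pre-clusters."""
    pre = kmeans([features(s) for s in series], n_pre)
    reps = {}  # assumption: first member represents its pre-cluster
    for idx, label in enumerate(pre):
        reps.setdefault(label, idx)
    n = len(series)
    dist = [[0.0] * n for _ in range(n)]
    rep_dist = {}  # cache of representative-to-representative distances
    for i in range(n):
        for j in range(i + 1, n):
            if pre[i] == pre[j]:
                d = dtw(series[i], series[j])
            else:
                key = (min(pre[i], pre[j]), max(pre[i], pre[j]))
                if key not in rep_dist:
                    rep_dist[key] = dtw(series[reps[key[0]]], series[reps[key[1]]])
                d = rep_dist[key]
            dist[i][j] = dist[j][i] = d
    return agglomerate(dist, n_clusters)

# Toy data: three low-valued and three high-valued series of different lengths.
series = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.1, 0.0], [0.0, 0.0, 0.1, 0.0, 0.0],
    [10.0, 10.1, 10.0, 10.2, 10.0], [10.1, 10.0, 10.1], [10.0, 10.2, 10.1, 10.0],
]
clusters = estimate_and_cluster(series, n_pre=3, n_clusters=2)
print(clusters)
```

In this sketch, the costly DTW computation runs only for pairs inside the same pre-cluster and once per pair of pre-clusters, so the number of exact distance computations shrinks as the pre-clustering becomes finer. How JET itself estimates the cross-cluster distances is described in the paper, not here.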