Novel metrics and LSH algorithms for unsupervised, real-time anomaly detection in multi-aspect data streams

Given a vast online stream of transactions in e-markets, how can we detect fraudulent traders and suspicious behaviors in an unsupervised manner? Can we detect them in constant time and memory? Fraud detection in e-markets is increasingly challenging due to the scale and complexity of multi-aspect d...

Bibliographic Details
Main Authors: Samira Khodabandehlou, Alireza Hashemi Golpayegani
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Engineering Science and Technology, an International Journal
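Subjects: Real-time anomaly detection; Multi-aspect data; Locality-sensitive hashing; Unsupervised learning; Stream mining; Market manipulation detection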
Online Access: http://www.sciencedirect.com/science/article/pii/S2215098625001740
author Samira Khodabandehlou
Alireza Hashemi Golpayegani
collection DOAJ
description Given a vast online stream of transactions in e-markets, how can we detect fraudulent traders and suspicious behaviors in an unsupervised manner? Can we detect them in constant time and memory? Fraud detection in e-markets is increasingly challenging due to the scale and complexity of multi-aspect data streams. This study introduces SATrade, an unsupervised and scalable approach for real-time anomaly detection in big multi-aspect data streams. This approach proposes two novel Locality-Sensitive Hashing (LSH) functions: Gaussian projections to preserve numerical distances and collision-resistant linear hashing to prevent the increase in dimensionality of the categorical data. The main contributions include the Collusiveness metric, which detects group anomalies through statistical divergence analysis, and the RR-ISF, which prioritizes rare burst patterns. An exponential decay mechanism (λ) ensures adaptability to evolving fraud tactics without retraining, while PCA handles feature correlation. In extensive experiments on five real datasets, using both synthetic and real labels, SATrade achieved 99 % AUC, 93 % F-measure, and 0.2 ms/record latency, which is a significant improvement over the six baseline methods. The framework’s interpretability allows tracing anomalies to fraudulent behaviors like sudden order spikes. The constant memory consumption of 0.25 MB per record and linear scalability make SATrade suitable for high-frequency environments and online platforms.
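The description mentions hashing numerical features with Gaussian projections and aging old evidence with an exponential decay factor (λ). The sketch below illustrates that general idea only. It uses standard sign random projections with decayed bucket counts, which is an assumption and not the SATrade algorithm: the class name DecayedLSHCounter, its parameters, and the rarity score are all hypothetical, and the Collusiveness and RR-ISF metrics are not implemented.

```python
import random
from collections import defaultdict


class DecayedLSHCounter:
    """Toy streaming scorer: Gaussian sign-projection buckets with decayed counts.

    Hypothetical illustration; not the hashing or scoring used by SATrade.
    """

    def __init__(self, dim, n_bits=8, decay=0.9, seed=0):
        rng = random.Random(seed)
        # One Gaussian random direction per output bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.decay = decay                # plays the role of the decay factor
        self.counts = defaultdict(float)  # at most 2**n_bits buckets
        self.total = 0.0

    def bucket(self, x):
        """Map a numeric vector to an n_bits-wide integer bucket id."""
        bits = 0
        for plane in self.planes:
            dot = sum(p * v for p, v in zip(plane, x))
            bits = (bits << 1) | (1 if dot >= 0.0 else 0)
        return bits

    def tick(self):
        """Age all counts by the decay factor (call once per time step)."""
        for key in self.counts:
            self.counts[key] *= self.decay
        self.total *= self.decay

    def score_and_update(self, x):
        """Return a rarity score in [0, 1], then add the record to its bucket."""
        b = self.bucket(x)
        seen = self.counts[b]
        share = seen / self.total if self.total > 0 else 0.0
        self.counts[b] = seen + 1.0
        self.total += 1.0
        return 1.0 - share
```

Because the number of buckets is capped at 2**n_bits, the memory used by this sketch does not grow with the length of the stream, and each record costs one pass over the projection vectors. Records that fall into rarely seen buckets receive scores close to 1.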
format Article
id doaj-art-c35e955e25204e20b6764afa3f384b2c
institution Kabale University
issn 2215-0986
language English
publishDate 2025-09-01
publisher Elsevier
record_format Article
series Engineering Science and Technology, an International Journal
spelling doaj-art-c35e955e25204e20b6764afa3f384b2c
2025-08-20T03:56:41Z
eng
Elsevier
Engineering Science and Technology, an International Journal
2215-0986
2025-09-01
Volume 69, Article 102119
DOI: 10.1016/j.jestch.2025.102119
Novel metrics and LSH algorithms for unsupervised, real-time anomaly detection in multi-aspect data streams
Samira Khodabandehlou (Department of Computer Engineering, Hamedan University of Technology, Hamedan, Iran; Department of Computer Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran)
Alireza Hashemi Golpayegani (APA Research Center & Department of Information Technology Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran; corresponding author)
http://www.sciencedirect.com/science/article/pii/S2215098625001740
Real-time anomaly detection; Multi-aspect data; Locality-sensitive hashing; Unsupervised learning; Stream mining; Market manipulation detection
title Novel metrics and LSH algorithms for unsupervised, real-time anomaly detection in multi-aspect data streams
topic Real-time anomaly detection
Multi-aspect data
Locality-sensitive hashing
Unsupervised learning
Stream mining
Market manipulation detection
url http://www.sciencedirect.com/science/article/pii/S2215098625001740