Fast Ways to Detect Outliers

Bibliographic Details
Main Authors: Emad Obaid Merza, Nashaat Jasim Mohammed
Format: Article
Language: English
Published: Middle Technical University, 2021-03-01
Series: Journal of Techniques
Subjects: outlier; outlier detection; big data; normal distribution; Z-Score; Hampel's test
Online Access: https://journal.mtu.edu.iq/index.php/MTU/article/view/287
collection DOAJ
description The rapid growth of data has produced very large datasets, and such datasets commonly contain outliers: values that are unusually small or large compared with the rest of the data. Outliers distort statistical analysis, so their impact must be reduced. They can also be valuable in their own right, for example as evidence of the geological activity that precedes natural disasters such as earthquakes, forest fires, and floods. Outlier detection is therefore important in many fields. This research aims to develop simple methods for detecting outliers in big data, because many recently developed detection methods are computationally complex or are efficient only when the sample size is small. Using an experimental approach, the paper proposes three methods. The first is based on the standard deviation and was tested against the normal distribution method and the Z-Score method. The second depends on the maximum and minimum values of the data, and the third depends on the range between successive data points. The results of the second and third methods are compared with those of Hampel's test. Accuracy is measured using the confusion matrix. The results show that the first method agrees with the normal distribution method and the Z-Score method, and that the third method outperforms Hampel's test. The paper concludes that Hampel's test has a serious weakness when zero values make up more than 50% of the data elements.
id doaj-art-9f1320c93e264365a93e72125f446b0a
institution Kabale University
issn 1818-653X
2708-8383
spelling doaj-art-9f1320c93e264365a93e72125f446b0a; 2025-01-19T11:09:03Z; eng; Middle Technical University; Journal of Techniques; ISSN 1818-653X, 2708-8383; 2021-03-01; vol. 3, no. 1; DOI 10.51173/jt.v3i1.287; Fast Ways to Detect Outliers; Emad Obaid Merza and Nashaat Jasim Mohammed (both: Information Technology Department, Technical College of Management-Baghdad, Middle Technical University, Baghdad, Iraq); https://journal.mtu.edu.iq/index.php/MTU/article/view/287
topic outlier
outlier detection
big data
normal distribution
Z-Score
Hampel's test
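
The description names two baseline methods, the Z-Score method and Hampel's test, and reports that Hampel's test breaks down when more than half of the values are zero. The sketch below illustrates those two baselines only; it does not implement the paper's three proposed methods, which the record does not specify. The function names, the Z-Score cutoff of 3, and the Hampel multiplier of 4.5 are assumptions based on common conventions, not values taken from the paper.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds `threshold`.

    The cutoff of 3 is an assumed convention, not taken from the paper.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]


def hampel_outliers(values, k=4.5):
    """Flag values with |x - median| > k * MAD.

    MAD is the median of the absolute deviations from the median.
    The multiplier k=4.5 is an assumed convention, not taken from the paper.
    """
    med = statistics.median(values)
    deviations = [abs(v - med) for v in values]
    mad = statistics.median(deviations)
    return [d > k * mad for d in deviations]


# Hypothetical data: 6 of the 10 values are zero, so the median is 0
# and the MAD is also 0. Every nonzero value is then flagged, even
# ordinary ones such as 1 or 2.
mostly_zero = [0, 0, 0, 0, 0, 0, 1, 2, 1, 3]
flags = hampel_outliers(mostly_zero)
print(flags)
```

When zeros are the majority, both the median and the MAD collapse to 0, so the threshold k * MAD is 0 and any nonzero value exceeds it. This is one plausible reading of the weakness the description reports.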