Fast Ways to Detect Outliers

Bibliographic Details
Main Authors:	Emad Obaid Merza, Nashaat Jasim Mohammed
Format:	Article
Language:	English
Published:	Middle Technical University 2021-03-01
Series:	Journal of Techniques
Subjects:	outlier; outlier detection; big data; normal distribution; Z-Score; Hampel's test
Online Access:	https://journal.mtu.edu.iq/index.php/MTU/article/view/287

collection	DOAJ
description	Rapid growth in data collection has produced very large datasets, and such data commonly contain outliers: values that are unusually small or large compared with the rest of the data. Outliers distort statistical analysis, so their impact must be reduced. On the other hand, outliers can be highly informative, for example as signs of the geological activity that precedes natural disasters such as earthquakes, forest fires, and floods. Outlier detection is therefore important in many fields. This research aims to develop simple methods for detecting outliers in big data, because many recently developed detection methods suffer from computational complexity or are efficient only when the sample size is small. An experimental approach was used, and three detection methods are proposed. The first method is based on the standard deviation and was tested and compared with the normal distribution method and the Z-Score method. The second method depends on the maximum and minimum values of the data, and the third depends on the range between successive data points. The results of the second and third methods are compared with those of Hampel's test. Accuracy is measured using the confusion matrix. The results showed that the first method agrees with the normal distribution method and the Z-Score method, and that the third method outperforms Hampel's test. The paper concludes that Hampel's test suffers from a serious weakness when zero values make up more than 50% of the data elements.
id	doaj-art-9f1320c93e264365a93e72125f446b0a
institution	Kabale University
issn	1818-653X 2708-8383
doi	10.51173/jt.v3i1.287
volume	3
issue	1
author_affiliation	Information Technology Department, Technical College of Management-Baghdad, Middle Technical University, Baghdad, Iraq (both authors)
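The two baseline tests named in the description, the Z-Score method and Hampel's test, can be sketched as follows. This is a minimal illustration and not the authors' code or any of their three proposed methods: the function names, the threshold of 3, and the 1.4826 scale factor are assumptions taken from common textbook forms of these tests. The sketch also shows one plausible reading of the weakness the description reports: when more than half of the values are zero, both the median and the median absolute deviation (MAD) are zero, so every nonzero value is flagged.

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Return values whose |z-score| exceeds threshold (assumed default: 3)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can be flagged
    return [x for x in values if abs(x - mean) / std > threshold]


def hampel_outliers(values, threshold=3.0, scale=1.4826):
    """Return values farther than threshold * scale * MAD from the median.

    threshold and scale are assumed, commonly used constants; the article
    may use a different variant of Hampel's test.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    return [x for x in values if abs(x - med) > threshold * scale * mad]


# One clearly extreme value among twenty identical ones.
mostly_tens = [10] * 20 + [100]
print(z_score_outliers(mostly_tens))   # [100]

# Six of ten values are zero, so median = 0 and MAD = 0:
# the cut-off collapses to 0 and every nonzero value is flagged.
mostly_zeros = [0, 0, 0, 0, 0, 0, 1, 2, 3, 2]
print(hampel_outliers(mostly_zeros))   # [1, 2, 3, 2]
```

Both functions make a constant number of passes over the data apart from the sort inside the median, which is in line with the description's emphasis on methods that stay cheap on large samples.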


Similar Items