Data Temperature Informed Streaming for Optimising Large-Scale Multi-Tiered Storage

Data temperature is a response to the ever-growing amount of data. These data have to be stored, but it has been observed that only a small portion of the data is accessed frequently at any one time. This leads to the concept of hot and cold data. Cold data can be migrated away from high-pe...
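As an illustration only, the idea in this record's description (folding age, usage frequency, and user-defined variables into one temperature, with rules that limit unnecessary movement) could be sketched as below. This is not the authors' algorithm: the `Item` fields, the weights, the normalisation, and the two thresholds are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    age_days: float             # time since the data were created
    accesses_per_day: float     # recent usage frequency
    user_priority: float = 0.0  # hypothetical user-defined variable in [0, 1]
    tier: str = "cold"          # tier the item currently lives on


def temperature(item, w_age=0.3, w_usage=0.5, w_user=0.2):
    """Combine the variables into a single score in [0, 1].

    The weights and squashing functions are assumptions for this sketch.
    """
    recency = 1.0 / (1.0 + item.age_days)                          # newer -> closer to 1
    usage = item.accesses_per_day / (1.0 + item.accesses_per_day)  # busier -> closer to 1
    return w_age * recency + w_usage * usage + w_user * item.user_priority


def next_tier(item, hot_above=0.6, cold_below=0.4):
    """Pick a tier, keeping the current one inside the dead band.

    Items whose score falls between the two thresholds are not moved,
    which is one simple way to avoid repeated migrations and wasted I/O.
    """
    t = temperature(item)
    if t >= hot_above:
        return "hot"
    if t <= cold_below:
        return "cold"
    return item.tier


if __name__ == "__main__":
    for it in (
        Item("new-but-unused", age_days=0, accesses_per_day=0),
        Item("old-but-busy", age_days=99, accesses_per_day=9, user_priority=1.0),
        Item("borderline", age_days=1, accesses_per_day=1, user_priority=0.5, tier="hot"),
    ):
        print(it.name, round(temperature(it), 3), next_tier(it))
```

With these assumed weights, brand-new data that nobody reads does not become hot on age alone, and an item with a borderline score stays on whichever tier it already occupies.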

Bibliographic Details
Main Authors:	Dominic Davies-Tagg, Ashiq Anjum, Ali Zahir, Lu Liu, Muhammad Usman Yaseen, Nick Antonopoulos
Format:	Article
Language:	English
Published:	Tsinghua University Press 2024-06-01
Series:	Big Data Mining and Analytics
Subjects:	data temperature; hot and cold data; multi-tiered storage; metadata variable; multi-temperature system
Online Access:	https://www.sciopen.com/article/10.26599/BDMA.2023.9020039

collection	DOAJ
description	Data temperature is a response to the ever-growing amount of data. These data have to be stored, but it has been observed that only a small portion of the data is accessed frequently at any one time. This leads to the concept of hot and cold data. Cold data can be migrated away from high-performance nodes to free up performance for higher-priority data. Existing studies classify hot and cold data primarily on the basis of data age and usage frequency. We present this as a limitation in the current implementation of data temperature, because classification by age assumes that all new data have priority and classification by usage is purely reactive. We propose new variables and conditions that support smarter decisions about which data are hot or cold and allow greater user control over data location and movement. We identify new metadata variables and user-defined variables to extend the current data temperature value. We further establish rules and conditions for limiting unnecessary movement of the data, which helps to prevent wasted input/output (I/O) costs. We also propose a hybrid algorithm that combines existing variables with the new variables and conditions into a single data temperature. The proposed system provides higher accuracy, increases performance, and gives greater user control for optimal positioning of data within multi-tiered storage solutions.
id	doaj-art-e7ef7f431ccd4ec7a4720b8d6425e0a2
institution	Kabale University
issn	2096-0654
spelling	doaj-art-e7ef7f431ccd4ec7a4720b8d6425e0a2; 2025-02-03T09:01:25Z; eng; Tsinghua University Press; Big Data Mining and Analytics; ISSN 2096-0654; 2024-06-01; vol. 7, no. 2, pp. 371-398; doi:10.26599/BDMA.2023.9020039; Dominic Davies-Tagg (Department of Computing, University of Derby, Derby, DE22 1GB, UK); Ashiq Anjum (Department of Informatics, University of Leicester, Leicester, LE1 7RH, UK); Ali Zahir (Department of Informatics, University of Leicester, Leicester, LE1 7RH, UK); Lu Liu (Department of Informatics, University of Leicester, Leicester, LE1 7RH, UK); Muhammad Usman Yaseen (Department of Computer Science, COMSATS University Islamabad, Islamabad 45550, Pakistan); Nick Antonopoulos (Edinburgh Napier University, Edinburgh, EH11 4BN, UK)

Similar Items