Quantitative evaluation of fault propagation in a commercial cloud system

Bibliographic Details
Main Authors: Chao Wang, Zhongchuan Fu
Format: Article
Language: English
Published: Wiley 2020-03-01
Series: International Journal of Distributed Sensor Networks
Online Access: https://doi.org/10.1177/1550147720903613
Collection: DOAJ
Description: As semiconductor technology scales into the nano regime, hardware faults have become a threat to computational devices. Cloud systems concentrate ever more computing density and energy; fundamental research on topics such as dependability validation is therefore needed to verify the robustness of clouds used for sensor networks. However, dependability evaluation studies have often been limited to isolated physical systems, such as processors, sensors, and single boards with or without operating system hosts, and have relied on inaccurate simulations instead of validating the complete cloud software stack (firmware, hypervisor, operating system hosts, and workloads) as a whole. In this article, we describe the implementation of a fault injection tool that validates the dependability of a commercial cloud software stack. Hardware faults induced by high-energy-density environments can be injected, and their propagation through the cloud software stack is traced and quantitatively evaluated. Experimental results show that the cloud system's integrated fault detection mechanisms, such as fatal trap detectors, still leave a detection margin of 20% silent data corruption to be narrowed. We additionally propose two detection mechanisms, which showed good fault detection performance in cloud systems.
Record ID: doaj-art-8be2d0b4de9045f9a4d8b6392f5b9dea
Institution: Kabale University
ISSN: 1550-1477
Volume: 16
Affiliations: Chao Wang, Computer School, Beijing Information Science and Technology University, Beijing, China; Zhongchuan Fu, Computer Department, Harbin Institute of Technology, Harbin, China