Fault Tolerance Model for Hadoop Distributed System



Bibliographic Details
Main Authors: Soraya Setti Ahmed, Yahya Slimani, Riadh Frefita
Format: Article
Language: English
Published: Graz University of Technology 2025-01-01
Series: Journal of Universal Computer Science
Online Access: https://lib.jucs.org/article/120840/download/pdf/
Description
Summary: Fault tolerance approaches in distributed systems are essentially based on replication and checkpointing, each with its own advantages and limitations. This paper has two objectives. First, it proposes a fault tolerance approach based on the status of the nodes in a distributed system. For this purpose, it defines three node statuses: safety, faulty, and potentially faulty. In addition to the two classical statuses (safety and faulty), it introduces a new status called potentially faulty, which improves the availability of the distributed system. Second, it examines the efficiency of the proposed model on two types of architecture: a virtual multi-node cluster and a physical multi-node cluster connected over Wi-Fi. Experiments showed that the proposed approach increases both system throughput and the fault tolerance level.
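The summary does not say how a node is assigned one of the three statuses. As a purely illustrative sketch, the following assumes a heartbeat-based failure detector with two timeout thresholds. The thresholds (SUSPECT_AFTER_S, FAULTY_AFTER_S) and the helper names classify and candidate_nodes are hypothetical and are not taken from the paper. The sketch shows one plausible reading of the availability benefit: a node that is late but not yet declared faulty stays usable as a fallback instead of being excluded outright.

```python
from enum import Enum


class NodeStatus(Enum):
    """The three node statuses named in the summary."""
    SAFETY = "safety"
    POTENTIALLY_FAULTY = "potentially faulty"
    FAULTY = "faulty"


# Hypothetical thresholds in seconds since the last heartbeat (assumption, not from the paper).
SUSPECT_AFTER_S = 30.0
FAULTY_AFTER_S = 600.0


def classify(seconds_since_heartbeat: float,
             suspect_after: float = SUSPECT_AFTER_S,
             faulty_after: float = FAULTY_AFTER_S) -> NodeStatus:
    """Map heartbeat age to a status (hypothetical heartbeat-based rule)."""
    if seconds_since_heartbeat >= faulty_after:
        return NodeStatus.FAULTY
    if seconds_since_heartbeat >= suspect_after:
        return NodeStatus.POTENTIALLY_FAULTY
    return NodeStatus.SAFETY


def candidate_nodes(heartbeat_age: dict[str, float]) -> list[str]:
    """Return usable nodes: safety nodes first, then potentially faulty ones.

    Faulty nodes are excluded. Keeping potentially faulty nodes as a fallback,
    rather than dropping them as a two-status model would, is the assumed
    source of the availability gain in this sketch.
    """
    status = {node: classify(age) for node, age in heartbeat_age.items()}
    safe = sorted(n for n, s in status.items() if s is NodeStatus.SAFETY)
    suspect = sorted(n for n, s in status.items() if s is NodeStatus.POTENTIALLY_FAULTY)
    return safe + suspect


if __name__ == "__main__":
    ages = {"node-a": 5.0, "node-b": 45.0, "node-c": 900.0}
    print({n: classify(a).value for n, a in ages.items()})
    print(candidate_nodes(ages))
```

With the example ages, node-a is classified as safety, node-b as potentially faulty, and node-c as faulty, so only node-a and node-b remain candidates. A real system would tune or replace these thresholds.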
ISSN:0948-6968