A Novel Low-Overhead Recovery Approach for Distributed Systems

We have addressed the complex problem of recovery from concurrent failures in a distributed computing environment. We have proposed a new approach that deals effectively with both orphan and lost messages. The proposed checkpointing and recovery approaches enable each process to restart fro...

Bibliographic Details
Main Authors: B. Gupta, S. Rahimi
Format: Article
Language: English
Published: Wiley 2009-01-01
Series: Journal of Computer Systems, Networks, and Communications
Online Access: http://dx.doi.org/10.1155/2009/409873
author B. Gupta
S. Rahimi
collection DOAJ
description We have addressed the complex problem of recovery from concurrent failures in a distributed computing environment. We have proposed a new approach that deals effectively with both orphan and lost messages. The proposed checkpointing and recovery approaches enable each process to restart from its most recent checkpoint and hence guarantee the least amount of recomputation after recovery. This also means that a process needs to save only its most recent local checkpoint. In this regard, we have introduced two new ideas. First, the proposed value of the common checkpointing interval enables an initiator process to log the minimum number of messages sent by each application process. Second, the lost messages are always determined a priori by an initiator process, and this is done while the normal distributed application is running. This matters because it does not delay recovery in any way.
format Article
id doaj-art-2aeb5f5a2f8d42fc822f81a34d1f88ef
institution Kabale University
issn 1687-7381
1687-739X
language English
publishDate 2009-01-01
publisher Wiley
record_format Article
series Journal of Computer Systems, Networks, and Communications
spelling doaj-art-2aeb5f5a2f8d42fc822f81a34d1f88ef | 2025-02-03T06:06:21Z | eng | Wiley | Journal of Computer Systems, Networks, and Communications | 1687-7381 | 1687-739X | 2009-01-01 | 2009 | 10.1155/2009/409873 | 409873 | A Novel Low-Overhead Recovery Approach for Distributed Systems | B. Gupta (0) | S. Rahimi (1) | Computer Science Department, Southern Illinois University, Carbondale, IL 62901, USA | Computer Science Department, Southern Illinois University, Carbondale, IL 62901, USA | http://dx.doi.org/10.1155/2009/409873
title A Novel Low-Overhead Recovery Approach for Distributed Systems
url http://dx.doi.org/10.1155/2009/409873