A Novel Low-Overhead Recovery Approach for Distributed Systems
We have addressed the complex problem of recovery for concurrent failures in a distributed computing environment. We have proposed a new approach in which we have effectively dealt with both orphan and lost messages. The proposed checkpointing and recovery approaches enable each process to restart fro...
Main Authors: | B. Gupta, S. Rahimi |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2009-01-01 |
Series: | Journal of Computer Systems, Networks, and Communications |
Online Access: | http://dx.doi.org/10.1155/2009/409873 |
_version_ | 1832550607298232320 |
---|---|
author | B. Gupta S. Rahimi |
collection | DOAJ |
description | We have addressed the complex problem of recovery for concurrent failures in a distributed computing environment. We have proposed a new approach in which we have effectively dealt with both orphan and lost messages. The proposed checkpointing and recovery approaches enable each process to restart from its most recent checkpoint and hence guarantee the least amount of recomputation after recovery. It also means that a process needs to save only its most recent local checkpoint. In this regard, we have introduced two new ideas. First, the proposed value of the common checkpointing interval is such that it enables an initiator process to log the minimum number of messages sent by each application process. Second, the determination of the lost messages is always done a priori by an initiator process; moreover, it is done while the normal distributed application is running. This is significant because it does not delay the recovery approach in any way. |
format | Article |
id | doaj-art-2aeb5f5a2f8d42fc822f81a34d1f88ef |
institution | Kabale University |
issn | 1687-7381 1687-739X |
language | English |
publishDate | 2009-01-01 |
publisher | Wiley |
record_format | Article |
series | Journal of Computer Systems, Networks, and Communications |
spelling | doaj-art-2aeb5f5a2f8d42fc822f81a34d1f88ef; 2025-02-03T06:06:21Z; eng; Wiley; Journal of Computer Systems, Networks, and Communications; 1687-7381; 1687-739X; 2009-01-01; 2009; 10.1155/2009/409873; 409873; A Novel Low-Overhead Recovery Approach for Distributed Systems; B. Gupta 0; S. Rahimi 1; Computer Science Department, Southern Illinois University, Carbondale, IL 62901, USA; Computer Science Department, Southern Illinois University, Carbondale, IL 62901, USA; We have addressed the complex problem of recovery for concurrent failures in distributed computing environment. We have proposed a new approach in which we have effectively dealt with both orphan and lost messages. The proposed checkpointing and recovery approaches enable each process to restart from its recent checkpoint and hence guarantee the least amount of recomputation after recovery. It also means that a process needs to save only its recent local checkpoint. In this regard, we have introduced two new ideas. First, the proposed value of the common checkpointing interval is such that it enables an initiator process to log the minimum number of messages sent by each application process. Second, the determination of the lost messages is always done a priori by an initiator process; besides it is done while the normal distributed application is running. This is quite meaningful because it does not delay the recovery approach in any way.; http://dx.doi.org/10.1155/2009/409873 |
title | A Novel Low-Overhead Recovery Approach for Distributed Systems |
url | http://dx.doi.org/10.1155/2009/409873 |