
A Novel Roll-Back Mechanism for Performance Enhancement of Asynchronous Checkpointing and Recovery

Informatica 31 (2007) 1–13

Keywords: asynchronous checkpointing, recovery, maximum consistent state

In this paper, we present a high-performance recovery algorithm for distributed systems in which checkpoints are taken asynchronously. It quickly determines the most recent consistent global checkpoint (the maximum consistent state) of a distributed system after the system recovers from a failure.
The main feature of the proposed recovery algorithm is that it largely avoids unnecessary comparisons of checkpoints when testing them for mutual consistency. The algorithm is executed concurrently by all participating processes, which ensures fast execution. We also present an enhancement of the proposed recovery scheme that bounds the dynamically growing lengths of the data structures it uses. This further reduces the number of comparisons needed to determine a recent consistent state, and thereby further reduces the completion time of the recovery algorithm.
Finally, we show that the proposed algorithm performs better than some related existing approaches that use asynchronous checkpointing.

1 Introduction

Checkpointing and rollback-recovery are well-known techniques for providing fault tolerance in distributed systems [1]-[5]. The failures considered are basically transient in nature, such as hardware errors [1]. Typically, in a distributed system, every site saves its local state; these saved states are known as local checkpoints. The local checkpoints, one from each site, collectively form a global checkpoint.
A global checkpoint is consistent if no message is sent after one checkpoint of the set and received before another checkpoint of the set [2]-[4]; that is, every message recorded as received in one checkpoint must also be recorded as sent in another checkpoint of the set. In this context, it may be mentioned that a
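The consistency condition above can be made concrete with a small sketch. This is not the recovery algorithm proposed in this paper; it is the classic rollback-propagation approach, written in Python under assumed names. A hypothetical `Message` record locates each message by the first checkpoint of its sender that records the send (`sent_in`) and the first checkpoint of its receiver that records the receipt (`recv_in`), with checkpoint 0 standing for each process's initial state. A global checkpoint is a list holding one checkpoint index per process, and a message that it records as received but not as sent (an orphan) makes it inconsistent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical record placing one application message between checkpoints.

    sent_in: index of the first checkpoint of process `src` that records the send.
    recv_in: index of the first checkpoint of process `dst` that records the receipt.
    Checkpoint 0 of every process is its initial state and records no events,
    so both indices are at least 1.
    """
    src: int
    sent_in: int
    dst: int
    recv_in: int


def is_orphan(m: Message, g: list[int]) -> bool:
    """True if global checkpoint g records m as received but not as sent."""
    return g[m.dst] >= m.recv_in and g[m.src] < m.sent_in


def is_consistent(g: list[int], messages: list[Message]) -> bool:
    """A global checkpoint (one checkpoint index per process) is consistent
    iff none of the messages is an orphan with respect to it."""
    return not any(is_orphan(m, g) for m in messages)


def max_consistent_state(latest: list[int], messages: list[Message]) -> list[int]:
    """Classic rollback propagation (an assumption here, not the paper's algorithm).

    Start from each process's latest checkpoint and, while some message is an
    orphan, roll its receiver back to the last checkpoint preceding the
    receipt. Every such rollback is forced: in any consistent g' <= g we have
    g'[src] <= g[src] < sent_in, so g'[dst] must be below recv_in. The fixed
    point is therefore the maximum consistent global checkpoint not exceeding
    `latest`; in the worst case it is the all-zero initial state.
    """
    g = list(latest)
    changed = True
    while changed:
        changed = False
        # Naive: every message is re-tested after any rollback.
        for m in messages:
            if is_orphan(m, g):
                g[m.dst] = m.recv_in - 1  # last checkpoint of dst before the receipt
                changed = True
    return g


if __name__ == "__main__":
    # Two processes. P1 sends m2 after its checkpoint 1; P0 receives m2 before
    # its checkpoint 1, then sends m1, which P1 receives before its checkpoint 2.
    # P0's latest checkpoint is 1, so the send of m1 is not recorded anywhere.
    m1 = Message(src=0, sent_in=2, dst=1, recv_in=2)
    m2 = Message(src=1, sent_in=2, dst=0, recv_in=1)
    print(max_consistent_state([1, 2], [m1, m2]))  # [0, 1]
```

In the example, rolling P1 back to undo the orphan m1 turns m2 into an orphan and forces P0 back to its initial checkpoint, a small instance of rollback propagation. The loop also re-tests every message after each rollback; the proposed algorithm is described as avoiding many such unnecessary comparisons.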
