Nt1330 Unit 1 Problem Analysis Paper

A fault-tolerant system can be designed at different levels of the software stack. We call general-purpose the approaches that detect and correct failures at a given level of that stack, masking them entirely from the higher levels (and ultimately from the end user, who eventually sees a correct result despite the occurrence of failures). General-purpose approaches can also target specific types of failures (e.g., message loss or message corruption) and let other types of failures hit higher levels of the software stack. In this section, we discuss a set of well-known and recently developed protocols that provide general-purpose fault tolerance for a large set of failure types, at different levels of the software stack, but always below the …
Among them, the first approach was proposed in 1984 by Chandy and Lamport, to build a possible global state of a distributed system [20]. The goal of this protocol is to build a consistent distributed snapshot of the distributed system. A distributed snapshot is a collection of process checkpoints (one per process) and a collection of in-flight messages (an ordered list of messages for each point-to-point channel). The protocol assumes ordered, lossless communication channels; for a given application, messages can be sent or received before or after a process takes its checkpoint. A message from process p to process q that is sent by the application after the checkpoint of process p but received before process q checkpoints is said to be an orphan message. The protocol must avoid orphan messages: the checkpoint of q already reflects their reception, yet the application would regenerate them if it restarted from that snapshot. Similarly, a message from process p to process q that is sent by the application before the checkpoint of process p but received after the checkpoint of process q is said to be missing. That message must belong to the list of messages saved for the channel from p to q, or the snapshot is inconsistent. A snapshot that includes no orphan message, and in which all the saved channel messages are missing messages, is consistent, since the application can be restarted from that state and pursue its computation.
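
The orphan and missing definitions above can be made concrete with a small sketch. It is not the Chandy-Lamport protocol itself, only an after-the-fact check of a snapshot. It assumes each process numbers its local events, so that a send or receive can be compared with that process's own checkpoint position. The names `Message`, `classify`, `is_consistent`, and `checkpoint_at` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    src: str          # sending process
    dst: str          # receiving process
    sent_at: int      # position of the send in src's local event order (assumed)
    received_at: int  # position of the receive in dst's local event order (assumed)


def classify(msg, checkpoint_at):
    """Classify a message relative to the checkpoints of its two endpoints.

    checkpoint_at maps a process name to the local event position at which
    that process took its checkpoint (an assumption of this sketch).
    """
    sent_after = msg.sent_at > checkpoint_at[msg.src]
    received_after = msg.received_at > checkpoint_at[msg.dst]
    if sent_after and not received_after:
        return "orphan"    # sent after p's checkpoint, received before q's
    if not sent_after and received_after:
        return "missing"   # sent before p's checkpoint, received after q's
    return "ordinary"      # entirely before or entirely after both checkpoints


def is_consistent(messages, checkpoint_at, saved_channel_messages):
    """True if there is no orphan message and the saved channel contents
    are exactly the missing messages."""
    orphans = {m for m in messages if classify(m, checkpoint_at) == "orphan"}
    missing = {m for m in messages if classify(m, checkpoint_at) == "missing"}
    return not orphans and missing == set(saved_channel_messages)
```

With this check, a snapshot that fails to record a missing message, or that contains any orphan message, is rejected. The check uses set equality, which also rejects a snapshot that saves a message that is not missing. Per-channel ordering of the saved messages is ignored here for brevity.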
