Failures in a Distributed System

Failures in a Distributed System Paper
Phyllis Lenoir
POS/355
November 19, 2012
Asho Rao

A distributed system is an application that executes a collection of protocols to coordinate the actions of multiple processes on a network, so that all components work together to perform a single set of related tasks. Because it combines the capabilities of its distributed components, a distributed system can be much larger and more powerful than a combination of stand-alone systems. But it is not easy: for a distributed system to be useful, it must be reliable. This is a difficult goal to achieve because of the complexity of the interactions between simultaneously running components. A distributed system must have the following characteristics:

* Fault-Tolerant: It can recover from component failures without performing incorrect actions.
* Highly Available: It can restore operations, permitting it to resume providing services even when some components have failed.
* Recoverable: Failed components can restart themselves and rejoin the system after the cause of failure has been repaired.
* Consistent: The system can coordinate actions by multiple components, often in the presence of concurrency and failure. This underlies the ability of a distributed system to act like a non-distributed system.
* Scalable: It can operate correctly even as some aspect of the system is scaled to a larger size. For example, we might increase the size of the network on which the system is running. This increases the frequency of network outages and could degrade a "non-scalable" system. Similarly, we might increase the number of users or servers, or the overall load on the system. In a scalable system, this should not have a significant effect.
* Predictable Performance: It can provide the desired responsiveness in a timely manner.
* Secure: The system authenticates access to data and services.

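
The Fault-Tolerant and Highly Available characteristics can be illustrated with a small failover routine. The following is only a minimal sketch in Python, assuming each replica can be modeled as a plain function that either returns a response or raises an error. The names `ReplicaUnavailable`, `call_with_failover`, `failed_replica`, and `healthy_replica` are hypothetical and chosen for illustration; they do not come from the sources cited in this paper.

```python
from typing import Callable, Sequence


class ReplicaUnavailable(Exception):
    """Raised by a (hypothetical) replica that cannot serve a request."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in order and return the first successful response."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaUnavailable as exc:
            # This component failed; record the error and try the next one.
            errors.append(exc)
    # No replica could answer, so report the failure instead of guessing a result.
    raise ReplicaUnavailable(f"all {len(replicas)} replicas failed: {errors}")


def failed_replica(request: str) -> str:
    # Stand-in for a component that is currently down (assumption for the demo).
    raise ReplicaUnavailable("replica A is down")


def healthy_replica(request: str) -> str:
    # Stand-in for a component that is working normally.
    return f"handled {request!r}"


result = call_with_failover([failed_replica, healthy_replica], "read x")
print(result)  # handled 'read x'
```

The service keeps answering even though the first component has failed, and when every component has failed the routine raises an error rather than performing an incorrect action. A real system would also need timeouts, retries, and a way for repaired components to rejoin, which this sketch leaves out.
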
These are high standards, which are challenging to achieve. Probably

References:
Introduction to Distributed Systems Design. Retrieved from http://www.code.google.com/edu/parallel/dsd-tutorial.html
Concurrent Reading. Retrieved from http://www.s.uiowa.edu/...
