Failures Paper

Topics: Failure, Fault tolerance, Reliability engineering Pages: 3 (1001 words) Published: October 22, 2013

University of Phoenix


June 10, 2013

It is important to understand that no distributed system is ever completely safe from failure. No matter how fault tolerant a system is designed to be, there is no such thing as a completely failure-proof system. Problems will continue to arise, so taking the necessary precautions and having strong problem-solving skills are essential to protecting a distributed system and recovering it from any type of failure. We will discuss four types of failures that may occur within a distributed system and the proper way of addressing each of them. Without proper precautions and a sound understanding of distributed systems and their failures, business continuity is put at risk and can be disrupted.

One of the most common failures in a distributed system is hardware failure, which is also one of the main reasons why performing backups is necessary. Nothing drives home the importance of backups like an unrecoverable hard disk failure. Depending on which piece of hardware was the root of the failure, the fix can be as simple as a plug-and-play replacement or as extensive as recovering from a catastrophic meltdown. This type of failure also applies to a centralized system and can have the same consequences if the system is not designed to be fault tolerant. To isolate this failure, you must understand how a synchronous system works. This type of system sends a message to a device and waits a given time for it to respond. If no response is received within that time, it sends the message again. After a set number of resends with no response, the device is labeled as failed. The way to fix and avoid this failure is physical redundancy: either active replication or a primary backup of the system. Physical redundancy also involves keeping spare physical components on hand to replace any hardware that has failed.
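The timeout-and-resend check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production failure detector: the `Probe` callable, the function names `is_failed` and `choose_active`, and the default timeout and resend count are all hypothetical and stand in for whatever messaging layer a real system would use.

```python
from typing import Callable, Optional, Sequence

# Hypothetical probe type (an assumption for this sketch): sends one message
# to a device, waits up to `timeout_s` seconds, and returns True if a
# response arrived in time.
Probe = Callable[[float], bool]


def is_failed(probe: Probe, timeout_s: float = 1.0, max_resends: int = 3) -> bool:
    """Label a device as failed if the first send and every resend time out."""
    for _ in range(1 + max_resends):  # one initial send plus the resends
        if probe(timeout_s):
            return False  # the device answered within the timeout
    return True  # no answer after the initial send and all resends


def choose_active(probes: Sequence[Probe], timeout_s: float = 1.0,
                  max_resends: int = 3) -> Optional[int]:
    """Primary-backup selection: index of the first device that responds.

    The primary is listed first and the backups after it. Returns None when
    every device has been labeled as failed.
    """
    for index, probe in enumerate(probes):
        if not is_failed(probe, timeout_s, max_resends):
            return index
    return None
```

With a primary that never answers and a backup that does, `choose_active` would skip index 0 and return index 1, which is the kind of failover that physical redundancy makes possible.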

Another common...