Types of Requirements
Error checking, recovery
Protection against system failures
Reliability, availability
“Shall not” requirements: proscribe unsafe or insecure behaviors
Can sometimes be expressed as functional requirements
Factors in Computer-based System Reliability
What is the probability of a hardware component failing, and for how long will it be down?
How likely is it that software will produce bogus results? (Software doesn’t wear out.)
How likely is it that the human operator will make an error?
Hardware errors can trigger unexpected signals or input to software
Software can behave in unexpected ways
Strange behavior confuses the operator
Confused, stressed operator makes a mistake in handling the situation
Mistaken reaction further destabilizes the system
Reliability Metrics I
Probability of Failure on Demand (POFOD): likelihood that the system will fail when a request for service is made. A POFOD of 0.001 means that, on average, 1 in 1000 requests will fail.
Rate of Occurrence of Failures (ROCOF): likely frequency of occurrence of unexpected behavior. A ROCOF of 2/100 means 2 failures are expected in every 100 time units (also called failure intensity).
Reliability Metrics II
Mean Time to Failure (MTTF): the average time between observed system failures. An MTTF of 500 means that we expect one failure every 500 time units.
Availability (AVAIL): probability that the system will be available for use at a given time. An AVAIL of 0.998 means that in any 1000 time units, the system is likely to be available for 998 of them.
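The figures quoted for the four metrics can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original slides; the function names are made up for illustration, and treating MTTF as the reciprocal of ROCOF assumes both are measured in the same time units.

```python
def expected_failures(pofod: float, requests: int) -> float:
    """Expected number of failed requests for a given POFOD."""
    return pofod * requests


def rocof(failures: int, time_units: float) -> float:
    """Failures per time unit (failure intensity)."""
    return failures / time_units


def mttf_from_rocof(rate: float) -> float:
    """Assumption: MTTF is the reciprocal of ROCOF in the same time units."""
    return 1.0 / rate


def expected_uptime(avail: float, time_units: float) -> float:
    """Time units the system is expected to be available."""
    return avail * time_units


# Figures taken from the slides.
print(expected_failures(0.001, 1000))   # about 1 failed request
print(rocof(2, 100))                    # 2 failures per 100 time units
print(mttf_from_rocof(rocof(1, 500)))   # about 500 time units between failures
print(expected_uptime(0.998, 1000))     # about 998 available time units
```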
Meaning of “Time” in Metrics
Time might be calendar time, processor time, or discrete units such as transactions
Systems with continuous load: calendar time is fine
Systems idle most of the time: processor time is better
Transaction processing systems with variable-demand load (reservation systems or ATMs) are more concerned with ROCOF and might measure time units as transactions
Measure the number of system failures over a large batch of requests to determine POFOD.
Record the time or number of transactions between observed failures to determine ROCOF and MTTF.
Record the elapsed repair/restart time once a failure occurs; this affects AVAIL.
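These measurements can be turned into metric estimates along the following lines. This is a rough sketch under stated assumptions: the observation data is invented for illustration, the helper names are hypothetical, and failure times and repair durations are assumed to be recorded in the same time unit as the observation period.

```python
from statistics import mean


def estimate_pofod(failed_requests: int, total_requests: int) -> float:
    """Fraction of requests in the batch that failed."""
    return failed_requests / total_requests


def estimate_mttf(failure_times: list[float]) -> float:
    """Mean gap between consecutive observed failures."""
    gaps = [later - earlier
            for earlier, later in zip(failure_times, failure_times[1:])]
    return mean(gaps)


def estimate_rocof(failure_times: list[float], period: float) -> float:
    """Observed failures per time unit over the observation period."""
    return len(failure_times) / period


def estimate_avail(period: float, repair_durations: list[float]) -> float:
    """Fraction of the observation period the system was not under repair."""
    return (period - sum(repair_durations)) / period


# Invented observations: 1000 time units, three failures, three repairs.
failure_times = [120, 480, 910]
repair_durations = [2, 1, 3]

print(estimate_pofod(failed_requests=4, total_requests=5000))
print(estimate_mttf(failure_times))
print(estimate_rocof(failure_times, period=1000))
print(estimate_avail(period=1000, repair_durations=repair_durations))
```

Switching the time unit from calendar time to transactions only changes what the numbers in `failure_times` and `period` count; the calculations stay the same.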
Useless Non-functional Requirements
“The software shall be reliable under normal use”
What does that mean? How will we measure it?
“At most N faults per 1000 lines of code.”
If we can’t tell when all faults have been discovered, how can we know this? Remember that failures are faults in action, and we’re measuring failures.
Steps in Establishing Reliability Specification I
For each sub-system, identify the different types of possible system failure and the consequences of these failures.
From that analysis, classify the failures into appropriate classes (listed below).
T (transient): occurs only with certain inputs
P (permanent): occurs with all inputs
Unrecoverable: system requires operator intervention to recover (R, recoverable: it does not)
Corrupting: failure corrupts system state or data (N, non-corrupting: it does not)
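One way to record this classification is as three independent yes/no attributes per failure. The sketch below is only illustrative: the `FailureClass` type, its field names, and the ATM failure used as an example are assumptions, not part of the original material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureClass:
    permanent: bool     # True: occurs with all inputs; False: only certain inputs
    recoverable: bool   # True: no operator intervention needed to recover
    corrupting: bool    # True: corrupts system state or data

    def label(self) -> str:
        """Human-readable name for this combination of attributes."""
        return ", ".join([
            "permanent" if self.permanent else "transient",
            "recoverable" if self.recoverable else "unrecoverable",
            "corrupting" if self.corrupting else "non-corrupting",
        ])


# Hypothetical example: an ATM that cannot read any card until serviced.
card_reader_dead = FailureClass(permanent=True, recoverable=False,
                                corrupting=False)
print(card_reader_dead.label())
```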
Steps in Establishing Reliability Specification II
For each class, define the reliability requirement using an appropriate metric:
Unrecoverable ⇒ POFOD
Recoverable ⇒ ROCOF
Identify functional reliability requirements to reduce the probability of critical failures, where appropriate.
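A reliability requirement written this way can be checked mechanically against measured values. The following is a minimal sketch: the metric choice follows the unrecoverable/recoverable rule from the slide, while the `ReliabilityRequirement` type, the failure class name, and the threshold are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityRequirement:
    failure_class: str   # free-text name of the failure class
    metric: str          # "POFOD" or "ROCOF"
    max_value: float     # largest acceptable value of the metric


def metric_for(recoverable: bool) -> str:
    """Slide rule: recoverable failures use ROCOF, unrecoverable use POFOD."""
    return "ROCOF" if recoverable else "POFOD"


def is_met(requirement: ReliabilityRequirement, observed: float) -> bool:
    """A requirement is met when the observed metric is within the bound."""
    return observed <= requirement.max_value


# Hypothetical requirement with an invented threshold.
req = ReliabilityRequirement(
    failure_class="permanent, unrecoverable, non-corrupting",
    metric=metric_for(recoverable=False),
    max_value=0.001,
)
print(req.metric, is_met(req, observed=0.0005))
```

Unlike "the software shall be reliable", a requirement in this form names the metric and the bound, so a test campaign can show whether it was met.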
Machine used 300 times/day
Machine lifetime: 8 years
Software upgraded every 2 years
1000 machines in network
300 / day * 730 days ≈ 200,000 transactions per software release per machine
300 / day * 1000 machines = 300,000 transactions per day across the network
Approx. 100,000,000 transactions on central database per year
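The transaction volumes above follow from simple multiplication, reproduced here as a short sketch. The constant names are made up for illustration, and a year is assumed to be 365 days; the slide rounds the exact results to convenient figures.

```python
USES_PER_DAY = 300                 # per machine
RELEASE_INTERVAL_DAYS = 2 * 365    # software upgraded every 2 years
MACHINES = 1000                    # machines in the network

# Transactions one machine handles during one software release.
per_release_per_machine = USES_PER_DAY * RELEASE_INTERVAL_DAYS

# Transactions reaching the central database, per day and per year.
network_per_day = USES_PER_DAY * MACHINES
network_per_year = network_per_day * 365

print(per_release_per_machine)  # 219000, rounded to 200,000 on the slide
print(network_per_day)          # 300000
print(network_per_year)         # 109500000, rounded to 100,000,000
```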