Reliability Engineering Study Guide
📖 Core Concepts
Reliability – Probability a system performs its intended function for a specified time under stated conditions.
Availability – Probability a system is operational at a given moment; depends on reliability and maintainability.
Reliability Function – $R(t)=1-\int_{0}^{t} f(u)\,du$, where $f$ is the failure‑time probability density (probability of success up to time $t$).
Mean Time To Failure (MTTF) – Average time to first failure; for constant‑rate (exponential) failures, $MTTF = 1/\lambda$.
Probability of Failure on Demand (PFD) – Used for single‑shot or standby devices; probability the device fails when called upon.
Residual Risk – Risk left after all reliability actions; never fully quantifiable.
RAMT – Balance of Reliability, Availability, Maintainability, Testability in program planning.
Design for Reliability (DfR) – Systematic use of analysis, testing, and mitigation to meet reliability requirements throughout life.
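As a rough illustration of the reliability function and MTTF, the sketch below assumes an exponential (constant-rate) model with a made-up failure rate of 1e-4 per hour. The function names are hypothetical. It uses the standard identity MTTF = ∫ R(t) dt to check numerically that MTTF = 1/λ, and shows that only about 37% of units survive to the MTTF.

```python
import math


def reliability_exponential(t, failure_rate):
    """R(t) = exp(-lambda * t): survival probability under a constant failure rate."""
    return math.exp(-failure_rate * t)


def mttf_by_integration(failure_rate, horizon_multiple=50.0, steps=100_000):
    """Approximate MTTF as the integral of R(t) from 0 to a long horizon (trapezoidal rule)."""
    upper = horizon_multiple / failure_rate  # the tail beyond this is negligible
    dt = upper / steps
    total = 0.5 * (reliability_exponential(0.0, failure_rate)
                   + reliability_exponential(upper, failure_rate))
    for i in range(1, steps):
        total += reliability_exponential(i * dt, failure_rate)
    return total * dt


failure_rate = 1e-4  # assumed value, failures per hour
mttf = mttf_by_integration(failure_rate)  # close to 1 / failure_rate = 10,000 h
r_at_mttf = reliability_exponential(1.0 / failure_rate, failure_rate)  # exp(-1), about 0.368

print(f"MTTF by integration: {mttf:.1f} h")
print(f"R(MTTF): {r_at_mttf:.3f}")
```

The second number is worth remembering: under the exponential model, the MTTF is not a "guaranteed life"; most units have already failed by then.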
📌 Must Remember
Reliability ≠ Availability: Availability = $\frac{MTBF}{MTBF+MTTR}$ (requires both reliability and maintainability).
MTTF = 1/λ only for exponential (constant‑rate) failure models.
Redundancy improves system reliability only if common‑cause failures are mitigated.
Historical data are valid only for identical designs, processes, loads, and environments.
Quantitative targets alone (e.g., MTBF) are insufficient; they must be tied to failure mechanisms and severity.
Accelerated Life Testing uses stress–life models (Arrhenius, Eyring, inverse power law) to extrapolate normal life.
FRACAS outputs: field MTBF, MTTR, spare consumption, failure mode distribution, reliability growth trend.
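A minimal sketch of how field MTBF, MTTR, and availability might be derived from FRACAS-style records. The record format, function name, and all numbers are hypothetical; it assumes the operating hours exclude repair downtime.

```python
def field_metrics(operating_hours, repair_hours_per_failure):
    """Return (MTBF, MTTR, steady-state availability) from simple field records."""
    failures = len(repair_hours_per_failure)
    if failures == 0:
        raise ValueError("at least one recorded failure is needed to estimate MTBF")
    mtbf = operating_hours / failures
    mttr = sum(repair_hours_per_failure) / failures
    availability = mtbf / (mtbf + mttr)
    return mtbf, mttr, availability


# Hypothetical fleet A: 12,000 operating hours, four quick repairs.
mtbf_a, mttr_a, avail_a = field_metrics(12_000.0, [4.0, 6.0, 2.0, 8.0])

# Hypothetical fleet B: same MTBF, but each repair takes 300 hours.
mtbf_b, mttr_b, avail_b = field_metrics(12_000.0, [300.0, 300.0, 300.0, 300.0])

print(f"A: MTBF={mtbf_a:.0f} h, MTTR={mttr_a:.0f} h, availability={avail_a:.4f}")
print(f"B: MTBF={mtbf_b:.0f} h, MTTR={mttr_b:.0f} h, availability={avail_b:.4f}")
```

Both fleets have the same MTBF, yet fleet B is far less available, which is why reliability and availability are tracked separately.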
🔄 Key Processes
Reliability Assessment Workflow
Identify hazards (failure modes, human errors, interactions).
Quantify risk (probability × severity).
Plan mitigations (design change, detection logic, maintenance, training).
Perform cost‑benefit analysis → select best mitigation.
Agree on acceptable residual risk.
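The workflow above can be sketched as a small calculation. Everything here is illustrative: the `Mitigation` class, the option names, the costs, and the probabilities are invented, and "best" is taken to mean the largest risk reduction per unit cost, which is only one possible selection rule.

```python
from dataclasses import dataclass


@dataclass
class Mitigation:
    name: str
    cost: float               # cost of implementing the mitigation
    probability_after: float  # failure probability once the mitigation is in place


def risk(probability, severity):
    """Risk = probability x severity (severity expressed in cost units here)."""
    return probability * severity


def best_mitigation(baseline_probability, severity, options):
    """Pick the option with the highest risk reduction per unit cost."""
    baseline = risk(baseline_probability, severity)

    def benefit_per_cost(option):
        return (baseline - risk(option.probability_after, severity)) / option.cost

    return max(options, key=benefit_per_cost)


severity = 1_000_000.0       # hypothetical loss if the failure occurs
baseline_probability = 0.02  # hypothetical probability before mitigation

options = [
    Mitigation("design change", cost=50_000.0, probability_after=0.002),
    Mitigation("detection logic", cost=10_000.0, probability_after=0.010),
    Mitigation("operator training", cost=5_000.0, probability_after=0.018),
]

chosen = best_mitigation(baseline_probability, severity, options)
residual_risk = risk(chosen.probability_after, severity)

print(f"Selected: {chosen.name}, residual risk: {residual_risk:.0f}")
```

The residual risk printed at the end is the quantity that step 5 asks stakeholders to accept or reject; if it is too high, a more expensive option may still be the right choice.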
Design for Reliability (DfR) Procedure
Derive reliability requirements (quantitative levels + qualitative verification).
Choose analysis tools (FMEA, fault tree, block diagram, physics‑of‑failure).
Model system reliability (parts‑stress or physics‑of‑failure).
Apply redundancy/diversity, component derating.
Verify via testing (environmental stress screening, accelerated life tests).
Iterate with reliability growth analysis.
Accelerated Life Test → Field Life Prediction
Select stress(es) → choose a stress–life model (e.g., Arrhenius: $\ln t = A + B/T$, with $T$ the absolute temperature).
Conduct test → collect failure times.
Estimate model parameters → extrapolate to normal stress.
Compute confidence intervals for predicted life.
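A minimal sketch of the fit-and-extrapolate steps, assuming the Arrhenius form $\ln t = A + B/T$ and ordinary least squares on $\ln t$ versus $1/T$. The test temperatures, lives, and the 320 K use temperature are made-up numbers, and the function names are hypothetical. Confidence intervals (the final step) are not computed here.

```python
import math


def fit_arrhenius(temps_kelvin, lives_hours):
    """Least-squares fit of ln(t) = A + B / T; returns (A, B)."""
    xs = [1.0 / temp for temp in temps_kelvin]
    ys = [math.log(life) for life in lives_hours]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope_b = sxy / sxx
    intercept_a = y_mean - slope_b * x_mean
    return intercept_a, slope_b


def predict_life(intercept_a, slope_b, temp_kelvin):
    """Extrapolate life to another temperature using the fitted model."""
    return math.exp(intercept_a + slope_b / temp_kelvin)


# Hypothetical accelerated-test results: characteristic life at each stress level.
test_temps = [360.0, 380.0, 400.0]       # kelvin
test_lives = [12_500.0, 4_600.0, 1_800.0]  # hours

A, B = fit_arrhenius(test_temps, test_lives)
life_at_use = predict_life(A, B, 320.0)  # assumed normal-use temperature

print(f"A = {A:.2f}, B = {B:.0f} K, predicted life at 320 K = {life_at_use:.0f} h")
```

The extrapolation is only as good as the assumption that the same failure mechanism operates at test and use conditions.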
🔍 Key Comparisons
Reliability vs Availability
Reliability: probability of no failure over a time interval.
Availability: probability of being operational now; incorporates repair time.
Redundancy vs Diversity
Redundancy: duplicate identical components; improves reliability if failures are independent.
Diversity: use different designs/suppliers; reduces common‑cause failures.
Physics‑of‑Failure vs Parts‑Stress Modeling
PoF: uses mechanistic understanding (stress, crack growth).
Parts‑Stress: empirical counting of parts and stress levels; faster but less mechanistic.
MTTF vs MTBF
MTTF: time to first failure for non‑repairable items.
MTBF: average time between successive failures for repairable systems.
⚠️ Common Misunderstandings
“Higher MTBF = higher reliability” – MTBF is a mean, not a mission‑survival probability, and it ignores repair time; a system with a long MTBF but a very long MTTR can still have low availability.
Treating historical failure rates as universal – small design/process changes can invalidate past rates.
Assuming a single numerical reliability value suffices – reliability is condition‑specific; different environments need separate assessments.
Confusing PFD with failure rate λ – PFD is a probability for a single demand, not a rate per hour.
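To make the PFD-versus-rate distinction concrete, the sketch below converts a failure rate (per hour) into an average PFD (dimensionless). It rests on assumptions not stated in this guide: a single standby channel, failures that stay hidden until a periodic proof test, and a test that restores the device to as-good-as-new. The rate and interval are made-up values.

```python
import math


def pfd_average(hidden_failure_rate, proof_test_interval):
    """Time-average of 1 - exp(-lambda * t) over one proof-test interval."""
    x = hidden_failure_rate * proof_test_interval
    # -expm1(-x) equals 1 - exp(-x) but keeps precision when x is small.
    return 1.0 - (-math.expm1(-x)) / x


def pfd_average_approx(hidden_failure_rate, proof_test_interval):
    """Common small-x approximation: lambda * T / 2."""
    return hidden_failure_rate * proof_test_interval / 2.0


rate = 1e-6        # assumed hidden failure rate, per hour
interval = 8760.0  # assumed proof-test interval: one year, in hours

exact = pfd_average(rate, interval)
approx = pfd_average_approx(rate, interval)

print(f"PFD (exact average): {exact:.6f}")
print(f"PFD (lambda*T/2):    {approx:.6f}")
```

The same failure rate gives a different PFD if the test interval changes, which is why the two quantities cannot be used interchangeably.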
🧠 Mental Models / Intuition
“Leak‑Bucket” analogy – Reliability is the chance the bucket stays un‑leaked over time; each stress is a hole that can be patched (mitigation) or duplicated (redundancy).
Series‑Parallel reliability – Think of a chain (series) vs multiple routes (parallel); a single weak link kills the chain, any open route keeps a parallel system alive.
Cost‑Benefit slope – Plot mitigation cost vs risk reduction; stop where the slope flattens (diminishing returns).
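The chain-versus-routes picture can be checked with a few lines. This sketch assumes independent component failures; the reliability values are arbitrary.

```python
from math import prod


def series(reliabilities):
    """Series system: every component must work, so reliabilities multiply."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Parallel system: fails only if every path fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


chain = series([0.99, 0.95, 0.90])  # about 0.846, below the weakest link
routes = parallel([0.90, 0.90])     # about 0.99, above either route alone

# Mixed layout: two redundant 0.90 units feeding a single 0.99 unit.
mixed = series([parallel([0.90, 0.90]), 0.99])

print(f"series: {chain:.4f}, parallel: {routes:.4f}, mixed: {mixed:.4f}")
```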
🚩 Exceptions & Edge Cases
Single‑Shot Devices – Use PFD, not MTTF/MTBF.
Highly Variable Environments – Reliability must be evaluated separately for each distinct operating condition.
Common‑Cause Failures – Redundancy does not improve reliability if a single event can knock out all redundant paths.
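One way to see the common-cause effect numerically is the beta-factor model, which is an assumption brought in for illustration and is not described in this guide: a fraction beta of each channel's failure rate is treated as a shared cause that takes out both channels at once. The rate, mission time, and function name are hypothetical.

```python
import math


def redundant_pair_reliability(failure_rate, mission_time, beta=0.0):
    """Reliability of two identical parallel channels under a beta-factor model.

    beta = 0 means fully independent failures; beta = 1 means every failure
    is common-cause and hits both channels together.
    """
    independent_rate = (1.0 - beta) * failure_rate
    common_rate = beta * failure_rate
    r_channel = math.exp(-independent_rate * mission_time)
    r_pair = 1.0 - (1.0 - r_channel) ** 2               # either channel survives
    r_common = math.exp(-common_rate * mission_time)    # no shared event occurs
    return r_pair * r_common


rate = 1e-4       # assumed failures per hour, per channel
mission = 1000.0  # assumed mission time, hours

single = math.exp(-rate * mission)
independent = redundant_pair_reliability(rate, mission, beta=0.0)
some_common = redundant_pair_reliability(rate, mission, beta=0.1)
all_common = redundant_pair_reliability(rate, mission, beta=1.0)

print(f"single channel:      {single:.4f}")
print(f"pair, beta = 0:      {independent:.4f}")
print(f"pair, beta = 0.1:    {some_common:.4f}")
print(f"pair, beta = 1:      {all_common:.4f}")
```

With beta = 1 the redundant pair is no better than a single channel, which is the edge case described above.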
📍 When to Use Which
Use FMEA when you need a systematic, component‑level view of failure modes and effects.
Use Fault Tree Analysis for top‑down tracing of how basic events combine to cause a system failure.
Use Reliability Block Diagram for quick series‑parallel reliability calculations.
Choose Physics‑of‑Failure modeling when detailed stress, material, and loading data are available.
Choose Parts‑Stress modeling for early‑stage, data‑poor concepts where component counts dominate.
Apply Redundancy when failure consequences are severe and common‑cause risk is low.
Apply Diversity when common‑cause risk is high (e.g., same supplier, same design).
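As a small illustration of the top-down fault tree idea, the sketch below combines basic-event probabilities through AND and OR gates, assuming the basic events are independent. The tree (loss of cooling from two redundant pumps sharing one power supply) and its probabilities are invented for the example.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities over the mission.
pump_a_fails = 0.01
pump_b_fails = 0.01
power_lost = 0.001

both_pumps_fail = and_gate([pump_a_fails, pump_b_fails])
loss_of_cooling = or_gate([both_pumps_fail, power_lost])  # top event

print(f"P(both pumps fail) = {both_pumps_fail:.6f}")
print(f"P(loss of cooling) = {loss_of_cooling:.6f}")
```

In this made-up tree the shared power supply dominates the top event, the kind of insight a component-level FMEA alone might not surface.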
👀 Patterns to Recognize
Series‑Only Chains → overall reliability = product of component reliabilities → a single low‑reliability part drags the whole system.
“Bathtub Curve” in failure rate plots → early infant‑mortality, constant‑rate useful life, wear‑out; informs where to focus mitigation.
Accelerated Test Data → log‑linear relationship (Arrhenius) or power‑law trend; deviations signal non‑thermal failure mechanisms.
Repeated FRACAS entries for same part → likely design or manufacturing defect → priority for redesign or process control.
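The three bathtub regions can be mimicked with a Weibull hazard function, a common modeling choice that this guide does not itself prescribe: a shape parameter below 1 gives a falling failure rate, exactly 1 gives a constant rate, and above 1 gives a rising rate. The scale and time points are arbitrary.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


scale = 1000.0  # assumed characteristic life, hours

regions = {
    "infant mortality (shape 0.5)": 0.5,
    "useful life (shape 1.0)": 1.0,
    "wear-out (shape 3.0)": 3.0,
}

for label, shape in regions.items():
    early = weibull_hazard(100.0, shape, scale)
    late = weibull_hazard(2000.0, shape, scale)
    if late < early:
        trend = "falling"
    elif late > early:
        trend = "rising"
    else:
        trend = "constant"
    print(f"{label}: h(100)={early:.5f}, h(2000)={late:.5f}, {trend}")
```

Fitting a shape parameter to field failure times is one way to tell which region a product is in, and therefore whether screening, routine maintenance, or replacement is the more useful mitigation.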
🗂️ Exam Traps
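Mistaking MTBF for reliability – MTBF is a mean time between failures (the reciprocal of the failure rate under a constant‑rate model), not a probability; reliability is the probability of surviving a stated mission time, e.g., $R(t)=e^{-t/MTBF}$ for exponential failures. Repair time affects availability, not reliability.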
Choosing redundancy without checking common‑cause – Test questions may list redundancy as “always improves reliability”; the correct answer notes the need to assess common‑cause.
Confusing availability formula – Some options use $Availability = MTBF/(MTBF+MTTR)$; others invert numerator/denominator. Pick the version with MTBF in the numerator.
Selecting PFD for a continuously operating device – PFD applies to standby or demand‑type items only; continuous devices use failure rate/MTTF.
Assuming historical failure data are directly reusable – Exams often test awareness that data must match design, environment, and usage.
---
Keep this guide handy; it condenses the most exam‑relevant reliability engineering concepts into bite‑size, recall‑ready nuggets.