RemNote Community

Reliability engineering - Operational Management and Standards

Understand reliability assessment techniques (monitoring and FRACAS), the organizational roles of reliability engineers, and key standards such as the French FIDES methodology.

Summary

Reliability Operational Assessment

Reliability operational assessment is the process of monitoring, testing, and analyzing how well systems perform in real-world conditions after they have been deployed. This is fundamentally different from predicting reliability before a system is built: instead, it focuses on collecting actual failure data, understanding why failures occur, and continuously improving the system based on what is learned from the field. This approach ensures that reliability issues are caught and corrected during the system's operational life, not just during design and testing.

Monitoring and Data Collection

Effective operational reliability assessment begins with systematic monitoring of critical system parameters. During the fault tree analysis phase of reliability design, engineers identify which parameters are most likely to indicate problems. Once the system is deployed, both electronic surveillance (such as sensor data and automatic alerts) and visual inspection are used to track these parameters.

Different systems require different data collection approaches. For vehicles, failure logs recorded by onboard diagnostic systems provide detailed information. For consumer products, manufacturers track return rates and failure counts. Systems that sit dormant or on standby, such as emergency backup equipment or stored military assets, require a formal surveillance program in which random samples are periodically inspected and tested to detect problems before they become critical.

A crucial but often overlooked aspect of data collection is monitoring reliability after any change to the system. Whenever field upgrades are applied, recall repairs are made, or system modifications are implemented, additional reliability testing must follow: small changes that seem inconsequential can sometimes have unexpected effects on reliability.
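The surveillance program for dormant or standby stock can be sketched in a few lines of Python. This is an illustrative sketch only: the 5% sampling fraction, the serial-number format, and the lot size are invented for the example, not taken from any standard.

```python
import random

def draw_surveillance_sample(serial_numbers, fraction=0.05, seed=None):
    """Randomly select a fraction of dormant/stored units for
    periodic inspection and test (always at least one unit)."""
    rng = random.Random(seed)
    k = max(1, round(len(serial_numbers) * fraction))
    return rng.sample(serial_numbers, k)

# Example: pull 5% of a stored lot of 200 units each surveillance period.
lot = [f"SN-{i:04d}" for i in range(200)]
sample = draw_surveillance_sample(lot, fraction=0.05, seed=42)
print(len(sample))  # 10 units pulled for inspection this period
```

Random selection matters here: inspecting the same convenient units every period would hide degradation in the rest of the stored population.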
Reliability Testing and Modifications

No reliability program can anticipate every possible failure mode before a system is deployed. This is especially true when human interaction is involved: users find creative ways to break systems that engineers never imagined. Failures will therefore inevitably occur, and this is not a sign of a failed program; it is simply part of operational reality.

The key to continuous improvement is systematic root cause analysis. When a failure occurs, reliability engineers investigate not just what failed, but why it failed. The investigation traces causal relationships and identifies the underlying conditions that led to the failure. Once the root cause is understood, corrective actions can be implemented to prevent similar failures. All system failures and the corrective actions taken must be formally reported to the reliability engineering organization, creating a feedback loop in which field experience directly informs design and manufacturing improvements.

Failure Reporting, Analysis, and Corrective Action System (FRACAS)

The most common tool for operational reliability assessment is the Failure Reporting, Analysis, and Corrective Action System (FRACAS): a structured database and workflow system designed specifically to manage reliability in operational systems. A FRACAS works as follows:

Failure reporting: when a failure or incident occurs in the field, it is reported into a centralized system with all relevant details (what failed, when, where, how, etc.).
Analysis: the system organizes and stores all failure data, creating a comprehensive repository in which root causes can be identified.
Corrective action: once failures are analyzed and root causes identified, corrective actions are tracked through to completion.
Statistical analysis: the accumulated data enables statistical analysis that reveals reliability, safety, and quality metrics.

For FRACAS to be effective, organizations must adopt a single, easy-to-use system across all products and systems.
When different teams use different reporting systems, data becomes fragmented and the statistical power of the analysis is lost.

Common Outputs from a FRACAS System

Once failure data has been collected in a FRACAS, various reliability metrics can be calculated and analyzed:

Field Mean Time Between Failures (MTBF) is the average time a system operates before experiencing a failure. Unlike MTBF calculated during design, field MTBF reflects real-world operating conditions and actual failure rates.

Mean Time To Repair (MTTR) measures how quickly the system can be restored to operation after a failure. This metric is critical because a system that fails frequently but is repaired quickly may be more acceptable than one that fails rarely but takes a long time to repair.

Spare parts consumption rates track how many replacement parts are being used in the field. Unexpectedly high consumption of certain parts can indicate design weaknesses or manufacturing quality issues.

Reliability growth trends show whether system reliability is improving, declining, or remaining stable over time. When corrective actions are effective, reliability should improve as field experience accumulates.

Distribution of failures breaks down where and why failures occur, organized by failure type, location in the system, part number, serial number, or symptom. This distribution analysis often reveals patterns that point to systemic issues rather than random failures.

Limitations of Using Historical Data

While a FRACAS provides valuable operational data, it is important to understand the limitations of using past failure data to predict the reliability of new systems. Context matters enormously: reliability depends not just on the system itself, but on how and where it is used. A system with excellent reliability in one environment may fail frequently in a different climate, geographic region, or operating pattern.
Historical data from similar systems is useful for developing estimates, but it should never be treated as definitive for a new system in a new context. Additionally, small design or manufacturing changes can significantly affect reliability outcomes. A component supplier might change materials to reduce costs, a manufacturing process might be slightly modified for efficiency, or a design might be slightly simplified. Any of these seemingly minor changes can alter failure modes in unpredictable ways, which is why field data from the old version cannot be applied directly to predict the reliability of the new version.

Organizational Context for Reliability Engineering

<extrainfo>
How Reliability Engineering Organizations Are Structured

The way reliability engineering is organized depends on project size and criticality. Small, non-critical projects may handle reliability informally, without a dedicated organization or person. Larger projects and safety-critical systems, however, typically establish a formal reliability function, often placed within a product assurance organization, systems engineering group, or specialty engineering department.

A key advantage of an independent reliability organization is that it can be insulated from budget and schedule pressures. When reliability engineers report directly to the project manager, there is sometimes pressure to downplay reliability concerns that might delay a release. An independent organization can assess reliability issues more objectively, without such conflicts of interest.

Because reliability decisions have the most impact during early design phases, reliability engineers frequently serve as members of integrated product teams (IPTs) that guide the overall design process. This ensures that reliability is built in from the start rather than added as an afterthought.
</extrainfo>

<extrainfo>
Professional Development in Reliability Engineering

Licensing requirements: reliability engineers working on systems that directly affect public safety (aircraft, medical devices, nuclear systems, etc.) may need to be licensed as Professional Engineers (PE) by their state or province, just as structural engineers are. Not all reliability practitioners need licensure, however: those in non-safety-critical or supporting roles may not require PE registration, though relevant credentials are typically expected for advancement in the field.

Career opportunities: reliability engineers are essential in aerospace, defense, automotive, medical devices, energy, and any industry where system failure could jeopardize safety or mission success. These industries actively recruit reliability engineers because the cost of failure is so high: a failed spacecraft cannot be recalled, and a failed medical device can cause injury.

International Standards and Methodologies

Several international standards guide reliability analysis. The French FIDES methodology (UTE-C 80-811) uses physics-of-failure principles combined with test data and field returns to predict reliability. The RDF2000 standard (UTE-C 80-810) is another French methodology, based on decades of telecommunications industry experience.
</extrainfo>
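The core FRACAS output metrics described above (field MTBF, MTTR, and the distribution of failures) reduce to simple arithmetic over the failure records. A minimal sketch in Python, with a hypothetical record type and invented part numbers and hour totals used purely for illustration:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class FailureRecord:
    part_number: str        # which part failed
    downtime_hours: float   # time to restore the system after this failure

def field_mtbf(total_operating_hours, records):
    """Field MTBF = cumulative fleet operating hours / number of failures."""
    return total_operating_hours / len(records)

def mttr(records):
    """Mean Time To Repair = average downtime per failure."""
    return sum(r.downtime_hours for r in records) / len(records)

def failure_distribution(records):
    """Failure counts by part number (a Pareto view of field failures)."""
    return Counter(r.part_number for r in records)

# Hypothetical FRACAS extract: three failures over 12,000 fleet hours.
records = [
    FailureRecord("PN-101", 4.0),
    FailureRecord("PN-101", 6.0),
    FailureRecord("PN-205", 2.0),
]
print(field_mtbf(12_000, records))    # 4000.0 hours between failures
print(mttr(records))                  # 4.0 hours average repair time
print(failure_distribution(records))  # PN-101 dominates: 2 of 3 failures
```

Tracking `failure_distribution` alongside MTBF is what turns raw failure counts into actionable corrective actions: a single part dominating the count points to a systemic design or supplier issue rather than random wear-out.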
Flashcards
What is required for systems in dormant storage or on standby to ensure reliability?
A formal surveillance program that inspects and tests random samples
When must additional reliability testing be performed following system modifications?
After any field upgrades, recall repairs, or modifications
Why is systematic root cause analysis included in a reliability program?
To identify causal relationships and implement effective corrective actions
What is the primary purpose of the FRACAS data repository?
To enable statistical analysis for reliability, safety, and quality metrics
What software or process implementation strategy is critical for the success of a FRACAS program?
Adoption of a single, easy-to-use system for all end items
Why can historical failure data be misleading when predicting the reliability of new systems?
Reliability depends on the specific context of use
Where is a formal reliability function typically established within larger projects?
Within a product assurance or specialty engineering organization
What is the primary benefit of creating an independent reliability organization?
To protect reliability activities from budget and schedule pressures
Why do reliability engineers often serve as members of an integrated product team (IPT)?
Because reliability is critical early in system design
When must a reliability engineer be licensed as a professional engineer (PE)?
When working on systems that affect public safety
Which principles does the FIDES methodology (UTE-C 80-811) use for reliability prediction?
Physics-of-failure principles, test data, and field returns

Key Concepts
Reliability Assessment and Analysis
Reliability Operational Assessment
Failure Reporting, Analysis, and Corrective Action System (FRACAS)
Mean Time Between Failures (MTBF)
Mean Time To Repair (MTTR)
Root Cause Analysis
FIDES Methodology
RDF2000
Reliability Engineering and Teams
Reliability Engineering Organization
Integrated Product Team
Professional Engineer (PE) Licensing