3.6. Quantitative FTA

Quantitative fault trees can be evaluated to compute the top event unavailability, unreliability, conditional failure intensity as well as the importance of individual events. The mathematical background and formulas are described in the Ansys medini analyze User Guide in the section "Analysis of Fault Trees". This section summarizes potential pitfalls when using the tool and interpreting the results.

The evaluation of a fault tree can be triggered on any event of the fault tree and is a manual step. The tool has two distinct computations for the top and intermediate event probabilities:

  1. The tool computes the probabilities (unavailability) and failure frequencies of the top events and intermediate events at the specified mission time T (depending on selection). To trigger this computation, in the context menu, select Calculate probabilities. All events in the subtree of the selected event are updated as shown in the diagram(s).

  2. The tool can also complete a full evaluation over the complete mission time interval [0...T] to compute aggregated values such as unreliability. To trigger this evaluation, in the context menu, select Evaluate Fault Tree....

RECOMMENDATION [3.6.1]

The calculation should always be triggered on the top-level event to keep the whole tree consistent. For R19.2 and earlier releases: If multiple models are used, set the mission time to the same value to avoid confusion in the graphical visualization.

Basic events can derive their probability from SysML elements and safety mechanisms, converting the failure rate and/or diagnostic coverage into a probability for the event.
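As an illustration of such a conversion, the sketch below uses the common exponential model in which only the undetected share of the failure rate, λ·(1 − DC), contributes to the event probability. This is a simplifying assumption for illustration; the exact formulas used by the tool are described in the User Guide.

```python
import math

def basic_event_probability(failure_rate_per_h, diagnostic_coverage, mission_time_h):
    """Illustrative conversion of a failure rate and diagnostic coverage
    into an unavailability at mission time T, assuming an exponential
    model in which only the undetected share lambda * (1 - DC)
    contributes. Not necessarily the tool's exact formula."""
    residual_rate = failure_rate_per_h * (1.0 - diagnostic_coverage)
    return 1.0 - math.exp(-residual_rate * mission_time_h)

# 100 FIT, 90 % diagnostic coverage, 10,000 h mission time
q = basic_event_probability(100e-9, 0.90, 10_000)
```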

RECOMMENDATION [3.6.2]

Before triggering a fault tree evaluation and analyzing its results, users shall recompute the failure rates to make sure that they are up-to-date and that the fault tree uses current values.

NOTE [3.6.3]

In time-dependent probability models, changes that only have an impact outside the mission time do not influence the calculation of reliability metrics. For example, in a monitored event, increasing the length of a test interval that is already outside of the mission time does not affect the probability calculation. Exceptions are MTTF and MTBF, which are calculated independently of the mission time.
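The following sketch illustrates this behavior with a strongly simplified monitored-event model (an assumption for illustration, not the tool's implementation): the fault is assumed to be revealed and repaired instantly at every multiple of the test interval, so only the time since the last test contributes to the probability.

```python
import math

def monitored_unavailability(lam, test_interval, t):
    """Simplified monitored-event model (illustrative assumption):
    instant repair at every multiple of the test interval, so only
    the time since the last test contributes."""
    return 1.0 - math.exp(-lam * (t % test_interval))

T = 1_000.0          # mission time in hours
lam = 1e-5           # failure rate per hour
q1 = monitored_unavailability(lam, 2_000.0, T)  # first test after T
q2 = monitored_unavailability(lam, 5_000.0, T)  # first test also after T
```

Both test intervals lie beyond the mission time of 1,000 h, so the unavailability at T is identical in both cases.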

RECOMMENDATION [3.6.4]

To get accurate results, the integration step width parameter must be set to an appropriate value. The integration step width determines the points to be computed for the numerical integration. It should be adjusted according to the test interval for monitored/latent events. In general, the step width should not be too large since it could lead to coarse-grained approximations of integrals in certain cases.

In addition, the first interval [0..step-width] is calculated with higher resolution since its contribution for repairable events and certain Weibull settings is disproportionately high, especially for small mission times.
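The effect of a too-large step width can be reproduced with a small numerical experiment, using a simplified sawtooth unavailability model for a perfectly monitored event (an illustrative assumption, not the tool's implementation):

```python
import math

def q_monitored(lam, tau, t):
    # Sawtooth unavailability of a perfectly monitored event (sketch):
    # instant repair at every multiple of the test interval tau.
    return 1.0 - math.exp(-lam * (t % tau))

def average_q(lam, tau, T, step):
    # Trapezoidal approximation of (1/T) * integral of Q over [0, T].
    n = int(T / step)
    total = 0.0
    for i in range(n):
        a, b = i * step, (i + 1) * step
        total += 0.5 * (q_monitored(lam, tau, a) + q_monitored(lam, tau, b)) * step
    return total / T

lam, tau, T = 1e-4, 100.0, 10_000.0
coarse = average_q(lam, tau, T, step=250.0)   # step larger than test interval
fine = average_q(lam, tau, T, step=1.0)       # step much smaller than test interval
```

With a step of 250 h (larger than the 100 h test interval), the sampled points systematically miss the peaks of the sawtooth, and the computed average unavailability comes out at roughly half of the value obtained with a 1 h step.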

NOTE [3.6.5]

When using the Evaluate Fault Tree operation, the computed value for the mean time to failure (MTTF) may be inaccurate, either overestimating or underestimating the true value depending on the chosen mission time. MTTF is defined as the integral of the reliability from 0 to infinity and is therefore independent of the mission time. While the integration on [0,T] is done using the provided step width, the integral on [T, infinity) is approximated using a different technique: starting at T, the conditional failure intensity λsys is evaluated at certain points and is only accurate if λsys approaches a steady state at T. Otherwise, a relative error of ten percent or more is possible, underestimating the true MTTF when evaluating at maximum λsys or overestimating MTTF when evaluating at the minimum λsys.
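The effect can be reproduced with a small sketch (the tail-closure term R(T)/λsys(T) below is an assumption about the approximation described above, not the tool's exact implementation). A Weibull model with shape parameter β = 0.5 never reaches a steady failure intensity, so the approximation underestimates the true MTTF of η·Γ(1 + 1/β) = 2,000 h by roughly ten percent here:

```python
import math

def mttf_estimate(reliability, cfi, T, step):
    """Sketch of the approximation: integrate R(t) numerically on [0, T],
    then close the tail with R(T) / lambda_sys(T), which is exact only
    if the conditional failure intensity is (near-)constant at T."""
    n = int(T / step)
    head = sum(0.5 * (reliability(i * step) + reliability((i + 1) * step)) * step
               for i in range(n))
    return head + reliability(T) / cfi(T)

# Weibull with shape beta != 1: lambda(t) never reaches a steady state.
beta, eta = 0.5, 1_000.0
R = lambda t: math.exp(-((t / eta) ** beta))
lam = lambda t: (beta / eta) * (t / eta) ** (beta - 1.0)

true_mttf = eta * math.gamma(1.0 + 1.0 / beta)   # 2,000 h for beta = 0.5
approx = mttf_estimate(R, lam, T=5_000.0, step=1.0)
```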

NOTE [3.6.6]

The mean time between failures (MTBF) is calculated by evaluating 1/w(t*) for a very large t*. This is generally a good approximation, but it can introduce non-negligible errors in some cases, for example when w(t) is periodic or a sawtooth curve, as can happen in the presence of monitored events.
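A sketch of this effect, using a simplified monitored-event model (an illustrative assumption, not the tool's implementation): the unconditional failure intensity w(t) = λ·exp(−λ·(t mod τ)) is a sawtooth, so 1/w(t*) depends strongly on where t* falls within the test interval.

```python
import math

def w_monitored(lam, tau, t):
    # Unconditional failure intensity of a simplified monitored event
    # with instant repair at each test: a sawtooth curve.
    return lam * math.exp(-lam * (t % tau))

lam, tau = 1e-3, 1_000.0
mtbf_a = 1.0 / w_monitored(lam, tau, 1_000_000.0)   # t* right after a test
mtbf_b = 1.0 / w_monitored(lam, tau, 1_000_999.0)   # t* just before the next test
```

Depending on where the evaluation point t* lands, the resulting MTBF differs by more than a factor of two in this example.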

NOTE [3.6.7]

The maximum resolution for integrals is currently 1h (as a step width for the integration). Note that monitored events with a test interval of less than 1h will therefore appear as if their probability were constant.

NOTE [3.6.8]

Ansys medini analyze uses one of two algorithms to compute quantitative figures.

  • Binary Decision Diagrams (BDD)

    By default, the tool uses BDD to compute quantitative figures. The precision of the results is independent of the cut-off settings for the cut set extraction. BDD calculates the unavailability exactly, except for rounding errors. The calculation of some other metrics, such as reliability, relies on approximations and may therefore be inexact.

    For more information about potential errors in calculations and approximations, see Warning [3.6.9].

  • Cut Sets

    You can also use minimal cut sets to truncate the fault tree computation and use approximations of the unavailability and its derived metrics. This computation does not use BDD and provides best effort approximations. Results can vary depending on the truncation cut-off (the number of events in a cut set).
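The difference between the two approaches can be illustrated with a toy fault tree (the event probabilities below are hypothetical). For independent events, the exact top event unavailability can be computed by inclusion-exclusion (which is what the BDD algorithm achieves up to rounding), whereas the cut set approach sums cut set probabilities (the rare-event approximation) and may additionally drop cut sets beyond the cut-off:

```python
# Top event = (A AND B) OR C, with minimal cut sets {A, B} and {C}.
# Hypothetical probabilities, chosen large to make the difference visible.
qa, qb, qc = 0.2, 0.3, 0.1

# Exact result for independent events (inclusion-exclusion):
q_exact = 1.0 - (1.0 - qa * qb) * (1.0 - qc)

# Cut set rare-event approximation: sum of cut set probabilities,
# which overestimates the exact value.
q_rare = qa * qb + qc

# With a cut-off at order 1, the two-event cut set {A, B} is dropped
# entirely, so the result underestimates the exact value.
q_truncated = qc
```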

For information about how these algorithms can affect performance, see "FTA Performance Options" in the Ansys medini analyze User Guide.

For additional details about potential errors in calculations, see WARNING [3.6.9], below.

WARNING [3.6.9]

Following are descriptions of areas in which quantitative calculations may be inexact:

  • Floating-point arithmetic

    Ansys medini analyze uses double-precision numbers (approximately 16 digits) or high-precision numbers (E-22) to calculate probability. Rounding is applied conservatively, that is, calculations are rounded up.

    Representing real numbers as floating-point numbers may introduce issues, which can be illustrated using double-precision numbers. For any real number x, let double(x) denote its double representation (the closest double-precision floating-point number to x). The following issues can arise:

    • Extremely small numbers can no longer be represented, for example, double(10⁻³³⁰) = 0. This may arise in extremely rare cases during frequency and probability calculation.

    • Catastrophic cancellation: Subtracting two numbers that are close to each other leads to a loss of precision. For example, let eps = 10⁻¹³; then diff = double(1) − double(1 − eps) = 1.0003…·10⁻¹³, which is only correct up to the fourth significant digit.

      Further, choosing eps = 10⁻¹⁶ yields diff = 1.1…·10⁻¹⁶, where only the first significant digit is correct. Situations where this commonly happens include:

      • Unavailability (Qsys) approaches 1 during the calculation of the conditional failure intensity and its derived metrics.

      • Probabilities of events approach 1 during the calculation of Qsys, more specifically when calculating P(¬ Ei) = 1-P(Ei)

      • When calculating the system failure frequency, more specifically the terms P(¬ Ei) and P(TLE|Ei) - P(TLE|¬ Ei)

      • When calculating Birnbaum and criticality measures, in particular P(TLE|A) - P(TLE|¬(A))
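The loss of precision from catastrophic cancellation can be reproduced directly in any language with IEEE 754 double precision; the snippet below mirrors the two examples above (eps = 10⁻¹³ and eps = 10⁻¹⁶):

```python
eps = 1e-13
diff = 1.0 - (1.0 - eps)             # catastrophic cancellation
rel_err = abs(diff - eps) / eps      # ~3e-4: only about 4 significant digits survive

eps2 = 1e-16
diff2 = 1.0 - (1.0 - eps2)
rel_err2 = abs(diff2 - eps2) / eps2  # ~0.11: only the first significant digit survives
```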

  • Conditional Failure Intensity

    The system failure rate[3] (hazard rate) is defined as F'/(1-F) and is approximated using the conditional failure intensity λsys = w/(1-Q). This approximation is generally only exact if the system is non-repairable or if the failure rate is constant. A fault tree consisting of a single repairable or monitored event is also unproblematic.

    Because the formula for reliability makes use of λsys, it can only produce exact results if the system is non-repairable or if its failure rate is constant. Heavy use of monitored and repairable events with large values for the repair rate, the failure rate, and the monitored interval tend to produce larger errors in this approximation.
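The sketch below quantifies this for a hypothetical 1-out-of-2 system of two identical repairable components (an AND gate over two repairable basic events), with deliberately aggressive failure and repair rates to make the error visible. The reliability obtained from exp(−∫λsys dt) is compared against the exact reliability of the corresponding Markov chain in which the failed state is absorbing. All formulas here are standard textbook models, not the tool's implementation.

```python
import math

lam, mu, T, dt = 1e-2, 5e-2, 1_000.0, 0.01   # aggressive rates, mission time, step

def q1(t):
    # Unavailability of a single repairable component.
    return lam / (lam + mu) * (1.0 - math.exp(-(lam + mu) * t))

def w1(t):
    # Failure frequency of a single component: lam * availability.
    return lam * (1.0 - q1(t))

# Approximate reliability: R ~ exp(-integral of lambda_sys), with
# lambda_sys = w / (1 - Q) for the AND gate (both components down).
integral, t = 0.0, 0.0
while t < T:
    q = q1(t) ** 2
    w = 2.0 * w1(t) * q1(t)
    integral += (w / (1.0 - q)) * dt
    t += dt
r_approx = math.exp(-integral)

# Exact reliability from the Markov chain: states '2 up', '1 up',
# and an absorbing 'both down' state (forward Euler integration).
p0, p1, t = 1.0, 0.0, 0.0
while t < T:
    d0 = -2.0 * lam * p0 + mu * p1
    d1 = 2.0 * lam * p0 - (mu + lam) * p1
    p0, p1 = p0 + d0 * dt, p1 + d1 * dt
    t += dt
r_exact = p0 + p1
```

With these rates the approximation visibly underestimates the reliability; for λ much smaller than μ the two values nearly coincide.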

  • Numerical Integration

    Some metrics, such as reliability or PFD, require integration. This is done numerically and therefore introduces an error, which can be minimized by reducing the step width. If the function that is integrated varies drastically between two supporting points, a large error is introduced. This can happen when integrating the conditional failure intensity during the reliability calculation for certain probability models, for example:

    • The conditional failure intensity of the Weibull probability model with a beta value of less than 1 has a pole at 0. As mentioned in RECOMMENDATION [3.6.4], this is addressed using additional supporting points in [0, step-width]. Choosing an inappropriate step-width may still result in a large error.

    • Using scripted probability models, it is possible to create large changes between two supporting points, especially when poles are involved.

    • Repairable probability models may reach their steady state very fast, that is, their unavailability may change from 0 to their steady state value almost instantaneously. To address this, additional supporting points are used in [0, step-width]. Note that choosing an inappropriate step-width may still result in a noticeable error.

    • The failure frequency and unavailability of monitored probability models are discontinuous at multiples of the monitored interval. The conditional failure intensity of systems using monitored probability models may therefore have the same points of discontinuity. They might introduce a considerable error during integration, depending on the step width.

    Because numerical integration is used to calculate PFD, CFI average, and Number of failures, they can have the same problems.

  • Algorithm

    The cut set-based algorithm calculates only approximations for the unavailability and system frequency, as well as for all dependent metrics, such as reliability and PFD.

    There are two sources of error:

    • Depending on the cut-off options, not all cut sets are considered.

    • For the cut sets that are taken into account, only an approximation of their unavailability is calculated.

    Note that the formulas used to calculate metrics such as the unavailability produce exact results if the BDD algorithm is selected for evaluation, but the implementation of the formulas may not yield exact results due to the other issues listed here. For a comparison of the two algorithms used by medini analyze, see NOTE [3.6.8], above.



[3] Not to be confused with the other (constant) failure rates in the tool.