Divide and Conquer In Reliability Analyses
Gain understanding by looking at different population segments
by Necip Doganaksoy, Gerald J. Hahn and William Q. Meeker
Not all product is created equal. Some units are more likely to fail in service than others. Thus, in reliability evaluations, you need to identify subpopulations with different failure susceptibility. This is accomplished through segmentation, a divide-and-conquer strategy that breaks down the product population into meaningful subpopulations so you can conduct separate analyses on each and then act on the resulting information.
Segmentation (also known as data stratification) is one of the so-called seven basic quality tools.1 In this column, we describe and illustrate the use of segmentation for, principally, reliability applications.2
What creates subpopulations?
In a specific product application, subpopulations result from differences in the manufacture and use of a product. Differences in reliability may, for example, be due to variability in raw materials and components or differences in manufacturing processing conditions.
In a recent application, Yili Hong, William Q. Meeker and James D. McCalley segmented data on a fleet of high-voltage power transformers according to manufacturer and manufacturing period—first to model lifetime and then to predict the remaining life for the units in the fleet.3 In another application dealing with the prediction of warranty costs for an electronic product, the population was broken down into component genealogy groups consisting of combinations of part numbers.
Segmentation is especially appropriate when failures due to a particular defect occur in only some production lots. In studying the cracking of the plastic casing of a laptop computer, for example, segmentation revealed such failures took place exclusively on units built during a one-month period at one of several assembly plants. This led to further study, which revealed the wrong type of screw was used in assembly at this plant during this time period, and grease on the screws led to chemical degradation of the plastic casing.
Isolating the problem facilitated root cause identification and steps to ensure the problem would not recur in future product. More immediately, it led to identifying and, when needed, repairing previously built computers that were vulnerable to this failure.
Also, different units of a product population often experience different use environments. A problem may be accentuated by, or perhaps occur only under, extreme ambient conditions, such as severe heat or cold. Similarly, the performance of a dishwasher may depend on the characteristics of the local water supply. In such cases, you might focus immediate corrective action on product in the most vulnerable geographical regions; segmentation of the data by region will help identify the subpopulations that warrant special attention.
Example: aircraft engine
The following example deals with a system that bleeds off air pressure from an aircraft engine to operate a compressor:4, 5
Initial analysis. Lifetime data were available on bleed systems from 2,256 engines in military aircraft operating from various bases. Figure 1 shows a Weibull distribution probability plot for the 19 failures that occurred. Note that unfailed units, although not shown in the plot, are taken into consideration in arriving at the plotting positions.
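To make the role of the unfailed units concrete, the following is a minimal Python sketch of one common convention for plotting positions with right-censored data: the midpoints of the jumps in a Kaplan-Meier-type estimate of the fraction failing. The function name, the handling of ties and the toy data are illustrative assumptions; the column does not state which convention was used for Figure 1.

```python
import math

def weibull_plot_points(times, failed):
    """Return (time, p, ln(time), ln(-ln(1 - p))) for each distinct failure time.

    times  : positive running times (failure or censoring times)
    failed : True if the unit failed at that time, False if still unfailed
    Unfailed units are never plotted, but they stay in the risk set until
    their censoring time and so lower the plotting positions.
    """
    records = sorted(zip(times, failed))
    n_at_risk = len(records)
    surv = 1.0          # running estimate of the fraction surviving
    points = []
    i = 0
    while i < len(records):
        t = records[i][0]
        d = c = 0       # failures and censorings at time t
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            f_before = 1.0 - surv
            surv *= 1.0 - d / n_at_risk
            f_after = 1.0 - surv
            p = 0.5 * (f_before + f_after)   # midpoint of the jump
            points.append((t, p, math.log(t), math.log(-math.log(1.0 - p))))
        n_at_risk -= d + c
    return points

# Toy data (hypothetical, not the bleed system data): two failures, three unfailed units.
toy_times = [100, 200, 300, 400, 500]
toy_failed = [True, False, True, False, False]
for t, p, x, y in weibull_plot_points(toy_times, toy_failed):
    print(t, round(p, 3), round(x, 3), round(y, 3))
```

Plotting the last two columns against each other gives the Weibull probability plot. Dropping the three unfailed units from the toy data would leave only two units in the calculation and push both plotting positions much higher.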
The slope of the plot seems to change around 600 hours, indicating that a simple Weibull distribution does not provide an adequate representation for the lifetimes. This pattern, which is common in our experience, suggests a mixture of early (infant mortality) failures (on the left side of the plot) and wear-out failures (on the right side).
Segmented data analysis. Examination of the data revealed that 10 of the 19 failures occurred at base D, one of the bases where aircraft were stationed. Separate Weibull probability plots for the lifetimes of the systems at base D and those at all other bases are shown in Figure 2.
The data in each of these two plots scatter around straight lines, suggesting that simple Weibull distributions provide adequate representations if you consider base D and the other bases separately. Moreover, the probability of failure by 3,000 hours is estimated from the plot to be 0.467 for the systems at base D, as compared to 0.013 for the systems at the other bases.6
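For reference, the reason a straight line on a Weibull probability plot indicates an adequate fit can be written out as follows. The symbols η (scale parameter) and β (shape parameter) are standard Weibull notation assumed here; the column itself names only the shape parameter, in its notes.

```latex
\begin{align}
F(t) &= 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right], \qquad t > 0 \\
\ln\left[-\ln\bigl(1 - F(t)\bigr)\right] &= \beta \ln t - \beta \ln \eta \\
h(t) &= \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1}
\end{align}
```

The second line is linear in ln t with slope β, so lifetimes from a single Weibull population should scatter around a straight line on the plot. The hazard function h(t) in the third line increases over time when β > 1 and is constant when β = 1.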
A recent analysis suggested that lognormal distributions might provide a better fit to the data than Weibull distributions. Fortunately, both analyses led to similar findings.
Resulting action. Further investigation revealed the serious failure problem at base D was caused by corrosion accelerated by salty air (base D was near the ocean), and a change in maintenance procedures was implemented there. This resulted in essentially eliminating the failure mode.
Note that segmentation analyses typically do not provide cause-and-effect conclusions by themselves. The difference in failure probabilities between base D and the other bases could have been due to one factor or a combination of many factors. The determination that the underlying cause was corrosion due to salty air involved an engineering assessment of failed parts. The segmentation analysis, however, helped focus and expedite the physical evaluations.
Identification of subpopulations
If at all possible, the selection of subpopulations should be based on physical considerations. This requires an in-depth understanding of the design, manufacture and use conditions of the product.
In practice, however, the reasons for differences between subpopulations may not be known and, therefore, effective subpopulations often cannot be readily determined. If you knew what created the differences—at least, to the degree that these pertain to the manufacture of the product and are controllable—you would, in fact, want to act to remove them. Thus, identifying subpopulations may be a trial-and-error process.
Initially, subpopulations are often arrived at somewhat arbitrarily, based upon, for example, the period of production (week, month, quarter or year). Such choices should, however, be trumped by manufacturing knowledge. The times at which changes are introduced on line, for example, generally provide an improved criterion for segmentation. Segmentation might also be based on factors such as parts supplier, the geographical region where the product is being used, customer type or a combination of these.
The fact that a Weibull probability plot of the data does not result in a straight line, as in the bleed system example, also suggests the existence of subpopulations (and multiple failure modes, as discussed later) and might provide clues for defining subpopulations.
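The trial-and-error screening described in this section can be sketched in a few lines of Python: tally units and failures under each candidate segmentation criterion and look for segments that stand out. The record layout, the field names and the toy data below are hypothetical. The crude failure fraction ignores differences in running time between units, so it is only a first screen ahead of separate life data analyses, not a substitute for them.

```python
from collections import defaultdict

def failure_summary(units, key):
    """Tally units and failures per segment.

    units : iterable of dicts, each with a boolean 'failed' entry
    key   : function mapping a unit to its segment label
    Returns {label: (n_units, n_failures, fraction_failed)}.
    """
    counts = defaultdict(lambda: [0, 0])   # label -> [units, failures]
    for unit in units:
        tally = counts[key(unit)]
        tally[0] += 1
        tally[1] += int(unit["failed"])
    return {label: (n, f, f / n) for label, (n, f) in counts.items()}

# Hypothetical records; try several candidate criteria on the same data.
units = [
    {"base": "D", "supplier": "X", "failed": True},
    {"base": "D", "supplier": "Y", "failed": False},
    {"base": "A", "supplier": "X", "failed": False},
    {"base": "A", "supplier": "Y", "failed": False},
    {"base": "A", "supplier": "X", "failed": True},
    {"base": "A", "supplier": "Y", "failed": False},
]
print(failure_summary(units, key=lambda u: u["base"]))
print(failure_summary(units, key=lambda u: u["supplier"]))
```

Passing the criterion as a function makes it easy to try combinations as well, for example `key=lambda u: (u["base"], u["supplier"])`.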
Segmenting data elsewhere
We have discussed segmentation in the context of reliability data tracking for nonrepairable products. Segmentation of data, however, is useful in many other situations.
For example, a chemical cure process showed inconsistent results. To gain improved understanding, the data were segmented and plotted in various ways, including by shift. The resulting plot showed two of the shifts were providing satisfactory product, but the night shift was not.
A late-night visit to the factory floor, made to find the cause of this difference, revealed that the third-shift operators frequently turned off the plant’s air conditioning. This increased humidity, which in turn had a negative impact on product performance. After the problem was corrected, it was decided to control chart the performance segmented by shift.
Another example arises in the comparison of drugs, an area that has become known as comparative effectiveness research and was part of the 2009 U.S. economic stimulus bill. In assessing the effectiveness of competing drugs or medical devices, you want to know if a particular drug is effective in one or more parts of the population, such as the elderly, even if it may not be so in other parts. This calls for segmentation in the data analysis.7
Multiple failure mode analyses
Segmentation bears some similarity to the analysis of multiple failure modes discussed in one of our earlier columns.8 In both cases, the life data cannot be described adequately by a single, simple distribution.
In studying multiple failure modes, information on the mode of failure of each failed unit is required. All of the data are then used in each analysis, but observations from failure modes other than the one under consideration are taken as censored.
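The recoding step for a failure mode analysis can be sketched as below. This is a minimal Python illustration; the function name, the record layout and the mode labels are hypothetical.

```python
def recode_for_mode(records, mode):
    """Recode life data for the analysis of a single failure mode.

    records : list of (time, failure_mode) pairs, with failure_mode set to
              None for units that had not failed
    mode    : the failure mode under consideration
    Returns a list of (time, failed) pairs in which failures from any other
    mode, and unfailed units, are treated as censored at their running time.
    """
    return [(time, failure_mode == mode) for time, failure_mode in records]

# Hypothetical records: two failures from different modes and one unfailed unit.
records = [(120, "corrosion"), (300, "fatigue"), (500, None)]
print(recode_for_mode(records, "corrosion"))
print(recode_for_mode(records, "fatigue"))
```

Every unit appears in every analysis; only the failed or censored flag changes from one failure mode to the next.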
In contrast, for the bleed system example, failure mode information was not available at the time of the analysis (and possibly one or more of the base D failures was not actually from corrosion). Thus, the data were segmented into subpopulations, and separate analyses were conducted for each subpopulation.
In both situations, the results of the individual analyses can subsequently be combined to obtain an omnibus analysis for the entire population. For segmentation, this requires knowledge of the proportion of units in the population belonging to each subpopulation.
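The two ways of combining results can be sketched as follows. For segmentation, the population failure probability is a weighted average of the subpopulation probabilities. For failure modes, a unit survives only if it survives every mode, and the product form used here additionally assumes that the modes act independently, which is an assumption of this sketch. The function names are hypothetical, and the 10% share of units at base D is a made-up figure for illustration; only the probabilities 0.467 and 0.013 come from the bleed system example.

```python
def combine_segments(probs, proportions):
    """Population failure probability from subpopulation probabilities.

    probs       : failure probability by a given time in each subpopulation
    proportions : fraction of the population in each subpopulation (sum to 1)
    """
    if len(probs) != len(proportions):
        raise ValueError("probs and proportions must have the same length")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(w * p for w, p in zip(proportions, probs))

def combine_failure_modes(probs):
    """Overall failure probability from per-mode probabilities,
    assuming the failure modes act independently."""
    surv = 1.0
    for p in probs:
        surv *= 1.0 - p
    return 1.0 - surv

# 0.467 and 0.013 are the 3,000-hour estimates from the example;
# the 0.10 / 0.90 split between base D and the other bases is hypothetical.
print(combine_segments([0.467, 0.013], [0.10, 0.90]))
```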
Short term vs. long term
In the short term, segmentation may result in the speedy and accurate isolation of field problems to well-identified segments of the total product population, so you can identify the most susceptible units and take corrective action. Segmentation may, for example, help determine whether a recall is needed, and if so, what part of the product population needs to be recalled.
By isolating a problem to a relatively small part of the population, you may be able to address an otherwise extremely costly problem without inconveniencing customers not impacted by the problem. The long-term answer, however, is to eliminate the problem in future units, perhaps by designing a sufficiently robust product whose performance is insensitive to the use environment.
References and Notes
1. James J. Rooney, et al., "Building From the Basics," Quality Progress, January 2009, pp. 18-29.
2. This article has been adapted from sections 9.3.2 to 9.3.4 of Gerald J. Hahn and Necip Doganaksoy’s The Role of Statistics in Business and Industry (Wiley, 2008).
3. Yili Hong, William Q. Meeker and James D. McCalley, "Prediction of Remaining Life of Power Transformers Based on Left Truncated and Right Censored Lifetime Data," Annals of Applied Statistics, Vol. 3, 2009, pp. 857-879.
4. This example from William Q. Meeker and Luis Escobar’s Statistical Methods for Reliability Data (Wiley, 1998) was originally described in Abernethy, Breneman, Medlin and Reinman’s Weibull Analysis Handbook, Technical Report, AFWAL-TR-83-207 (see reference 5).
5. Robert B. Abernethy, James E. Breneman, Charles H. Medlin and Glenn L. Reinman, Weibull Analysis Handbook, Technical Report AFWAL-TR-83-207, Air Force Wright Aeronautical Laboratories, Washington, D.C., 1983, http://handle.dtic.mil/100.2/ADA143100.
6. The two plots in Figure 2 have quite different slopes, indicating different estimated Weibull distribution shape parameters (β̂). Base D exhibits an increasing estimated hazard function (β̂ ≈ 3) over time, suggesting wear-out. The estimated hazard function for the other bases is close to constant (that is, β̂ is close to 1). For more information, see sections A2 and A3.2 of the appendix to chapter 5 of Hahn and Doganaksoy’s The Role of Statistics in Business and Industry (reference 2).
7. John D. Kendrick, "Don’t Forget Data Stratification," Six Sigma Forum Magazine, Vol. 7, No. 4, August 2008, pp. 17-25, provides further discussion and examples of segmentation.
8. Necip Doganaksoy, Gerald J. Hahn and William Q. Meeker, "Reliability Analysis by Failure Mode," Quality Progress, June 2002, pp. 47-52.
Necip Doganaksoy is a principal technologist-statistician at the GE Global Research Center in Schenectady, NY. He has a doctorate in administrative and engineering systems from Union College in Schenectady. Doganaksoy is a fellow of ASQ and the American Statistical Association.
Gerald J. Hahn is a retired manager of statistics at the GE Global Research Center in Schenectady, NY. He has a doctorate in statistics and operations research from Rensselaer Polytechnic Institute in Troy, NY. Hahn is a fellow of ASQ and the American Statistical Association.
William Q. Meeker is professor of statistics and distinguished professor of liberal arts and sciences at Iowa State University in Ames, IA. He has a doctorate in administrative and engineering systems from Union College in Schenectady, NY. Meeker is a fellow of ASQ and the American Statistical Association.