Action That Matters
Practical significance provides a basis for action
by Ronald D. Snee and Roger W. Hoerl
Suppose you’re using a control chart to monitor the performance of an important process. You observe some points outside the control limits. These observed points, however, don’t seem to cause any real problem with the process or its output. What action or inaction do you take?
Suppose in another situation you run an experiment with the objective of improving the yield of an important process. You find a way to increase yield by 1%, but this solution will require a significant capital investment to implement. Your finance department suggests this capital expense will not pay for itself for several years. What action do you take?
Data collection and statistical analysis have long been an integral part of process and product development, control and improvement, and they continue to grow in importance. Such studies give rise to questions like the two posed above. One of the lingering questions is how you should interpret and act on the results of a statistical analysis. Curiously, the practical issue of when it makes sense to act on statistical results receives little discussion in the statistics and quality literature, including textbooks. The assumption seems to be that anything detected in a statistical analysis is actionable.
This issue relates to the difference between statistical significance—how confident we can be that an effect is real—and practical significance—whether the effect is important enough to be acted on. Unfortunately, analysts often simply note that an effect is "significant," without clarification, exacerbating the confusion.
We argue that you should be most concerned about the practical significance of results because practical significance determines what action, if any, should be taken. This column provides guidance on how to determine the practical significance of results and, by combining it with statistical significance, how to decide on an appropriate course of action. Examples are included to illustrate the proposed analysis and decision-making processes.
What is statistical significance?
Stating that an effect (for example, the average response to a pharmaceutical) is statistically significant implies that the observed effect is larger than can reasonably be attributed to random variation. In other words, you can be confident there really is an effect—that is, the drug has some effect. It is detectable. The metric commonly used to assess statistical significance is the so-called p-value, or the probability of obtaining a result "this unusual," assuming the hypothesis of no effect (that is, assuming the drug has no effect whatsoever). A common rule of thumb used in practice is to say that the observed effect is statistically significant if p < 0.05. This would mean that an observed drug effect of this magnitude would occur less than 5% of the time in a clinical trial if the drug had, in reality, no effect whatsoever.
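This definition can be made concrete with a small permutation test, which computes exactly this probability by brute force: under the hypothesis of no effect, every way of splitting the pooled observations into two groups is equally likely. The data and the two-group comparison below are our own hypothetical illustration, not an example from the column:

```python
from itertools import combinations

# Hypothetical responses for a control group and a treated group.
control = [10, 11, 10, 12, 11]
treated = [14, 15, 14, 16, 15]

pooled = control + treated
n = len(control)
observed = abs(sum(treated) / n - sum(control) / n)

# Exact permutation test: enumerate every split of the pooled data into two
# groups of 5 and count how often a difference "this unusual" occurs.
count = total = 0
for idx in combinations(range(len(pooled)), n):
    group1 = [pooled[i] for i in idx]
    group2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diff = abs(sum(group1) / n - sum(group2) / n)
    total += 1
    if diff >= observed:
        count += 1

p_value = count / total
print(f"observed difference = {observed}, p-value = {p_value:.4f}")
```

Only two of the 252 equally likely splits produce a difference this large, so p = 2/252 ≈ 0.008, and the effect would be declared statistically significant under the p < 0.05 rule of thumb.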
Obviously, p < 0.05 is just a rule of thumb. There is no science to justify it. Unfortunately, some journals have interpreted this simple rule as a scientific fact and won’t publish research unless the p-value is lower than 0.05. The logical result is that researchers may be tempted to manipulate the data and the analysis until they achieve p < 0.05. Obviously, this is not sound application of the scientific method. To help clarify the issue, the American Statistical Association recently published a statement on the context, process and purpose of p-values.1
Beyond misinterpreting the 0.05 threshold, p-values have other critical limitations. If the process variation (noise) is small, an effect may be statistically significant but of no practical importance. Further, elementary probability shows that the p-value also depends on sample size. If the sample size is large (as in big data studies, for example), small differences can attain statistical significance. This explains why p-values are rarely used in big data analytics. Conversely, an important effect may go unnoticed because of low sample size or large process variation, producing a p-value greater than 0.05. Certainly, we should follow up on important results, regardless of the p-value.
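The dependence of the p-value on sample size is easy to demonstrate. In the sketch below (our own illustration, using a two-sample z-test with an assumed known process sigma), an identical small effect of 0.1 units fails to reach p < 0.05 with 20 observations per group, yet is overwhelmingly "significant" with 20,000:

```python
import math

def two_sample_z_p_value(mean_diff, sigma, n):
    """Two-sided p-value for a difference of means, known sigma, n per group."""
    z = mean_diff / (sigma * math.sqrt(2.0 / n))
    # Standard normal CDF via the error function.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

effect, sigma = 0.1, 1.0  # the same small effect in both scenarios
print(two_sample_z_p_value(effect, sigma, 20))     # small sample: p ≈ 0.75, not "significant"
print(two_sample_z_p_value(effect, sigma, 20000))  # big-data sample: p ≈ 0, "significant"
```

Nothing about the effect changed between the two calls; only the sample size did.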
Unfortunately, statistical significance does not determine practical significance (the real importance of the effect), nor does practical significance determine statistical significance. The principal value of attaining statistical significance is the confidence it provides in the statistical results, and therefore in our subsequent action. In short, both types of significance must be considered to respond appropriately to statistical studies. This is another reason why we feel it is unfortunate that so few books discuss this critical consideration.
What is practical significance?
Practical significance is determined by assessing the magnitude and nature of an effect in light of the experiment that produced the data, and subject matter (domain) knowledge. Is the effect large enough to have real meaning in context of the study and its objectives?
For example, is the effect of the drug large enough to warrant further expenses toward bringing it to market? While being able to detect that the drug has an effect (significance) is noteworthy, is this effect larger than competitive drugs? Is it large enough to make a convincing case to doctors that they should switch to prescribing this drug? Perhaps it has lower side effects. Cost, priorities and requirements are important practical considerations. Will these results justify an action, change business decisions, revise policies or procedures, or influence the behavior of professionals in the field?
Practical significance is domain specific. Clinical significance, in the context of pharmaceutical development, for example, is related to whether the effect is large enough to have a meaningful impact on a patient’s health.2 On the other hand, regulatory significance should answer another question—whether the observed effect, known with a given degree of confidence, is important enough to warrant regulatory action or inaction.3
Often, experience with a certain phenomenon identifies the size of an effect that is considered practically important. For example, in some situations an effect greater than 5 to 10% of the typical response is considered practically significant. In our experience, coefficients of variation (also known as relative standard deviations) of less than 5 to 10% of the typical response are often considered acceptable for measurement system variation and process variation.
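As a simple illustration of this rule of thumb, the coefficient of variation is just the standard deviation expressed as a percentage of the mean. The gage-study measurements below are hypothetical:

```python
import statistics

# Hypothetical repeated measurements from a measurement system (gage) study.
measurements = [99.8, 100.2, 100.1, 99.9, 100.0, 100.3, 99.7, 100.0]

mean = statistics.mean(measurements)
sd = statistics.stdev(measurements)          # sample standard deviation
cv_percent = 100.0 * sd / mean               # relative standard deviation, in %
print(f"CV = {cv_percent:.2f}% of the typical response")
# Rule of thumb from the text: a CV below roughly 5 to 10% is often acceptable.
```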
Jacob Cohen and Shlomo S. Sawilowsky provide statistical measures that divide effects into categories such as small, medium and large.4, 5 Unfortunately, these measures are based on statistical considerations and not practical relevance.
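Cohen's d, the best known of these measures, expresses the difference in means in units of the pooled standard deviation; values near 0.2, 0.5 and 0.8 are conventionally labeled small, medium and large. A minimal sketch with hypothetical data:

```python
import statistics

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    sx2, sy2 = statistics.variance(x), statistics.variance(y)  # sample variances
    pooled_sd = (((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd

# Hypothetical treated vs. control responses.
d = cohens_d([14, 15, 14, 16, 15], [10, 11, 10, 12, 11])
print(f"Cohen's d = {d:.2f}")   # well beyond 0.8, "large" by Cohen's convention
```

Note that d says nothing about cost, risk or customer impact; it remains a statistical yardstick, which is precisely the limitation noted above.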
Integrating statistical and practical significance
Let’s return to the two situations mentioned at the beginning of this column. In the case of the control chart, out-of-control (OOC) signals can be viewed as a statistically significant signal for special causes. Some would argue that every OOC event should be investigated. However, production operators are busy people. So are nurses and financial analysts. They only have time to investigate OOC points that matter.
One way to assess practical significance in this case is to compare the OOC values to the specifications. If the OOC points are well within specs, then the differences are not practically significant. One option is to footnote the special cause, but hold off on investigating for now.
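A sketch of this triage logic might look like the following, in which only points that are both out of control and out of spec demand immediate investigation. All limits and data are hypothetical:

```python
# Control limits describe the voice of the process; specification limits
# describe the voice of the customer.
ucl, lcl = 53.0, 47.0   # control limits from historical process behavior
usl, lsl = 60.0, 40.0   # customer specification limits

observations = [49.2, 50.5, 53.8, 51.1, 46.4, 50.0]

investigate_now, note_and_monitor = [], []
for i, x in enumerate(observations, start=1):
    if lcl <= x <= ucl:
        continue                        # in control: no signal at all
    if lsl <= x <= usl:
        note_and_monitor.append(i)      # statistically significant, but within spec
    else:
        investigate_now.append(i)       # OOC and out of spec: practically significant

print("investigate now:", investigate_now)
print("note special cause, keep monitoring:", note_and_monitor)
```

Here points 3 and 5 breach the control limits but stay well within spec, so they are footnoted rather than chased.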
Note that we are suggesting that statistical and practical significance be considered, not one or the other. If there is time to investigate points that are statistically detectable, but not yet practically significant, then, in the spirit of continuous improvement, they should be investigated and addressed. At a minimum, these OOC points should be remembered as an indication that the process may have more serious problems in the future. The acceptance control chart combines process stability and process specifications, and should be considered as a viable option.6
In the second example, a study has found a way to increase the yield by 1%, which is statistically significant but will require a capital investment to implement. This investment will not pay for itself for years, and there are other investment options being considered that would pay for themselves sooner. Two practical considerations come into play here:
First, as just noted, what is the financial value of the 1% increase in yield and the return on investment (ROI) of the needed capital improvement? If this is a high-volume process, then the 1% yield improvement, while numerically small, can produce a significant increase in revenue when multiplied by the high volume. This assumes, of course, that additional production can be sold.
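A back-of-the-envelope payback calculation makes this concrete. All figures below are hypothetical assumptions, not numbers from the column:

```python
# Hypothetical figures for the 1% yield-improvement decision.
annual_volume_units = 5_000_000   # high-volume process
margin_per_unit = 2.00            # contribution margin per unit ($)
yield_gain = 0.01                 # 1% more sellable product
capital_cost = 400_000            # required investment ($)

# Value of the yield gain, assuming the additional production can be sold.
annual_benefit = annual_volume_units * margin_per_unit * yield_gain
payback_years = capital_cost / annual_benefit
print(f"annual benefit ≈ ${annual_benefit:,.0f}, payback ≈ {payback_years:.1f} years")
```

At this assumed volume and margin, the 1% yield gain is worth $100,000 per year, so the $400,000 investment pays back in about four years, long enough that the non-financial considerations also come into play.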
Second, are there any customer satisfaction issues with the current yield? In other words, are customers currently unable to purchase the product they want, which could force them to go to a competitor if yield cannot be improved? As with most real problems, there are often considerations besides ROI that must be weighed, such as customer satisfaction, safety and environmental compliance.
Table 1 offers guidance on making the results of statistical analyses actionable by integrating statistical and practical significance.7 Table 1 also illustrates the four situations that can occur relative to statistical and practical significance. Of course, what we are looking for are those situations in the upper left in which the difference is important and real. However, when the effect is large enough to be important but there are insufficient data to be confident that it is real, gathering additional data will either confirm the effect or demonstrate that it is not as large as we had hoped.
Conversely, if the effect can be detected, but is not of practical importance, we often may choose not to act, but rather to continue monitoring, looking for a potential increase in the effect to the point that it becomes important. A lack of both practical and statistical significance allows us to cross this variable off our list and move on to other potential sources of improvement. A negative result can be helpful in improvement efforts because it allows us to focus attention on the critical few issues that drive improvement.
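The four situations can be summarized as a simple lookup. The function below is our own paraphrase of the guidance described above, not a reproduction of Table 1:

```python
def recommended_action(statistically_significant, practically_significant):
    """Decision guide combining the two types of significance."""
    if practically_significant and statistically_significant:
        return "act on the effect: it is important and real"
    if practically_significant:
        return "gather more data to confirm the important-looking effect"
    if statistically_significant:
        return "no action yet; continue monitoring the detectable effect"
    return "cross this variable off the list and move on"

# The upper-left cell: important and real.
print(recommended_action(True, True))
```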
Using statistical thinking and methods
There is another important consideration in the issue of statistical and practical significance: the collection of data that enables practical interpretation of the statistical analysis. Too often, practical significance is an afterthought, only considered after the data are collected.
In our experience, for example, nearly all the guidance provided in statistical texts on individual methods relates to their statistical performance. We can, instead, plan the study and collect data with practical interpretation in mind from the start.
As Table 2 shows, however, you also can consider practical importance and interpretation from the beginning of planning the study and data collection. Table 2 shows that statistical considerations, and the associated concept of statistical significance, revolve for the most part around sample size. Essentially, the question boils down to: "Is the sample size large enough to determine that an effect of a desired size is statistically significant with a desired probability?"
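That sample-size question can indeed be answered directly. The sketch below is our own illustration (a two-sample z-test approximation with an assumed known sigma and a two-sided 0.05 test), searching for the smallest per-group sample size that detects a practically important effect with 90% power:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power(delta, sigma, n, z_crit=1.96):
    """Approximate power of a two-sample z-test with n per group,
    two-sided 0.05 test (z_crit = 1.96), known sigma."""
    return phi(delta / (sigma * math.sqrt(2.0 / n)) - z_crit)

# delta is the practically important effect, chosen from domain knowledge.
delta, sigma, target_power = 0.5, 1.0, 0.90
n = 2
while power(delta, sigma, n) < target_power:
    n += 1
print(f"n = {n} per group")
```

The statistical side yields a unique answer; whether delta of 0.5 units is the right effect to chase is the practical, domain-specific side of the question.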
This problem can be easily "mathematized" to calculate a unique correct answer, which, of course, is typical in textbooks. Practical considerations are rarely so cut and dried, however, and are difficult to "mathematize."
The practical considerations in Table 2 relate to ensuring that a reasonable time period and range of factors are studied, based on experience. We cannot prove that these guidelines are correct in any formal sense, but we argue they are reasonable from a practical point of view. Obviously, incorporating the desired sources of variation over a reasonable time period—and studying an appropriate range of each variable—will influence the inferences that can be made from the study and, therefore, the actions that can be taken based on its results.
Creating bases for action that matter
Statisticians and quality professionals should be placing greater emphasis on the practical significance of the results of their studies. Such consideration augments, and does not replace, the concept of statistical significance. Finding statistical significance is necessary but not sufficient.
Both types of significance should be integrated into our problem-solving and process improvement work, as well as into the articles we write and the statistics and quality texts we publish. This enhanced emphasis on practical significance will help our employers, scientists, engineers and other professionals make better decisions.
Of course, this should be our overarching objective in any statistical study. An added benefit will be that our profession becomes more relevant as its practitioners create more solutions that truly matter.
© 2018 Ronald D. Snee and Roger W. Hoerl
- American Statistical Association (ASA), "The ASA’s Statement on p-Values: Context, Process and Purpose," The American Statistician, 2016, Vol. 70, No. 2, pp. 129-133, https://tinyurl.com/ASA-statement-p-value.
- Yifan Wang, Ronald D. Snee, Golshid Keyvan and Fernando J. Muzzio, "Statistical Comparison of Dissolution Profiles," Drug Development and Industrial Pharmacy, 2016, Vol. 42, No. 5, pp. 796-807.
- Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences, Routledge Taylor-Francis Group, 1988.
- Shlomo S. Sawilowsky, "New Effect Size Rules of Thumb," Journal of Modern Applied Statistical Methods, 2009, Vol. 8, No. 2, pp. 467–474.
- Richard A. Freund, "Acceptance Control Charts," Industrial Quality Control, 1957, Vol. 14, No. 4, pp. 13-23.
- Roger W. Hoerl and Ronald D. Snee, Statistical Thinking—Improving Business Performance, second edition, John Wiley and Sons, 2012.
Ronald D. Snee is president of Snee Associates LLC in Newark, DE. He has a doctorate in applied and mathematical statistics from Rutgers University in New Brunswick, NJ. Snee is an honorary member of ASQ and has received ASQ’s Shewhart, Grant and Distinguished Service Medals. He is an ASQ fellow and an academician in the International Academy for Quality.
Roger W. Hoerl is a Brate-Peschel assistant professor of statistics at Union College in Schenectady, NY. He has a doctorate in applied statistics from the University of Delaware in Newark. Hoerl is an ASQ fellow, a recipient of the ASQ’s Shewhart Medal and Brumbaugh Award, and an academician in the International Academy for Quality.