Answering the right question with the right interval
by Christine M. Anderson-Cook
Statisticians love intervals. Scientists, engineers and researchers should also come to love intervals. Choosing the right type of interval provides a means of supplementing an estimated quantity with an appropriate calibration of the uncertainty associated with that value. It means you not only have a single best guess, but can also provide a range of sensible values likely to contain the true value and a precise interpretation of how "sensible" is defined.
The flip side of this, however, is that an interval is often provided that doesn’t match the question it’s supposed to answer, or insufficient details are provided about some aspect of the interval. In these cases, intervals can cause confusion and lead to erroneous conclusions.
Consider a recent example I encountered: A 95% interval (110.6, 131.3) was given to summarize the results from a single population’s sample. Unfortunately, the raw data were not provided, and untangling the meaning of the interval was impossible.
A starting point
First, let’s consider some of the ground rules of intervals and examine the common types of intervals constructed.
Intervals provide an estimated range for a population characteristic based on an observed representative sample from that population. A representative sample means the sample shares the same characteristics as the population from which it was drawn. Most often, this is achieved by drawing a random sample from the population in which all items in the population have equal probability of being selected.
Different intervals exist for many different characteristics of the population: mean, variance and new observations. Consequently, for clarity, it is essential to identify the population characteristic for which the interval is intended.
Intervals have a level associated with them. For example, a 95% confidence interval for the population mean has a level of 95%. This means if you repeated the process of collecting a sample of data many times—each time building a confidence interval—then the true population mean, on average, would be contained in 95% of the intervals. This is reassuring in some ways; however, you are generally only looking at a single sample for any particular problem.
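The repeated-sampling interpretation of the level can be checked with a small simulation. The sketch below is only illustrative: the population mean and standard deviation are hypothetical values, the population is assumed to be normal, and the t-value for 39 degrees of freedom is taken from standard tables rather than computed.

```python
import math
import random
import statistics

# Hypothetical population, chosen only for illustration.
TRUE_MEAN = 120.0
TRUE_SD = 10.0
N = 40

# 0.975 quantile of a t-distribution with N - 1 = 39 degrees of freedom,
# taken from standard t tables (gives a two-sided 95% interval).
T_975_39 = 2.0227


def ci_covers_mean(rng):
    """Draw one sample, build a 95% confidence interval, report a hit or miss."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    half_width = T_975_39 * s / math.sqrt(N)
    return xbar - half_width <= TRUE_MEAN <= xbar + half_width


def estimate_coverage(repeats=5000, seed=1):
    """Share of simulated intervals that contain the true population mean."""
    rng = random.Random(seed)
    hits = sum(ci_covers_mean(rng) for _ in range(repeats))
    return hits / repeats


if __name__ == "__main__":
    print(f"Share of intervals containing the true mean: {estimate_coverage():.3f}")
```

The printed share should land close to 0.95. Any single interval either contains the true mean or does not; the level describes the procedure, not one particular interval.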
Consider three of the most common types of intervals you might construct, based on a representative sample from a homogeneous population in which the characteristic of interest takes on continuous values.
- A confidence interval for the mean: In general, confidence intervals can be constructed for a parameter or a function of parameters that characterize the population. The mean is a common choice for the center of the population.
- A prediction interval for a new observation: This provides a range where you predict a new observation from the same population will fall.
- A tolerance interval for a specified proportion of the population: This interval gives a range you’re confident will contain at least a certain proportion (usually chosen to be large) of the population distribution.
Each of these intervals answers a different question about the population, and the width of the interval is adjusted so the level of the interval (for instance, 95%) will be correct for the question of interest. The level is often referred to as the "coverage probability" because we anticipate the interval will cover the population characteristic of interest according to the interval’s level.
Suppose you have 40 observations drawn at random from a large population, which is thought to be approximately normally distributed. You wish to construct the three types of intervals described above. Each of the intervals has the same basic form:
$$\bar{x} \pm c(\text{level}, n)\, s,$$
in which n is the sample size, $\bar{x}$ is the sample mean, s is the sample standard deviation, and c(level, n) is a constant that changes depending on the type of interval you wish to construct, the specified level and the available sample size.
For each of the intervals, $\bar{x}$ is the best guess for the center of the population and gives you a good midpoint for the range. For this example, $\bar{x} = 121.2$ with s = 10.6 based on the 40 observations. A histogram of the raw data is shown in the top panel of Figure 1.
For the confidence interval,
$$c(\text{level}, n) = \frac{t_{\text{level},\,n-1}}{\sqrt{n}},$$
in which $t_{\text{level},\,n-1}$ is the value for which the proportion of a t-distribution with n-1 degrees of freedom contained in $(-t_{\text{level},\,n-1},\ t_{\text{level},\,n-1})$ equals the level selected. You use a t-distribution instead of a normal distribution for this calculation because using the sample standard deviation to estimate the unknown population standard deviation introduces some uncertainty. You must expand the interval to reflect this uncertainty. For this example, a 95% confidence interval for the mean of the population is:
$$121.2 \pm \frac{2.023}{\sqrt{40}} \times 10.6 = [117.8, 124.6].$$
This is shown as the first interval (labeled C.I. or confidence interval) below the histogram in Figure 1.
A 95% prediction interval for a new observation uses a value for c(level, n) of
$$c(\text{level}, n) = t_{\text{level},\,n-1} \sqrt{1 + \frac{1}{n}}.$$
Again, the t-distribution is used because there is uncertainty associated with the estimation of the population standard deviation. This interval must be wider than the confidence interval for the population mean. Why? In addition to estimating the center of the distribution, we must also allow for the natural variability of the new observation around that mean (hence the additional value of 1 under the square root). For our example, a 95% prediction interval for a new observation is:
$$121.2 \pm 2.023 \sqrt{1 + \frac{1}{40}} \times 10.6 = [99.5, 142.9].$$
This is shown as the second interval (labeled P.I. or prediction interval) below the histogram in Figure 1.
For the tolerance interval, you must decide what proportion of the population distribution you wish to have contained in the interval. Suppose you are interested in a 95% tolerance interval that contains at least 80% of the population.
The value of c(level, n) can be obtained by looking up c(level, p, n), in which p is the proportion of the population distribution to be included in the interval, in Statistical Intervals: A Guide for Practitioners (reference 1). For our example, a 95% tolerance interval for at least 80% of the population is:
$$121.2 \pm 1.602 \times 10.6 = [104.2, 138.2].$$
This is shown as the third interval (labeled T.I. or tolerance interval) below the histogram in Figure 1.
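The arithmetic behind the three intervals can be reproduced from the summary statistics alone. The following sketch assumes the common form of sample mean plus or minus c(level, n) times s, uses a t-value for 39 degrees of freedom taken from standard tables, and takes the tolerance factor 1.602 as quoted in the text. The function and variable names are hypothetical.

```python
import math

# Summary statistics from the example in the text.
N = 40
XBAR = 121.2
S = 10.6

# 0.975 quantile of a t-distribution with 39 degrees of freedom (standard tables).
T_975_39 = 2.0227
# Tolerance factor quoted in the text (95% confidence, at least 80% of population).
K_TOLERANCE = 1.602


def interval(center, c, s):
    """Return (lower, upper) for center +/- c * s."""
    half_width = c * s
    return (center - half_width, center + half_width)


def three_intervals(xbar=XBAR, s=S, n=N, t=T_975_39, k=K_TOLERANCE):
    """Compute the confidence, prediction and tolerance intervals."""
    return {
        "confidence": interval(xbar, t / math.sqrt(n), s),
        "prediction": interval(xbar, t * math.sqrt(1 + 1 / n), s),
        "tolerance": interval(xbar, k, s),
    }


if __name__ == "__main__":
    for name, (lower, upper) in three_intervals().items():
        print(f"{name:>10}: [{lower:.1f}, {upper:.1f}]")
```

Only the multiplier c changes from one interval to the next, which is why all three share the same midpoint and differ only in width.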
A few notes about these intervals:
- All of the intervals are centered at the sample mean because it is our single best guess of where the population mean and new observations are located. The form of the intervals ensures the intervals will be symmetric in width around this point estimate.
- Estimating the population mean can be done quite precisely, and the estimate is quite robust to the assumption that the original data come from a normal distribution.
- For the sample of 40 observations, one observation lies outside the 95% prediction interval for new observations. This is not unexpected given the interpretation of a prediction interval: If we collected new observations, we would expect about 95% of them to fall within this interval. It is therefore unsurprising that most, but not all, of the observed values fall inside the prediction interval.
- There is often confusion about how to interpret tolerance intervals because of the two percentage values in their description. For our interval, you can say you are "95% confident that at least 80% of the population distribution will lie within this interval." Or, if you repeated the process of collecting a sample many times, you can say about 95% of the constructed intervals would have at least the correct coverage (in our example, at least 80% of the population).
- Prediction and tolerance intervals are more sensitive to the assumption of normality than confidence intervals for the mean because they depend on the tails of the distribution, not just the center. Therefore, for these intervals, it is good to examine the distribution of the raw data to make sure the normal assumption is reasonable.
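The two percentages in a tolerance interval can be seen in a simulation. The sketch below assumes a hypothetical normal population and reuses the tolerance factor 1.602 quoted in the text for n = 40. For each simulated sample it computes how much of the population the interval actually contains, then counts how often that content reaches 80%.

```python
import random
import statistics
from statistics import NormalDist

N = 40
K_TOLERANCE = 1.602  # factor quoted in the text: 95% confidence, 80% content
POPULATION = NormalDist(mu=120.0, sigma=10.0)  # hypothetical population


def content_of_interval(rng):
    """Proportion of the population inside one sample's tolerance interval."""
    sample = [rng.gauss(POPULATION.mean, POPULATION.stdev) for _ in range(N)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    lower = xbar - K_TOLERANCE * s
    upper = xbar + K_TOLERANCE * s
    return POPULATION.cdf(upper) - POPULATION.cdf(lower)


def share_meeting_target(target=0.80, repeats=5000, seed=7):
    """Share of simulated intervals that contain at least `target` of the population."""
    rng = random.Random(seed)
    met = sum(content_of_interval(rng) >= target for _ in range(repeats))
    return met / repeats


if __name__ == "__main__":
    print(f"Share of intervals covering at least 80%: {share_meeting_target():.3f}")
```

The share should come out near 0.95: the 80% describes how much of the population an interval must contain, and the 95% describes how often the procedure achieves that.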
Figure 2 shows variations of prediction and tolerance intervals. The top three prediction intervals illustrate the effect of changing the sample size on which the prediction interval is based. As the sample size increases, the width of the interval shrinks but at a diminishing rate. Here, we have assumed the mean and standard deviation estimates are unchanged with any new sample. Of course, that would not necessarily be the case in practice.
The second set of three intervals shows the impact of changing the level of the interval based on the original 40 observations. To be more confident the interval will include a new observation, we must add extra width. The third set of intervals shows tolerance intervals—where we are changing the proportion of the population to be covered by the interval. As you can see, to have a larger proportion included, the width of the interval needs to increase.
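The effect of sample size and level on the prediction interval can be explored numerically. The sketch below is one possible approach rather than the method used to produce Figure 2. It finds the t-value by numerically integrating the t density and bisecting, so it needs only the standard library. The sample sizes and levels in the loops are hypothetical choices, and s = 10.6 is held fixed as in the text.

```python
import math


def t_pdf(x, df):
    """Density of a t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_central_prob(t, df, steps=1000):
    """P(-t < T < t), via Simpson's rule on [0, t] doubled by symmetry."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_value(level, df):
    """Find t such that P(-t < T < t) equals level, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_central_prob(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prediction_halfwidth(s, n, level):
    """Half-width of a prediction interval: t * sqrt(1 + 1/n) * s."""
    return t_value(level, n - 1) * math.sqrt(1 + 1 / n) * s


if __name__ == "__main__":
    s = 10.6
    for n in (10, 40, 160):  # hypothetical sample sizes
        print(f"n = {n:>3}, 95% level: +/- {prediction_halfwidth(s, n, 0.95):.1f}")
    for level in (0.90, 0.95, 0.99):  # hypothetical levels
        print(f"n = 40, level {level:.2f}: +/- {prediction_halfwidth(s, 40, level):.1f}")
```

Because of the 1 under the square root, the half-width cannot shrink below roughly the normal quantile times s however large the sample becomes, which is why the gains from larger samples diminish.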
Making the right choice
Choosing the right interval is critical for having a meaningful result. Clearly communicating the purpose of the interval will allow those viewing the interval to understand it. The initial interval (110.6, 131.3) presented earlier was actually the sample mean plus or minus one standard deviation, which has no obvious interpretation or level associated with it, despite the 95% label it had received.
Many other types of intervals exist to answer other questions and to handle other types of data. For an excellent guide on available options for appropriately constructing and interpreting intervals, see Statistical Intervals: A Guide for Practitioners (reference 1).
1. Gerald J. Hahn and William Q. Meeker, Statistical Intervals: A Guide for Practitioners, Wiley-Interscience, 1991.
Christine M. Anderson-Cook is a research scientist at Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario. Anderson-Cook is a fellow of the American Statistical Association and a senior member of ASQ.