Needles in Haystacks
Sampling for detection of defectives
by Lynne B. Hare
Seemingly intelligent people look you straight in the eye and instruct you to tell them how many samples are needed to be absolutely certain there are no defectives in the manufactured lot. These are people with advanced degrees. They make some of our most important decisions.
Impatient of lesser minds, you breathe deeply and listen to the greater angel on your right shoulder: “Now, now,” the angel remarks. “Some people live in a deterministic world. Their paychecks depend on it.”
You ask the group: “How many units are at risk? Are they all in the same lot?”
“Two hundred thousand or more,” the people say. “All the same lot.”
“We can come close,” you say, promising deliverables later in the day.
They can’t meet ’til tomorrow. Staff, you know. Good—that gives you time to ponder amid sulfurous fumes while you absorb their retort—“Close isn’t good enough.”
Probably triggered by a vague complaint or rumor overheard at the country club, what they really want to know is, “Are there any needles in the haystack?” No one wants to look bad if there are.
“Any remotest inkling of what the proportion of defectives could be, even on the worst of days, please?” you prod.
“No, not really; pick a number, any number,” the group says.
“How about one in 1,000?” you reply.
“No idea, but it’s a start. And by the way, we don’t have a lot of money to spend on this. Gotta run—we’re counting on you,” the group tells you.
No pressure, right? Certainty of detecting is out of the question, of course. You wanted to say so. Angels wouldn’t allow you to, so here you go. How about 95% chance, the great statistical default? No, too loose for this one. How about 98% or 99%? That ought to do it—99 is almost 100. They’ll buy it. It helps to think like they do.
But statistical pondering is in order. You know the probability distribution of defectives might be assumed to be binomial:

$$f(x) = \binom{n}{x} p^x (1 - p)^{n - x}$$

in which:

f(x) is the probability of x defectives in the sample.

x is the number of defectives in the sample.

n is the number of units in the sample.

p is the hypothetical proportion of defectives.

The expression $\binom{n}{x}$ is the binomial coefficient and is equivalent to $\frac{n!}{x!\,(n - x)!}$.
The “!” symbol is used to designate the product of the integers up to and including the letter preceding it. For example, n! = 1 · 2 · 3 ··· n.
The probability of not finding a defective unit in a sample of n units is:

$$f(0) = \binom{n}{0} p^0 (1 - p)^n = (1 - p)^n$$

This means that the probability of detecting one or more defectives in a sample of size n is $P = 1 - (1 - p)^n$. So you can find n from that equation: it implies that $(1 - p)^n = 1 - P$. Therefore, $n \log(1 - p) = \log(1 - P)$ and

$$n = \frac{\log(1 - P)}{\log(1 - p)}$$
You don’t forget to round up to the nearest whole sample, and you create Table 1. Homework done. Piece of cake. That oughta fix ’em. Now for the hard part.
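For readers who want to rebuild such a table themselves, here is a minimal Python sketch of the calculation. The function names and the grid of detection probabilities and defect rates are illustrative assumptions, not the contents of the column's Table 1. The sketch also includes the small-p shortcut n ≈ −log(1 − P)/p, which appears to reproduce the 4,605 and 19,560 figures quoted later in this column; the exact formula, rounded up, gives slightly smaller counts (4,603 and 19,559).

```python
import math


def sample_size(detect_prob: float, defect_rate: float) -> int:
    """Smallest n with 1 - (1 - p)^n >= P: n = log(1 - P) / log(1 - p), rounded up."""
    if not (0.0 < detect_prob < 1.0 and 0.0 < defect_rate < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    # log1p(-x) computes log(1 - x) accurately for small x.
    return math.ceil(math.log1p(-detect_prob) / math.log1p(-defect_rate))


def approx_sample_size(detect_prob: float, defect_rate: float) -> int:
    """Small-p shortcut: log(1 - p) is roughly -p, so n is roughly -log(1 - P) / p."""
    return round(-math.log1p(-detect_prob) / defect_rate)


# Illustrative grid (an assumption, not the column's Table 1).
detect_probs = (0.95, 0.98, 0.99)
defect_rates = (1 / 500, 1 / 1000, 1 / 5000)

print("defect rate   " + "".join(f"P={P:.2f}".rjust(10) for P in detect_probs))
for p in defect_rates:
    row = "".join(f"{sample_size(P, p):>10,d}" for P in detect_probs)
    print(f"1 in {round(1 / p):<8,d} {row}")
```

Either version leads to the same practical conclusion: rare defects demand thousands of samples, and the shortcut errs slightly on the side of sampling more.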
You meet and pass the table across the, um, table. “What’s the story?” they want to know.
You explain: “Look, suppose you want to be 98% sure to catch anything as bad as one defective in 5,000. It’ll cost you 19,560 samples.”
“Wow! You’re killing us. We can’t afford that. Give us a break. Besides, we need to be 99% certain,” they reply.
“Ninety-nine percent certain of what?” you’re thinking. But you know they don’t know. Some don’t care what it is they’re certain about—just so long as they have 99%. You suggest a 99% chance of detecting as bad as one in 1,000. They buy it: 4,605 samples.
You promise a randomization scheme by email later today.
“Wait, what? Why do we need one of those? Since when?”
You’re thinking, “Since Daniel Bernoulli in the 18th century.” But that angel intercedes again and you don’t say it. Instead, you explain that all inference is based on probability theory and, for it to be valid, samples must be selected according to some randomization scheme. That means every unit in the lot must have an equal chance of appearing in the sample.
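One simple way to give every unit an equal chance is simple random sampling without replacement. The sketch below assumes the units can be numbered 1 through N (for example, by position in the production sequence), takes the lot size as exactly 200,000, and uses a hypothetical function name and seed. A real scheme would have to map these numbers onto pallets, cases or time stamps.

```python
import random


def draw_sample(lot_size: int, sample_size: int, seed=None) -> list:
    """Return sorted 1-based unit numbers chosen by simple random sampling
    without replacement, so every unit has the same chance of selection."""
    if not 0 < sample_size <= lot_size:
        raise ValueError("sample_size must be between 1 and lot_size")
    rng = random.Random(seed)  # a fixed seed makes the pull list reproducible
    return sorted(rng.sample(range(1, lot_size + 1), sample_size))


# Assumed values: a 200,000-unit lot and the 4,605-unit sample agreed on.
units_to_pull = draw_sample(lot_size=200_000, sample_size=4_605, seed=12345)
print(len(units_to_pull), "units to pull; first ten:", units_to_pull[:10])
```

Recording the seed alongside the pull list lets anyone regenerate and audit the selection afterward.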
Sample sizes and plans
This scenario (a bit harsh, I’ll admit) conjures up and inflates dark thoughts, aggregated frustrations and selfish ruminations from past encounters with clients who have long since recycled their Stats 101 books. Of course, good statistical bedside manner, tact and diplomacy are essential elements of productive organizational membership. So is anticipation of organizational needs, often unspoken.
For example, what if there were multiple production lots? What if they contained different numbers of units? Shouldn’t the sample size increase with increased lot size? In scenarios similar to the fiction described earlier, some have offered ANSI/ASQ Z1.4 as a source of sample size. Its precursor was written under U.S. government auspices, and it has since been updated and published by two noble institutions. That makes it right. Right? It increases sample sizes with increased lot sizes. Also (with enthusiasm), it offers plans with different acceptable quality levels (AQL).
Hold on. Many AQL sampling plans were published at a different time and for different purposes. They are employed to aid decisions regarding continuous streams of batches, not for a single batch, as earlier described. Developed during World War II in an effort to ensure and improve the quality of supplies for U.S. troops, sampling plans struck a bargain with producers: most (approximately 95%) of the producer’s lots would be accepted by the government if they were equal to or better than the stated AQL of the plan.
The AQL is not the only story, however. Associated with any sampling plan is an operating characteristic (OC) curve describing its full impact by graphically displaying the probability of lot acceptance corresponding to the full range of product quality, measured in percentage or proportion nonconforming (see Figure 1). Government operatives were fully aware of the entire range of the curve, and they paid close attention to the end of the curve showing consumer protection, sometimes called the rejectable quality level or the lot tolerance percent defective. It is that quality level that, if produced, would result in approximately 90% detection or, put differently, only approximately a 10% chance of being accepted.
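To make the OC curve concrete, here is a small Python sketch for a single sampling plan: inspect n units and accept the lot if the number of nonconforming units is no greater than an acceptance number c. The plan used (n = 125, c = 3) is a hypothetical example chosen for illustration, not one read from the ANSI/ASQ Z1.4 tables, and the binomial model is the same assumption used earlier in this column.

```python
import math


def prob_accept(n: int, c: int, p: float) -> float:
    """Probability of accepting a lot: P(X <= c) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c + 1)
    )


# Hypothetical single sampling plan (assumed values, for illustration only).
n, c = 125, 3

print("percent nonconforming   P(accept)")
for pct in (0.5, 1, 2, 3, 4, 5, 6, 8):
    print(f"{pct:>21.1f}   {prob_accept(n, c, pct / 100):.3f}")
```

Reading down the printed column traces the curve: the quality level accepted about 95% of the time corresponds to producer protection, and the level accepted only about 10% of the time corresponds to consumer protection.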
The government’s first concession to manufacturers was to talk in terms of producer protection, not consumer protection. The second was to increase sample size with increased lot size because, upon initial thinking, it was only logical. Surely, larger lots require larger samples for the same level of protection, don’t they? No, they don’t.
It turns out that unless the calculated sample size (using probability statements similar to that shown in the earlier example) is greater than roughly 10% of the lot size, the sample size is effectively independent of the lot size. It is quite possible that increasing sample size with lot size affords a greater chance of representativeness. But it does, in fact, result in increased and unnecessary sampling cost.
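A quick numerical check of this claim compares the exact finite-lot (hypergeometric) detection probability with the binomial value for one sample size and several lot sizes. The sketch below assumes a defect rate of one in 1,000, a sample of 4,605 units and three illustrative lot sizes; the function names are hypothetical.

```python
def detect_prob_finite(lot_size: int, defectives: int, n: int) -> float:
    """Exact chance of at least one defective in a sample of n units drawn
    without replacement from a finite lot (hypergeometric model)."""
    if n > lot_size:
        raise ValueError("sample cannot exceed the lot")
    miss = 1.0
    for i in range(n):
        # Chance that the (i + 1)-th unit drawn is good, given all earlier ones were.
        miss *= max(lot_size - defectives - i, 0) / (lot_size - i)
    return 1.0 - miss


def detect_prob_binomial(p: float, n: int) -> float:
    """Binomial value 1 - (1 - p)^n, which ignores the lot size."""
    return 1.0 - (1.0 - p) ** n


n, p = 4_605, 1 / 1000  # assumed sample size and defect rate
print(f"binomial (no lot size): {detect_prob_binomial(p, n):.4f}")
for lot_size in (50_000, 200_000, 2_000_000):  # illustrative lot sizes
    defectives = round(lot_size * p)
    print(f"lot of {lot_size:>9,d}: {detect_prob_finite(lot_size, defectives, n):.4f}")
```

With the sample under about 10% of the lot, the finite-lot probabilities should sit only slightly above the binomial value, which is why the same sample size serves lots of very different sizes.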
A criticism of acceptance sampling plans is that when the day is done, users are left with a pile of good stuff and a pile of bad stuff. What do you do with the bad stuff? This is not entirely fair because developers of the plans were aware that the bad stuff should be examined to find and eliminate root causes of problems.
Quality perspectives have changed since the development of acceptance sampling plans. Many years ago, the emphasis shifted from acceptance and rejection to building quality into the product using other tools and management techniques. Programs such as total quality management and lean Six Sigma have brought progress of astronomical proportions.
Still, many organizations find themselves wanting the additional assurance of post-process inspection. Or they find themselves in the unfortunate position of seeking needles in haystacks.
Lynne B. Hare is a statistical consultant. He holds a doctorate in statistics from Rutgers University in New Brunswick, NJ. He is past chairman of the ASQ Statistics Division and a fellow of ASQ and the American Statistical Association.