BACK TO BASICS
All About Data
by Jack B. ReVelle
We can’t be certain, but Lt. Commander Data, a character on “Star Trek: The Next Generation,” may have derived his name from his ability to acquire and process data critical to the mission of the United Star Ship Enterprise. Clearly, the series’ production staff believed the importance of data would continue well into the future.
As a quality practitioner, you owe it to yourself to gain a better understanding of the types of data, data stratification and data collection.
Types of Data
Attribute data, also known as discrete data, are counted in whole numbers or integers. An attribute is the presence or absence of a particular characteristic. The result will always be a whole number—never a decimal fraction.
Typically, the question of whether something has a particular attribute can be answered with either a yes or no. In working with products, services and processes, items are often classified as good vs. bad, accept vs. reject or go vs. no-go. When you are dealing with defects or parts returned for rework or scrap, you are dealing with attribute data.
Attribute data are much easier to collect and record than are variable data, but they don’t provide as much information about the subject items.
Variable data, also known as continuous data, are measurements from a continuous scale. They can be, and frequently are, decimal fractions. How finely a measurement can be resolved depends on the sensitivity of the measuring instrument being used: the more sensitive the instrumentation, the more precise the measurements.
Variable data provide more information about product and process characteristics than attribute data, but they are more complex and time consuming to collect and record.
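As a minimal sketch of the distinction, the Python snippet below treats the same inspection both ways. The diameters and specification limits are invented for illustration and do not come from any real process.

```python
# Hypothetical inspection results for five shafts (illustrative values only).
diameters_mm = [25.02, 24.97, 25.11, 24.88, 25.00]  # variable (continuous) data

# Assumed specification limits, invented for this sketch.
LOWER_SPEC_MM = 24.90
UPPER_SPEC_MM = 25.10

# Attribute (discrete) data: each part is classified as go (True) or no-go (False).
passes = [LOWER_SPEC_MM <= d <= UPPER_SPEC_MM for d in diameters_mm]
reject_count = passes.count(False)  # always a whole number

# The variable data support richer summaries, such as a mean.
mean_diameter = sum(diameters_mm) / len(diameters_mm)

print(f"Rejects: {reject_count} of {len(diameters_mm)}")
print(f"Mean diameter: {mean_diameter:.3f} mm")
```

The go/no-go list is quick to record but says nothing about how far a rejected part missed its limit. The measurements carry that information at the cost of more effort to collect.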
Now you need to consider whether the data should be stratified. The purpose of data stratification is to convert a heterogeneous population into a collection of homogeneous subpopulations. This separation makes it easier to study and analyze the heterogeneous population and to draw statistical samples from it.
Data stratification can include analyzing a population of machines to determine which types create specific kinds of defects or excessive variation, or studying a population of employees to identify the needs and expectations of each employee category.
Once you identify the population of concern, determine the various types of categories that exist in it, such as size, age, supplier(s), color, weight, distance, gender and cost. Next, divide the population according to the pertinent categories. Then, as you collect data regarding the population, record the categorical information about the sampled units using a tally sheet or some other type of data table.
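The tally step described above can be sketched in a few lines of Python. The machine types, defect kinds and the helper name `tally_by_stratum` are hypothetical, chosen only to show one way of recording categorical information for each sampled unit.

```python
from collections import Counter

# Hypothetical defect records: (machine_type, defect_kind). Invented values.
records = [
    ("lathe", "burr"),
    ("lathe", "scratch"),
    ("mill", "burr"),
    ("lathe", "burr"),
    ("press", "crack"),
    ("mill", "burr"),
]

def tally_by_stratum(rows):
    """Return {stratum: Counter of defect kinds} for the given rows."""
    tally = {}
    for stratum, defect in rows:
        # Create an empty tally for a stratum the first time it appears.
        tally.setdefault(stratum, Counter())[defect] += 1
    return tally

tally = tally_by_stratum(records)
for machine, counts in sorted(tally.items()):
    print(machine, dict(counts))
```

Each machine type becomes its own subpopulation, so defect patterns that would be hidden in the combined counts can be compared across strata.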
Data Collection Strategy
We collect data to help make better decisions. Better decisions are made by reducing uncertainty instead of making guesses, going with a gut feeling or even using common sense. Data are facts, but they are not information ready to be used in making decisions.
For data to become ready for use, they must lead to understanding. To correct a problem, you need to understand its nature and causes. As data are collected and compared with desired performance levels, you will learn more about the causes of a problem, what should be measured and how it should be measured.
The following steps should be part of your data collection strategy:
- Determine the purpose of the data to be collected. Will they be used to assess the status of a process or a product? Will they provide a basis for decisions about process or product quality?
- Determine the nature of the data to be collected. Are they measurable (variable or continuous) or are they counted (attribute or discrete)?
- Determine the characteristics of the data to be collected. Can the data be easily understood by people who will evaluate product and process improvement, including customers?
- Determine whether the data can be expressed in terms that invite comparisons with similar processes. Can the performance metric be expressed as parts per million, defects per million opportunities, Cp, Cpk or Six Sigma?
- Determine whether the data place priority on the most important quality influences and whether the data are economical and easy to collect.
- Determine the best type of data-gathering check sheet to use: checklists, tally sheets or defect concentration diagrams.
- Determine whether it will be possible to use random sampling or necessary to use 100% data collection.
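The comparison metrics named in the steps above can be illustrated with a short Python sketch. It uses the common textbook definitions: parts per million and defects per million opportunities are simple ratios scaled by one million, Cp = (USL - LSL) / 6σ and Cpk = min(USL - mean, mean - LSL) / 3σ. The function names are hypothetical. As a simplifying assumption, σ is estimated from the overall sample standard deviation, whereas a formal capability study would normally estimate it from a process shown to be in statistical control.

```python
import statistics

def ppm(defective_units, total_units):
    """Defective parts per million units inspected."""
    return defective_units / total_units * 1_000_000

def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000

def cp_cpk(measurements, lsl, usl):
    """Return (Cp, Cpk) for variable data and lower/upper spec limits.

    Assumption: sigma is the overall sample standard deviation.
    """
    mean = statistics.mean(measurements)
    sigma = statistics.stdev(measurements)
    cp = (usl - lsl) / (6 * sigma)                    # spread only
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)   # spread and centering
    return cp, cpk

# Illustrative values only.
print(ppm(3, 1000))
print(dpmo(5, 100, 10))
print(cp_cpk([9.0, 10.0, 11.0], lsl=8.0, usl=13.0))
```

The first two metrics work with attribute data, while Cp and Cpk require variable data. Cpk is never larger than Cp, and the two are equal only when the process mean sits midway between the specification limits.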
This column is adapted from pp. 33-35 of Quality Essentials: A Reference Guide From A to Z, published by ASQ Quality Press in 2004.
JACK B. REVELLE is a consulting statistician at ReVelle Solutions LLC in Santa Ana, CA. He earned a doctorate in industrial engineering and management from Oklahoma State University, Stillwater, and is an ASQ Fellow.