Straight Line Or Not?
Extrapolating patterns beyond their natural range can lead to false conclusions
by Christine M. Anderson-Cook
Until this spring, I had never heard of the book Factfulness.[1] Then Bill Gates made a big splash in the news by giving each U.S. college graduate a copy.[2] In reading the book, I found it to be a remarkable combination of hopefulness and commentary about some of our blind spots regarding numeracy, and how we interpret and internalize information in a media-saturated world.
If you have not read the book, I recommend it highly, not only for how it can help us think about the world in a more informed and balanced way, but also for how it analyzes classic mistakes that we are all prone to. The book describes our instincts to erroneously divide things into groups when distributions are more accurate (the gap instinct) and to misallocate attention to fear-triggering but low-probability events (the fear instinct), plus eight more. But beyond that, Factfulness also presents helpful strategies to tame these debilitating instincts and prevent them from leading us toward false conclusions and perceptions.
In this column, I would like to focus on one of these 10 instincts: the straight line instinct, which suggests that we often are tempted to extrapolate patterns with straight lines, when these may not be appropriate. Factfulness gives several examples in which incongruous results are obtained, including one that projects the growth rate of newborns unaltered through adulthood and reaches outlandish heights.
The premise of the straight line instinct is that in the absence of other information, it is natural to extend the pattern that we see with the obvious straight line. Even in a less-formal way, these extrapolations of trends surround us every day. Each weekday evening, I watch a business show that summarizes stock market trends. If there are two-plus “up” days on the markets, the announcers’ stories all have a jubilant “We are soaring to new heights” flavor. If there are two-plus “down” days on the markets, the show’s stories adopt the tone of “How should an investor weather the oncoming storm?”
Not only are these gyrations in tone exhausting, but they also could induce overreactions to small changes that would lead to poor investment actions. The reporting implies that a couple of points moving in the same direction mean the pattern will continue on that trajectory for an extended time, when in fact these chronologically local patterns often are more a reflection of variability and volatility than of a trend.
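A small simulation can illustrate the point. The sketch below assumes a deliberately simplified market in which each day is an independent coin flip between "up" and "down," so there is no trend at all by construction. The function name `continuation_rate`, the number of days and the seed are illustrative choices, not anything from the business show or from Factfulness.

```python
import random

def continuation_rate(n_days=100_000, seed=0):
    """Share of two-day streaks that continue for a third day when
    daily moves are independent coin flips (an assumed no-trend market)."""
    rng = random.Random(seed)
    moves = [rng.choice((1, -1)) for _ in range(n_days)]  # +1 = up, -1 = down
    streaks = 0
    continued = 0
    # Slide a three-day window over the series.
    for a, b, c in zip(moves, moves[1:], moves[2:]):
        if a == b:            # two days in the same direction
            streaks += 1
            if c == b:        # the third day keeps going the same way
                continued += 1
    return streaks, continued / streaks

streaks, rate = continuation_rate()
print(f"two-day streaks: {streaks}, continued a third day: {rate:.3f}")
```

In this trend-free setting, roughly half of all two-day streaks continue and half reverse, so a two-day run by itself carries no information about the next day.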
When is a straight line appropriate?
Beyond the urge to extrapolate based on a small amount of data, there are nuances to the straight line instinct that are worth exploring more deeply. Factfulness argues that most relationships are not sensibly described by straight lines, but by functions that curve or include asymptotes. But this seems to lie in direct contradiction to the response surface methodology (RSM)[3] approach of using straight lines and main-effect models to characterize many types of relationships.
RSM relies on the principle that complex relationships often can be estimated with Taylor series approximations, with a first-order approximation corresponding to a straight line if a single response, Y, is being described by a single explanatory factor, X. As the number of X’s used to describe the response increases, the approximation is defined by a plane or hyperplane. But how can Factfulness and RSM both be reasonable? There are three key differences between the scenarios considered: changes to the underlying relationships, human reaction and response to observing the patterns, and the range over which a curve is being used.
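As a brief aside, the single-factor case can be written out. This is a sketch that assumes the true relationship f is twice differentiable near a reference point x_0; the symbols f, x_0 and ξ are notation introduced here for illustration.

```latex
f(x) = f(x_0) + f'(x_0)\,(x - x_0) + \tfrac{1}{2} f''(\xi)\,(x - x_0)^2
\quad \text{for some } \xi \text{ between } x_0 \text{ and } x,
\qquad \text{so} \qquad
Y \approx \beta_0 + \beta_1 X,
\quad \beta_1 = f'(x_0), \quad \beta_0 = f(x_0) - f'(x_0)\,x_0 .
```

The neglected term grows with the square of the distance from x_0, which is one way to see why a line that serves well near x_0 can miss badly far from it.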
First, Factfulness is almost exclusively focused on relationships between inputs and responses as they change over time, where prediction involves looking into the future. Response surface methods are predicated on a consistent underlying mechanism for all of the data. If we are modeling a scientific relationship, it is much more likely that the mechanisms driving the observed pattern are going to remain consistent for all of the data, and most importantly, for future data.
When we are looking into the future, the underlying rules on which new data will be based are likely to change. In Factfulness, the authors consider predicting future population growth and note how changes in the affluence level of the world population drive changes in people’s behavior. As more of the world moves out of subsistence, the average number of children people have falls. Hence, projecting historical rates of population growth forward with a straight line, without adjusting for underlying changes in wealth, would lead to wild misses in our predictions. An important question to ask when looking to generalize a relationship and make predictions for new observations is therefore: Are the underlying drivers of the relationship stable, or are there changes that might suggest different patterns are appropriate for different subsets of data? Depending on the answer, our level of confidence in what we are predicting (particularly in an extrapolated region) should change.
Second, when a system involving people and their actions is being observed, the rules driving relationships may change in response to the previous pattern. Consider the stock market example, in which several days of gains on the market might lead to investors taking profits, which might translate into a drop in stock prices. Relationships governed by science are unlikely to change with observation. A dropped ball will fall at the same rate whether or not it is watched, because the rules driving the pattern are not influenced by human response. However, when we examine changes in the world over time, whether economic, global health or climate, humans often change the nature of these relationships through their intervention. It is critical that we take these impacts into account as we think about future changes.
Finally, there are big differences between a local approximation and a global one. A straight line can be a good approximation to many complicated relationships if we are looking at a small range of input values. Where we can get into trouble is in extending that relationship over an unrealistically wide range.
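The difference between local and global fits can be checked numerically. The following is a minimal sketch, assuming a hypothetical saturating curve (`true_curve`) chosen only for illustration; it is not the function behind any figure in this column. The helper names and the ranges [1, 2] and [0, 10] are likewise illustrative.

```python
import math

def true_curve(x):
    # Hypothetical relationship whose growth moderates as x increases
    # (an assumption made for this illustration only).
    return 10.0 * (1.0 - math.exp(-x / 3.0))

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def grid(lo, hi, n=50):
    """n evenly spaced points from lo to hi, inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def max_abs_error(lo, hi):
    """Largest gap between the curve and its best-fit line on [lo, hi]."""
    xs = grid(lo, hi)
    ys = [true_curve(x) for x in xs]
    slope, intercept = fit_line(xs, ys)
    return max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))

local_err = max_abs_error(1.0, 2.0)    # narrow range of inputs
global_err = max_abs_error(0.0, 10.0)  # the whole range at once

# Extend the narrow-range line far beyond the data it was fit to.
xs = grid(1.0, 2.0)
slope, intercept = fit_line(xs, [true_curve(x) for x in xs])
extrap_gap = abs((slope * 20.0 + intercept) - true_curve(20.0))

print(f"worst error, line fit on [1, 2]:  {local_err:.3f}")
print(f"worst error, line fit on [0, 10]: {global_err:.3f}")
print(f"gap when the local line is extended to x = 20: {extrap_gap:.1f}")
```

Under these assumptions, the line fit to the narrow range tracks the curve closely, the line fit to the whole range misses by a much larger margin, and carrying the local line out to x = 20 gives a prediction far above the curve's plateau.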
Figure 1 shows an example where the true relationship is moderating in growth for larger values of the explanatory variable. Several black dashed lines have been added to show that if we consider smaller ranges of X (shown a bit offset below the blue line for easier viewing), the straight line approximation works well. When we look across the entire range of X in the plot, however, a straight line does a terrible job of capturing the true pattern of change and could lead to wild predictions if we tried to extend beyond the range of observed X values. RSM actively takes this into account as models of different complexity are considered and compared for their ability to characterize relationships. First-order models, models with interactions and second-order models progressively allow more flexibility to capture the patterns that we see in our data.
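For reference, the three model families just named are commonly written as follows for k explanatory factors. The notation (coefficients β, error term ε) is the usual textbook convention and is supplied here as an illustration.

```latex
\begin{align*}
\text{first order:} \quad
  & y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \varepsilon \\
\text{with interactions:} \quad
  & y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i
      + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon \\
\text{second order:} \quad
  & y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i
      + \sum_{i<j} \beta_{ij} x_i x_j
      + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \varepsilon
\end{align*}
```

Each model contains the previous one as a special case, so comparing their fits shows whether the added interaction or curvature terms are needed for the data at hand.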
Simple models can be helpful. They allow us to summarize relationships to see patterns quickly and represent them efficiently. Where we can get into trouble is if the model that we use is inadequate to capture the true behavior of the system. There are several key ways we might be led astray:
- The data on which the model is built come from several different mechanisms, and grouping them together may disguise some of the information we need to understand the patterns we see.
- Humans have a way of changing things, and the feedback from seeing or experiencing current patterns often leads to changes in behavior that will fundamentally change what future patterns will look like.
- It is much easier to get a good straight line fit to a local pattern in a relationship than to the pattern across the entire range of inputs.
So how can we keep from being fooled? The first part of the answer is remarkably simple, but sometimes eludes even the most sensible of people: Look at a plot of the data. By examining a plot of the data that generated the blue curve in Figure 1, it is easy to see that a straight line is inadequate to summarize the entire relationship.
Next, it is beneficial to think about what underlying attributes could be changing across the different observations. If we think through what might be driving the relationship and see the big picture, our ability to select an appropriate model to guide our predictions and understanding is vastly improved.
So the authors of Factfulness had it right: Straight lines can be dangerous and misleading, particularly if the ground rules on which the relationship is based are changing. Hopefully, this has helped clarify when we should be particularly concerned about building our understanding on such a simplification of the observed pattern.
1. Hans Rosling, Anna R. Ronnlund and Ola Rosling, Factfulness: Ten Reasons We’re Wrong About the World—and Why Things Are Better Than You Think, Flatiron Books, 2018.
2. Gates Notes on Factfulness, gatesnotes.com/Books/Factfulness.
3. Raymond H. Myers, Douglas C. Montgomery and Christine M. Anderson-Cook, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, Wiley-Interscience, 2016.
Christine M. Anderson-Cook is a research scientist in the Statistical Sciences Group at Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario, Canada. Anderson-Cook is a fellow of ASQ and the American Statistical Association. She is the 2018 recipient of the ASQ Shewhart Medal.