Fitting Lines to Data
Summary of the Video
Why do most dieters fail to lose weight permanently? The story in the video looks at research on obesity that helps answer this question. Renee personifies a common pattern: reducing food intake reduces weight only to a certain point. The relationship between lean body mass (weight without fat) and metabolic rate (how many calories the body uses per unit time when resting) provides insight. A scatterplot shows a positive linear association—larger people burn more calories—for subjects before dieting.
We want to fit a straight line to the points on this scatterplot. Animated graphics introduce the least squares idea. Because we want to predict metabolism (response) from body mass (explanatory), look at the vertical deviations of the points from a fitted line. For any reasonable line, some deviations are positive and some are negative. The squared deviations are all positive. The least-squares regression line is the line that makes the sum of the squares of the vertical deviations as small as possible. It is the “best” line for predicting y from x in this specific sense.
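The least-squares criterion can be illustrated with a minimal sketch. The data values and the function name `sum_squared_deviations` below are hypothetical, chosen only for illustration and not taken from the video. The sketch compares the sum of squared vertical deviations for several candidate lines.

```python
def sum_squared_deviations(xs, ys, intercept, slope):
    """Sum of squared vertical deviations of the points from
    the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustrative data (not from the video).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

# Three candidate lines, each given as (intercept, slope).
# For these particular data the least-squares line is y = 0 + 1.9x.
candidates = {
    "least squares": (0.0, 1.9),
    "steeper": (0.0, 2.0),
    "shifted": (0.5, 1.8),
}

# Sum of squared deviations for each candidate line.
results = {
    name: sum_squared_deviations(xs, ys, a, b)
    for name, (a, b) in candidates.items()
}

for name, sse in results.items():
    print(f"{name}: {sse:.2f}")
```

Among these candidates, the least-squares line gives the smallest sum. Any other choice of intercept and slope would also give a larger sum for these data.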
Fit the least-squares regression line to the data on mass and metabolism. Now the subjects diet. Use the line to predict metabolism from the new, lower lean body mass. Aha. The line consistently overpredicts: metabolic rate drops lower than expected when a person diets. The researchers explain this finding: to help survive famine, our bodies are programmed to reduce metabolism when they are deprived of food. Every “famine” (or every attempt at dieting) makes our bodies more efficient in using food. So less food goes farther, and we find it hard to lose weight because our bodies get better at using the fewer calories we take in.
How can we obtain the equation of the least-squares regression line from data? Call the line
$$\hat{y} = a + bx$$
to emphasize that the line gives the predicted y. The recipe for the slope is
$$b = \frac{\sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right)}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}$$
This formula appears formidable, but the graphic shows how to pull it apart into a few parts, the building-block sums and sums of squares that are easy to obtain from the data. The recipe for the intercept is simpler,
$$a = \bar{y} - b\bar{x}$$
The video recommends using a calculator that will find the slope and intercept from keyed-in data, or statistical software.
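For readers without such a calculator, the recipe can be sketched in a few lines of Python. This is a minimal sketch that assumes the standard computational form of the least-squares slope, built from the sums of x, y, xy, and x squared, and the intercept as the mean of y minus the slope times the mean of x. The function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for predicting y from x,
    computed from the building-block sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope: (sum xy - (1/n)(sum x)(sum y)) / (sum x^2 - (1/n)(sum x)^2)
    slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: mean of y minus slope times mean of x
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


# Hypothetical illustrative data (not from the video).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

intercept, slope = least_squares_line(xs, ys)
print(f"predicted y = {intercept:.3f} + {slope:.3f} x")

# Use the fitted line to predict y at a new x value inside the observed range.
predicted = intercept + slope * 2.5
```

The sketch does no input checking. If all the x values are equal, the denominator is zero and no least-squares slope exists.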
Wise use of regression requires more than the ability to follow the recipe. A scatterplot of moisture versus time in drying corn shows a curved pattern; the recipe will fit a line to these data, but a straight line is not the right model. A scatterplot of the earnings of an employee at a fast-food place shows a linear pattern for the first three years, but it is silly to extrapolate by using the line fitted to these data to predict her earnings after 10 years. Finally, the least-squares regression line is strongly influenced by individual points that are extreme in the x direction. Such points are called influential because they pull the line toward themselves.
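The effect of an influential point can be seen in a small sketch. The data are hypothetical and not from the video: five points that lie close to a line with a slope of about 2, plus one added point that is extreme in the x direction. The helper `least_squares_line` is an illustrative name and uses the standard least-squares formulas.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of products of deviations over sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying close to a line with slope about 2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit without the extreme point.
intercept_without, slope_without = least_squares_line(xs, ys)

# Add one hypothetical point far out in the x direction and refit.
intercept_with, slope_with = least_squares_line(xs + [15.0], ys + [5.0])

print(f"slope without the extreme point: {slope_without:.2f}")
print(f"slope with the extreme point:    {slope_with:.2f}")
```

With these values, the single added point changes the fitted slope from about 2 to nearly flat. It is worth checking a scatterplot for such points before trusting a fitted line.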