13.4 Outliers and Influential Observations

In simple linear regression, we must also watch out for outliers and influential observations. An outlier is an observation that lies far away from the majority of the data. An influential observation is a data point whose inclusion changes the regression equation dramatically. Note that an outlier may or may not be an influential observation.

Example: Outlier and Influential Observations

In the following figures, identify whether the red point is an outlier, an influential observation, or both.

Figure 13.5: An Outlier But Not Influential. A scatter plot with an outlier that does not significantly change the regression line. [Image Description (See Appendix D Figure 13.5)]
Figure 13.6: An Outlier and Influential. A scatter plot with an outlier that significantly changes the regression line. [Image Description (See Appendix D Figure 13.6)]

The red point in the left panel (Figure 13.5) is an outlier since it is far away from the majority of the data; however, it is not an influential observation since the regression lines with and without the red point are almost identical.

The red point in the right panel (Figure 13.6) is both an outlier and an influential observation. It is an outlier because it is far away from the majority of the data. It is influential because including it dramatically changes the regression line: without the red point, the slope of the regression line is positive; with the red point included, the slope becomes negative.
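The comparison described above, fitting the line with and without the suspect point, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values are invented for the sketch (they are not the data behind Figures 13.5 and 13.6), the helper name fit_line is hypothetical, and NumPy's polyfit is used here as one convenient way to obtain the least-squares line.

```python
import numpy as np

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    # np.polyfit returns coefficients from the highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

# Invented "majority" data with a clear positive trend (assumption, not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Two invented extra points, both far away from the majority of the data.
point_a = (15.0, 15.0)  # far away, but consistent with the existing trend
point_b = (15.0, 0.0)   # far away, and inconsistent with the existing trend

slope_base, intercept_base = fit_line(x, y)
slope_a, intercept_a = fit_line(np.append(x, point_a[0]), np.append(y, point_a[1]))
slope_b, intercept_b = fit_line(np.append(x, point_b[0]), np.append(y, point_b[1]))

print(f"without extra point: slope = {slope_base:.3f}, intercept = {intercept_base:.3f}")
print(f"with point A:        slope = {slope_a:.3f}, intercept = {intercept_a:.3f}")
print(f"with point B:        slope = {slope_b:.3f}, intercept = {intercept_b:.3f}")
```

In this sketch, adding point A leaves the fitted line almost unchanged, so A behaves like an outlier that is not influential. Adding point B turns the slope from positive to negative, so B behaves like an outlier that is also influential.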

License

Introduction to Applied Statistics Copyright © 2024 by Wanhua Su is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.