<h1>13.2 Least-Squares Straight Line</h1>

We use a straight line to model the relationship between two quantitative variables y and x: [latex]y = b_0 + b_1 x[/latex]. The terms in the equation are interpreted as follows:
<ul>
 	<li>[latex]x[/latex]: the <em>predictor</em> (independent) variable</li>
 	<li>[latex]y[/latex]: the <em>response</em> (dependent) variable</li>
 	<li>[latex]b_0[/latex]: the intercept; it is the value of [latex]y[/latex] when [latex]x=0[/latex]</li>
 	<li>[latex]b_1[/latex]: the slope of the straight line. It is <strong>the change in <em>y</em> when <em>x</em> increases by 1 unit</strong>. If [latex]b_1 &gt; 0[/latex], <em>y</em> increases when <em>x</em> increases; if [latex]b_1 &lt; 0[/latex], <em>y</em> decreases when <em>x</em> increases.</li>
</ul>
The figure below illustrates the meanings of the intercept and the slope of a straight line.<a id="retfig13.2"></a>

[caption id="attachment_3103" align="aligncenter" width="1139"]<img class="wp-image-3103 size-full" src="https://openbooks.macewan.ca/introstats/wp-content/uploads/sites/8/2021/07/interpretation_regression.png" alt="A demonstration of how to interpret a linear regression equation. Image description available." width="1139" height="760" /> <strong>Figure 13.2</strong>: Interpretation of Intercept and Slope of a Straight Line. 
[<a href="https://openbooks.macewan.ca/introstats/back-matter/image-description/#fig13.2">Image Description (See Appendix D Figure 13.2)</a>][/caption]

Our first objective is to determine the values of [latex]b_0[/latex] and [latex]b_1[/latex] that characterize the line of best fit: [latex]\hat{y} = b_0 + b_1 x[/latex]. To properly quantify what is meant by "best fit", we introduce some definitions. The <strong>fitted values</strong> are [latex]\hat{y}_i = b_0 + b_1 x_i[/latex], where [latex]x_i[/latex] is the observed x-value corresponding to [latex]y_i[/latex], the observed y-value, for [latex]i=1, 2, \dots , n[/latex]. Each <strong>residual</strong> is the difference between the observed [latex]y[/latex] value and the fitted value; that is, the <em>i</em>th residual is
<p style="text-align: center;">[latex]e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)[/latex].</p>
The <strong>least-squares regression line</strong> is obtained by finding the values of [latex]b_0[/latex] and [latex]b_1[/latex] that minimize the residual sum of squares [latex]SSE = \sum e_i^2 = \sum [ y_i - (b_0 + b_1 x_i) ]^2[/latex]. The figure below shows the least-squares regression line in red.<a id="retfig13.3"></a>

[caption id="attachment_1261" align="aligncenter" width="530"]<img class="wp-image-1261 size-full" src="https://openbooks.macewan.ca/introstats/wp-content/uploads/sites/8/2021/07/m13_UsedCar_LeastSquare.png" alt="A linear regression line with the difference between y-hat and y shown. Image description available." width="530" height="494" /> <strong>Figure 13.3</strong>: Residuals and Fitted Least-Squares Regression Line. [<a href="https://openbooks.macewan.ca/introstats/back-matter/image-description/#fig13.3">Image Description (See Appendix D Figure 13.3)</a>][/caption]

Note that:
<ul>
 	<li>Some residuals are positive and some are negative, and [latex] \sum e_i = \sum (y_i - \hat{y}_i) = 0[/latex].</li>
 	<li>We want the straight line that is closest to the data points, i.e., the one that minimizes the total distance from the points to the line.</li>
 	<li>We use the squared residual [latex]e_i^2[/latex] to quantify the distance from the data point [latex]y_i[/latex] to the straight line.</li>
 	<li>The total error is the sum of the squared distances from each point to the straight line, i.e., [latex]\sum e_i^2[/latex].</li>
 	<li>The straight line yielding the smallest [latex]\sum e_i^2[/latex] is called the least-squares line, since it makes the sum of squares of the residuals the smallest.</li>
</ul>
Finding the values of [latex]b_0[/latex] and [latex]b_1[/latex] that minimize the residual sum of squares
<p style="text-align: center;">[latex]SSE = \sum e_i^2 = \sum [ y_i - (b_0 + b_1 x_i) ]^2[/latex]</p>
is an optimization problem. 
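To make the objective concrete, the following Python sketch (the data values here are hypothetical, not from the text) computes the residuals and SSE for a candidate line, showing that a poorly chosen line produces a larger SSE:

```python
# Hypothetical data: ages (x, in years) and prices (y, in $1000) of a few used cars
x = [2, 4, 6, 8, 10]
y = [12.5, 10.0, 8.5, 6.0, 5.5]

def sse(b0, b1, x, y):
    """Residual sum of squares: SSE = sum of (y_i - (b0 + b1*x_i))^2."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(e ** 2 for e in residuals)

# A line close to the data yields a much smaller SSE than a poorly chosen one
print(sse(14.0, -0.9, x, y))   # candidate line near the trend of the points
print(sse(14.0, -0.5, x, y))   # flatter line, visibly farther from the points
```

The least-squares line is the particular choice of [latex]b_0[/latex] and [latex]b_1[/latex] that makes this sum as small as possible.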
It can be shown that the solutions are
<p style="text-align: center;" align="center">[latex]\begin{align*}b_1 &amp;= \frac{S_{xy}}{S_{xx}} = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x} )^2} = \frac{\sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i \right)}{n}}{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}},\\b_0&amp;= \bar{y} - b_1 \bar{x} = \frac{\sum y_i }{n} - b_1 \frac{\sum x_i}{n}.\end{align*}[/latex]</p>
As with ANOVA, in practice the least-squares regression equation is usually obtained using software.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example: Least-Squares Regression Line</p>

</header>
<div class="textbox__content">

Given the following summary statistics for 15 used cars (age x in years, price y in $1000),
<p style="text-align: center;">[latex]n = 15, \sum x_i = 92, \sum x_i^2 = 724, \sum y_i = 125, \sum y_i^2 = 1193, \sum x_i y_i = 616[/latex]</p>

<ol type="a">
 	<li>Find the least-squares regression line to model the relationship between the used cars' price (y) and age (x).
<strong>Steps</strong>:
<ol start="1">
 	<li>Calculate the sums of squares:
<p align="center">[latex]\begin{align*}S_{xy} &amp;= \sum x_i y_i - \frac{\left( \sum x_i \right) \left( \sum y_i \right) }{n} = 616 - \frac{92 \times 125}{15} = -150.667,\\S_{xx} &amp;= \sum x_i^2 - \frac{ \left( \sum x_i \right)^2}{n} = 724 - \frac{92^2}{15} = 159.733.\end{align*}[/latex]</p>
</li>
 	<li>Find the slope and intercept:
<p style="text-align: center;">[latex]\begin{align*} b_1 &amp;= \frac{S_{xy}}{S_{xx}} = \frac{-150.667}{159.733} = -0.9432,\\ b_0 &amp;= \bar{y} - b_1 \bar{x} = \frac{\sum y_i}{n} - b_1 \frac{\sum x_i}{n} = \frac{125}{15} - (-0.9432) \times \frac{92}{15} = 14.118.\end{align*}[/latex]</p>
</li>
</ol>
Therefore, the least-squares regression line for the used cars 
is
<p style="text-align: center;">[latex]\hat{y} = b_0 + b_1 x \Longrightarrow \widehat{\text{price}} = 14.118 + (-0.9432) \times \text{age} = 14.118 - 0.9432 \times \text{age}[/latex].</p>
</li>
 	<li>Interpret the slope [latex]b_1 = -0.9432[/latex] (in $1000).
On average, the price of a used car drops by about $943.20 for each additional year of age.</li>
</ol>
</div>
</div>
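The hand calculation in this example can be checked with a short Python sketch that uses only the given summary statistics; small differences in the last digit come from rounding in the hand calculation:

```python
# Summary statistics from the example (x = age in years, y = price in $1000)
n = 15
sum_x, sum_x2 = 92, 724
sum_y, sum_xy = 125, 616

# Sums of squares: S_xy = sum(x*y) - (sum x)(sum y)/n, S_xx = sum(x^2) - (sum x)^2/n
S_xy = sum_xy - sum_x * sum_y / n    # approximately -150.667
S_xx = sum_x2 - sum_x ** 2 / n       # approximately 159.733

# Slope and intercept of the least-squares line
b1 = S_xy / S_xx                     # approximately -0.9432
b0 = sum_y / n - b1 * sum_x / n      # approximately 14.12

# Predicted price (in $1000) of a hypothetical 5-year-old car
print(round(b0 + b1 * 5, 2))
```

The fitted equation agrees with the worked example, and the last line illustrates using it for prediction.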