{"id":1198,"date":"2021-06-27T15:21:44","date_gmt":"2021-06-27T19:21:44","guid":{"rendered":"https:\/\/openbooks.macewan.ca\/rcommander\/?post_type=chapter&#038;p=1198"},"modified":"2025-06-24T17:45:56","modified_gmt":"2025-06-24T21:45:56","slug":"12-2-main-idea-behind-one-way-anova","status":"publish","type":"chapter","link":"https:\/\/openbooks.macewan.ca\/introstats\/chapter\/12-2-main-idea-behind-one-way-anova\/","title":{"raw":"12.2 Main Idea Behind One-Way ANOVA","rendered":"12.2 Main Idea Behind One-Way ANOVA"},"content":{"raw":"Let [latex]\\mu_1, \\mu_2, \\dots , \\mu_k[\/latex] be [latex]k[\/latex] population means. The hypotheses of one-way ANOVA are formulated as\r\n\r\n[latex]H_0[\/latex]: all means are equal, i.e., [latex]\\mu_1 = \\mu_2 = \\dots = \\mu_k[\/latex]\r\n\r\n[latex]H_a[\/latex]: not all the means are equal.\r\n\r\nThe alternative hypothesis [latex]H_{a}[\/latex]\u00a0means there exists at least one pair of means that are not equal. <strong>Do not<\/strong> write as [latex]H_{a}: \\mu_1 \\neq \\mu_2 \\neq \\dots \\neq \\mu_k[\/latex]. <strong>Do not<\/strong> write \u201cat least one mean is different from the others\u201d since it sounds like at least one mean is different from the others while all the others are the same. Both are just two special cases of what \"not all the means are equal\" means.<a id=\"retfig12.1\"><\/a>\r\n\r\n[caption id=\"attachment_2911\" align=\"aligncenter\" width=\"817\"]<img class=\"wp-image-2911 size-full\" src=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/ANOVA_sample.png\" alt=\"Three ovals representing independent samples from independent populations show that the data from each is independent. Image description available.\" width=\"817\" height=\"354\" \/> <strong>Figure 12.1<\/strong>: One-Way ANOVA Based on k Independent Samples. 
[<a href=\"https:\/\/openbooks.macewan.ca\/introstats\/back-matter\/image-description\/#fig12.1\">Image Description (See Appendix D Figure 12.1)<\/a>][\/caption]ANOVA F tests are based on [latex]k[\/latex] independent, simple random samples from [latex]k[\/latex] populations. If [latex]H_0: \\mu_1 = \\mu_2 = \\dots = \\mu_k[\/latex] is true, the sample means [latex] \\bar{x}_1, \\bar{x}_2, \\dots, \\bar{x}_k[\/latex] should be close to one another and hence, the variation among sample means should be small. Therefore, we should reject\u00a0 [latex]H_0[\/latex] if the sample means are very different from one another (meaning the variation among the sample means would be large).\r\n<h2><strong>Quantifying Variation<\/strong><\/h2>\r\nThe total variation of the data (SST: the total sum of squares) is quantified as the sum of squared distances from each observation to the overall mean, i.e., [latex]SST = \\sum (x_{ij} - \\bar{x})^2[\/latex], where [latex]x_{ij}[\/latex] is the [latex]j[\/latex]th observation of sample [latex]i[\/latex], [latex]\\bar{x} = \\frac{\\sum x_{ij}}{n}[\/latex] \u00a0is the overall mean, and [latex]n = n_1 + n_2 + \\dots + n_k[\/latex] is the overall sample size.\r\n\r\nThe treatment sum of squares quantifies the variation of the sample means:\r\n<p align=\"center\">[latex]SSTR = \\sum\\limits_{i=1}^{k} n_i (\\bar{x}_i - \\bar{x})^2.[\/latex]<\/p>\r\n\u00a0SSTR quantifies the so-called between-group variation. For this reason,\u00a0\u00a0we should reject [latex]H_0: \\mu_1 = \\mu_2 = \\dots = \\mu_k[\/latex] if SSTR is too large. 
However, SSTR is only considered \"large\" if it is large relative to the next measure of variation.\r\n\r\nThe within-group variation can be quantified as the sum of squared distances from each observation to the mean of its sample group, i.e.,\r\n<p align=\"center\">[latex]SSE = \\sum (x_{ij} - \\bar{x}_i)^2 = \\sum\\limits_{i=1}^k (n_i - 1) s_i^2.[\/latex]<\/p>\r\nHere [latex]s_i^2[\/latex] is the sample variance of group [latex]i[\/latex]. In practice, software is used to calculate these sums of squares and the other ANOVA quantities.\r\n\r\nThe total variation SST decomposes as:\r\n<p align=\"center\">[latex]SST = SSTR + SSE = \\text{between-group variation + within-group variation}.[\/latex]<\/p>\r\nThe relationship between SSTR (between-group variation) and SSE (within-group variation) is illustrated in the following example.\r\n\r\nA person recorded waiting times each time he called either Uber or Taxi service from his house and again each time he called either service from work. His results are summarized in the following table (red values correspond to waiting times for Uber and blue values correspond to Taxi):\r\n<p style=\"text-align: center;\"><strong>Table 12.2<\/strong>: Waiting Time for Uber (red) and Taxi (blue) Called from Home and Work<\/p>\r\n\r\n<table class=\"aligncenter first-col-border\" border=\"0\">\r\n<tbody>\r\n<tr class=\"border-bottom\">\r\n<th scope=\"row\">Home<\/th>\r\n<td><span style=\"color: #ff0000;\">1<\/span><\/td>\r\n<td><span style=\"color: #ff0000;\">2<\/span><\/td>\r\n<td><span style=\"color: #ff0000;\">3<\/span><\/td>\r\n<td><span style=\"color: #ff0000;\">3<\/span><\/td>\r\n<td><span style=\"color: #ff0000;\">4<\/span><\/td>\r\n<td><span style=\"color: #ff0000;\">5<\/span><\/td>\r\n<td><span style=\"color: #0000ff;\">6<\/span><\/td>\r\n<td><span style=\"color: #0000ff;\">7<\/span><\/td>\r\n<td><span style=\"color: #0000ff;\">8<\/span><\/td>\r\n<td><span style=\"color: #0000ff;\">8<\/span><\/td>\r\n<td><span style=\"color: #0000ff;\">9<\/span><\/td>\r\n<td><span style=\"color: 
#0000ff;\">10<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<th scope=\"row\">Work<\/th>\r\n<td><span style=\"color: #ff0000;\">1<\/span><\/td>\r\n<td><span style=\"color: #0000ff;\">2<\/span><\/td>\r\n<td><span style=\"color: #ff0000;\">3<\/span><\/td>\r\n<td><span style=\"color: #0000ff;\">3<\/span><\/td>\r\n<td><span style=\"color: #ff0000;\">4<\/span><\/td>\r\n<td><span style=\"color: #ff0000;\">5<\/span><\/td>\r\n<td><span style=\"color: #0000ff;\">6<\/span><\/td>\r\n<td><span style=\"color: #0000ff;\">7<\/span><\/td>\r\n<td><span style=\"color: #0000ff;\">8<\/span><\/td>\r\n<td><span style=\"color: #ff0000;\">8<\/span><\/td>\r\n<td><span style=\"color: #ff0000;\">9<\/span><\/td>\r\n<td><span style=\"color: #0000ff;\">10<\/span><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nWe define the waiting times called from Home as Data Set 1 and those called from Work as Data Set 2. The figure below shows two data sets. Each set consists of two populations: waiting time for Uber (red circle with mean [latex]\\mu_1[\/latex]) and waiting time for Taxi (blue cross with mean [latex]\\mu_2[\/latex]).<a id=\"retfig12.2\"><\/a>\r\n\r\n[caption id=\"attachment_1202\" align=\"aligncenter\" width=\"416\"]<img class=\"wp-image-1202 size-full\" src=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/image048.png\" alt=\"Two data sets show the wait times for Uber and taxi in red and blue respectively. Image description available.\" width=\"416\" height=\"402\" \/> <strong>Figure 12.2<\/strong>: Waiting Time for Uber (red) and Taxi (blue) Called from Home (Data Set 1) and Work (Data Set 2). 
[<a href=\"https:\/\/openbooks.macewan.ca\/introstats\/back-matter\/image-description\/#fig12.2\">Image Description (See Appendix D Figure 12.2)<\/a>][\/caption]\r\n<div style=\"height: 55px; margin-top: 5px;\"><img class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png\" alt=\"\" width=\"250\" height=\"50\" \/><\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Exercise: Quantify Variation<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nBased on the two data sets shown above, answer these questions.\r\n<ul>\r\n \t<li>The two data sets have <span style=\"text-decoration: underline;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span>(the same, different) total variation.<\/li>\r\n \t<li>Data set 1 has a <span style=\"text-decoration: underline;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span>(larger, smaller) within-group variation.<\/li>\r\n \t<li>Data set 1 has a <span style=\"text-decoration: underline;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span>(larger, smaller) between-group variation.<\/li>\r\n<\/ul>\r\nSupport your answer by calculating the sums of squares [latex]SST, SSTR[\/latex] and [latex]SSE[\/latex] for each of the two data sets.\r\n\r\n<details><summary>Show\/Hide Answer<\/summary><strong>Answers:<\/strong>\r\n<ul>\r\n \t<li>The two data sets have the same total variation.<\/li>\r\n \t<li>Data set 1 has a smaller within-group variation.<\/li>\r\n \t<li>Data set 1 has a larger between-group variation.<\/li>\r\n<\/ul>\r\nThe two data sets have the same overall sample mean [latex]\\bar x[\/latex] and the same total sum of squares (SST):\r\n\r\n[latex]\\bar x=\\frac{\\sum 
x_i}{n}=\\frac{1+2+3+3+4+5+6+7+8+8+9+10}{12}=5.5.[\/latex]\r\n\r\n[latex]\\begin{align*}SST&amp;=\\sum (x_i-\\bar x)^2\\\\&amp;=(1-5.5)^2+(2-5.5)^2+(3-5.5)^2+\\cdots+(8-5.5)^2+(9-5.5)^2\\\\&amp;+(10-5.5)^2\\\\&amp;=95.\\end{align*}[\/latex]\r\n\r\nFor Data Set 1, the mean waiting time for Uber is [latex]\\bar x_1=\\frac{1+2+3+3+4+5}{6}=3,[\/latex] and the mean waiting time for Taxi is [latex]\\bar x_2=\\frac{6+7+8+8+9+10}{6}=8.[\/latex] The between-group and within-group variations are:\r\n\r\n[latex]SSTR=\\sum n_i(\\bar x_i-\\bar x)^2=n_1(\\bar x_1-\\bar x)^2+ n_2(\\bar x_2-\\bar x)^2=6(3-5.5)^2+6(8-5.5)^2=75.[\/latex]\r\n\r\n[latex]\\begin{align*}SSE&amp;=\\sum (x_{ij}-\\bar x_i)^2\\\\&amp;=(1-3)^2+(2-3)^2+(3-3)^2+(3-3)^2+(4-3)^2+(5-3)^2\\\\&amp;+(6-8)^2+(7-8)^2+(8-8)^2+(8-8)^2+(9-8)^2+(10-8)^2\\\\&amp;=10+10=20.\\end{align*}[\/latex]\r\n\r\nFor Data Set 2, the mean waiting time for Uber is [latex]\\bar x_1=\\frac{1+3+4+5+8+9}{6}=5,[\/latex] and the mean waiting time for Taxi is [latex]\\bar x_2=\\frac{2+3+6+7+8+10}{6}=6.[\/latex] The between-group and within-group variations are:\r\n\r\n[latex]SSTR=\\sum n_i(\\bar x_i-\\bar x)^2= n_1(\\bar x_1-\\bar x)^2+ n_2(\\bar x_2-\\bar x)^2=6(5-5.5)^2+6(6-5.5)^2=3.[\/latex]\r\n\r\n[latex]\\begin{align*}SSE&amp;=\\sum (x_{ij}-\\bar x_i)^2\\\\&amp;=(1-5)^2+(3-5)^2+(4-5)^2+(5-5)^2+(8-5)^2+(9-5)^2\\\\&amp;+(2-6)^2+(3-6)^2+(6-6)^2+(7-6)^2+(8-6)^2+(10-6)^2\\\\&amp;=46+46=92.\\end{align*}[\/latex]\r\n\r\nIn both data sets, we have SST=SSTR+SSE (95=75+20 and 95=3+92). The two data sets have the same total variation SST=95. Data Set 1 has a smaller within-group variation SSE (20 versus 92). 
Data Set 1 has a larger between-group variation SSTR (75 versus 3).\r\n\r\n<\/details><\/div>\r\n<\/div>\r\nFor data set 1, it is clear that the data are from two populations with different means; for data set 2, however, it is hard to tell whether the data are from a single population or from two populations with similar means.\r\n\r\n<strong>The main idea of one-way ANOVA is to decompose the total variation of the data (SST) into two parts: the variation within the samples (SSE) and the variation between sample means (SSTR)<\/strong>. <strong>Reject <\/strong>[latex]H_0: \\mu_1 = \\mu_2 = \\dots = \\mu_k[\/latex]<strong> if the variation between sample means is large compared to the variation within samples<\/strong>. Equivalently, reject [latex]H_0[\/latex]\u00a0if the ratio [latex]F = \\frac{SSTR \/ (k-1)}{SSE \/ (n-k)} = \\frac{MSTR}{MSE}[\/latex]\u00a0is too large, where MSTR is called the mean square of the treatments and MSE is the mean square error. When [latex]H_0[\/latex] is true, the ratio follows an F distribution characterized by two degrees of freedom:\r\n<ul>\r\n \t<li>The numerator degrees of freedom:\u00a0 [latex]df_n = k-1[\/latex],<\/li>\r\n \t<li>The denominator degrees of freedom:\u00a0 [latex]df_d = n-k[\/latex].<\/li>\r\n<\/ul>\r\nLike chi-square tests, F tests are always right-tailed. That is, the rejection region lies in the upper tail and the p-value is an upper-tail probability.","rendered":"<p>Let [latex]\\mu_1, \\mu_2, \\dots , \\mu_k[\/latex] be [latex]k[\/latex] population means. The hypotheses of one-way ANOVA are formulated as<\/p>\n<p>[latex]H_0[\/latex]: all means are equal, i.e., [latex]\\mu_1 = \\mu_2 = \\dots = \\mu_k[\/latex]<\/p>\n<p>[latex]H_a[\/latex]: not all the means are equal.<\/p>\n<p>The alternative hypothesis [latex]H_{a}[\/latex]\u00a0means there exists at least one pair of means that are not equal. <strong>Do not<\/strong> write as [latex]H_{a}: \\mu_1 \\neq \\mu_2 \\neq \\dots \\neq \\mu_k[\/latex]. 
<strong>Do not<\/strong> write \u201cat least one mean is different from the others\u201d since it sounds like at least one mean is different from the others while all the others are the same. Both are just two special cases of what &#8220;not all the means are equal&#8221; means.<a id=\"retfig12.1\"><\/a><\/p>\n<figure id=\"attachment_2911\" aria-describedby=\"caption-attachment-2911\" style=\"width: 817px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2911 size-full\" src=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/ANOVA_sample.png\" alt=\"Three ovals representing independent samples from independent populations show that the data from each is independent. Image description available.\" width=\"817\" height=\"354\" srcset=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/ANOVA_sample.png 817w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/ANOVA_sample-300x130.png 300w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/ANOVA_sample-768x333.png 768w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/ANOVA_sample-65x28.png 65w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/ANOVA_sample-225x97.png 225w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/ANOVA_sample-350x152.png 350w\" sizes=\"auto, (max-width: 817px) 100vw, 817px\" \/><figcaption id=\"caption-attachment-2911\" class=\"wp-caption-text\"><strong>Figure 12.1<\/strong>: One-Way ANOVA Based on k Independent Samples. 
[<a href=\"https:\/\/openbooks.macewan.ca\/introstats\/back-matter\/image-description\/#fig12.1\">Image Description (See Appendix D Figure 12.1)<\/a>]<\/figcaption><\/figure>\n<p>ANOVA F tests are based on [latex]k[\/latex] independent, simple random samples from [latex]k[\/latex] populations. If [latex]H_0: \\mu_1 = \\mu_2 = \\dots = \\mu_k[\/latex] is true, the sample means [latex]\\bar{x}_1, \\bar{x}_2, \\dots, \\bar{x}_k[\/latex] should be close to one another and hence, the variation among sample means should be small. Therefore, we should reject\u00a0 [latex]H_0[\/latex] if the sample means are very different from one another (meaning the variation among the sample means would be large).<\/p>\n<h2><strong>Quantifying Variation<\/strong><\/h2>\n<p>The total variation of the data (SST: the total sum of squares) is quantified as the sum of squared distances from each observation to the overall mean, i.e., [latex]SST = \\sum (x_{ij} - \\bar{x})^2[\/latex], where [latex]x_{ij}[\/latex] is the [latex]j[\/latex]th observation of sample [latex]i[\/latex], [latex]\\bar{x} = \\frac{\\sum x_{ij}}{n}[\/latex] \u00a0is the overall mean, and [latex]n = n_1 + n_2 + \\dots + n_k[\/latex] is the overall sample size.<\/p>\n<p>The treatment sum of squares quantifies the variation of the sample means:<\/p>\n<p style=\"text-align: center;\">[latex]SSTR = \\sum\\limits_{i=1}^{k} n_i (\\bar{x}_i - \\bar{x})^2.[\/latex]<\/p>\n<p>\u00a0SSTR quantifies the so-called between-group variation. For this reason,\u00a0\u00a0we should reject [latex]H_0: \\mu_1 = \\mu_2 = \\dots = \\mu_k[\/latex] if SSTR is too large. 
However, SSTR is only considered &#8220;large&#8221; if it is large relative to the next measure of variation.<\/p>\n<p>The within-group variation can be quantified as the sum of squared distances from each observation to the mean of its sample group, i.e.,<\/p>\n<p style=\"text-align: center;\">[latex]SSE = \\sum (x_{ij} - \\bar{x}_i)^2 = \\sum\\limits_{i=1}^k (n_i - 1) s_i^2.[\/latex]<\/p>\n<p>Here [latex]s_i^2[\/latex] is the sample variance of group [latex]i[\/latex]. In practice, software is used to calculate these sums of squares and the other ANOVA quantities.<\/p>\n<p>The total variation SST decomposes as:<\/p>\n<p style=\"text-align: center;\">[latex]SST = SSTR + SSE = \\text{between-group variation + within-group variation}.[\/latex]<\/p>\n<p>The relationship between SSTR (between-group variation) and SSE (within-group variation) is illustrated in the following example.<\/p>\n<p>A person recorded waiting times each time he called either Uber or Taxi service from his house and again each time he called either service from work. His results are summarized in the following table (red values correspond to waiting times for Uber and blue values correspond to Taxi):<\/p>\n<p style=\"text-align: center;\"><strong>Table 12.2<\/strong>: Waiting Time for Uber (red) and Taxi (blue) Called from Home and Work<\/p>\n<table class=\"aligncenter first-col-border\">\n<tbody>\n<tr class=\"border-bottom\">\n<th scope=\"row\">Home<\/th>\n<td><span style=\"color: #ff0000;\">1<\/span><\/td>\n<td><span style=\"color: #ff0000;\">2<\/span><\/td>\n<td><span style=\"color: #ff0000;\">3<\/span><\/td>\n<td><span style=\"color: #ff0000;\">3<\/span><\/td>\n<td><span style=\"color: #ff0000;\">4<\/span><\/td>\n<td><span style=\"color: #ff0000;\">5<\/span><\/td>\n<td><span style=\"color: #0000ff;\">6<\/span><\/td>\n<td><span style=\"color: #0000ff;\">7<\/span><\/td>\n<td><span style=\"color: #0000ff;\">8<\/span><\/td>\n<td><span style=\"color: #0000ff;\">8<\/span><\/td>\n<td><span style=\"color: #0000ff;\">9<\/span><\/td>\n<td><span style=\"color: 
#0000ff;\">10<\/span><\/td>\n<\/tr>\n<tr>\n<th scope=\"row\">Work<\/th>\n<td><span style=\"color: #ff0000;\">1<\/span><\/td>\n<td><span style=\"color: #0000ff;\">2<\/span><\/td>\n<td><span style=\"color: #ff0000;\">3<\/span><\/td>\n<td><span style=\"color: #0000ff;\">3<\/span><\/td>\n<td><span style=\"color: #ff0000;\">4<\/span><\/td>\n<td><span style=\"color: #ff0000;\">5<\/span><\/td>\n<td><span style=\"color: #0000ff;\">6<\/span><\/td>\n<td><span style=\"color: #0000ff;\">7<\/span><\/td>\n<td><span style=\"color: #0000ff;\">8<\/span><\/td>\n<td><span style=\"color: #ff0000;\">8<\/span><\/td>\n<td><span style=\"color: #ff0000;\">9<\/span><\/td>\n<td><span style=\"color: #0000ff;\">10<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>We define the waiting times called from Home as Data Set 1 and those called from Work as Data Set 2. The figure below shows two data sets. Each set consists of two populations: waiting time for Uber (red circle with mean [latex]\\mu_1[\/latex]) and waiting time for Taxi (blue cross with mean [latex]\\mu_2[\/latex]).<a id=\"retfig12.2\"><\/a><\/p>\n<figure id=\"attachment_1202\" aria-describedby=\"caption-attachment-1202\" style=\"width: 416px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1202 size-full\" src=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/image048.png\" alt=\"Two data sets show the wait times for Uber and taxi in red and blue respectively. 
Image description available.\" width=\"416\" height=\"402\" srcset=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/image048.png 416w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/image048-300x290.png 300w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/image048-65x63.png 65w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/image048-225x217.png 225w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2021\/06\/image048-350x338.png 350w\" sizes=\"auto, (max-width: 416px) 100vw, 416px\" \/><figcaption id=\"caption-attachment-1202\" class=\"wp-caption-text\"><strong>Figure 12.2<\/strong>: Waiting Time for Uber (red) and Taxi (blue) Called from Home (Data Set 1) and Work (Data Set 2). [<a href=\"https:\/\/openbooks.macewan.ca\/introstats\/back-matter\/image-description\/#fig12.2\">Image Description (See Appendix D Figure 12.2)<\/a>]<\/figcaption><\/figure>\n<div style=\"height: 55px; margin-top: 5px;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png\" alt=\"\" width=\"250\" height=\"50\" srcset=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png 250w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity-65x13.png 65w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity-225x45.png 225w\" sizes=\"auto, (max-width: 250px) 100vw, 250px\" \/><\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Exercise: Quantify Variation<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>Based on the two data sets shown above, answer these questions.<\/p>\n<ul>\n<li>The two data sets 
have <span style=\"text-decoration: underline;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span>(the same, different) total variation.<\/li>\n<li>Data set 1 has a <span style=\"text-decoration: underline;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span>(larger, smaller) within-group variation.<\/li>\n<li>Data set 1 has a <span style=\"text-decoration: underline;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span>(larger, smaller) between-group variation.<\/li>\n<\/ul>\n<p>Support your answer by calculating the sums of squares [latex]SST, SSTR[\/latex] and [latex]SSE[\/latex] for each of the two data sets.<\/p>\n<details>\n<summary>Show\/Hide Answer<\/summary>\n<p><strong>Answers:<\/strong><\/p>\n<ul>\n<li>The two data sets have the same total variation.<\/li>\n<li>Data set 1 has a smaller within-group variation.<\/li>\n<li>Data set 1 has a larger between-group variation.<\/li>\n<\/ul>\n<p>The two data sets have the same overall sample mean [latex]\\bar x[\/latex] and the same total sum of squares (SST):<\/p>\n<p>[latex]\\bar x=\\frac{\\sum x_i}{n}=\\frac{1+2+3+3+4+5+6+7+8+8+9+10}{12}=5.5.[\/latex]<\/p>\n<p>[latex]\\begin{align*}SST&=\\sum (x_i-\\bar x)^2\\\\&=(1-5.5)^2+(2-5.5)^2+(3-5.5)^2+\\cdots+(8-5.5)^2+(9-5.5)^2\\\\&+(10-5.5)^2\\\\&=95.\\end{align*}[\/latex]<\/p>\n<p>For Data Set 1, the mean waiting time for Uber is [latex]\\bar x_1=\\frac{1+2+3+3+4+5}{6}=3,[\/latex] and the mean waiting time for Taxi is [latex]\\bar x_2=\\frac{6+7+8+8+9+10}{6}=8.[\/latex] The between-group and within-group variations are:<\/p>\n<p>[latex]SSTR=\\sum n_i(\\bar x_i-\\bar x)^2=n_1(\\bar x_1-\\bar x)^2+ n_2(\\bar x_2-\\bar x)^2=6(3-5.5)^2+6(8-5.5)^2=75.[\/latex]<\/p>\n<p>[latex]\\begin{align*}SSE&=\\sum (x_{ij}-\\bar 
x_i)^2\\\\&=(1-3)^2+(2-3)^2+(3-3)^2+(3-3)^2+(4-3)^2+(5-3)^2\\\\&+(6-8)^2+(7-8)^2+(8-8)^2+(8-8)^2+(9-8)^2+(10-8)^2\\\\&=10+10=20.\\end{align*}[\/latex]<\/p>\n<p>For Data Set 2, the mean waiting time for Uber is [latex]\\bar x_1=\\frac{1+3+4+5+8+9}{6}=5,[\/latex] and the mean waiting time for Taxi is [latex]\\bar x_2=\\frac{2+3+6+7+8+10}{6}=6.[\/latex] The between-group and within-group variations are:<\/p>\n<p>[latex]SSTR=\\sum n_i(\\bar x_i-\\bar x)^2= n_1(\\bar x_1-\\bar x)^2+ n_2(\\bar x_2-\\bar x)^2=6(5-5.5)^2+6(6-5.5)^2=3.[\/latex]<\/p>\n<p>[latex]\\begin{align*}SSE&=\\sum (x_{ij}-\\bar x_i)^2\\\\&=(1-5)^2+(3-5)^2+(4-5)^2+(5-5)^2+(8-5)^2+(9-5)^2\\\\&+(2-6)^2+(3-6)^2+(6-6)^2+(7-6)^2+(8-6)^2+(10-6)^2\\\\&=46+46=92.\\end{align*}[\/latex]<\/p>\n<p>In both data sets, we have SST=SSTR+SSE (95=75+20 and 95=3+92). The two data sets have the same total variation SST=95. Data Set 1 has a smaller within-group variation SSE (20 versus 92). Data Set 1 has a larger between-group variation SSTR (75 versus 3).<\/p>\n<\/details>\n<\/div>\n<\/div>\n<p>For data set 1, it is clear that the data are from two populations with different means; for data set 2, however, it is hard to tell whether the data are from a single population or from two populations with similar means.<\/p>\n<p><strong>The main idea of one-way ANOVA is to decompose the total variation of the data (SST) into two parts: the variation within the samples (SSE) and the variation between sample means (SSTR)<\/strong>. <strong>Reject <\/strong>[latex]H_0: \\mu_1 = \\mu_2 = \\dots = \\mu_k[\/latex]<strong> if the variation between sample means is large compared to the variation within samples<\/strong>. Equivalently, reject [latex]H_0[\/latex]\u00a0if the ratio [latex]F = \\frac{SSTR \/ (k-1)}{SSE \/ (n-k)} = \\frac{MSTR}{MSE}[\/latex]\u00a0is too large, where MSTR is called the mean square of the treatments and MSE is the mean square error. 
When [latex]H_0[\/latex] is true, the ratio follows an F distribution characterized by two degrees of freedom:<\/p>\n<ul>\n<li>The numerator degrees of freedom:\u00a0 [latex]df_n = k-1[\/latex],<\/li>\n<li>The denominator degrees of freedom:\u00a0 [latex]df_d = n-k[\/latex].<\/li>\n<\/ul>\n<p>Like chi-square tests, F tests are always right-tailed. That is, the rejection region lies in the upper tail and the p-value is an upper-tail probability.<\/p>\n","protected":false},"author":19,"menu_order":2,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-1198","chapter","type-chapter","status-publish","hentry"],"part":1189,"_links":{"self":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/1198","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/users\/19"}],"version-history":[{"count":45,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/1198\/revisions"}],"predecessor-version":[{"id":5533,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/1198\/revisions\/5533"}],"part":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/parts\/1189"}],"metadata":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/1198\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/media?parent=1198"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapter-type?post=1198"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\
/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/contributor?post=1198"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/license?post=1198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}