{"id":1112,"date":"2021-06-12T20:30:45","date_gmt":"2021-06-13T00:30:45","guid":{"rendered":"https:\/\/openbooks.macewan.ca\/rcommander\/?post_type=chapter&#038;p=1112"},"modified":"2025-05-07T18:07:13","modified_gmt":"2025-05-07T22:07:13","slug":"10-6-inferences-for-two-population-proportions","status":"publish","type":"chapter","link":"https:\/\/openbooks.macewan.ca\/introstats\/chapter\/10-6-inferences-for-two-population-proportions\/","title":{"raw":"10.6 Inferences for Two Population Proportions","rendered":"10.6 Inferences for Two Population Proportions"},"content":{"raw":"Previous studies suggest that more women than men have arthritis. The Centers for Disease Control and Prevention reported a survey of randomly selected Americans aged 65 and older. They found 411 of 1,012 men and 535 of 1,062 women had arthritis. Is there any evidence that women are more likely to suffer from arthritis than men? Let [latex]p_1[\/latex] be the proportion of male arthritis sufferers and [latex]p_2[\/latex] be the proportion of female sufferers. We want to test [latex]H_0: p_1 \\geq p_2[\/latex] versus [latex]H_a: p_1 &lt; p_2[\/latex] or [latex]H_0: p_1-p_2\\geq 0[\/latex] versus [latex]H_a: p_1 - p_2&lt;0[\/latex].\r\n\r\nInference on the population mean [latex]\\mu[\/latex] is based on the distribution of the sample mean [latex]\\bar X;[\/latex] inference on the difference of two population means [latex]\\mu_1-\\mu_2[\/latex] is based on the distribution of the difference between the sample means [latex]\\bar X_1-\\bar X_2[\/latex]; and inference on the population proportion [latex]p[\/latex] is based on the distribution of the sample proportion [latex]\\hat p[\/latex]. 
Similarly, inference on the difference of two population proportions [latex]p_1-p_2[\/latex] is based on the distribution of the difference between the sample proportions [latex]\\hat p_1-\\hat p_2[\/latex].\r\n<h2><strong>10.6.1 Sampling Distribution of Difference Between Two Sample Proportions [latex]\\hat{p}_1 - \\hat{p}_2[\/latex]<\/strong><\/h2>\r\n<div class=\"textbox textbox--key-takeaways\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Key Facts: Sampling Distribution of Difference Between Two Sample Proportions<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nFor independent samples of size [latex]n_1[\/latex] and [latex]n_2[\/latex] from the two populations:\r\n<ul>\r\n \t<li>The mean of [latex]\\hat{p}_1 - \\hat{p}_2[\/latex] equals the difference of the population proportions, i.e., [latex]\\mu_{\\scriptsize \\hat{p}_1 - \\hat{p}_2} = \\mu_{\\scriptsize \\hat{p}_1} - \\mu_{\\scriptsize \\hat{p}_2} = p_1 - p_2[\/latex].<\/li>\r\n \t<li>The standard deviation of [latex]\\hat{p}_1 - \\hat{p}_2[\/latex]: [latex]\\sigma_{\\scriptsize \\hat{p}_1 - \\hat{p}_2} = \\sqrt{\\frac{p_1(1 - p_1)}{n_1} + \\frac{p_2(1-p_2)}{n_2}}[\/latex].\r\n<strong>These two conclusions are always true regardless of the sample sizes [latex]n_1[\/latex] and [latex]n_2[\/latex].<\/strong><\/li>\r\n \t<li>The shape of the distribution of [latex]\\hat{p}_1 - \\hat{p}_2[\/latex]: by the central limit theorem, when the sample sizes [latex]n_1[\/latex] and [latex]n_2[\/latex] are large enough, [latex]\\hat{p}_1 - \\hat{p}_2[\/latex] is approximately normally distributed. 
The rule of thumb is [latex]n_1 p_1 \\geq 5 , n_1 (1 - p_1) \\geq 5[\/latex] and [latex]n_2 p_2 \\geq 5 , n_2 (1 - p_2) \\geq 5[\/latex].<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\nTo summarize, when [latex]n_1 p_1 \\geq 5 , n_1 (1 - p_1) \\geq 5[\/latex] and [latex]n_2 p_2 \\geq 5 , n_2 (1 - p_2) \\geq 5[\/latex],\r\n<p align=\"center\">[latex]\\hat{p}_1 - \\hat{p}_2 \\sim N \\left( p_1 - p_2 , \\sqrt{\\frac{p_1(1 - p_1)}{n_1} + \\frac{p_2(1-p_2)}{n_2}} \\right).[\/latex]<\/p>\r\nThe standardized version is\r\n<p align=\"center\">[latex]Z = \\frac{(\\hat{p}_1 - \\hat{p}_2) - (p_1 - p_2)}{\\sqrt{ \\frac{p_1(1 - p_1)}{n_1} + \\frac{p_2(1 -p_2)}{n_2}}} \\sim N(0, 1).[\/latex]<\/p>\r\n\r\n<h2><strong>10.6.3 Two-Proportion z Interval for the Difference Between Two Proportions [latex]p_1 - p_2[\/latex]<\/strong><\/h2>\r\nA point estimate for the difference between two population proportions [latex](p_1 - p_2)[\/latex] is the difference between the sample proportions [latex](\\hat{p}_1 - \\hat{p}_2)[\/latex].\r\n<div class=\"textbox\">\r\n\r\n<strong>Assumptions<\/strong>:\r\n<ol>\r\n \t<li>Both samples are simple random samples from their respective populations.<\/li>\r\n \t<li>The two samples are independent.<\/li>\r\n \t<li>Large samples: the numbers of successes and failures, [latex]x_1, n_1 - x_1, x_2[\/latex], and [latex]n_2 - x_2[\/latex], are all at least 5.<\/li>\r\n<\/ol>\r\n<strong>Note:\u00a0<\/strong>As was the case with one-proportion inferences, [latex]p_1[\/latex] and [latex]p_2[\/latex] are generally unknown and estimated with [latex]\\hat{p}_1 = \\frac{x_1}{n_1}[\/latex] and [latex]\\hat{p}_2 = \\frac{x_2}{n_2}[\/latex]. 
Thus, since [latex]n_i\\hat{p}_i = n_i \\frac{x_i}{n_i} = x_i[\/latex] and [latex]n_i(1 - \\hat{p}_i) = n_i \\left( 1 - \\frac{x_i}{n_i} \\right) = n_i \\left( \\frac{n_i - x_i}{n_i} \\right) = n_i - x_i[\/latex], each sample is deemed sufficiently large if [latex]n_i \\hat{p}_i = x_i \\geq 5[\/latex] and [latex]n_i (1 - \\hat{p}_i) = n_i - x_i \\geq 5[\/latex] for [latex]i = 1, 2[\/latex].\r\n\r\nA [latex](1 - \\alpha) \\times 100\\%[\/latex] confidence interval for the difference between the population proportions [latex](p_1 - p_2)[\/latex] is\r\n<p align=\"center\">[latex](\\hat{p}_1 - \\hat{p}_2) \\pm z_{\\alpha \/ 2 } \\sqrt{ \\frac{\\hat{p}_1(1 - \\hat{p}_1)}{n_1} + \\frac{\\hat{p}_2(1 - \\hat{p}_2)}{n_2}}[\/latex]<\/p>\r\nwhere [latex]z_{\\alpha \/ 2}[\/latex] is the <em>z<\/em> score such that the area under the standard normal curve to its right is [latex]\\frac{\\alpha}{2}[\/latex]. This is a two-tailed interval.\r\n\r\nA [latex](1 - \\alpha) \\times 100\\%[\/latex] upper-tailed confidence interval is\r\n<p align=\"center\">[latex]\\left( (\\hat{p}_1 - \\hat{p}_2) - z_{\\alpha } \\sqrt{\\frac{\\hat{p}_1(1 - \\hat{p}_1)}{n_1} + \\frac{\\hat{p}_2(1-\\hat{p}_2)}{n_2}}, 1 \\right),[\/latex]<\/p>\r\nand a [latex](1 - \\alpha) \\times 100\\%[\/latex] lower-tailed confidence interval is\r\n<p align=\"center\">[latex]\\left( -1 , (\\hat{p}_1 - \\hat{p}_2) + z_{\\alpha } \\sqrt{\\frac{\\hat{p}_1(1 - \\hat{p}_1)}{n_1} + \\frac{\\hat{p}_2(1-\\hat{p}_2)}{n_2}} \\right).[\/latex]<\/p>\r\nNote that the largest possible value of [latex]p_1-p_2[\/latex] is 1 when [latex]p_1=1, p_2=0[\/latex], and the smallest possible value of [latex]p_1-p_2[\/latex] is -1 when [latex]p_1=0, p_2=1.[\/latex]\r\n\r\n<\/div>\r\n<h2><strong>10.6.2 Two-Proportion z Test for the Difference Between Two Proportions [latex]p_1 - p_2[\/latex]<\/strong><\/h2>\r\nRecall that the population proportion can be viewed as the average of the indicator random variable [latex] X = 
\\begin{cases}\r\n1 &amp; \\text{with probability } p \\\\\r\n0 &amp; \\text{with probability } 1-p\r\n\\end{cases} [\/latex] with a mean [latex]\\mu = p[\/latex] and standard deviation [latex]\\sigma = \\sqrt{p(1-p)}[\/latex]. Note that the standard deviation is a function of [latex]p[\/latex]. For a two-tailed test, the null hypothesis is that the two population proportions are equal, that is, [latex]H_0: p_1 = p_2[\/latex]; consequently, if the null hypothesis is true, it follows that the populations have the same standard deviation. Therefore, similar to a pooled two-sample t-test, we can pool the two samples together to obtain a better estimate of the common standard deviation. If [latex]H_0: p_1 = p_2[\/latex] is true, let [latex]p_1 = p_2 = p_p[\/latex], where [latex]p_p[\/latex] is the common proportion. Then, the test statistic becomes\r\n<p align=\"center\">[latex]Z = \\frac{(\\hat{p}_1 - \\hat{p}_2) - (p_1 - p_2)}{\\sqrt{ \\frac{p_1(1 - p_1)}{n_1} + \\frac{p_2(1 -p_2)}{n_2}}} = \\frac{(\\hat{p}_1 - \\hat{p}_2) - 0}{\\sqrt{ \\frac{p_{\\scriptsize p}(1 - p_{\\scriptsize p})}{n_1} + \\frac{p_{\\scriptsize p}(1 -p_{\\scriptsize p})}{n_2}}} = \\frac{\\hat{p}_1 - \\hat{p}_2}{\\sqrt{p_{\\scriptsize p} (1-p_{\\scriptsize p})} \\sqrt{\\frac{1}{n_1} + \\frac{1}{n_2}}}.[\/latex]<\/p>\r\nThe common proportion [latex]p_{\\scriptsize p}[\/latex] is estimated by\r\n<p style=\"text-align: center;\">[latex]\\hat{p}_{\\scriptsize p} = \\frac{x_1 + x_2}{n_1 + n_2}[\/latex].<\/p>\r\n\r\n<div class=\"textbox\">\r\n\r\n<strong>Assumptions<\/strong>:\r\n<ol>\r\n \t<li>Both samples are simple random samples from their respective populations.<\/li>\r\n \t<li>The two samples are independent.<\/li>\r\n \t<li>Large samples: the numbers of successes and failures, [latex]x_1, n_1 - x_1, x_2[\/latex], and [latex]n_2 - x_2[\/latex], are all at least 5.<\/li>\r\n<\/ol>\r\n<strong>Steps to perform a two-proportion z<\/strong><strong> test<\/strong>:\r\n<ol>\r\n \t<li>Set up the 
hypotheses:\r\n<div align=\"center\">\r\n<table style=\"width: 555px;\" border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\r\n<thead>\r\n<tr class=\"shaded\">\r\n<th scope=\"col\">\r\n<div align=\"center\">Two-tailed<\/div><\/th>\r\n<th scope=\"col\">\r\n<div align=\"center\">Right-tailed<\/div><\/th>\r\n<th scope=\"col\">\r\n<div align=\"center\">Left-tailed<\/div><\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td valign=\"top\" width=\"203\">\r\n<div align=\"center\">[latex]H_0: p_1 = p_2 [\/latex]<\/div><\/td>\r\n<td valign=\"top\" width=\"209\">\r\n<div align=\"center\">[latex]H_0: p_1 \\leq p_2 [\/latex]<\/div><\/td>\r\n<td valign=\"top\" width=\"209\">\r\n<div align=\"center\">[latex]H_0: p_1 \\geq p_2[\/latex]<\/div><\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"203\">\r\n<div align=\"center\">[latex]H_a: p_1 \\neq p_2 [\/latex]<\/div><\/td>\r\n<td valign=\"top\" width=\"209\">\r\n<div align=\"center\">[latex]H_a: p_1 \\: \\gt \\: p_2 [\/latex]<\/div><\/td>\r\n<td valign=\"top\" width=\"209\">\r\n<div align=\"center\">[latex]H_a: p_1 &lt; p_2[\/latex]<\/div><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div><\/li>\r\n \t<li>State the significance level [latex]\\alpha[\/latex].<\/li>\r\n \t<li>Compute the value of the test statistic:\r\n<p align=\"center\">[latex]z_o = \\frac{\\hat{p}_1 - \\hat{p}_2}{\\sqrt{\\hat{p}_{\\scriptsize p} (1 - \\hat{p}_{\\scriptsize p})} \\sqrt{ \\frac{1}{n_1} + \\frac{1}{n_2}}}[\/latex] with [latex]\\hat{p}_{\\scriptsize p} = \\frac{x_1 + x_2}{n_1 + n_2} , \\hat{p}_1 = \\frac{x_1}{n_1}, \\hat{p}_2 = \\frac{x_2}{n_2}[\/latex].<\/p>\r\n<\/li>\r\n \t<li>Find the P-value <strong>or<\/strong> rejection region.\r\n<div align=\"center\">\r\n<table class=\"first-col-border\" style=\"width: 100%; height: 90px;\" border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\r\n<thead>\r\n<tr class=\"border-bottom\" style=\"height: 15px;\">\r\n<td style=\"text-align: center; width: 22.790697674418606%; height: 15px;\"><\/td>\r\n<th 
style=\"text-align: center; width: 34.255813953488374%; height: 15px;\" scope=\"col\">Two-tailed<\/th>\r\n<th style=\"text-align: center; width: 24.093023255813954%; height: 15px;\" scope=\"col\">\r\n<div align=\"center\">Right-tailed<\/div><\/th>\r\n<th style=\"text-align: center; width: 18.744186046511626%; height: 15px;\" scope=\"col\">\r\n<div align=\"center\">Left-tailed<\/div><\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr style=\"height: 15px;\">\r\n<th style=\"width: 22.790697674418606%; height: 15px; text-align: left;\" scope=\"row\" valign=\"top\" width=\"149\">Null<\/th>\r\n<td style=\"text-align: center; width: 34.255813953488374%; height: 15px;\" valign=\"top\" width=\"189\">\r\n<div align=\"center\">[latex]H_0: p_1 = p_2 [\/latex]<\/div><\/td>\r\n<td style=\"text-align: center; width: 24.093023255813954%; height: 15px;\" valign=\"top\" width=\"180\">\r\n<div align=\"center\">[latex]H_0: p_1 \\leq p_2 [\/latex]<\/div><\/td>\r\n<td style=\"text-align: center; width: 18.744186046511626%; height: 15px;\" valign=\"top\" width=\"142\">\r\n<div align=\"center\">[latex]H_0: p_1 \\geq p_2[\/latex]<\/div><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px;\">\r\n<th style=\"width: 22.790697674418606%; height: 15px; text-align: left;\" scope=\"row\" valign=\"top\" width=\"149\">Alternative<\/th>\r\n<td style=\"text-align: center; width: 34.255813953488374%; height: 15px;\" valign=\"top\" width=\"189\">\r\n<div align=\"center\">[latex]H_a: p_1 \\neq p_2 [\/latex]<\/div><\/td>\r\n<td style=\"text-align: center; width: 24.093023255813954%; height: 15px;\" valign=\"top\" width=\"180\">\r\n<div align=\"center\">[latex]H_a: p_1 \\: \\gt \\: p_2 [\/latex]<\/div><\/td>\r\n<td style=\"text-align: center; width: 18.744186046511626%; height: 15px;\" valign=\"top\" width=\"142\">\r\n<div align=\"center\">[latex]H_a: p_1 &lt; p_2[\/latex]<\/div><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px;\">\r\n<th style=\"text-align: left; width: 22.790697674418606%; height: 15px;\" scope=\"row\" 
valign=\"top\" width=\"149\">P-value<\/th>\r\n<td style=\"text-align: center; width: 34.255813953488374%; height: 15px;\" valign=\"top\" width=\"189\">\r\n<div align=\"center\">[latex]2P(Z \\geq |z_o|)[\/latex]<\/div><\/td>\r\n<td style=\"text-align: center; width: 24.093023255813954%; height: 15px;\" valign=\"top\" width=\"180\">\r\n<div align=\"center\">[latex]P(Z \\geq z_o)[\/latex]<\/div><\/td>\r\n<td style=\"text-align: center; width: 18.744186046511626%; height: 15px;\" valign=\"top\" width=\"142\">\r\n<div align=\"center\">[latex]P(Z \\leq z_o)[\/latex]<\/div><\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px;\">\r\n<th style=\"text-align: left; width: 22.790697674418606%; height: 30px;\" scope=\"row\" valign=\"top\" width=\"149\" height=\"23\">Rejection region<\/th>\r\n<td style=\"text-align: center; width: 34.255813953488374%; height: 30px;\" valign=\"top\" width=\"189\">\r\n<div align=\"center\">[latex]Z \\geq z_{\\alpha \/ 2}[\/latex] or [latex]Z \\leq - z_{\\alpha \/ 2}[\/latex]<\/div><\/td>\r\n<td style=\"text-align: center; width: 24.093023255813954%; height: 30px;\" valign=\"top\" width=\"180\">\r\n<div align=\"center\">[latex]Z \\geq z_{\\alpha }[\/latex]<\/div><\/td>\r\n<td style=\"text-align: center; width: 18.744186046511626%; height: 30px;\" valign=\"top\" width=\"142\">\r\n<div align=\"center\">[latex]Z \\leq - z_{\\alpha }[\/latex]<\/div><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div><\/li>\r\n \t<li>Reject the null [latex]H_0[\/latex] if the P-value [latex] \\leq \\alpha[\/latex] or [latex]z_o[\/latex] falls in the rejection region.<\/li>\r\n \t<li>Conclusion.<\/li>\r\n<\/ol>\r\n<\/div>\r\n&nbsp;\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Example: Two-Proportion z Test and z Interval<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nThe Centers for Disease Control and Prevention reported a survey of randomly selected Americans aged 65 and older. 
They found 411 of 1,012 men and 535 of 1,062 women had arthritis.\r\n<ol type=\"a\">\r\n \t<li>Is there any evidence that women are more likely to suffer from arthritis than men? Test at the 1% significance level.<strong>\r\n<\/strong>Let [latex]p_1[\/latex] be the proportion of men who have arthritis and [latex]p_2[\/latex] be the proportion of women who have arthritis.\r\n<strong>Check the assumptions<\/strong>:\r\n<ol>\r\n \t<li>We have simple random samples.<\/li>\r\n \t<li>The two samples are independent.<\/li>\r\n \t<li>The numbers of successes and failures, [latex]x_1 = 411, n_1 - x_1 = 601, x_2= 535[\/latex], and [latex]n_2 - x_2 = 527[\/latex], are all at least 5.<\/li>\r\n<\/ol>\r\n<strong>Steps<\/strong>:\r\n<ol>\r\n \t<li>Set up the hypotheses: [latex]H_0: p_1 \\geq p_2[\/latex] versus [latex]H_a: p_1 &lt; p_2[\/latex].<\/li>\r\n \t<li>State the significance level [latex]\\alpha = 0.01[\/latex].<\/li>\r\n \t<li>The test statistic:\r\n<div align=\"center\">[latex]z_o = \\frac{\\hat{p}_1 - \\hat{p}_2}{\\sqrt{\\hat{p}_p (1 - \\hat{p}_p)} \\sqrt{ \\frac{1}{n_1} + \\frac{1}{n_2}}} = \\frac{0.406 - 0.504}{\\sqrt{0.456 (1 - 0.456)} \\sqrt{ \\frac{1}{1012} + \\frac{1}{1062}}} = -4.479[\/latex]<\/div>\r\nwhere\r\n<p align=\"center\">[latex]\\hat{p}_p = \\frac{x_1 + x_2}{n_1 + n_2} = \\frac{411 + 535}{1012 + 1062} = 0.456 , [\/latex] [latex]\\hat{p}_1 = \\frac{x_1}{n_1} = \\frac{411}{1012} = 0.406 , \\hat{p}_2 = \\frac{x_2}{n_2} = \\frac{535}{1062} = 0.504.[\/latex]<\/p>\r\n<\/li>\r\n \t<li>Find the P-value. 
For a left-tailed test, the P-value is the area to the left of the observed test statistic [latex]z_o[\/latex]:\r\nP-value = [latex]P(Z \\leq z_o) = P(Z \\leq - 4.479) \\approx 0[\/latex].<\/li>\r\n \t<li>Decision: Since the P-value [latex]\\approx 0 &lt; 0.01 (\\alpha)[\/latex], we should reject the null [latex]H_0[\/latex].<\/li>\r\n \t<li>Conclusion: At the 1% significance level, we have sufficient evidence that women are <strong>more likely<\/strong> to suffer from arthritis than men.<\/li>\r\n<\/ol>\r\n<\/li>\r\n \t<li>Obtain a confidence interval for [latex]p_1 - p_2[\/latex], corresponding to the test in part a).<strong>\r\n<\/strong>For a left-tailed test at the 1% significance level, we should obtain a 99% lower-tailed interval. [latex]1 - \\alpha = 0.99 \\Longrightarrow \\alpha = 0.01 \\Longrightarrow z_{\\alpha } = z_{0.01} = 2.33 [\/latex].\r\nA 99% lower-tail confidence interval for [latex]p_1 - p_2[\/latex] is\r\n<p align=\"center\">[latex] \\left( -1 , (\\hat{p}_1 - \\hat{p}_2) + z_{\\alpha } \\sqrt{\\frac{\\hat{p}_1(1 - \\hat{p}_1)}{n_1} + \\frac{\\hat{p}_2(1-\\hat{p}_2)}{n_2}} \\right)[\/latex]<\/p>\r\n<p align=\"center\">[latex] = \u00a0\\left( - 1 , (0.406 - 0.504) + 2.33 \\sqrt{\\frac{0.406 (1 - 0.406)}{1012} + \\frac{0.504(1 - 0.504)}{1062}} \\right) = ( - 1 , - 0.047)[\/latex].<\/p>\r\n<strong>Interpretation<\/strong>: We are 99% confident that [latex](p_1 - p_2)[\/latex] is below -0.047. That is, we are 99% confident that the proportion of women who have arthritis is at least 0.047 higher than the proportion of men.<\/li>\r\n \t<li>Does the interval in part b) support the conclusion of the test in part a)?<strong>\r\n<\/strong>Yes. In part a), we reject [latex]H_0[\/latex] and claim [latex]H_a: p_1 &lt; p_2[\/latex] (suggesting men have a smaller proportion than women). 
In part b), the entire interval is below 0, so we are 99% confident that [latex]p_1 - p_2 &lt; 0[\/latex].<\/li>\r\n<\/ol>\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n<div style=\"height: 55px; margin-top: 5px;\"><img class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png\" alt=\"\" width=\"250\" height=\"50\" \/><\/div>\r\n<div><\/div>\r\n<div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Exercises: Inference on Proportions<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nIt is believed that there is an association between breast cancer and smoking. The following table summarizes the results of an observational study of 200 females classified by their disease and smoking status.\r\n<table class=\"first-col-border last-col-border\" style=\"width: 100%;\" border=\"1\" cellspacing=\"0\" cellpadding=\"0\" align=\"center\">\r\n<thead>\r\n<tr class=\"border-bottom\">\r\n<td valign=\"top\" width=\"142\"><\/td>\r\n<th scope=\"col\" valign=\"top\" width=\"98\">\r\n<div align=\"center\"><strong>Smoker <\/strong><\/div><\/th>\r\n<th scope=\"col\" valign=\"top\" width=\"109\">\r\n<div align=\"center\"><strong>Non-smoker <\/strong><\/div><\/th>\r\n<th scope=\"col\" valign=\"top\" width=\"76\">\r\n<div align=\"center\"><strong>Total<\/strong><\/div><\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<th scope=\"row\" valign=\"top\" width=\"142\"><strong>Breast Cancer <\/strong><\/th>\r\n<td valign=\"top\" width=\"98\">\r\n<div align=\"center\">10<\/div><\/td>\r\n<td valign=\"top\" width=\"109\">\r\n<div align=\"center\">30<\/div><\/td>\r\n<td valign=\"top\" width=\"76\">\r\n<div align=\"center\">40<\/div><\/td>\r\n<\/tr>\r\n<tr class=\"border-bottom\">\r\n<th scope=\"row\" valign=\"top\" width=\"142\"><strong>Cancer Free <\/strong><\/th>\r\n<td valign=\"top\" width=\"98\">\r\n<div 
align=\"center\">20<\/div><\/td>\r\n<td valign=\"top\" width=\"109\">\r\n<div align=\"center\">140<\/div><\/td>\r\n<td valign=\"top\" width=\"76\">\r\n<div align=\"center\">160<\/div><\/td>\r\n<\/tr>\r\n<tr>\r\n<th scope=\"row\" valign=\"top\" width=\"142\"><strong>Total<\/strong><\/th>\r\n<td valign=\"top\" width=\"98\">\r\n<div align=\"center\">30<\/div><\/td>\r\n<td valign=\"top\" width=\"109\">\r\n<div align=\"center\">170<\/div><\/td>\r\n<td valign=\"top\" width=\"76\">\r\n<div align=\"center\">200<\/div><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<ol start=\"1\" type=\"a\">\r\n \t<li>Obtain a 99% confidence interval for the proportion of females with breast cancer.<\/li>\r\n \t<li>Obtain the minimum sample size <em>n<\/em> needed so that we are 95% confident that the error is at most 0.02 when [latex]\\hat{p}[\/latex] is used to estimate <em>p<\/em>. Use the conservative estimate [latex]\\hat{p} = 0.5[\/latex].<\/li>\r\n \t<li>Test at the 5% significance level whether the proportion of females with breast cancer is higher among smokers than non-smokers.<\/li>\r\n \t<li>Obtain a confidence interval corresponding to the test in part c).<\/li>\r\n<\/ol>\r\n<details><summary>Show\/Hide Answer<\/summary>\r\n<ol type=\"a\">\r\n \t<li>Obtain a 99% confidence interval for the proportion of females with breast cancer.\r\nThe point estimate for the proportion of females with breast cancer is [latex]\\hat p = \\frac{x}{n} = \\frac{40}{200} = 0.2[\/latex].\r\n<p align=\"center\">[latex]1 - \\alpha = 0.99 \\Longrightarrow \\alpha = 0.01 \\Longrightarrow z_{\\alpha \/ 2} = z_{0.005} = 2.575 [\/latex].<\/p>\r\nThe 99% confidence interval for the proportion of females with breast cancer is\r\n<p align=\"center\">[latex]\\hat{p} \\pm z_{\\alpha \/ 2} \\sqrt{\\frac{\\hat{p} (1 - \\hat{p})}{n}} = 0.2 \\pm 2.575 \\times \\sqrt{\\frac{0.2(1-0.2)}{200}} = (0.127, 0.273)[\/latex].<\/p>\r\n<strong>Interpretation<\/strong>: We are 99% confident that the proportion of females with breast cancer is somewhere 
between 0.127 and 0.273.<\/li>\r\n \t<li>Obtain the minimum sample size <em>n<\/em> needed so that we are 95% confident that the error is at most 0.02 when [latex]\\hat{p}[\/latex] is used to estimate <em>p<\/em>. Use the conservative estimate [latex]\\hat{p} = 0.5[\/latex].\r\nFor 95% confidence, [latex]1 - \\alpha = 0.95 \\Longrightarrow \\alpha = 0.05 \\Longrightarrow z_{\\alpha \/ 2} = z_{0.025} = 1.96[\/latex].\r\n<p align=\"center\">[latex]n = 0.25 \\left( \\frac{z_{\\alpha \/2 }}{E} \\right)^2 = 0.25 \\left( \\frac{1.96}{0.02} \\right)^2 = 2401 [\/latex].<\/p>\r\n<\/li>\r\n \t<li>Test at the 5% significance level whether the proportion of females with breast cancer is higher among smokers than non-smokers.\r\nLet [latex]p_1[\/latex] be the proportion of females with breast cancer among smokers and [latex]p_2[\/latex] be the proportion of females with breast cancer among non-smokers.\r\n<strong>Check the assumptions<\/strong>:\r\n<ol>\r\n \t<li>We have simple random samples.<\/li>\r\n \t<li>The two samples are independent.<\/li>\r\n \t<li>The numbers of successes and failures, [latex]x_1 = 10, n_1 - x_1 = 20, x_2 = 30[\/latex], and [latex]n_2 - x_2 = 140[\/latex], are all at least 5.<\/li>\r\n<\/ol>\r\n<strong>Steps<\/strong>:\r\n<ol>\r\n \t<li>Set up the hypotheses: [latex]H_0: p_1 \\leq p_2 [\/latex] versus [latex]H_a: p_1 \\: \\gt \\: p_2 [\/latex].<\/li>\r\n \t<li>The significance level [latex]\\alpha = 0.05[\/latex].<\/li>\r\n \t<li>Compute the test statistic:\r\n<p align=\"center\">[latex]z_o = \\frac{\\hat{p}_1 - \\hat{p}_2}{\\sqrt{\\hat{p}_p (1 - \\hat{p}_p)} \\sqrt{\\frac{1}{n_1} + \\frac{1}{n_2} }} = \\frac{0.333 - 0.176}{\\sqrt{0.2 (1-0.2)} \\sqrt{ \\frac{1}{30} + \\frac{1}{170}}} = 1.982 [\/latex],<\/p>\r\nwhere\r\n<p align=\"center\">[latex]\\hat{p}_p = \\frac{x_1 + x_2}{n_1 + n_2} = \\frac{10 + 30}{30 + 170} = 0.2,[\/latex] [latex]\\hat{p}_1 = \\frac{x_1}{n_1} = \\frac{10}{30} = 0.333, \\hat{p}_2 = \\frac{x_2}{n_2} = \\frac{30}{170} = 0.176[\/latex].<\/p>\r\n<\/li>\r\n \t<li>Find the P-value. 
For a right-tailed test, the P-value is the area to the right of the observed test statistic [latex]z_o[\/latex].\r\n<p align=\"center\">P-value = [latex]P(Z \\geq z_o) = P(Z \\geq 1.982) = P( Z \\leq -1.982) = 0.0239[\/latex].<\/p>\r\n<\/li>\r\n \t<li>Decision: Since the P-value [latex]=0.0239 &lt; 0.05(\\alpha)[\/latex], we should reject the null [latex]H_0[\/latex].<\/li>\r\n \t<li>Conclusion: At the 5% significance level, we have sufficient evidence that the proportion of females with breast cancer is higher among smokers than non-smokers.<\/li>\r\n<\/ol>\r\n<\/li>\r\n \t<li>Obtain a confidence interval corresponding to the test in part c).\r\nFor a right-tailed test at the 5% significance level, we should obtain a 95% upper-tailed confidence interval:\r\n<p align=\"center\">[latex] \\left( (\\hat{p}_1 - \\hat{p}_2) - z_\\alpha \\sqrt{\\frac{\\hat{p}_1(1 - \\hat{p}_1)}{n_1} + \\frac{\\hat{p}_2(1-\\hat{p}_2)}{n_2}} , 1\\right) [\/latex]<\/p>\r\n<p align=\"center\">[latex] = \u00a0\\left( (0.333 - 0.176) - 1.645 \\sqrt{\\frac{0.333 (1 - 0.333)}{30} + \\frac{0.176(1 - 0.176)}{170}}, 1 \\right)= ( 0.0075 , 1)[\/latex].<\/p>\r\n<strong>Interpretation<\/strong>: We are 95% confident that the proportion of females with breast cancer is at least 0.0075 higher for smokers than non-smokers.<\/li>\r\n<\/ol>\r\n<\/details><\/div>\r\n<\/div>\r\n<\/div>","rendered":"<p>Previous studies suggest that more women than men have arthritis. The Centers for Disease Control and Prevention reported a survey of randomly selected Americans aged 65 and older. They found 411 of 1,012 men and 535 of 1,062 women had arthritis. Is there any evidence that women are more likely to suffer from arthritis than men? Let [latex]p_1[\/latex] be the proportion of male arthritis sufferers and [latex]p_2[\/latex] be the proportion of female sufferers. 
We want to test [latex]H_0: p_1 \\geq p_2[\/latex] versus [latex]H_a: p_1 < p_2[\/latex] or [latex]H_0: p_1-p_2\\geq 0[\/latex] versus [latex]H_a: p_1 - p_2<0[\/latex].\n\nInference on the population mean [latex]\\mu[\/latex] is based on the distribution of the sample mean [latex]\\bar X;[\/latex] inference on the difference of two population means [latex]\\mu_1-\\mu_2[\/latex] is based on the distribution of the difference between the sample means [latex]\\bar X_1-\\bar X_2[\/latex]; and inference on the population proportion [latex]p[\/latex] is based on the distribution of the sample proportion [latex]\\hat p[\/latex]. Similarly, inference on the difference of two population proportions [latex]p_1-p_2[\/latex] is based on the distribution of the difference between the sample proportions [latex]\\hat p_1-\\hat p_2[\/latex].\n\n\n<h2><strong>10.6.1 Sampling Distribution of Difference Between Two Sample Proportions [latex]\\hat{p}_1 - \\hat{p}_2[\/latex]<\/strong><\/h2>\n<div class=\"textbox textbox--key-takeaways\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Key Facts: Sampling Distribution of Difference Between Two Sample Proportions<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>For independent samples of size [latex]n_1[\/latex] and [latex]n_2[\/latex] from the two populations:<\/p>\n<ul>\n<li>The mean of [latex]\\hat{p}_1 - \\hat{p}_2[\/latex] equals the difference of the population proportions, i.e., [latex]\\mu_{\\scriptsize \\hat{p}_1 - \\hat{p}_2} = \\mu_{\\scriptsize \\hat{p}_1} - \\mu_{\\scriptsize \\hat{p}_2} = p_1 - p_2[\/latex].<\/li>\n<li>The standard deviation of [latex]\\hat{p}_1 - \\hat{p}_2[\/latex]: [latex]\\sigma_{\\scriptsize \\hat{p}_1 - \\hat{p}_2} = \\sqrt{\\frac{p_1(1 - p_1)}{n_1} + \\frac{p_2(1-p_2)}{n_2}}[\/latex].<br \/>\n<strong>These two conclusions are always true regardless of the sample sizes [latex]n_1[\/latex] and [latex]n_2[\/latex].<\/strong><\/li>\n<li>The shape of the distribution of 
[latex]\\hat{p}_1 - \\hat{p}_2[\/latex]: by the central limit theorem, when the sample sizes [latex]n_1[\/latex] and [latex]n_2[\/latex] are large enough, [latex]\\hat{p}_1 - \\hat{p}_2[\/latex] is approximately normally distributed. The rule of thumb is [latex]n_1 p_1 \\geq 5 , n_1 (1 - p_1) \\geq 5[\/latex] and [latex]n_2 p_2 \\geq 5 , n_2 (1 - p_2) \\geq 5[\/latex].<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<p>To summarize, when [latex]n_1 p_1 \\geq 5 , n_1 (1 - p_1) \\geq 5[\/latex] and [latex]n_2 p_2 \\geq 5 , n_2 (1 - p_2) \\geq 5[\/latex],<\/p>\n<p style=\"text-align: center;\">[latex]\\hat{p}_1 - \\hat{p}_2 \\sim N \\left( p_1 - p_2 , \\sqrt{\\frac{p_1(1 - p_1)}{n_1} + \\frac{p_2(1-p_2)}{n_2}} \\right).[\/latex]<\/p>\n<p>The standardized version is<\/p>\n<p style=\"text-align: center;\">[latex]Z = \\frac{(\\hat{p}_1 - \\hat{p}_2) - (p_1 - p_2)}{\\sqrt{ \\frac{p_1(1 - p_1)}{n_1} + \\frac{p_2(1 -p_2)}{n_2}}} \\sim N(0, 1).[\/latex]<\/p>\n<h2><strong>10.6.3 Two-Proportion z Interval for the Difference Between Two Proportions [latex]p_1 - p_2[\/latex]<\/strong><\/h2>\n<p>A point estimate for the difference between two population proportions [latex](p_1 - p_2)[\/latex] is the difference between the sample proportions [latex](\\hat{p}_1 - \\hat{p}_2)[\/latex].<\/p>\n<div class=\"textbox\">\n<p><strong>Assumptions<\/strong>:<\/p>\n<ol>\n<li>Both samples are simple random samples from their respective populations.<\/li>\n<li>The two samples are independent.<\/li>\n<li>Large samples: the numbers of successes and failures, [latex]x_1, n_1 - x_1, x_2[\/latex], and [latex]n_2 - x_2[\/latex], are all at least 5.<\/li>\n<\/ol>\n<p><strong>Note:\u00a0<\/strong>As was the case with one-proportion inferences, [latex]p_1[\/latex] and [latex]p_2[\/latex] are generally unknown and estimated with [latex]\\hat{p}_1 = \\frac{x_1}{n_1}[\/latex] and [latex]\\hat{p}_2 = \\frac{x_2}{n_2}[\/latex]. 
Thus, since [latex]n_i\\hat{p}_i = n_i \\frac{x_i}{n_i} = x_i[\/latex] and [latex]n_i(1 - \\hat{p}_i) = n_i \\left( 1 - \\frac{x_i}{n_i} \\right) = n_i \\left( \\frac{n_i - x_i}{n_i} \\right) = n_i - x_i[\/latex], each sample is deemed sufficiently large if [latex]n_i \\hat{p}_i = x_i \\geq 5[\/latex] and [latex]n_i (1 - \\hat{p}_i) = n_i - x_i \\geq 5[\/latex] for [latex]i = 1, 2[\/latex].<\/p>\n<p>A [latex](1 - \\alpha) \\times 100\\%[\/latex] confidence interval for the difference between the population proportions [latex](p_1 - p_2)[\/latex] is<\/p>\n<p style=\"text-align: center;\">[latex](\\hat{p}_1 - \\hat{p}_2) \\pm z_{\\alpha \/ 2 } \\sqrt{ \\frac{\\hat{p}_1(1 - \\hat{p}_1)}{n_1} + \\frac{\\hat{p}_2(1 - \\hat{p}_2)}{n_2}}[\/latex]<\/p>\n<p>where [latex]z_{\\alpha \/ 2}[\/latex] is the <em>z<\/em> score such that the area under the standard normal curve to its right is [latex]\\frac{\\alpha}{2}[\/latex]. This is a two-tailed interval.<\/p>\n<p>A [latex](1 - \\alpha) \\times 100\\%[\/latex] upper-tailed confidence interval is<\/p>\n<p style=\"text-align: center;\">[latex]\\left( (\\hat{p}_1 - \\hat{p}_2) - z_{\\alpha } \\sqrt{\\frac{\\hat{p}_1(1 - \\hat{p}_1)}{n_1} + \\frac{\\hat{p}_2(1-\\hat{p}_2)}{n_2}}, 1 \\right),[\/latex]<\/p>\n<p>and a [latex](1 - \\alpha) \\times 100\\%[\/latex] lower-tailed confidence interval is<\/p>\n<p style=\"text-align: center;\">[latex]\\left( -1 , (\\hat{p}_1 - \\hat{p}_2) + z_{\\alpha } \\sqrt{\\frac{\\hat{p}_1(1 - \\hat{p}_1)}{n_1} + \\frac{\\hat{p}_2(1-\\hat{p}_2)}{n_2}} \\right).[\/latex]<\/p>\n<p>Note that the largest possible value of [latex]p_1-p_2[\/latex] is 1 when [latex]p_1=1, p_2=0[\/latex], and the smallest possible value of [latex]p_1-p_2[\/latex] is -1 when [latex]p_1=0, p_2=1.[\/latex]<\/p>\n<\/div>\n<h2><strong>10.6.2 Two-Proportion z Test for the Difference Between Two Proportions [latex]p_1 - p_2[\/latex]<\/strong><\/h2>\n<p>Recall that the population proportion can be viewed as the average of 
the indicator random variable [latex]X = \\begin{cases}  1 & \\text{with probability } p \\\\  0 & \\text{with probability } 1-p  \\end{cases}[\/latex] with a mean [latex]\\mu = p[\/latex] and standard deviation [latex]\\sigma = \\sqrt{p(1-p)}[\/latex]. Note that the standard deviation is a function of [latex]p[\/latex]. For a two-tailed test, the null hypothesis is that the two population proportions are equal, that is, [latex]H_0: p_1 = p_2[\/latex]; consequently, if the null hypothesis is true, it follows that the populations have the same standard deviation. Therefore, similar to a pooled two-sample t-test, we can pool the two samples together to obtain a better estimate of the common standard deviation. If [latex]H_0: p_1 = p_2[\/latex] is true, let [latex]p_1 = p_2 = p_p[\/latex], where [latex]p_p[\/latex] is the common proportion. Then, the test statistic becomes<\/p>\n<p style=\"text-align: center;\">[latex]Z = \\frac{(\\hat{p}_1 - \\hat{p}_2) - (p_1 - p_2)}{\\sqrt{ \\frac{p_1(1 - p_1)}{n_1} + \\frac{p_2(1 -p_2)}{n_2}}} = \\frac{(\\hat{p}_1 - \\hat{p}_2) - 0}{\\sqrt{ \\frac{p_{\\scriptsize p}(1 - p_{\\scriptsize p})}{n_1} + \\frac{p_{\\scriptsize p}(1 -p_{\\scriptsize p})}{n_2}}} = \\frac{\\hat{p}_1 - \\hat{p}_2}{\\sqrt{p_{\\scriptsize p} (1-p_{\\scriptsize p})} \\sqrt{\\frac{1}{n_1} + \\frac{1}{n_2}}}.[\/latex]<\/p>\n<p>The common proportion [latex]p_{\\scriptsize p}[\/latex] is estimated by<\/p>\n<p style=\"text-align: center;\">[latex]\\hat{p}_{\\scriptsize p} = \\frac{x_1 + x_2}{n_1 + n_2}[\/latex].<\/p>\n<div class=\"textbox\">\n<p><strong>Assumptions<\/strong>:<\/p>\n<ol>\n<li>Both samples are simple random samples from their respective populations.<\/li>\n<li>The two samples are independent.<\/li>\n<li>Large samples: the numbers of successes and failures, [latex]x_1, n_1 - x_1, x_2[\/latex], and [latex]n_2 - x_2[\/latex], are all at least 5.<\/li>\n<\/ol>\n<p><strong>Steps to perform a two-proportion z<\/strong><strong> 
test<\/strong>:<\/p>\n<ol>\n<li>Set up the hypotheses:\n<div style=\"margin: auto;\">\n<table style=\"width: 555px; border-spacing: 0px;\" cellpadding=\"0\">\n<thead>\n<tr class=\"shaded\">\n<th scope=\"col\">\n<div style=\"margin: auto;\">Two-tailed<\/div>\n<\/th>\n<th scope=\"col\">\n<div style=\"margin: auto;\">Right-tailed<\/div>\n<\/th>\n<th scope=\"col\">\n<div style=\"margin: auto;\">Left-tailed<\/div>\n<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td valign=\"top\" style=\"width: 203px;\">\n<div style=\"margin: auto;\">[latex]H_0: p_1 = p_2[\/latex]<\/div>\n<\/td>\n<td valign=\"top\" style=\"width: 209px;\">\n<div style=\"margin: auto;\">[latex]H_0: p_1 \\leq p_2[\/latex]<\/div>\n<\/td>\n<td valign=\"top\" style=\"width: 209px;\">\n<div style=\"margin: auto;\">[latex]H_0: p_1 \\geq p_2[\/latex]<\/div>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" style=\"width: 203px;\">\n<div style=\"margin: auto;\">[latex]H_a: p_1 \\neq p_2[\/latex]<\/div>\n<\/td>\n<td valign=\"top\" style=\"width: 209px;\">\n<div style=\"margin: auto;\">[latex]H_a: p_1 \\: \\gt \\: p_2[\/latex]<\/div>\n<\/td>\n<td valign=\"top\" style=\"width: 209px;\">\n<div style=\"margin: auto;\">[latex]H_a: p_1 < p_2[\/latex]<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/li>\n<li>State the significance level [latex]\\alpha[\/latex].<\/li>\n<li>Compute the value of the test statistic:\n<p style=\"text-align: center;\">[latex]z_o = \\frac{\\hat{p}_1 - \\hat{p}_2}{\\sqrt{\\hat{p}_{\\scriptsize p} (1 - \\hat{p}_{\\scriptsize p})} \\sqrt{ \\frac{1}{n_1} + \\frac{1}{n_2}}}[\/latex] with [latex]\\hat{p}_{\\scriptsize p} = \\frac{x_1 + x_2}{n_1 + n_2} , \\hat{p}_1 = \\frac{x_1}{n_1}, \\hat{p}_2 = \\frac{x_2}{n_2}[\/latex].<\/p>\n<\/li>\n<li>Find the P-value <strong>or<\/strong> rejection region.\n<div style=\"margin: auto;\">\n<table class=\"first-col-border\" style=\"width: 100%; height: 90px; border-spacing: 0px;\" cellpadding=\"0\">\n<thead>\n<tr class=\"border-bottom\" style=\"height: 15px;\">\n<td 
style=\"text-align: center; width: 22.790697674418606%; height: 15px;\"><\/td>\n<th style=\"text-align: center; width: 34.255813953488374%; height: 15px;\" scope=\"col\">Two-tailed<\/th>\n<th style=\"text-align: center; width: 24.093023255813954%; height: 15px;\" scope=\"col\">\n<div style=\"margin: auto;\">Right-tailed<\/div>\n<\/th>\n<th style=\"text-align: center; width: 18.744186046511626%; height: 15px;\" scope=\"col\">\n<div style=\"margin: auto;\">Left-tailed<\/div>\n<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"height: 15px;\">\n<th style=\"width: 22.790697674418606%; height: 15px; text-align: left; width: 149px;\" scope=\"row\" valign=\"top\">Null<\/th>\n<td style=\"text-align: center; width: 34.255813953488374%; height: 15px; width: 189px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]H_0: p_1 = p_2[\/latex]<\/div>\n<\/td>\n<td style=\"text-align: center; width: 24.093023255813954%; height: 15px; width: 180px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]H_0: p_1 \\leq p_2[\/latex]<\/div>\n<\/td>\n<td style=\"text-align: center; width: 18.744186046511626%; height: 15px; width: 142px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]H_0: p_1 \\geq p_2[\/latex]<\/div>\n<\/td>\n<\/tr>\n<tr style=\"height: 15px;\">\n<th style=\"width: 22.790697674418606%; height: 15px; text-align: left; width: 149px;\" scope=\"row\" valign=\"top\">Alternative<\/th>\n<td style=\"text-align: center; width: 34.255813953488374%; height: 15px; width: 189px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]H_a: p_1 \\neq p_2[\/latex]<\/div>\n<\/td>\n<td style=\"text-align: center; width: 24.093023255813954%; height: 15px; width: 180px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]H_a: p_1 \\: \\gt \\: p_2[\/latex]<\/div>\n<\/td>\n<td style=\"text-align: center; width: 18.744186046511626%; height: 15px; width: 142px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]H_a: p_1 < p_2[\/latex]<\/div>\n<\/td>\n<\/tr>\n<tr style=\"height: 
15px;\">\n<th style=\"text-align: left; width: 22.790697674418606%; height: 15px; width: 149px;\" scope=\"row\" valign=\"top\">P-value<\/th>\n<td style=\"text-align: center; width: 34.255813953488374%; height: 15px; width: 189px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]2P(Z \\geq |z_o|)[\/latex]<\/div>\n<\/td>\n<td style=\"text-align: center; width: 24.093023255813954%; height: 15px; width: 180px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]P(Z \\geq z_o)[\/latex]<\/div>\n<\/td>\n<td style=\"text-align: center; width: 18.744186046511626%; height: 15px; width: 142px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]P(Z \\leq z_o)[\/latex]<\/div>\n<\/td>\n<\/tr>\n<tr style=\"height: 30px;\">\n<th style=\"text-align: left; width: 22.790697674418606%; height: 30px; width: 149px; height: 23px;\" scope=\"row\" valign=\"top\">Rejection region<\/th>\n<td style=\"text-align: center; width: 34.255813953488374%; height: 30px; width: 189px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]Z \\geq z_{\\alpha \/ 2}[\/latex] or [latex]Z \\leq - z_{\\alpha \/ 2}[\/latex]<\/div>\n<\/td>\n<td style=\"text-align: center; width: 24.093023255813954%; height: 30px; width: 180px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]Z \\geq z_{\\alpha }[\/latex]<\/div>\n<\/td>\n<td style=\"text-align: center; width: 18.744186046511626%; height: 30px; width: 142px;\" valign=\"top\">\n<div style=\"margin: auto;\">[latex]Z \\leq - z_{\\alpha }[\/latex]<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/li>\n<li>Reject the null [latex]H_0[\/latex] if the P-value [latex]\\leq \\alpha[\/latex] or [latex]z_o[\/latex] falls in the rejection region.<\/li>\n<li>Conclusion.<\/li>\n<\/ol>\n<\/div>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Example: Two-Proportion z Test and z Interval<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>The Centers for Disease Control and 
Prevention reported a survey of randomly selected Americans aged 65 and older. They found 411 of 1,012 men and 535 of 1,062 women had arthritis.<\/p>\n<ol type=\"a\">\n<li>Is there any evidence that women are more likely to suffer from arthritis than men? Test at the 1% significance level.<strong><br \/>\n<\/strong>Let [latex]p_1[\/latex] be the proportion of men who have arthritis and [latex]p_2[\/latex] be the proportion of women who have arthritis.<br \/>\n<strong>Check the assumptions<\/strong>:<\/p>\n<ol>\n<li>We have simple random samples.<\/li>\n<li>The two samples are independent.<\/li>\n<li>The numbers of successes and failures, [latex]x_1 = 411, n_1 - x_1 = 601, x_2= 535[\/latex], and [latex]n_2 - x_2 = 527[\/latex], are all at least 5.<\/li>\n<\/ol>\n<p><strong>Steps<\/strong>:<\/p>\n<ol>\n<li>Set up the hypotheses: [latex]H_0: p_1 \\geq p_2[\/latex] versus [latex]H_a: p_1 < p_2[\/latex].<\/li>\n<li>State the significance level [latex]\\alpha = 0.01[\/latex].<\/li>\n<li>The test statistic:\n<div style=\"margin: auto;\">[latex]z_o = \\frac{\\hat{p}_1 - \\hat{p}_2}{\\sqrt{\\hat{p}_p (1 - \\hat{p}_p)} \\sqrt{ \\frac{1}{n_1} + \\frac{1}{n_2}}} = \\frac{0.406 - 0.504}{\\sqrt{0.456 (1 - 0.456)} \\sqrt{ \\frac{1}{1012} + \\frac{1}{1062}}} = -4.479[\/latex]<\/div>\n<p>where<\/p>\n<p style=\"text-align: center;\">[latex]\\hat{p}_p = \\frac{x_1 + x_2}{n_1 + n_2} = \\frac{411 + 535}{1012 + 1062} = 0.456 ,[\/latex] [latex]\\hat{p}_1 = \\frac{x_1}{n_1} = \\frac{411}{1012} = 0.406 , \\hat{p}_2 = \\frac{x_2}{n_2} = \\frac{535}{1062} = 0.504.[\/latex]<\/p>\n<\/li>\n<li>Find the P-value. 
For a left-tailed test, the P-value is the area to the left of the observed test statistic [latex]z_o[\/latex]:<br \/>\nP-value = [latex]P(Z \\leq z_o) = P(Z \\leq - 4.479) \\approx 0[\/latex].<\/li>\n<li>Decision: Since the P-value [latex]\\approx 0 < 0.01 (\\alpha)[\/latex], we should reject the null [latex]H_0[\/latex].<\/li>\n<li>Conclusion: At the 1% significance level, we have sufficient evidence that women are <strong>more likely<\/strong> to suffer from arthritis than men.<\/li>\n<\/ol>\n<\/li>\n<li>Obtain a confidence interval for [latex]p_1 - p_2[\/latex], corresponding to the test in part a).<strong><br \/>\n<\/strong>For a left-tailed test at the 1% significance level, we should obtain a 99% lower-tailed interval. [latex]1 - \\alpha = 0.99 \\Longrightarrow \\alpha = 0.01 \\Longrightarrow z_{\\alpha } = z_{0.01} = 2.33[\/latex].<br \/>\nA 99% lower-tail confidence interval for [latex]p_1 - p_2[\/latex] is<\/p>\n<p style=\"text-align: center;\">[latex]\\left( -1 , (\\hat{p}_1 - \\hat{p}_2) + z_{\\alpha } \\sqrt{\\frac{\\hat{p}_1(1 - \\hat{p}_1)}{n_1} + \\frac{\\hat{p}_2(1-\\hat{p}_2)}{n_2}} \\right)[\/latex]<\/p>\n<p style=\"text-align: center;\">[latex]= \u00a0\\left( - 1 , (0.406 - 0.504) + 2.33 \\sqrt{\\frac{0.406 (1 - 0.406)}{1012} + \\frac{0.504(1 - 0.504)}{1062}} \\right) = ( - 1 , - 0.047)[\/latex].<\/p>\n<p><strong>Interpretation<\/strong>: We are 99% confident that [latex](p_1 - p_2)[\/latex] is below -0.047. That is, we are 99% confident that the proportion of women who have arthritis is at least 0.047 higher than the proportion of men.<\/li>\n<li>Does the interval in part b) support the conclusion of the test in part a)?<strong><br \/>\n<\/strong>Yes. In part a), we reject [latex]H_0[\/latex] and claim [latex]H_a: p_1 < p_2[\/latex] (suggesting men have a smaller proportion than women). 
In part b), the entire interval is below 0, so we are 99% confident that [latex]p_1 - p_2 < 0[\/latex].<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<div style=\"height: 55px; margin-top: 5px;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png\" alt=\"\" width=\"250\" height=\"50\" srcset=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png 250w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity-65x13.png 65w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity-225x45.png 225w\" sizes=\"auto, (max-width: 250px) 100vw, 250px\" \/><\/div>\n<div><\/div>\n<div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Exercises: Inference on Proportions<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>It is believed that there is an association between breast cancer and smoking. 
The following table summarizes the results of an observational study of 200 females classified by their disease and smoking status.<\/p>\n<table class=\"first-col-border last-col-border\" style=\"width: 100%; border-spacing: 0px; margin: auto;\" cellpadding=\"0\">\n<thead>\n<tr class=\"border-bottom\">\n<td valign=\"top\" style=\"width: 142px;\"><\/td>\n<th scope=\"col\" valign=\"top\" style=\"width: 98px;\">\n<div style=\"margin: auto;\"><strong>Smoker <\/strong><\/div>\n<\/th>\n<th scope=\"col\" valign=\"top\" style=\"width: 109px;\">\n<div style=\"margin: auto;\"><strong>Non-smoker <\/strong><\/div>\n<\/th>\n<th scope=\"col\" valign=\"top\" style=\"width: 76px;\">\n<div style=\"margin: auto;\"><strong>Total<\/strong><\/div>\n<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th scope=\"row\" valign=\"top\" style=\"width: 142px;\"><strong>Breast Cancer <\/strong><\/th>\n<td valign=\"top\" style=\"width: 98px;\">\n<div style=\"margin: auto;\">10<\/div>\n<\/td>\n<td valign=\"top\" style=\"width: 109px;\">\n<div style=\"margin: auto;\">30<\/div>\n<\/td>\n<td valign=\"top\" style=\"width: 76px;\">\n<div style=\"margin: auto;\">40<\/div>\n<\/td>\n<\/tr>\n<tr class=\"border-bottom\">\n<th scope=\"row\" valign=\"top\" style=\"width: 142px;\"><strong>Cancer Free <\/strong><\/th>\n<td valign=\"top\" style=\"width: 98px;\">\n<div style=\"margin: auto;\">20<\/div>\n<\/td>\n<td valign=\"top\" style=\"width: 109px;\">\n<div style=\"margin: auto;\">140<\/div>\n<\/td>\n<td valign=\"top\" style=\"width: 76px;\">\n<div style=\"margin: auto;\">160<\/div>\n<\/td>\n<\/tr>\n<tr>\n<th scope=\"row\" valign=\"top\" style=\"width: 142px;\"><strong>Total<\/strong><\/th>\n<td valign=\"top\" style=\"width: 98px;\">\n<div style=\"margin: auto;\">30<\/div>\n<\/td>\n<td valign=\"top\" style=\"width: 109px;\">\n<div style=\"margin: auto;\">170<\/div>\n<\/td>\n<td valign=\"top\" style=\"width: 76px;\">\n<div style=\"margin: auto;\">200<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ol start=\"1\" 
type=\"a\">\n<li>Obtain a 99% confidence interval for the proportion of females with breast cancer.<\/li>\n<li>Obtain the minimum sample size <em>n<\/em> needed so that we are 95% confident that the error is at most 0.02 when [latex]\\hat{p}[\/latex] is used to estimate <em>p<\/em>. Use the conservative estimate [latex]\\hat{p} = 0.5[\/latex].<\/li>\n<li>Test at the 5% significance level whether the proportion of females with breast cancer is higher among smokers than non-smokers.<\/li>\n<li>Obtain a confidence interval corresponding to the test in part c).<\/li>\n<\/ol>\n<details>\n<summary>Show\/Hide Answer<\/summary>\n<ol type=\"a\">\n<li>Obtain a 99% confidence interval for the proportion of females with breast cancer.<br \/>\nThe point estimate for the proportion of females with breast cancer is [latex]\\hat p = \\frac{x}{n} = \\frac{40}{200} = 0.2[\/latex].<\/p>\n<p style=\"text-align: center;\">[latex]1 - \\alpha = 0.99 \\Longrightarrow \\alpha = 0.01 \\Longrightarrow z_{\\alpha \/ 2} = z_{0.005} = 2.575[\/latex].<\/p>\n<p>The 99% confidence interval for the proportion of females with breast cancer is<\/p>\n<p style=\"text-align: center;\">[latex]\\hat{p} \\pm z_{\\alpha \/ 2} \\sqrt{\\frac{\\hat{p} (1 - \\hat{p})}{n}} = 0.2 \\pm 2.575 \\times \\sqrt{\\frac{0.2(1-0.2)}{200}} = (0.127, 0.273)[\/latex].<\/p>\n<p><strong>Interpretation<\/strong>: We are 99% confident that the proportion of females with breast cancer is somewhere between 0.127 and 0.273.<\/li>\n<li>Obtain the minimum sample size <em>n<\/em> needed so that we are 95% confident that the error is at most 0.02 when [latex]\\hat{p}[\/latex] is used to estimate <em>p<\/em>. 
Use the conservative estimate [latex]\\hat{p} = 0.5[\/latex].\n<p style=\"text-align: center;\">[latex]1 - \\alpha = 0.95 \\Longrightarrow \\alpha = 0.05 \\Longrightarrow z_{\\alpha \/ 2} = z_{0.025} = 1.96[\/latex].<\/p>\n<p style=\"text-align: center;\">[latex]n = 0.25 \\left( \\frac{z_{\\alpha \/2 }}{E} \\right)^2 = 0.25 \\left( \\frac{1.96}{0.02} \\right)^2 = 2401, \\quad \\text{so } n=2401[\/latex].<\/p>\n<\/li>\n<li>Test at the 5% significance level whether the proportion of females with breast cancer is higher among smokers than non-smokers.<br \/>\nLet [latex]p_1[\/latex] be the proportion of females with breast cancer among smokers and [latex]p_2[\/latex] be the proportion of females with breast cancer among non-smokers.<br \/>\n<strong>Check the assumptions<\/strong>:<\/p>\n<ol>\n<li>We have simple random samples.<\/li>\n<li>The two samples are independent.<\/li>\n<li>The numbers of successes and failures, [latex]x_1 = 10, n_1 - x_1 = 20, x_2 = 30[\/latex], and [latex]n_2 - x_2 = 140[\/latex], are all at least 5.<\/li>\n<\/ol>\n<p><strong>Steps<\/strong>:<\/p>\n<ol>\n<li>Set up the hypotheses: [latex]H_0: p_1 \\leq p_2[\/latex] versus [latex]H_a: p_1 \\: \\gt \\: p_2[\/latex].<\/li>\n<li>The significance level [latex]\\alpha = 0.05[\/latex].<\/li>\n<li>Compute the test statistic:\n<p style=\"text-align: center;\">[latex]z_o = \\frac{\\hat{p}_1 - \\hat{p}_2}{\\sqrt{\\hat{p}_p (1 - \\hat{p}_p)} \\sqrt{\\frac{1}{n_1} + \\frac{1}{n_2} }} = \\frac{0.333 - 0.176}{\\sqrt{0.2 (1-0.2)} \\sqrt{ \\frac{1}{30} + \\frac{1}{170}}} = 1.982[\/latex],<\/p>\n<p>where<\/p>\n<p style=\"text-align: center;\">[latex]\\hat{p}_p = \\frac{x_1 + x_2}{n_1 + n_2} = \\frac{10 + 30}{30 + 170} = 0.2,[\/latex] [latex]\\hat{p}_1 = \\frac{x_1}{n_1} = \\frac{10}{30} = 0.333, \\hat{p}_2 = \\frac{x_2}{n_2} = \\frac{30}{170} = 0.176[\/latex].<\/p>\n<\/li>\n<li>Find the P-value. 
For a right-tailed test, the P-value is the area to the right of the observed test statistic [latex]z_o[\/latex].\n<p style=\"text-align: center;\">P-value = [latex]P(Z \\geq z_o) = P(Z \\geq 1.982) = P( Z \\leq -1.982) = 0.0239[\/latex].<\/p>\n<\/li>\n<li>Decision: Since the P-value [latex]=0.0239 < 0.05(\\alpha)[\/latex], we should reject the null [latex]H_0[\/latex].<\/li>\n<li>Conclusion: At the 5% significance level, we have sufficient evidence that the proportion of females with breast cancer is higher among smokers than non-smokers.<\/li>\n<\/ol>\n<\/li>\n<li>Obtain a confidence interval corresponding to the test in part c).<br \/>\nFor a right-tailed test at the 5% significance level, we should obtain a 95% upper-tailed confidence interval:<\/p>\n<p style=\"text-align: center;\">[latex]\\left( (\\hat{p}_1 - \\hat{p}_2) - z_\\alpha \\sqrt{\\frac{\\hat{p}_1(1 - \\hat{p}_1)}{n_1} + \\frac{\\hat{p}_2(1-\\hat{p}_2)}{n_2}} , 1\\right)[\/latex]<\/p>\n<p style=\"text-align: center;\">[latex]= \u00a0\\left( (0.333 - 0.176) - 1.645 \\sqrt{\\frac{0.333 (1 - 0.333)}{30} + \\frac{0.176(1 - 0.176)}{170}}, 1 \\right)= ( 0.0075 , 1)[\/latex].<\/p>\n<p><strong>Interpretation<\/strong>: We are 95% confident that the proportion of females with breast cancer is at least 0.0075 higher for smokers than 
non-smokers.<\/li>\n<\/ol>\n<\/details>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":19,"menu_order":6,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-1112","chapter","type-chapter","status-publish","hentry"],"part":1075,"_links":{"self":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/1112","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/users\/19"}],"version-history":[{"count":46,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/1112\/revisions"}],"predecessor-version":[{"id":5520,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/1112\/revisions\/5520"}],"part":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/parts\/1075"}],"metadata":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/1112\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/media?parent=1112"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapter-type?post=1112"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/contributor?post=1112"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/license?post=1112"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}