9.2 Two-Sample t Test and t Interval Based on Two Independent Samples
Two-sample t-tests are used to test hypotheses regarding the difference between two population means. Depending on whether the two population standard deviations (σ1 and σ2) are equal or not, we use the pooled or the non-pooled two-sample t-test and t interval, respectively. Minor advantages of the pooled t-test are a slightly narrower confidence interval, a slightly more powerful test, and a simpler formula for the degrees of freedom. However, the pooled t-test is valid only when the two population standard deviations are close; otherwise, it gives poor results. Therefore, we recommend using the non-pooled t-test unless we are quite confident that σ1 = σ2, which is very difficult to verify.
9.2.1 Non-Pooled Two-Sample t Test and t Interval
Assumptions:
- Simple random samples
- Two samples are independent
- Normal populations or large sample sizes (n1≥30,n2≥30)
Steps:
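- Set up the hypotheses:

| Two-tailed test | Right (upper)-tailed test | Left (lower)-tailed test |
|---|---|---|
| $H_0: \mu_1-\mu_2=\Delta_0$ | $H_0: \mu_1-\mu_2\le\Delta_0$ | $H_0: \mu_1-\mu_2\ge\Delta_0$ |
| $H_a: \mu_1-\mu_2\ne\Delta_0$ | $H_a: \mu_1-\mu_2>\Delta_0$ | $H_a: \mu_1-\mu_2<\Delta_0$ |

Note that $\Delta_0$ can be zero or any value you want to test. In most cases, however, $\Delta_0=0$.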
- State the significance level α.
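- Compute the value of the test statistic: $$t_o=\frac{(\bar{x}_1-\bar{x}_2)-\Delta_0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}$$ with $$df=\frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1-1}\left(\frac{s_1^2}{n_1}\right)^2+\frac{1}{n_2-1}\left(\frac{s_2^2}{n_2}\right)^2},$$ rounded down to the nearest integer, or $df=\min\{n_1-1,n_2-1\}$.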
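- Use the t-score table (Table IV) to find the P-value or rejection region.

| | Two-tailed | Right-tailed | Left-tailed |
|---|---|---|---|
| Null | $H_0: \mu_1-\mu_2=\Delta_0$ | $H_0: \mu_1-\mu_2\le\Delta_0$ | $H_0: \mu_1-\mu_2\ge\Delta_0$ |
| Alternative | $H_a: \mu_1-\mu_2\ne\Delta_0$ | $H_a: \mu_1-\mu_2>\Delta_0$ | $H_a: \mu_1-\mu_2<\Delta_0$ |
| P-value | $2P(t\ge\lvert t_o\rvert)$ | $P(t\ge t_o)$ | $P(t\le t_o)$ |
| Rejection region | $t\ge t_{\alpha/2}$ or $t\le -t_{\alpha/2}$ | $t\ge t_\alpha$ | $t\le -t_\alpha$ |

- Decision: Reject the null $H_0$ if P-value $\le\alpha$ or $t_o$ falls in the rejection region.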
- Conclusion.
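The test statistic and degrees of freedom in the steps above can be computed directly from summary statistics. The following is a minimal Python sketch rather than a definitive implementation: the function name `welch_t_from_summary` and its parameter names are illustrative assumptions, and the sample call uses the summary statistics of the attendance example in this section.

```python
import math

def welch_t_from_summary(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Return (t_o, df, df_floor) for the non-pooled two-sample t test.

    Hypothetical helper for illustration; inputs are the sample means,
    sample standard deviations, and sample sizes of the two samples.
    """
    v1 = s1 ** 2 / n1          # s1^2 / n1
    v2 = s2 ** 2 / n2          # s2^2 / n2
    se = math.sqrt(v1 + v2)    # estimated standard error of xbar1 - xbar2
    t_o = ((xbar1 - xbar2) - delta0) / se
    # Degrees of freedom from the formula in step 3, before rounding down
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_o, df, math.floor(df)

# Attendees: xbar1 = 67, s1 = 17, n1 = 135; non-attendees: xbar2 = 49, s2 = 18, n2 = 35
t_o, df, df_floor = welch_t_from_summary(67, 17, 135, 49, 18, 35)
print(round(t_o, 3), round(df, 2), df_floor)  # 5.332 50.85 50
```

The P-value or rejection region is still read from Table IV using the rounded-down degrees of freedom.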
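A $(1-\alpha)\times 100\%$ two-sample t confidence interval for $\mu_1-\mu_2$ is

| Two-tailed | Right-tailed | Left-tailed |
|---|---|---|
| $H_0: \mu_1-\mu_2=\Delta_0$ | $H_0: \mu_1-\mu_2\le\Delta_0$ | $H_0: \mu_1-\mu_2\ge\Delta_0$ |
| $H_a: \mu_1-\mu_2\ne\Delta_0$ | $H_a: \mu_1-\mu_2>\Delta_0$ | $H_a: \mu_1-\mu_2<\Delta_0$ |
| $(\bar{x}_1-\bar{x}_2)\pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$ | $\left((\bar{x}_1-\bar{x}_2)-t_\alpha\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}},\ \infty\right)$ | $\left(-\infty,\ (\bar{x}_1-\bar{x}_2)+t_\alpha\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\right)$ |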
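The three interval forms in the table differ only in which side is bounded and in which critical value is used. As a rough sketch, assuming the critical value is looked up in Table IV and passed in by the reader, they could be coded as follows; the function name `two_sample_t_interval` and the `kind` labels are illustrative assumptions, and the sample call uses the attendance example of this section.

```python
import math

def two_sample_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit, kind="two-sided"):
    """Non-pooled t interval for mu1 - mu2 from summary statistics.

    t_crit is the table value: t_{alpha/2} for kind="two-sided",
    t_alpha for kind="lower-bound" (matches a right-tailed test) or
    kind="upper-bound" (matches a left-tailed test).
    The names here are illustrative, not from the text.
    """
    diff = xbar1 - xbar2
    margin = t_crit * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    if kind == "two-sided":
        return (diff - margin, diff + margin)
    if kind == "lower-bound":
        return (diff - margin, math.inf)
    if kind == "upper-bound":
        return (-math.inf, diff + margin)
    raise ValueError("kind must be 'two-sided', 'lower-bound', or 'upper-bound'")

# Right-tailed test at alpha = 0.01 with df = 50, so t_0.01 = 2.403 (Table IV)
low, high = two_sample_t_interval(67, 17, 135, 49, 18, 35, t_crit=2.403, kind="lower-bound")
print(round(low, 3), high)  # 9.887 inf
```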
Example: Two-Sample Non-Pooled t-Test and t Interval
Some students attend class regularly, but some do not. An instructor wants to compare the class averages for those who attend lectures regularly (μ1) with those who do not (μ2). A simple random sample of size n1=135 is selected from the attendees and a simple random sample of size n2=35 is taken from the non-attendees. The sample mean and sample standard deviation for attendees are ˉx1=67,s1=17; and for non-attendees are ˉx2=49,s2=18.
- Test at the 1% significance level whether those who attend lectures have a higher average, i.e., μ1>μ2 or μ1−μ2>0.
Check the assumptions:
- We have simple random samples from attendees and non-attendees.
- The two samples are independent.
- We do not have the data, so we cannot check whether the two populations are normally distributed using a normal probability plot (Q-Q plot); however, we have large sample sizes with n1=135>30,n2=35>30.
Therefore, the assumptions are met.
Steps:
- Set up the hypotheses: $H_0: \mu_1-\mu_2\le 0$ versus $H_a: \mu_1-\mu_2>0$. This is a right-tailed test.
- The significance level is $\alpha=0.01$.
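- Compute the value of the test statistic:
$$t_o=\frac{(\bar{x}_1-\bar{x}_2)-\Delta_0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}=\frac{(67-49)-0}{\sqrt{\frac{17^2}{135}+\frac{18^2}{35}}}=5.332$$
with
$$df=\frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1-1}\left(\frac{s_1^2}{n_1}\right)^2+\frac{1}{n_2-1}\left(\frac{s_2^2}{n_2}\right)^2}=\frac{\left(\frac{17^2}{135}+\frac{18^2}{35}\right)^2}{\frac{1}{135-1}\left(\frac{17^2}{135}\right)^2+\frac{1}{35-1}\left(\frac{18^2}{35}\right)^2}=50.85,$$
rounded down to $df=50$.
- Find the P-value. For a right-tailed test with the observed test statistic $t_o=5.332$, the P-value is the area to the right of $t_o$, i.e., P-value $=P(t\ge t_o)=P(t\ge 5.332)<0.0005$, since $t_o=5.332>3.496$ ($t_{0.0005}$ with $df=50$).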
- Decision: Since the P-value <0.0005<0.01(α), reject the null hypothesis H0.
- Conclusion: At the 1% significance level, the data provide sufficient evidence that those who attend lectures have a higher average.
If using the critical value approach, steps 1-3 are the same, steps 4-6 become:
- Rejection region: With $\alpha=0.01$ and $df=50$, $t_\alpha=t_{0.01}=2.403$. For a right-tailed test, the critical value is 2.403, and the rejection region is to the right of 2.403.
- Decision: Since the observed value $t_o=5.332>2.403$ falls in the rejection region, we reject the null hypothesis $H_0$.
- Conclusion: At the 1% significance level, the data provide sufficient evidence that those who attend lectures have a higher average.
- Obtain a confidence interval for the difference between the class average for attendees and non-attendees μ1−μ2 corresponding to the test in part a).
Part a) contains a right-tailed test at the 1% significance level. Therefore, we should obtain a 99% upper-tailed interval: $\left((\bar{x}_1-\bar{x}_2)-t_\alpha\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}},\ \infty\right)$.
With $\alpha=0.01$ and $df=50$, $t_\alpha=t_{0.01}=2.403$.
The lower bound for the upper-tailed interval is: $$(\bar{x}_1-\bar{x}_2)-t_\alpha\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}=(67-49)-2.403\times\sqrt{\frac{17^2}{135}+\frac{18^2}{35}}=9.887.$$
Thus, the corresponding 99% confidence interval for $\mu_1-\mu_2$ is $(9.887,\infty)$.
Interpretation: we are 99% confident that the class average of attendees is at least 9.887 marks higher than that of non-attendees.
- Does the interval in part b) support the conclusion in part a)?
In part a), we reject $H_0$ at the 1% significance level and claim that $\mu_1-\mu_2>0$.
In part b), since the entire interval is above 0, we can claim that $\mu_1-\mu_2>0$ with 99% confidence, which supports the results obtained in part a).
- Based on the interval obtained in part b), can we claim that the class average of attendees is at least 5 marks higher than that of the non-attendees? How about 10 marks higher?
We can claim that the class average of attendees is at least 5 marks higher than that of the non-attendees since the entire interval is above 5. However, we cannot claim that the class average of attendees is at least 10 marks higher than that of the non-attendees since the interval contains values below 10 (its lower bound is 9.887).
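Table IV only bounds the P-value in part a) of this example (P-value < 0.0005). As a rough numerical check, the tail area can be approximated by integrating the t density with Simpson's rule. This is a sketch using only the Python standard library; the helper names `t_pdf` and `t_right_tail` are illustrative assumptions, not functions from the text or from any package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_right_tail(t_o, df, steps=2000):
    """Approximate P(t >= t_o) with composite Simpson's rule on [0, |t_o|]."""
    b = abs(t_o)
    h = b / steps                            # steps must be even
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3                  # P(0 <= t <= |t_o|)
    upper = 0.5 - central                    # P(t >= |t_o|), by symmetry
    return upper if t_o >= 0 else 1.0 - upper

# Right-tailed test from part a): t_o = 5.332 with df = 50
p_value = t_right_tail(5.332, 50)
print(p_value < 0.0005)  # True, consistent with the Table IV bound
```

For a left-tailed test, the same helper gives the P-value as `1 - t_right_tail(t_o, df)`, and for a two-tailed test as `2 * t_right_tail(abs(t_o), df)`.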
9.2.2 Pooled Two-Sample t Test and t Interval
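If the two population standard deviations are equal, i.e., $\sigma_1=\sigma_2=\sigma$, we can pool the two samples together to get a better estimate of the common standard deviation $\sigma$:
$$\hat{\sigma}=s_p=\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{(n_1-1)+(n_2-1)}}$$
where the term $(n_1-1)s_1^2=\sum_{\text{sample 1}}(x-\bar{x}_1)^2$ is the variation of the data within sample 1, and $(n_2-1)s_2^2=\sum_{\text{sample 2}}(x-\bar{x}_2)^2$ is the variation of the data within sample 2.
Recall that the standard deviation of $\bar{X}_1-\bar{X}_2$ is $\sigma_{\bar{X}_1-\bar{X}_2}=\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$. Thus, if $\sigma_1=\sigma_2=\sigma$, then $\sigma_{\bar{X}_1-\bar{X}_2}$ reduces to $\sqrt{\frac{\sigma^2}{n_1}+\frac{\sigma^2}{n_2}}=\sigma\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$. Estimating $\sigma$ with $s_p$ leads to the pooled test statistic:
$$t=\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\sim t \text{ distribution}$$
with $df=(n_1-1)+(n_2-1)=n_1+n_2-2$.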
The assumption σ1=σ2 is very difficult to verify. Some textbooks suggest a rule of thumb:
If the ratio of the larger to the smaller sample standard deviation is less than 2, then the assumption is considered to be reasonable, i.e., $\frac{\max\{s_1,s_2\}}{\min\{s_1,s_2\}}<2$.
Assumptions:
- Simple random samples.
- Independent samples.
- Normal populations or large sample sizes (n1≥30,n2≥30).
- Equal population standard deviations. This assumption is reasonable if $\frac{\max\{s_1,s_2\}}{\min\{s_1,s_2\}}<2$.
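The rule of thumb in the last assumption is easy to automate. The sketch below is only an illustration: the function name `equal_sd_is_plausible` and the `threshold` parameter are assumptions, and the two calls use the sample standard deviations from the attendance example and from the exercise at the end of this section.

```python
def equal_sd_is_plausible(s1, s2, threshold=2.0):
    """Rule of thumb: larger sample SD divided by smaller sample SD < threshold.

    Returns (decision, ratio). Hypothetical helper for illustration only.
    """
    if s1 <= 0 or s2 <= 0:
        raise ValueError("sample standard deviations must be positive")
    ratio = max(s1, s2) / min(s1, s2)
    return ratio < threshold, ratio

print(equal_sd_is_plausible(17, 18))  # ratio 18/17 is below 2: pooling is reasonable
print(equal_sd_is_plausible(85, 40))  # ratio 85/40 is above 2: use the non-pooled test
```

Passing the check does not prove that σ1 = σ2; it only indicates that the sample standard deviations are not far enough apart to rule the assumption out.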
Steps:
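- Set up the hypotheses:

| Two-tailed test | Right (upper)-tailed test | Left (lower)-tailed test |
|---|---|---|
| $H_0: \mu_1-\mu_2=\Delta_0$ | $H_0: \mu_1-\mu_2\le\Delta_0$ | $H_0: \mu_1-\mu_2\ge\Delta_0$ |
| $H_a: \mu_1-\mu_2\ne\Delta_0$ | $H_a: \mu_1-\mu_2>\Delta_0$ | $H_a: \mu_1-\mu_2<\Delta_0$ |

Note that $\Delta_0$ can be zero or any value you want to test.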
- State the significance level α.
- Compute the value of the test statistic: $$t_o=\frac{(\bar{x}_1-\bar{x}_2)-\Delta_0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}},$$ with $df=n_1+n_2-2$ and $s_p=\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{(n_1-1)+(n_2-1)}}$.
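- Use the t-score table (Table IV) to find the P-value or rejection region.

| | Two-tailed | Right-tailed | Left-tailed |
|---|---|---|---|
| Null | $H_0: \mu_1-\mu_2=\Delta_0$ | $H_0: \mu_1-\mu_2\le\Delta_0$ | $H_0: \mu_1-\mu_2\ge\Delta_0$ |
| Alternative | $H_a: \mu_1-\mu_2\ne\Delta_0$ | $H_a: \mu_1-\mu_2>\Delta_0$ | $H_a: \mu_1-\mu_2<\Delta_0$ |
| P-value | $2P(t\ge\lvert t_o\rvert)$ | $P(t\ge t_o)$ | $P(t\le t_o)$ |
| Rejection region | $t\ge t_{\alpha/2}$ or $t\le -t_{\alpha/2}$ | $t\ge t_\alpha$ | $t\le -t_\alpha$ |

- Decision: Reject the null $H_0$ if P-value $\le\alpha$ or $t_o$ falls in the rejection region.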
- Conclusion.
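The pooled computation in step 3 differs from the non-pooled one only in the standard error and the degrees of freedom. A minimal sketch, assuming only summary statistics are available, is shown below; the function name `pooled_t_from_summary` is an illustrative assumption, and the sample call uses the attendance data that appear in the example of this section.

```python
import math

def pooled_t_from_summary(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Return (t_o, df, s_p) for the pooled two-sample t test.

    Hypothetical helper for illustration; it assumes sigma1 = sigma2.
    """
    df = n1 + n2 - 2
    # Pooled estimate of the common standard deviation
    s_p = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    se = s_p * math.sqrt(1 / n1 + 1 / n2)    # standard error of xbar1 - xbar2
    t_o = ((xbar1 - xbar2) - delta0) / se
    return t_o, df, s_p

t_o, df, s_p = pooled_t_from_summary(67, 17, 135, 49, 18, 35)
print(round(t_o, 3), df, round(s_p, 3))  # 5.515 168 17.207
```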
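A $(1-\alpha)\times 100\%$ two-sample t confidence interval for $\mu_1-\mu_2$ is

| Two-tailed | Right-tailed | Left-tailed |
|---|---|---|
| $H_0: \mu_1-\mu_2=\Delta_0$ | $H_0: \mu_1-\mu_2\le\Delta_0$ | $H_0: \mu_1-\mu_2\ge\Delta_0$ |
| $H_a: \mu_1-\mu_2\ne\Delta_0$ | $H_a: \mu_1-\mu_2>\Delta_0$ | $H_a: \mu_1-\mu_2<\Delta_0$ |
| $(\bar{x}_1-\bar{x}_2)\pm t_{\alpha/2}\,s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$ | $\left((\bar{x}_1-\bar{x}_2)-t_\alpha\,s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}},\ \infty\right)$ | $\left(-\infty,\ (\bar{x}_1-\bar{x}_2)+t_\alpha\,s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\right)$ |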
Example: Pooled Two-Sample t Test and Interval
Some students attend class regularly, but some do not. An instructor wants to compare the class averages for those who attend lectures regularly (μ1) with those who do not (μ2). A simple random sample of size n1=135 is selected from the attendees, and a simple random sample of size n2=35 is taken from the non-attendees. The sample mean and sample standard deviation for attendees are ˉx1=67,s1=17; and for non-attendees are ˉx2=49,s2=18.
- Is it reasonable to conduct a pooled two-sample t-test to test whether those who attend lectures have a higher average? If yes, run the test at the 1% significance level.
Check the assumptions:
- We have simple random samples.
- The two samples are independent.
- We have large sample sizes (n1=135>30,n2=35>30).
- Equal standard deviations: $\frac{\max\{s_1,s_2\}}{\min\{s_1,s_2\}}=\frac{\max\{17,18\}}{\min\{17,18\}}=\frac{18}{17}<2$.
It is reasonable to conduct a pooled two-sample t-test since all the assumptions for pooled two-sample t-test are met.
Steps:
- Set up the hypotheses: $H_0: \mu_1-\mu_2\le 0$ versus $H_a: \mu_1-\mu_2>0$. This is a right-tailed test.
- The significance level is $\alpha=0.01$.
- Compute the value of the test statistic:
$$t_o=\frac{(\bar{x}_1-\bar{x}_2)-\Delta_0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}=\frac{(67-49)-0}{17.207\sqrt{\frac{1}{135}+\frac{1}{35}}}=5.515$$
with $df=n_1+n_2-2=135+35-2=168$ (not given in Table IV, use $df=100$), and with
$$s_p=\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}=\sqrt{\frac{(135-1)17^2+(35-1)18^2}{135+35-2}}=17.207.$$
- Find the P-value. For a right-tailed test with the observed test statistic $t_o=5.515$, the P-value is the area to the right of $t_o$, i.e., P-value $=P(t\ge t_o)=P(t\ge 5.515)<0.0005$, since $t_o=5.515>3.390$ ($t_{0.0005}$ with $df=100$).
- Decision: Since the P-value $<0.0005<0.01$ ($\alpha$), reject the null hypothesis $H_0$.
- Conclusion: At the 1% significance level, the data provide sufficient evidence that those who attend lectures have a higher average.
- Obtain a confidence interval for the difference between the class average for attendees and non-attendees, μ1−μ2, corresponding to the test in part a).
Part a) contains a right-tailed test at the 1% significance level. Therefore, we should obtain a 99% upper-tailed interval $\left((\bar{x}_1-\bar{x}_2)-t_\alpha\,s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}},\ \infty\right)$, with $\alpha=0.01$, $df=100$, and $t_{0.01}=2.364$. The lower bound for the upper-tailed interval is $$(\bar{x}_1-\bar{x}_2)-t_\alpha\,s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}=(67-49)-2.364\times 17.207\times\sqrt{\frac{1}{135}+\frac{1}{35}}=10.284.$$ Thus, the corresponding 99% confidence interval for $\mu_1-\mu_2$ is $(10.284,\infty)$.
Interpretation: we are 99% confident that the class average of attendees is at least 10.284 marks higher than that of non-attendees.
- Based on the confidence interval in part b), can we claim that the class average of attendees is at least 10 marks higher than that of the non-attendees?
Yes, since the entire interval is above 10, we can claim that μ1−μ2>10.
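The pooled example substitutes df = 100 because df = 168 is not listed in Table IV. Since the critical value decreases as the degrees of freedom grow, this substitution gives a slightly larger critical value and hence a slightly lower (more cautious) lower bound. As a rough check, the sketch below approximates critical values numerically with Simpson's rule and bisection, using only the standard library. The helper names `t_pdf`, `t_right_tail`, and `t_critical` are illustrative assumptions, and the results are numerical approximations, not table values.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_right_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 with composite Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df, lo=0.0, hi=50.0, tol=1e-8):
    """Find t_alpha with P(T >= t_alpha) = alpha by bisection (0 < alpha < 0.5)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > alpha:
            lo = mid          # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

t_100 = t_critical(0.01, 100)   # should be close to the Table IV value 2.364
t_168 = t_critical(0.01, 168)   # the df actually implied by the pooled test
se = 17.207 * math.sqrt(1 / 135 + 1 / 35)   # s_p = 17.207 from the example
print(round(t_100, 3), round(t_168, 3))
print(round(18 - t_100 * se, 3), round(18 - t_168 * se, 3))  # lower bounds
```

Under these assumptions, the lower bound computed with df = 168 comes out slightly above the one computed with df = 100, so the answer to part c) does not change.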
Exercise: Two-Sample Test
The following table summarizes the operative times of neurosurgeries conducted by a dynamic system (Z-plate) and a static system (ALPS plate).
Table 9.1: Operating Time of Dynamic and Static System

| Dynamic | Static |
|---|---|
| $\bar{x}_1=400$ | $\bar{x}_2=480$ |
| $s_1=85$ | $s_2=40$ |
| $n_1=60$ | $n_2=30$ |
- Test at the 5% significance level whether the dynamic system (Z-plate) has a lower mean operative time than the static system (ALPS plate).
- Obtain a confidence interval for the difference in mean operative time between the dynamic and the static systems, μ1−μ2, corresponding to the test in part a).
- Check the assumptions:
- We have simple random samples.
- The two samples are independent.
- We have large sample sizes n1=60>30,n2=30≥30.
- Equal standard deviations: $\frac{\max\{s_1,s_2\}}{\min\{s_1,s_2\}}=\frac{\max\{85,40\}}{\min\{85,40\}}=\frac{85}{40}>2$.
Since the equal standard deviation assumption is violated, we should use the non-pooled two-sample t-test.
Steps:
- Set up the hypotheses: H0:μ1−μ2≥0 versus Ha:μ1−μ2<0. This is a left-tailed test.
- The significance level is α=0.05.
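- Compute the value of the test statistic: $$t_o=\frac{(\bar{x}_1-\bar{x}_2)-\Delta_0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}=\frac{(400-480)-0}{\sqrt{\frac{85^2}{60}+\frac{40^2}{30}}}=-6.069$$ with $$df=\frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1-1}\left(\frac{s_1^2}{n_1}\right)^2+\frac{1}{n_2-1}\left(\frac{s_2^2}{n_2}\right)^2}=\frac{\left(\frac{85^2}{60}+\frac{40^2}{30}\right)^2}{\frac{1}{60-1}\left(\frac{85^2}{60}\right)^2+\frac{1}{30-1}\left(\frac{40^2}{30}\right)^2}=87.797,$$ rounded down to $df=87$.
- Find the P-value. For a left-tailed test with the observed test statistic $t_o=-6.069$, the P-value is the area to the left of $t_o$, i.e., P-value $=P(t\le t_o)=P(t\le -6.069)=P(t\ge 6.069)<0.0005$, since $6.069>3.406$ ($t_{0.0005}$).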
- Decision: Since the P-value <0.0005<0.05(α), reject the null hypothesis H0.
- Conclusion: At the 5% significance level, the data provide sufficient evidence that the dynamic system (Z-plate) has a lower mean operative time than the static system (ALPS plate).
- For a left-tailed test at the 5% significance level, the corresponding confidence interval is a 95% lower-tailed interval $\left(-\infty,\ (\bar{x}_1-\bar{x}_2)+t_\alpha\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\right)$ with the upper confidence bound $$(\bar{x}_1-\bar{x}_2)+t_\alpha\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}=(400-480)+1.663\times\sqrt{\frac{85^2}{60}+\frac{40^2}{30}}=-58.079.$$ Note that for $df=87$, $t_{0.05}=1.663$. Therefore, the 95% lower-tailed interval is $(-\infty,-58.079)$.
Interpretation: we are 95% confident that the difference in mean operative time between the dynamic and the static systems, $\mu_1-\mu_2$, is below $-58.079$. Since the entire interval is below 0, we can claim that $\mu_1-\mu_2<0$, which supports the conclusion of the hypothesis test in part a).
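The non-pooled procedure also allows the simpler choice df = min{n1−1, n2−1}. As a hedged illustration, the sketch below reruns this exercise with both choices of degrees of freedom. The critical value 1.663 is the one used in the answer above; the value 1.699 for df = 29 is taken from a standard t table and is an assumption in the sense that it is not quoted in this section. Variable names are illustrative.

```python
import math

xbar1, s1, n1 = 400, 85, 60   # dynamic system (Table 9.1)
xbar2, s2, n2 = 480, 40, 30   # static system (Table 9.1)

v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
se = math.sqrt(v1 + v2)                     # non-pooled standard error
t_o = (xbar1 - xbar2) / se                  # Delta_0 = 0

df_formula = math.floor((v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)))
df_simple = min(n1 - 1, n2 - 1)

# (df, t_0.05): 1.663 is used in the answer above; 1.699 is a standard
# t-table value for df = 29 (assumption: not quoted in this section).
choices = [(df_formula, 1.663), (df_simple, 1.699)]

results = []
for df, t_crit in choices:
    upper = (xbar1 - xbar2) + t_crit * se   # upper bound of the 95% lower-tailed interval
    reject = t_o <= -t_crit                 # left-tailed rejection region
    results.append((df, round(upper, 3), reject))
    print(df, round(t_o, 3), round(upper, 3), reject)
# 87 -6.069 -58.079 True
# 29 -6.069 -57.605 True
```

Under these assumptions, both choices lead to the same decision; the simpler df gives a slightly wider interval.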
It is safer to use the non-pooled two-sample t test if we are not sure whether the two population standard deviations are equal. Use the pooled two-sample t test only if we have evidence that the population standard deviations are equal. For example, we can use the pooled two-sample t test when we compare two independent groups in a one-way analysis of variance (ANOVA), since equal standard deviations is one of the assumptions of the one-way ANOVA F test, which will be covered in Chapter 13.