"

9.1 Distribution of the Difference between Two Sample Means for Two Independent Samples

Suppose two populations have means $\mu_1$, $\mu_2$ and standard deviations $\sigma_1$, $\sigma_2$. Further, suppose that we draw a simple random sample from each population and compute the sample means $\bar{x}_1$ and $\bar{x}_2$. Our objective is to make inferences about $\mu_1 - \mu_2$ using the unbiased estimate $\bar{x}_1 - \bar{x}_2$, so we need to know the distribution of $\bar{X}_1 - \bar{X}_2$.

 

Figure 9.1: Two Independent Samples. Independent samples are drawn from two independent populations. [Image Description (See Appendix D Figure 9.1)]

Recall the conclusions about the sampling distribution of the sample mean $\bar{X}$ based on samples of size $n$ taken from a population with mean $\mu$ and standard deviation $\sigma$:

  1. The mean of $\bar{X}$ equals the population mean $\mu$, i.e., $\mu_{\bar{X}} = \mu$.
  2. The standard deviation of $\bar{X}$ equals the population standard deviation divided by the square root of the sample size $n$, i.e., $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
    These two conclusions are always true regardless of the population distribution and the sample size $n$.
  3. The shape of the distribution of $\bar{X}$:
    1. If the population is normally distributed, so is $\bar{X}$ regardless of the sample size $n$.
    2. If the population is not normally distributed, but the sample size $n$ is relatively large, say $n \ge 30$, then the sample mean $\bar{X}$ is approximately normally distributed.

A similar idea applies to the distribution of $\bar{X}_1 - \bar{X}_2$.

Key Facts: Sampling Distribution of $\bar{X}_1 - \bar{X}_2$

  1. The mean of $\bar{X}_1 - \bar{X}_2$ equals the difference of the population means: $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$.
  2. The standard deviation of $\bar{X}_1 - \bar{X}_2$ is: $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.
    These two conclusions hold regardless of the population distributions and the sample sizes $n_1$ and $n_2$, provided the two samples are independent.
  3. The shape of the distribution of $\bar{X}_1 - \bar{X}_2$:
    1. If the populations are normally distributed, $\bar{X}_1 - \bar{X}_2$ is exactly normally distributed regardless of the sample sizes $n_1$ and $n_2$.
    2. If the populations are not normally distributed, but the sample sizes $n_1$ and $n_2$ are relatively large, say $n_1 \ge 30$ and $n_2 \ge 30$, then by the central limit theorem both $\bar{X}_1$ and $\bar{X}_2$ are approximately normally distributed. The difference of two independent normal random variables is still normal; therefore, for $n_1 \ge 30$ and $n_2 \ge 30$, $\bar{X}_1 - \bar{X}_2$ is approximately normally distributed.
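
The first two key facts can be checked numerically. The following is a minimal simulation sketch, not part of the original text: the function names `simulate_mean_differences` and `theoretical_sd`, and the population values (means 10 and 7, standard deviations 3 and 4, sample sizes 12 and 15), are hypothetical choices made only for illustration. It draws many pairs of independent normal samples and compares the empirical mean and standard deviation of $\bar{X}_1 - \bar{X}_2$ with the formulas above.

```python
import math
import random
import statistics


def simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=20000, seed=0):
    """Return `reps` simulated values of (sample mean 1) - (sample mean 2).

    Each repetition draws two independent normal samples of sizes n1 and n2.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs


def theoretical_sd(sigma1, n1, sigma2, n2):
    """Standard deviation of the difference: sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


# Hypothetical populations, chosen only for illustration.
diffs = simulate_mean_differences(mu1=10, sigma1=3, n1=12, mu2=7, sigma2=4, n2=15)
print("empirical mean:", statistics.fmean(diffs))     # should be close to 10 - 7 = 3
print("empirical sd:  ", statistics.stdev(diffs))     # should be close to the line below
print("theoretical sd:", theoretical_sd(3, 12, 4, 15))
```

With more repetitions the empirical values should settle closer to $\mu_1 - \mu_2$ and $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$; the agreement is approximate because it comes from random draws.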

To summarize, for normal populations OR large sample sizes

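$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right),$$

where the second argument is the standard deviation.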

We can also standardize $\bar{X}_1 - \bar{X}_2$ to convert it into a standard normal random variable:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1).$$
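
As an illustration of how the standardized form is used, here is a small sketch that computes a probability for $\bar{X}_1 - \bar{X}_2$ when the population standard deviations are known. The helper name `z_for_difference` and all numbers are hypothetical and do not come from the text; `NormalDist` is the standard normal distribution from Python's standard library.

```python
import math
from statistics import NormalDist


def z_for_difference(xbar_diff, mu_diff, sigma1, n1, sigma2, n2):
    """Standardize an observed difference of sample means (sigmas assumed known)."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (xbar_diff - mu_diff) / standard_error


# Hypothetical setting: mu1 - mu2 = 3, sigma1 = 3, n1 = 36, sigma2 = 4, n2 = 64.
# Question: how likely is a difference of sample means larger than 4?
z = z_for_difference(xbar_diff=4, mu_diff=3, sigma1=3, n1=36, sigma2=4, n2=64)
p_upper = 1 - NormalDist().cdf(z)  # P(Z > z) under the standard normal
print(f"z = {z:.3f}, P(difference > 4) = {p_upper:.4f}")
```

Here the standard error is $\sqrt{9/36 + 16/64} = \sqrt{0.5}$, so $z = 1/\sqrt{0.5} \approx 1.414$.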

If the population standard deviations $\sigma_1$ and $\sigma_2$ are unknown and estimated by the sample standard deviations $s_1$ and $s_2$, the studentized version of $\bar{X}_1 - \bar{X}_2$ is

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \sim t \text{ distribution}$$

with degrees of freedom

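$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$ rounded down to the nearest integer.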
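
Because the degrees of freedom formula is tedious by hand, a short sketch of the calculation may help. The function name `studentized_difference` and the summary statistics below are hypothetical, used only to show the arithmetic; the code follows the two formulas above directly and rounds the degrees of freedom down.

```python
import math


def studentized_difference(xbar1, s1, n1, xbar2, s2, n2, mu_diff=0.0):
    """Return (t, df) for two independent samples with unknown sigmas.

    t uses the standard error sqrt(s1^2/n1 + s2^2/n2);
    df is the formula above, rounded down to the nearest integer.
    """
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    t = ((xbar1 - xbar2) - mu_diff) / math.sqrt(v1 + v2)
    df_unrounded = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, math.floor(df_unrounded)


# Hypothetical summary statistics, chosen only for illustration.
t, df = studentized_difference(xbar1=10.0, s1=3.0, n1=40, xbar2=9.0, s2=4.0, n2=50)
print(f"t = {t:.3f}, df = {df}")
```

For these inputs the unrounded degrees of freedom are about 87.67, so the rounded-down value is 87.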

 

The degrees of freedom formula above is complicated, so for exams you can use the conservative lower bound, defined as the smaller of $n_1 - 1$ and $n_2 - 1$. That is, you may use $df = \min\{n_1 - 1, n_2 - 1\}$.

For example, if $n_1 = 40$ and $n_2 = 50$, then $df = \min\{n_1 - 1, n_2 - 1\} = \min\{40 - 1, 50 - 1\} = \min\{39, 49\} = 39$.
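
To see why this bound is called conservative, the sketch below compares it with the unrounded value of the full formula for the same sample sizes. The function names and the pairs of sample standard deviations are hypothetical illustrations, not values from the text. The full formula always gives a value of at least $\min\{n_1 - 1, n_2 - 1\}$, so the shortcut never overstates the degrees of freedom.

```python
def conservative_df(n1, n2):
    """Conservative degrees of freedom: min(n1 - 1, n2 - 1)."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    return min(n1 - 1, n2 - 1)


def full_formula_df(s1, n1, s2, n2):
    """Unrounded degrees of freedom from the full formula."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


n1, n2 = 40, 50
print(conservative_df(n1, n2))  # 39

# Hypothetical sample standard deviations, to show the bound across several cases.
for s1, s2 in [(1.0, 1.0), (1.0, 5.0), (5.0, 1.0), (3.0, 4.0)]:
    print(f"s1={s1}, s2={s2}: full formula df = {full_formula_df(s1, n1, s2, n2):.2f}")
```

A smaller df gives a wider t distribution, so intervals built with the shortcut are somewhat wider than those built with the full formula.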