In testing hypotheses, there are only two possible outcomes: either reject $H_0$ or do not reject $H_0$. In reality, there are only two possible scenarios: either $H_0$ is true or $H_0$ is false. Hence, regardless of which conclusion we reach, there is a chance that we make an error. There are two types of errors: Type I and Type II.
Type I error: reject the null hypothesis $H_0$ when $H_0$ is in fact true.
Type II error: do not reject the null hypothesis $H_0$ when $H_0$ is false.
Table 8.2: Type I and Type II Error
| Decision | $H_0$ is true | $H_0$ is false |
|---|---|---|
| Do not reject $H_0$ | Correct decision | Type II error |
| Reject $H_0$ | Type I error | Correct decision |
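The four cells of Table 8.2 can be written as a small lookup. This is a minimal sketch: the function name `classify_decision` is hypothetical and only restates the table in code.

```python
def classify_decision(h0_is_true: bool, reject_h0: bool) -> str:
    """Return the cell of Table 8.2 for a given truth and decision."""
    if h0_is_true:
        # H0 is true: rejecting it is a type I error.
        return "Type I error" if reject_h0 else "Correct decision"
    # H0 is false: failing to reject it is a type II error.
    return "Correct decision" if reject_h0 else "Type II error"


# Print all four cells of the table.
for h0_is_true in (True, False):
    for reject_h0 in (False, True):
        truth = "H0 true" if h0_is_true else "H0 false"
        decision = "reject H0" if reject_h0 else "do not reject H0"
        print(f"{truth:8} | {decision:16} | {classify_decision(h0_is_true, reject_h0)}")
```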
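The probability of a type I error is denoted by $\alpha$, and the probability of a type II error is denoted by $\beta$. That is:
$$\alpha = P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true}),$$
$$\beta = P(\text{Type II error}) = P(\text{do not reject } H_0 \mid H_0 \text{ is false}).$$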
The type I error rate $\alpha$ is also called the significance level of a hypothesis test.
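One way to see that the significance level is the type I error rate is to simulate many samples for which $H_0$ is true and count how often the test rejects. This is a minimal sketch that assumes a two-sided one-sample z-test of $H_0: \mu = \mu_0$ with known $\sigma$. The function name and the values $\mu_0 = 0$, $\sigma = 1$, $n = 25$ are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean


def simulate_type_i_error_rate(alpha=0.05, n=25, mu0=0.0, sigma=1.0,
                               trials=20_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-sided z-test by simulation."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Draw the sample from the null distribution, so H0 is true by construction.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # every rejection here is a type I error
    return rejections / trials


print(simulate_type_i_error_rate())
```

The printed proportion should be close to the chosen $\alpha$. Because it is a random estimate, it will not equal 0.05 exactly.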
Example: Type I and Type II Errors
In a diabetes blood test, a patient is diagnosed with the disease if the sugar level in their bloodstream is larger than the threshold C=130 mg/dL. Suppose the distributions of sugar levels for the two populations (diabetes-free and having diabetes) are the two bell-shaped curves shown in the following figure.
Define the hypotheses: $H_0$: a patient is disease free vs. $H_1$: a patient has diabetes. What are the type I and type II errors in this example?
Type I error: claim the person has diabetes (reject the null $H_0$) when the person actually does not have diabetes ($H_0$ is in fact true). This is often referred to as a false positive.
Type II error: claim the person does not have diabetes (do not reject the null $H_0$) when the person actually has diabetes ($H_0$ is false). This is often referred to as a false negative.
The figure in the above example shows the trade-off between type I and type II errors. The gold area gives $\alpha$, the probability of a type I error, and the blue area gives $\beta$, the probability of a type II error. If we increase the threshold C (move the cut-off to the right), the gold area shrinks and the blue area grows. That is, the type I error rate decreases and the type II error rate increases. Conversely, if we decrease the threshold C (move the cut-off to the left), the type I error rate increases and the type II error rate decreases. This is the trade-off between the two error rates $\alpha$ and $\beta$.
It is not a good idea to set either $\alpha$ or $\beta$ too close to 0; otherwise, the other error rate will be large. For example, if we set the threshold C very high, few individuals will be diagnosed as diabetic. As a result, many diabetic individuals will be misclassified as not having the disease, meaning we have a high probability of committing a type II error. On the other hand, if we set the threshold C very low, most individuals will be diagnosed as diabetic. Consequently, many individuals who are free of diabetes will be misclassified as diabetic, meaning we have a high probability of committing a type I error. In general, we set $\alpha$ (or $\beta$) relatively small when the consequence of a type I (or type II) error is more serious.
The power of a test is defined as
$$\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false}).$$
This is the probability that we reject $H_0$ when $H_0$ is false. Thus, we want a statistical test to have high power.
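The trade-off in the diabetes example can be computed directly once the two bell-shaped curves are specified. The figure's actual curves are not given here. This minimal sketch therefore assumes hypothetical normal distributions: $N(100, 15^2)$ mg/dL for disease-free patients and $N(160, 20^2)$ mg/dL for diabetic patients. It treats "reject $H_0$" as a sugar level above the threshold C.

```python
from statistics import NormalDist

# Hypothetical sugar-level distributions (mg/dL); not taken from the figure.
HEALTHY = NormalDist(mu=100, sigma=15)   # population under H0: disease free
DIABETIC = NormalDist(mu=160, sigma=20)  # population under H1: has diabetes


def error_rates(threshold):
    """Return (alpha, beta, power) when 'reject H0' means sugar level > threshold."""
    alpha = 1 - HEALTHY.cdf(threshold)   # disease-free patient diagnosed as diabetic
    beta = DIABETIC.cdf(threshold)       # diabetic patient diagnosed as disease free
    return alpha, beta, 1 - beta


for c in (110, 120, 130, 140, 150):
    alpha, beta, power = error_rates(c)
    print(f"C = {c}: alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```

With these assumed curves, C = 130 gives $\alpha \approx 0.023$ and $\beta \approx 0.067$, so the power is about 0.933. Moving C upward lowers $\alpha$ and raises $\beta$, and moving it downward does the reverse.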
Exercise: Type I and Type II Errors
Suppose you are performing a statistical test to decide whether a nuclear reactor should be approved. The null hypothesis is that the reactor is safe to use, and so failing to reject the null hypothesis corresponds to approval.
Write down the null and alternative hypotheses.
What are the type I and type II errors in this example?
Which error has the more serious consequence, type I or type II? Which of $\alpha$ or $\beta$ should be smaller?
Answers:
$H_0$: the nuclear reactor is safe versus $H_1$: the nuclear reactor is not safe.
Type I error: disapprove the nuclear reactor for use given that the nuclear reactor is actually safe.
Type II error: approve the nuclear reactor for use given that the nuclear reactor is not safe.
The type II error is more serious than the type I error. Disapproving a safe reactor would waste time and money, but approving an unsafe reactor could lead to a nuclear meltdown, which is a catastrophic event. For this reason, we should set the type II error rate $\beta$ to be relatively small.