Type 1 and 2 Errors
Definitions
Null Hypothesis: In a statistical test, the hypothesis that there is no significant difference between specified populations, any observed difference being due to chance
Alternative hypothesis: The hypothesis contrary to the null hypothesis. It is usually taken to be that the observations are not due to chance, i.e. are the result of a real effect (with some amount of chance variation superposed)
Consider Aesop’s fable of the boy who cried wolf:
A boy has the job of protecting a flock of sheep from wolves. If a wolf comes, he is to ring a bell and cry out “wolf”, so that the men from the village will come with their guns. After a few days with no wolf, the boy is getting bored, so he pretends that a wolf is attacking. The null hypothesis is that there is no wolf; the alternative hypothesis is that there is a wolf.
The men come running, and praise the boy even when they find no wolf, believing his story of the wolf having run off. A type 1 or false positive error has occurred.
The boy enjoys the attention, so repeats the trick. This time he is not praised. The men do not believe that there was a wolf. When a wolf really does attack, and the boy rings his bell and cries “wolf”, the men do not come, thinking that he is playing the trick again. The wolf takes one of the fattest sheep. A type 2 or false negative error has occurred
Type I error (false positive): incorrectly rejecting the null hypothesis, e.g. the villagers believing the boy when there was no wolf
Type II error (false negative): incorrectly accepting (failing to reject) the null hypothesis, e.g. the villagers not believing the boy when there really was a wolf
Alpha (α): the probability of a type I error – finding a difference when a difference does not exist. Most medical literature uses an alpha cut-off of 5% (0.05), accepting a 5% chance that an apparently significant difference is in fact due to chance rather than a true effect
Beta (β): the probability of a type II error – not detecting a difference when one actually exists. Beta is directly related to study power (power = 1 – β, so β = 1 – power). Most medical literature uses a beta cut-off of 20% (0.2), accepting a 20% chance that a true difference is missed
Power: the pre-study probability that we correctly reject the null hypothesis when there really is a difference, i.e. that we can detect a treatment effect if one is present. Power = 1 – β (1 minus the probability of a type II error)
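To make these definitions concrete, here is a minimal simulation sketch in Python (using numpy and scipy; the group sizes, standard deviation and effect size are illustrative assumptions, not values from this summary). With no true effect, the proportion of “significant” results approximates α; with a real treatment effect, it approximates the power, and 1 minus it approximates β.

```python
# Minimal sketch: estimate the type I error rate (alpha) and the power (1 - beta)
# of a two-sample t-test by simulating many trials. Illustrative numbers only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def proportion_significant(true_difference, n_per_group=50, sd=10.0,
                           alpha=0.05, n_trials=10_000):
    """Fraction of simulated trials whose two-sided p-value falls below alpha."""
    hits = 0
    for _ in range(n_trials):
        control = rng.normal(0.0, sd, n_per_group)
        treated = rng.normal(true_difference, sd, n_per_group)
        _, p = stats.ttest_ind(control, treated)
        if p < alpha:
            hits += 1
    return hits / n_trials

# No real effect: any "significant" result is a false positive,
# so this approximates alpha (about 0.05)
print("Estimated type I error rate:", proportion_significant(true_difference=0.0))

# A real effect of 5 units: "significant" results are true positives,
# so this approximates the power; 1 minus it approximates beta
print("Estimated power:", proportion_significant(true_difference=5.0))
```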
Power is influenced by the following factors:
1. The statistical significance criterion used in the test
- Commonly used criteria are probabilities of 0.05 (5%, 1 in 20), 0.01 (1%, 1 in 100), and 0.001 (0.1%, 1 in 1000). This set threshold is called the α level. By convention, the alpha (α) level is set to 0.05
- α = 1 – confidence level. If you want to be 95% confident that your analysis is correct, the α level would be 1 – 0.95 = 0.05 (5%)
- If the criterion is 0.05, the probability of obtaining data showing an effect at least as large as the observed effect when the null hypothesis is true (the p-value) must be 0.05 or less for the null hypothesis of no effect to be rejected
- One easy way to increase the power of a test is to carry out a less conservative test by using a larger significance criterion, for example 0.10 instead of 0.05. So why not just do this? More on this shortly
2. The magnitude of the effect (effect size) of interest in the population
- If the difference between two treatments is small, more patients will be required to detect a difference
- Effect size must be carefully considered when designing a study
- If constructed appropriately, a standardised effect size, together with the sample size (and the chosen α level), will completely determine the power
3. Population variance
- The higher the variance (and hence the standard deviation), the more patients are needed to demonstrate a difference
- This determines the amount of sampling error inherent in a test result
4. Baseline incidence: If an outcome occurs infrequently, many more patients are needed in order to detect a difference
Before a study is conducted, investigators need to decide how many subjects should be included. By enrolling too few subjects, a study may not have enough statistical power to detect a difference (type II error). Enrolling too many patients can be unnecessarily costly or time-consuming
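As a rough illustration of how α, β (via the desired power), the expected effect size and the population variance feed into a sample size estimate, here is a sketch using the common normal-approximation formula for comparing two means. The numerical inputs are assumptions chosen for illustration only, not figures from this summary.

```python
# Sample-size sketch using the usual normal approximation for comparing two means:
# n per group ≈ 2 * sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2
# All numerical values below are illustrative assumptions.
import math
from scipy.stats import norm

alpha = 0.05   # accepted type I error rate (two-sided)
power = 0.80   # desired power, so beta = 0.20
sigma = 10.0   # assumed population standard deviation
delta = 5.0    # smallest treatment effect we want to be able to detect

z_alpha = norm.ppf(1 - alpha / 2)   # ≈ 1.96
z_beta = norm.ppf(power)            # ≈ 0.84

n_per_group = 2 * (sigma ** 2) * (z_alpha + z_beta) ** 2 / delta ** 2
print(f"Approximately {math.ceil(n_per_group)} patients per group")  # ≈ 63 with this approximation
```

A smaller expected effect, a larger standard deviation, a stricter α, or a higher target power all push the required number of patients upwards, in line with the factors listed above.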
Why is an α level of 0.05 chosen as a cut-off for statistical significance?
The α level of 0.05 is thought to represent the best balance to avoid excessive type I or type II errors
(adapted from https://www.thoughtco.com/what-is-the-standard-normal-distribution-3126371 – accessed on 23.04.2017)
Imagine you decide to increase the α level so that it is now 0.35
- This increases the chance of rejecting the null hypothesis
- The risk of a Type II error (false negative) is REDUCED
- But the risk of a Type I error (false positive) is INCREASED
Imagine you decide to decrease the α level
- This increases the chance of accepting the null hypothesis
- The risk of a Type I error (false positive) is REDUCED
- But the risk of a Type II error (false negative) is INCREASED
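This trade-off can be sketched numerically. The example below assumes a fixed design (illustrative effect size, standard deviation and sample size) and a simple normal approximation for a two-sample comparison of means: as α is raised, β falls, and vice versa.

```python
# Illustrative sketch of the alpha/beta trade-off for a two-sample comparison of means,
# using a simple normal approximation. All numbers are assumptions for illustration.
import math
from scipy.stats import norm

n_per_group = 30   # assumed patients per group
sigma = 10.0       # assumed population standard deviation
delta = 5.0        # assumed true treatment effect

# Standardised difference between the two group means for this design
z_effect = delta / (sigma * math.sqrt(2 / n_per_group))

for alpha in (0.01, 0.05, 0.35):
    z_crit = norm.ppf(1 - alpha / 2)        # two-sided rejection threshold
    power = norm.cdf(z_effect - z_crit)     # normal approximation (ignores the far tail)
    beta = 1 - power                        # probability of a type II error
    print(f"alpha = {alpha:.2f}: type I risk = {alpha:.2f}, type II risk (beta) ≈ {beta:.2f}")
```

With these assumed inputs, tightening α from 0.05 to 0.01 cuts the false-positive risk but roughly halves the chance of detecting the effect, while loosening α to 0.35 does the opposite.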
The Bottom Line
- A type I error is the incorrect rejection of a true null hypothesis (a “false positive”), while a type II error is incorrectly retaining a false null hypothesis (a “false negative”)
- Power is the pre-study probability that we correctly reject the null hypothesis when there really is a true difference, i.e. that we can detect a treatment effect if it is present. It is influenced by the statistical significance criterion, the magnitude of the effect, the population variance and the baseline incidence
- In statistical hypothesis testing, the more you try to avoid a Type I error (false positive), the more likely a Type II error (false negative) becomes. An alpha level of 5% is conventionally regarded as striking a good balance between the two
External Links
- [videocast] Statistics 101: Visualising type 1 and type 2 errors
- [further reading] St.Emlyn’s Introduction to sample size calculations
- [further reading] ClinCalc Sample Size Calculator
Metadata
Summary author: Steve Mathieu
Summary date: 19th May 2017
Peer-review editor: Charlotte Summers