# Type 1 and 2 Errors

### Definitions

**Null Hypothesis:** In a statistical test, the hypothesis that there is no significant difference between specified populations, any observed difference being due to chance

**Alternative hypothesis:** The hypothesis contrary to the null hypothesis. It is usually taken to be that the observations are not due to chance, i.e. are the result of a real effect (with some amount of chance variation superposed)

Consider Aesop’s fable of the boy who cried wolf:

*A boy has the job of protecting a flock of sheep from wolves. If a wolf comes, he is to ring a bell and cry out “wolf”, so that the men from the village will come with their guns. After a few days with no wolf, the boy is getting bored, so he pretends that a wolf is attacking.* The **null hypothesis** is that there is no wolf; the **alternative hypothesis** is that there is a wolf.

*The men come running, and praise the boy even when they find no wolf, believing his story of the wolf having run off.* A **type 1 or false positive error** has occurred.

*The boy enjoys the attention, so repeats the trick. This time he is not praised. The men do not believe that there was a wolf. **When a wolf really does attack, and the boy rings his bell and cries “wolf”, the men do not come**, thinking that he is playing the trick again. The wolf takes one of the fattest sheep.* A **type 2 or false negative error** has occurred

**Type I error (false positive):** Incorrectly rejecting the null hypothesis, e.g. the villagers believing the boy when there was no wolf

**Type II error (false negative):** Incorrectly accepting the null hypothesis, e.g. the villagers not believing the boy when there actually was a wolf

An alternative way of remembering type I and type II errors: a type I error is a false alarm (crying wolf when there is no wolf), while a type II error is a missed alarm (a real wolf goes unnoticed)

**Alpha (α):** The probability of a type I error – finding a difference when a difference does not exist. Most medical literature uses an alpha cut-off of 5% (0.05), indicating a 5% chance that a significant difference is actually due to chance and is not a true difference

**Beta (β):** The probability of a type II error – not detecting a difference when one actually exists. Beta is directly related to study power (β = 1 – power, or equivalently Power = 1 – β). Most medical literature uses a beta cut-off of 20% (0.2), indicating a 20% chance that a significant difference is missed

**Power:** The pre-study probability that we correctly reject the null hypothesis when there really is a significant difference, i.e. we can detect a treatment effect if it is present. Power = 1 – β (one minus the probability of a type II error)
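The arithmetic linking α, β, power and confidence is worth making explicit. A minimal sketch using the conventional cut-offs quoted above:

```python
# Conventional cut-offs quoted in the text above (conventions, not laws)
alpha = 0.05  # probability of a type I error (false positive)
beta = 0.20   # probability of a type II error (false negative)

power = 1 - beta        # chance of detecting a real effect
confidence = 1 - alpha  # chance of correctly retaining a true null

print(round(power, 2))       # 0.8
print(round(confidence, 2))  # 0.95
```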

### Power is influenced by the following factors:

1. The **statistical significance criterion** used in the test

- Commonly used criteria are probabilities of 0.05 (5%, 1 in 20), 0.01 (1%, 1 in 100), and 0.001 (0.1%, 1 in 1000). This set threshold is called the α level. By convention, the alpha (α) level is set to 0.05
- α = 1 – confidence level. If you want to be 95% confident that your analysis is correct, the α level would be 1 – 0.95 = 0.05 (5%)
- If the criterion is 0.05, the probability of the data *implying* an effect at least as large as the observed effect when the null hypothesis is true must be 0.05 (or less) for the null hypothesis of no effect to be rejected
- One easy way to increase the power of a test is to carry out a less conservative test by using a larger significance criterion, for example 0.10 instead of 0.05. So why not just do this? More on this shortly

2. The **magnitude of the effect** (effect size) of interest in the population

- If the difference between two treatments is small, more patients will be required to detect a difference
- Effect size must be carefully considered when designing a study
- If constructed appropriately, a standardised effect size, along with the sample size, will completely determine the power

3. **Population variance**

- The higher the variance (standard deviation), the more patients are needed to demonstrate a difference
- This determines the amount of sampling error inherent in a test result

4. **Baseline incidence:** If an outcome occurs infrequently, many more patients are needed in order to detect a difference

Before a study is conducted, investigators need to decide how many subjects should be included. By enrolling too few subjects, a study may not have enough statistical power to detect a difference (type II error). Enrolling too many patients can be unnecessarily costly or time-consuming
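The interplay of effect size, variance and sample size can be sketched with the standard power formula for a two-sided two-sample z-test (an illustration only: the function name and numbers are hypothetical, and real sample-size calculations typically use a t-test or dedicated software):

```python
from statistics import NormalDist

def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    delta: true difference in means (effect size of interest)
    sigma: common standard deviation (population variance)
    n_per_group: number of patients per arm
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # critical value for the α level
    se = sigma * (2 / n_per_group) ** 0.5            # standard error of the difference
    return NormalDist().cdf(abs(delta) / se - z_crit)

# A 0.5 SD effect (delta=5, sigma=10) with 64 patients per arm
print(round(power_two_sample(delta=5, sigma=10, n_per_group=64), 2))  # 0.81
```

Increasing the sample size or the effect size raises the result, while increasing the variance lowers it, mirroring factors 2 and 3 above; ~64 patients per arm gives the conventional 80% power for a 0.5 SD effect at α = 0.05.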

### Why is an α level of 0.05 chosen as a cut-off for statistical significance?

The α level of 0.05 is thought to represent the best balance to avoid excessive type I or type II errors

*(adapted from https://www.thoughtco.com/what-is-the-standard-normal-distribution-3126371 – accessed on 23.04.2017)*

Imagine you decide to increase the α level so that it is now 0.35

- This increases the chance of rejecting the null hypothesis
- The risk of a **type II error** (false negative) is REDUCED
- But the risk of a **type I error** (false positive) is INCREASED

Imagine you decide to decrease the α level

- This increases the chance of accepting the null hypothesis
- The risk of a **type I error** (false positive) is REDUCED
- But the risk of a **type II error** (false negative) is INCREASED
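This trade-off can be demonstrated by simulation. A hypothetical sketch (the test choice, sample size and number of simulated trials are all illustrative) estimating both error rates at α = 0.05 and α = 0.35:

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)

def p_value(sample, mu0=0.0):
    """Two-sided one-sample z-test p-value (approximation using the sample SD)."""
    z = (mean(sample) - mu0) / (stdev(sample) / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(alpha, true_mean, trials=2000, n=20):
    """Fraction of simulated studies (n patients each) that reject the null."""
    hits = sum(
        p_value([random.gauss(true_mean, 1) for _ in range(n)]) < alpha
        for _ in range(trials)
    )
    return hits / trials

for alpha in (0.05, 0.35):
    type1 = rejection_rate(alpha, true_mean=0.0)      # null is true: rejections are false positives
    type2 = 1 - rejection_rate(alpha, true_mean=0.5)  # real effect: non-rejections are misses
    print(f"alpha={alpha}: type I ~ {type1:.2f}, type II ~ {type2:.2f}")
```

Raising α from 0.05 to 0.35 inflates the type I error rate roughly sevenfold while shrinking the type II error rate, exactly the trade described above.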

### The Bottom Line

- A type I error is the incorrect rejection of a true null hypothesis (a “false positive”), while a type II error is incorrectly retaining a false null hypothesis (a “false negative”)
- Power is the pre-study probability that we correctly reject the null hypothesis when there really is a significant difference, i.e. we can detect a treatment effect if it is present. It is influenced by the statistical significance criterion, the magnitude of the effect, population variance and baseline incidence
- In statistical hypothesis testing, the more you try to avoid a type I error (false positive), the more likely a type II error (false negative) becomes. Researchers have found that an alpha level of 5% provides a good balance

#### External Links

- [videocast] Statistics 101: Visualising type 1 and type 2 errors
- [further reading] St.Emlyn’s Introduction to sample size calculations
- [further reading] ClinCalc Sample Size Calculator

#### Metadata

Summary author: Steve Mathieu

Summary date: 19th May 2017

Peer-review editor: Charlotte Summers