Probability Distributions and Hypothesis Testing
A Short Course by Philipp Burckhardt
Continuous and Discrete Probability Distributions
Hypothesis Testing
In relation to any experiment we may speak of this hypothesis as the "null hypothesis," and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.
Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion.
Introduction
Often when working with data, you might notice some pattern and observe an effect. The question which then arises is whether this effect has arisen purely by chance or reflects some true property of the underlying population. For example, consider the following scenario: Consuming a new weight loss drink for a period of two weeks has resulted in an average decrease of body weight of 3 lbs among 30 participants. Hypothesis testing can help us to assess whether such an observed effect is significant, i.e. whether it is unlikely that the effect is a product of chance.
The logic of hypothesis testing is similar to a mathematical proof by contradiction. To prove a statement by contradiction, you assume temporarily that the statement is wrong. If this leads to a contradiction, you can then conclude that the statement must actually be true. Hypothesis testing proceeds similarly, although it does not allow us to draw definitive conclusions, but will only give us probabilistic guarantees.
To test whether a certain effect exists, we assume that it does not exist. This is the null hypothesis. It is denoted as H0. The alternative hypothesis, which corresponds to the existence of an effect, is denoted as H1. Whatever conclusion we reach from a test, we cannot be certain that we made the right decision. Depending on which hypothesis is true, there are two types of error we could have made:
| Truth \ Decision | Fail to reject H0 | Reject H0 |
|---|---|---|
| H0 | Correct | Type 1 error |
| H1 | Type 2 error | Correct |
As can be gleaned from the table, a type 1 error refers to a situation in which the null hypothesis is rejected even though it is true, whereas a type 2 error refers to the scenario in which we fail to reject the null hypothesis although it is false. In the context of our weight loss drink, we might want to test whether the drink is actually effective. If μ denotes the true mean change in body weight after using the drink for two weeks, so that a negative μ corresponds to weight loss, we would like to test

$$H_0: \mu \geq 0 \quad \text{versus} \quad H_1: \mu < 0.$$
In setting up the null and alternative hypotheses, what we would like to show is usually set as the alternative, so that we would like to be able to reject the null. The reason for this choice is that the machinery of hypothesis testing lets us control the probability of making a type 1 error. In fact, we are free to choose it: For each test we conduct, we pick a significance level α which will be equal to the type 1 error probability. However, it is not possible to fix both error probabilities at a desired level.
To carry out a hypothesis test, the following steps are generally required:
- A specific probability model for the data is assumed.
- The observed data are summarized into a test statistic with a known distribution under H0. The distribution of the test statistic lets us assess how unexpected the observed data are.
- For a given significance level α, we obtain a critical value from the distribution of the test statistic. If the observed test statistic is more extreme than this value, the null hypothesis is rejected.
Let us now consider in detail one of the simplest hypothesis testing scenarios, in which one would like to test whether an effect exists, i.e. whether the location parameter of a distribution is larger than zero.
One-Sided Location Test
The basic assumption for the z-Test to be applicable is that the data are drawn independently from a normal distribution with some mean μ and known standard deviation σ:

$$X_1, \ldots, X_n \overset{\text{iid}}{\sim} \mathcal{N}(\mu, \sigma^2)$$

Let's start by considering a situation in which you want to detect a positive effect. As described previously, you will then choose the null hypothesis to cover the case of no or a negative effect, so that we have:

$$H_0: \mu \leq \mu_0 \quad \text{versus} \quad H_1: \mu > \mu_0$$
For the described situation, you can think of μ0 as being equal to zero in the equation. However, there might be other scenarios where you might want to test whether the parameter exceeds a certain threshold: For example, an environmental agency might need to assess whether the average emissions of a car exceed the allowed maximum or not.
The test statistic used for the z-Test is:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
Given our distributional assumptions, the test statistic follows a standard normal distribution under the null hypothesis.
Since its distribution does not depend anymore on the unknown mean parameter, it is called a pivotal quantity. This follows from some basic properties of the normal distribution.
When μ = μ0, i.e. at the boundary of H0, the sample mean is distributed as $\bar{x} \sim \mathcal{N}(\mu_0, \sigma^2 / n)$. Since the normal distribution is a location-scale distribution family, we know that subtracting μ0 from x̄ and dividing by its standard deviation σ/√n gives us a standard normal random variable.
The PDF of our test statistic Z under the null is thus the famous bell-shaped curve displayed in the following plot:
[Plot: PDF of a standard normal distribution with mean zero and standard deviation one.]
Large values of the test statistic are inconsistent with the null hypothesis stated above, which means that we will reject it when we observe a value far into the right tail of the distribution. How far? Recall that the hypothesis test is designed in a way as to ensure that the probability of committing a type 1 error is equal to the significance level α. Hence, we will use as a critical value the quantile $z_{1-\alpha}$, i.e. the value for which α of the probability mass lies to the right. A commonly chosen value for α is 0.05. The set of values of the test statistic for which we reject H0 is called the rejection region; the area under the curve above this region is equal to α.
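To make the claim about the type 1 error probability concrete, here is a minimal simulation sketch in Python. It assumes μ0 = 0, σ = 1 and n = 25, all chosen only for illustration, and the function names `one_sided_z` and `rejection_rate` are hypothetical. It repeatedly draws data for which the null hypothesis holds at its boundary and records how often the one-sided z-test rejects.

```python
import random
from math import sqrt
from statistics import NormalDist


def one_sided_z(sample, mu0, sigma):
    """z statistic for H0: mu <= mu0 when sigma is known."""
    x_bar = sum(sample) / len(sample)
    return (x_bar - mu0) / (sigma / sqrt(len(sample)))


def rejection_rate(alpha, n=25, sigma=1.0, trials=20000, seed=1):
    """Share of simulated experiments rejecting H0 although mu = mu0 = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # upper critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        if one_sided_z(sample, 0.0, sigma) > z_crit:
            rejections += 1
    return rejections / trials


rate = rejection_rate(alpha=0.05)
print(rate)  # expected to land near the chosen alpha of 0.05
```

Up to simulation noise, the printed rejection rate should sit close to the chosen significance level.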
For the one-sided z-test, let us explore the rejection region for different significance levels:
At a significance level of α, the critical value, i.e. the value above which the null hypothesis is rejected, is equal to $z_{1-\alpha}$; for example, α = 0.05 gives a critical value of about 1.645. The resulting rejection region for the location test is displayed as the shaded region in the following plot.
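As a rough substitute for exploring the plot, the critical values for a few significance levels can be computed with Python's standard library. The helper name `upper_critical_value` is a hypothetical one introduced for this sketch.

```python
from statistics import NormalDist


def upper_critical_value(alpha):
    """Value with probability mass alpha to its right under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(upper_critical_value(alpha), 3))
# 0.1 1.282
# 0.05 1.645
# 0.01 2.326
```

Smaller significance levels push the critical value further into the right tail, which shrinks the rejection region.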
Let us consider the following example:
Worked Example
A randomized controlled trial was conducted to assess the effectiveness of a new drug. For a sample of N = 400 patients, the average effect was x̄ = 0.3. It is assumed that the data are drawn from a normal distribution with known standard deviation σ = 2. We wish to test whether there is a positive effect of the drug at a significance level of α = 0.01.
- For the given α, we find a critical value of z = 2.33.
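- The observed test statistic is $z_{\text{obs}} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{N}} = \frac{0.3 - 0}{2 / \sqrt{400}} = 3$.
- Since zobs > z, we reject the null hypothesis at α = 0.01 and conclude that there is sufficient evidence that the new drug is effective.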
By construction of this test, the probability of obtaining a false positive, i.e. of rejecting the null hypothesis when it is in fact true, is at most α = 0.01.
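The arithmetic of this worked example can be checked with a few lines of Python. This is only a sketch: the variable names are chosen here for illustration, and μ0 = 0 is assumed as the null value because the question is whether there is a positive effect.

```python
from math import sqrt
from statistics import NormalDist

# Quantities stated in the worked example; mu0 = 0 is the assumed null value.
n, x_bar, sigma, mu0, alpha = 400, 0.3, 2.0, 0.0, 0.01

z_obs = (x_bar - mu0) / (sigma / sqrt(n))  # observed test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
reject = z_obs > z_crit

print(round(z_obs, 2), round(z_crit, 2), reject)
# 3.0 2.33 True
```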
Two-Sided Location Test
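You should consider using a two-sided hypothesis test whenever you are not interested in a directional effect, but in whether there is any effect at all. The hypotheses to be tested are now

$$H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu \neq \mu_0$$

At a significance level of α, the upper critical value is $z_{1-\alpha/2}$ and the lower one is $-z_{1-\alpha/2}$, so that a probability mass of α/2 lies in each tail; for α = 0.05, these values are about 1.96 and -1.96. The resulting rejection region for the two-sided location test is displayed as the shaded region in the following plot.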
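The two-sided critical values can be computed in the same way as the one-sided ones, except that the significance level is split evenly between the two tails. The function name `two_sided_critical_values` is a hypothetical helper for this sketch.

```python
from statistics import NormalDist


def two_sided_critical_values(alpha):
    """Lower and upper critical values, leaving alpha / 2 in each tail."""
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    return -upper, upper


lower, upper = two_sided_critical_values(0.05)
print(round(lower, 2), round(upper, 2))
# -1.96 1.96
```

For the same α, the two-sided upper critical value is larger than the one-sided one, since only half of α is spent in the right tail.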
Worked Example
A researcher is interested in the effectiveness of a new set of instructional materials for the classroom. He gathered data on how the grades of pupils at school changed on average after using the new material. The change in GPA (Grade Point Average) after using the new material for 10 students was collected:
Is there sufficient evidence that the GPA changed after the new material was introduced? Conduct a two-sided hypothesis test at a significance level of α = 0.05.
- For the given α, we find a critical value of z = 1.96. (try to verify using the interactive plot!)
- Compute the mean x̄ and the standard deviation s of the data. With n = 10 observations in total and a null value of zero change, the observed test statistic is $z_{\text{obs}} = \frac{\bar{x} - 0}{s / \sqrt{n}}$. (Use the `mean` and `stdev` functions in the code box above to calculate the mean and standard deviation of the data, respectively. You can obtain the number of elements in the array via `x.length`.)
- If |zobs| > z, we reject the null hypothesis at α = 0.05 and conclude that there is sufficient evidence that the GPA changed after the new material was introduced. If |zobs| ≤ z, we fail to reject the null hypothesis at α = 0.05 and conclude that there is insufficient evidence that the new course material has an effect.
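The calculation asked for in this exercise can be sketched in Python as follows. The function name `two_sided_z_test` is hypothetical, and the numbers in `gpa_changes` are placeholders invented purely for illustration; they are not the researcher's data. Following the exercise, the sample standard deviation stands in for σ.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def two_sided_z_test(x, mu0=0.0, alpha=0.05):
    """Return (z_obs, z_crit, reject) for H0: mu = mu0 against H1: mu != mu0."""
    n = len(x)
    # The sample standard deviation is used in place of sigma (an assumption).
    z_obs = (mean(x) - mu0) / (stdev(x) / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_obs, z_crit, abs(z_obs) > z_crit


# Placeholder values, invented for illustration only; NOT the exercise data.
gpa_changes = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1, 0.5, -0.3, 0.1]

z_obs, z_crit, reject = two_sided_z_test(gpa_changes)
print(round(z_obs, 2), round(z_crit, 2), reject)
```

With these placeholder numbers the statistic stays inside the critical values, so the sketch fails to reject the null hypothesis; the real data may of course lead to a different decision.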