Hypothesis Testing Explained: Complete Guide for Statistics Students
Quick Answer
Hypothesis testing is a 5-step process: (1) State null (H₀) and alternative (H₁) hypotheses, (2) Choose significance level α, (3) Calculate test statistic, (4) Find p-value or critical value, (5) Make decision and state conclusion. If p ≤ α, reject H₀. The logic is backwards from intuition — you don't prove the alternative, you look for evidence strong enough to reject the null.
Hypothesis testing is one of the most common reasons students seek statistics help. The logic feels backwards, the terminology is abstract, and the formulas can overwhelm — especially under timed conditions. This guide walks you through everything you need to know, from the basics to worked examples, in plain English.
What Is Hypothesis Testing?
Hypothesis testing is a method for making decisions about populations based on sample data. It's the backbone of statistical inference — from medical research ("Does this drug work?") to marketing ("Did this ad campaign increase sales?") to quality control ("Is this batch defective?").
The core question: Is the pattern I see in my sample real, or just random chance?
Key Terms
| Term | Definition |
|---|---|
| Null Hypothesis (H₀) | The default assumption — no effect, no difference |
| Alternative Hypothesis (H₁) | What you're trying to find evidence for |
| Significance Level (α) | Threshold for rejecting H₀ (typically 0.05) |
| Test Statistic | Standardized value (z or t) calculated from sample |
| P-Value | Probability of seeing your result (or more extreme) if H₀ is true |
For a deep dive on p-values specifically, see our complete p-value guide.
Null vs Alternative Hypothesis
Before running any test, you define two competing claims:
- Null Hypothesis (H₀): The status quo — there's no effect, no difference, no relationship. This is what you're trying to disprove.
- Alternative Hypothesis (H₁): What you're looking for evidence of — there is an effect, difference, or relationship.
The Backwards Logic
This is where hypothesis testing gets confusing: you don't try to prove the alternative directly. Instead, you ask: "If the null were true, how unlikely is my observed result?" If it's very unlikely (p ≤ α), you reject the null in favor of the alternative.
Analogy: It's like a court trial. H₀ = "defendant is innocent" (the default). You don't prove guilt directly — you look for evidence strong enough to reject the presumption of innocence beyond reasonable doubt.
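The question "If the null were true, how unlikely is my observed result?" can also be answered by brute force. The minimal Python sketch below assumes a normal population and borrows the numbers from the Z-test example later in this guide (μ₀ = 75, σ = 10, n = 50, x̄ = 72); `simulated_p_value` is a hypothetical helper name. It draws thousands of samples from a world where H₀ is true and counts how often the sample mean lands at least as far from 75 as the observed one.

```python
import random
import statistics

# Assumed setup (from the Z-test example): H0 says the population mean is 75,
# sigma = 10, and we observed x̄ = 72 from a sample of n = 50.
MU_0, SIGMA, N, OBSERVED_MEAN = 75, 10, 50, 72


def simulated_p_value(n_sims: int = 10_000, seed: int = 1) -> float:
    """Estimate P(|x̄ - 75| >= 3) by simulating samples in a world where H0 is true."""
    rng = random.Random(seed)
    observed_gap = abs(OBSERVED_MEAN - MU_0)
    extreme = 0
    for _ in range(n_sims):
        # Draw one sample of size N from the (assumed normal) population H0 describes.
        sample = [rng.gauss(MU_0, SIGMA) for _ in range(N)]
        # Count samples whose mean is at least as far from 75 as ours was.
        if abs(statistics.fmean(sample) - MU_0) >= observed_gap:
            extreme += 1
    return extreme / n_sims


p_sim = simulated_p_value()
print(f"Simulated p-value: {p_sim:.3f}")
```

Under these assumptions the estimate should land near 0.034, the exact two-tailed p-value for this example; more simulations tighten it.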
Example
Research question: Does a new tutoring program improve test scores?
- H₀: μ_tutoring = μ_no tutoring (no difference)
- H₁: μ_tutoring > μ_no tutoring (tutoring improves scores)
Understanding Alpha (Significance Level)
Alpha (α) is the threshold for deciding whether to reject the null hypothesis. It represents the maximum probability of a Type I error (rejecting a true null) that you're willing to accept.
| Alpha Level | Interpretation | When Used |
|---|---|---|
| α = 0.10 | 10% risk of false positive | Exploratory research |
| α = 0.05 | 5% risk of false positive | Standard (most common) |
| α = 0.01 | 1% risk of false positive | High-stakes decisions |
The decision rule: If p-value ≤ α, reject H₀. If p-value > α, fail to reject H₀.
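A sketch of the decision rule in Python, using a hypothetical p-value of 0.034 to show that the same evidence can lead to different decisions depending on which α from the table you chose. The helper name `decide` is illustrative, not from any library.

```python
def decide(p_value: float, alpha: float) -> str:
    """Decision rule: reject H0 when p <= alpha, otherwise fail to reject."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical p-value, checked against each alpha level in the table.
p = 0.034
decisions = {alpha: decide(p, alpha) for alpha in (0.10, 0.05, 0.01)}
for alpha, decision in decisions.items():
    print(f"alpha = {alpha:.2f}: {decision}")
```

Here the result is significant at α = 0.10 and α = 0.05 but not at α = 0.01, one reason α is normally fixed before looking at the data.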
Type I vs Type II Errors
Because you're making decisions from limited data, there's always risk of error. These fall into two categories:
| Error Type | What Happens | Probability | Example |
|---|---|---|---|
| Type I (False Positive) | Reject true H₀ | α | Concluding drug works when it doesn't |
| Type II (False Negative) | Fail to reject false H₀ | β | Missing that drug actually works |
The tradeoff: Reducing Type I error (lowering α) increases Type II error (β) unless you increase sample size. You can't minimize both simultaneously without collecting more data.
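To make the tradeoff concrete, the sketch below computes β for a right-tailed one-sample z-test with the standard formula β = Φ(z₁₋α − δ√n/σ), where δ is the true distance between the population mean and μ₀. The scenario (δ = 3, σ = 10) and the helper `type_ii_error` are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def type_ii_error(alpha: float, n: int, true_shift: float, sigma: float) -> float:
    """Beta for a right-tailed one-sample z-test when the true mean is mu0 + true_shift."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)        # rejection cutoff under H0
    shift_in_se = true_shift * n ** 0.5 / sigma   # true effect measured in standard errors
    # We miss the effect whenever the test statistic still lands below the cutoff.
    return STD_NORMAL.cdf(z_crit - shift_in_se)


# Hypothetical scenario: true mean is 3 points above mu0, sigma = 10.
for alpha, n in [(0.05, 50), (0.01, 50), (0.01, 100)]:
    beta = type_ii_error(alpha, n, true_shift=3, sigma=10)
    print(f"alpha = {alpha:.2f}, n = {n:3d} -> beta = {beta:.3f}")
```

With these numbers, lowering α from 0.05 to 0.01 raises β from about 0.32 to about 0.58, and doubling n to 100 brings it back down to about 0.25.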
One-Tailed vs Two-Tailed Tests
The direction of your alternative hypothesis determines whether you run a one-tailed or two-tailed test:
| Test Type | H₁ Form | Critical Value (α=0.05) | Use When |
|---|---|---|---|
| Two-Tailed | μ ≠ μ₀ | ±1.96 | Testing for any difference |
| Right-Tailed | μ > μ₀ | +1.645 | "Increases," "greater than" |
| Left-Tailed | μ < μ₀ | -1.645 | "Decreases," "less than" |
Watch for directional words: "Is the mean different?" → two-tailed. "Does treatment increase scores?" → right-tailed. "Does drug reduce blood pressure?" → left-tailed.
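The critical values in the table come from the standard normal distribution. A short Python sketch using the standard library's `statistics.NormalDist` (Python 3.8+) reproduces them; `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_critical(alpha: float, tail: str) -> tuple:
    """Critical z value(s) for a given alpha and tail: 'two', 'right', or 'left'."""
    if tail == "two":
        # Split alpha across both tails: alpha/2 in each.
        c = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return (-c, c)
    if tail == "right":
        return (STD_NORMAL.inv_cdf(1 - alpha),)
    if tail == "left":
        return (STD_NORMAL.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'right', or 'left'")


for tail in ("two", "right", "left"):
    print(tail, [round(c, 3) for c in z_critical(0.05, tail)])
```

The two-tailed test splits α between both tails, which is why its cutoff (1.96) sits farther out than the one-tailed 1.645.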
Z-Test vs T-Test: When to Use Which
The main difference is whether you know the population standard deviation:
| Use Z-Test When | Use T-Test When |
|---|---|
| Population σ is known | Using sample s to estimate σ |
| Large sample (n ≥ 30), where z is often used as an approximation | Small sample (n < 30) with unknown σ |
| Testing proportions | Testing means with unknown σ |
The Formulas
Z-Test: z = (x̄ − μ₀) / (σ / √n)
T-Test: t = (x̄ − μ₀) / (s / √n)
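The two formulas differ only in the denominator. In the Python sketch below (helper names are illustrative), `t_statistic` estimates s from raw data with `statistics.stdev`, which divides by n − 1, while `z_statistic` takes σ as given. The score list is made-up data.

```python
import math
import statistics


def z_statistic(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """z = (x̄ - mu0) / (sigma / sqrt(n)), using a known population sigma."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))


def t_statistic(sample: list, mu0: float) -> float:
    """t = (x̄ - mu0) / (s / sqrt(n)), where s is estimated from the sample itself."""
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu0) / (s / math.sqrt(len(sample)))


# Hypothetical raw scores, tested against mu0 = 78.
scores = [74, 80, 85, 79, 83, 77, 88, 81]
print(round(t_statistic(scores, mu0=78), 3))
print(round(z_statistic(x_bar=72, mu0=75, sigma=10, n=50), 2))
```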
Practical tip: In most real-world stats courses, you'll use t-tests because population σ is rarely known. When in doubt, use a t-test unless the problem explicitly gives you σ.
The 5-Step Hypothesis Testing Process
Every hypothesis test follows the same general structure:
Step 1: State Hypotheses
Write H₀ and H₁ clearly. Decide whether the test is one-tailed or two-tailed.
Step 2: Choose Alpha
Usually α = 0.05 unless otherwise specified.
Step 3: Calculate Test Statistic
Use z or t formula depending on whether σ is known.
Step 4: Find P-Value
Alternatively, compare the test statistic to the critical value.
Step 5: State Conclusion
Reject or fail to reject H₀, then interpret in context.
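The five steps map naturally onto a small function. This is a minimal sketch for a one-sample z-test only, assuming σ is known; the function name and the example numbers (μ₀ = 100, σ = 15, n = 36, x̄ = 104) are hypothetical. Steps 1 and 2 are the arguments you pass in; Steps 3 through 5 happen inside.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Steps 3-5 for a one-sample z-test (sigma known).

    Steps 1-2 happen before the call: mu0 and tail encode the hypotheses, alpha is chosen.
    """
    # Step 3: test statistic
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Step 4: p-value, matching the direction of H1
    if tail == "two":
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    elif tail == "right":
        p = 1 - STD_NORMAL.cdf(z)
    elif tail == "left":
        p = STD_NORMAL.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    # Step 5: decision (interpreting it in context is still up to you)
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    return z, p, decision


# Hypothetical: H0: mu = 100 vs H1: mu > 100, alpha = 0.05.
z, p, decision = one_sample_z_test(x_bar=104, mu0=100, sigma=15, n=36, tail="right")
print(f"z = {z:.2f}, p = {p:.4f}, {decision}")
```

Here z = 1.60 and the right-tailed p-value is about 0.055, so the test fails to reject H₀ at α = 0.05. Step 5's interpretation in context still has to be written by hand.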
Worked Examples
Z-Test Example
Problem: A professor claims the average exam score is 75. You sample 50 students and find x̄ = 72. Population σ = 10. Test at α = 0.05 whether the mean differs from 75.
Step 1: H₀: μ = 75 vs H₁: μ ≠ 75 (two-tailed)
Step 2: α = 0.05
Step 3: z = (72 - 75) / (10/√50) = -3 / 1.414 = -2.12
Step 4: Critical values: ±1.96. Since |-2.12| > 1.96, z falls in the rejection region (two-tailed p ≈ 0.034).
Step 5: Reject H₀. Evidence suggests the mean differs from 75.
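The arithmetic above can be checked in a few lines of Python with the standard library. This sketch also computes the two-tailed p-value, which the critical-value method skips; both approaches lead to the same decision.

```python
import math
from statistics import NormalDist

# Numbers from the Z-test example: mu0 = 75, x̄ = 72, sigma = 10, n = 50, alpha = 0.05.
mu0, x_bar, sigma, n, alpha = 75, 72, 10, 50, 0.05

standard_error = sigma / math.sqrt(n)              # about 1.414
z = (x_bar - mu0) / standard_error                 # about -2.12
z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, critical = ±{z_crit:.2f}, p = {p_two_tailed:.3f}")
print("reject H0" if abs(z) > z_crit else "fail to reject H0")
```

The p-value comes out to about 0.034, below α = 0.05, matching the critical-value conclusion.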
T-Test Example
Problem: A new teaching method is tested on 20 students. Mean = 81, sample s = 8. National average = 78. Test at α = 0.05 whether the new method improves scores.
Step 1: H₀: μ = 78 vs H₁: μ > 78 (right-tailed)
Step 2: α = 0.05
Step 3: t = (81 - 78) / (8/√20) = 3 / 1.789 = 1.68
Step 4: df = 19, critical t = 1.729. Since 1.68 < 1.729, t does not reach the rejection region (one-tailed p ≈ 0.055).
Step 5: Fail to reject H₀. Insufficient evidence that method improves scores.
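Python's standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule and finds the critical value by bisection. It is an illustration under that assumption, not a production routine; in practice a library such as SciPy (`scipy.stats.t.sf` and `scipy.stats.t.ppf`) would normally be used. The helper names are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t) for t >= 0, via Simpson's rule on the density from 0 to t."""
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3  # half of the distribution lies above 0


def t_critical(alpha: float, df: int) -> float:
    """Right-tail critical value: solve P(T >= c) = alpha by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail still too heavy: cutoff lies further right
        else:
            hi = mid
    return (lo + hi) / 2


# Numbers from the T-test example: x̄ = 81, s = 8, n = 20, mu0 = 78, right-tailed.
x_bar, s, n, mu0, alpha = 81, 8, 20, 78, 0.05
t = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1
print(f"t = {t:.2f}, critical t = {t_critical(alpha, df):.3f}, p = {t_upper_tail(t, df):.3f}")
```

It should reproduce the table value of about 1.729 and give a one-tailed p-value of about 0.055. Comparing t = 1.68 against the z cutoff of 1.645 instead would have rejected H₀, which is exactly the "Using z-test when t-test needed" mistake listed below.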
Common Mistakes
| Mistake | Why It Costs Points |
|---|---|
| Mixing up H₀ and H₁ | Entire conclusion becomes invalid |
| Wrong tail direction | Incorrect critical value and p-value |
| Using z-test when t-test needed | Wrong test statistic and conclusion |
| Saying "accept H₀" | Correct term is "fail to reject H₀" |
| No contextual conclusion | Must interpret result in problem terms |
Platform-Specific Tips
Knowledge Checks can reset progress. Must select correct test type from dropdown. Rounding errors cause wrong answers. No partial credit.
Often requires exact interpretation wording. May include Excel output interpretation. Strict about "reject" vs "fail to reject" language.
Fill-in-blank with little feedback. May require hypotheses in symbolic form. Strict rounding requirements vary by problem.
Frequently Asked Questions
What is hypothesis testing?
Hypothesis testing is a statistical method for making decisions about population parameters based on sample data. You start with a null hypothesis (no effect), collect data, calculate a test statistic, and determine whether the evidence is strong enough to reject the null hypothesis.
What's the difference between null and alternative hypotheses?
The null hypothesis (H₀) is the default assumption — typically that there's no effect or no difference. The alternative hypothesis (H₁) is what you're trying to find evidence for. You never "prove" the alternative; you either reject or fail to reject the null.
What is a Type I error vs Type II error?
Type I error (false positive) is rejecting a true null hypothesis — concluding there's an effect when there isn't. Type II error (false negative) is failing to reject a false null — missing a real effect. Type I probability equals α; Type II probability is β.
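The claim that the Type I error probability equals α can be checked by simulation. This sketch assumes a normal population with hypothetical defaults (μ₀ = 75, σ = 10, n = 50), runs many two-tailed z-tests in which H₀ is true, and reports the fraction that reject; `type_i_rate` is an illustrative name. Because the population is assumed normal, it draws each sample mean directly from its sampling distribution instead of generating full samples.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_i_rate(alpha=0.05, mu0=75.0, sigma=10.0, n=50, n_sims=20_000, seed=7):
    """Fraction of two-tailed z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(n_sims):
        # Under H0 with a normal population, x̄ ~ Normal(mu0, sigma / sqrt(n)).
        x_bar = rng.gauss(mu0, se)
        z = (x_bar - mu0) / se
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
        if p <= alpha:
            rejections += 1
    return rejections / n_sims


print(f"Type I error rate: {type_i_rate():.3f}")
```

The rate should come out close to 0.05, the α used for each test.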
When should I use a one-tailed vs two-tailed test?
Use one-tailed when testing for an effect in a specific direction (greater than OR less than). Use two-tailed when testing for any difference in either direction. Look for directional words: "increases," "decreases" → one-tailed; "different" → two-tailed.
When do I use a z-test vs t-test?
Use a z-test when the population σ is known (many intro courses also allow a z-test for large samples, n ≥ 30). Use a t-test when you estimate σ with the sample s, especially when the sample is small. In most real-world stats courses, you'll use t-tests because σ is rarely known.
What does it mean to "fail to reject" the null hypothesis?
It means you didn't find enough evidence to conclude the alternative is true — NOT that you proved the null is true. "Fail to reject" is intentionally cautious language. Absence of evidence isn't evidence of absence.
How does hypothesis testing relate to confidence intervals?
They're two sides of the same coin. A 95% confidence interval that excludes a hypothesized value corresponds to rejecting that hypothesis at α = 0.05. See our confidence intervals guide for more.
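Using the numbers from the Z-test example (x̄ = 72, σ = 10, n = 50), a short sketch builds the 95% z-interval x̄ ± 1.96 · σ/√n and checks whether the hypothesized mean of 75 falls inside it.

```python
import math
from statistics import NormalDist

# Numbers from the Z-test example; mu0 = 75 is the hypothesized mean.
x_bar, sigma, n, mu0, confidence = 72, 10, 50, 75, 0.95

z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
margin = z_star * sigma / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
if lower <= mu0 <= upper:
    print("75 inside CI -> fail to reject H0 at alpha = 0.05")
else:
    print("75 outside CI -> reject H0 at alpha = 0.05")
```

The interval is roughly (69.23, 74.77). Since 75 lies outside it, this agrees with rejecting H₀ in the worked example.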
How does this relate to p-values?
The p-value is the probability of seeing your result (or more extreme) if H₀ is true. If p ≤ α, you reject the null. For a complete explanation, see our p-value guide.