Hypothesis Testing Explained: Complete Guide for Statistics Students

Quick Answer

Hypothesis testing is a 5-step process: (1) State null (H₀) and alternative (H₁) hypotheses, (2) Choose significance level α, (3) Calculate test statistic, (4) Find p-value or critical value, (5) Make decision and state conclusion. If p ≤ α, reject H₀. The logic is backwards from intuition — you don't prove the alternative, you look for evidence strong enough to reject the null.

Hypothesis testing is one of the most common reasons students seek statistics help. The logic feels backwards, the terminology is abstract, and the formulas can overwhelm — especially under timed conditions. This guide walks you through everything you need to know, from the basics to worked examples, in plain English.

What Is Hypothesis Testing?

Hypothesis testing is a method for making decisions about populations based on sample data. It's the backbone of statistical inference — from medical research ("Does this drug work?") to marketing ("Did this ad campaign increase sales?") to quality control ("Is this batch defective?").

The core question: Is the pattern I see in my sample real, or just random chance?

Key Terms

Term | Definition
Null Hypothesis (H₀) | The default assumption: no effect, no difference
Alternative Hypothesis (H₁) | What you're trying to find evidence for
Significance Level (α) | Threshold for rejecting H₀ (typically 0.05)
Test Statistic | Standardized value (z or t) calculated from the sample
P-Value | Probability of seeing your result (or one more extreme) if H₀ is true

For a deep dive on p-values specifically, see our complete p-value guide.

Null vs Alternative Hypothesis

Before running any test, you define two competing claims:

  • Null Hypothesis (H₀): The status quo, meaning there's no effect, no difference, or no relationship. This is the claim the test looks for evidence against.
  • Alternative Hypothesis (H₁): What you're looking for evidence of — there is an effect, difference, or relationship.

The Backwards Logic

This is where hypothesis testing gets confusing: you don't try to prove the alternative directly. Instead, you ask, "If the null were true, how unlikely is my observed result?" If it's very unlikely (p ≤ α), you reject the null in favor of the alternative. If it isn't, you fail to reject the null.

Analogy: It's like a court trial. H₀ = "defendant is innocent" (the default). You don't prove guilt directly — you look for evidence strong enough to reject the presumption of innocence beyond reasonable doubt.
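To make the "if the null were true" question concrete, here is a minimal Python sketch (standard library only) that simulates thousands of samples from a world where H₀ is true and counts how often the sample mean lands at least as far from μ₀ as the observed one. It borrows the numbers from the z-test worked example later in this guide (x̄ = 72, μ₀ = 75, σ = 10, n = 50) and assumes a normally distributed population; the function name `simulated_p_value` is purely illustrative. The estimate should come out close to the exact two-tailed p-value of about 0.034.

```python
import random
import statistics

def simulated_p_value(observed_mean, mu0, sigma, n, trials=10_000, seed=1):
    """Monte Carlo estimate of a two-tailed p-value, assuming a normal population with known sigma."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed_gap = abs(observed_mean - mu0)
    extreme = 0
    for _ in range(trials):
        # One sample of size n drawn from a world where H0 is true (mean = mu0)
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        # "As extreme or more": the sample mean lands at least as far from mu0 as ours did
        if abs(statistics.fmean(sample) - mu0) >= observed_gap:
            extreme += 1
    return extreme / trials

print(simulated_p_value(72, 75, 10, 50))  # roughly 0.03
```

A small simulated p-value says the observed gap would be rare if H₀ were true, which is exactly the evidence the test looks for.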

Example

Research question: Does a new tutoring program improve test scores?

  • H₀: μ_tutoring = μ_no tutoring (no difference)
  • H₁: μ_tutoring > μ_no tutoring (tutoring improves scores)

Understanding Alpha (Significance Level)

Alpha (α) is the threshold for deciding whether to reject the null hypothesis. It represents the maximum probability of Type I error you're willing to accept — rejecting a true null.

Alpha Level | Interpretation | When Used
α = 0.10 | 10% chance of a false positive when H₀ is true | Exploratory research
α = 0.05 | 5% chance of a false positive when H₀ is true | Standard (most common)
α = 0.01 | 1% chance of a false positive when H₀ is true | High-stakes decisions

The decision rule: If p-value ≤ α, reject H₀. If p-value > α, fail to reject H₀.
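The decision rule is simple enough to write as a function. This Python sketch (the name `decide` is hypothetical) returns the conventional wording, which also guards against the common slip of saying "accept H₀":

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p <= alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if p_value <= alpha:
        return "Reject H0"
    # Never "accept H0": a large p-value only means the evidence was insufficient
    return "Fail to reject H0"

print(decide(0.034))        # Reject H0
print(decide(0.034, 0.01))  # Fail to reject H0
```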

Type I vs Type II Errors

Because you're making decisions from limited data, there's always risk of error. These fall into two categories:

[Figure: Type I vs Type II error grid showing possible outcomes in hypothesis testing]
Error Type | What Happens | Probability | Example
Type I (False Positive) | Reject true H₀ | α | Concluding drug works when it doesn't
Type II (False Negative) | Fail to reject false H₀ | β | Missing that drug actually works
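One way to see that the Type I error rate equals α is to simulate it: generate many samples in a world where H₀ is true, run a two-tailed z-test at α = 0.05 (critical value 1.96) on each, and count the rejections. Every rejection here is a false positive, so the rate should come out near 0.05. The population values (μ₀ = 100, σ = 15, n = 25) are arbitrary assumptions chosen for illustration.

```python
import math
import random

def type_i_error_rate(mu0=100.0, sigma=15.0, n=25, z_crit=1.96, trials=10_000, seed=2):
    """Fraction of two-tailed z-tests that reject H0 when H0 (mu = mu0) is actually true."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Sample from the null world: the true mean really is mu0
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / se
        if abs(z) > z_crit:  # any rejection here is a Type I error
            rejections += 1
    return rejections / trials

print(type_i_error_rate())  # should be close to alpha = 0.05
```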

The tradeoff: for a fixed sample size, lowering α to reduce Type I errors increases β, the probability of a Type II error. The usual way to reduce both at once is to collect more data (a larger n).
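The tradeoff can be quantified for a right-tailed z-test. If the true mean sits `true_shift` above μ₀, the z-statistic is centered at true_shift / (σ/√n) instead of 0, so β = Φ(z_crit − true_shift / (σ/√n)), where Φ is the standard normal CDF. The sketch below uses Python's `statistics.NormalDist`; the scenario (σ = 10, a true shift of 3 points) and the helper name are hypothetical.

```python
from statistics import NormalDist

def type_ii_error(alpha, n, sigma, true_shift):
    """Beta for a right-tailed z-test of H0: mu = mu0 when the true mean is mu0 + true_shift."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)         # rejection cutoff
    shift_in_se = true_shift / (sigma / n ** 0.5)  # true effect measured in standard errors
    # Under the true mean, z ~ N(shift_in_se, 1), so beta = P(z <= z_crit)
    return std_normal.cdf(z_crit - shift_in_se)

for alpha in (0.05, 0.01):
    for n in (25, 100):
        beta = type_ii_error(alpha, n, sigma=10, true_shift=3)
        print(f"alpha={alpha}, n={n}: beta={beta:.3f}")
```

Lowering α from 0.05 to 0.01 raises β at either sample size, while raising n from 25 to 100 lowers β at either α.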

One-Tailed vs Two-Tailed Tests

The direction of your alternative hypothesis determines whether you run a one-tailed or two-tailed test:

[Figure: Comparison of one-tailed and two-tailed tests showing rejection regions]
Test Type | H₁ Form | Critical z (α = 0.05) | Use When
Two-Tailed | μ ≠ μ₀ | ±1.96 | Testing for any difference
Right-Tailed | μ > μ₀ | +1.645 | "Increases," "greater than"
Left-Tailed | μ < μ₀ | −1.645 | "Decreases," "less than"

Watch for directional words: "Is the mean different?" → two-tailed. "Does treatment increase scores?" → right-tailed. "Does drug reduce blood pressure?" → left-tailed.
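The critical values in the table come from the standard normal distribution. A short Python sketch using `statistics.NormalDist.inv_cdf` reproduces them for any α; the `z_critical` helper and its `tail` argument are illustrative names, not a library API.

```python
from statistics import NormalDist

def z_critical(alpha=0.05, tail="two"):
    """Critical z-value for a test at significance level alpha."""
    std_normal = NormalDist()  # mean 0, SD 1
    if tail == "two":
        # Split alpha across both tails; reject when |z| exceeds the returned value
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)  # reject when z is above this
    if tail == "left":
        return std_normal.inv_cdf(alpha)  # reject when z is below this
    raise ValueError("tail must be 'two', 'right', or 'left'")

for tail in ("two", "right", "left"):
    print(tail, round(z_critical(0.05, tail), 3))
```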

Z-Test vs T-Test: When to Use Which

The main difference is whether you know the population standard deviation:

Use Z-Test When | Use T-Test When
Population σ is known | Using sample s to estimate σ
Large sample (n ≥ 30), where z is a common textbook approximation | Small sample (n < 30) with σ unknown
Testing proportions | Testing means with unknown σ

The Formulas

Z-Test: z = (x̄ − μ₀) / (σ / √n)

T-Test: t = (x̄ − μ₀) / (s / √n)
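The two formulas differ only in which standard deviation goes in the denominator, and in which distribution you compare the result to (standard normal for z, t with n − 1 degrees of freedom for t). A minimal Python sketch, checked against the numbers from the worked examples later in this guide; the helper names are hypothetical:

```python
import math
import statistics

def z_statistic(x_bar, mu0, sigma, n):
    """z = (x̄ − μ₀) / (σ / √n), where σ is the known population SD."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

def t_statistic(x_bar, mu0, s, n):
    """t = (x̄ − μ₀) / (s / √n), where s is the sample SD; compare to t with n − 1 df."""
    return (x_bar - mu0) / (s / math.sqrt(n))

def t_from_data(data, mu0):
    """Compute t (and its degrees of freedom) directly from raw observations."""
    n = len(data)
    s = statistics.stdev(data)  # sample SD, divides by n - 1
    return t_statistic(statistics.fmean(data), mu0, s, n), n - 1

print(round(z_statistic(72, 75, 10, 50), 2))  # -2.12 (z-test example)
print(round(t_statistic(81, 78, 8, 20), 2))   # 1.68 (t-test example)
```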

Practical tip: In most real-world stats courses, you'll use t-tests because population σ is rarely known. When in doubt, use t-test unless the problem explicitly gives you σ.

The 5-Step Hypothesis Testing Process

Every hypothesis test follows the same general structure:

[Figure: 5-step hypothesis testing process flowchart]

Step 1: State Hypotheses

Write H₀ and H₁ clearly. Determine if one-tailed or two-tailed.

Step 2: Choose Alpha

Usually α = 0.05 unless otherwise specified.

Step 3: Calculate Test Statistic

Use z or t formula depending on whether σ is known.

Step 4: Find P-Value

Alternatively, compare the test statistic to the critical value.

Step 5: State Conclusion

Reject or fail to reject H₀, then interpret in context.
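The five steps map naturally onto a single function. This sketch handles a one-sample z-test only (σ known) and uses `statistics.NormalDist` for the p-value; the function name and the `tail` strings are assumptions for illustration. Step 5's interpretation in context still has to be written by you.

```python
import math
from statistics import NormalDist

def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Run steps 3 to 5 of a one-sample z-test; returns (z, p_value, decision)."""
    # Step 1: the hypotheses are encoded by mu0 and tail ("two", "right", "left")
    # Step 2: alpha is the chosen significance level
    # Step 3: test statistic
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Step 4: p-value from the standard normal distribution
    std_normal = NormalDist()
    if tail == "two":
        p = 2 * (1 - std_normal.cdf(abs(z)))
    elif tail == "right":
        p = 1 - std_normal.cdf(z)
    elif tail == "left":
        p = std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    # Step 5: decision (the in-context conclusion is still up to you)
    decision = "Reject H0" if p <= alpha else "Fail to reject H0"
    return z, p, decision

print(one_sample_z_test(72, 75, 10, 50))
```

With the z-test example's numbers it returns z ≈ −2.12, p ≈ 0.034 and "Reject H0".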

Worked Examples

Z-Test Example

Problem: A professor claims the average exam score is 75. You sample 50 students and find x̄ = 72. Population σ = 10. Test at α = 0.05 whether the mean differs from 75.

Step 1: H₀: μ = 75 vs H₁: μ ≠ 75 (two-tailed)

Step 2: α = 0.05

Step 3: z = (72 - 75) / (10/√50) = -3 / 1.414 = -2.12

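Step 4: The critical values are ±1.96. Since |−2.12| = 2.12 > 1.96, the test statistic falls in the rejection region (equivalently, the two-tailed p-value ≈ 0.034 is below α = 0.05).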

Step 5: Reject H₀. Evidence suggests the mean differs from 75.
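The arithmetic above can be checked in a few lines of Python. This sketch also computes the two-tailed p-value, which the critical-value approach doesn't need but which leads to the same decision:

```python
import math
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 72, 75, 10, 50, 0.05
se = sigma / math.sqrt(n)                       # standard error, about 1.414
z = (x_bar - mu0) / se                          # about -2.12
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for a two-tailed test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value
print(f"z = {z:.2f}, critical = ±{z_crit:.2f}, p = {p_value:.3f}")
print("Reject H0" if abs(z) > z_crit else "Fail to reject H0")
```

It prints z = -2.12, critical = ±1.96, p = 0.034, followed by Reject H0.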

T-Test Example

Problem: A new teaching method is tested on 20 students. Mean = 81, sample s = 8. National average = 78. Test at α = 0.05 whether the new method improves scores.

Step 1: H₀: μ = 78 vs H₁: μ > 78 (right-tailed)

Step 2: α = 0.05

Step 3: t = (81 - 78) / (8/√20) = 3 / 1.789 = 1.68

Step 4: df = 19, so the right-tailed critical value is t = 1.729. Since 1.68 < 1.729, the test statistic does not fall in the rejection region.

Step 5: Fail to reject H₀. Insufficient evidence that method improves scores.
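Python's standard library has no t distribution, so this sketch integrates the t density numerically (Simpson's rule) and finds the critical value by bisection. That is a stand-in chosen to keep the example dependency-free; in practice you would read a t-table or call a statistics library. It reproduces the critical value of 1.729 for df = 19 and gives a right-tailed p-value between 0.05 and 0.10, consistent with failing to reject H₀. The helper names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the symmetric density from 0 to |x| with Simpson's rule."""
    b = abs(x)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical_right(alpha, df):
    """Right-tail critical value: the t with P(T > t) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid  # tail area still too big, so the cutoff lies further right
        else:
            hi = mid
    return (lo + hi) / 2

x_bar, mu0, s, n, alpha = 81, 78, 8, 20, 0.05
df = n - 1
t = (x_bar - mu0) / (s / math.sqrt(n))  # about 1.68
t_crit = t_critical_right(alpha, df)     # about 1.729, matching the t-table
p_value = 1 - t_cdf(t, df)               # right-tailed p-value
print(f"t = {t:.2f}, df = {df}, critical t = {t_crit:.3f}, p = {p_value:.3f}")
print("Reject H0" if t > t_crit else "Fail to reject H0")
```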

Common Mistakes

Mistake | Why It Costs Points
Mixing up H₀ and H₁ | Entire conclusion becomes invalid
Wrong tail direction | Incorrect critical value and p-value
Using z-test when t-test needed | Wrong test statistic and conclusion
Saying "accept H₀" | Correct term is "fail to reject H₀"
No contextual conclusion | Must interpret result in problem terms

Platform-Specific Tips

ALEKS Statistics

Knowledge Checks can reset progress. Must select correct test type from dropdown. Rounding errors cause wrong answers. No partial credit.

Pearson MyStatLab

Often requires exact interpretation wording. May include Excel output interpretation. Strict about "reject" vs "fail to reject" language.

Cengage WebAssign

Fill-in-blank with little feedback. May require hypotheses in symbolic form. Strict rounding requirements vary by problem.

Frequently Asked Questions

What is hypothesis testing?

Hypothesis testing is a statistical method for making decisions about population parameters based on sample data. You start with a null hypothesis (no effect), collect data, calculate a test statistic, and determine whether the evidence is strong enough to reject the null hypothesis.

What's the difference between null and alternative hypotheses?

The null hypothesis (H₀) is the default assumption — typically that there's no effect or no difference. The alternative hypothesis (H₁) is what you're trying to find evidence for. You never "prove" the alternative; you either reject or fail to reject the null.

What is a Type I error vs Type II error?

Type I error (false positive) is rejecting a true null hypothesis — concluding there's an effect when there isn't. Type II error (false negative) is failing to reject a false null — missing a real effect. Type I probability equals α; Type II probability is β.

When should I use a one-tailed vs two-tailed test?

Use one-tailed when testing for an effect in a specific direction (greater than OR less than). Use two-tailed when testing for any difference in either direction. Look for directional words: "increases," "decreases" → one-tailed; "different" → two-tailed.

When do I use a z-test vs t-test?

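Use a z-test when the population σ is known (many introductory courses also allow z as an approximation when n ≥ 30). Use a t-test when you estimate σ with the sample standard deviation s, especially with small samples. In most real-world stats courses, you'll use t-tests because σ is rarely known.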

What does it mean to "fail to reject" the null hypothesis?

It means you didn't find enough evidence to conclude the alternative is true — NOT that you proved the null is true. "Fail to reject" is intentionally cautious language. Absence of evidence isn't evidence of absence.

How does hypothesis testing relate to confidence intervals?

They're two sides of the same coin. A 95% confidence interval that excludes a hypothesized value corresponds to rejecting that value in a two-tailed test at α = 0.05. See our confidence intervals guide for more.
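Using the z-test worked example (x̄ = 72, σ = 10, n = 50), a short Python sketch shows the correspondence: the 95% interval x̄ ± 1.96·σ/√n excludes 75, matching the decision to reject H₀: μ = 75 at α = 0.05. The helper name is hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, level=0.95):
    """Two-sided z interval for mu: x̄ ± z* · σ/√n (σ known)."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_confidence_interval(72, 10, 50)
print(f"95% CI: ({low:.2f}, {high:.2f})")
print("75 inside CI?", low <= 75 <= high)
```

It prints 95% CI: (69.23, 74.77) and 75 inside CI? False.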

How does this relate to p-values?

The p-value is the probability of seeing your result (or more extreme) if H₀ is true. If p ≤ α, you reject the null. For a complete explanation, see our p-value guide.
