📘 9.1 Hypothesis Testing and Statistical Errors

1. Key Definitions

2. Visualizing Hypothesis Testing

This chart compares two normal distributions: one under H₀ (μ = 50) and one under H₁ (μ = 52). The blue curve shows the distribution under the null hypothesis. The red curve shows the distribution under the alternative hypothesis. The critical region is set using z = ±1.96, centered at 50.

Understanding Type I and II Errors

We calculate the critical values using the significance level α = 0.05 and z = ±1.96.
With μ₀ = 50, σ = 2.5, and n = 10, the standard error (SE) is σ / √n = 2.5 / √10 ≈ 0.79.
Therefore, critical values are 50 ± 1.96 × 0.79 → [48.45, 51.55].
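The arithmetic above can be reproduced with a short script. This is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the original lesson.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the text: H0 mean, population SD, sample size, significance level
mu0, sigma, n, alpha = 50.0, 2.5, 10, 0.05

se = sigma / sqrt(n)                          # standard error of the sample mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical z (about 1.96)
lower = mu0 - z_crit * se                     # lower critical value
upper = mu0 + z_crit * se                     # upper critical value

print(f"SE = {se:.4f}, critical z = {z_crit:.3f}")
print(f"Fail-to-reject region: [{lower:.2f}, {upper:.2f}]")
```

A sample mean outside the printed interval leads to rejecting H₀ at the 5% level.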

Type I Error (α): The probability that we incorrectly reject H₀ when H₀ is true. This occurs when the sample mean falls outside [48.45, 51.55] under the blue curve. Since α = 0.05, the combined tail area outside this interval under H₀ totals 5% of the probability mass.

Type II Error (β): The probability that we fail to reject H₀ when the true mean is μ = 52 (H₁ is true). Even though 52 is above 51.55, the red distribution (centered at 52) has some probability mass inside the range [48.45, 51.55].
To find this probability, we calculate the area of the red curve that falls between 48.45 and 51.55.
Under H₁: the z-scores for 48.45 and 51.55 become:
z₁ = (48.45 - 52) / 0.79 ≈ -4.49
z₂ = (51.55 - 52) / 0.79 ≈ -0.57

Using standard normal distribution tables:
Φ(-0.57) ≈ 0.2843, Φ(-4.49) ≈ 0 (negligible).
So β = Φ(-0.57) - Φ(-4.49) ≈ 0.2843.

Interpretation: There is about a 28.4% chance that we fail to reject H₀ even though the true mean is 52 (a missed effect).
Power = 1 − β = 1 − 0.2843 = 0.7157: about a 71.6% chance of correctly rejecting H₀ if μ = 52.
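As a cross-check on the table lookup, β can be computed as the area of the H₁ sampling distribution between the two critical values. The sketch below assumes the same numbers as the text (μ₀ = 50, μ₁ = 52, σ = 2.5, n = 10, α = 0.05); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1, sigma, n, alpha = 50.0, 52.0, 2.5, 10, 0.05

se = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower, upper = mu0 - z_crit * se, mu0 + z_crit * se

# Sampling distribution of the mean if H1 is true (the red curve)
sampling_h1 = NormalDist(mu=mu1, sigma=se)

# beta = probability that the sample mean lands inside the fail-to-reject region
beta = sampling_h1.cdf(upper) - sampling_h1.cdf(lower)
power = 1 - beta

print(f"beta = {beta:.4f}, power = {power:.4f}")
```

Because the script does not round the standard error or the z-scores, the last digit may differ slightly from the table-based values.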

3. Example Calculation

Suppose H₀: μ = 50, H₁: μ ≠ 50, σ = 2.5, n = 10. Test at α = 0.05.

Critical z = ±1.96 → Critical values = 50 ± 1.96×(2.5/√10) ≈ [48.45, 51.55]

If sample mean x̄ = 52:

z = (52 - 50)/(2.5/√10) ≈ 2.53 → Reject H₀
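The same decision can be sketched as a small function. The function name `z_test_two_tailed` is hypothetical, and the p-value it returns is not used in the text above; it is included only as an equivalent way to state the decision rule (reject when p < α).

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test for a mean with known sigma.

    Returns the z statistic, the two-sided p-value, and the reject decision.
    """
    se = sigma / sqrt(n)
    z = (xbar - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails beyond |z|
    return z, p_value, p_value < alpha

z, p_value, reject = z_test_two_tailed(xbar=52, mu0=50, sigma=2.5, n=10)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

Comparing |z| with 1.96 and comparing the p-value with 0.05 always give the same decision for this test.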


5. How to Interpret

6. Practice Questions for Students

Practice 1 – Light Bulb Lifespan (Two-Tailed)

Question: A light bulb manufacturer claims the average lifespan of its bulbs is 1,000 hours. A sample of 36 bulbs shows a mean of 960 hours with a standard deviation of 80. Test the claim at the 0.05 significance level.

Try solving first, then compare with the worked solution below.

Claim: Mean lifespan is 1,000 hours. Sample mean = 960, σ = 80, n = 36.

H₀: μ = 1000, H₁: μ ≠ 1000

α = 0.05 (two-tailed) → Each tail gets α/2 = 0.025 → Critical z = ±1.96

SE = 80 / √36 = 13.33

z = (960 - 1000) / 13.33 = -3.00 → Reject H₀ → Lifespan differs from 1,000 hours.

Type I error rate: α = 5% (the chance of rejecting H₀ if the mean really is 1,000).

If true mean = 980: Critical values = 1000 ± 1.96 × 13.33 = [973.87, 1026.13]

z₁ = (973.87 - 980) / 13.33 ≈ -0.46, z₂ = (1026.13 - 980) / 13.33 ≈ 3.46

β = Φ(3.46) - Φ(-0.46) ≈ 0.9997 - 0.3228 = 0.6769 → Power ≈ 32.3%
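To see how power changes with the true mean, the calculation can be wrapped in a function and repeated for several values. This is a sketch under the same assumptions as the solution (μ₀ = 1000, SD = 80 treated as known, n = 36); the function name `power_two_tailed` and the list of true means are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a two-tailed z-test when the true mean is mu_true."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    sampling = NormalDist(mu=mu_true, sigma=se)
    beta = sampling.cdf(upper) - sampling.cdf(lower)  # mass inside the fail-to-reject region
    return 1 - beta

true_means = [1000, 990, 980, 960]
powers = {m: power_two_tailed(1000, m, 80, 36) for m in true_means}
for m, pw in powers.items():
    print(f"true mean = {m}: power = {pw:.3f}")
```

When the true mean equals μ₀, the "power" is just α, since any rejection is then a Type I error; power rises as the true mean moves away from 1,000.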

Practice 2 – Exam Scores (One-Tailed)

Question: A professor claims students average at least 75 points on an exam. A sample of 25 students has a mean of 72 with a standard deviation of 5. Test the claim at the 0.05 level.

Try solving first, then compare with the worked solution below.

Claim: Students average at least 75 points. Sample mean = 72, σ = 5, n = 25.

H₀: μ ≥ 75, H₁: μ < 75

α = 0.05 (one-tailed) → Critical z = -1.645

SE = 5 / √25 = 1

z = (72 - 75) / 1 = -3 → Reject H₀ → Students score lower than 75.

Type I error rate: α = 5% (at the boundary μ = 75).

If true mean = 73: critical value in raw score = 75 − 1.645 × 1 = 73.355, so we fail to reject H₀ whenever x̄ > 73.355. Under μ = 73: z = (73.355 − 73) / 1 = 0.355

β = P(Z > 0.355) = 1 − Φ(0.355) ≈ 1 − 0.639 = 0.361 → Power = Φ(0.355) ≈ 63.9%
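For a lower-tailed test it is easy to mix up β and power, so a direct computation helps: power is the probability that the sample mean falls below the critical value when the true mean is 73. The sketch below assumes the numbers from this problem; the function name `power_lower_tailed` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_lower_tailed(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a lower-tailed z-test (H1: mu < mu0) when the true mean is mu_true."""
    se = sigma / sqrt(n)
    # inv_cdf(alpha) is negative (about -1.645 for alpha = 0.05)
    x_crit = mu0 + NormalDist().inv_cdf(alpha) * se
    # Reject H0 when the sample mean is below x_crit
    return NormalDist(mu=mu_true, sigma=se).cdf(x_crit)

power = power_lower_tailed(mu0=75, mu_true=73, sigma=5, n=25)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

The rejection region is the left tail, so the mass to the left of the critical value is power and the mass to the right is β.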

Practice 3 – Manufacturing Defect Rate (Two-Tailed)

Question: A factory claims its defect rate is 3%. A sample of 16 items shows a defect rate of 2.5%, with a standard deviation of 0.6%. Test this at the 5% level.

Try solving first, then compare with the worked solution below.

H₀: μ = 3%, H₁: μ ≠ 3%, σ = 0.6%, n = 16, sample mean = 2.5%

SE = 0.6 / √16 = 0.15%

α = 0.05 (two-tailed) → Each tail gets α/2 = 0.025 → Critical z = ±1.96

z = (2.5 - 3) / 0.15 = -3.33 → Reject H₀ → Defect rate differs from 3%

Assume true mean = 2.8%:

Critical values in raw score = 3 ± 1.96 × 0.15 = [2.706, 3.294]

z₁ = (2.706 - 2.8)/0.15 = -0.63, z₂ = (3.294 - 2.8)/0.15 = 3.29

β = Φ(3.29) - Φ(-0.63) ≈ 0.9995 - 0.2643 ≈ 0.735 → Power ≈ 26.5%

🧠 How to Use α (Type I) and β (Type II) in Real Decisions

1. What They Really Mean

2. Math Recap

We assume: H₀: μ = 50, H₁: μ = 52, σ = 2.5, n = 10, α = 0.05.

Type I Error (α): Probability that we say "μ ≠ 50" even though μ = 50 → false alarm
Controlled by your chosen significance level. Usually α = 0.05 (5%).

Type II Error (β): Probability that we say "μ = 50" even though μ = 52 → missed effect
Calculated using overlap from the H₁ curve into the H₀ zone.

In our case: β ≈ 28.4%, so Power = 1 - β ≈ 71.6%

3. If α is high or low...

4. What’s the Damage?

| Error Type | What You Say | Reality | Damage Example |
| --- | --- | --- | --- |
| Type I (α) | “It works!” | Actually doesn’t work | You approve a bad drug, waste money, risk safety |
| Type II (β) | “It doesn’t work.” | Actually does work | You reject a useful medicine, miss a breakthrough |

5. When Do You Care Most About α vs. β?

6. Conclusion

α and β are about risk of being wrong. You pick α to control false positives. You estimate β to check if your test is strong enough to detect the real effect.

Think of it like this:

💡 Final Tip: Good tests have low α, low β, and high power. But there’s a trade-off. You must choose what mistake matters most in your situation.
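The trade-off can be made concrete by holding the design fixed and varying α. The sketch below reuses the running example (μ₀ = 50, μ₁ = 52, σ = 2.5, n = 10); the function name `beta_two_tailed` and the three α values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def beta_two_tailed(mu0, mu1, sigma, n, alpha):
    """Type II error rate of a two-tailed z-test when the true mean is mu1."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    h1 = NormalDist(mu=mu1, sigma=se)
    return h1.cdf(mu0 + z_crit * se) - h1.cdf(mu0 - z_crit * se)

alphas = [0.10, 0.05, 0.01]
betas = [beta_two_tailed(50, 52, 2.5, 10, a) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f} -> beta = {b:.3f}, power = {1 - b:.3f}")
```

With everything else fixed, a stricter α widens the fail-to-reject region, so β goes up and power goes down.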

7. What Affects Type I and Type II Errors?

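Type I Error (α) is set by you, usually at 0.05, and defines how strict you are. Type II Error (β) is not chosen directly; it depends on the sample size (n), the effect size (the gap between μ₀ and μ₁), the variability of the data (σ), and the α you chose.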

📉 Why Small Effects Are Hard to Detect

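If μ₁ is very close to μ₀ (e.g., 50.0 vs 50.5), the red and blue curves overlap heavily unless the sample is large enough to shrink the standard error well below the gap between the means. That means β stays high and power stays low at typical sample sizes.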
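A quick calculation shows how slowly power grows with sample size for a 0.5-unit gap. This is a sketch only: σ = 2.5 and α = 0.05 are borrowed from the earlier example, and the sample sizes and the function name `power_two_tailed` are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-tailed z-test when the true mean is mu1."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    h1 = NormalDist(mu=mu1, sigma=se)
    beta = h1.cdf(mu0 + z_crit * se) - h1.cdf(mu0 - z_crit * se)
    return 1 - beta

sample_sizes = [10, 50, 100, 400]
powers = [power_two_tailed(50.0, 50.5, 2.5, n) for n in sample_sizes]
for n, pw in zip(sample_sizes, powers):
    print(f"n = {n:>3}: SE = {2.5 / sqrt(n):.3f}, power = {pw:.3f}")
```

Power becomes high only once the standard error is small compared with the 0.5 gap, which under these assumptions takes a sample in the hundreds.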

📊 Example:

Assume:

🛠️ Limitations of Hypothesis Testing

So: Always consider:

💡 Summary: Type II error (β) is like a hidden trap — it depends on sample size, effect size, and data variability. Always check your power before trusting a non-significant result.