Bayes’ Theorem lets us reverse conditional probabilities: when we know P(B | A) but want P(A | B), it tells us how to compute one from the other.
P(A | B) = [P(B | A) × P(A)] / P(B)
We often use this when we know how often a result occurs given a condition (e.g., how often a test is positive given the disease) and we want the reverse: how likely the condition is after seeing the result.
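To make the mechanics concrete, here is a minimal Python sketch of the formula. The function name `bayes_posterior` and the sample numbers are illustrative, not from the notes.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Illustrative numbers: P(B | A) = 0.9, P(A) = 0.2, P(B) = 0.3
print(bayes_posterior(0.9, 0.2, 0.3))  # ≈ 0.6
```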
Bayes’ Theorem — Derivation and Tree (Pepper Example)
Common mistake: P(F) ≠ 0.10 + 0.005.
Those are conditional rates; weight each by its prevalence:
P(F) = 0.10×0.20 + 0.005×0.80 = 0.020 + 0.004 = 0.024.
Sanity check: 0.024 lies between 0.005 and 0.10.
Interpretation: Among failed chips, about 83% were produced under high contamination. It’s not 100% because some failures also occur when contamination isn’t high.
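A quick numeric check of this example in Python (assuming, as the weighting above implies, P(H) = 0.20 for high contamination):

```python
p_h = 0.20            # P(H): high contamination (prior)
p_f_given_h = 0.10    # P(F | H): fail rate under high contamination
p_f_given_hc = 0.005  # P(F | H'): fail rate otherwise

# Law of total probability, then Bayes' Theorem
p_f = p_f_given_h * p_h + p_f_given_hc * (1 - p_h)
p_h_given_f = p_f_given_h * p_h / p_f

print(p_f)          # ≈ 0.024
print(p_h_given_f)  # ≈ 0.8333, i.e. 5/6
```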
Step 4 – Interpret in plain words
Although the test is very sensitive (99%) and fairly specific (95%), the disease is extremely rare (0.01%).
Out of all positives, most will actually be false positives. So if you test positive, the chance you really have the disease is only about 1 in 506.
Frequency picture (per 1,000,000 people)
| Condition | Population | Positive | Negative |
| --- | --- | --- | --- |
| Sick | 100 | 99 | 1 |
| Healthy | 999,900 | 49,995 | 949,905 |
| Totals | 1,000,000 | 50,094 | 949,906 |
Among 50,094 positives, only 99 are real cases → 99 / 50,094 ≈ 0.2%.
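The same frequency picture can be reproduced in a few lines of Python (variable names are illustrative; the rates come from the problem statement):

```python
n = 1_000_000
sick = round(n * 0.0001)           # prevalence 0.01%  -> 100 people
healthy = n - sick                 # 999,900

true_pos = round(sick * 0.99)      # sensitivity 99%        -> 99
false_pos = round(healthy * 0.05)  # 1 - specificity = 5%   -> 49,995

positives = true_pos + false_pos   # 50,094
print(positives, true_pos / positives)  # 50094, ≈ 0.00198 (about 1 in 506)
```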
Takeaway
Even good tests can give misleading results when the disease is rare.
This is called the base-rate fallacy: people ignore how small the prior probability is.
Always combine test accuracy and disease prevalence when interpreting results.
Bayes’ Theorem — Dorms & Gender (ICE 9/8/2025)
Question. Students live in Dorm A (30%), Dorm B (50%), or Dorm C (20%).
Female proportions by dorm: A = 90% female, B = 30% female, C = 50% female.
If a randomly selected student is known to be female, find the probability that she lives in each dorm: P(A | F), P(B | F), and P(C | F).
The denominator is the probability of being female overall, across all dorms.
Each term is the joint probability of living in a dorm and being female:
P(F ∧ A) + P(F ∧ B) + P(F ∧ C) = P(F | A)P(A) + P(F | B)P(B) + P(F | C)P(C). This is the Law of Total Probability, worked out in the sketch below.
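A short Python sketch of the computation (the dictionary names are illustrative; the rates come from the problem statement):

```python
priors = {"A": 0.30, "B": 0.50, "C": 0.20}     # P(dorm)
p_f_given = {"A": 0.90, "B": 0.30, "C": 0.50}  # P(F | dorm)

# Denominator: P(F) = 0.27 + 0.15 + 0.10 = 0.52
p_f = sum(p_f_given[d] * priors[d] for d in priors)
print(p_f)  # ≈ 0.52

for d in priors:
    print(d, round(p_f_given[d] * priors[d] / p_f, 4))
# A 0.5192, B 0.2885, C 0.1923
```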
Step 4 — Probability Tree (visual check)
🧠 Try This Practice Problem
A factory uses two machines. Machine A produces 40% of items with a 2% defect rate. Machine B produces 60% of items with a 5% defect rate. An item is found to be defective. What’s the probability it came from Machine B?
Let D = defective, A = machine A, B = machine B.
P(D | B) = 0.05, P(B) = 0.6
P(D | A) = 0.02, P(A) = 0.4
P(D) = 0.05 × 0.6 + 0.02 × 0.4 = 0.03 + 0.008 = 0.038
P(B | D) = P(D | B) × P(B) / P(D) = 0.03 / 0.038 ≈ 0.789
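A one-off Python check of this arithmetic (names are illustrative):

```python
p_d_given = {"A": 0.02, "B": 0.05}  # P(D | machine)
p_machine = {"A": 0.40, "B": 0.60}  # P(machine)

# Total probability of a defect, then the posterior for Machine B
p_d = sum(p_d_given[m] * p_machine[m] for m in p_machine)
print(p_d)                                    # ≈ 0.038
print(p_d_given["B"] * p_machine["B"] / p_d)  # ≈ 0.789
```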
📢 Student Q&A (Bayes’ Theorem)
Q1: Why isn’t P(H | F) the same as P(F | H)?
A1: They answer different questions. P(F|H) is “fail rate under high contamination.” P(H|F) is “chance contamination was high given a failure.” Bayes connects them: P(H|F) = [P(F|H)·P(H)] / P(F). You must also know the prior P(H) and the overall rate P(F).
Q2: Where does P(F) in the denominator come from?
A2: From the law of total probability: P(F) = P(F|H)P(H) + P(F|H′)P(H′) (for two causes). It’s the weighted average of failure rates across all possible conditions, not “0.10 + 0.005”.
Q3: What’s the “base-rate fallacy” in Bayes problems?
A3: Ignoring the prior P(H). Even a strong indicator (large P(F|H)) can yield a modest P(H|F) if P(H) is small. Always combine the likelihood (P(F|H)) with the base rate P(H).
Q4: How do I handle multiple possible causes (H₁, H₂, …, Hₖ)?
A4: Use the multi-cause form: P(Hᵢ | F) = [P(F|Hᵢ)·P(Hᵢ)] / Σⱼ P(F|Hⱼ)·P(Hⱼ). It’s the same idea—posterior is proportional to prior × likelihood, normalized by the total evidence.
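A small Python sketch of the multi-cause form; `posteriors` is a hypothetical helper, shown reproducing the contamination example as a two-hypothesis case:

```python
def posteriors(priors, likelihoods):
    """Return P(H_i | F) for each i: prior * likelihood, normalized."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # total evidence: Σ P(F | H_j) P(H_j) = P(F)
    return [j / total for j in joint]

# H (high contamination) and H', with their fail-rate likelihoods
print(posteriors([0.20, 0.80], [0.10, 0.005]))  # [≈0.8333, ≈0.1667]
```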
Q5: Any quick intuition for P(H | F) = 5/6 in the contamination example?
A5: Think in counts. Per 1000 chips: 200 are high-contam → 10% fail = 20 fails; 800 are not-high → 0.5% fail = 4 fails. Among all 24 failures, 20 came from high contamination → 20/24 = 5/6 ≈ 0.8333.