🔍 Session 2.8 - Bayes’ Theorem

Understanding Bayes' Theorem

Bayes’ Theorem lets us reverse conditional probabilities. When we know P(B | A) but actually want P(A | B), the theorem computes the latter from the former (together with P(A) and P(B)).

P(A | B) = [P(B | A) × P(A)] / P(B)

We often use this when we know how often a result happens given a condition (like test results), and we want to know how likely the condition is after seeing the result.
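As a minimal sketch (plain Python, nothing assumed beyond the formula above), the inversion is a one-line function:

```python
def bayes(p_b_given_a, p_a, p_b):
    # P(A|B) = P(B|A) * P(A) / P(B)
    return p_b_given_a * p_a / p_b

# Illustrative numbers: P(B|A) = 0.80, P(A) = 0.20, P(B) = 0.40
print(bayes(0.80, 0.20, 0.40))  # ≈ 0.40
```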

Bayes’ Theorem — Derivation and Tree (Pepper Example)

P(A∩B) = P(A)P(B|A) = P(B)P(A|B)
⇒ P(A|B) = [P(B|A)·P(A)] / P(B)
P(B) = P(B|A)P(A) + P(B|¬A)P(¬A)

Let A = Foreign, B = Likes Pepper. Use: P(A)=0.20, P(B|A)=0.80, P(B|¬A)=0.30.

A = Foreign: P(A) = 0.20
  • Likes Pepper (B): P(B|A) = 0.80 → P(A∩B) = 0.16
  • Not (¬B): 1 − P(B|A) = 0.20 → P(A∩¬B) = 0.04
¬A = USA: P(¬A) = 0.80
  • Likes Pepper (B): P(B|¬A) = 0.30 → P(¬A∩B) = 0.24
  • Not (¬B): 1 − P(B|¬A) = 0.70 → P(¬A∩¬B) = 0.56

P(B) = 0.16 + 0.24 = 0.40
P(A|B) = 0.16 / 0.40 = 0.40
              Likes Pepper (B)   Not Like Pepper (¬B)   Total
Foreign (A)   0.16               0.04                   0.20
USA (¬A)      0.24               0.56                   0.80
Total         0.40               0.60                   1.00
Results among pepper-lovers (B):
P(Foreign | B) = 0.16 / 0.40 = 0.40 = 40%
P(USA | B) = 0.24 / 0.40 = 0.60 = 60%
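The whole tree can be reproduced in a few lines of Python (using the numbers above):

```python
p_a = 0.20           # P(Foreign)
p_b_given_a = 0.80   # P(Likes Pepper | Foreign)
p_b_given_na = 0.30  # P(Likes Pepper | USA)

# Joint probabilities (the tree's branch products)
p_ab = p_b_given_a * p_a          # P(A ∩ B) = 0.16
p_nab = p_b_given_na * (1 - p_a)  # P(¬A ∩ B) = 0.24

p_b = p_ab + p_nab                # total probability: 0.40
p_a_given_b = p_ab / p_b          # Bayes: 0.40

print(round(p_b, 2), round(p_a_given_b, 2))  # 0.4 0.4
```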

General Form of Bayes' Theorem (Multiple Events)

If events E₁, E₂, ..., Eₖ are mutually exclusive and exhaustive, and B is any event, then:

P(E₁ | B) = [P(B | E₁) × P(E₁)] / [P(B | E₁)P(E₁) + P(B | E₂)P(E₂) + ... + P(B | Eₖ)P(Eₖ)]

This form is used when there are several possible causes for one outcome.
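The multi-cause form translates directly into a small helper (a sketch; the function name `posterior` is my own, not standard):

```python
def posterior(likelihoods, priors):
    """Return [P(E_i | B)] given likelihoods[i] = P(B | E_i) and priors[i] = P(E_i)."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))  # denominator = P(B)
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Two causes with illustrative numbers: P(B|E1) = 0.10, P(B|E2) = 0.005, priors 0.20 / 0.80
print(posterior([0.10, 0.005], [0.20, 0.80]))
```

Because each term is divided by the same total, the returned posteriors always sum to 1.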

🧪 Example 2.26 – Contamination Problem

A semiconductor chip fails. Given: P(H) = 0.20 (probability of high contamination during manufacturing), P(F|H) = 0.10 (failure rate under high contamination), and P(F|H′) = 0.005 (failure rate otherwise). What is the probability that high contamination was present, P(H|F)?

Step 1 – Overall failure rate
P(F) = P(F|H)P(H) + P(F|H′)P(H′) = (0.10)(0.20) + (0.005)(0.80) = 0.020 + 0.004 = 0.024.
Common mistake: P(F) ≠ 0.10 + 0.005.
Those are conditional rates; weight by prevalence: P(F) = 0.10×0.20 + 0.005×0.80 = 0.024. Sanity check: 0.024 lies between 0.005 and 0.10.
Step 2 – Bayes’ Theorem (invert to get P(H|F))
P(H | F) = [P(F | H) · P(H)] / P(F) = (0.10 × 0.20) / 0.024 = 0.020 / 0.024 = 20/24 = 5/6 ≈ 0.8333

Interpretation: Among failed chips, about 83% were produced under high contamination. It’s not 100% because some failures also occur when contamination isn’t high.

Frequency picture (per 1000 chips)
Condition                Chips   Fail   Pass
High contamination (H)   200     20     180
Not high (H′)            800     4      796
Totals                   1000    24     976

Given a failure, P(H|F) = 20/24 = 5/6.
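Exact arithmetic with Python's `fractions` module reproduces P(H|F) = 5/6 exactly, with no rounding at any step:

```python
from fractions import Fraction

p_h = Fraction(20, 100)     # P(H) = 0.20
p_f_h = Fraction(10, 100)   # P(F|H) = 0.10
p_f_nh = Fraction(5, 1000)  # P(F|H') = 0.005

p_f = p_f_h * p_h + p_f_nh * (1 - p_h)  # total probability of failure
p_h_f = p_f_h * p_h / p_f               # posterior

print(p_f, p_h_f)  # 3/125 5/6
```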

Likelihood-ratio / odds view (why 83% is so high)
Prior odds for H:   P(H)/P(H′) = 0.20/0.80 = 1/4.
Likelihood ratio:   LR = P(F|H) / P(F|H′) = 0.10 / 0.005 = 20.
Posterior odds:   prior × LR = (1/4) × 20 = 5.
Convert to probability: 5 / (1 + 5) = 5/6 = 0.8333….

Intuition: a failure is 20× more likely under high contamination. That strong evidence flips prior odds of 1:4 into posterior odds of 5:1.

Avoid premature rounding—carry at least 3–4 decimals until the end. Here, 0.020/0.024 = 5/6 exactly.
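The odds-form calculation above, written out in Python with the same numbers:

```python
prior_odds = 0.20 / 0.80   # P(H) / P(H') = 1/4
lr = 0.10 / 0.005          # likelihood ratio = 20
posterior_odds = prior_odds * lr          # 1/4 × 20 = 5
p_h_f = posterior_odds / (1 + posterior_odds)  # odds → probability

print(round(p_h_f, 4))  # 0.8333
```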

🏥 Example 2.27 – Medical Diagnostic

A new test has:

  • Sensitivity: P(Positive | Sick) = 0.99 (the test detects 99% of sick people)
  • False-positive rate: P(Positive | Healthy) = 0.05 (specificity 95%)
  • Disease prevalence: P(Sick) = 0.0001 (0.01% of the population)

You test positive. What is P(Sick | Positive)?

Step 1 – Write Bayes’ Theorem
P(Sick | Positive) = [P(Positive | Sick) × P(Sick)] / P(Positive)
Step 2 – Expand denominator with law of total probability
P(Positive) = P(Positive | Sick)P(Sick) + P(Positive | Healthy)P(Healthy).
Step 3 – Plug in numbers
Numerator = (0.99)(0.0001) = 0.000099
Denominator = (0.99)(0.0001) + (0.05)(0.9999)
= 0.000099 + 0.049995 ≈ 0.050094

P(Sick | Positive) = 0.000099 / 0.050094 ≈ 0.00198 ≈ 0.2%.
Step 4 – Interpret in plain words
Although the test is very sensitive (99%) and fairly specific (95%), the disease is extremely rare (0.01%).
Out of all positives, most will actually be false positives. So if you test positive, the chance you really have the disease is only about 1 in 506.
Frequency picture (per 1,000,000 people)
Condition   Population   Positive   Negative
Sick        100          99         1
Healthy     999,900      49,995     949,905
Totals      1,000,000    50,094     949,906

Among 50,094 positives, only 99 are real cases → 99 / 50,094 ≈ 0.2%.
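The same computation in Python (prevalence 0.0001, sensitivity 0.99, false-positive rate 0.05, as in the example):

```python
prev = 0.0001  # P(Sick)
sens = 0.99    # P(Positive | Sick)
fpr = 0.05     # P(Positive | Healthy) = 1 - specificity

p_pos = sens * prev + fpr * (1 - prev)  # total probability of a positive
ppv = sens * prev / p_pos               # P(Sick | Positive)

print(round(p_pos, 6), round(ppv, 5))  # 0.050094 0.00198
```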

Takeaway

A highly accurate test does not guarantee a high post-test probability. When the condition is rare, the base rate dominates: most positives are false positives, so always combine the test's accuracy with the prevalence before interpreting a result.

Bayes’ Theorem — Dorms & Gender (ICE 9/8/2025)

Question. Students live in Dorm A (30%), Dorm B (50%), or Dorm C (20%). Female proportions by dorm: A = 90% female, B = 30% female, C = 50% female. If a randomly selected student is known to be female, find:

  1. P(Dorm A | Female)
  2. P(Dorm B | Female)
  3. P(Dorm C | Female)
Solution
Formula to use (Bayes + Total Probability)
Posterior for a dorm D:  P(D | F) = [ P(F | D) × P(D) ] / [ P(F | A)×P(A) + P(F | B)×P(B) + P(F | C)×P(C) ]
Total female probability:  P(F) = P(F | A)×P(A) + P(F | B)×P(B) + P(F | C)×P(C)
Use decimals: 30% → 0.30, 90% → 0.90, etc.

Step 1 — Given

  • P(A) = 0.30, P(B) = 0.50, P(C) = 0.20
  • P(F | A) = 0.90, P(F | B) = 0.30, P(F | C) = 0.50

Step 2 — Compute the denominator P(F) (Total Probability)

P(F) = (0.90×0.30) + (0.30×0.50) + (0.50×0.20) = 0.27 + 0.15 + 0.10 = 0.52
Dorm         P(Dorm)   P(Female | Dorm)   P(Female ∧ Dorm)
A            0.30      0.90               0.90 × 0.30 = 0.27
B            0.50      0.30               0.30 × 0.50 = 0.15
C            0.20      0.50               0.50 × 0.20 = 0.10
Sum = P(F)                                0.27 + 0.15 + 0.10 = 0.52

Step 3 — Plug into Bayes’ Theorem (one line each)

P(A | F) = (0.90×0.30) / 0.52 = 0.27 / 0.52 = 27/52 ≈ 0.5192 (51.92%)
P(B | F) = (0.30×0.50) / 0.52 = 0.15 / 0.52 = 15/52 ≈ 0.2885 (28.85%)
P(C | F) = (0.50×0.20) / 0.52 = 0.10 / 0.52 = 10/52 = 5/26 ≈ 0.1923 (19.23%)

Check: 0.5192 + 0.2885 + 0.1923 = 1.0000

Why the Denominator Works
The denominator is the probability of being female overall, across all dorms. Each term is the chance of being in a dorm and being female from that dorm: P(F ∧ A) + P(F ∧ B) + P(F ∧ C). This is the Law of Total Probability.

Step 4 — Probability Tree (visual check)

Dorm A (0.30):
  • Female | A = 0.90 → 0.27
  • Male   | A = 0.10 → 0.03
Dorm B (0.50):
  • Female | B = 0.30 → 0.15
  • Male   | B = 0.70 → 0.35
Dorm C (0.20):
  • Female | C = 0.50 → 0.10
  • Male   | C = 0.50 → 0.10

Total Female = 0.27 + 0.15 + 0.10 = 0.52
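All three posteriors can be checked with a short loop (a sketch using the given numbers):

```python
priors = {"A": 0.30, "B": 0.50, "C": 0.20}     # P(Dorm)
p_f_given = {"A": 0.90, "B": 0.30, "C": 0.50}  # P(Female | Dorm)

p_f = sum(p_f_given[d] * priors[d] for d in priors)      # total: 0.52
post = {d: p_f_given[d] * priors[d] / p_f for d in priors}

for d, p in post.items():
    print(d, round(p, 4))  # A 0.5192, B 0.2885, C 0.1923
```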

🧠 Try This Practice Problem

A factory uses two machines. Machine A produces 40% of items with a 2% defect rate. Machine B produces 60% of items with a 5% defect rate. An item is found to be defective. What’s the probability it came from Machine B?

Let D = defective, A = machine A, B = machine B.
P(D | B) = 0.05, P(B) = 0.6
P(D | A) = 0.02, P(A) = 0.4
P(D) = 0.05 × 0.6 + 0.02 × 0.4 = 0.03 + 0.008 = 0.038
P(B | D) = [P(D | B) × P(B)] / P(D) = 0.03 / 0.038 ≈ 0.789
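A quick Python check of the practice problem:

```python
p_a, p_b = 0.4, 0.6        # machine shares
p_d_a, p_d_b = 0.02, 0.05  # defect rates per machine

p_d = p_d_a * p_a + p_d_b * p_b  # total defect probability: 0.038
p_b_given_d = p_d_b * p_b / p_d  # posterior for Machine B

print(round(p_d, 3), round(p_b_given_d, 3))  # 0.038 0.789
```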

📢 Student Q&A (Bayes’ Theorem)

Q1: Why isn’t P(H | F) the same as P(F | H)?

A1: They answer different questions. P(F|H) is “fail rate under high contamination.” P(H|F) is “chance contamination was high given a failure.” Bayes connects them: P(H|F) = [P(F|H)·P(H)] / P(F). You must also know the prior P(H) and the overall rate P(F).

Q2: Where does P(F) in the denominator come from?

A2: From the law of total probability: P(F) = P(F|H)P(H) + P(F|H′)P(H′) (for two causes). It’s the weighted average of failure rates across all possible conditions, not “0.10 + 0.005”.

Q3: What’s the “base-rate fallacy” in Bayes problems?

A3: Ignoring the prior P(H). Even a strong indicator (large P(F|H)) can yield a modest P(H|F) if P(H) is small. Always combine the likelihood (P(F|H)) with the base rate P(H).

Q4: How do I handle multiple possible causes (H₁, H₂, …, Hₖ)?

A4: Use the multi-cause form: P(Hᵢ | F) = [P(F|Hᵢ)·P(Hᵢ)] / Σⱼ P(F|Hⱼ)·P(Hⱼ). It’s the same idea—posterior is proportional to prior × likelihood, normalized by the total evidence.

Q5: Any quick intuition for P(H | F) = 5/6 in the contamination example?

A5: Think in counts. Per 1000 chips: 200 are high-contam → 10% fail = 20 fails; 800 are not-high → 0.5% fail = 4 fails. Among all 24 failures, 20 came from high contamination → 20/24 = 5/6 ≈ 0.8333.