📘 Session 3.3 - Mean and Variance of Discrete Random Variables

📐 Definitions

Mean (Expected Value): μ = E(X) = Σ x·f(x)
Variance: σ² = V(X) = Σ (x − μ)²·f(x) = Σ x²·f(x) − μ²
Why the two forms agree (expand the square, then use Σ x·f(x) = μ and Σ f(x) = 1):
V(X) = Σ (x − μ)²·f(x) = Σ x²·f(x) − 2μ·Σ x·f(x) + μ²·Σ f(x) = Σ x²·f(x) − 2μ² + μ² = Σ x²·f(x) − μ²
Standard Deviation: σ = √V(X)

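The mean is a weighted average of the possible values, with the probabilities as weights. The variance is the weighted average of the squared distances from the mean, so it measures how spread out the values are around the mean.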
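The definitions above translate almost line for line into code. The sketch below is one possible way to do it in Python; the function name `pmf_summary` and the fair-coin example are illustrative assumptions, not part of the session material.

```python
import math

def pmf_summary(pmf):
    """Return (mean, variance, standard deviation) of a discrete PMF.

    `pmf` maps each value x to its probability f(x).
    """
    total = sum(pmf.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"probabilities sum to {total}, expected 1")
    mean = sum(x * p for x, p in pmf.items())                    # Σ x·f(x)
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Σ (x − μ)²·f(x)
    return mean, variance, math.sqrt(variance)

# Hypothetical example: a fair coin coded as 0/1.
mu, var, sd = pmf_summary({0: 0.5, 1: 0.5})
print(mu, var, sd)  # 0.5 0.25 0.5
```

The sum-to-1 check is a design choice: it catches a mistyped probability before it quietly distorts the mean.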

📊 Example 3.7 – Digital Channel: Mean and Variance

Let X be the number of error bits in the next four bits transmitted. X ∈ {0,1,2,3,4} with:

P(X = 0) = 0.6561
P(X = 1) = 0.2916
P(X = 2) = 0.0486
P(X = 3) = 0.0036
P(X = 4) = 0.0001

Mean (μ = E[X]):

μ = 0(0.6561) + 1(0.2916) + 2(0.0486) + 3(0.0036) + 4(0.0001) = 0.4

Although X never takes the value 0.4, the expected value (weighted average) is 0.4.

Variance (table method):

x	x − 0.4	(x − 0.4)²	f(x)	f(x)(x − 0.4)²
0	−0.4	0.16	0.6561	0.104976
1	0.6	0.36	0.2916	0.104976
2	1.6	2.56	0.0486	0.124416
3	2.6	6.76	0.0036	0.024336
4	3.6	12.96	0.0001	0.001296

Variance (add the last column):

Variance is the weighted average of squared deviations:

V(X) = Σ f(x)·(x − μ)²

From the table, add up the last column:

V(X) = 0.104976 + 0.104976 + 0.124416 + 0.024336 + 0.001296 = 0.36

✔️ Shortcut check: E[X²] − (E[X])² = 0.52 − (0.4)² = 0.36

✔️ Shortcut check (step by step):

The shortcut formula is:
V(X) = E[X²] − (E[X])²

First compute E[X²]:
E[X²] = 0²·0.6561 + 1²·0.2916 + 2²·0.0486 + 3²·0.0036 + 4²·0.0001
= 0 + 0.2916 + 0.1944 + 0.0324 + 0.0016
= 0.52
Next compute (E[X])²:
We already found E[X] = 0.4, so
(E[X])² = (0.4)² = 0.16
Now subtract:
V(X) = 0.52 − 0.16 = 0.36

Final Answer: V(X) = 0.36
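To double-check that the table method and the shortcut really agree, here is a small Python sketch using the Example 3.7 probabilities. It uses exact fractions so that no rounding noise creeps in; the variable names are illustrative choices.

```python
from fractions import Fraction

xs = [0, 1, 2, 3, 4]
fs = [Fraction(s) for s in ("0.6561", "0.2916", "0.0486", "0.0036", "0.0001")]

mu = sum(x * f for x, f in zip(xs, fs))                          # E[X]
var_definition = sum((x - mu) ** 2 * f for x, f in zip(xs, fs))  # table method
ex2 = sum(x**2 * f for x, f in zip(xs, fs))                      # E[X²]
var_shortcut = ex2 - mu**2                                       # shortcut

print(float(mu), float(ex2), float(var_definition), float(var_shortcut))
# 0.4 0.52 0.36 0.36
```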

❓ Why NOT just average the probabilities?

Averaging the probabilities, e.g. AVERAGE(0.6561, 0.2916, …), gives the mean of five numbers that happen to be probabilities. Because the five probabilities sum to 1, that average is always 1/5 = 0.2, whatever the distribution, so it says nothing about how many error bits to expect.

We want the average of the outcomes (0,1,2,3,4) over many packets, weighted by how likely each outcome is. That is exactly the definition:
μ = E[X] = Σ x·P(X=x)

Think of a big class experiment: for each 4-bit packet, record the number of error bits (X). After many packets, the class average of those X’s will be close to 0.4. That’s what the 0.4 “means”: on average, 0.4 errors per 4-bit packet.

(These probabilities match a Binomial model with n=4 bits and error probability p=0.1, so E[X] = n·p = 4·0.1 = 0.4.)
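The Binomial claim is easy to check numerically. The sketch below recomputes the Binomial(4, 0.1) probabilities with Python's `math.comb` and compares them with the values listed in Example 3.7; the list names are illustrative.

```python
from math import comb, isclose

n, p = 4, 0.1
# Binomial PMF: C(n, k) · p^k · (1 − p)^(n − k)
binomial_pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
listed = [0.6561, 0.2916, 0.0486, 0.0036, 0.0001]

for k, (model, given) in enumerate(zip(binomial_pmf, listed)):
    # Compare with a small tolerance because floats are not exact.
    print(k, round(model, 4), given, isclose(model, given, abs_tol=1e-12))

print("n*p =", round(n * p, 4))
```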

🧪 Mini Simulation: Watch the average approach 0.4

Assume each bit flips incorrectly with probability p = 0.1 independently (Binomial(4, 0.1)).

As the number of packets grows, the sample mean of X should hover near 0.4.
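A minimal way to run this experiment yourself, assuming independent bit errors with p = 0.1 as stated above. The function name `simulate_mean` and the fixed seed are illustrative assumptions; a different seed gives slightly different numbers.

```python
import random

def simulate_mean(num_packets, p=0.1, bits=4, seed=0):
    """Average number of error bits per packet over `num_packets` simulated packets."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_errors = 0
    for _ in range(num_packets):
        # Each bit is in error independently with probability p.
        total_errors += sum(rng.random() < p for _ in range(bits))
    return total_errors / num_packets

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, simulate_mean(n))
```

With only 10 packets the sample mean can be far from 0.4; with 100,000 it should typically agree to about two decimal places.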

📊 Example 3.8 – Revenue Decision (Marketing)

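Let X and Y be the revenue (in $ millions) from Design A and Design B.

Design A: $3 million with P = 1 → E(X) = 3, σ(X) = 0
Design B: $7 million with P = 0.3, $2 million with P = 0.7

E(Y) = 7×0.3 + 2×0.7 = 2.1 + 1.4 = 3.5
V(Y) = (7−3.5)²×0.3 + (2−3.5)²×0.7 = 3.675 + 1.575 = 5.25
σ(Y) = √5.25 ≈ 2.29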

📌 Design B has the higher expected revenue ($3.5 million vs. $3 million), but also the higher risk (σ ≈ $2.29 million vs. 0).
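The comparison can be reproduced in a few lines. This is a sketch only: the dictionary layout and the name `summary` are illustrative, and revenues are in $ millions as in the example.

```python
import math

# Revenue (in $ millions) mapped to its probability, per design.
designs = {
    "A": {3: 1.0},
    "B": {7: 0.3, 2: 0.7},
}

summary = {}
for name, pmf in designs.items():
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    summary[name] = (mean, var, math.sqrt(var))
    print(name, round(mean, 2), round(var, 2), round(math.sqrt(var), 2))
```

The mean answers "which design earns more on average?" and the standard deviation answers "how far from that average could we land?"; a decision usually needs both.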

🔁 Example 3.9 – Expected Value of a Function

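For any function h, E[h(X)] = Σ h(x)·f(x). What is E(X²) for the PMF of Example 3.7?

E(X²) = 0²×0.6561 + 1²×0.2916 + 2²×0.0486 + 3²×0.0036 + 4²×0.0001 = 0.52

⚠️ Note: E(X²) ≠ [E(X)]². Here E(X²) = 0.52 but [E(X)]² = (0.4)² = 0.16; the gap between them is exactly the variance, 0.36.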
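The same recipe works for any function of X: apply the function to each value, then weight by the probability. A minimal sketch, with `expect` as an illustrative helper name:

```python
def expect(h, pmf):
    """E[h(X)] = Σ h(x)·f(x) for a discrete PMF given as {x: f(x)}."""
    return sum(h(x) * p for x, p in pmf.items())

pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

e_x2 = expect(lambda x: x**2, pmf)           # E[X²]: square first, then average
e_x_squared = expect(lambda x: x, pmf) ** 2  # (E[X])²: average first, then square

print(round(e_x2, 4), round(e_x_squared, 4))  # 0.52 0.16
```

Squaring before averaging and averaging before squaring are different operations, which is why the two numbers differ.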

📏 Linear Transformations of a Random Variable

For any constants a and b:

Mean: E(aX + b) = a·E(X) + b
Variance: V(aX + b) = a² · V(X)

🧠 Practice Question

Let X take the values 1, 2, 3 with probabilities 0.2, 0.5, 0.3. Find μ, σ², and σ.

μ = 1×0.2 + 2×0.5 + 3×0.3 = 0.2 + 1.0 + 0.9 = 2.1
E(X²) = 1²×0.2 + 2²×0.5 + 3²×0.3 = 0.2 + 2.0 + 2.7 = 4.9
σ² = E(X²) − μ² = 4.9 − (2.1)² = 4.9 − 4.41 = 0.49
σ = √0.49 = 0.7
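If you check this answer on a computer, be aware that plain floating-point arithmetic can show tiny rounding noise in a subtraction like 4.9 − 4.41. One way around that, sketched here with Python's `fractions` module (the variable names are illustrative), is to keep the probabilities as exact fractions:

```python
from fractions import Fraction
from math import sqrt

# Probabilities 0.2, 0.5, 0.3 written as exact fractions.
pmf = {1: Fraction(2, 10), 2: Fraction(5, 10), 3: Fraction(3, 10)}

mu = sum(x * p for x, p in pmf.items())      # 21/10
ex2 = sum(x**2 * p for x, p in pmf.items())  # 49/10
var = ex2 - mu**2                            # 49/100
sigma = sqrt(var)                            # converted to a float here

print(mu, ex2, var, sigma)
```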

📎 Excel Cheat-Sheet (PMF with Weights)

Put your x values in A2:A6 and the matching probabilities in B2:B6 (with SUM(B2:B6)=1).

Mean (μ = E[X]): =SUMPRODUCT(A2:A6, B2:B6)
E[X²]: =SUMPRODUCT((A2:A6)^2, B2:B6)
Variance (σ²), definition: =SUMPRODUCT((A2:A6 - SUMPRODUCT(A2:A6,B2:B6))^2, B2:B6)
Variance (σ²), shortcut: =SUMPRODUCT((A2:A6)^2, B2:B6) - (SUMPRODUCT(A2:A6,B2:B6))^2
Std. dev. (σ): =SQRT( ... )

🧪 Try Your Own PMF

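Choose any non-negative weights for x = 0…4 and divide each by their total so the probabilities sum to 1. Then compute E[X], E[X²], and V(X), and check that E[X²] ≠ [E[X]]² (the two are equal only when all the probability sits on a single value, so that V(X) = 0).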
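Here is one way to try your own PMF in Python. The helper `normalize` and the raw weights are hypothetical; swap in any non-negative numbers you like.

```python
def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return [w / total for w in weights]

weights = [5, 3, 1, 1, 0]   # hypothetical raw weights for x = 0..4
probs = normalize(weights)  # becomes a valid PMF
xs = range(5)

mean = sum(x * p for x, p in zip(xs, probs))
ex2 = sum(x**2 * p for x, p in zip(xs, probs))
variance = ex2 - mean**2

print("P(X=x):", probs)
print("E[X] =", round(mean, 4), " E[X^2] =", round(ex2, 4), " V(X) =", round(variance, 4))
```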

📏 Linear Transformations — Why variance scales by a²

1) Derivation (short and clear)

Mean:
  U = aX + b
  E[U] = E[aX + b] = a·E[X] + b

Variance:
  V(U) = E[(U − E[U])²]
       = E[(aX + b − (aE[X] + b))²]
       = E[(a(X − E[X]))²]
       = a² · E[(X − E[X])²]
       = a² · V(X)

Shift b cancels (doesn’t change spread). Scaling by a stretches distances by |a|, so squared distances scale by a².
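The key step of the derivation, that each deviation u − E[U] is just a times x − E[X], can be checked row by row. The sketch below does this for the Example 3.7 PMF with exact fractions; the constants a = 3 and b = 5 are illustrative choices.

```python
from fractions import Fraction

xs = [0, 1, 2, 3, 4]
fs = [Fraction(s) for s in ("0.6561", "0.2916", "0.0486", "0.0036", "0.0001")]
a, b = 3, 5  # illustrative constants for U = aX + b

mu_x = sum(x * f for x, f in zip(xs, fs))
mu_u = sum((a * x + b) * f for x, f in zip(xs, fs))

# Row by row: (u − μ_U)² divided by (x − μ_X)² should be a².
# (No x equals μ_X = 0.4 here, so the division is always defined.)
ratios = [((a * x + b) - mu_u) ** 2 / (x - mu_x) ** 2 for x in xs]

var_x = sum((x - mu_x) ** 2 * f for x, f in zip(xs, fs))
var_u = sum(((a * x + b) - mu_u) ** 2 * f for x, f in zip(xs, fs))

print(mu_u == a * mu_x + b, ratios == [a**2] * len(xs), var_u == a**2 * var_x)
# True True True
```

Because every row is scaled by the same factor a², the weighted sum of the rows is scaled by a² as well.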

2) Numeric demonstration (Example 3.7 PMF)

X ∈ {0,1,2,3,4} with probabilities [0.6561, 0.2916, 0.0486, 0.0036, 0.0001]. For X: μ_X=0.4 and V(X)=0.36.

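Take a = 3 and b = 5, so U = 3X + 5 and a² = 9.

Summary: μ_X = 0.4, V(X) = 0.36 • μ_U = 6.2, V(U) = 3.24 • Check: a²·V(X) = 9·0.36 = 3.24

x	f(x)	(x−μ_X)²	Term_X=f(x)(x−μ_X)²	u = a·x + b	μ_U	(u−μ_U)²	Term_U=f(x)(u−μ_U)²	Ratio Term_U/Term_X
0	0.6561	0.16	0.104976	5	6.2	1.44	0.944784	9
1	0.2916	0.36	0.104976	8	6.2	3.24	0.944784	9
2	0.0486	2.56	0.124416	11	6.2	23.04	1.119744	9
3	0.0036	6.76	0.024336	14	6.2	60.84	0.219024	9
4	0.0001	12.96	0.001296	17	6.2	116.64	0.011664	9
Sum = V(X)			0.36	Sum = V(U)			3.24	Ratios (when defined) should equal a²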

✅ Conclusion — Notice the difference between the two variances

Mean shifts with b: E[aX+b] = a·E[X] + b. With a=3 and b=5, μ_X=0.4 and μ_U=3·0.4+5=6.2.
Variance ignores b but scales with a²: V(aX+b) = a²·V(X). The shift b just re-centers; it does not change spread.
For example, with a=3 and b=5: V(X)=0.36 → V(U)=3²·0.36 = 9·0.36 = 3.24. The “Ratio” column equals a² = 9.00 on every row where the denominator is defined. This is why the totals obey V(U)=a²V(X).
Units matter: variance is in squared units. If X is measured in errors per packet, then V(X) is in (errors per packet)². Multiplying X by 3 multiplies every deviation from the mean by 3, so every squared deviation, and hence the variance, is multiplied by 3² = 9.
Standard deviation scales by |a|: σ(U)=|a|·σ(X). Here, σ(X)=√0.36=0.6 and σ(U)=√3.24=1.8 = 3·0.6.

Sanity checks you can try:

Change b only (keep a=1): μ changes; V stays the same.
Change a only (keep b=0): the “Ratio” column becomes a² everywhere; the bottom sums obey V(U)=a²V(X).
Try a negative a (e.g., a=-2): the ratio is still a²=4 — the sign doesn’t matter for variance.
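The three sanity checks can also be run in code. This is a sketch under the same Example 3.7 PMF; the specific (a, b) pairs, including the shift b = 5, are illustrative choices.

```python
from math import isclose, sqrt

xs = [0, 1, 2, 3, 4]
fs = [0.6561, 0.2916, 0.0486, 0.0036, 0.0001]

def mean_var(values, probs):
    """Mean and variance of a discrete distribution from the definitions."""
    mu = sum(v * p for v, p in zip(values, probs))
    return mu, sum((v - mu) ** 2 * p for v, p in zip(values, probs))

mu_x, var_x = mean_var(xs, fs)

# (a, b): shift only, scale only, negative scale
cases = [(1, 5), (3, 0), (-2, 0)]
results = {}
for a, b in cases:
    mu_u, var_u = mean_var([a * x + b for x in xs], fs)
    results[(a, b)] = (mu_u, var_u)
    print(f"a={a}, b={b}: mean={mu_u:.2f}, var={var_u:.4f}, "
          f"a^2*V(X)={a**2 * var_x:.4f}, sd={sqrt(var_u):.2f}")
```

In each line of output, the `var` and `a^2*V(X)` columns should match, and the sign of a should make no difference to the variance.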

Common mistakes to avoid:

Averaging the probabilities themselves (e.g., AVERAGE(0.6561, ...)) — that’s not an expected value.
Thinking b changes variance — it doesn’t; only a does, via a².