Section 5.1 — Joint & Marginal Distributions

Joint Probability Mass Function (Discrete)

(1) Nonnegativity: f_XY(x, y) ≥ 0
(2) Normalization: ∑_x ∑_y f_XY(x, y) = 1
(3) Event probability: f_XY(x, y) = P(X = x, Y = y)

1) Joint Distributions for Two Random Variables

Continuous (joint PDF):
f_XY(x, y) ≥ 0, ∬_ℝ² f_XY(x, y) dx dy = 1.

For a region R ⊆ ℝ²,
P{ (X, Y) ∈ R } = ∬_R f_XY(x, y) dx dy.

Joint CDF:
F_XY(x, y) = P{ X ≤ x, Y ≤ y } = ∫_−∞^x ∫_−∞^y f_XY(u, v) dv du.

Interpretation: the joint PMF assigns probabilities to grid points (x, y); the joint PDF is a surface whose volume over a region equals the probability that (X, Y) falls in that region.

1.1) Class Survey Example — Lotte Market Visit (23 Students)

A simple real dataset: 23 JU students visited Lotte Market (Jacksonville). Each student recorded (1) minutes spent and (2) money spent ($). We’ll treat these as joint random variables:

X = visit duration (min): {5–10, 10–30, 30–60}
Y = money spent ($): {under 10, 10–30, 30–80}

Observed counts (out of 23 total):

Y \ X	5–10 min	10–30 min	30–60 min	Row total (count)
Under $10	4	2	1	7
$10–30	2	5	3	10
$30–80	0	2	4	6
Column total (count)	6	9	8	23

Convert to probabilities by dividing each count by 23.

Example: f_XY(10–30, $10–30) = 5 / 23 ≈ 0.217.
Marginal f_X(10–30) = (2+5+2)/23 = 9/23 ≈ 0.391.
Marginal f_Y($10–30) = (2+5+3)/23 = 10/23 ≈ 0.435.
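
The conversion from counts to probabilities can be sketched in a few lines of Python. This is a minimal illustration, not part of the original page: the variable names and the ASCII category labels are assumptions chosen for readability, and exact fractions are used so the marginals match 9/23 and 10/23 exactly.

```python
from fractions import Fraction

# Survey counts: rows = Y (money spent), columns = X (visit duration).
# Labels are illustrative stand-ins for the categories in the table.
x_labels = ["5-10", "10-30", "30-60"]
y_labels = ["<10", "10-30", "30-80"]
counts = [
    [4, 2, 1],
    [2, 5, 3],
    [0, 2, 4],
]

n = sum(sum(row) for row in counts)  # total number of students

# Joint PMF: each count divided by n.
joint = [[Fraction(c, n) for c in row] for row in counts]

# Marginal of X sums each column; marginal of Y sums each row.
f_x = [sum(joint[i][j] for i in range(len(y_labels))) for j in range(len(x_labels))]
f_y = [sum(row) for row in joint]

print("n =", n)
print("f_XY(10-30, $10-30) =", joint[1][1], "=", round(float(joint[1][1]), 3))
print("f_X =", [str(p) for p in f_x])
print("f_Y =", [str(p) for p in f_y])
```

Both marginals sum to 1, which is a quick sanity check that no cell was dropped.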

Interactive 3D Joint PMF Visualization

Each bar’s height is the joint probability f_XY(x, y), and the bar heights sum to 1. The joint PMF is pictured in 3D, with one bar over each (X, Y) category pair.

1.2) Marginals & Conditional Probability — Lotte Market (n=23)

Marginal of X (minutes)

f_X(x) = ∑_y f_XY(x,y)

Marginal of Y ($ spent)

f_Y(y) = ∑_x f_XY(x,y)

Data recap (counts): X ∈ {5–10, 10–30, 30–60}, Y ∈ {<10, 10–30, 30–80}.
Table (rows = Y, cols = X): [[4,2,1],[2,5,3],[0,2,4]] with total 23.

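Conditional Probability Calculator

Compute P(Y | X): for a given X = x and a target Y = y, P(Y = y | X = x) = f_XY(x, y) / f_X(x).

Compute P(X | Y): for a given Y = y and a target X = x, P(X = x | Y = y) = f_XY(x, y) / f_Y(y).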

Reminder: For a row/column in the joint table, conditional probability is “cell ÷ that row/column total.” Example: P(Y=10–30 | X=10–30) = 5/9 ≈ 0.556, since the X=10–30 column has 9 students total.
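
The "cell ÷ row/column total" rule can be sketched as two small functions. This is an illustrative sketch only: the function names `p_y_given_x` and `p_x_given_y` and the ASCII labels are assumptions, not something defined by the original page.

```python
# Survey counts: rows = Y (money spent), columns = X (visit duration).
x_labels = ["5-10", "10-30", "30-60"]
y_labels = ["<10", "10-30", "30-80"]
counts = [
    [4, 2, 1],
    [2, 5, 3],
    [0, 2, 4],
]

def p_y_given_x(y, x):
    """P(Y = y | X = x): cell count divided by the column total for x."""
    i, j = y_labels.index(y), x_labels.index(x)
    column_total = sum(row[j] for row in counts)
    return counts[i][j] / column_total

def p_x_given_y(x, y):
    """P(X = x | Y = y): cell count divided by the row total for y."""
    i, j = y_labels.index(y), x_labels.index(x)
    row_total = sum(counts[i])
    return counts[i][j] / row_total

print(round(p_y_given_x("10-30", "10-30"), 3))  # 5/9
print(round(p_x_given_y("30-60", "30-80"), 3))  # 4/6
```

Working with counts directly is equivalent to dividing probabilities, because the factor 1/23 cancels in the ratio.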

1.3) Quick Real-World Examples of Joint Variables (no math)

Use these to recognize when you have a joint distribution. They say what X and Y are, what values they can take (the “support”), and why they’re modeled together.

Discrete–Discrete

Website traffic: X = # visits in an hour, Y = # checkouts in that hour. (Support: nonnegative integers.) Counts tend to move together.
Call center: X = # incoming calls 9–10am, Y = # dropped calls 9–10am. (Integers.) Joint because more calls can mean more drops.
Dice game: X = die 1 outcome, Y = die 2 outcome. (1–6 each.) Classic independent case.
Quality control: X = defects per unit A, Y = defects per unit B from same line. (Integers.) Dependence via shared conditions.

Continuous–Continuous

Network latencies: X = connect time (ms), Y = auth time (ms). (Positive reals.) Both are positive, sometimes with an ordering constraint such as X < Y; often dependent.
Finance intraday: X = stock A return (%, 1-min), Y = stock B return (%, 1-min). (Real numbers.) Correlated due to market factors.
Manufacturing: X = part length (mm), Y = part width (mm). (Positive reals within tolerances.) Joint for yield estimation.
Weather: X = temperature (°F), Y = humidity (%). (Ranges like 0–120 and 0–100.) Nonlinear relationship common.
Reliability: X = time to first failure, Y = time to full replacement. (Positive.) Natural constraint X < Y.
Medical: X = systolic BP, Y = diastolic BP. (Realistic ranges.) Strong biological dependence.

Mixed (one discrete, one continuous)

E-commerce: X = number of items in cart (integer), Y = total spend ($, continuous). Larger X tends to increase Y.
Queues: X = number waiting (integer), Y = individual wait time (minutes). More people, longer waits.
Education: X = # study sessions this week (integer), Y = exam score (0–100). Behavior–outcome linkage.

Supports you’ll see (described, not computed)

Rectangle: both variables free in ranges (e.g., length 18–22mm, width 8–12mm). Often independence is assumed then checked.
Triangle “X < Y”: ordering or sequence times (start vs finish, connect vs authorize). Only points with X less than Y are allowed.
Band/curve constraints: physical laws or policies tie values (e.g., speed vs fuel use within safe operating band).

Why joint?

Plan probabilities about both together: “What’s the chance wait < 5 min and queue size ≤ 3?”
Summarize each alone: get the marginals for X or Y (e.g., distribution of total spend regardless of cart size).
Condition on one: “Given 10 callers are waiting, what is expected wait?” (decision support, SLAs, staffing).
Test independence/correlation: do X and Y move together or not? (risk, hedging, causality hints).

Rule of thumb: If your question mentions both X and Y in the same event or decision, you’re in joint-distribution land.

2) Marginal Distributions

Discrete: f_X(x) = ∑_y f_XY(x, y), f_Y(y) = ∑_x f_XY(x, y).

Continuous: f_X(x) = ∫_−∞^∞ f_XY(x, y) dy, f_Y(y) = ∫_−∞^∞ f_XY(x, y) dx.

Marginal CDFs: F_X(x) = ∫_−∞^x f_X(u) du, F_Y(y) = ∫_−∞^y f_Y(v) dv.

If X and Y are independent, then f_XY(x, y) = f_X(x) f_Y(y) and F_XY(x, y) = F_X(x) F_Y(y).

2.1) ✅ TL;DR Summary — Symbols & Integration Variables

Symbol	Meaning	Integration variable
f_XY(x, y)	Joint PDF	variables x, y
f_X(x)	Marginal PDF of X	integrated over y
f_Y(y)	Marginal PDF of Y	integrated over x
F_X(x)	CDF of X	integrate f_X(u) with respect to u up to x
F_Y(y)	CDF of Y	integrate f_Y(v) with respect to v up to y

Note: u and v are just dummy variables of integration.

2.2) Mean & Variance from a Joint Distribution

From the marginals
E[X] = ∫_−∞^∞ x f_X(x) dx, E[Y] = ∫_−∞^∞ y f_Y(y) dy.

Var(X) = ∫_−∞^∞ (x−μ_X)² f_X(x) dx = E[X²] − μ_X², where μ_X = E[X].

Var(Y) = ∫_−∞^∞ (y−μ_Y)² f_Y(y) dy = E[Y²] − μ_Y², where μ_Y = E[Y].

Directly from the joint pdf/pmf
E[X] = ∬ x f_XY(x,y) dx dy, E[Y] = ∬ y f_XY(x,y) dx dy.
E[X²] = ∬ x² f_XY(x,y) dx dy, E[Y²] = ∬ y² f_XY(x,y) dx dy.

2.3) Covariance & Correlation (optional)

Cov(X,Y)=E[XY]−E[X]E[Y], ρ=Cov(X,Y)/√(Var(X)Var(Y)).

For f(x,y)=2e^−(x+y) on 0<x<y: E[XY]=1 ⇒ Cov=1−(1/2)(3/2)=1/4, ρ=1/√5≈0.447.
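
The value E[XY] = 1 is stated without derivation. One way to obtain it, integrating x first over 0 < x < y and using Var(X) = 1/4 and Var(Y) = 5/4 from the worked values for this example, is sketched here:

```latex
\begin{aligned}
E[XY] &= \int_0^\infty \int_0^y x y \, 2e^{-(x+y)} \, dx \, dy
       = 2 \int_0^\infty y e^{-y} \left[ 1 - e^{-y}(1+y) \right] dy \\
      &= 2 \left( \int_0^\infty y e^{-y} \, dy
                  - \int_0^\infty y e^{-2y} \, dy
                  - \int_0^\infty y^2 e^{-2y} \, dy \right)
       = 2 \left( 1 - \tfrac14 - \tfrac14 \right) = 1, \\
\operatorname{Cov}(X,Y) &= 1 - \tfrac12 \cdot \tfrac32 = \tfrac14, \qquad
\rho = \frac{1/4}{\sqrt{(1/4)(5/4)}} = \frac{1}{\sqrt{5}} \approx 0.447.
\end{aligned}
```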

Quick worked values for our examples

Discrete table (Section 3)
f_X(1)=0.20, f_X(2)=0.25, f_X(3)=0.55.
f_Y(1)=0.28, f_Y(2)=0.25, f_Y(3)=0.17, f_Y(4)=0.30.

E[X] = 1·0.20 + 2·0.25 + 3·0.55 = 2.35.
E[Y] = 1·0.28 + 2·0.25 + 3·0.17 + 4·0.30 = 2.49.

Triangular-support continuous (Section 5)
Marginals: f_X(x)=2e^−2x (x>0), f_Y(y)=2e^−y−2e^−2y (y>0).

E[X] = 0.5, Var(X) = 0.25.
E[Y] = 1.5, Var(Y) = 1.25.
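
These moments can be cross-checked numerically. The sketch below applies a plain midpoint rule to the marginal f_Y(y) = 2e^−y − 2e^−2y. The helper name `midpoint_integral`, the truncation point 40 and the step count are assumptions made for illustration; the tail beyond 40 is negligible for this density.

```python
import math

def f_y(y):
    """Marginal pdf of Y for the triangular-support example (y > 0)."""
    return 2.0 * math.exp(-y) - 2.0 * math.exp(-2.0 * y)

def midpoint_integral(g, lo, hi, steps):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

UPPER = 40.0   # truncation of the infinite upper limit (assumed large enough)
STEPS = 40_000

total = midpoint_integral(f_y, 0.0, UPPER, STEPS)
mean_y = midpoint_integral(lambda y: y * f_y(y), 0.0, UPPER, STEPS)
second_moment = midpoint_integral(lambda y: y * y * f_y(y), 0.0, UPPER, STEPS)
var_y = second_moment - mean_y ** 2

print(round(total, 4), round(mean_y, 4), round(var_y, 4))
```

The same helper works for f_X(x) = 2e^−2x if the integrand is swapped.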

Tip: Use marginals when you have them; it’s usually simpler. For continuous variables, the joint pdf has units 1/(unit_x·unit_y), so ∬ x·f dx dy carries the units of x and ∬ y·f dx dy carries the units of y.

3) Discrete Illustration (Joint PMF Table)

A joint PMF can be shown in a table; row/column sums give the marginals.

y \ x	1	2	3	Marginal f_Y(y)
1	0.01	0.02	0.25	0.28
2	0.02	0.03	0.20	0.25
3	0.02	0.10	0.05	0.17
4	0.15	0.10	0.05	0.30
Marginal f_X(x)	0.20	0.25	0.55	1.00

How to read this table (very clear):

What are X and Y?
X = number of requests (1–3). Y = response-time category (1–4).

Cell value: f_XY(x,y) = P(X=x, Y=y).

Examples: P(3,1)=0.25; P(2,3)=0.10; P(1,4)=0.15; P(3,2)=0.20.

Total = 1.00.

What are the marginals?

f_X(x): add down the column. f_Y(y): add across the row.

Example: f_X(2)=0.25, f_Y(4)=0.30.
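
The row and column sums, and the means quoted earlier for this table, can be reproduced with a short script. This is a sketch under the assumption that the table is stored as a list of rows (y = 1..4) with columns x = 1..3; the variable names are illustrative.

```python
# Joint PMF table: rows are y = 1..4, columns are x = 1..3.
pmf = [
    [0.01, 0.02, 0.25],
    [0.02, 0.03, 0.20],
    [0.02, 0.10, 0.05],
    [0.15, 0.10, 0.05],
]
x_values = [1, 2, 3]
y_values = [1, 2, 3, 4]

# f_X(x): add down each column. f_Y(y): add across each row.
f_x = [sum(row[j] for row in pmf) for j in range(len(x_values))]
f_y = [sum(row) for row in pmf]

# Means and one variance computed from the marginals.
mean_x = sum(x * p for x, p in zip(x_values, f_x))
mean_y = sum(y * p for y, p in zip(y_values, f_y))
var_x = sum(x * x * p for x, p in zip(x_values, f_x)) - mean_x ** 2

print([round(p, 2) for p in f_x])   # column sums
print([round(p, 2) for p in f_y])   # row sums
print(round(mean_x, 2), round(mean_y, 2), round(var_x, 4))
```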

4) Continuous Example (Server Access Time, 0 < x < y)

Let X be connect time (ms) and Y authorization time (ms) with joint PDF on 0 < x < y:

f_XY(x, y) = 6×10⁻⁶ · e^{−0.001x − 0.002y}, 0 < x < y, 0 < y < ∞.

For a, b ≥ 0 and m = min(a, b),

P(X ≤ a, Y ≤ b) = (1 − e^−0.003m) − 3 e^−0.002b(1 − e^−0.001m).

Shaded triangle = support (0 < x < y).

Darker polygon = event region (X ≤ a, Y ≤ b) inside the support.

Geometry shows where to integrate; probability is the integral of the pdf over the darker region.

Note: the complement probability is 1 − P(X ≤ a, Y ≤ b).

Check: with a = 1000, b = 2000, the probability ≈ 0.915.

4.1) Worked Derivation of P(X ≤ a, Y ≤ b) (textbook style)

Region split (b ≥ a case)

P = ∫_y=0^a ∫_x=0^y f(x,y) dx dy + ∫_y=a^b ∫_x=0^a f(x,y) dx dy,
where f(x,y) = 6·10⁻⁶ e^{−0.001x−0.002y} on 0 < x < y.

Compute I₁

I₁ = ∫₀^a ∫₀^y 6·10⁻⁶ e^{−0.001x−0.002y} dx dy
= ∫₀^a 6·10⁻⁶ e^−0.002y [ ∫₀^y e^−0.001x dx ] dy
= ∫₀^a 0.006 ( e^−0.002y − e^−0.003y ) dy.

⇒ I₁ = 0.006 [ −(1/0.002) e^−0.002y + (1/0.003) e^−0.003y ]₀^a
= 1 − 3 e^−0.002a + 2 e^−0.003a.

Compute I₂

I₂ = ∫_a^b ∫₀^a 6·10⁻⁶ e^{−0.001x−0.002y} dx dy
= ∫_a^b 6·10⁻⁶ e^−0.002y [ ∫₀^a e^−0.001x dx ] dy
= 0.006 (1 − e^−0.001a) ∫_a^b e^−0.002y dy.

⇒ I₂ = 0.006 (1 − e^−0.001a) [ −(1/0.002) e^−0.002y ]_a^b
= −3 e^−0.002b (1 − e^−0.001a) + 3 e^−0.002a (1 − e^−0.001a).

Add and simplify

P = I₁ + I₂ = 1 − e^−0.003a − 3 e^−0.002b (1 − e^−0.001a).
(The −e^−0.003a term comes from 2e^−0.003a − 3e^−0.003a; the +3e^−0.002a from I₂ cancels the −3e^−0.002a from I₁.)

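Let m = min(a,b). If b < a, the condition X ≤ a holds automatically (X < Y ≤ b), so only the first region remains, with a replaced by b. Both cases are covered by
P(X ≤ a, Y ≤ b) = 1 − e^−0.003m − 3 e^−0.002b (1 − e^−0.001m).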

Plug in a = 1000, b = 2000

m = 1000, e^−0.003m=e⁻³, e^−0.002b=e⁻⁴, e^−0.001m=e⁻¹.
P = 1 − e⁻³ − 3 e⁻⁴ (1 − e⁻¹) = 0.915480. Complement = 0.084520.
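
As a cross-check on the derivation, the closed form can be compared against a direct numerical integration of the pdf over the event region. The sketch below is illustrative: the function names and the grid sizes `ny` and `nx` are assumptions, and the numerical part uses a nested midpoint rule with the inner limit min(y, a) read from the support.

```python
import math

def pdf(x, y):
    """Joint pdf of (connect time, authorization time); zero off 0 < x < y."""
    if 0.0 < x < y:
        return 6e-6 * math.exp(-0.001 * x - 0.002 * y)
    return 0.0

def prob_closed_form(a, b):
    """P(X <= a, Y <= b) from the derived formula with m = min(a, b)."""
    m = min(a, b)
    return (1.0 - math.exp(-0.003 * m)) - 3.0 * math.exp(-0.002 * b) * (
        1.0 - math.exp(-0.001 * m)
    )

def prob_numeric(a, b, ny=400, nx=200):
    """Nested midpoint rule: y over (0, b), x over (0, min(y, a))."""
    hy = b / ny
    total = 0.0
    for i in range(ny):
        y = (i + 0.5) * hy
        x_hi = min(y, a)          # the support forces x < y
        hx = x_hi / nx
        inner = hx * sum(pdf((j + 0.5) * hx, y) for j in range(nx))
        total += inner * hy
    return total

p_exact = prob_closed_form(1000.0, 2000.0)
p_approx = prob_numeric(1000.0, 2000.0)
print(round(p_exact, 6), round(p_approx, 4))
```

Swapping the arguments, for example `prob_closed_form(2000.0, 1000.0)`, exercises the b < a case, where only Y ≤ b matters.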

5) Continuous (Very Clear): Solve k, Marginals, CDF + 3D View (support 0 < x < y)

This triangular-support example is separate from the server-time example above.

5.1 Define the PDF and find k

Let f(x,y) = k · e^−(x+y) on 0 < x < y < ∞ (0 otherwise). Find k by enforcing ∬ f = 1:

1 = ∫₀^∞ ∫₀^y k e^−(x+y) dx dy = k ∫₀^∞ e^−y (1 − e^−y) dy = k (1 − 1/2) = k · 1/2.

⇒ k = 2, so f(x,y) = 2 e^−(x+y) on 0 < x < y.

5.2 Marginal PDFs

f_X(x) = ∫_y=x^∞ 2e^−(x+y) dy = 2 e^−2x, x>0.

f_Y(y) = ∫_x=0^y 2e^−(x+y) dx = 2 e^−y − 2 e^−2y, y>0.

5.3 Joint CDF F(x,y) = P(X ≤ x, Y ≤ y)

Because the support is 0 < x < y, the rectangle [0,x]×[0,y] intersects the triangle differently depending on y ≤ x or y > x.

Case y ≤ x: F(x,y) = (1 − e^−y)².

Case y > x: F(x,y) = 1 − e^−2x − 2 e^−y + 2 e^−(x+y).
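
The two cases can be wrapped in a single function and checked for consistency. This is a sketch, with `joint_cdf` as an assumed name: the cases should agree on the line y = x, the CDF should approach 1 for large arguments, and letting y grow should recover the marginal CDF of X, which is 1 − e^−2x for f_X(x) = 2e^−2x.

```python
import math

def joint_cdf(x, y):
    """F(x, y) = P(X <= x, Y <= y) for f(x, y) = 2 e^-(x+y) on 0 < x < y."""
    if x <= 0.0 or y <= 0.0:
        return 0.0
    if y <= x:
        # The whole event is {Y <= y}; X < Y makes X <= x automatic.
        return (1.0 - math.exp(-y)) ** 2
    return 1.0 - math.exp(-2.0 * x) - 2.0 * math.exp(-y) + 2.0 * math.exp(-(x + y))

# The two branches agree on the boundary y = x.
t = 1.3
on_boundary = joint_cdf(t, t)
just_above = joint_cdf(t, t + 1e-9)

# Large y recovers the marginal CDF of X.
marginal_limit = joint_cdf(0.7, 100.0)

print(round(on_boundary, 6), round(just_above, 6), round(marginal_limit, 6))
```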

Shaded triangle = support (0 < x < y).

Darker polygon = event P(X≤a, Y≤b) inside the support.

Left shows the geometry (where to integrate).

Right: 3D wireframe is the pdf surface z = 2·e^−(x+y) (domain clipped to 0≤x,y≤5). Drag to rotate.
Probability = the volume under this surface above the darker region on the left.

6) Limits & Order of Integration — FAQ (very clear)

6.1 Why are the inner limits 0 to y?

Support: S = { (x,y): 0 < x < y < ∞ }. Fix y. The support allows x only between 0 and y. Therefore,

∬_S f(x,y) dx dy = ∫_y=0^∞ ∫_x=0^y f(x,y) dx dy.

6.2 Can I integrate dy first instead?

Yes. Fix x. Then the support allows y from x to ∞. So the equivalent order is

∬_S f(x,y) dy dx = ∫_x=0^∞ ∫_y=x^∞ f(x,y) dy dx.

By Fubini/Tonelli (nonnegative integrable pdf), both orders give the same result.

6.3 Where do the −∞ bounds appear?

In the joint CDF definition:

F_XY(x,y) = P(X≤x, Y≤y) = ∫_−∞^x ∫_−∞^y f(u,v) dv du.

For normalization of a pdf, the integrand is zero outside the support, so replace the −∞..∞ limits with limits read from the support.

6.4 Normalization solved both ways (they match)

(1) ∫_y=0^∞ ∫_x=0^y k e^−(x+y) dx dy = k ∫₀^∞ e^−y (1 − e^−y) dy = k (1 − 1/2) = k/2.

(2) ∫_x=0^∞ ∫_y=x^∞ k e^−(x+y) dy dx = k ∫₀^∞ e^−x ( ∫_y=x^∞ e^−y dy ) dx = k ∫₀^∞ e^−x e^−x dx = k/2.

Set either equal to 1 ⇒ k = 2.

6.5 Geometry recap

S is the unbounded triangular region above the line y=x in the first quadrant. For each fixed y>0, x runs 0→y (a horizontal slice at height y). For each fixed x>0, y runs x→∞ (a vertical slice at position x). Limits always come from S.

7) Student FAQ

Short questions with short answers.

Q. What is a joint PDF vs. a joint PMF?

PMF f_XY(x,y) is for discrete X,Y (probability at grid points).
PDF f_XY(x,y) is for continuous X,Y (a surface). Probability of a region R is the volume under the surface above R: ∬_R f_XY.

Q. What is the support and why do I care?

The support is where f(x,y)>0 (allowed points). Draw it first.
All integration limits come from the support. If the pdf is 0 outside, you do not integrate there.

Q. How do I check if a joint pdf is valid?

(1) f(x,y) ≥ 0 on the support. (2) ∬_support f(x,y) dx dy = 1.
Example: f(x,y)=k e^−(x+y) on 0<x<y ⇒ k=2.

Q. When do I integrate dx first vs dy first?

Either order works (Fubini). Choose the one that makes limits simpler.
For 0<x<y: integrating dx first (horizontal slices at fixed y) ⇒ x:0→y; integrating dy first (vertical slices at fixed x) ⇒ y:x→∞. Both give the same result.

Q. Why do I sometimes see −∞ in integrals?

They appear in general definitions, such as the CDF: F_XY(x,y)=∫_−∞^x∫_−∞^y f(u,v) dv du.
For normalization or marginals, the pdf is zero off the support, so the −∞..∞ limits reduce to limits read from the support.

Q. How do I get marginals from a joint pdf?

f_X(x)=∫ f(x,y) dy with y-limits from the support at that x.
f_Y(y)=∫ f(x,y) dx with x-limits from the support at that y.
Example (0<x<y): f_X(x)=∫_y=x^∞ 2e^−(x+y)dy=2e^−2x.

Q. What does “independent” mean here?

Independent ⇔ f_XY(x,y)=f_X(x)f_Y(y) for all x,y (or CDFs multiply).
If X and Y are independent, knowing one tells you nothing about the other.

Q. Does zero correlation mean independence?

No. Zero covariance/correlation does not guarantee independence (except in special families like jointly normal). Do not rely on it.

Q. How do I compute P(a<X<b, c<Y<d) with a pdf?

Integrate over the intersection of the rectangle and the support. If the support is triangular (0<x<y), the rectangle might split into pieces — set up piecewise integrals or use the joint CDF if it fits.

Q. How do I get a conditional pdf?

f_X|Y(x|y)=f(x,y)/f_Y(y), defined for y with f_Y(y)>0 and for x in the slice of the support at that y.
For 0<x<y with f=2e^−(x+y): f_X|Y(x|y) = (2e^−(x+y))/(2e^−y − 2e^−2y) = e^−x/(1 − e^−y) for 0<x<y.
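
A quick check that this conditional density is a valid pdf in x for each fixed y > 0 is to integrate it over the slice 0 < x < y:

```latex
\int_0^y f_{X|Y}(x \mid y) \, dx
  = \int_0^y \frac{e^{-x}}{1 - e^{-y}} \, dx
  = \frac{1 - e^{-y}}{1 - e^{-y}}
  = 1 .
```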

Q. Discrete table: how do I compute a conditional probability?

P(X=x | Y=y) = f_XY(x,y) / (∑_x f_XY(x,y)) = cell / row-sum.

Independence test: check if each cell ≈ column-marginal × row-marginal.
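
The independence test can be sketched on the joint PMF table from Section 3 by comparing every cell with the product of its marginals. This is an illustration only: the variable names are assumptions, and how large a gap counts as "not approximately equal" is a judgment call that the code does not settle.

```python
# Joint PMF table from Section 3: rows are y = 1..4, columns are x = 1..3.
pmf = [
    [0.01, 0.02, 0.25],
    [0.02, 0.03, 0.20],
    [0.02, 0.10, 0.05],
    [0.15, 0.10, 0.05],
]

f_x = [sum(row[j] for row in pmf) for j in range(3)]  # column marginals
f_y = [sum(row) for row in pmf]                       # row marginals

# Gap between each cell and the product of its marginals.
gaps = {
    (j + 1, i + 1): abs(pmf[i][j] - f_x[j] * f_y[i])
    for i in range(4)
    for j in range(3)
}

worst_cell = max(gaps, key=gaps.get)
max_gap = gaps[worst_cell]

print("largest gap at (x, y) =", worst_cell, "gap =", round(max_gap, 3))
```

Under independence every gap would be zero. Here the largest gap is far from zero, so X and Y in that table are not independent.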

Q. Common mistakes checklist

• Integrating over −∞..∞ instead of the support.
• Forgetting that PDF ≠ probability at a point (only areas give probability).
• Mixing up inner/outer limits; draw the region first.
• Assuming independence without checking factorization.
• Dropping units: keep track of what x and y represent.
