Session 4.3 — Mean & Variance (Continuous)
Expected value (mean) for a continuous variable
In words.
- The mean is the long‑run balance point of the distribution.
- Weight each value \(x\) by how likely it is (its density) and integrate.
\(E[X]=\int_{-\infty}^{\infty} x\,f(x)\,dx.\)
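A quick numerical check (a minimal sketch in Python with SciPy; the density \(f(x)=2x\) on \([0,1]\) is a made-up example, not from the notes):

```python
from scipy.integrate import quad

# Hypothetical density: f(x) = 2x on [0, 1] (it integrates to 1).
f = lambda x: 2 * x

# E[X] = integral of x * f(x) over the support.
mean, _ = quad(lambda x: x * f(x), 0, 1)
print(mean)  # 0.666... = 2/3
```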
Variance and the shortcut formula
In words.
- Variance measures typical squared distance from the mean.
- Shortcut: compute \(E[X^2]\) and subtract \(\mu^2\).
\(\operatorname{Var}(X)=E[(X-\mu)^2]=\int (x-\mu)^2f(x)dx=E[X^2]-\mu^2.\)
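Continuing the same made-up density, a sketch of the shortcut formula (assuming SciPy):

```python
from scipy.integrate import quad

# Same hypothetical density: f(x) = 2x on [0, 1].
f = lambda x: 2 * x

mu, _ = quad(lambda x: x * f(x), 0, 1)       # E[X]   = 2/3
ex2, _ = quad(lambda x: x**2 * f(x), 0, 1)   # E[X^2] = 1/2
print(ex2 - mu**2)                           # Var(X) = E[X^2] - mu^2 = 1/18
```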
Session 4.4 — Uniform(a,b)
Definition of Uniform(a,b)
In words.
- All values between \(a\) and \(b\) are equally likely; nothing outside.
\(f(x)=\frac{1}{b-a}\) for \(a\le x\le b\); 0 otherwise.
Mean and variance for Uniform(a,b)
In words.
- The mean is the midpoint; spread grows with the length of the interval.
\(E[X]=\tfrac{a+b}{2}\), \(\operatorname{Var}(X)=\tfrac{(b-a)^2}{12}.\)
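A simulation sketch (NumPy; the endpoints \(a=2\), \(b=10\) are arbitrary example values):

```python
import numpy as np

a, b = 2.0, 10.0
x = np.random.default_rng(0).uniform(a, b, 100_000)

print(x.mean(), (a + b) / 2)         # both ≈ 6.0
print(x.var(), (b - a) ** 2 / 12)    # both ≈ 5.33
```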
Session 4.5 — Normal & z‑Scores
What is a z‑score (standardization)?
In words.
- It tells you how many standard deviations a value is from the mean.
- Use it to convert any Normal(\(\mu,\sigma\)) to Standard Normal.
\(z=\dfrac{x-\mu}{\sigma}\). If \(X\sim N(\mu,\sigma^2)\), then \(Z=\frac{X-\mu}{\sigma}\sim N(0,1).\)
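A tiny illustration with made-up numbers:

```python
# Hypothetical: a score of 82 on a test with mean 70 and SD 8.
x, mu, sigma = 82, 70, 8
z = (x - mu) / sigma
print(z)  # 1.5 -> the score is 1.5 standard deviations above the mean
```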
Using \(\Phi\) for Normal probabilities
In words.
- Convert to \(Z\), then use \(\Phi\) (or Excel NORM.DIST). For a probability between two values, subtract two CDFs.
\(P(a\le X\le b)=\Phi(\tfrac{b-\mu}{\sigma})-\Phi(\tfrac{a-\mu}{\sigma}).\)
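A sketch using SciPy's Normal CDF in place of a \(\Phi\) table (the parameters and bounds are assumed, not from the notes):

```python
from scipy.stats import norm

mu, sigma = 100, 15          # assumed Normal parameters
a, b = 85, 130

p = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)
p_z = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)  # via z-scores
print(p, p_z)                # both ≈ 0.819
```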
Empirical 68–95–99.7% rule
In words.
- About 68% within 1 sd of the mean, 95% within 2 sd, 99.7% within 3 sd.
\(P(|X-\mu|\le k\sigma)\approx 0.68,0.95,0.997\) for \(k=1,2,3\) when \(X\) is roughly Normal.
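A quick check of the rule from the Normal CDF (SciPy), using \(P(|Z|\le k)=2\Phi(k)-1\):

```python
from scipy.stats import norm

for k in (1, 2, 3):
    print(k, 2 * norm.cdf(k) - 1)   # ≈ 0.683, 0.954, 0.997
```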
Quantiles / percentiles for Normal
In words.
- Find the \(z\) with area \(p\) to the left, then un‑standardize back to \(x\).
\(x_p=\mu+\sigma z_p\), where \(z_p=\Phi^{-1}(p)\) (Excel: NORM.INV(p,μ,σ)).
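A sketch with SciPy's inverse CDF (`norm.ppf`); \(\mu=100\), \(\sigma=15\), \(p=0.90\) are assumed example values:

```python
from scipy.stats import norm

mu, sigma, p = 100, 15, 0.90
z_p = norm.ppf(p)              # Phi^{-1}(0.90) ≈ 1.2816
x_p = mu + sigma * z_p         # un-standardize: ≈ 119.2
print(z_p, x_p)
# One call does the same: norm.ppf(p, loc=mu, scale=sigma)
```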
Session 4.7 — Exponential(λ)
Definition and key summaries of Exponential(λ)
In words.
- Models waiting times between random arrivals; only supports nonnegative times.
- One parameter: the rate \(\lambda\) per time unit.
\(f(x)=\lambda e^{-\lambda x}\) for \(x\ge 0\); \(E[X]=1/\lambda\), \(\operatorname{Var}(X)=1/\lambda^2\).
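A check with SciPy (its Exponential is parameterized by scale \(=1/\lambda\)); the rate \(\lambda=0.5\) is an assumed example:

```python
from scipy.stats import expon

lam = 0.5
X = expon(scale=1 / lam)       # SciPy uses scale = 1/lambda
print(X.mean(), 1 / lam)       # 2.0, 2.0
print(X.var(), 1 / lam**2)     # 4.0, 4.0
```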
Memoryless property
In words.
- Past waiting doesn’t change future waiting. “No aging.”
\(P(X>s+t\mid X>s)=P(X>t)\). Among continuous distributions, only Exponential has this.
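A numeric sketch of the memoryless property (SciPy; \(\lambda\), \(s\), \(t\) are made-up values):

```python
from scipy.stats import expon

lam, s, t = 0.5, 3.0, 2.0
X = expon(scale=1 / lam)

lhs = X.sf(s + t) / X.sf(s)   # P(X > s + t | X > s)
rhs = X.sf(t)                 # P(X > t)
print(lhs, rhs)               # both ≈ exp(-lam * t) ≈ 0.368
```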
Session 4.10 — Lognormal
What is a lognormal variable?
In words.
- A positive, right‑skewed variable whose log is Normal.
- Common for income, prices, failure times when multiplicative effects matter.
If \(\ln Y\sim N(\mu,\sigma^2)\), then \(Y\) is lognormal.
Key summaries for lognormal
In words.
- The median equals the geometric mean; the mean is larger because of the right tail.
\(\operatorname{median}(Y)=e^{\mu}\), \(E[Y]=e^{\mu+\sigma^2/2}\), \(\operatorname{mode}=e^{\mu-\sigma^2}.\)
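A check against SciPy's lognormal (its shape parameter is \(\sigma\) and its scale is \(e^{\mu}\)); \(\mu=1\), \(\sigma=0.5\) are assumed example values:

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 1.0, 0.5
Y = lognorm(s=sigma, scale=np.exp(mu))

print(Y.median(), np.exp(mu))                  # e^mu ≈ 2.718
print(Y.mean(), np.exp(mu + sigma**2 / 2))     # ≈ 3.080
```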
Why log‑transform skewed data?
In words.
- Taking logs reduces right‑skew, making distributions more symmetric and easier to model.
If \(Y\) is lognormal, then \(\ln Y\sim N(\mu,\sigma^2)\), which is exactly symmetric; for merely right‑skewed data, the log scale is typically much closer to symmetric.
Session 5.1 — Joint Distributions
Joint pmf/pdf
In words.
- Describes the probability for pairs \((X,Y)\), not each alone.
\(f_{X,Y}(x,y)\ge 0\), with total probability 1 (sum or integral over support).
Marginal distributions
In words.
- What each variable does by itself — totals across the other variable.
Discrete: \(f_X(x)=\sum_y f_{X,Y}(x,y)\). Continuous: \(f_X(x)=\int f_{X,Y}(x,y)\,dy\). Likewise for \(f_Y\).
Support checks and totals
In words.
- Probabilities must be nonnegative and the table/region must sum/integrate to 1.
\(\sum_{x}\sum_{y} f_{X,Y}(x,y)=1\) (discrete) or \(\int\!\int f_{X,Y}(x,y)\,dx\,dy=1\) (continuous).
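A tiny table sketch covering both the marginal sums and the total-probability check above (the joint pmf values are made up):

```python
import numpy as np

# Hypothetical joint pmf of (X, Y): rows index x, columns index y.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

print(joint.sum())         # 1.0 -> valid joint pmf
print(joint.sum(axis=1))   # marginal of X: [0.30, 0.70]
print(joint.sum(axis=0))   # marginal of Y: [0.40, 0.60]
```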
Session 5.2 — Conditional Distributions
Conditional pmf/pdf
In words.
- Update your distribution for \(Y\) after learning a value of \(X\).
\(f_{Y|X}(y|x)=\dfrac{f_{X,Y}(x,y)}{f_X(x)}\) when \(f_X(x)>0\). (Replace sums/integrals appropriately.)
Conditional expectation
In words.
- The average of \(Y\) among cases with the same \(X=x\).
\(E[Y\mid X=x]=\sum_y y\,f_{Y|X}(y|x)\) (discrete) or \(\int y\,f_{Y|X}(y|x)\,dy\) (continuous).
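A sketch for both the conditional pmf and \(E[Y\mid X=x]\) on a made-up table:

```python
import numpy as np

# Hypothetical joint pmf: rows are x in {0, 1}, columns are y in {1, 2, 3}.
joint = np.array([[0.10, 0.05, 0.05],
                  [0.20, 0.30, 0.30]])
y_vals = np.array([1, 2, 3])

x = 1
f_x = joint[x].sum()           # f_X(1) = 0.80
cond = joint[x] / f_x          # f_{Y|X}(y | x=1) = [0.25, 0.375, 0.375]
print(cond)
print((y_vals * cond).sum())   # E[Y | X = 1] = 2.125
```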
Session 5.3 — Independence
Definition of independence
In words.
- Knowing \(X\) tells you nothing about \(Y\); the joint behavior factors into separate pieces.
\(X\perp Y\) iff \(f_{X,Y}(x,y)=f_X(x)\,f_Y(y)\) for all \((x,y)\) on the support.
Product/conditioning tests for tables
In words.
- In a contingency table, compare a cell to the product of its row and column marginals.
Check \(f(x_0,y_0)\overset{?}{=} f_X(x_0)f_Y(y_0)\). Equivalently, \(f_{Y|X}(y_0|x_0)\overset{?}{=} f_Y(y_0).\)
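A table check in NumPy (the 2×2 joint pmf is invented so that it happens to factor):

```python
import numpy as np

joint = np.array([[0.12, 0.28],
                  [0.18, 0.42]])

fx = joint.sum(axis=1)              # [0.40, 0.60]
fy = joint.sum(axis=0)              # [0.30, 0.70]
product = np.outer(fx, fy)          # f_X(x) * f_Y(y) for every cell

print(np.allclose(joint, product))  # True -> consistent with independence
```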
Session 5.4 — Covariance & Correlation
Covariance: what the sign means
In words.
- Positive: they tend to be high together and low together.
- Negative: when one is high, the other tends to be low.
\(\operatorname{Cov}(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]\). Sample: \(s_{XY}=\frac{\sum (x_i-\bar x)(y_i-\bar y)}{n-1}.\)
Correlation: scale‑free measure
In words.
- Covariance divided by both standard deviations; lives in [−1, 1].
- Unaffected by changing measurement units.
\(\rho=\dfrac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y}\in[-1,1]\). Sample: \(r=\dfrac{s_{XY}}{s_X s_Y}.\)
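A sketch computing both sample quantities with NumPy (the paired data are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1])

s_xy = np.cov(x, y, ddof=1)[0, 1]               # sample covariance (n - 1 denominator)
r = np.corrcoef(x, y)[0, 1]                     # sample correlation, in [-1, 1]
print(s_xy, r)
print(s_xy / (x.std(ddof=1) * y.std(ddof=1)))   # same r, via the definition
```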
When correlation is undefined / zero ≠ independence
In words.
- Undefined if either variable has zero variance (no spread).
- Zero correlation can coexist with a strongly curved relationship; independence is a stronger condition (except in special cases such as the bivariate Normal).
If \(\sigma_X=0\) or \(\sigma_Y=0\), \(\rho\) undefined. In general, \(\rho=0\nRightarrow X\perp Y\) (unless jointly Normal).
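A classic counterexample sketch: \(Y=X^2\) with \(X\) symmetric about 0 is completely dependent on \(X\) yet has correlation near 0 (NumPy simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x ** 2                        # fully determined by x, yet...

print(np.corrcoef(x, y)[0, 1])    # ≈ 0: no *linear* relationship
```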
Session 6.x — Descriptive & Graphics
Sample vs population variance
In words.
- Sample variance uses \(n-1\) in the denominator (unbiased for \(\sigma^2\)).
- Population variance uses \(n\) (or integrates for a model).
Sample: \(s^2=\tfrac{\sum (x_i-\bar x)^2}{n-1}\). Population: \(\sigma^2=E[(X-\mu)^2].\)
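In NumPy the difference is the `ddof` argument (the data values are made up):

```python
import numpy as np

data = np.array([4.0, 7.0, 6.0, 5.0, 8.0])

print(np.var(data, ddof=1))   # sample variance: divide by n - 1
print(np.var(data, ddof=0))   # divide by n (NumPy's default)
```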
Stem‑and‑leaf plot
In words.
- Shows shape while preserving exact values — great for small datasets.
No formula; split each value into a stem (leading digits) and a leaf (last digit).
Histogram and bin width choice
In words.
- Bars show counts in intervals; too‑wide bins hide detail; too‑narrow bins add noise.
Rules of thumb: Sturges, Freedman–Diaconis (\(\text{bin width}=2\,\text{IQR}\,n^{-1/3}\)).
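A sketch of the Freedman–Diaconis rule, both via NumPy's built-in option and by hand (the simulated data are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=500)

edges = np.histogram_bin_edges(data, bins="fd")   # Freedman–Diaconis rule
print(len(edges) - 1, edges[1] - edges[0])        # bin count and (adjusted) bin width

iqr = np.subtract(*np.percentile(data, [75, 25]))
print(2 * iqr / len(data) ** (1 / 3))             # raw 2 * IQR * n^(-1/3)
```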
Box plot and the 1.5·IQR rule
In words.
- Box = middle 50% (Q1 to Q3) with a line at the median; whiskers extend to the most extreme points within 1.5·IQR of the box.
- Points beyond 1.5·IQR from Q1 or Q3 are flagged as potential outliers.
IQR = Q3 − Q1. Outlier cutoffs: \([\text{Q1}-1.5\,\text{IQR},\;\text{Q3}+1.5\,\text{IQR}]\).
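A sketch of the cutoffs (the data values are made up; 25.0 is an obvious outlier):

```python
import numpy as np

data = np.array([3.0, 5.0, 6.0, 7.0, 8.0, 9.0, 25.0])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(low, high)                            # 1.0, 13.0
print(data[(data < low) | (data > high)])   # [25.0] flagged as a potential outlier
```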
Scatter plot interpretation checklist
In words.
- Look for form (linear/curved), direction (\(+/-\)), strength, clusters, and outliers.
No single formula; correlation \(r\) summarizes linear strength but can miss curves/outliers.
Normal Q–Q plot (what to look for)
In words.
- Points near the line ⇒ roughly Normal.
- A bowed (curved) pattern ⇒ skew; an S‑shape at the ends ⇒ heavy or light tails.
Plot order statistics \(x_{(i)}\) against Normal quantiles \(z_{(i)}\). Deviations from the line diagnose shape departures.
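A minimal plotting sketch with SciPy's `probplot` (the roughly Normal sample is simulated for illustration):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=200)

# Order statistics vs. theoretical Normal quantiles, with a reference line.
stats.probplot(sample, dist="norm", plot=plt)
plt.show()
```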