Session 4.3 — Mean & Variance (Continuous)
    Expected value (mean) for a continuous variable
      
        In words.
        
          - The mean is the long‑run balance point of the distribution.
          - Weight each value \(x\) by its density \(f(x)\), then integrate over all \(x\).
 
      \(E[X]=\int_{-\infty}^{\infty} x\,f(x)\,dx.\)
    Variance and the shortcut formula
      
        In words.
        
          - Variance measures typical squared distance from the mean.
          - Shortcut: compute \(E[X^2]\) and subtract \(\mu^2\).
 
      \(\operatorname{Var}(X)=E[(X-\mu)^2]=\int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx=E[X^2]-\mu^2.\)
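      A minimal numerical sketch of the two formulas above. The density \(f(x)=2x\) on \([0,1]\) and the midpoint-rule helper `integrate` are illustrative assumptions, not part of the notes; the exact answers for this density are \(E[X]=2/3\) and \(\operatorname{Var}(X)=1/18\).

```python
def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

# Hypothetical example density: f(x) = 2x on [0, 1], zero elsewhere.
def f(x):
    return 2.0 * x

mean = integrate(lambda x: x * f(x), 0.0, 1.0)               # E[X]
second_moment = integrate(lambda x: x**2 * f(x), 0.0, 1.0)   # E[X^2]

var_shortcut = second_moment - mean**2                        # E[X^2] - mu^2
var_direct = integrate(lambda x: (x - mean) ** 2 * f(x), 0.0, 1.0)

print(mean, var_shortcut, var_direct)
```

      Both variance routes should agree up to numerical error, which is the point of the shortcut.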
    Session 4.4 — Uniform(a,b)
    Definition of Uniform(a,b)
      
        In words.
        
          - The density is flat between \(a\) and \(b\) and zero outside, so intervals of equal length inside \([a,b]\) are equally likely.
 
      \(f(x)=\frac{1}{b-a}\) for \(a\le x\le b\); 0 otherwise.
    Mean and variance for Uniform(a,b)
      
        In words.
        
          - The mean is the midpoint; spread grows with the length of the interval.
 
      \(E[X]=\tfrac{a+b}{2}\), \(\operatorname{Var}(X)=\tfrac{(b-a)^2}{12}.\)
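      As a worked check, both summaries follow from the density \(\frac{1}{b-a}\) on \([a,b]\) together with the shortcut \(\operatorname{Var}(X)=E[X^2]-\mu^2\):

```latex
\begin{aligned}
E[X]   &= \int_a^b \frac{x}{b-a}\,dx = \frac{b^2-a^2}{2(b-a)} = \frac{a+b}{2},\\
E[X^2] &= \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3-a^3}{3(b-a)} = \frac{a^2+ab+b^2}{3},\\
\operatorname{Var}(X) &= \frac{a^2+ab+b^2}{3}-\frac{(a+b)^2}{4}
  = \frac{a^2-2ab+b^2}{12} = \frac{(b-a)^2}{12}.
\end{aligned}
```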
    Session 4.5 — Normal & z‑Scores
    What is a z‑score (standardization)?
      
        In words.
        
          - It tells you how many standard deviations a value is from the mean.
          - Use it to convert any \(N(\mu,\sigma^2)\) variable to the Standard Normal \(N(0,1)\).
 
      \(z=\dfrac{x-\mu}{\sigma}\). If \(X\sim N(\mu,\sigma^2)\), then \(Z=\frac{X-\mu}{\sigma}\sim N(0,1).\)
    Using \(\Phi\) for Normal probabilities
      
        In words.
        
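          - Convert to \(Z\), then use \(\Phi\) (or Excel's NORM.DIST with cumulative = TRUE, which takes \(x\), \(\mu\), \(\sigma\) directly). For the probability between two values, subtract the two CDF values.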
 
      \(P(a\le X\le b)=\Phi(\tfrac{b-\mu}{\sigma})-\Phi(\tfrac{a-\mu}{\sigma}).\)
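      A short sketch of the interval formula using Python's standard-library `statistics.NormalDist`. The values \(\mu=100\), \(\sigma=15\), \(a=85\), \(b=130\) are made up for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters and interval (not from the notes).
mu, sigma = 100.0, 15.0
a, b = 85.0, 130.0

Z = NormalDist()                 # Standard Normal, so Z.cdf plays the role of Phi
z_a = (a - mu) / sigma           # -1.0
z_b = (b - mu) / sigma           #  2.0
prob_via_z = Z.cdf(z_b) - Z.cdf(z_a)

# Same probability without standardizing by hand.
X = NormalDist(mu, sigma)
prob_direct = X.cdf(b) - X.cdf(a)

print(round(prob_via_z, 4), round(prob_direct, 4))
```

      The two routes give the same number because standardizing only relabels the axis.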
    Empirical 68–95–99.7% rule
      
        In words.
        
          - About 68% within 1 sd of the mean, 95% within 2 sd, 99.7% within 3 sd.
 
      \(P(|X-\mu|\le k\sigma)\approx 0.68,0.95,0.997\) for \(k=1,2,3\) when \(X\) is roughly Normal.
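      The three percentages can be recovered from \(\Phi\), since \(P(|Z|\le k)=\Phi(k)-\Phi(-k)\). A small check, again assuming Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal

# P(|Z| <= k) for k = 1, 2, 3
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} sd: {p:.4f}")
```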
    Quantiles / percentiles for Normal
      
        In words.
        
          - Find the \(z\) with area \(p\) to the left, then un‑standardize back to \(x\).
 
      \(x_p=\mu+\sigma z_p\), where \(z_p=\Phi^{-1}(p)\) (Excel: NORM.INV(p,μ,σ)).
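      A sketch of the two-step quantile recipe, with `NormalDist.inv_cdf` standing in for \(\Phi^{-1}\). The parameters \(\mu=100\), \(\sigma=15\) and the level \(p=0.90\) are illustrative assumptions.

```python
from statistics import NormalDist

# Hypothetical parameters and percentile level.
mu, sigma, p = 100.0, 15.0, 0.90

z_p = NormalDist().inv_cdf(p)    # z with area p to its left
x_p = mu + sigma * z_p           # un-standardize back to the x scale

# One-step equivalent, analogous to Excel's NORM.INV(p, mu, sigma).
x_p_direct = NormalDist(mu, sigma).inv_cdf(p)

print(round(z_p, 4), round(x_p, 2))
```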
    Session 4.7 — Exponential(λ)
    Definition and key summaries of Exponential(λ)
      
        In words.
        
          - Models waiting times between random arrivals; only supports nonnegative times.
          - One parameter: the rate \(\lambda\) per time unit.
 
      \(f(x)=\lambda e^{-\lambda x}\) for \(x\ge 0\); \(E[X]=1/\lambda\), \(\operatorname{Var}(X)=1/\lambda^2\).
    Memoryless property
      
        In words.
        
          - Past waiting doesn’t change future waiting. “No aging.”
 
      \(P(X>s+t\mid X>s)=P(X>t)\). Among continuous distributions, only Exponential has this.
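      A quick numerical check of the memoryless identity. It uses the survival function \(P(X>x)=e^{-\lambda x}\), which follows from integrating the density; the rate and the times \(s\), \(t\) below are made-up values.

```python
import math

def survival(x, lam):
    """P(X > x) for X ~ Exponential(rate lam), x >= 0."""
    return math.exp(-lam * x)

# Hypothetical values: rate 0.5 per minute, already waited s, want t more.
lam, s, t = 0.5, 2.0, 3.0

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional = survival(s + t, lam) / survival(s, lam)
unconditional = survival(t, lam)

print(conditional, unconditional)
```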
    Session 4.10 — Lognormal
    What is a lognormal variable?
      
        In words.
        
          - A positive, right‑skewed variable whose log is Normal.
          - Common for income, prices, failure times when multiplicative effects matter.
 
      If \(\ln Y\sim N(\mu,\sigma^2)\), then \(Y\) is lognormal.
    Key summaries for lognormal
      
        In words.
        
          - Median equals the geometric mean; mean is larger because of right tail.
 
      \(\operatorname{median}(Y)=e^{\mu}\), \(E[Y]=e^{\mu+\sigma^2/2}\), \(\operatorname{mode}=e^{\mu-\sigma^2}.\)
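      A sketch that evaluates the three formulas and compares two of them with a simulation. The parameters \(\mu=0\), \(\sigma=0.5\), the seed, and the sample size are illustrative assumptions; `random.lognormvariate` takes the mean and standard deviation of the underlying Normal.

```python
import math
import random
import statistics

# Hypothetical parameters of ln Y.
mu, sigma = 0.0, 0.5

median = math.exp(mu)
mean = math.exp(mu + sigma**2 / 2)
mode = math.exp(mu - sigma**2)

# Simulation check (fixed seed so the sketch is repeatable).
random.seed(1)
draws = [random.lognormvariate(mu, sigma) for _ in range(100_000)]
sample_median = statistics.median(draws)
sample_mean = statistics.fmean(draws)

print(mode, median, mean)          # mode < median < mean for a right-skewed Y
print(sample_median, sample_mean)  # should land near median and mean
```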
    Why log‑transform skewed data?
      
        In words.
        
          - Taking logs reduces right‑skew, making distributions more symmetric and easier to model.
 
      If \(Y\) is lognormal, then \(\ln Y\sim N(\mu,\sigma^2)\) exactly, which is symmetric; for real right-skewed data the log is only approximately symmetric.
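      To see the effect of the log on simulated data, one can compare skewness before and after the transform. This is a sketch under assumptions: the data are drawn with \(\mu=0\), \(\sigma=1\), and `skewness` is a simple moment-based helper written for this example.

```python
import math
import random

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(0)  # fixed seed so the sketch is repeatable
y = [random.lognormvariate(0.0, 1.0) for _ in range(50_000)]  # hypothetical data
log_y = [math.log(v) for v in y]

skew_raw = skewness(y)       # strongly positive: long right tail
skew_log = skewness(log_y)   # near zero: roughly symmetric

print(skew_raw, skew_log)
```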
    Session 5.1 — Joint Distributions
    Joint pmf/pdf
      
        In words.
        
          - Describes the probability for pairs \((X,Y)\), not each alone.
 
      \(f_{X,Y}(x,y)\ge 0\), with total probability 1 (sum or integral over support).
    Marginal distributions
      
        In words.
        
          - What each variable does by itself: sum (or integrate) the joint over the other variable.
 
      Discrete: \(f_X(x)=\sum_y f_{X,Y}(x,y)\). Continuous: \(f_X(x)=\int f_{X,Y}(x,y)\,dy\). Likewise for \(f_Y\).
    Support checks and totals
      
        In words.
        
          - Probabilities must be nonnegative and the table/region must sum/integrate to 1.
 
      \(\sum_{x}\sum_{y} f_{X,Y}(x,y)=1\) (discrete) or \(\int\!\int f_{X,Y}(x,y)\,dx\,dy=1\) (continuous).
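      For a discrete table, the marginal sums and the total-probability check can be sketched as follows. The 2×2 joint pmf is a made-up example, stored as a dictionary keyed by \((x,y)\); exact fractions avoid rounding issues.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f_{X,Y}(x, y).
joint = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(3, 10), (1, 1): F(4, 10),
}

# Validity checks: nonnegative entries that sum to 1.
all_nonnegative = all(p >= 0 for p in joint.values())
total = sum(joint.values())

# Marginals: add up the joint over the other variable.
f_X, f_Y = {}, {}
for (x, y), p in joint.items():
    f_X[x] = f_X.get(x, 0) + p
    f_Y[y] = f_Y.get(y, 0) + p

print(total, f_X, f_Y)
```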
    Session 5.2 — Conditional Distributions
    Conditional pmf/pdf
      
        In words.
        
          - Update your distribution for \(Y\) after learning a value of \(X\).
 
      \(f_{Y|X}(y|x)=\dfrac{f_{X,Y}(x,y)}{f_X(x)}\) when \(f_X(x)>0\). (The same ratio works for pmfs and pdfs; \(f_X\) comes from a sum or an integral accordingly.)
    Conditional expectation
      
        In words.
        
          - The average of \(Y\) among cases with the same \(X=x\).
 
      \(E[Y\mid X=x]=\sum_y y\,f_{Y|X}(y|x)\) (discrete) or \(\int y\,f_{Y|X}(y|x)\,dy\) (continuous).
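      A discrete sketch of both definitions: divide one row of the joint table by its marginal, then average \(y\) under the result. The table and the helper names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f_{X,Y}(x, y).
joint = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(3, 10), (1, 1): F(4, 10),
}

def conditional_pmf_y_given_x(joint, x):
    """f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x), defined only when f_X(x) > 0."""
    f_x = sum(p for (xi, _), p in joint.items() if xi == x)
    if f_x == 0:
        raise ValueError("conditioning on a value with zero probability")
    return {y: p / f_x for (xi, y), p in joint.items() if xi == x}

def conditional_mean_y_given_x(joint, x):
    """E[Y | X = x] as a weighted average under the conditional pmf."""
    return sum(y * p for y, p in conditional_pmf_y_given_x(joint, x).items())

print(conditional_pmf_y_given_x(joint, 0))   # pmf of Y among cases with X = 0
print(conditional_mean_y_given_x(joint, 1))  # average of Y among cases with X = 1
```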
    Session 5.3 — Independence
    Definition of independence
      
        In words.
        
          - Knowing \(X\) tells you nothing about \(Y\); the joint behavior factors into separate pieces.
 
      \(X\perp Y\) iff \(f_{X,Y}(x,y)=f_X(x)\,f_Y(y)\) for all \((x,y)\), including pairs where the joint is zero.
    Product/conditioning tests for tables
      
        In words.
        
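          - In a contingency table, compare every cell to the product of its row and column marginals; one mismatch is enough to rule out independence.
 
      Check \(f(x_0,y_0)\overset{?}{=} f_X(x_0)f_Y(y_0)\) for every cell \((x_0,y_0)\). Equivalently, \(f_{Y|X}(y_0|x_0)\overset{?}{=} f_Y(y_0)\) whenever \(f_X(x_0)>0\).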
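      The product test can be sketched for a table stored as a dictionary keyed by \((x,y)\). Both example tables and the helper names are hypothetical; the check runs over the full grid of row and column values, so missing cells count as zero.

```python
from fractions import Fraction as F

def marginals(joint):
    """Row and column totals of a joint pmf keyed by (x, y)."""
    f_X, f_Y = {}, {}
    for (x, y), p in joint.items():
        f_X[x] = f_X.get(x, 0) + p
        f_Y[y] = f_Y.get(y, 0) + p
    return f_X, f_Y

def is_independent(joint):
    """True only if every cell equals the product of its marginals."""
    f_X, f_Y = marginals(joint)
    return all(joint.get((x, y), 0) == f_X[x] * f_Y[y]
               for x in f_X for y in f_Y)

# Hypothetical table that fails the test: f(0,0) = 1/10 but f_X(0) f_Y(0) = 3/25.
dependent = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(3, 10), (1, 1): F(4, 10),
}

# Hypothetical table built as a product of marginals, so it passes.
row = {0: F(3, 10), 1: F(7, 10)}
col = {0: F(2, 5), 1: F(3, 5)}
independent = {(x, y): px * py for x, px in row.items() for y, py in col.items()}

print(is_independent(dependent), is_independent(independent))
```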
    Session 5.4 — Covariance & Correlation
    Covariance: what the sign means
      
        In words.
        
          - Positive: they tend to be high together and low together.
          - Negative: when one is high, the other tends to be low.
 
      \(\operatorname{Cov}(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]\). Sample: \(s_{XY}=\frac{\sum (x_i-\bar x)(y_i-\bar y)}{n-1}.\)
    Correlation: scale‑free measure
      
        In words.
        
          - Covariance divided by both standard deviations; lives in [−1, 1].
          - Unaffected by changing measurement units.
 
      \(\rho=\dfrac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y}\in[-1,1]\). Sample: \(r=\dfrac{s_{XY}}{s_X s_Y}.\)
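      A from-scratch sketch of the sample formulas \(s_{XY}\) and \(r\), with a unit change to illustrate that \(r\) is scale-free. The data and helper names are made up for this example.

```python
import math

def sample_cov(xs, ys):
    """s_XY = sum((x_i - xbar)(y_i - ybar)) / (n - 1)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """r = s_XY / (s_X s_Y); undefined (division by zero) if either sd is 0."""
    s_x = math.sqrt(sample_cov(xs, xs))
    s_y = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (s_x * s_y)

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov = sample_cov(xs, ys)
r = sample_corr(xs, ys)

# Change units of x (say metres to centimetres): covariance scales, r does not.
xs_cm = [100 * x for x in xs]
cov_cm = sample_cov(xs_cm, ys)
r_cm = sample_corr(xs_cm, ys)

print(cov, r, cov_cm, r_cm)
```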
    When correlation is undefined / zero ≠ independence
      
        In words.
        
          - Undefined if either variable has zero variance (no spread).
          - Zero correlation only rules out a linear relationship; a curved one can remain. Independence is the stronger condition (the two coincide in special cases such as the bivariate Normal).
 
      If \(\sigma_X=0\) or \(\sigma_Y=0\), \(\rho\) is undefined. In general, \(\rho=0\nRightarrow X\perp Y\) (unless \(X\) and \(Y\) are jointly Normal).
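      A small counterexample, chosen for illustration: let \(X\) take the values \(-1, 0, 1\) with probability \(1/3\) each and set \(Y=X^2\). Then \(Y\) is completely determined by \(X\), yet the covariance (computed here as \(E[XY]-E[X]E[Y]\)) is zero.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y) with Y = X^2.
joint = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}

e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov = e_xy - e_x * e_y            # zero, so rho = 0

# Independence fails: f(0, 0) differs from f_X(0) * f_Y(0).
f_x0 = sum(p for (x, _), p in joint.items() if x == 0)
f_y0 = sum(p for (_, y), p in joint.items() if y == 0)
cell = joint[(0, 0)]
product = f_x0 * f_y0

print(cov, cell, product)
```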
    Session 6.x — Descriptive & Graphics
    Sample vs population variance
      
        In words.
        
          - Sample variance uses \(n-1\) in the denominator (unbiased for \(\sigma^2\)).
          - Population variance divides by the population size \(N\) (or, for a model, is the expectation \(E[(X-\mu)^2]\)).
 
      Sample: \(s^2=\tfrac{\sum (x_i-\bar x)^2}{n-1}\). Population: \(\sigma^2=E[(X-\mu)^2].\)
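      The two denominators side by side, on a made-up dataset. The hand computation is compared with `statistics.variance` (divides by \(n-1\)) and `statistics.pvariance` (divides by \(n\)) from Python's standard library.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)   # sum of squared deviations

sample_var = sum_sq / (n - 1)      # treats data as a sample
population_var = sum_sq / n        # treats data as the whole population

print(sample_var, statistics.variance(data))
print(population_var, statistics.pvariance(data))
```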
    Stem‑and‑leaf plot
      
        In words.
        
          - Shows shape while preserving exact values — great for small datasets.
 
      No formula; split each value into a stem (leading digits) and a leaf (last digit).
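      A minimal text rendering of the split described above. It assumes non-negative integers with the tens as stem and the units digit as leaf; the function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines for non-negative integers: stem = tens, leaf = units digit."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    lines = []
    # Keep empty stems so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

data = [23, 12, 41, 15, 23, 38, 21]   # hypothetical small dataset
plot = stem_and_leaf(data)
print("\n".join(plot))
```

      Repeated values such as the two 23s appear as repeated leaves, which is how the plot preserves exact values.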
    Histogram and bin width choice
      
        In words.
        
          - Bars show counts in intervals; too‑wide bins hide detail; too‑narrow bins add noise.
 
      Rules of thumb: Sturges, Freedman–Diaconis (\(\text{bin width}=2\,\text{IQR}\,n^{-1/3}\)).
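      A sketch of both rules of thumb. Sturges is taken here in its common form \(k=\lceil\log_2 n\rceil+1\) bins, which the notes name but do not spell out. Quartile conventions differ between tools; this example assumes the default method of `statistics.quantiles`, and the data are made up.

```python
import math
import statistics

def freedman_diaconis_width(data):
    """Bin width = 2 * IQR * n**(-1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (default 'exclusive' method)
    return 2 * (q3 - q1) * len(data) ** (-1 / 3)

def sturges_bins(n):
    """Number of bins by Sturges' rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1

data = list(range(1, 28))   # hypothetical data: 1, 2, ..., 27
width = freedman_diaconis_width(data)
bins = sturges_bins(len(data))

print(width, bins)
```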
    Box plot and the 1.5·IQR rule
      
        In words.
        
          - Box = middle 50% (Q1 to Q3) with a line at the median; in the usual (Tukey) convention, whiskers extend to the most extreme values still inside the fences.
          - Points more than 1.5·IQR below Q1 or above Q3 are flagged as potential outliers.
 
      IQR = Q3 − Q1. Fences: \([\text{Q1}-1.5\,\text{IQR},\;\text{Q3}+1.5\,\text{IQR}]\); values outside this interval are flagged.
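      A sketch of the 1.5·IQR rule on made-up data. Quartile conventions vary between tools, so the cutoffs may differ slightly elsewhere; this example assumes the default method of `statistics.quantiles`.

```python
import statistics

def iqr_fences(data):
    """Return (lower, upper) cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 11, 30]   # hypothetical data
lower, upper = iqr_fences(data)
outliers = [x for x in data if x < lower or x > upper]

print(lower, upper, outliers)
```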
    Scatter plot interpretation checklist
      
        In words.
        
          - Look for form (linear/curved), direction (\(+/-\)), strength, clusters, and outliers.
 
      No single formula; correlation \(r\) summarizes linear strength but can miss curves/outliers.
    Normal Q–Q plot (what to look for)
      
        In words.
        
          - Points near the line ⇒ roughly Normal.
          - A systematic curve (bow) ⇒ skew; an S-shape, with the two ends bending in opposite directions ⇒ heavy or light tails.
 
      Plot order statistics \(x_{(i)}\) against Normal quantiles \(z_{(i)}\). Deviations from the line diagnose shape departures.
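      The plotting coordinates can be sketched as follows. The plotting positions \((i-0.5)/n\) are one common convention and an assumption here (the notes do not fix one); the helper name and data are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Pair each order statistic x_(i) with a Standard Normal quantile.

    Uses plotting positions (i - 0.5) / n, one of several common choices.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))

data = [5.1, 4.8, 6.3, 5.0, 5.6]   # hypothetical sample
points = normal_qq_points(data)

for z, x in points:
    print(f"{z:+.3f}  {x}")
```

      Passing these pairs to any scatter-plot routine gives the Q-Q plot, with Normal quantiles on the horizontal axis and data on the vertical axis.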