Second Midterm Study Guide: Calculation + Conceptual

1) Practice — Normal Distribution (Protein Bar Net Weight)

Assume net weights are Normal with mean \( \mu \) grams and standard deviation \( \sigma \) grams, and let \(a<b\) be two cutoff weights (grams).

  1. Proportion below a.
  2. Proportion above b.
  3. Proportion between a and b.
  4. 90th percentile \(x_{0.90}\).
  5. Central 80% interval \([x_{0.10}, x_{0.90}]\).
Solution (Excel): (1) =NORM.DIST(a, μ, σ, TRUE); (2) =1-NORM.DIST(b, μ, σ, TRUE); (3) =NORM.DIST(b, μ, σ, TRUE)-NORM.DIST(a, μ, σ, TRUE); (4) =NORM.INV(0.90, μ, σ); (5) central 80%: from =NORM.INV(0.10, μ, σ) to =NORM.INV(0.90, μ, σ).
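The same five calculations can be sketched in Python with the standard library's `statistics.NormalDist` in place of Excel. The values μ = 50, σ = 2, a = 47, b = 53 are hypothetical placeholders, not numbers from the course.

```python
from statistics import NormalDist

# Hypothetical placeholder values (grams), not from the course.
mu, sigma = 50.0, 2.0
a, b = 47.0, 53.0

dist = NormalDist(mu, sigma)

p_below = dist.cdf(a)                   # (1) like NORM.DIST(a, mu, sigma, TRUE)
p_above = 1 - dist.cdf(b)               # (2) like 1 - NORM.DIST(b, mu, sigma, TRUE)
p_between = dist.cdf(b) - dist.cdf(a)   # (3) difference of two CDF values
x90 = dist.inv_cdf(0.90)                # (4) like NORM.INV(0.90, mu, sigma)
x10 = dist.inv_cdf(0.10)                # (5) lower end of the central 80% interval

print(f"P(X<a)={p_below:.4f}  P(X>b)={p_above:.4f}  P(a<X<b)={p_between:.4f}")
print(f"x_0.90={x90:.3f}  central 80%: [{x10:.3f}, {x90:.3f}]")
```

Because the Normal is symmetric, the central 80% interval is centered at μ.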

2) Practice — Lotte Market–style Table (Time × Money)

Base counts sum to \(n = 23\); same structure as the in-class and homework tables.

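Layout: rows are the money categories of Y (including D and E) plus a Col Sum row; columns are the time categories of X, T1 (<5), T2 (5–15), T3 (15–30), T4 (>30), plus a Row Sum column. The grand total in the corner is 23.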

Questions

  1. \(f(X=T3, Y=D)\)
  2. \(f_X(T3)\)
  3. \(P(Y=E \mid X=T4)\)
  4. \(P(X\ge 15)\) (T3 or T4)
  5. \(P(Y\ge 30)\) (D or E)
  6. Independence at (T4,E): compare \(f(T4,E)\) vs \(f_X(T4)\,f_Y(E)\)
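A minimal Python sketch of questions 1–6 using exact fractions. The counts below and the row labels A, B, C are hypothetical (D and E come from the questions), chosen only so that the table sums to n = 23.

```python
from fractions import Fraction

# Hypothetical counts: rows are Y categories, columns are X = T1..T4.
cols = ["T1", "T2", "T3", "T4"]
counts = {
    "A": [2, 1, 0, 0],
    "B": [1, 2, 1, 0],
    "C": [1, 2, 2, 1],
    "D": [0, 1, 3, 1],
    "E": [0, 1, 2, 2],
}
n = sum(sum(row) for row in counts.values())  # grand total (23 here)

def joint(x, y):
    """f(X=x, Y=y): cell count divided by n."""
    return Fraction(counts[y][cols.index(x)], n)

def marginal_x(x):
    """f_X(x): column total divided by n."""
    return sum(joint(x, y) for y in counts)

def marginal_y(y):
    """f_Y(y): row total divided by n."""
    return sum(joint(x, y) for x in cols)

def cond_y_given_x(y, x):
    """P(Y=y | X=x) = f(x, y) / f_X(x)."""
    return joint(x, y) / marginal_x(x)

q1 = joint("T3", "D")
q2 = marginal_x("T3")
q3 = cond_y_given_x("E", "T4")
q4 = marginal_x("T3") + marginal_x("T4")   # P(X >= 15)
q5 = marginal_y("D") + marginal_y("E")     # P(Y >= 30)
lhs, rhs = joint("T4", "E"), marginal_x("T4") * marginal_y("E")

print(q1, q2, q3, q4, q5)
print("independent at (T4,E)?", lhs == rhs)
```

Exact fractions make the independence comparison in question 6 an exact equality test rather than a rounding judgment.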


3) Practice — Two Variables (Mean, SD, Cov, Corr) — 10 Pairs

Example data show a strong negative association (X = study hours, Y = sleep hours): as study time rises, sleep falls almost linearly.

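Day          | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10
X (study hr) | 2.5 | 3.0 | 3.5 | 4.0 | 4.5 | 5.0 | 5.5 | 6.0 | 6.5 | 7.0
Y (sleep hr) | 8.2 | 8.0 | 7.7 | 7.4 | 7.0 | 6.8 | 6.5 | 6.2 | 5.9 | 5.6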
Excel: =AVERAGE(range), =STDEV.S(range), =COVARIANCE.S(rangeX, rangeY), =CORREL(rangeX, rangeY).

4) Conceptual — “What is…?” (35 items, aligned to Sessions 4.x–6.x)

Checklist: Each item has (1) In words — what to say on an exam, and (2) In math — the formula you write. Read the words first, then confirm with the math.

Session 4.1 — PDF vs PMF

What is a probability density function (pdf)?
In words.
  • A pdf describes how likely continuous values are around each point; probabilities come from area under the curve.
  • It can be above 1 at a point; only the total area must be 1.
In math. Nonnegative \(f(x)\) with \(\int_{-\infty}^{\infty} f(x)\,dx=1\); \(P(a\le X\le b)=\int_a^b f(x)\,dx.\)
What is a probability mass function (pmf)?
In words.
  • A pmf lists probabilities for each possible discrete outcome.
  • All listed probabilities are between 0 and 1 and they sum to 1.
In math. Values \(p(x_i)=P(X=x_i)\) with \(\sum_i p(x_i)=1\); for sets, \(P(A)=\sum_{x_i\in A}p(x_i).\)
What is the key difference between pdf and pmf?
In words.
  • Continuous \(\Rightarrow\) use a pdf and areas.
  • Discrete \(\Rightarrow\) use a pmf and sums.
  • For continuous \(X\), \(P(X=a)=0\) even though \(f(a)\) may be positive.
In math. Continuous: \(P(a\le X\le b)=\int_a^b f(x)\,dx\). Discrete: \(P(a\le X\le b)=\sum_{x_i\in[a,b]} p(x_i).\)
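The contrast can be sketched in Python: a pmf (a fair die, assumed example) is summed, while a pdf is integrated, here with a simple midpoint rule. The density of Uniform(0, 0.5) equals 2 everywhere on its support, which shows a pdf can exceed 1 while its total area is still 1.

```python
# pmf: a fair six-sided die (discrete). Probabilities are summed.
pmf = {k: 1 / 6 for k in range(1, 7)}
p_2_to_4 = sum(p for k, p in pmf.items() if 2 <= k <= 4)  # P(2 <= X <= 4)

# pdf: Uniform(0, 0.5) has density 2 on [0, 0.5], i.e. above 1 at every point.
def pdf(x):
    return 2.0 if 0 <= x <= 0.5 else 0.0

def area(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

total_area = area(pdf, -1.0, 1.0)      # total area, about 1
p_0_to_quarter = area(pdf, 0.0, 0.25)  # P(0 <= X <= 0.25), about 0.5
print(p_2_to_4, total_area, p_0_to_quarter)
```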

Session 4.2 — CDF

What is a cumulative distribution function (CDF)?
In words.
  • The CDF at \(x\) is the chance that the variable is \(\le x\).
  • It grows from 0 to 1 as \(x\) moves left to right; it never goes down.
\(F(x)=P(X\le x)\); limits: \(\lim_{x\to-\infty}F(x)=0\), \(\lim_{x\to+\infty}F(x)=1\), and \(F\) is nondecreasing.
How are pdf and CDF related?
In words.
  • The CDF is the accumulated probability; the pdf is its slope.
  • To get an interval probability, subtract CDF values.
For continuous \(X\): \(F'(x)=f(x)\). Also \(P(a\le X\le b)=F(b)-F(a).\)
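A quick numerical check of \(F'(x)=f(x)\), sketched in Python with the standard Normal from `statistics.NormalDist` as an assumed example distribution: a central-difference slope of the CDF should match the pdf, and an interval probability is a difference of two CDF values.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # standard Normal, used as a convenient continuous example
h = 1e-5              # step for the numerical derivative

for x in (-1.0, 0.0, 0.7):
    slope = (Z.cdf(x + h) - Z.cdf(x - h)) / (2 * h)  # numerical F'(x)
    print(f"x={x:+.1f}  F'(x)~{slope:.6f}  f(x)={Z.pdf(x):.6f}")

# Interval probability from two CDF values: P(-1 <= X <= 1) = F(1) - F(-1).
p_interval = Z.cdf(1.0) - Z.cdf(-1.0)
```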
Tail probabilities with a CDF
In words.
  • Upper tail: 1 minus the CDF. Between a and b: difference of two CDFs.
  • This is the same logic you use with \(\Phi\) for Normal.
\(P(X>b)=1-F(b)\), \(P(a<X\le b)=F(b)-F(a).\)

Session 4.3 — Mean & Variance (Continuous)

Expected value (mean) for a continuous variable
In words.
  • The mean is the long‑run balance point of the distribution.
  • Weight each value \(x\) by how likely it is (its density) and integrate.
\(E[X]=\int_{-\infty}^{\infty} x\,f(x)\,dx.\)
Variance and the shortcut formula
In words.
  • Variance is the average squared distance from the mean.
  • Shortcut: compute \(E[X^2]\) and subtract \(\mu^2\).
\(\operatorname{Var}(X)=E[(X-\mu)^2]=\int (x-\mu)^2f(x)dx=E[X^2]-\mu^2.\)
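A worked check that the definition and the shortcut agree, sketched in Python with a midpoint-rule integral. The density \(f(x)=2x\) on \([0,1]\) is an assumed example; exactly, \(E[X]=2/3\), \(E[X^2]=1/2\), and \(\operatorname{Var}(X)=1/2-4/9=1/18\).

```python
def integrate(g, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) for i in range(steps)) * h

def f(x):
    """Assumed example density: f(x) = 2x on [0, 1] (zero elsewhere)."""
    return 2 * x

# f is zero outside [0, 1], so integrating over [0, 1] covers the whole support.
mean = integrate(lambda x: x * f(x), 0, 1)                   # E[X] = 2/3
ex2 = integrate(lambda x: x**2 * f(x), 0, 1)                 # E[X^2] = 1/2
var_def = integrate(lambda x: (x - mean) ** 2 * f(x), 0, 1)  # definition
var_short = ex2 - mean**2                                    # shortcut, 1/18
print(mean, var_def, var_short)
```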

Session 4.4 — Uniform(a,b)

Definition of Uniform(a,b)
In words.
  • The density is flat between \(a\) and \(b\) (equal-length subintervals are equally likely) and zero outside.
\(f(x)=\frac{1}{b-a}\) for \(a\le x\le b\); 0 otherwise.
Mean and variance for Uniform(a,b)
In words.
  • The mean is the midpoint; the variance grows with the square of the interval length.
\(E[X]=\tfrac{a+b}{2}\), \(\operatorname{Var}(X)=\tfrac{(b-a)^2}{12}.\)
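A simulation sketch in Python that compares sample summaries with the two formulas. The endpoints a = 2, b = 8 are assumed (formula answers: mean 5, variance 36/12 = 3), and the seed is fixed only so the run is reproducible.

```python
import random
import statistics as st

a, b = 2.0, 8.0          # assumed example endpoints
rng = random.Random(1)   # fixed seed so the run is reproducible
draws = [rng.uniform(a, b) for _ in range(100_000)]

print(st.fmean(draws), (a + b) / 2)            # simulated vs formula mean
print(st.pvariance(draws), (b - a) ** 2 / 12)  # simulated vs formula variance
```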

Session 4.5 — Normal & z‑Scores

What is a z‑score (standardization)?
In words.
  • It tells you how many standard deviations a value is from the mean.
  • Use it to convert any \(N(\mu,\sigma^2)\) to the standard Normal \(N(0,1)\).
\(z=\dfrac{x-\mu}{\sigma}\). If \(X\sim N(\mu,\sigma^2)\), then \(Z=\frac{X-\mu}{\sigma}\sim N(0,1).\)
Using \(\Phi\) for Normal probabilities
In words.
  • Convert to \(Z\), then use \(\Phi\) (or Excel NORM.DIST). For a value between two cutoffs, subtract two CDF values.
\(P(a\le X\le b)=\Phi(\tfrac{b-\mu}{\sigma})-\Phi(\tfrac{a-\mu}{\sigma}).\)
Empirical 68–95–99.7% rule
In words.
  • About 68% within 1 sd of the mean, 95% within 2 sd, 99.7% within 3 sd.
\(P(|X-\mu|\le k\sigma)\approx 0.68,0.95,0.997\) for \(k=1,2,3\) when \(X\) is roughly Normal.
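The rule's percentages are rounded values of \(\Phi(k)-\Phi(-k)\); a Python sketch with `statistics.NormalDist` shows the exact figures and that they do not depend on μ and σ (the N(50, 2²) check uses assumed parameters).

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal N(0, 1)

# P(|X - mu| <= k*sigma) = P(|Z| <= k) = Phi(k) - Phi(-k) for any Normal X.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} sd: {p:.4f}")   # about 0.6827, 0.9545, 0.9973

# Same answer for an assumed N(50, 2^2) at k = 2.
dist = NormalDist(50, 2)
p2 = dist.cdf(54) - dist.cdf(46)
```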
Quantiles / percentiles for Normal
In words.
  • Find the \(z\) with area \(p\) to the left, then un‑standardize back to \(x\).
\(x_p=\mu+\sigma z_p\), where \(z_p=\Phi^{-1}(p)\) (Excel: NORM.INV(p,μ,σ)).
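The two steps (find \(z_p\), then un-standardize) can be sketched in Python; μ = 100, σ = 15, p = 0.90 are assumed example values. The result should match the direct quantile of \(N(\mu,\sigma^2)\), which is what NORM.INV(p, μ, σ) returns.

```python
from statistics import NormalDist

mu, sigma, p = 100.0, 15.0, 0.90  # assumed example values

z_p = NormalDist().inv_cdf(p)     # z with area p to its left (about 1.2816)
x_p = mu + sigma * z_p            # un-standardize back to the x scale

# Direct quantile of N(mu, sigma^2), like NORM.INV(p, mu, sigma).
x_direct = NormalDist(mu, sigma).inv_cdf(p)
print(z_p, x_p, x_direct)
```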

Session 4.7 — Exponential(λ)

Definition and key summaries of Exponential(λ)
In words.
  • Models waiting times between random arrivals; takes only nonnegative values.
  • One parameter: the rate \(\lambda\) per time unit.
\(f(x)=\lambda e^{-\lambda x}\) for \(x\ge 0\); \(E[X]=1/\lambda\), \(\operatorname{Var}(X)=1/\lambda^2\).
Memoryless property
In words.
  • Past waiting doesn’t change future waiting. “No aging.”
\(P(X>s+t\mid X>s)=P(X>t)\). Among continuous distributions, only Exponential has this.
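Since the CDF is \(F(x)=1-e^{-\lambda x}\), the survival function is \(P(X>x)=e^{-\lambda x}\), and the memoryless identity can be checked directly. A Python sketch with an assumed rate λ = 0.5 per minute and assumed times s = 3, t = 2:

```python
import math

lam = 0.5  # assumed rate: 0.5 arrivals per minute, so E[X] = 2 minutes

def survival(x):
    """P(X > x) = e^(-lam * x) for x >= 0."""
    return math.exp(-lam * x)

s, t = 3.0, 2.0
conditional = survival(s + t) / survival(s)  # P(X > s+t | X > s)
fresh = survival(t)                          # P(X > t)
mean, var = 1 / lam, 1 / lam**2              # 2.0 and 4.0
print(conditional, fresh, mean, var)
```

Having already waited s = 3 minutes leaves the chance of waiting 2 more exactly the same as for a fresh start.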

Session 4.10 — Lognormal

What is a lognormal variable?
In words.
  • A positive, right‑skewed variable whose log is Normal.
  • Common for income, prices, failure times when multiplicative effects matter.
If \(\ln Y\sim N(\mu,\sigma^2)\), then \(Y\) is lognormal.
Key summaries for lognormal
In words.
  • The median equals the geometric mean; the mean is larger because of the long right tail (mode < median < mean).
\(\operatorname{median}(Y)=e^{\mu}\), \(E[Y]=e^{\mu+\sigma^2/2}\), \(\operatorname{mode}=e^{\mu-\sigma^2}.\)
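A Python sketch of the three summaries for assumed parameters μ = 0, σ = 0.5, plus a grid search confirming that the density peaks at the mode formula.

```python
import math

mu, sigma = 0.0, 0.5  # assumed parameters of ln Y ~ N(mu, sigma^2)

median = math.exp(mu)                # e^mu (also the geometric mean)
mean = math.exp(mu + sigma**2 / 2)   # pulled up by the right tail
mode = math.exp(mu - sigma**2)       # location of the density's peak

def pdf(y):
    """Lognormal density for y > 0."""
    z = (math.log(y) - mu) / sigma
    return math.exp(-z * z / 2) / (y * sigma * math.sqrt(2 * math.pi))

# Grid search over (0, 5) in steps of 0.0001; the peak should land near `mode`.
grid = [i / 10_000 for i in range(1, 50_000)]
peak = max(grid, key=pdf)
print(mode, median, mean, peak)
```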
Why log‑transform skewed data?
In words.
  • Taking logs reduces right‑skew, making distributions more symmetric and easier to model.
If \(Y\) is lognormal, then \(\ln Y\sim N(\mu,\sigma^2)\), which is exactly symmetric; for real right‑skewed data, the logged values are often approximately symmetric.

Session 5.1 — Joint Distributions

Joint pmf/pdf
In words.
  • Describes the probability for pairs \((X,Y)\), not each alone.
\(f_{X,Y}(x,y)\ge 0\), with total probability 1 (sum or integral over support).
Marginal distributions
In words.
  • What each variable does by itself — totals across the other variable.
Discrete: \(f_X(x)=\sum_y f_{X,Y}(x,y)\). Continuous: \(f_X(x)=\int f_{X,Y}(x,y)\,dy\). Likewise for \(f_Y\).
Support checks and totals
In words.
  • Probabilities must be nonnegative and the table/region must sum/integrate to 1.
\(\sum_{x}\sum_{y} f_{X,Y}(x,y)=1\) (discrete) or \(\int\!\int f_{X,Y}(x,y)\,dx\,dy=1\) (continuous).

Session 5.2 — Conditional Distributions

Conditional pmf/pdf
In words.
  • Update your distribution for \(Y\) after learning a value of \(X\).
\(f_{Y|X}(y|x)=\dfrac{f_{X,Y}(x,y)}{f_X(x)}\) when \(f_X(x)>0\). (Same formula for pmfs and pdfs; the marginal \(f_X\) comes from a sum or an integral.)
Conditional expectation
In words.
  • The average of \(Y\) among cases with the same \(X=x\).
\(E[Y\mid X=x]=\sum_y y\,f_{Y|X}(y|x)\) (discrete) or \(\int y\,f_{Y|X}(y|x)\,dy\) (continuous).
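A Python sketch of both steps, conditioning and then averaging, on a small joint pmf. The table is an assumed example; exact fractions give \(E[Y\mid X=0]=5/3\) and \(E[Y\mid X=1]=2\).

```python
from fractions import Fraction

# Assumed joint pmf f(x, y) for illustration; entries sum to 1.
joint = {
    (0, 1): Fraction(1, 10), (0, 2): Fraction(2, 10),
    (1, 1): Fraction(3, 10), (1, 2): Fraction(1, 10), (1, 3): Fraction(3, 10),
}

def f_x(x):
    """Marginal f_X(x): sum the joint pmf over y."""
    return sum(p for (xx, _), p in joint.items() if xx == x)

def cond_pmf(x):
    """f_{Y|X}(y | x) = f(x, y) / f_X(x) for each y with f(x, y) > 0."""
    fx = f_x(x)
    return {y: p / fx for (xx, y), p in joint.items() if xx == x}

def cond_mean(x):
    """E[Y | X = x] = sum over y of y * f_{Y|X}(y | x)."""
    return sum(y * p for y, p in cond_pmf(x).items())

print(cond_pmf(0), cond_mean(0), cond_mean(1))
```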

Session 5.3 — Independence

Definition of independence
In words.
  • Knowing \(X\) tells you nothing about \(Y\); the joint behavior factors into separate pieces.
\(X\perp Y\) iff \(f_{X,Y}(x,y)=f_X(x)\,f_Y(y)\) for all \((x,y)\).
Product/conditioning tests for tables
In words.
  • In a contingency table, compare each cell to the product of its row and column marginals. One mismatch shows dependence; independence requires a match in every cell.
Check \(f(x_0,y_0)\overset{?}{=} f_X(x_0)f_Y(y_0)\). Equivalently, \(f_{Y|X}(y_0|x_0)\overset{?}{=} f_Y(y_0).\)
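The every-cell check can be sketched in Python with exact fractions; both tables are made-up examples. With rounded decimal data you would compare within a small tolerance instead of using `==`.

```python
from fractions import Fraction

def is_independent(table):
    """True iff every cell equals (row marginal) x (column marginal).

    `table` is a list of rows of joint probabilities that sum to 1.
    """
    row_m = [sum(row) for row in table]
    col_m = [sum(col) for col in zip(*table)]
    return all(
        table[i][j] == row_m[i] * col_m[j]
        for i in range(len(table))
        for j in range(len(table[0]))
    )

# Built as an outer product of marginals (1/4, 3/4) and (1/3, 2/3): independent.
indep = [[Fraction(1, 12), Fraction(2, 12)],
         [Fraction(3, 12), Fraction(6, 12)]]
# Mass concentrated on the diagonal: dependent.
dep = [[Fraction(4, 10), Fraction(1, 10)],
       [Fraction(1, 10), Fraction(4, 10)]]

print(is_independent(indep), is_independent(dep))
```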

Session 5.4 — Covariance & Correlation

Covariance: what the sign means
In words.
  • Positive: they tend to be high together and low together.
  • Negative: when one is high, the other tends to be low.
\(\operatorname{Cov}(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]\). Sample: \(s_{XY}=\frac{\sum (x_i-\bar x)(y_i-\bar y)}{n-1}.\)
Correlation: scale‑free measure
In words.
  • Covariance divided by both standard deviations; lives in [−1, 1].
  • Unaffected by changing measurement units.
\(\rho=\dfrac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y}\in[-1,1]\). Sample: \(r=\dfrac{s_{XY}}{s_X s_Y}.\)
When correlation is undefined / zero ≠ independence
In words.
  • Undefined if either variable has zero variance (no spread).
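  • Zero correlation rules out only a linear trend; the variables can still have a strong curved relationship. Independence is the stronger condition; for a bivariate Normal pair the two coincide (zero correlation implies independence).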
If \(\sigma_X=0\) or \(\sigma_Y=0\), \(\rho\) undefined. In general, \(\rho=0\nRightarrow X\perp Y\) (unless jointly Normal).
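A standard counterexample, sketched in Python with exact fractions: X uniform on {−1, 0, 1} and \(Y=X^2\). The covariance is exactly 0, yet Y is a function of X, and the product test fails at the cell (0, 1).

```python
from fractions import Fraction

# X uniform on {-1, 0, 1} and Y = X^2: Y is completely determined by X.
pmf = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence fails: f(0, 1) = 0 but f_X(0) * f_Y(1) = (1/3)(2/3) = 2/9.
f_x0 = sum(p for (x, _), p in pmf.items() if x == 0)
f_y1 = sum(p for (_, y), p in pmf.items() if y == 1)
f_01 = pmf.get((0, 1), Fraction(0))
print(cov, f_01, f_x0 * f_y1)
```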

Session 6.x — Descriptive & Graphics

Sample vs population variance
In words.
  • Sample variance uses \(n-1\) in the denominator (unbiased for \(\sigma^2\)).
  • Population variance uses \(n\) (or integrates for a model).
Sample: \(s^2=\tfrac{\sum (x_i-\bar x)^2}{n-1}\). Population: \(\sigma^2=E[(X-\mu)^2].\)
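In Python's `statistics` module, `variance` divides by \(n-1\) (like Excel VAR.S) and `pvariance` divides by \(n\) (like Excel VAR.P). A sketch on an assumed five-point sample: the squared deviations from the mean 7 sum to 26, giving 26/4 = 6.5 versus 26/5 = 5.2.

```python
import statistics as st

data = [4, 7, 5, 9, 10]    # assumed small sample (mean 7)
n = len(data)
s2 = st.variance(data)     # sum of squared deviations / (n - 1)
pop2 = st.pvariance(data)  # sum of squared deviations / n
print(s2, pop2)
```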
Stem‑and‑leaf plot
In words.
  • Shows shape while preserving exact values — great for small datasets.
No formula; split each value into a stem (leading digits) and a leaf (last digit).
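A small Python sketch of the split for two-digit, nonnegative integers (stem = tens digit, leaf = ones digit). The exam scores are assumed example data; empty stems are still printed so gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Print a stem-and-leaf display; stem = value // 10, leaf = last digit."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems to show gaps
        leaves = "".join(str(d) for d in stems.get(stem, []))
        print(f"{stem:>3} | {leaves}")
    return dict(stems)

scores = [62, 67, 71, 74, 74, 78, 85, 89, 93]  # assumed exam scores
table = stem_and_leaf(scores)
#   6 | 27
#   7 | 1448
#   8 | 59
#   9 | 3
```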
Histogram and bin width choice
In words.
  • Bars show counts in intervals; too‑wide bins hide detail; too‑narrow bins add noise.
Rules of thumb: Sturges (\(k=\lceil\log_2 n\rceil+1\) bins), Freedman–Diaconis (\(\text{bin width}=2\,\text{IQR}\,n^{-1/3}\)).
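Both rules of thumb can be sketched in Python. Quartile conventions differ between tools (for example Excel's QUARTILE.INC and QUARTILE.EXC), so the IQR, and therefore the Freedman–Diaconis width, can vary slightly; this sketch assumes the default ("exclusive") method of `statistics.quantiles`.

```python
import math
import statistics as st

def sturges_bins(n):
    """Sturges: k = ceil(log2 n) + 1 bins."""
    return math.ceil(math.log2(n)) + 1

def fd_bin_width(data):
    """Freedman-Diaconis: width = 2 * IQR * n^(-1/3)."""
    q1, _, q3 = st.quantiles(data, n=4)  # quartiles, default 'exclusive' method
    return 2 * (q3 - q1) * len(data) ** (-1 / 3)

data = list(range(1, 101))  # assumed example data: 1, 2, ..., 100
print(sturges_bins(len(data)), fd_bin_width(data))
```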
Box plot and the 1.5·IQR rule
In words.
  • Box = middle 50% (Q1 to Q3) with a line at the median; whiskers extend to the most extreme data points that lie within the 1.5·IQR fences.
  • Points beyond 1.5·IQR from Q1 or Q3 are flagged as potential outliers.
IQR = Q3 − Q1. Outlier cutoffs: \([\text{Q1}-1.5\,\text{IQR},\;\text{Q3}+1.5\,\text{IQR}]\).
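A Python sketch of the fences. The data are assumed, and quartiles use the "inclusive" method of `statistics.quantiles` (one common convention; other methods can shift the fences slightly). Here Q1 = 15 and Q3 = 18.75, so the fences are [9.375, 24.375] and only 45 is flagged.

```python
import statistics as st

def iqr_outliers(data):
    """Return (lower fence, upper fence, flagged points) using the 1.5*IQR rule."""
    q1, _, q3 = st.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return lo, hi, [x for x in data if x < lo or x > hi]

data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]  # assumed data, one suspicious value
lo, hi, flagged = iqr_outliers(data)
print(lo, hi, flagged)
```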
Scatter plot interpretation checklist
In words.
  • Look for form (linear/curved), direction (\(+/-\)), strength, clusters, and outliers.
No single formula; correlation \(r\) summarizes linear strength but can miss curves/outliers.
Normal Q–Q plot (what to look for)
In words.
  • Points near the line ⇒ roughly Normal.
  • A curved (bowed) pattern ⇒ skew; an S‑shape (ends bending off the line in opposite directions) ⇒ heavy or light tails.
Plot order statistics \(x_{(i)}\) against Normal quantiles \(z_{(i)}\). Deviations from the line diagnose shape departures.
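The plotted pairs can be computed as sketched below in Python. The plotting positions \((i-0.5)/n\) are one common convention among several, and the sample is assumed; any plotting tool can then draw the points against a reference line.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Pair standard Normal quantiles z_(i) with sorted data x_(i).

    Uses plotting positions (i - 0.5) / n, one common convention among several.
    """
    xs = sorted(data)
    n = len(xs)
    z = NormalDist()
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

sample = [4.1, 5.3, 4.8, 6.0, 5.1, 4.5, 5.7, 5.0]  # assumed sample
points = normal_qq_points(sample)
for zq, x in points:
    print(f"{zq:+.3f}  {x}")
```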

End of guide