📘 Chapter 10 — Inference for Two Samples
Means (independent and paired) and variances: formulas, assumptions, and Excel mirrors.
🎥 Video: One-Sample vs Two-Sample t-Tests
Optional review: this YouTube video walks through the difference between one-sample and two-sample t-tests, step-by-step with examples. Watch this if you want a big-picture refresher before diving into the Excel formulas.
Overview
We compare two populations via differences in means (independent vs paired) and in variances. Choose a test by design (paired or not) and by variance assumption (equal vs unequal). Prefer Welch’s \(t\) when in doubt about equality of variances.
- Independent means: Welch’s \(t\) (default), Pooled \(t\) (when variances can be assumed equal).
- Paired means: one-sample \(t\) on the differences.
- Variances: F test (sensitive to non-normality; use with care).
Quick “Which Test?”
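Are the data paired (same subjects measured twice, or matched pairs)?
Yes → Paired \(t\)
No → see next box.
Two separate groups, comparing means?
Yes → Independent means.
Unsure about equal variances? → Use Welch.
Comparing spreads (variances)?
Yes → F test for \(\sigma_1^2 = \sigma_2^2\) (check normality first).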
Rule of thumb: When sample sizes differ or variability looks different, pick Welch. If boxplots look similar and \(n_1\approx n_2\), pooled is acceptable.
Assumptions (checklist)
- Independence within and between groups (design/sampling).
- Approximately normal populations, or moderate \(n\) (robustness improves as \(n\) grows; avoid extreme outliers).
- Pooled \(t\) only: population variances equal (visual check + similar \(s\)’s and \(n\)’s).
- F test: both groups ~ normal; highly sensitive to non-normality.
If assumptions are shaky (heavy skew/outliers), consider nonparametric alternatives: Mann–Whitney \(U\) (two independent groups; it compares medians only under a shift-in-location assumption) or Wilcoxon signed-rank (paired).
Two Independent Means
Welch’s t-test (variances not assumed equal) — Recommended default
Test \(H_0:\,\mu_1-\mu_2=\Delta_0\) (often \(\Delta_0=0\)).
\[ t = \frac{(\bar X_1-\bar X_2)-\Delta_0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}},\qquad \nu \approx \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}} \]
CI: \((\bar X_1-\bar X_2)\pm t_{\alpha/2,\nu}\cdot \sqrt{\tfrac{s_1^2}{n_1}+\tfrac{s_2^2}{n_2}}\).
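As a cross-check on the Excel mirrors, the Welch statistic and its approximate df can be computed in a few lines of Python. This is a minimal sketch using only the standard library; the function name `welch_t` and the sample data are hypothetical. It also returns the standard error, so the CI is `diff ± t_crit * se`, with `t_crit` taken from `T.INV.2T(alpha, df)` or a t table.

```python
import math
import statistics


def welch_t(x1, x2, delta0=0.0):
    """Welch's t for H0: mu1 - mu2 = delta0 (variances not assumed equal).

    Returns (t, df, se), where df is the Welch-Satterthwaite approximation.
    """
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1  # s1^2 / n1 (sample variance, n-1 divisor like VAR.S)
    v2 = statistics.variance(x2) / n2  # s2^2 / n2
    se = math.sqrt(v1 + v2)
    t = (statistics.mean(x1) - statistics.mean(x2) - delta0) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Hypothetical data, for illustration only.
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]
t, df, se = welch_t(group1, group2)
print(f"t = {t:.4f}, df = {df:.3f}, se = {se:.4f}")
```

For these hypothetical data \(s_1^2=2.5\) and \(s_2^2=10\), so \(\nu\approx 5.88\), noticeably below the pooled df \(n_1+n_2-2=8\).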
Excel mirrors (Welch)
Basic summaries
N1: COUNT(A2:A999)
Xbar1: AVERAGE(A2:A999)
S1: STDEV.S(A2:A999)
N2: COUNT(B2:B999)
Xbar2: AVERAGE(B2:B999)
S2: STDEV.S(B2:B999)
Test & CI (Welch)
t_stat: ((Xbar1-Xbar2)-Delta0)/SQRT(S1^2/N1+S2^2/N2)
df_welch: ((S1^2/N1+S2^2/N2)^2)/((S1^2/N1)^2/(N1-1)+(S2^2/N2)^2/(N2-1))
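p_value (two-sided, Delta0 = 0 only): T.TEST(A2:A999,B2:B999,2,3)
t_crit (Excel truncates a non-integer df, so the CI is slightly conservative): T.INV.2T(alpha, df_welch)
CI_low: (Xbar1-Xbar2) - t_crit*SQRT(S1^2/N1+S2^2/N2)
CI_high: (Xbar1-Xbar2) + t_crit*SQRT(S1^2/N1+S2^2/N2)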
Pooled t-test (equal variances assumed)
\[ s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2},\quad t=\frac{(\bar X_1-\bar X_2)-\Delta_0}{\sqrt{s_p^2\!\left(\frac1{n_1}+\frac1{n_2}\right)}},\quad \text{df}=n_1+n_2-2 \]
CI: \((\bar X_1-\bar X_2)\pm t_{\alpha/2,n_1+n_2-2}\cdot \sqrt{s_p^2\!\left(\tfrac1{n_1}+\tfrac1{n_2}\right)}\).
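The pooled formulas translate just as directly. The sketch below (hypothetical function name `pooled_t`, standard library only) returns the pooled variance \(s_p^2\) alongside \(t\) and df, since \(s_p^2\) is reused for Cohen's \(d\).

```python
import math
import statistics


def pooled_t(x1, x2, delta0=0.0):
    """Pooled two-sample t for H0: mu1 - mu2 = delta0 (equal variances assumed).

    Returns (t, df, sp2), where sp2 is the pooled variance s_p^2.
    """
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances, with weights n_i - 1.
    sp2 = ((n1 - 1) * statistics.variance(x1) + (n2 - 1) * statistics.variance(x2)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.mean(x1) - statistics.mean(x2) - delta0) / se
    return t, df, sp2


# Hypothetical data, for illustration only.
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]
print(pooled_t(group1, group2))
```

With equal sample sizes the pooled and Welch standard errors are algebraically identical, so both give \(t=-3/\sqrt{2.5}\) here; only the df differ (8 pooled, about 5.88 for Welch), which is why the p-values can differ.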
Excel mirrors (Pooled)
sp2: ((N1-1)*S1^2+(N2-1)*S2^2)/(N1+N2-2)
SE_pooled: SQRT(sp2*(1/N1+1/N2))
t_stat: ((Xbar1-Xbar2)-Delta0)/SE_pooled
df: N1+N2-2
p_value (two-sided, Delta0 = 0 only): T.TEST(A2:A999,B2:B999,2,2)
t_crit: T.INV.2T(alpha, N1+N2-2)
CI_low: (Xbar1-Xbar2) - t_crit*SE_pooled
CI_high: (Xbar1-Xbar2) + t_crit*SE_pooled
Excel shortcut: =T.TEST(array1,array2,tails,type) returns the p-value directly; use type 2 for pooled and 3 for Welch.
Paired Samples (Dependent Data)
Compute differences \(D_i = X_{1i}-X_{2i}\) and perform a one-sample \(t\) on \(\mu_D\).
\[ t=\frac{\bar D - \Delta_0}{s_D/\sqrt{n}},\qquad \text{df}=n-1,\qquad \text{CI: } \bar D \pm t_{\alpha/2,n-1}\cdot \frac{s_D}{\sqrt{n}} \]
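Because the paired test is just a one-sample \(t\) on the differences, the sketch below builds \(D_i\) first and then applies the one-sample formula. The function name `paired_t` and the before/after scores are hypothetical; the length check mirrors the requirement that each observation has a partner.

```python
import math
import statistics


def paired_t(x1, x2, delta0=0.0):
    """Paired t: a one-sample t on the differences D_i = x1[i] - x2[i].

    Returns (t, df, dbar, sd_d).
    """
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x1, x2)]  # differences D_i
    n = len(d)
    dbar = statistics.mean(d)
    sd_d = statistics.stdev(d)  # s_D with the n-1 divisor, like STDEV.S
    t = (dbar - delta0) / (sd_d / math.sqrt(n))
    return t, n - 1, dbar, sd_d


# Hypothetical before/after scores for four subjects.
before = [10, 12, 9, 11]
after = [8, 11, 9, 8]
print(paired_t(before, after))
```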
Excel mirrors (Paired)
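C2: A2-B2 → fill down through the last data row only (differences; blank rows would add zeros)
Dbar: AVERAGE(C2:C999)
SD_D: STDEV.S(C2:C999)
n: COUNT(C2:C999)
t_stat: (Dbar-Delta0)/(SD_D/SQRT(n))
p_value (two-sided, Delta0 = 0 only): T.TEST(A2:A999,B2:B999,2,1)
t_crit: T.INV.2T(alpha, n-1)
CI_low: Dbar - t_crit*SD_D/SQRT(n)
CI_high: Dbar + t_crit*SD_D/SQRT(n)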
Two Population Variances (F-test)
Label the groups so that \(s_1^2\ge s_2^2\) (larger variance in the numerator). Then \(F=\dfrac{s_1^2}{s_2^2}\ge 1\) with df\(_1=n_1-1\), df\(_2=n_2-1\).
Use only with (approximately) normal data; it is not robust to skew/outliers.
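The only bookkeeping in the F test is keeping each df attached to the right group when the larger variance goes on top. A minimal sketch, assuming a hypothetical helper `f_ratio`; the standard library has no F distribution, so the p-value itself would still come from `F.DIST.RT(F, df1, df2)` or a statistics package. Since \(F\ge 1\) by construction, the right tail is the smaller tail.

```python
import statistics


def f_ratio(x1, x2):
    """Variance-ratio F with the larger sample variance in the numerator.

    Returns (F, df1, df2); df1 belongs to the larger-variance group.
    """
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    if v1 >= v2:
        return v1 / v2, len(x1) - 1, len(x2) - 1
    # Swap so that F >= 1 and the numerator df follows the larger variance.
    return v2 / v1, len(x2) - 1, len(x1) - 1


# Hypothetical data: the second group is both larger and more variable.
print(f_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12]))
```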
Excel mirrors (F)
Var1: VAR.S(A2:A999)
Var2: VAR.S(B2:B999)
F: MAX(Var1,Var2)/MIN(Var1,Var2)
df1: (count of the larger-variance group) - 1
df2: (count of the smaller-variance group) - 1
p_one_sided: F.DIST.RT(F, df1, df2)
p_two_sided (symmetric): 2*min(…)
Effect Sizes & Reporting
Cohen’s \(d\) (independent)
\(d=\dfrac{\bar X_1-\bar X_2}{s_p}\), where \(s_p=\sqrt{s_p^2}\) from the pooled formula above.
d_pooled: (Xbar1-Xbar2)/SQRT(sp2)
Paired \(d_z\)
\(d_z=\dfrac{\bar D}{s_D}\).
d_paired: Dbar/SD_D
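Both effect sizes are one-liners once the right standard deviation is in hand. The sketch below (hypothetical helper names `cohens_d` and `cohens_dz`, reusing the hypothetical data from the test sketches) keeps the sign, so \(d<0\) simply means group 2 has the larger mean.

```python
import math
import statistics


def cohens_d(x1, x2):
    """Cohen's d for independent groups: mean difference over the pooled SD."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * statistics.variance(x1)
           + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    return (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(sp2)


def cohens_dz(x1, x2):
    """Paired d_z: mean of the differences over their SD."""
    d = [a - b for a, b in zip(x1, x2)]
    return statistics.mean(d) / statistics.stdev(d)


print(cohens_d([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(cohens_dz([10, 12, 9, 11], [8, 11, 9, 8]))
```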
Template: “Welch’s \(t\) test showed \(t(\nu)=\dots\), \(p=\dots\), 95% CI \([\dots,\dots]\). Effect size \(d=\dots\).”
Decision & Interpretation
- p-value: if \(p\le \alpha\), reject \(H_0\). Otherwise, fail to reject.
- CI view: if \(\Delta_0\) (usually 0) lies outside the \((1-\alpha)\) CI for \(\mu_1-\mu_2\), the two-sided test rejects \(H_0\) at level \(\alpha\).
- Practical vs statistical: also inspect effect size and context.
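Excel supplies p-values and critical values directly. To check them outside Excel without third-party packages, one route is the identity \(P(|T|\ge |t|) = I_{\nu/(\nu+t^2)}(\nu/2,\,1/2)\), where \(I_x(a,b)\) is the regularized incomplete beta function. The sketch below assumes that route and evaluates \(I_x\) with the continued-fraction (modified Lentz) method popularized by Numerical Recipes; all function names are hypothetical, and in practice a library such as SciPy would be the usual choice. Unlike `T.INV.2T`, the bisection-based critical value accepts a non-integer df such as Welch's \(\nu\).

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def t_crit_two_sided(alpha, df):
    """Critical t with P(|T| >= t) = alpha, found by bisection (df may be non-integer)."""
    lo, hi = 0.0, 1.0
    while t_two_sided_p(hi, df) > alpha:  # widen until the tail is small enough
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if t_two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Usage with hypothetical values: a t statistic of 2.5 on 12 df.
alpha = 0.05
p = t_two_sided_p(2.5, 12)
print(f"p = {p:.4f}:", "reject H0" if p <= alpha else "fail to reject H0")
```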
Common Excel Functions
- =T.TEST(array1,array2,tails,type): p-value; type 1 paired, 2 pooled, 3 Welch.
- =T.INV.2T(alpha, df): two-sided critical t.
- =CONFIDENCE.T(alpha, std_dev, size): one-sample CI half-width.
- =VAR.S, =STDEV.S: sample variance and sample standard deviation.
- =F.DIST.RT(x,df1,df2): right-tail F probability.
Common Pitfalls
- Using pooled \(t\) when variances differ a lot (especially if \(n_1\ne n_2\)).
- Treating paired data as independent (ignores the within-pair correlation, which usually inflates the SE and costs power).
- Over-relying on the F test under skew/outliers.
- Interpreting “fail to reject” as “prove equal” (it does not).
Setup Notes (Excel)
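Place Group 1 in A2:A?, Group 2 in B2:B?. Define cells for N1, Xbar1, S1 and N2, Xbar2, S2 using the mirrors above, plus Delta0 (the null difference, usually 0) and alpha (e.g., 0.05). If you use Excel defined names, note that a name cannot look like a cell reference: N1, S1, N2, S2, sp2, Var1, and df1 are all valid cell addresses. Either rename them (e.g., n_1, s_1) and edit the formulas to match, or store each value in the cell whose address matches its label.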