📘 Chapter 10 — Inference for Two Samples (Summary • Excel-Ready)

📂 PPT Slides

🎥 Video: One-Sample vs Two-Sample t-Tests

Optional review: this YouTube video walks through the difference between one-sample and two-sample t-tests, step-by-step with examples. Watch this if you want a big-picture refresher before diving into the Excel formulas.

Overview

We compare two populations via differences in means (independent vs paired) and in variances. Choose a test by design (paired or not) and by variance assumption (equal vs unequal). Prefer Welch’s \(t\) when in doubt about equality of variances.

Independent means: Welch’s \(t\) (default), Pooled \(t\) (when variances can be assumed equal).
Paired means: one-sample \(t\) on the differences.
Variances: F test (sensitive to non-normality; use with care).

Quick “Which Test?”

Same units measured twice?
Yes → Paired \(t\)
No → go to the next question.

Two separate groups?
Yes → Independent means.
Unsure about equal variances? → Use Welch.

Comparing variability?
Yes → F test for \(\sigma_1^2 = \sigma_2^2\) (check normality first).

Rule of thumb: When sample sizes differ or variability looks different, pick Welch. If boxplots look similar and \(n_1\approx n_2\), pooled is acceptable.
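The flowchart above can be encoded as a small decision function. This is a minimal Python sketch only: the name `choose_test` and its boolean flags are hypothetical, and it covers the comparison of means (the F-test branch is left out). The integer it returns is the `type` argument that `T.TEST` uses (1 paired, 2 pooled, 3 Welch).

```python
def choose_test(paired, equal_variances_plausible=False):
    """Return (test name, Excel T.TEST type code) following the quick flowchart.

    Hypothetical helper: 'equal_variances_plausible' stands for the judgment
    call in the rule of thumb (similar boxplots and n1 roughly equal to n2).
    """
    if paired:
        # Same units measured twice: one-sample t on the differences.
        return ("paired t", 1)
    if equal_variances_plausible:
        # Two separate groups with similar spread: pooled t is acceptable.
        return ("pooled t", 2)
    # Two separate groups, unsure about variances: default to Welch.
    return ("Welch t", 3)


print(choose_test(paired=False))  # ('Welch t', 3)
```

Defaulting the flag to `False` mirrors the advice to pick Welch whenever equal variances are in doubt.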

Assumptions (checklist)

Independence within and between groups (design/sampling).
Approx. normal population or moderate \(n\) (robustness improves with \(n\); avoid extreme outliers).
Pooled \(t\) only: population variances equal (visual check + similar \(s\)’s and \(n\)’s).
F test: both groups ~ normal; highly sensitive to non-normality.

If assumptions are shaky (heavy skew/outliers), consider nonparametric alternatives: Mann–Whitney \(U\) (independent groups; it compares medians only when the two distributions have similar shapes) or Wilcoxon signed-rank (paired).

Two Independent Means

Welch’s t-test (variances not assumed equal) — Recommended default

Test \(H_0:\,\mu_1-\mu_2=\Delta_0\) (often \(\Delta_0=0\)).

\[ t = \frac{(\bar X_1-\bar X_2)-\Delta_0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}},\qquad \nu \approx \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}} \]

CI: \((\bar X_1-\bar X_2)\pm t_{\alpha/2,\nu}\cdot \sqrt{\tfrac{s_1^2}{n_1}+\tfrac{s_2^2}{n_2}}\).

Excel mirrors (Welch)

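Assume data in A2:A? (Group 1) and B2:B? (Group 2). The formulas below use the generous ranges A2:A999 and B2:B999; blank cells inside them are ignored by the counting and summary functions.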

Basic summaries

N1: COUNT(A2:A999)
Xbar1: AVERAGE(A2:A999)
S1: STDEV.S(A2:A999)
N2: COUNT(B2:B999)
Xbar2: AVERAGE(B2:B999)
S2: STDEV.S(B2:B999)

Test & CI (Welch)

t_stat: ((Xbar1-Xbar2)-Delta0)/SQRT(S1^2/N1+S2^2/N2)

df_welch: ((S1^2/N1+S2^2/N2)^2)/((S1^2/N1)^2/(N1-1)+(S2^2/N2)^2/(N2-1))

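p_value (two-sided, Delta0 = 0): T.TEST(A2:A999,B2:B999,2,3)

p_value (two-sided, any Delta0): T.DIST.2T(ABS(t_stat), df_welch)

t_crit: T.INV.2T(alpha, df_welch)

CI_low: (Xbar1-Xbar2) - t_crit*SQRT(S1^2/N1+S2^2/N2)

CI_high: (Xbar1-Xbar2) + t_crit*SQRT(S1^2/N1+S2^2/N2)

Note: T.TEST always tests a zero difference, so use the T.DIST.2T line when Delta0 is not 0. Excel may truncate a fractional df_welch in T.DIST.2T and T.INV.2T, which makes the p-value and CI slightly conservative.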
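As a cross-check on the Excel mirrors, here is a minimal Python sketch of the same Welch arithmetic using only the standard library. The function name `welch` and the example data are hypothetical, and the two-sided critical value `t_crit` is passed in by the caller (for example, the value of `T.INV.2T(alpha, df_welch)`) rather than computed, which keeps the sketch dependency-free.

```python
import math
from statistics import mean, variance


def welch(x1, x2, delta0=0.0, t_crit=None):
    """Welch t statistic, Welch-Satterthwaite df, and an optional CI.

    t_crit is the two-sided critical value supplied by the caller;
    when it is None, no CI is returned.
    """
    n1, n2 = len(x1), len(x2)
    v1 = variance(x1) / n1            # s1^2 / n1 (sample variance, n-1 divisor)
    v2 = variance(x2) / n2            # s2^2 / n2
    se = math.sqrt(v1 + v2)           # unpooled standard error
    diff = mean(x1) - mean(x2)
    t_stat = (diff - delta0) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    ci = None if t_crit is None else (diff - t_crit * se, diff + t_crit * se)
    return t_stat, df, ci


# Hypothetical example data (not from the chapter).
g1 = [1, 2, 3, 4, 5]
g2 = [2, 4, 6, 8]
t_stat, df_welch, _ = welch(g1, g2)
print(round(t_stat, 3), round(df_welch, 3))  # -1.359 4.749
```

The non-integer df (here exactly 2028/427) is typical of Welch's approximation.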


Pooled t-test (equal variances assumed)

\[ s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2},\quad t=\frac{(\bar X_1-\bar X_2)-\Delta_0}{\sqrt{s_p^2\!\left(\frac1{n_1}+\frac1{n_2}\right)}},\quad \text{df}=n_1+n_2-2 \]

CI: \((\bar X_1-\bar X_2)\pm t_{\alpha/2,n_1+n_2-2}\cdot \sqrt{s_p^2\!\left(\tfrac1{n_1}+\tfrac1{n_2}\right)}\).

Excel mirrors (Pooled)

sp2: ((N1-1)*S1^2+(N2-1)*S2^2)/(N1+N2-2)

SE_pooled: SQRT(sp2*(1/N1+1/N2))

t_stat: ((Xbar1-Xbar2)-Delta0)/SE_pooled

df: N1+N2-2

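p_value (two-sided, Delta0 = 0): T.TEST(A2:A999,B2:B999,2,2)

p_value (two-sided, any Delta0): T.DIST.2T(ABS(t_stat), N1+N2-2)

t_crit: T.INV.2T(alpha, N1+N2-2)

CI_low: (Xbar1-Xbar2) - t_crit*SE_pooled

CI_high: (Xbar1-Xbar2) + t_crit*SE_pooled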

Excel shortcut: =T.TEST(array1,array2,tails,2) for pooled, 3 for Welch.
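The pooled mirrors translate just as directly. A minimal standard-library sketch (the function name `pooled_t` and the data are hypothetical); note that the df is always the integer \(n_1+n_2-2\), unlike Welch's fractional df.

```python
import math
from statistics import mean, variance


def pooled_t(x1, x2, delta0=0.0):
    """Pooled two-sample t: returns (sp2, SE_pooled, t_stat, df)."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance: df-weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t_stat = (mean(x1) - mean(x2) - delta0) / se
    return sp2, se, t_stat, n1 + n2 - 2


# Hypothetical example data (not from the chapter).
sp2, se_pooled, t_stat, df = pooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8])
print(round(t_stat, 2), df)  # -1.44 7
```

Here the second group is visibly more spread out and the sizes differ, which is exactly the situation where the rule of thumb prefers Welch over this pooled result.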

Paired Samples (Dependent Data)

Compute differences \(D_i = X_{1i}-X_{2i}\) and perform a one-sample \(t\) on \(\mu_D\).

\[ t=\frac{\bar D - \Delta_0}{s_D/\sqrt{n}},\qquad \text{df}=n-1,\qquad \text{CI: } \bar D \pm t_{\alpha/2,n-1}\cdot \frac{s_D}{\sqrt{n}} \]

Excel mirrors (Paired)

C2: A2-B2  → fill down (differences)

Dbar: AVERAGE(C2:C999)

SD_D: STDEV.S(C2:C999)

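n: COUNT(C2:C999)

t_stat: (Dbar-Delta0)/(SD_D/SQRT(n))

p_value (two-sided, Delta0 = 0): T.TEST(A2:A999,B2:B999,2,1)

p_value (two-sided, any Delta0): T.DIST.2T(ABS(t_stat), n-1)

t_crit: T.INV.2T(alpha, n-1)

CI_low: Dbar - t_crit*SD_D/SQRT(n)

CI_high: Dbar + t_crit*SD_D/SQRT(n)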
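The paired recipe is just a one-sample \(t\) on the differences, which this minimal Python sketch makes explicit. The function name `paired_t` and the before/after numbers are hypothetical; as in the Excel mirrors, the critical value `t_crit` is supplied by the caller (for example from `T.INV.2T(alpha, n-1)`).

```python
import math
from statistics import mean, stdev


def paired_t(x1, x2, delta0=0.0, t_crit=None):
    """One-sample t on the differences D_i = x1_i - x2_i.

    Returns (Dbar, SD_D, t_stat, df, ci); ci is None unless t_crit is given.
    """
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x1, x2)]   # column C in the Excel layout
    n = len(d)
    dbar, sd_d = mean(d), stdev(d)
    se = sd_d / math.sqrt(n)
    t_stat = (dbar - delta0) / se
    ci = None if t_crit is None else (dbar - t_crit * se, dbar + t_crit * se)
    return dbar, sd_d, t_stat, n - 1, ci


before = [10, 12, 9, 11]   # hypothetical measurements on the same 4 units
after = [8, 11, 9, 8]
dbar, sd_d, t_stat, df, _ = paired_t(before, after)
print(dbar, round(t_stat, 3), df)  # 1.5 2.324 3
```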

Two Population Variances (F-test)

Label the groups so that \(s_1^2\ge s_2^2\) (larger variance in the numerator). Then \(F=\dfrac{s_1^2}{s_2^2}\ge 1\) with df\(_1=n_1-1\), df\(_2=n_2-1\).

Use only with (approximately) normal data; it is not robust to skew/outliers.

Excel mirrors (F)

Var1: VAR.S(A2:A999)

Var2: VAR.S(B2:B999)

F: larger_variance / smaller_variance

df1: (size of the larger-variance group) - 1; df2: (size of the other group) - 1

p_one_sided: F.DIST.RT(F, df1, df2)

p_two_sided (symmetric): 2*min(…)
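Python's standard library has no counterpart to `F.DIST.RT`, so this sketch computes the F tail probability from the regularized incomplete beta function, evaluated with the usual Lentz continued fraction, using \(P(F>f)=I_{d_2/(d_2+d_1 f)}(d_2/2,\,d_1/2)\). Treat it as an illustration under stated assumptions, not a replacement for a vetted statistics library: all helper names are hypothetical, and the two-sided p-value uses one common convention, twice the smaller tail probability, capped at 1.

```python
import math
from statistics import variance


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_dist_rt(f, df1, df2):
    """Right-tail probability P(F > f), the quantity F.DIST.RT reports."""
    return betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_test(x1, x2):
    """Larger variance on top; each df stays with its own group."""
    v1, v2 = variance(x1), variance(x2)
    if v1 >= v2:
        f, df1, df2 = v1 / v2, len(x1) - 1, len(x2) - 1
    else:
        f, df1, df2 = v2 / v1, len(x2) - 1, len(x1) - 1
    p_right = f_dist_rt(f, df1, df2)
    p_left = 1.0 - p_right
    p_two = min(1.0, 2.0 * min(p_right, p_left))
    return f, df1, df2, p_right, p_two


# Hypothetical data: group 2 has the larger variance, so its df goes first.
f, df1, df2, p_right, p_two = f_test([1, 2, 3, 4, 5], [2, 4, 6, 8])
print(round(f, 3), df1, df2)  # 2.667 3 4
```

With tiny samples like these the F test has very little power, and the normality caveat above still applies.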

Effect Sizes & Reporting

Cohen’s \(d\) (independent)

\(d=\dfrac{\bar X_1-\bar X_2}{s_p}\), where \(s_p=\sqrt{s_p^2}\) from pooled formula above.

d_pooled: (Xbar1-Xbar2)/SQRT(sp2)

Paired \(d_z\)

\(d_z=\dfrac{\bar D}{s_D}\).

d_paired: Dbar/SD_D
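Both effect sizes need only means and standard deviations, so a standard-library sketch is short (function names and data are hypothetical). It also illustrates a handy identity: the paired \(t\) statistic equals \(d_z\sqrt{n}\), since \(t=\bar D/(s_D/\sqrt n)\).

```python
import math
from statistics import mean, stdev, variance


def cohens_d(x1, x2):
    """Cohen's d for independent groups, standardized by the pooled SD s_p."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / math.sqrt(sp2)


def d_z(x1, x2):
    """Paired effect size: mean difference over the SD of the differences."""
    d = [a - b for a, b in zip(x1, x2)]
    return mean(d) / stdev(d)


# Hypothetical data (not from the chapter).
print(round(cohens_d([1, 2, 3, 4, 5], [2, 4, 6, 8]), 3))  # -0.966
print(round(d_z([10, 12, 9, 11], [8, 11, 9, 8]), 3))      # 1.162
```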

Template: “Welch’s \(t\) test showed \(t(\nu)=\dots\), \(p=\dots\), 95% CI \([\dots,\dots]\). Effect size \(d=\dots\).”

Decision & Interpretation

p-value: if \(p\le \alpha\), reject \(H_0\). Otherwise, fail to reject.
CI view: if \(\Delta_0\) (usually 0) lies outside the \((1-\alpha)\) CI for \(\mu_1-\mu_2\) (or \(\mu_D\)), the result is significant at level \(\alpha\) (two-sided).
Practical vs statistical: also inspect effect size and context.
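For a two-sided test at the same \(\alpha\), the p-value rule and the CI view always agree. The sketch below checks this numerically with standard-library stand-ins for `T.DIST.2T` and `T.INV.2T`. The Python names are hypothetical; the tail probability uses \(P(|T|\ge |t|)=I_{\nu/(\nu+t^2)}(\nu/2,\,1/2)\) via the regularized incomplete beta function, the critical value comes from simple bisection, and a fractional df is used as given, with no truncation.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_dist_2t(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t."""
    return betai(df / 2.0, 0.5, df / (df + t * t))


def t_inv_2t(alpha, df):
    """Two-sided critical value: the t with t_dist_2t(t, df) == alpha."""
    lo, hi = 0.0, 1.0
    while t_dist_2t(hi, df) > alpha:   # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):               # p-value decreases as t grows
        mid = (lo + hi) / 2.0
        if t_dist_2t(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def decide(estimate, se, df, delta0=0.0, alpha=0.05):
    """Apply both decision rules to an estimate with standard error se."""
    p = t_dist_2t((estimate - delta0) / se, df)
    tc = t_inv_2t(alpha, df)
    ci = (estimate - tc * se, estimate + tc * se)
    reject_by_p = p <= alpha
    reject_by_ci = not (ci[0] <= delta0 <= ci[1])
    return p, ci, reject_by_p, reject_by_ci


# Hypothetical paired example: Dbar = 1.5, SE = SD_D/sqrt(n), df = 3.
p, ci, by_p, by_ci = decide(1.5, math.sqrt(5 / 3) / 2, 3)
print(by_p, by_ci)  # False False
```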

Common Excel Functions

=T.TEST(array1,array2, tails, type) — type: 1 paired, 2 pooled, 3 Welch.
=T.INV.2T(alpha, df) — two-sided critical t.
=CONFIDENCE.T(alpha, std_dev, size) — one-sample CI half-width.
=VAR.S, =STDEV.S — sample variance/std dev.
=F.DIST.RT(x,df1,df2) — right-tail F.

Common Pitfalls

Using pooled \(t\) when variances differ a lot (especially if \(n_1\ne n_2\)).
Treating paired data as independent (ignores the pairing; when pairs are positively correlated this inflates the SE and costs power).
Over-relying on the F test under skew/outliers.
Interpreting “fail to reject” as “prove equal” (it does not).
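To see why the pairing pitfall matters, this sketch analyzes the same hypothetical before/after numbers twice: correctly, as paired differences, and incorrectly, as two independent groups with Welch's formula. Because the pairs move together, the paired standard error is smaller and the \(t\) statistic larger.

```python
import math
from statistics import mean, stdev, variance

before = [10, 12, 9, 11]   # hypothetical measurements on the same 4 units
after = [8, 11, 9, 8]

# Correct analysis: one-sample t on the differences.
d = [x - y for x, y in zip(before, after)]
t_paired = mean(d) / (stdev(d) / math.sqrt(len(d)))

# Mistaken analysis: treat the two columns as independent groups (Welch SE).
se_indep = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
t_indep = (mean(before) - mean(after)) / se_indep

print(f"paired t = {t_paired:.3f}, independent (Welch) t = {t_indep:.3f}")
# paired t = 2.324, independent (Welch) t = 1.567
```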

Setup Notes (Excel)

Place Group 1 in A2:A?, Group 2 in B2:B?. Define cells for N1, Xbar1, S1 and N2, Xbar2, S2 using the mirrors above. Optional null difference Delta0 (usually 0) and alpha (e.g., 0.05).