8.2.1 — What is the t-Statistic? (Compare with z-Statistic)
Plain-English overview: definitions, when to use t vs z, degrees of freedom, reading a t-table, Excel quick refs, history, and common pitfalls. (No calculators here—see the CI pages for computation.)
What is a “statistic” and why t?
To test a claim about a mean or build a confidence interval (CI), we standardize “distance” using a statistic. If the population SD \( \sigma \) is known, we use the z-statistic. In real life, \( \sigma \) is almost never known, so we estimate it with the sample SD \( s \) and use the t-statistic, which has heavier tails to honestly reflect extra uncertainty from using \( s \).
\[ z \;=\; \frac{\bar x - \mu_0}{\sigma/\sqrt{n}},\quad z\sim\mathcal{N}(0,1). \] Used for z-tests and z-intervals when \( \sigma \) is truly known (rare) or for planning.
\[ t \;=\; \frac{\bar x - \mu_0}{s/\sqrt{n}},\quad t\sim t_{\nu},\ \nu=n-1. \] Default in practice. For small \( n \), t critical values are larger than the corresponding z values; as \( n\to\infty \), the t distribution converges to \( \mathcal{N}(0,1) \) and \( t\to z \).
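To make the two formulas concrete, here is a minimal Python/SciPy sketch (my choice of tool, not part of this page's Excel workflow); the sample values, \( \mu_0 \), and the "known" \( \sigma \) are made-up illustration numbers.

```python
# Minimal sketch: computing z and t statistics for one made-up sample.
import numpy as np
from scipy import stats

x = np.array([51.2, 49.8, 50.5, 52.1, 48.9, 50.7])  # hypothetical data
mu0 = 50.0          # hypothesized mean under H0 (assumed for illustration)
sigma_known = 1.2   # pretend sigma were known (rare in practice)

n = len(x)
xbar = x.mean()
s = x.std(ddof=1)   # sample SD (divides by n-1)

z = (xbar - mu0) / (sigma_known / np.sqrt(n))  # z-statistic, reference N(0,1)
t = (xbar - mu0) / (s / np.sqrt(n))            # t-statistic, reference t with n-1 df

print(f"z = {z:.3f}")
print(f"t = {t:.3f} with {n - 1} df")
print("two-sided p (z):", 2 * stats.norm.sf(abs(z)))
print("two-sided p (t):", 2 * stats.t.sf(abs(t), df=n - 1))
```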
When to use t vs z
- Use t for one-sample mean tests & CIs when \( \sigma \) is unknown (the typical case). Assumptions: independent observations; for very small \( n \), the data should be roughly Normal (otherwise consider robust or bootstrap methods).
- Use z only if \( \sigma \) is known from an external, reliable source (calibrated instrument spec, established population parameter) or when doing first-pass sample-size planning.
- With large \( n \), t and z critical values become practically identical, so defaulting to t is safe and standard (a quick numerical check follows this list).
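A quick numerical check of the last bullet, again in SciPy (an assumed tool; the same numbers can be read from any t-table): the two-tailed 95% t critical value shrinks toward the z value 1.960 as \( n \) grows, and is never smaller than it.

```python
# Sketch: two-tailed 95% critical values, t vs z, as n grows.
from scipy import stats

z_crit = stats.norm.ppf(0.975)              # ≈ 1.960
for n in (5, 15, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, df=n - 1)   # upper 2.5% cutoff of t with n-1 df
    print(f"n = {n:4d}   t = {t_crit:.3f}   z = {z_crit:.3f}")
# t >= z always, so using t never makes intervals too narrow;
# the gap disappears for large n.
```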
Degrees of freedom & reading a t-table
Degrees of freedom (df) for a one-sample t is \( \nu = n-1 \). Example: \( n=12 \Rightarrow \nu=11 \).
- Pick confidence \(100(1-\alpha)\%\) (e.g., 95% ⇒ \( \alpha=0.05 \)).
- Compute df: \( \nu = n-1 \).
- Open a two-tailed t-table: find the row for df \(=\nu\) and the column for \( \alpha \) (e.g., 0.05).
- The entry is \( t_{\nu,\alpha/2} \) (the positive cutoff). For 95% and \( \nu=11 \), \( t_{11,0.025} \approx 2.201 \) (reproduced in the code sketch after the table).
| df (ν) | \( t_{\nu,0.025} \) (95%, two-tailed) | df (ν) | \( t_{\nu,0.025} \) (95%, two-tailed) |
|---|---|---|---|
| 5 | 2.571 | 10 | 2.228 | 
| 15 | 2.131 | 20 | 2.086 | 
| 30 | 2.042 | 60 | 2.000 | 
| 120 | 1.980 | ∞ (z) | 1.960 | 
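Tying the reading steps and the table together, the SciPy calls below (a sketch; the page's own quick references are the Excel functions in the next subsection) reproduce the worked lookup \( t_{11,0.025} \approx 2.201 \) and the table rows.

```python
# Sketch: reproducing two-tailed 95% t-table entries with SciPy.
from scipy import stats

alpha = 0.05                                      # 95% confidence, two-tailed
print(stats.t.ppf(1 - alpha / 2, df=11))          # ≈ 2.201, the worked example

for df in (5, 10, 15, 20, 30, 60, 120):
    print(df, round(stats.t.ppf(1 - alpha / 2, df), 3))
print("inf (z):", round(stats.norm.ppf(1 - alpha / 2), 3))   # 1.960
```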
Where does the t-statistic show up?
- One-sample mean: t-test / t-interval for \( \mu \) (unknown \( \sigma \)); see the code sketch after this list.
- Two-sample mean: Welch’s t-test (unequal variances) or pooled-variance t (equal-variance assumption).
- Regression: each coefficient’s “t-stat” for testing \( H_0:\beta_j=0 \) and building CI for \( \beta_j \).
- Small-sample studies: lab/engineering settings with limited runs; t handles extra uncertainty better than z.
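As a sketch of the first two bullets (the arrays below are made-up illustration data, and SciPy is an assumed tool):

```python
# Sketch: one-sample t-test and Welch's two-sample t-test.
import numpy as np
from scipy import stats

a = np.array([5.1, 4.8, 5.4, 5.0, 4.7, 5.2])   # hypothetical group A
b = np.array([5.6, 5.9, 5.5, 6.1, 5.8])        # hypothetical group B

# One-sample: H0: mu = 5.0, sigma unknown, so the reference is t with n-1 df.
t1, p1 = stats.ttest_1samp(a, popmean=5.0)
print(f"one-sample: t = {t1:.3f}, p = {p1:.4f}, df = {len(a) - 1}")

# Two-sample: Welch's t-test, which does not assume equal variances.
t2, p2 = stats.ttest_ind(a, b, equal_var=False)
print(f"Welch:      t = {t2:.3f}, p = {p2:.4f}")
```

Regression software reports the same kind of statistic for each coefficient: the estimate divided by its standard error, compared against a t distribution.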
Excel — quick references (no calculation here)
- df: =B4-1 if B4 has \( n \)
- t-critical (two-tailed α): =T.INV.2T(ALPHA, df) or =T.INV(1-ALPHA/2, df)
- z-critical: =NORM.S.INV(1-ALPHA/2)
- t p-value (two-sided): =T.DIST.2T(ABS(t), df)
- z p-value (two-sided): =2*(1-NORM.S.DIST(ABS(z), TRUE))
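For readers working outside Excel, here is a rough SciPy mapping of the quick references above (my own mapping, with placeholder values for the observed statistics):

```python
# Sketch: SciPy counterparts of the Excel quick references.
from scipy import stats

alpha, df = 0.05, 11
t_obs, z_obs = 2.3, 1.7                       # placeholder observed statistics

t_crit = stats.t.ppf(1 - alpha / 2, df)       # Excel: =T.INV.2T(ALPHA, df)
z_crit = stats.norm.ppf(1 - alpha / 2)        # Excel: =NORM.S.INV(1-ALPHA/2)
p_t = 2 * stats.t.sf(abs(t_obs), df)          # Excel: =T.DIST.2T(ABS(t), df)
p_z = 2 * stats.norm.sf(abs(z_obs))           # Excel: =2*(1-NORM.S.DIST(ABS(z), TRUE))

print(t_crit, z_crit, p_t, p_z)
```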
History (why it’s called “Student’s t”)
The t distribution and t-tests were introduced by William Sealy Gosset, a statistician at the Guinness Brewery. Company policy limited publishing, so he wrote under the pseudonym “Student” (Biometrika, 1908). Hence “Student’s t.” The key idea: when you swap the unknown \( \sigma \) for the sample SD \( s \), the test statistic’s distribution changes to a family indexed by df, with heavier tails for small samples.
Common pitfalls
- Using z with \( s \) (unknown \( \sigma \)) → intervals too narrow / tests too liberal at small \( n \) (a simulation sketch follows this list).
- Mixing up one-sided vs two-sided t critical values when reading the table.
- Thinking a 95% CI contains 95% of individuals (it’s about the mean, not individual outcomes).
- Ignoring skew/outliers at tiny \( n \); check plots and consider robust/bootstrap CIs.
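The first pitfall is easy to see in a small simulation (a sketch under assumed conditions: Normal(10, 2) data, \( n=5 \), nominal 95% intervals): intervals built with the z cutoff but the sample SD under-cover, while t intervals hold the nominal rate.

```python
# Sketch: why "z with s" under-covers at small n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 5, 20_000    # assumed simulation settings

z_crit = stats.norm.ppf(0.975)               # 1.960
t_crit = stats.t.ppf(0.975, df=n - 1)        # 2.776 for df = 4

cover_z = cover_t = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, n)
    xbar, se = x.mean(), x.std(ddof=1) / np.sqrt(n)
    cover_z += abs(xbar - mu) <= z_crit * se   # z cutoff with s: too narrow
    cover_t += abs(xbar - mu) <= t_crit * se   # correct t cutoff

print(f"z-with-s coverage: {cover_z / reps:.3f}   (nominal 0.95; roughly 0.88 here)")
print(f"t coverage:        {cover_t / reps:.3f}   (close to 0.95)")
```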