8.2.1 — What is the t-Statistic? (Compare with z-Statistic)
Plain-English overview: definitions, when to use t vs z, degrees of freedom, reading a t-table, Excel quick refs, history, and common pitfalls. (No calculators here—see the CI pages for computation.)
What is a “statistic” and why t?
To test a claim about a mean or build a confidence interval (CI), we standardize “distance” using a statistic. If the population SD \( \sigma \) is known, we use the z-statistic. In real life, \( \sigma \) is almost never known, so we estimate it with the sample SD \( s \) and use the t-statistic, which has heavier tails to honestly reflect extra uncertainty from using \( s \).
\[ z \;=\; \frac{\bar x - \mu_0}{\sigma/\sqrt{n}},\quad z\sim\mathcal{N}(0,1). \] Used for z-tests and z-intervals when \( \sigma \) is truly known (rare) or for planning.
\[ t \;=\; \frac{\bar x - \mu_0}{s/\sqrt{n}},\quad t\sim t_{\nu},\ \nu=n-1. \] Default in practice. For small \( n \), t critical values are larger than the corresponding z values; as \( n\to\infty \), the t distribution converges to \( \mathcal{N}(0,1) \) and \( t\to z \).
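To make the two formulas concrete, here is a minimal Python/SciPy sketch (my choice of tool, not part of this page's Excel workflow); the sample values, \( \mu_0 \), and the "known" \( \sigma \) are made-up illustration numbers.

```python
# Minimal sketch: computing z and t statistics for one made-up sample.
import numpy as np
from scipy import stats

x = np.array([51.2, 49.8, 50.5, 52.1, 48.9, 50.7])  # hypothetical data
mu0 = 50.0          # hypothesized mean under H0 (assumed for illustration)
sigma_known = 1.2   # pretend sigma were known (rare in practice)

n = len(x)
xbar = x.mean()
s = x.std(ddof=1)   # sample SD (divides by n-1)

z = (xbar - mu0) / (sigma_known / np.sqrt(n))  # z-statistic, reference N(0,1)
t = (xbar - mu0) / (s / np.sqrt(n))            # t-statistic, reference t with n-1 df

print(f"z = {z:.3f}")
print(f"t = {t:.3f} with {n - 1} df")
print("two-sided p (z):", 2 * stats.norm.sf(abs(z)))
print("two-sided p (t):", 2 * stats.t.sf(abs(t), df=n - 1))
```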
When to use t vs z
- Use t for one-sample mean tests & CIs when \( \sigma \) is unknown (the typical case). Assumptions: independent observations; for very small \( n \), the data should be roughly Normal (otherwise consider robust or bootstrap methods).
- Use z only if \( \sigma \) is known from an external, reliable source (calibrated instrument spec, established population parameter) or when doing first-pass sample-size planning.
- With large \( n \), t and z critical values become practically identical, so defaulting to t is safe and standard (a quick numerical check follows this list).
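A quick numerical check of the last bullet, again in SciPy (an assumed tool; the same numbers can be read from any t-table): the two-tailed 95% t critical value shrinks toward the z value 1.960 as \( n \) grows, and is never smaller than it.

```python
# Sketch: two-tailed 95% critical values, t vs z, as n grows.
from scipy import stats

z_crit = stats.norm.ppf(0.975)              # ≈ 1.960
for n in (5, 15, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, df=n - 1)   # upper 2.5% cutoff of t with n-1 df
    print(f"n = {n:4d}   t = {t_crit:.3f}   z = {z_crit:.3f}")
# t >= z always, so using t never makes intervals too narrow;
# the gap disappears for large n.
```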
Degrees of freedom & reading a t-table
Degrees of freedom (df) for a one-sample t is \( \nu = n-1 \). Example: \( n=12 \Rightarrow \nu=11 \).
- Pick confidence \(100(1-\alpha)\%\) (e.g., 95% ⇒ \( \alpha=0.05 \)).
- Compute df: \( \nu = n-1 \).
- Open a two-tailed t-table: find the row for df \(=\nu\) and the column for \( \alpha \) (e.g., 0.05).
- The entry is \( t_{\nu,\alpha/2} \) (the positive cutoff). For 95% and \( \nu=11 \), \( t_{11,0.025} \approx 2.201 \) (reproduced in the code sketch after the table).
| df (ν) | \( t_{\nu,0.025} \) (95%, two-tailed) | df (ν) | \( t_{\nu,0.025} \) (95%, two-tailed) |
|---|---|---|---|
| 5 | 2.571 | 10 | 2.228 | 
| 15 | 2.131 | 20 | 2.086 | 
| 30 | 2.042 | 60 | 2.000 | 
| 120 | 1.980 | ∞ (z) | 1.960 | 
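Tying the reading steps and the table together, the SciPy calls below (a sketch; the page's own quick references are the Excel functions in the next subsection) reproduce the worked lookup \( t_{11,0.025} \approx 2.201 \) and the table rows.

```python
# Sketch: reproducing two-tailed 95% t-table entries with SciPy.
from scipy import stats

alpha = 0.05                                      # 95% confidence, two-tailed
print(stats.t.ppf(1 - alpha / 2, df=11))          # ≈ 2.201, the worked example

for df in (5, 10, 15, 20, 30, 60, 120):
    print(df, round(stats.t.ppf(1 - alpha / 2, df), 3))
print("inf (z):", round(stats.norm.ppf(1 - alpha / 2), 3))   # 1.960
```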
Where does the t-statistic show up?
- One-sample mean: t-test / t-interval for \( \mu \) (unknown \( \sigma \)); see the code sketch after this list.
- Two-sample mean: Welch’s t-test (unequal variances) or pooled-variance t (equal-variance assumption).
- Regression: each coefficient’s “t-stat” for testing \( H_0:\beta_j=0 \) and building CI for \( \beta_j \).
- Small-sample studies: lab/engineering settings with limited runs; t handles extra uncertainty better than z.
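As a sketch of the first two bullets (the arrays below are made-up illustration data, and SciPy is an assumed tool):

```python
# Sketch: one-sample t-test and Welch's two-sample t-test.
import numpy as np
from scipy import stats

a = np.array([5.1, 4.8, 5.4, 5.0, 4.7, 5.2])   # hypothetical group A
b = np.array([5.6, 5.9, 5.5, 6.1, 5.8])        # hypothetical group B

# One-sample: H0: mu = 5.0, sigma unknown, so the reference is t with n-1 df.
t1, p1 = stats.ttest_1samp(a, popmean=5.0)
print(f"one-sample: t = {t1:.3f}, p = {p1:.4f}, df = {len(a) - 1}")

# Two-sample: Welch's t-test, which does not assume equal variances.
t2, p2 = stats.ttest_ind(a, b, equal_var=False)
print(f"Welch:      t = {t2:.3f}, p = {p2:.4f}")
```

Regression software reports the same kind of statistic for each coefficient: the estimate divided by its standard error, compared against a t distribution.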
Excel — quick references (no calculation here)
- df: =B4-1 if B4 has \( n \)
- t-critical (two-tailed α): =T.INV.2T(ALPHA, df) or =T.INV(1-ALPHA/2, df)
- z-critical: =NORM.S.INV(1-ALPHA/2)
- t p-value (two-sided): =T.DIST.2T(ABS(t), df)
- z p-value (two-sided): =2*(1-NORM.S.DIST(ABS(z), TRUE))
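For readers working outside Excel, here is a rough SciPy mapping of the quick references above (my own mapping, with placeholder values for the observed statistics):

```python
# Sketch: SciPy counterparts of the Excel quick references.
from scipy import stats

alpha, df = 0.05, 11
t_obs, z_obs = 2.3, 1.7                       # placeholder observed statistics

t_crit = stats.t.ppf(1 - alpha / 2, df)       # Excel: =T.INV.2T(ALPHA, df)
z_crit = stats.norm.ppf(1 - alpha / 2)        # Excel: =NORM.S.INV(1-ALPHA/2)
p_t = 2 * stats.t.sf(abs(t_obs), df)          # Excel: =T.DIST.2T(ABS(t), df)
p_z = 2 * stats.norm.sf(abs(z_obs))           # Excel: =2*(1-NORM.S.DIST(ABS(z), TRUE))

print(t_crit, z_crit, p_t, p_z)
```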
History (why it’s called “Student’s t”)
The t distribution and t-tests were introduced by William Sealy Gosset, a statistician at the Guinness Brewery. Company policy limited publishing, so he wrote under the pseudonym “Student” (Biometrika, 1908). Hence “Student’s t.” The key idea: when you swap the unknown \( \sigma \) for the sample SD \( s \), the test statistic’s distribution changes to a family indexed by df, with heavier tails for small samples.
Common pitfalls
- Using z with \( s \) (unknown \( \sigma \)) → intervals too narrow / tests too liberal at small \( n \) (a simulation sketch follows this list).
- Mixing up one-sided vs two-sided t critical values when reading the table.
- Thinking a 95% CI contains 95% of individuals (it’s about the mean, not individual outcomes).
- Ignoring skew/outliers at tiny \( n \); check plots and consider robust/bootstrap CIs.
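The first pitfall is easy to see in a small simulation (a sketch under assumed conditions: Normal(10, 2) data, \( n=5 \), nominal 95% intervals): intervals built with the z cutoff but the sample SD under-cover, while t intervals hold the nominal rate.

```python
# Sketch: why "z with s" under-covers at small n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 5, 20_000    # assumed simulation settings

z_crit = stats.norm.ppf(0.975)               # 1.960
t_crit = stats.t.ppf(0.975, df=n - 1)        # 2.776 for df = 4

cover_z = cover_t = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, n)
    xbar, se = x.mean(), x.std(ddof=1) / np.sqrt(n)
    cover_z += abs(xbar - mu) <= z_crit * se   # z cutoff with s: too narrow
    cover_t += abs(xbar - mu) <= t_crit * se   # correct t cutoff

print(f"z-with-s coverage: {cover_z / reps:.3f}   (nominal 0.95; roughly 0.88 here)")
print(f"t coverage:        {cover_t / reps:.3f}   (close to 0.95)")
```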