📘 Chapter 6 Review – Descriptive Statistics

🔎 At a Glance: What / Why / How

Center

Mean — average; sensitive to outliers.
Median — 50th percentile; robust.
Mode — most frequent (can be multiple).

Spread

Range: max − min
IQR: Q3 − Q1 (robust)
Variance: average squared deviation from the mean; SD: its square root (same units as the data)
CV: SD / mean (unitless)

Shape & Outliers

Symmetric vs right/left-skewed
Outliers: < Q1 − 1.5·IQR or > Q3 + 1.5·IQR
Normality check: Normal Q–Q plot

Before modeling/regression: scan histograms and box plots, plot a scatter of Y vs X, check normality, and consider possible transforms.

🧮 Core Formulas (Chapter 6)

Mean / Median / Range

Sample mean: $ \displaystyle \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i $
Median: middle of sorted data (average two middles if even $n$).
Range: $ \max(x)-\min(x) $
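A minimal Python sketch of the three formulas above, using only the standard library. The sample values are made up for illustration.

```python
from statistics import mean, median

# Hypothetical sample (illustrative values only).
data = [4, 8, 15, 16, 23, 42]

x_bar = mean(data)                  # (1/n) * sum of x_i
med = median(data)                  # even n: average of the two middle values
data_range = max(data) - min(data)  # max(x) - min(x)

print(x_bar, med, data_range)
```

With an even number of values, the median (15.5) falls between the two middle observations, 15 and 16.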

Variance & Standard Deviation

Sample variance: $ \displaystyle s^2=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2 $
Sample SD: $ s=\sqrt{s^2} $

Divide by $n-1$ (the degrees of freedom), not $n$, when the data are a sample rather than the whole population.
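To make the $n-1$ divisor concrete, here is a short Python sketch that computes $s^2$ and $s$ step by step and compares them with the standard-library `statistics.variance` and `statistics.stdev`, which also use $n-1$. The data are hypothetical.

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = sum(data) / n
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)  # sum of (x_i - x_bar)^2

s2 = sum_sq_dev / (n - 1)  # sample variance: divide by n - 1, not n
s = sqrt(s2)               # sample standard deviation

print(s2, s)
print(variance(data), stdev(data))  # library versions for comparison
```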

Quartiles, IQR, Outliers

IQR: $ \mathrm{IQR}=Q_3-Q_1 $
Outlier rule: values $< Q_1 - 1.5\,\mathrm{IQR}$ or $> Q_3 + 1.5\,\mathrm{IQR}$
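The outlier rule can be sketched in Python as follows. Quartile conventions differ between tools; this sketch assumes the "exclusive" method of `statistics.quantiles`, which should correspond to the `QUARTILE.EXC` convention. The data are hypothetical.

```python
from statistics import quantiles

# Hypothetical sample with one suspiciously large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]

# Assumption: "exclusive" quartiles; other conventions give slightly
# different Q1/Q3 for small samples.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```

A flagged value is only a candidate for investigation, not automatically an error.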

z-Score & Empirical Rule

$ z=\dfrac{x-\bar{x}}{s} $
For roughly normal data: ~68% within 1 SD, ~95% within 2 SD, ~99.7% within 3 SD.
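A small Python sketch of standardizing a sample and flagging values with $|z|>2$. The data are made up, and the cutoff of 2 is one common screening choice rather than a fixed rule.

```python
from statistics import mean, stdev

# Hypothetical measurements with one value far from the rest.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 30]

x_bar = mean(data)
s = stdev(data)  # sample SD (n - 1 divisor)

z_scores = [(x - x_bar) / s for x in data]
flagged = [x for x, z in zip(data, z_scores) if abs(z) > 2]

print(flagged)
```

Note that an extreme value inflates both the mean and the SD, so z-scores can understate how unusual it is; the 1.5·IQR rule is less affected.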

Histogram Bin Rules

Square-root: $ k\approx \sqrt{n} $
Sturges: $ k\approx 1+\log_2 n $
Freedman–Diaconis: bin width $ w=2\,\mathrm{IQR}\,n^{-1/3} $, then $ k\approx \dfrac{\max-\min}{w} $

$k$ = number of bins, $w$ = bin width.
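The three rules can be compared side by side with a short Python sketch. The function name `bin_suggestions` is hypothetical, rounding up to a whole number of bins is an assumption (the rules only give approximate $k$), and the quartiles use the "exclusive" convention.

```python
from math import ceil, log2, sqrt
from statistics import quantiles

def bin_suggestions(data):
    """Return suggested bin counts from three common rules (rounded up)."""
    n = len(data)
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1

    k_sqrt = ceil(sqrt(n))         # square-root rule
    k_sturges = ceil(1 + log2(n))  # Sturges' rule

    w = 2 * iqr * n ** (-1 / 3)    # Freedman-Diaconis bin width
    # If IQR is 0 the width is 0 and the rule gives no usable answer.
    k_fd = ceil((max(data) - min(data)) / w) if w > 0 else None

    return {"sqrt": k_sqrt, "sturges": k_sturges, "fd": k_fd}

# Hypothetical data: the integers 1..100.
suggestions = bin_suggestions(list(range(1, 101)))
print(suggestions)
```

The rules can disagree noticeably, so it is worth plotting the histogram with more than one choice.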

Correlation (preview)

$ r\in[-1,1] $ measures linear strength & direction.
Always look at the scatter—groups/nonlinearity can fool $r$.
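The sketch below illustrates why the scatter matters. It uses the standard Pearson formula $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ (not spelled out in this review) with a hypothetical helper `pearson_r`, and compares a perfectly linear relationship with a perfectly curved one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: symmetric x values.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # exact straight line
curved = [x ** 2 for x in xs]     # exact parabola

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(r_linear, r_curved)
```

The parabola has a perfect (nonlinear) relationship between Y and X, yet $r$ is essentially zero because $r$ only measures the linear component.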

📊 Graphics — What/Why/How to Read

Histogram

Shape: skew, multimodality; compare bin rules.

Tip: start with √n, then try FD for fine structure.

Box Plot

Compare: median shift (center) and IQR (spread).

Outliers: flagged by the 1.5·IQR rule.

Dot / Stem-and-Leaf

Great for small n; preserves individual values and shows shape.

Scatter Plot

Y vs X: trend, form, outliers. Don’t connect dots unless X is time.

Time (Run) Chart

Detect shifts/drift/cycles. Use before relying on a histogram, which discards the time order.

Normal Q–Q Plot

≈ straight line → plausible normal. S-curve or ends bending away from the line → skew or heavy tails.

💬 Excel Quick Reference

Assume your data are in B2:B101. Replace ranges as needed.

Task	Excel
Mean	`=AVERAGE(B2:B101)`
Median	`=MEDIAN(B2:B101)`
SD (sample)	`=STDEV.S(B2:B101)`
Min / Max	`=MIN(B2:B101)` / `=MAX(B2:B101)`
Q1 / Q3 (EXC)	`=QUARTILE.EXC(B2:B101,1)` / `=QUARTILE.EXC(...,3)`
IQR	`=QUARTILE.EXC(...,3)-QUARTILE.EXC(...,1)`
z-score for B2	`=(B2-$F$2)/$F$3` (F2=mean, F3=SD)
Correlation (X,Y)	`=CORREL(C2:C101, D2:D101)`

Chart	Steps
Histogram	Insert → Statistic Chart → Histogram; try different bin widths.
Box & Whisker	Insert → Box & Whisker; shows median/IQR/outliers.
Scatter	Insert → Scatter (no lines); Y on vertical, X on horizontal.
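Q–Q (manual)	Sort the data, compute theoretical normal quantiles via `=NORM.S.INV(P)` where P is the plotting position of each sorted value (e.g., (i − 0.5)/n for the i-th smallest), then scatter data vs quantiles.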
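The manual Q–Q recipe can also be sketched in Python. The helper name `normal_qq_points` is hypothetical, and the plotting position $(i-0.5)/n$ is an assumption; it is one common choice among several.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Pair each sorted value with a standard normal quantile."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, SD 1
    # Assumed plotting position for the i-th smallest value: (i - 0.5) / n.
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(xs, start=1)
    ]

# Hypothetical sample.
points = normal_qq_points([5, 3, 8, 1, 9])
for theoretical, observed in points:
    print(round(theoretical, 3), observed)
```

Plot the theoretical quantile on the horizontal axis and the observed value on the vertical axis; points near a straight line suggest approximate normality.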

✅ Self-Check (Quick)

Mean vs Median: which is more robust to an extreme value?

Median. Mean will shift toward the extreme; report both if extremes are plausible.
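A quick Python check of this answer, using made-up numbers: adding one extreme value moves the mean a long way but leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical sample, then the same sample with one extreme value added.
base = [10, 12, 11, 13, 12]
with_extreme = base + [100]

mean_before, median_before = mean(base), median(base)
mean_after, median_after = mean(with_extreme), median(with_extreme)

print(mean_before, median_before)
print(mean_after, median_after)
```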

When is the 1.5·IQR rule used?

To flag potential outliers in a box plot: values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR.

What does a right-skewed histogram imply about mean vs median?

Mean > median typically (tail pulls mean right).

Q–Q plot looks S-shaped (concave then convex). Interpretation?

Tails heavier than normal (assuming sample quantiles are on the vertical axis); normality is doubtful.

Is a line chart okay for “length of parts” on x-axis?

No. Use a scatter or histogram. Lines imply sequence/time.

What’s CV and why use it?

Coefficient of Variation = SD/mean; compares variability across different units/scales.
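Because CV is a ratio of two quantities in the same units, it does not change when the units change. A small Python sketch with hypothetical heights recorded in centimeters and then converted to inches:

```python
from statistics import mean, stdev

# Hypothetical heights in centimeters, converted to inches.
heights_cm = [160, 165, 170, 175, 180]
heights_in = [h / 2.54 for h in heights_cm]

cv_cm = stdev(heights_cm) / mean(heights_cm)
cv_in = stdev(heights_in) / mean(heights_in)

print(cv_cm, cv_in)  # the same value up to rounding
```

CV is only meaningful for data on a ratio scale with a positive mean; it becomes unstable when the mean is near zero.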

🗂️ Glossary (Fast Lookup)

Box Plot

Shows median and IQR; whiskers extend to the most extreme values that are not outliers; dots beyond the whiskers are flagged outliers.

Histogram

Bars show counts per bin; reveals shape, skew, and modes.

IQR

Interquartile Range = Q3 − Q1; robust spread.

Normal Probability (Q–Q) Plot

Data vs theoretical normal quantiles; straight line ≈ normal.

Outlier

A point far from the bulk (e.g., beyond 1.5·IQR from Q1 or Q3, or large |z|).

Percentile / Quartiles

Q1 = 25th, Q2 = median = 50th, Q3 = 75th percentile. The Pth percentile is the value with P% of the data at or below it.

Relative Frequency

Proportion in each class interval (counts / total).

Sample Mean / SD / Variance

Center and spread estimators for samples (use n−1 for variance).

Scatter Diagram

Y vs X to assess form, trend, outliers.

Stem-and-Leaf

Keeps exact values while showing distribution shape.

Time Series

Observations over time; look for trends/shifts/cycles.

z-Score

Standardized value: (x − mean)/SD; compares across units.
