📘 Chapter 6 Review — Descriptive Statistics

Your quick reference for center, spread, shape, plots, Excel, and “what to do next.” Includes a paste-n-compute mini-calculator.

🔎 At a Glance: What / Why / How

Center

  • Mean — average; sensitive to outliers.
  • Median — 50th percentile; robust.
  • Mode — most frequent (can be multiple).

Spread

  • Range: max − min
  • IQR: Q3 − Q1 (robust)
  • Variance/SD: variance is the average squared deviation from the mean; SD is its square root (same units as the data)
  • CV: SD / mean (unitless)

Shape & Outliers

  • Symmetric vs right/left-skewed
  • Outliers: < Q1 − 1.5·IQR or > Q3 + 1.5·IQR
  • Normality check: Normal Q–Q plot

Before modeling/regression: scan histograms and box plots, plot a scatter of Y vs X, check normality, and consider transforms.

🧮 Core Formulas (Chapter 6)

Mean / Median / Range
  • Sample mean: \( \displaystyle \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i \)
  • Median: middle of sorted data (average two middles if even \(n\)).
  • Range: \( \max(x)-\min(x) \)
Variance & Standard Deviation
  • Sample variance: \( \displaystyle s^2=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2 \)
  • Sample SD: \( s=\sqrt{s^2} \)

Divide by \(n-1\) (the degrees of freedom), not \(n\), when computing variance from a sample.
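
As a quick check of the formulas above, here is a minimal Python sketch that computes the mean, median, range, and sample variance/SD by hand and compares the variance with the standard library. The function name `center_and_spread` and the sample values are made up for illustration; they are not from the chapter.

```python
import math
import statistics

def center_and_spread(xs):
    """Mean, median, range, sample variance, and sample SD of a list of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    median = statistics.median(xs)   # averages the two middle values when n is even
    data_range = max(xs) - min(xs)
    # Sample variance: sum of squared deviations divided by n - 1 (not n).
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return {
        "mean": mean,
        "median": median,
        "range": data_range,
        "variance": variance,
        "sd": math.sqrt(variance),
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]    # illustrative values only
summary = center_and_spread(sample)
print(summary)
# The hand-computed variance should agree with statistics.variance (also n - 1).
print(math.isclose(summary["variance"], statistics.variance(sample)))
```

Dividing by n instead of n − 1 here would give 4.0 rather than 32/7 ≈ 4.57, which is why the choice of divisor matters for small samples.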

Quartiles, IQR, Outliers
  • IQR: \( \mathrm{IQR}=Q_3-Q_1 \)
  • Outlier rule: values \(< Q_1 - 1.5\,\mathrm{IQR}\) or \(> Q_3 + 1.5\,\mathrm{IQR}\)
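
The fence rule above can be sketched in a few lines of Python. This is one possible implementation, assuming the "exclusive" quartile method from Python's `statistics.quantiles`, which is intended to line up with Excel's QUARTILE.EXC; other quartile conventions give slightly different fences. The function name `iqr_outliers` and the data are hypothetical.

```python
import statistics

def iqr_outliers(xs):
    """Return Q1, Q3, IQR, the two fences, and the values flagged by the 1.5*IQR rule."""
    # method="exclusive" is an assumption; "inclusive" is another common convention.
    q1, _median, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    flagged = [x for x in xs if x < lower_fence or x > upper_fence]
    return q1, q3, iqr, (lower_fence, upper_fence), flagged

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 40]   # illustrative values only
q1, q3, iqr, fences, flagged = iqr_outliers(data)
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences={fences}, flagged={flagged}")
```

A flagged value is a candidate for a closer look, not automatically an error to delete.
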
z-Score & Empirical Rule
  • \( z=\dfrac{x-\bar{x}}{s} \)
  • For roughly normal data: ~68% within 1 SD, ~95% within 2 SD, ~99.7% within 3 SD.
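
To see the empirical rule in action, the sketch below standardizes simulated data and counts how much falls within 1, 2, and 3 SDs. The simulated normal data (mean 50, SD 10, fixed seed) and the helper names `z_scores` and `share_within` are assumptions for illustration; real data will only follow these percentages if it is roughly normal.

```python
import random
import statistics

def z_scores(xs):
    """Standardize each value: (x - sample mean) / sample SD."""
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)        # sample SD (n - 1 divisor)
    return [(x - mean) / sd for x in xs]

def share_within(zs, k):
    """Fraction of z-scores with |z| <= k."""
    return sum(abs(z) <= k for z in zs) / len(zs)

rng = random.Random(0)               # fixed seed so the run is repeatable
simulated = [rng.gauss(50, 10) for _ in range(10_000)]
zs = z_scores(simulated)
shares = {k: share_within(zs, k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```
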
Histogram Bin Rules
  • Square-root: \( k\approx \sqrt{n} \)
  • Sturges: \( k\approx 1+\log_2 n \)
  • Freedman–Diaconis: bin width \( w=2\,\mathrm{IQR}\,n^{-1/3} \), then \( k\approx \dfrac{\max-\min}{w} \)

\(k\) = number of bins, \(w\) = bin width.
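
The three bin rules can be compared side by side with a short Python sketch. Rounding each result up to a whole number of bins and using the "exclusive" quartile method for the IQR are assumptions made here, not requirements of the rules; `bin_suggestions` is a hypothetical helper name.

```python
import math
import statistics

def bin_suggestions(xs):
    """Suggested bin counts from the square-root, Sturges, and Freedman-Diaconis rules."""
    n = len(xs)
    sqrt_k = math.ceil(math.sqrt(n))
    sturges_k = math.ceil(1 + math.log2(n))
    q1, _median, q3 = statistics.quantiles(xs, n=4)   # default method is "exclusive"
    iqr = q3 - q1
    if iqr > 0:
        width = 2 * iqr * n ** (-1 / 3)               # Freedman-Diaconis bin width w
        fd_k = math.ceil((max(xs) - min(xs)) / width)
    else:
        width, fd_k = None, None                      # rule is undefined when IQR is 0
    return {"sqrt": sqrt_k, "sturges": sturges_k, "fd": fd_k, "fd_width": width}

values = list(range(1, 101))                          # illustrative data: 1, 2, ..., 100
suggestions = bin_suggestions(values)
print(suggestions)
```

The rules often disagree, as they do here; treat them as starting points and check whether the histogram's shape is stable across a few bin counts.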

Correlation (preview)
  • \( r\in[-1,1] \) measures linear strength & direction.
  • Always look at the scatter—groups/nonlinearity can fool \(r\).
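
A small Python sketch shows why the scatter plot matters. It computes Pearson's r from its definition for a perfectly linear relationship and for a perfectly curved one; the data and the function name `pearson_r` are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from sums of centered products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]     # exact straight line
curved = [x ** 2 for x in xs]        # exact parabola, symmetric about x = 0

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(f"linear: r = {r_linear:.3f}")
print(f"curved: r = {r_curved:.3f}")
```

The parabola is a perfect relationship between X and Y, yet r comes out at zero because r only measures the linear part.
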

📊 Graphics — What/Why/How to Read

Histogram

Shape: skew, multimodality; compare bin rules.

Tip: start with √n, then try Freedman–Diaconis (FD) for fine structure.

Box Plot

Compare: median shift (center) and IQR (spread).

Outliers: flagged by the 1.5·IQR rule.

Dot / Stem-and-Leaf

Great for small n; preserves individual values and shows shape.

Scatter Plot

Y vs X: trend, form, outliers. Don’t connect dots unless X is time.

Time (Run) Chart

Detect shifts/drift/cycles. Use before relying on a histogram.

Normal Q–Q Plot

≈ straight line → plausible normal. Bowed curve → skew; S-curve (ends off the line) → tails heavier or lighter than normal.

💬 Excel Quick Reference

Assume your data are in B2:B101. Replace ranges as needed.

Task: Excel formula
  • Mean: =AVERAGE(B2:B101)
  • Median: =MEDIAN(B2:B101)
  • SD (sample): =STDEV.S(B2:B101)
  • Min / Max: =MIN(B2:B101) / =MAX(B2:B101)
  • Q1 / Q3 (EXC): =QUARTILE.EXC(B2:B101,1) / =QUARTILE.EXC(...,3)
  • IQR: =QUARTILE.EXC(...,3)-QUARTILE.EXC(...,1)
  • z-score for B2: =(B2-$F$2)/$F$3 (F2=mean, F3=SD)
  • Correlation (X,Y): =CORREL(C2:C101, D2:D101)

Chart: Steps
  • Histogram: Insert → Statistic Chart → Histogram; try different bin widths.
  • Box & Whisker: Insert → Box & Whisker; shows median/IQR/outliers.
  • Scatter: Insert → Scatter (no lines); Y on vertical, X on horizontal.
  • Q–Q (manual): Sort the data, compute theoretical normal quantiles via =NORM.S.INV(P), where P is each point's plotting position (e.g., (i − 0.5)/n for the i-th sorted value), then scatter the sorted data against those quantiles.
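
The manual Q–Q steps can also be sketched outside Excel. The Python version below pairs each sorted value with a theoretical standard-normal quantile; the plotting position (i − 0.5)/n is one common choice and is an assumption here, as are the function name `normal_qq_points` and the sample data. `NormalDist().inv_cdf` plays the role of NORM.S.INV.

```python
from statistics import NormalDist

def normal_qq_points(xs):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    ordered = sorted(xs)
    n = len(ordered)
    std_normal = NormalDist()                 # mean 0, SD 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n                     # assumed plotting position
        points.append((std_normal.inv_cdf(p), value))
    return points

observations = [4.1, 5.0, 3.2, 6.3, 4.8]      # illustrative values only
qq = normal_qq_points(observations)
for theoretical, observed in qq:
    print(f"{theoretical:7.3f}  {observed}")
```

Plot the pairs as a scatter (theoretical on X, observed on Y) and judge how close they fall to a straight line.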

🧪 Paste-n-Compute (Quick Stats)

Paste numbers (one per line or comma-separated). The calculator reports n, mean, median, SD (s), min/max, Q1/Q3, IQR, the count of values with \( |z|>2 \), and bin suggestions. Data stay in your browser.
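
For readers working offline, a rough Python stand-in for this kind of quick-stats calculator is sketched below. It is not the page's own implementation: the function name `quick_stats`, the parsing rules, and the "exclusive" quartile method are all assumptions.

```python
import re
import statistics

def quick_stats(text):
    """Parse pasted numbers (commas and/or whitespace) and summarize them."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    xs = sorted(float(t) for t in tokens)
    if len(xs) < 2:
        raise ValueError("need at least two numbers to compute a sample SD")
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)                          # sample SD (n - 1)
    q1, _median, q3 = statistics.quantiles(xs, n=4)    # "exclusive" method by default
    # Count values more than 2 sample SDs from the mean (0 if all values are equal).
    far = sum(abs(x - mean) / sd > 2 for x in xs) if sd > 0 else 0
    return {
        "n": len(xs),
        "mean": mean,
        "median": statistics.median(xs),
        "sd": sd,
        "min": xs[0],
        "max": xs[-1],
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "abs_z_gt_2": far,
    }

pasted = "12, 15, 11\n14, 13, 40"                      # illustrative input only
result = quick_stats(pasted)
for name, value in result.items():
    print(f"{name}: {value}")
```
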

✅ Self-Check (Quick)

Mean vs Median: which is more robust to an extreme value?
Median. Mean will shift toward the extreme; report both if extremes are plausible.
When is the 1.5·IQR rule used?
To flag potential outliers in a box plot: values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR.
What does a right-skewed histogram imply about mean vs median?
Mean > median typically (tail pulls mean right).
Q–Q plot looks S-shaped (concave then convex). Interpretation?
Tails heavier than normal (reading the plot with theoretical quantiles on the x-axis); normality is doubtful.
Is a line chart okay for “length of parts” on x-axis?
No. Use a scatter or histogram. Lines imply sequence/time.
What’s CV and why use it?
Coefficient of Variation = SD/mean; compares variability across different units/scales.
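
The unitless property of CV is easy to verify with a short Python sketch: rescaling the same measurements changes the SD but not the CV. The measurements and the helper name `coefficient_of_variation` are hypothetical, and the positive-mean check reflects the usual assumption that CV is only meaningful for positive, ratio-scale data.

```python
import statistics

def coefficient_of_variation(xs):
    """Sample SD divided by the mean; intended for data with a positive mean."""
    mean = statistics.fmean(xs)
    if mean <= 0:
        raise ValueError("CV is not meaningful when the mean is zero or negative")
    return statistics.stdev(xs) / mean

lengths_mm = [98, 100, 102, 101, 99]           # illustrative part lengths in millimetres
lengths_m = [x / 1000 for x in lengths_mm]     # the same parts expressed in metres

cv_mm = coefficient_of_variation(lengths_mm)
cv_m = coefficient_of_variation(lengths_m)
print(f"SD in mm: {statistics.stdev(lengths_mm):.4f}, SD in m: {statistics.stdev(lengths_m):.6f}")
print(f"CV in mm: {cv_mm:.4f}, CV in m: {cv_m:.4f}")
```
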

🗂️ Glossary (Fast Lookup)

Box Plot
Shows median and IQR; whiskers reach the most extreme non-outliers; points beyond are flagged as potential outliers.
Histogram
Bars show counts per bin; reveals shape, skew, and modes.
IQR
Interquartile Range = Q3 − Q1; robust spread.
Normal Probability (Q–Q) Plot
Data vs theoretical normal quantiles; straight line ≈ normal.
Outlier
A point far from the bulk (e.g., beyond 1.5·IQR from Q1 or Q3, or large |z|).
Percentile / Quartiles
Q1=25th, Q2=median=50th, Q3=75th. Percentile P means P% of data ≤ value.
Relative Frequency
Proportion in each class interval (counts / total).
Sample Mean / SD / Variance
Center and spread estimators for samples (use n−1 for variance).
Scatter Diagram
Y vs X to assess form, trend, outliers.
Stem-and-Leaf
Keeps exact values while showing distribution shape.
Time Series
Observations over time; look for trends/shifts/cycles.
z-Score
Standardized value: (x − mean)/SD; compares across units.