📈 Session 6.7 — Normal Probability Plot
What is p (plotting position)?
\(p_i\) is the cumulative probability (a percentile written as a fraction between 0 and 1) that you assign to the i-th smallest value \(x_{(i)}\). It estimates the CDF at that point: \(p_i \approx F\big(x_{(i)}\big)\). The theoretical normal score is then \(z_i = \Phi^{-1}(p_i)\), the same as Excel’s NORM.S.INV(\(p_i\)).
  
Not a p-value! Here \(p\) is a percentile/position, not a hypothesis-test p-value.
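A minimal Python sketch of the same computation, assuming the Hazen position and a small made-up list of grades. The function name `normal_scores` is hypothetical; `NormalDist().inv_cdf` from the standard library plays the role of NORM.S.INV.

```python
from statistics import NormalDist

def normal_scores(data):
    """Return (sorted x, plotting positions p, normal scores z), Hazen formula."""
    x_sorted = sorted(data)
    n = len(x_sorted)
    # p_i = (i - 0.5) / n for ranks i = 1..n
    p = [(i - 0.5) / n for i in range(1, n + 1)]
    # z_i = inverse standard normal CDF at p_i
    z = [NormalDist().inv_cdf(pi) for pi in p]
    return x_sorted, p, z

grades = [72, 85, 64, 90, 78]  # made-up values for illustration only
x_sorted, p, z = normal_scores(grades)
for xi, pi, zi in zip(x_sorted, p, z):
    print(xi, round(pi, 3), round(zi, 3))
```

Each \(p_i\) lies strictly between 0 and 1, so the inverse CDF is always defined.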
Common formulas for \(p_i\) (all acceptable — be consistent)
- Hazen (Textbook): \(\displaystyle p_i=\frac{i-0.5}{n}\)
- Blom / Rankit: \(\displaystyle p_i=\frac{i-0.375}{\,n+0.25\,}\)
- Weibull: \(\displaystyle p_i=\frac{i}{\,n+1\,}\)
- Benard (median-rank): \(\displaystyle p_i=\frac{i-0.3}{\,n+0.4\,}\)
This app lets you pick the formula; the CSV includes \(x\), \(p\), and \(z\) so students can check the math.
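All four formulas above fit one pattern, \(p_i = (i-a)/(n+1-2a)\), with a different offset \(a\) for each. The sketch below uses that pattern as a convenience; it is not taken from the app, and the names `OFFSETS` and `plotting_positions` are hypothetical.

```python
# Offset a in p_i = (i - a) / (n + 1 - 2a); each value reproduces a formula listed above.
OFFSETS = {
    "hazen": 0.5,     # (i - 0.5) / n
    "blom": 0.375,    # (i - 0.375) / (n + 0.25)
    "weibull": 0.0,   # i / (n + 1)
    "benard": 0.3,    # (i - 0.3) / (n + 0.4)
}

def plotting_positions(n, method="hazen"):
    """Plotting positions p_1..p_n for the chosen formula."""
    a = OFFSETS[method]
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]

for name in OFFSETS:
    print(name, [round(p, 4) for p in plotting_positions(5, name)])
```

For every choice the positions are symmetric about 0.5, which is why the resulting z values are symmetric about 0.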
1) Paste data
2) Probability plot (z vs x)
Orientation: X = sorted data x(j), Y = z. Read the ends (outer ~3 points): both below → right-skew; both above → left-skew; left above & right below → heavy-tailed; left below & right above → light-tailed. The line here is a trimmed fit (middle 60%) so extremes don’t hide tail behavior.
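One way a trimmed fit could work is sketched below. The app's exact method is not documented here, so this is an assumption: ordinary least squares of z on x, using only the points left after dropping the lowest 20% and highest 20% of ranks. The name `trimmed_fit` and the synthetic data are hypothetical.

```python
from statistics import NormalDist, mean

def trimmed_fit(x_sorted, z, trim_percent=20):
    """Least-squares line z = slope * x + intercept through the middle points only.

    Assumes x_sorted is ascending, z is aligned with it, and the middle
    section has at least two distinct x values.
    """
    n = len(x_sorted)
    cut = n * trim_percent // 100            # points dropped from each end
    xm, zm = x_sorted[cut:n - cut], z[cut:n - cut]
    x_bar, z_bar = mean(xm), mean(zm)
    sxx = sum((xi - x_bar) ** 2 for xi in xm)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(xm, zm))
    slope = sxz / sxx
    return slope, z_bar - slope * x_bar

# Synthetic data that sit exactly on a line: x = 50 + 10 z
n = 20
z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
x = [50 + 10 * zi for zi in z]
slope, intercept = trimmed_fit(x, z)
print(round(slope, 4), round(intercept, 4))
```

Because the outer points are excluded from the fit, they cannot pull the line toward themselves, so any departure at the ends stays visible.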
3) Excel — quick build (X = grade, Y = z)
A) 1-Minute version (Excel 365 with spill)
- Put grades in A2:A (one per cell).
- B2 (sorted grades, spill): =SORT(FILTER(A2:A, A2:A<>""))
- C2 (ranks j, spill): =SEQUENCE(COUNTA(B2#))
- D2 (plotting position pj, spill): Hazen: =(C2#-0.5)/COUNTA(B2#); Blom: =(C2#-0.375)/(COUNTA(B2#)+0.25); Weibull: =C2#/(COUNTA(B2#)+1)
- E2 (theoretical z, spill): =NORM.S.INV(D2#)
- Make the chart → Scatter (Markers): X=B2#, Y=E2#
- Reference line (two points): G2: =MIN(B2#), G3: =MAX(B2#); H2: =(G2-AVERAGE(B2#))/STDEV.S(B2#); H3: =(G3-AVERAGE(B2#))/STDEV.S(B2#).
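The reference line in the last step is \(z = (x - \bar{x})/s\), evaluated at the smallest and largest grade. A short Python equivalent, with a hypothetical function name and made-up grades:

```python
from statistics import mean, stdev

def reference_line_points(data):
    """Two (x, z) endpoints of the line z = (x - mean) / sd."""
    m = mean(data)
    s = stdev(data)              # sample SD (n - 1 denominator), like STDEV.S
    lo, hi = min(data), max(data)
    return (lo, (lo - m) / s), (hi, (hi - m) / s)

grades = [64, 72, 78, 85, 90]    # made-up values for illustration only
(g2, h2), (g3, h3) = reference_line_points(grades)
print(g2, round(h2, 3))
print(g3, round(h3, 3))
```

Note that this line uses all of the data, so unlike a trimmed fit it can be pulled by extreme values.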
B) Classic Excel (no spill)
- Sort A2:A? ascending (? stands for the last data row).
- B1 n: =COUNT(A2:A1048576)
- B2 j (fill to n+1): =ROW()-1
- C2 pj (Hazen): =(B2-0.5)/$B$1 (fill)
- D2 z (fill): =NORM.S.INV(C2)
- Chart: X=A2:A?, Y=D2:D?. Line: use min/max + mean/sd as above.
C) Read the plot
- Points hug the line → roughly normal.
- Both ends below the line → right-skew; both ends above → left-skew.
- Ends below→above → light tails; above→below → heavy tails.
Orientation here: X = sorted grade, Y = z.
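The end-reading rules for this orientation can be written as a small check on the residuals (point minus line) of the outermost points. This is only a sketch: the function name `read_ends`, the default of 3 outer points, and the tolerance of 0.1 z-units are assumptions chosen for illustration, and the slope and intercept are those of whatever reference line you drew.

```python
def read_ends(x_sorted, z, slope, intercept, k=3, tol=0.1):
    """Classify shape from the mean residual of the k outermost points at each end.

    Residual > 0 means the point lies above the line (X = data, Y = z).
    tol is an arbitrary threshold for "close enough to the line".
    """
    resid = [zi - (slope * xi + intercept) for xi, zi in zip(x_sorted, z)]
    left = sum(resid[:k]) / k
    right = sum(resid[-k:]) / k
    if left < -tol and right < -tol:
        return "right-skew"      # both ends below
    if left > tol and right > tol:
        return "left-skew"       # both ends above
    if left > tol and right < -tol:
        return "heavy-tailed"    # left above, right below
    if left < -tol and right > tol:
        return "light-tailed"    # left below, right above
    return "roughly normal (or unclear)"

z = [-2, -1, 0, 1, 2]
print(read_ends([-3, -1, 0, 1, 3], z, slope=1, intercept=0, k=1))    # heavy-tailed
print(read_ends([-1.5, -1, 0, 1, 3], z, slope=1, intercept=0, k=1))  # right-skew
```

A rule like this is a reading aid, not a formal test; with small n, look at the plot itself before trusting the label.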
Mini-gallery — Light vs Heavy vs Right-skewed vs Normal (with fitted lines)
Orientation: z vertical, x(j) horizontal. Light tails = ends below→above; Heavy tails = above→below; Right-skew = both below (Left-skew = both above). Lines are a trimmed fit (middle 60%) to avoid extremes pulling the fit.
(a) Light-tailed — ends below→above.
(b) Heavy-tailed — ends above→below.
(c) Right-skewed — both ends below.
(d) Normal — roughly linear.
4) Q & A
Why a probability plot?
It’s a small-sample-friendly normality check. When n is small or medium, a histogram’s shape changes with the choice of bins, while a probability plot shows every data point and needs no binning.
Which axis is which?
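This app (and your book) use z on the vertical, x(j) on the horizontal. Some tools flip the axes. The straight-line check is the same either way, but the above/below rules swap: with x on the vertical, right-skew shows both ends above the line.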
What does the fitted line mean?
If the data are normal, points should align with a straight line. We fit the line to the middle 60% so tail patterns are visible.