Session 7.3.4 Bootstrap Standard Error

Pick a statistic (mean / median / SD), enter data, then run a nonparametric or parametric bootstrap. Default nB=200.

Bootstrap — General Procedure (plain English)

  1. Define the question. What single number do you care about? (mean, median, or SD).
  2. Collect a sample. Small is OK (e.g., n=10). Enter it in the grid. Units matter (minutes, mm, etc.).
  3. Check bias risks. Is your sample representative? Any coverage/nonresponse/time-of-day bias? SE measures wobble (precision), not bias.
  4. Choose mode. Nonparametric (resample your data) or Parametric (draw from a model, e.g., Normal with μ,σ).
  5. Pick replicates. Start with nB = 200 for class; use 1000+ for reports.
  6. Resample & compute. For each bootstrap sample, compute the chosen statistic → a distribution of replicates.
  7. Summarize uncertainty. Bootstrap SE = sd of replicates; 95% CI via percentiles or \(\pm 2\times \text{SE}\).
  8. Interpret. Low SE = precise, not automatically “good data.” Inspect histogram shape; consider median if outliers/skew.
  9. Report. Example: “Mean commute = 11.7 min (SE 0.8; 95% CI 10.1–13.2; n=10; nonparametric bootstrap).”
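Steps 4 to 7 can be sketched in a few lines of Python. This is a minimal illustration rather than the tool's actual implementation: the function name `bootstrap_se`, the fixed seed, the simple percentile rule, and the commute values are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.mean, n_boot=200, seed=0):
    """Nonparametric bootstrap: resample with replacement, recompute the statistic."""
    rng = random.Random(seed)            # fixed seed so a run can be reproduced
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    se = statistics.stdev(reps)          # bootstrap SE = SD of the replicates
    lo = reps[round(0.025 * n_boot)]     # simple percentile CI, no interpolation
    hi = reps[round(0.975 * n_boot) - 1]
    return se, (lo, hi), reps

# Hypothetical commute times in minutes (n = 10); not the tool's built-in example.
commute = [7, 9, 10, 10, 11, 12, 13, 14, 16, 20]
est = statistics.mean(commute)
se, (lo, hi), reps = bootstrap_se(commute)

print(f"mean = {est:.1f} min, bootstrap SE = {se:.2f}")
print(f"percentile 95% CI: {lo:.1f} to {hi:.1f}")
print(f"+/- 2 SE band:     {est - 2 * se:.1f} to {est + 2 * se:.1f}")
```

Passing `stat=statistics.median` or `stat=statistics.stdev` reruns the same procedure for the other two statistics.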
Student commute case (minutes; n=10): Click Load student one way commute time example, choose Mean and Nonparametric, run. Then switch to Median and compare SE & CI. Ask: could survey timing or who responded bias the estimate?
SE ≠ “good data” — SE tells you precision (wobble). It does not fix sampling bias, measurement error, or a bad model.
When to use the bootstrap (small-n intuition): You have a small sample and want to know how your statistic (mean/median/SD) would vary if you could repeat the study many times. The bootstrap emulates those extra samples, either by resampling your data (nonparametric) or by drawing from a fitted model such as Normal(μ,σ) (parametric), to estimate the SE and CI. It does not add new information or fix sampling bias.
SD: the tool resamples your data and plots a histogram of the bootstrap SDs (this distribution is often right-skewed).



Excel — Clear Step-by-Step (one sheet)

  1. Original data in A2:A11 (n = 10). Do not overwrite.
  2. Set μ and σ (same sheet):
    • B1: =AVERAGE($A$2:$A$11)  ← μ
    • B2: =STDEV.S($A$2:$A$11)  ← σ
  3. Nonparametric bootstrap (resample from A2:A11):
    1. In B3: =INDEX($A$2:$A$11, RANDBETWEEN(1,10))
    2. Fill B3 → B12 (Resample #1), then copy that 10×1 block across 200 columns (B through GS).
  4. Parametric bootstrap (Normal with μ, σ):
    1. In B3: =NORM.INV(RAND(), $B$1, $B$2)
    2. Fill B3 → B12 (Resample #1), then copy across 200 columns (B through GS).
  5. Statistic for each resample (row 14):
    • SD: in B14, enter =STDEV.S(B3:B12), then fill across to GS14
    • Mean: =AVERAGE(B3:B12) · Median: =MEDIAN(B3:B12)
  6. Bootstrap SE (spread across resamples): anywhere (e.g., B16) → =STDEV.S(B14:GS14)
  7. One-shot spill (no dragging; needs dynamic arrays, i.e., Excel 365 or 2021): enter the formula in B3 and press Enter; Excel will spill a 10×200 block into B3:GS12.
    • Nonparametric (from A2:A11)
      =INDEX($A$2:$A$11, RANDARRAY(ROWS($A$2:$A$11), 200, 1, ROWS($A$2:$A$11), TRUE))
      Uses the sample size automatically via ROWS($A$2:$A$11). Change 200 to your desired nB.
    • Parametric (Normal with μ, σ in B1, B2)
      =NORM.INV(RANDARRAY(ROWS($A$2:$A$11), 200), $B$1, $B$2)
      Also spills 10×200 using the same n and nB. Make sure B3:GS12 is empty before entering.
  8. Press F9 to refresh; to freeze a run, copy B3:GS12, then Paste Special → Values.
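For readers who prefer a script to a spreadsheet, the sheet layout above maps fairly directly onto Python. The sketch below is an approximation under stated assumptions: the data values are a hypothetical stand-in for A2:A11, the helper names are invented for this example, and `rng.gauss(mu, sigma)` is used because it draws from the same distribution as NORM.INV(RAND(), μ, σ).

```python
import random
import statistics

data = [7, 9, 10, 10, 11, 12, 13, 14, 16, 20]   # hypothetical stand-in for A2:A11
n, n_boot = len(data), 200
rng = random.Random(42)                          # seeded, so no F9-style reshuffling

mu = statistics.mean(data)       # B1: =AVERAGE(A2:A11)
sigma = statistics.stdev(data)   # B2: =STDEV.S(A2:A11)

def column_nonparametric():
    # One resample column: n draws of INDEX(data, RANDBETWEEN(1, n))
    return [data[rng.randint(0, n - 1)] for _ in range(n)]

def column_parametric():
    # One resample column: n draws from Normal(mu, sigma)
    return [rng.gauss(mu, sigma) for _ in range(n)]

def bootstrap_se(make_column, stat):
    row14 = [stat(make_column()) for _ in range(n_boot)]  # one statistic per column
    return statistics.stdev(row14)                        # B16: STDEV.S over row 14

for name, stat in [("mean", statistics.mean),
                   ("median", statistics.median),
                   ("SD", statistics.stdev)]:
    se_np = bootstrap_se(column_nonparametric, stat)
    se_p = bootstrap_se(column_parametric, stat)
    print(f"{name:>6}: nonparametric SE = {se_np:.2f}, parametric SE = {se_p:.2f}")
```

The two modes usually give similar SEs for the mean; larger gaps for the median or SD hint that the Normal model may not describe the data well.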

Results

  • Original statistic
  • Bootstrap SE (SE_B): SD across the \( n_B \) bootstrap estimates
  • Bootstrap mean: mean of the bootstrap estimates
  • Reference SE (formulas): Mean: s/√n · SD: s/√[2(n−1)] (Normal-only) · Median: N/A
  • “±2 SE” band: rough 95% band around the original estimate
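The reference formulas and the ±2 SE band are simple enough to check by hand. The following sketch assumes a hypothetical sample and a helper named `reference_se` (both introduced only for illustration), and it builds the band from the formula SE of the mean; the same arithmetic applies when a bootstrap SE is plugged in instead.

```python
import math
import statistics

def reference_se(data):
    """Textbook SE formulas; the SD formula assumes Normal data."""
    n = len(data)
    s = statistics.stdev(data)
    return {
        "mean": s / math.sqrt(n),            # s / sqrt(n)
        "sd": s / math.sqrt(2 * (n - 1)),    # s / sqrt(2(n - 1)), Normal-only
    }

data = [7, 9, 10, 10, 11, 12, 13, 14, 16, 20]   # hypothetical sample, n = 10
ref = reference_se(data)
est = statistics.mean(data)
band = (est - 2 * ref["mean"], est + 2 * ref["mean"])   # rough 95% band

print(f"SE(mean) = {ref['mean']:.3f}, SE(SD) = {ref['sd']:.3f}")
print(f"+/- 2 SE band for the mean: {band[0]:.2f} to {band[1]:.2f}")
```

There is no comparably simple formula for the median, which is one reason the bootstrap is handy for it.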

Bootstrap distribution

What is the bootstrap? (plain English)
  • It’s a computer shortcut to measure how much your statistic (mean/median/SD) would wobble if you repeated the study many times.
  • Nonparametric: treat your sample like a tiny “population,” draw with replacement, recompute the statistic.
  • Parametric: if you assume a model (e.g., Normal with μ, σ), draw from that model and recompute.
  • The spread of those repeated statistics is the bootstrap standard error.
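Written as a formula, the last bullet might look as follows. The symbols \( \hat\theta^{*}_{b} \) (the statistic computed on bootstrap sample \( b \)) and \( \bar\theta^{*} \) (the bootstrap mean) are notation assumed here for illustration; the \( n_B - 1 \) divisor matches a sample standard deviation such as Excel's STDEV.S.

\[
\text{SE}_B \;=\; \sqrt{\frac{1}{n_B - 1}\sum_{b=1}^{n_B}\left(\hat\theta^{*}_{b} - \bar\theta^{*}\right)^{2}},
\qquad
\bar\theta^{*} \;=\; \frac{1}{n_B}\sum_{b=1}^{n_B}\hat\theta^{*}_{b}
\]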
Why use it?
  • Small sample & can’t easily collect more: emulate many “new” samples to quantify precision (SE, CI).
  • Formulas are hard/unknown (e.g., SE of the SD with small n).
  • n is small and the Normal approximation is shaky.
  • Quick, general SE or rough 95% band.
Common uses
  • SE/CI for mean, median, SD.
  • Uncertainty for regression coefficients.
  • Model performance uncertainty via resampling rows.
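As a rough illustration of the regression use case, the sketch below bootstraps the slope of a simple least-squares line by resampling whole (x, y) rows. The data, the function names, and the choice of 1000 replicates are assumptions made for the example, not part of the tool.

```python
import random
import statistics

def slope(pairs):
    """Least-squares slope of y on x."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx

def bootstrap_slope_se(pairs, n_boot=1000, seed=1):
    rng = random.Random(seed)
    reps = []
    while len(reps) < n_boot:
        sample = rng.choices(pairs, k=len(pairs))   # resample rows, keep x and y together
        if len({x for x, _ in sample}) < 2:
            continue   # slope is undefined if every x is identical; redraw
        reps.append(slope(sample))
    return statistics.stdev(reps)

# Hypothetical (x, y) rows for illustration only.
pairs = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8),
         (5, 10.1), (6, 12.2), (7, 13.8), (8, 16.1)]
se_slope = bootstrap_slope_se(pairs)
print(f"slope = {slope(pairs):.3f}, bootstrap SE = {se_slope:.3f}")
```

Resampling rows keeps each x with its own y, so the approach does not rely on the residuals having equal variance.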
Compare to other methods (no math)
  • Normal / t-based formulas: Great when their assumptions hold (e.g., s/√n for the SE of the mean). The bootstrap is more flexible when they do not.
  • Jackknife: Fast leave-one-out; good for smooth statistics such as the mean; its SE estimate is unreliable for non-smooth statistics such as the median.
  • Permutation tests: For hypothesis tests (shuffle labels). Different goal.
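To make the jackknife comparison concrete, here is a small leave-one-out sketch. The function name and data are hypothetical. For the mean, the jackknife SE reproduces s/√n exactly, which makes a convenient sanity check.

```python
import math
import statistics

def jackknife_se(data, stat=statistics.mean):
    """Jackknife SE: recompute the statistic with each observation left out."""
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]   # leave-one-out estimates
    center = statistics.mean(loo)
    return math.sqrt((n - 1) / n * sum((t - center) ** 2 for t in loo))

data = [7, 9, 10, 10, 11, 12, 13, 14, 16, 20]   # hypothetical sample, n = 10
formula_se = statistics.stdev(data) / math.sqrt(len(data))

print(f"jackknife SE (mean):   {jackknife_se(data):.4f}")
print(f"formula s/sqrt(n):     {formula_se:.4f}")
print(f"jackknife SE (median): {jackknife_se(data, statistics.median):.4f}")
```

With an even n, the leave-one-out medians take only two distinct values, which illustrates why the jackknife struggles with the median.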