✅ Excel Functions for Chapter 12

This page is a cheat sheet for the Excel functions and formulas used in Chapter 12 apps (MLR, VIF, Cook’s Distance, Adjusted R²). It is not a new assignment — just a quick reference so you don’t have to hunt through slides.

Notation in the formulas:

Part A · Basic regression output (β, ŷ, SSE, R², Root MSE)

A. Core regression quantities

Quantity / Function Excel formula (generic pattern) What it means / Example
Slope β₁ (simple regression) =SLOPE(Y_range, X1_range) Returns the slope of the line predicting Y from one X.
Example: =SLOPE(D2:D24, A2:A24) gives how many grade points FinalGrade changes when StudyHW increases by 1 hour.
Intercept β₀ (simple regression) =INTERCEPT(Y_range, X1_range) Predicted Y when X = 0.
Example: =INTERCEPT(D2:D24, A2:A24) gives the predicted FinalGrade when StudyHW = 0.
Multi-X β’s via LINEST
(slopes & intercept)
=LINEST(Y_range, X_range, TRUE, TRUE)

Use with INDEX to pull specific pieces, e.g.:
=INDEX(LINEST(Y_range, X_range, TRUE, TRUE), 1, 1)
LINEST with TRUE,TRUE returns an array containing: slopes, intercept, standard errors, R², F, etc.
In practice, we usually use the matrix-formula approach in the ICE, but LINEST is a built-in alternative for multi-predictor MLR.
Matrix formula for β̂ (multi-X) Assuming you built the design matrix X (including a column of 1’s) in F2:I24 and Y in D2:D24, you can use:

=MMULT(
  MINVERSE(MMULT(TRANSPOSE($F$2:$I$24), $F$2:$I$24)),
  MMULT(TRANSPOSE($F$2:$I$24), $D$2:$D$24)
)
Computes the 4×1 vector \( \hat\beta = (X'X)^{-1}X'Y \) used in the functions-only ICE app. The result spills into 4 cells: intercept, β(StudyHW), β(TutorHours), β(ExamPrep).
Predicted value ŷ If β’s are in B1:B3 and X’s in row 2:
= $B$1 + $B$2*StudyHours_2 + $B$3*SleepHours_2
Plug the fitted coefficients into the model to get the predicted score for each student. In the ICE, this is the “yhat” column.
Residual e = Y_cell - yhat_cell

Example: =D2 - L2
Actual minus predicted: how far off the model is for that student. Residuals should bounce randomly around 0 if the model is reasonable.
SSE (Sum of Squared Errors) =SUMSQ(Residual_range)
Example: =SUMSQ(M2:M24)
Adds up across all students. Smaller SSE means the line is closer to the points overall.
Residual degrees of freedom = n - p
Example (23 students, 3 slopes + intercept):
=23 - 4 → 19
df = number of data points minus number of parameters in the model. Used for MSE, t-tests, and confidence intervals.
MSE and Root MSE MSE: = SSE_cell / df_cell
Root MSE: =SQRT(MSE_cell)
MSE is the average squared residual; Root MSE is in the same units as Y (e.g., grade points). Root MSE ≈ typical prediction error.
R² (coefficient of determination) Simple regression:
=RSQ(Y_range, X1_range)

From predictions:
=RSQ(Y_range, Yhat_range)
Percent of variation in Y explained by the model.
Example: R² = 0.72 → about 72% of the variation in test scores is explained by the predictors.
Adjusted R² If R² is in C2, n in C3, p in C4:
=1 - (1 - C2) * (C3 - 1) / (C3 - C4)
Adjusted R² penalizes adding “junk” predictors. It can go down if a new X doesn’t help much. Use it when comparing models with different numbers of predictors.
t-statistic for a slope βⱼ If βⱼ is in B2 and its SE is in C2:
= B2 / C2
Measures how many SE’s the estimated slope is away from 0. Large |t| usually means a small p-value and stronger evidence that the slope ≠ 0.
p-value for slope (two-sided) If t is in D2 and df in E2:
=T.DIST.2T(ABS(D2), E2)
Probability of seeing a t-statistic this extreme (or more) if the true slope were 0. If p < 0.05 → slope is statistically significant at 5%.
95% confidence interval for βⱼ With βⱼ in B2, SE in C2, α in 0.05, df in E2:

Critical t:
=T.INV.2T(0.05, E2)

CI lower:
=B2 - tcrit_cell * C2
CI upper:
=B2 + tcrit_cell * C2
Gives a range of plausible values for the true slope. If 0 is not inside the interval, the slope is significant at α = 0.05.
Part B · Correlation and multicollinearity (VIF)

B. Correlation & VIF

Quantity / Function Excel formula What it means
Correlation r(X, Y) =CORREL(X_range, Y_range)

Example:
=CORREL(A2:A24, D2:D24)
Measures strength and direction of linear association (–1 ≤ r ≤ 1).
Rough guide:
  • |r| < 0.3 → weak
  • 0.3–0.7 → moderate
  • > 0.7 → strong
Correlation matrix (4×4, by hand) Use =CORREL() for each pair and mirror across the diagonal.
Example:
=CORREL($A$2:$A$24, $B$2:$B$24) → r(StudyHW,TutorHours)
Table of r’s for all pairs of variables (StudyHW, TutorHours, ExamPrep, FinalGrade). Large |r| among X’s signals possible multicollinearity.
VIF for a predictor Xⱼ
(from R² of Xⱼ on other X’s)
1. Get R²ⱼ from regressing Xⱼ on other X’s (can use RSQ or LINEST).

2. Then compute:
=1 / (1 - R2j_cell)
VIF tells how much the variance of βⱼ is inflated by collinearity.
  • VIF ≈ 1–2 → no real concern.
  • VIF ≈ 2–5 → noticeable collinearity.
  • VIF > 5 → keep an eye on it.
  • VIF > 10 → serious multicollinearity problem.
VIF via LINEST (example) Example for StudyHours (X₁) regressed on SleepHours (X₂):

=1/(1-INDEX(
  LINEST(StudyHours_range, SleepHours_range, TRUE, TRUE),
  3,1))
Here row 3, column 1 of the LINEST output is R² for the regression of X₁ on X₂. Plug into 1/(1−R²) to get VIF for X₁.
Part C · Influence diagnostics (Cook’s Distance)

C. Cook’s Distance (influence)

Quantity / Formula Excel expression (pattern) What it means / How to use
Standardized residual rᵢ If residual eᵢ is in E2, leverage hᵢᵢ in F2, MSE in $P$4:

= E2 / ( SQRT($P$4) * SQRT(1 - F2) )
Rescales residuals so they are comparable across observations, taking into account overall spread (MSE) and leverage. Large |rᵢ| indicates a point that doesn’t fit the line well.
Leverage hᵢᵢ In the model-adequacy ICE, leverage is usually computed via matrix formulas; you can store hᵢᵢ in its own column (say F2:F24). Measures how “far” the X-pattern of observation i is from the center of the X’s. High leverage points can pull the fitted line toward them.
Cook’s Distance Dᵢ
(using rᵢ and hᵢᵢ)
With rᵢ in G2, hᵢᵢ in F2, and p = number of parameters (intercept + slopes) in $Q$2:

= (G2^2 * F2) / ( $Q$2 * (1 - F2) )
Cook’s Distance measures how much all fitted β’s would change if observation i were removed.
Rules of thumb:
  • Dᵢ ≪ 1 → not influential.
  • Dᵢ ≳ 1 → investigate (possible influential point).
In diagnostic plots, points with large Dᵢ are often labelled separately.
Alternative Cook’s D formula
(using eᵢ directly)
If you have raw residual eᵢ (E2), leverage hᵢᵢ (F2), MSE (P4), and p = number of parameters (Q2):

= (E2^2 * F2) / ( $Q$2 * $P$4 * (1 - F2)^2 )
This is algebraically equivalent to the rᵢ-based formula above. Use whichever matches the quantities you’ve already computed in your sheet.

💡 How to use this page: When you’re working on Chapter 12 homework or ICE apps, keep this open as a “formula menu.” Find the quantity you need (β, R², Root MSE, correlation, VIF, or Cook’s D), copy the pattern, and then swap in your own cell ranges.