ICE — Simple MLR with Excel ToolPak

Three numeric study variables (StudyHW, TutorHours, ExamPrep) predict FinalGrade. All columns are numbers. No dummies, no extra sheets.
This ICE is built around the Excel Analysis ToolPak (Data → Data Analysis → Regression / Correlation). For the “no ToolPak, all Excel functions + matrix formula” version of Chapter 12 MLR, use the companion page 👉 Ch12 — MLR (GPA) • Excel ICE (functions only).

Step A — Copy the dataset into Excel

  1. Click Copy CSV.
  2. In Excel, click into cell A1 (not edit mode) and press Ctrl+V. You should see columns A–D labeled StudyHW, TutorHours, ExamPrep, FinalGrade and rows 1–24 (a header row plus 23 students).

Layout after pasting:
  • A: StudyHW (hours/week doing homework)
  • B: TutorHours (hours/week with tutor/SI)
  • C: ExamPrep (hours/week exam prep)
  • D: FinalGrade (0–100 for this class)

Step B — Excel ToolPak Regression (FinalGrade on A,B,C)

  1. Turn on the Analysis ToolPak if needed: File → Options → Add-ins → Manage “Excel Add-ins” → Go… → check Analysis ToolPak.
  2. Go to the Data tab → click Data Analysis → choose Regression → OK.
  3. Set ranges (include the labels in row 1):
    Input Y Range: $D$1:$D$24 (FinalGrade)
    Input X Range: $A$1:$C$24 (StudyHW, TutorHours, ExamPrep)
  4. Check Labels. Leave “Constant is Zero” unchecked.
  5. Output → choose New Worksheet Ply (e.g., “MLR_ToolPak”). Optional: tick Residuals.
  6. Click OK. Excel gives you Regression Statistics, ANOVA, and Coefficients.
Model Excel is fitting:
FinalGrade = β₀ + β₁·StudyHW + β₂·TutorHours + β₃·ExamPrep + ε
All variables are numeric. No dummy variables. No extra columns.

Step C — Reading Excel’s output (short version)

Regression Statistics
  • R Square: percent of grade variation explained by StudyHW, TutorHours, ExamPrep together.
  • Adjusted R Square: R² with a small penalty for 3 predictors (use this when comparing models).
  • Standard Error: √MSE — typical prediction error in grade points.
ANOVA (overall F test)
  • H₀: β₁ = β₂ = β₃ = 0 (study variables are useless).
  • H₁: at least one β ≠ 0.
  • F and Significance F: small p ⇒ at least one study variable helps explain grades.
Coefficients
  • StudyHW: extra grade points per 1 more hour/week of homework, holding the other two fixed.
  • TutorHours: effect of 1 more hour/week with tutor, all else equal.
  • ExamPrep: effect of 1 more hour/week exam prep, all else equal.
  • Each row: coefficient, Standard Error, t Stat, P-value. Small p (< 0.05) ⇒ strong evidence slope ≠ 0.
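Once you have the coefficients, prediction is just plug-in arithmetic. The numbers below are hypothetical, not the ToolPak output; the point is only the shape of the calculation:

```python
# Hypothetical fitted coefficients (replace with the ToolPak "Coefficients" column).
b0, b_study, b_tutor, b_prep = 50.0, 2.0, 1.5, 3.0

def predict(study_hw, tutor_hours, exam_prep):
    # FinalGrade-hat = b0 + b1*StudyHW + b2*TutorHours + b3*ExamPrep
    return b0 + b_study * study_hw + b_tutor * tutor_hours + b_prep * exam_prep

print(predict(6, 1, 2))  # 50 + 12 + 1.5 + 6 = 69.5
```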

Step D — Correlation (ToolPak + build-your-own matrix)

D1. Correlation using the ToolPak

First, use Data → Data Analysis → Correlation to let Excel create the full 4×4 correlation matrix for you.

  1. Go to the Data tab → click Data Analysis → choose Correlation → OK.
  2. Input Range: $A$1:$D$24 (StudyHW, TutorHours, ExamPrep, FinalGrade, with headers in row 1).
  3. Select Columns, check Labels in first row.
  4. Output → choose New Worksheet Ply (e.g. “Corr_ToolPak”) → OK.
  5. Excel prints a 4×4 table with the variables on both axes. This is the correlation matrix. Each off-diagonal cell is r between two variables (e.g., r(FinalGrade, StudyHW)).
Rough guide: |r| < 0.3 ≈ weak, 0.3–0.7 ≈ moderate, > 0.7 ≈ strong. Large correlations among the predictors (StudyHW, TutorHours, ExamPrep) can mean multicollinearity.
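For students who like code, the rough guide translates directly into a helper function (the thresholds are this worksheet's conventions, not a universal standard):

```python
def strength(r):
    """Classify a correlation coefficient using the rough guide above."""
    a = abs(r)
    if a < 0.3:
        return "weak"
    elif a <= 0.7:
        return "moderate"
    return "strong"

print(strength(0.15), strength(-0.5), strength(0.85))  # weak moderate strong
```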

D2. Build the same matrix by hand with CORREL

Now recreate that same 4×4 table yourself on the original data sheet using CORREL. We’ll use a blank block in F1:J5.

Assume:
A = StudyHW, B = TutorHours, C = ExamPrep, D = FinalGrade, rows 2..24 are students.

Labels:
F1:  (leave blank)
G1: StudyHW
H1: TutorHours
I1: ExamPrep
J1: FinalGrade

F2: StudyHW
F3: TutorHours
F4: ExamPrep
F5: FinalGrade

Diagonal (correlation of a variable with itself):
G2: 1
H3: 1
I4: 1
J5: 1

Upper triangle — each CORREL uses two column ranges:
H2: =CORREL($A$2:$A$24,$B$2:$B$24)   // r(StudyHW,TutorHours)
I2: =CORREL($A$2:$A$24,$C$2:$C$24)   // r(StudyHW,ExamPrep)
J2: =CORREL($A$2:$A$24,$D$2:$D$24)   // r(StudyHW,FinalGrade)

I3: =CORREL($B$2:$B$24,$C$2:$C$24)   // r(TutorHours,ExamPrep)
J3: =CORREL($B$2:$B$24,$D$2:$D$24)   // r(TutorHours,FinalGrade)

J4: =CORREL($C$2:$C$24,$D$2:$D$24)   // r(ExamPrep,FinalGrade)

Lower triangle — mirror the upper triangle:
G3: =H2
G4: =I2
G5: =J2

H4: =I3
H5: =J3

I5: =J4
    
Each CORREL has exactly two ranges: first variable’s column, second variable’s column. This hand-built F1:J5 table should match the ToolPak correlation matrix from part D1.
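To see that CORREL is nothing magic, the same 4×4 table can also be rebuilt outside Excel. The Python sketch below uses made-up toy columns (only the names match our layout; paste in the real A2:D24 values to reproduce the ToolPak matrix):

```python
from math import sqrt

def correl(xs, ys):
    # Pearson correlation, the same quantity Excel's CORREL returns.
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - xbar) ** 2 for x in xs))
    sy = sqrt(sum((y - ybar) ** 2 for y in ys))
    return num / (sx * sy)

# Hypothetical mini columns (replace with the real A2:D24 values).
data = {
    "StudyHW":    [5, 8, 3, 10, 6, 7],
    "TutorHours": [1, 0, 2, 1, 3, 2],
    "ExamPrep":   [2, 3, 1, 4, 2, 3],
    "FinalGrade": [72, 80, 65, 90, 78, 82],
}
names = list(data)
matrix = [[correl(data[a], data[b]) for b in names] for a in names]
for name, row in zip(names, matrix):
    print(name, [round(v, 3) for v in row])
```

Just like the F1:J5 block, the diagonal comes out as 1 and the matrix is symmetric, which is a handy sanity check.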

Step E — VIF (Variance Inflation Factor) for the 3 X’s

The ToolPak does not show VIF, but we can compute it ourselves from VIFⱼ = 1 / (1 − Rⱼ²), where Rⱼ² is the R² from regressing Xⱼ on the other X's (with an intercept).

With our layout (StudyHW=A, TutorHours=B, ExamPrep=C, rows 2..24), use:

AA10: "VIF StudyHW"
AB10: =1/(1-INDEX(LINEST($A$2:$A$24,$B$2:$C$24,TRUE,TRUE),3,1))

AA11: "VIF TutorHours"
AB11: =1/(1-INDEX(
          LINEST($B$2:$B$24,
                 CHOOSE({1,2},$A$2:$A$24,$C$2:$C$24),
                 TRUE,TRUE),
          3,1))

AA12: "VIF ExamPrep"
AB12: =1/(1-INDEX(
          LINEST($C$2:$C$24,
                 CHOOSE({1,2},$A$2:$A$24,$B$2:$B$24),
                 TRUE,TRUE),
          3,1))
    
Rules of thumb:
  • VIF ≲ 5 → OK.
  • VIF > 5 → keep an eye on collinearity.
  • VIF > 10 → serious multicollinearity problem (slopes may be unstable).
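The LINEST route above can be double-checked outside Excel. Each predictor here has exactly two "other" predictors, and in that special case the auxiliary Rⱼ² has a known closed form in the pairwise correlations. The sketch below uses that shortcut on made-up numbers (swap in the real A, B, C columns to check your AB10:AB12 cells):

```python
from math import sqrt

def correl(xs, ys):
    # Pearson correlation (same as Excel's CORREL).
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return num / sqrt(sum((x - xbar) ** 2 for x in xs)
                      * sum((y - ybar) ** 2 for y in ys))

def vif_two_others(xj, xa, xb):
    # R^2 of xj regressed on (xa, xb) via the two-regressor closed form:
    # R^2 = (r_ja^2 + r_jb^2 - 2*r_ja*r_jb*r_ab) / (1 - r_ab^2),
    # then VIF_j = 1 / (1 - R^2).
    rja, rjb, rab = correl(xj, xa), correl(xj, xb), correl(xa, xb)
    r2 = (rja**2 + rjb**2 - 2 * rja * rjb * rab) / (1 - rab**2)
    return 1 / (1 - r2)

# Hypothetical columns (replace with A2:A24, B2:B24, C2:C24).
study = [5, 8, 3, 10, 6, 7, 4]
tutor = [1, 0, 2, 1, 3, 2, 1]
prep  = [2, 3, 1, 4, 2, 3, 2]
print(round(vif_two_others(study, tutor, prep), 3))  # VIF for StudyHW
```

A VIF can never be below 1, so a value under 1 means a formula bug, another useful sanity check.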

Step F — Mini write-up template

Students can copy this and fill in the blanks from the ToolPak output:

  We regressed FinalGrade on StudyHW, TutorHours, and ExamPrep (n = 23).
  The model explains R² = ____ of the variation in grades (Adjusted R² = ____),
  with a typical prediction error of ____ points (Standard Error).
  The overall F test gave F = ____ with Significance F = ____, so we ____ H₀.
  Holding the other predictors fixed: StudyHW adds ____ points per hour (p = ____),
  TutorHours adds ____ points per hour (p = ____), and ExamPrep adds ____ points
  per hour (p = ____). The VIFs were ____, ____, and ____, which indicates ____.

Step G — Math blog (optional, for the curious)

This part is for students who want to see the math behind what Excel is doing: OLS multiple regression, correlation, and VIF.

1. OLS multiple regression (matrix view)

Our model is
FinalGrade = β₀ + β₁·StudyHW + β₂·TutorHours + β₃·ExamPrep + ε.

In matrix form we write:

Y   = n×1 column of FinalGrade values
X   = n×p design matrix (columns: 1, StudyHW, TutorHours, ExamPrep)
β   = p×1 column of unknown coefficients
ε   = n×1 column of errors

Model:  Y = Xβ + ε
      

Ordinary Least Squares (OLS) chooses β to minimize the sum of squared residuals Σ eᵢ² = Σ (yᵢ − ŷᵢ)². The solution is the famous closed-form formula:

β̂ = (XᵀX)⁻¹ XᵀY
      

From this we get predictions ŷ = Xβ̂, residuals e = Y − ŷ, and the error variance estimate MSE = SSE / (n − p), where SSE = Σeᵢ² and p = number of parameters (here p = 4: intercept + 3 slopes).
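To make the matrix formula concrete, here is a small self-contained Python sketch that forms XᵀX and XᵀY and solves the normal equations (XᵀX)β̂ = XᵀY by Gaussian elimination. The five-row data set is invented purely for illustration:

```python
# Sketch of the closed-form OLS solution beta-hat = (X'X)^(-1) X'Y,
# computed by solving the normal equations (X'X) beta = X'Y.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Toy design matrix X: columns are [1, StudyHW, TutorHours, ExamPrep].
X = [[1, 5, 1, 2], [1, 8, 0, 3], [1, 3, 2, 1], [1, 10, 1, 4], [1, 6, 3, 2]]
Y = [[72], [80], [65], [90], [78]]

XtX = matmul(transpose(X), X)
XtY = matmul(transpose(X), Y)
beta = solve(XtX, [row[0] for row in XtY])
print([round(b, 3) for b in beta])  # [intercept, slope_StudyHW, slope_TutorHours, slope_ExamPrep]
```

This is exactly what the ToolPak (and LINEST) do under the hood, just with better numerics.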

2. Correlation r(X,Y)

The sample correlation between variables X and Y is:

r(X,Y) = [ Σ (xᵢ − x̄)(yᵢ − ȳ) ]  /  [ (n−1)·sₓ·sᵧ ]

where:
  x̄  = sample mean of X
  ȳ  = sample mean of Y
  sₓ² = Σ (xᵢ − x̄)² / (n−1)  (sample variance of X)
  sᵧ² = Σ (yᵢ − ȳ)² / (n−1)  (sample variance of Y)
      

In Excel, =CORREL(X_range, Y_range) computes this r directly. The 4×4 correlation matrix is just all pairwise r’s arranged in a table.
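The formula can be transcribed into Python almost symbol for symbol; the sample columns below are made up:

```python
from statistics import mean, stdev

def r_formula(xs, ys):
    # Direct transcription of the formula above:
    # r = sum((x - xbar)(y - ybar)) / ((n - 1) * s_x * s_y)
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return num / ((n - 1) * stdev(xs) * stdev(ys))

xs = [5, 8, 3, 10, 6]   # toy "hours" column
ys = [72, 80, 65, 90, 78]  # toy "grades" column
print(round(r_formula(xs, ys), 6))
```

A perfectly linear relationship gives r = ±1, which makes a quick test of the transcription.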

3. VIF — variance inflation factor

For each predictor Xⱼ in a multiple regression, the Variance Inflation Factor (VIF) tells you how much the variance of its slope β̂ⱼ is inflated by collinearity with the other X's.

Let Rⱼ² be the R-squared from regressing Xⱼ on all the other predictors (with an intercept). Then

VIFⱼ = 1 / (1 − Rⱼ²)
      

If there were no collinearity, the variance of β̂ⱼ would look like σ² / Σ(xⱼᵢ − x̄ⱼ)². With other correlated predictors, it becomes

Var(β̂ⱼ) = [σ² / Σ(xⱼᵢ − x̄ⱼ)²] · VIFⱼ
      

So VIFⱼ is literally the factor by which the variance (and standard error) of the slope is blown up by multicollinearity.

Rules of thumb:

  • VIF ≈ 1–2: essentially no collinearity.
  • VIF ≈ 2–5: noticeable, but often acceptable.
  • VIF > 5: potential concern.
  • VIF > 10: serious multicollinearity; slopes may be unstable.

Optional prework — watch a quick intro to simple linear regression

If you’ve never seen regression before, watch this short video first. It shows the basic idea of fitting a line before we jump into multiple regression and matrix formulas.

Optional video — Multicollinearity in Regression Analysis

To connect the correlation matrix and VIF to real data problems, you can also watch this short video on multicollinearity.

© Simple MLR ToolPak • Single-file, offline.