ENGR 200 • Chapter 12 • MLR • Excel LINEST only

ICE — Simple MLR in Excel with LINEST (No ToolPak)

Same three numeric study variables (StudyHW, TutorHours, ExamPrep) predicting FinalGrade, but now the main workflow uses only LINEST to get β, SE, t, and p. Matrix OLS with MMULT/MINVERSE is optional, for curious students only.
Companion ICE using Excel Analysis ToolPak
If you want to see the same model using Data → Data Analysis → Regression instead of formulas, use the companion ICE: 👉 Ch12 — Simple MLR with Excel ToolPak.
Single file • offline • copy–paste friendly

Step A — Copy the dataset into Excel

1. Click Copy CSV.
2. In Excel, click into cell A1 (not edit mode) and press Ctrl+V.
You should see columns A–D labelled StudyHW, TutorHours, ExamPrep, FinalGrade and rows 1..24 (header row + 23 students).

Layout after pasting:
  • A: StudyHW (hours/week doing homework)
  • B: TutorHours (hours/week with tutor/SI)
  • C: ExamPrep (hours/week exam prep)
  • D: FinalGrade (0–100 for this class)
We have n = 23 students and p = 4 parameters (intercept + 3 slopes).

Step B — β, SE, t, p using LINEST only

Model we are fitting
FinalGrade = β₀ + β₁·StudyHW + β₂·TutorHours + β₃·ExamPrep + ε
  • Y (response) = FinalGrade = column D.
  • X's (predictors) = StudyHW, TutorHours, ExamPrep = columns A, B, C.
We will use LINEST to get β̂, their standard errors, R², F, and df in one shot. Then we compute t and p from those cells.

Main recipe (for the project): use LINEST, then build a tiny summary table.

Goal: using only LINEST, build a clean β-table with SE, t, and p-values.

Data:
  A2:A24 = StudyHW
  B2:B24 = TutorHours
  C2:C24 = ExamPrep
  D2:D24 = FinalGrade (Y)

Part 1 — Run LINEST once (full 5×4 output)
------------------------------------------
1) Select a 5×4 empty block, for example F2:I6.

2) Type this formula and press Enter:
   =LINEST($D$2:$D$24,$A$2:$C$24,TRUE,TRUE)

In recent Excel (365/2021) this spills automatically into F2:I6. In older versions, select the whole F2:I6 block first and confirm with Ctrl+Shift+Enter.

The 5×4 block F2:I6 contains:

Row 1 (F2:I2) — coefficients:
  • F2 = slope for ExamPrep  (last X column)
  • G2 = slope for TutorHours
  • H2 = slope for StudyHW   (first X column)
  • I2 = intercept

Row 2 (F3:I3) — standard errors of those 4 numbers.

Row 3 (F4:I4):
  • F4 = R²
  • G4 = standard error of the estimate (RootMSE)

Row 4 (F5:I5):
  • F5 = F-statistic
  • G5 = residual df  (n − p = 23 − 4 = 19)

Row 5 (F6:I6):
  • F6 = regression SS
  • G6 = residual SS (SSE)

(Cells H4:I6 show #N/A — LINEST only uses the first two columns of rows 3–5.)
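As a cross-check, the same five rows of statistics can be reproduced in a few lines of numpy. This is a sketch with made-up numbers (not the class dataset); the comments map each quantity back to its cell in the F2:I6 block:

```python
import numpy as np

# Illustrative data only (23 rows, 3 predictors) -- not the class dataset.
rng = np.random.default_rng(0)
X3 = rng.uniform(0, 10, size=(23, 3))                 # StudyHW, TutorHours, ExamPrep
y = 50 + X3 @ np.array([2.0, 1.5, 1.0]) + rng.normal(0, 3, 23)

X = np.column_stack([np.ones(23), X3])                # intercept + predictors
n, p = X.shape                                        # 23, 4
beta = np.linalg.lstsq(X, y, rcond=None)[0]           # LINEST row 1 (order reversed)

resid = y - X @ beta
sse = resid @ resid                                   # residual SS   (G6)
sst = ((y - y.mean()) ** 2).sum()
ssr = sst - sse                                       # regression SS (F6)
df = n - p                                            # residual df   (G5) = 19
r2 = 1 - sse / sst                                    # R^2           (F4)
root_mse = np.sqrt(sse / df)                          # RootMSE       (G4)
f_stat = (ssr / (p - 1)) / (sse / df)                 # F-statistic   (F5)

# LINEST row 2: SE of each coefficient, from RootMSE^2 times diag((X'X)^-1).
se_beta = root_mse * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))
print(beta, se_beta, r2, root_mse, f_stat, df)
```

Note that numpy lists the coefficients intercept-first, while LINEST row 1 lists them in reverse order with the intercept last.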

Part 2 — Build a readable coefficient table
-------------------------------------------
Make a small table starting at K2:

K2: Term      | L2: Beta | M2: SE   | N2: t        | O2: p (two-sided)
-------------------------------------------------------------------------
K3: StudyHW   | L3: =H2  | M3: =H3  | N3: =L3/M3   | O3: =T.DIST.2T(ABS(N3),$G$5)
K4: TutorHours| L4: =G2  | M4: =G3  | N4: =L4/M4   | O4: =T.DIST.2T(ABS(N4),$G$5)
K5: ExamPrep  | L5: =F2  | M5: =F3  | N5: =L5/M5   | O5: =T.DIST.2T(ABS(N5),$G$5)
K6: Intercept | L6: =I2  | M6: =I3  | N6: =L6/M6   | O6: =T.DIST.2T(ABS(N6),$G$5)

Here:
  • L-column = β̂ (from LINEST row 1).
  • M-column = SE(β̂) (from LINEST row 2).
  • N-column = t-statistics.
  • O-column = p-values with df = G5 (LINEST df).
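The N and O columns above are just β̂/SE and a two-sided t probability. A minimal Python sketch (scipy, with illustrative numbers rather than class results):

```python
from scipy import stats

# Illustrative values, not class results: one slope, its SE, and df = 19.
beta_hat, se, df = 2.1, 0.8, 19

t_stat = beta_hat / se                           # Excel: =L3/M3
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)    # Excel: =T.DIST.2T(ABS(N3),$G$5)
print(t_stat, p_two_sided)
```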

Optional — record R² and RootMSE
--------------------------------
Somewhere convenient, record:

  R²:                =F4
  RootMSE:           =G4
  df (residual):     =G5
  SSE (residual SS): =G6

What to submit for this ICE
For the Excel-functions version, the key pieces are:
  • Your LINEST block (5×4) somewhere on the sheet.
  • Your β-table (Term, Beta, SE, t, p) built from those cells.
  • R², RootMSE, SSE, and df labelled clearly.

Step C — Correlation matrix in Excel (no ToolPak)

What is correlation r?
Correlation measures the strength and direction of a linear relationship between two variables.
  • Range: −1 ≤ r ≤ 1.
  • r > 0: as X increases, Y tends to increase.
  • r < 0: as X increases, Y tends to decrease.
  • |r| close to 1: strong linear association; |r| near 0: weak or no linear pattern.
Here we care about:
  • Correlation between each study variable and FinalGrade.
  • Correlation among the study variables themselves (possible collinearity).
Goal: build a 4×4 correlation matrix for StudyHW, TutorHours, ExamPrep, FinalGrade
using only CORREL. Put it under the data, in rows 27..31.

Assume:
A = StudyHW, B = TutorHours, C = ExamPrep, D = FinalGrade, rows 2..24 are students.

1) Labels for the matrix (B27:E27 across, A28:A31 down)
-------------------------------------------------------
B27: StudyHW
C27: TutorHours
D27: ExamPrep
E27: FinalGrade

A28: StudyHW
A29: TutorHours
A30: ExamPrep
A31: FinalGrade

2) Diagonal (correlation with itself = 1)
-----------------------------------------
B28: 1
C29: 1
D30: 1
E31: 1

3) Upper triangle — CORREL for each pair
----------------------------------------
C28: =CORREL($A$2:$A$24,$B$2:$B$24)   // r(StudyHW,TutorHours)
D28: =CORREL($A$2:$A$24,$C$2:$C$24)   // r(StudyHW,ExamPrep)
E28: =CORREL($A$2:$A$24,$D$2:$D$24)   // r(StudyHW,FinalGrade)

D29: =CORREL($B$2:$B$24,$C$2:$C$24)   // r(TutorHours,ExamPrep)
E29: =CORREL($B$2:$B$24,$D$2:$D$24)   // r(TutorHours,FinalGrade)

E30: =CORREL($C$2:$C$24,$D$2:$D$24)   // r(ExamPrep,FinalGrade)

4) Lower triangle — mirror the upper triangle
---------------------------------------------
B29: =C28
B30: =D28
B31: =E28

C30: =D29
C31: =E29

D31: =E30

Guideline:
  |r| < 0.3        → weak
  0.3 ≤ |r| ≤ 0.7  → moderate
  |r| > 0.7        → strong linear association.
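For comparison, the whole CORREL matrix is a single call in Python. The numbers below are illustrative, not the class data:

```python
import pandas as pd

# Illustrative numbers, not the class dataset.
df = pd.DataFrame({
    "StudyHW":    [2, 4, 6, 8, 10, 12],
    "TutorHours": [1, 1, 2, 2, 3, 3],
    "ExamPrep":   [0, 2, 1, 3, 2, 4],
    "FinalGrade": [60, 68, 72, 80, 85, 93],
})

# 4x4 matrix; corr.loc[a, b] matches =CORREL(a-range, b-range) in Excel.
corr = df.corr()
print(corr.round(3))
```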

Step D — VIF (Variance Inflation Factor) using LINEST

What is VIF?
For each predictor Xj, VIF tells you how much the variance of its slope β̂j is inflated by collinearity with the other X’s.
  • For each Xj, compute R²j by regressing Xj on all the other predictors.
  • Then VIFⱼ = 1 / (1 − Rⱼ²).
Rules of thumb:
  • VIF ≈ 1–2: essentially no collinearity.
  • VIF ≈ 2–5: noticeable, but often acceptable.
  • VIF > 5: potential concern.
  • VIF > 10: serious multicollinearity; slopes may be unstable.
Goal: compute VIF for StudyHW, TutorHours, ExamPrep using LINEST.
Place these formulas anywhere in a blank area (e.g., AA10:AB12).

Assume:
A = StudyHW, B = TutorHours, C = ExamPrep, rows 2..24 are students.

1) VIF for StudyHW (regress StudyHW on TutorHours and ExamPrep)
----------------------------------------------------------------
AA10: "VIF StudyHW"
AB10:
=1/(1-INDEX(
      LINEST($A$2:$A$24,$B$2:$C$24,TRUE,TRUE),
      3,1))

Explanation:
  • LINEST(Y,X,TRUE,TRUE) with stats=TRUE returns a 5-row block with one column per coefficient (here 5×3).
  • In that block, row 3, column 1 is R² for this auxiliary regression.
  • Here Y = StudyHW (A), X = (TutorHours, ExamPrep) (B,C).

2) VIF for TutorHours (regress TutorHours on StudyHW and ExamPrep)
------------------------------------------------------------------
AA11: "VIF TutorHours"
AB11:
=1/(1-INDEX(
      LINEST($B$2:$B$24,
             CHOOSE({1,2},$A$2:$A$24,$C$2:$C$24),
             TRUE,TRUE),
      3,1))

Because columns A and C are not adjacent, CHOOSE({1,2},A-range,C-range) stitches them into a 2-column X matrix (StudyHW, ExamPrep).

3) VIF for ExamPrep (regress ExamPrep on StudyHW and TutorHours)
----------------------------------------------------------------
AA12: "VIF ExamPrep"
AB12:
=1/(1-INDEX(
      LINEST($C$2:$C$24,
             CHOOSE({1,2},$A$2:$A$24,$B$2:$B$24),
             TRUE,TRUE),
      3,1))

Interpretation:
  • If all three VIFs are around 1–3 → collinearity is mild.
  • If any are > 5 → keep an eye on it.
  • If any are > 10 → serious multicollinearity.
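The same regress-each-Xj-on-the-others recipe can be sketched in Python with numpy (simulated data, with the tutor column deliberately built to correlate with the study column):

```python
import numpy as np

# Illustrative data: "tutor" is constructed to correlate with "study".
rng = np.random.default_rng(1)
study = rng.uniform(0, 10, 23)
tutor = 0.5 * study + rng.normal(0, 1, 23)
prep = rng.uniform(0, 5, 23)
X = np.column_stack([study, tutor, prep])

def vif(X, j):
    """Regress column j on the others (plus intercept); return 1/(1 - R^2)."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1 / (1 - r2)

for j, name in enumerate(["StudyHW", "TutorHours", "ExamPrep"]):
    print(name, round(vif(X, j), 2))
```

Because the auxiliary regression includes an intercept, R²j ≥ 0 and every VIF is at least 1.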

Step E — Mini wrap-up (how to talk about the results)

Turn your Excel work into a short paragraph: state the fitted equation, say which slopes are significant (t and p from your β-table), quote R² and RootMSE, and note whether the correlations or VIFs flag collinearity. Copy it into Word and fill in the numbers from your β-table and LINEST output.

Optional — Matrix OLS with MMULT (for the curious only)

This part is not required for the project. It shows how the same β̂ from LINEST can be computed using the matrix OLS formula β̂ = (XᵀX)⁻¹ XᵀY with Excel functions.

Goal: build the design matrix X and compute β̂ with MMULT/MINVERSE.

1) Build the design matrix X (with intercept)
---------------------------------------------
Work on a fresh worksheet — these ranges overlap the LINEST block (F2:I6) and β-table (K2:O6) from Step B. Put X in F2:I24:

F1: Intercept
G1: StudyHW
H1: TutorHours
I1: ExamPrep

F2: =1
G2: =A2
H2: =B2
I2: =C2

Fill F2:I2 down to row 24. Then:

  X = F2:I24   (23×4)
  Y = D2:D24   (23×1)

2) Compute β̂ = (XᵀX)⁻¹ XᵀY
----------------------------
In cell K2, type the formula below (LET and automatic spilling require Excel 365/2021):

=LET(
   X,$F$2:$I$24,
   Y,$D$2:$D$24,
   MMULT(
     MINVERSE( MMULT(TRANSPOSE(X),X) ),
     MMULT(TRANSPOSE(X),Y)
   )
 )

Press Enter. This spills β̂ into:

  K2 = β̂0  (Intercept)
  K3 = β̂1  (StudyHW)
  K4 = β̂2  (TutorHours)
  K5 = β̂3  (ExamPrep)

Explanation of the long formula:
  • TRANSPOSE(X) = Xᵀ
  • MMULT(TRANSPOSE(X),X) = XᵀX (square 4×4 matrix)
  • MINVERSE( … ) = (XᵀX)⁻¹
  • MMULT(TRANSPOSE(X),Y) = XᵀY (4×1 column)
  • Final MMULT = (XᵀX)⁻¹ · XᵀY = β̂

These β̂ values should exactly match the coefficients in LINEST row 1 (which lists them in reverse order, with the intercept last).
Big idea: LINEST is doing this matrix work behind the scenes for you. The matrix version is only to show what is happening under the hood.
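The MMULT/MINVERSE chain maps almost one-to-one onto numpy (illustrative data standing in for the spreadsheet ranges):

```python
import numpy as np

# Illustrative data standing in for the 23x4 design matrix and the Y column.
rng = np.random.default_rng(2)
X3 = rng.uniform(0, 10, (23, 3))
y = 50 + X3 @ np.array([2.0, 1.5, 1.0]) + rng.normal(0, 3, 23)
X = np.column_stack([np.ones(23), X3])   # like F2:I24, intercept column first

# Literal translation of
#   MMULT(MINVERSE(MMULT(TRANSPOSE(X),X)), MMULT(TRANSPOSE(X),Y)):
beta_inv = np.linalg.inv(X.T @ X) @ (X.T @ y)

# Same answer via least squares (numerically preferable to an explicit inverse):
beta_lsq = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_inv)
```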

Bonus — Same analysis in Python

Why scripts?
Excel is perfect for learning the mechanics. For larger or professional projects, a script (Python/R/Stata) is easier to rerun, document, and extend. The script below:
  • Reads the same CSV.
  • Runs FinalGrade ~ StudyHW + TutorHours + ExamPrep.
  • Prints β, SE, t, p, R², etc.
  • Prints the correlation matrix and VIFs.
# study_mlr_linest_style.py
# Simple MLR (FinalGrade ~ StudyHW + TutorHours + ExamPrep)
# using the same data as the Excel ICE, but in Python.

import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor


def main():
    # 1) Read the data ---------------------------------------------------------
    # In Excel: File → Save As → CSV (Comma delimited) and name it "study_mlr.csv".
    df = pd.read_csv("study_mlr.csv")

    print("First few rows of the data:")
    print(df.head(), "\n")

    # 2) Run the multiple regression ------------------------------------------
    model = smf.ols("FinalGrade ~ StudyHW + TutorHours + ExamPrep", data=df).fit()
    print("OLS regression results:")
    print(model.summary())
    print()

    # 3) Correlation matrix ---------------------------------------------------
    print("Correlation matrix:")
    print(df[["StudyHW", "TutorHours", "ExamPrep", "FinalGrade"]].corr())
    print()

    # 4) VIF ------------------------------------------------------------------
    X = df[["StudyHW", "TutorHours", "ExamPrep"]]
    X = sm.add_constant(X)

    vif_table = []
    for i in range(X.shape[1]):
        vif_table.append({
            "variable": X.columns[i],
            "VIF": variance_inflation_factor(X.values, i)
        })

    # Note: the "const" row's VIF is not meaningful; compare only the
    # predictor rows with your Excel VIF formulas.
    vif_df = pd.DataFrame(vif_table)
    print("VIF table:")
    print(vif_df)
    print()
    print("Hints:")
    print("- Compare β, SE, t, p, and R² with your Excel LINEST output.")
    print("- Check that the VIF values match your Excel VIF formulas.")


if __name__ == "__main__":
    main()

Math blog (optional) — OLS, correlation, and VIF

1. OLS multiple regression (matrix view)

Our model is
FinalGrade = β₀ + β₁·StudyHW + β₂·TutorHours + β₃·ExamPrep + ε.

In matrix form:

Y   = n×1 column of FinalGrade values
X   = n×p design matrix (columns: 1, StudyHW, TutorHours, ExamPrep)
β   = p×1 column of unknown coefficients
ε   = n×1 column of errors

Model:  Y = Xβ + ε
      

Ordinary Least Squares (OLS) chooses β to minimize Σ (yᵢ − ŷᵢ)². The solution:

β̂ = (XᵀX)⁻¹ XᵀY
      

LINEST is essentially computing this β̂ plus SE, R², F, etc., for you.

2. Correlation r(X,Y)

The sample correlation between variables X and Y is:

r(X,Y) = [ Σ (xᵢ − x̄)(yᵢ − ȳ) ]  /  [ (n−1)·sₓ·sᵧ ]

where:
  x̄   = mean of X
  ȳ   = mean of Y
  sₓ²  = Σ (xᵢ − x̄)² / (n−1)
  sᵧ²  = Σ (yᵢ − ȳ)² / (n−1)
      

Excel’s CORREL function computes exactly this number.

3. VIF — variance inflation factor

For each predictor Xⱼ, let Rⱼ² be the R² from regressing Xⱼ on the other predictors. Then:

VIFⱼ = 1 / (1 − Rⱼ²)
      

If there were no collinearity, the variance of β̂j would look like σ² / Σ(xⱼᵢ − x̄ⱼ)². With collinearity:

Var(β̂ⱼ) = [σ² / Σ(xⱼᵢ − x̄ⱼ)²] · VIFⱼ
      

So VIFⱼ is literally the factor by which the variance (and standard error) of β̂j is blown up by multicollinearity.
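This "blown up by VIFⱼ" claim can be verified numerically — a numpy sketch with two simulated, deliberately correlated predictors and an assumed σ²:

```python
import numpy as np

# Two correlated predictors; check Var(beta_1) = [sigma^2 / Sxx_1] * VIF_1.
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(0, 1, n)
x2 = 0.9 * x1 + rng.normal(0, np.sqrt(1 - 0.9 ** 2), n)   # corr(x1, x2) near 0.9
X = np.column_stack([np.ones(n), x1, x2])

sigma2 = 4.0                                   # assumed error variance
cov_beta = sigma2 * np.linalg.inv(X.T @ X)     # Var(beta-hat) = sigma^2 (X'X)^-1
var_b1 = cov_beta[1, 1]

r2 = np.corrcoef(x1, x2)[0, 1] ** 2            # R^2 of x1 regressed on x2
vif1 = 1 / (1 - r2)
base = sigma2 / ((x1 - x1.mean()) ** 2).sum()  # variance if x1 were orthogonal

print(var_b1, base * vif1)                     # the two agree
```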


Optional prework & videos

These videos are optional. They help connect the Excel work to the big-picture ideas in regression.

1. Quick intro to simple linear regression

2. Multicollinearity in regression analysis

3. Leverage and Cook’s D (influential points)

Optional video — LINEST, F.DIST.RT, T.DIST.2T & ABS in Excel

This video shows how to use LINEST, F.DIST.RT, T.DIST.2T, and ABS to do significance tests in Excel. It matches the way we build t- and p-values in this ICE.

© Ch12 – Simple MLR (Excel LINEST only; matrix OLS optional) • Single-file, offline.