MMULT/MINVERSE is optional, for curious students only.
1. Click Copy CSV. 2. In Excel, click into cell A1 (not edit mode) and press Ctrl+V. You should see A–D as StudyHW, TutorHours, ExamPrep, FinalGrade and rows 1..24 (23 students).
FinalGrade = β₀ + β₁·StudyHW + β₂·TutorHours + β₃·ExamPrep + ε
LINEST to get β̂, their standard errors, R², F, and df in one shot.
Then we compute t and p from those cells.
Main recipe (for the project): use LINEST, then build a tiny summary table.
Goal: using only LINEST, build a clean β-table with SE, t, and p-values. Data: A2:A24 = StudyHW B2:B24 = TutorHours C2:C24 = ExamPrep D2:D24 = FinalGrade (Y) Part 1 — Run LINEST once (full 5×4 output) ------------------------------------------ 1) Select a 5×4 empty block, for example F2:I6. 2) Type this formula and press Enter: =LINEST($D$2:$D$24,$A$2:$C$24,TRUE,TRUE) In recent Excel this spills automatically into F2:I6. The 5×4 block F2:I6 contains: Row 1 (F2:I2) — coefficients: • F2 = slope for ExamPrep (last X column) • G2 = slope for TutorHours • H2 = slope for StudyHW (first X column) • I2 = intercept Row 2 (F3:I3) — standard errors of those 4 numbers. Row 3 (F4:I4): • F4 = R² • G4 = standard error of y (RootMSE) Row 4 (F5:I5): • F5 = F-statistic • G5 = df (here = 19) Row 5 (F6:I6): • F6 = regression SS • G6 = residual SS Part 2 — Build a readable coefficient table ------------------------------------------- Make a small table starting at K2: K2: Term | L2: Beta | M2: SE | N2: t | O2: p (two-sided) ------------------------------------------------------------------------- K3: StudyHW | L3: =H2 | M3: =H3 | N3: =L3/M3 | O3: =T.DIST.2T(ABS(N3),$G$5) K4: TutorHours| L4: =G2 | M4: =G3 | N4: =L4/M4 | O4: =T.DIST.2T(ABS(N4),$G$5) K5: ExamPrep | L5: =F2 | M5: =F3 | N5: =L5/M5 | O5: =T.DIST.2T(ABS(N5),$G$5) K6: Intercept | L6: =I2 | M6: =I3 | N6: =L6/M6 | O6: =T.DIST.2T(ABS(N6),$G$5) Here: • L-column = β̂ (from LINEST row 1). • M-column = SE(β̂) (from LINEST row 2). • N-column = t-statistics. • O-column = p-values with df = G5 (LINEST df). Optional — record R² and RootMSE -------------------------------- Somewhere convenient: R²: cell =F4 RootMSE: cell =G4 df: cell =G5 SSE (residual SS): =G6
Goal: build a 4×4 correlation matrix for StudyHW, TutorHours, ExamPrep, FinalGrade using only CORREL. Put it under the data, in rows 27..31. Assume: A = StudyHW, B = TutorHours, C = ExamPrep, D = FinalGrade, rows 2..24 are students. 1) Labels for the matrix (B27:E31) ---------------------------------- B27: StudyHW C27: TutorHours D27: ExamPrep E27: FinalGrade A28: StudyHW A29: TutorHours A30: ExamPrep A31: FinalGrade 2) Diagonal (correlation with itself = 1) ----------------------------------------- B28: 1 C29: 1 D30: 1 E31: 1 3) Upper triangle — CORREL for each pair ---------------------------------------- C28: =CORREL($A$2:$A$24,$B$2:$B$24) // r(StudyHW,TutorHours) D28: =CORREL($A$2:$A$24,$C$2:$C$24) // r(StudyHW,ExamPrep) E28: =CORREL($A$2:$A$24,$D$2:$D$24) // r(StudyHW,FinalGrade) D29: =CORREL($B$2:$B$24,$C$2:$C$24) // r(TutorHours,ExamPrep) E29: =CORREL($B$2:$B$24,$D$2:$D$24) // r(TutorHours,FinalGrade) E30: =CORREL($C$2:$C$24,$D$2:$D$24) // r(ExamPrep,FinalGrade) 4) Lower triangle — mirror the upper triangle --------------------------------------------- B29: =C28 B30: =D28 B31: =E28 C30: =D29 C31: =E29 D31: =E30 Guideline: |r| < 0.3 → weak 0.3–0.7 → moderate > 0.7 → strong linear association.
VIFⱼ = 1 / (1 − Rⱼ²).
Goal: compute VIF for StudyHW, TutorHours, ExamPrep using LINEST.
Place these formulas anywhere in a blank area (e.g., AA10:AB12).
Assume:
A = StudyHW, B = TutorHours, C = ExamPrep, rows 2..24 are students.
1) VIF for StudyHW (regress StudyHW on TutorHours and ExamPrep)
----------------------------------------------------------------
AA10: "VIF StudyHW"
AB10:
=1/(1-INDEX(
LINEST($A$2:$A$24,$B$2:$C$24,TRUE,TRUE),
3,1))
Explanation:
• LINEST(Y,X,TRUE,TRUE) with stats=TRUE returns a 5×k block.
• In that block, row 3, column 1 is R² for this regression.
• Here Y = StudyHW (A), X = (TutorHours, ExamPrep) (B,C).
2) VIF for TutorHours (regress TutorHours on StudyHW and ExamPrep)
------------------------------------------------------------------
AA11: "VIF TutorHours"
AB11:
=1/(1-INDEX(
LINEST($B$2:$B$24,
CHOOSE({1,2},$A$2:$A$24,$C$2:$C$24),
TRUE,TRUE),
3,1))
CHOOSE({1,2},A-range,C-range) builds a 2-column X matrix (StudyHW, ExamPrep).
3) VIF for ExamPrep (regress ExamPrep on StudyHW and TutorHours)
----------------------------------------------------------------
AA12: "VIF ExamPrep"
AB12:
=1/(1-INDEX(
LINEST($C$2:$C$24,
CHOOSE({1,2},$A$2:$A$24,$B$2:$B$24),
TRUE,TRUE),
3,1))
Interpretation:
• If all three VIFs are around 1–3 → collinearity is mild.
• If any are > 5 → keep an eye on it.
• If any are > 10 → serious multicollinearity.
Use this template to turn your Excel work into a short paragraph. Copy into Word and fill in numbers from your β-table and LINEST output.
This part is not required for the project. It shows how the same β̂ from LINEST can be
computed using the matrix OLS formula
β̂ = (XᵀX)⁻¹ XᵀY with Excel functions.
Optional: build the design matrix X and compute β̂ with MMULT/MINVERSE.
1) Build the design matrix X (with intercept)
---------------------------------------------
Put X in F2:I24:
F1: Intercept
G1: StudyHW
H1: TutorHours
I1: ExamPrep
F2: =1
G2: =A2
H2: =B2
I2: =C2
Fill F2:I2 down to row 24. Then:
X = F2:I24 (23×4)
Y = D2:D24 (23×1)
2) Compute β̂ = (XᵀX)⁻¹ XᵀY
----------------------------
Select cells K2:K5 (4×1 block) and type:
=LET(
X,$F$2:$I$24,
Y,$D$2:$D$24,
MMULT(
MINVERSE( MMULT(TRANSPOSE(X),X) ),
MMULT(TRANSPOSE(X),Y)
)
)
Press Enter. This spills β̂ into:
K2 = β̂0 (Intercept)
K3 = β̂1 (StudyHW)
K4 = β̂2 (TutorHours)
K5 = β̂3 (ExamPrep)
Explanation of the long formula:
• TRANSPOSE(X) = Xᵀ
• MMULT(TRANSPOSE(X),X) = XᵀX (square 4×4 matrix)
• MINVERSE( … ) = (XᵀX)⁻¹
• MMULT(TRANSPOSE(X),Y) = XᵀY (4×1 column)
• Final MMULT = (XᵀX)⁻¹ · XᵀY = β̂
These β̂ values should match exactly the coefficients from LINEST row 1.
FinalGrade ~ StudyHW + TutorHours + ExamPrep.
# study_mlr_linest_style.py
# Simple MLR (FinalGrade ~ StudyHW + TutorHours + ExamPrep)
# using the same data as the Excel ICE, but in Python.
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
def main():
# 1) Read the data ---------------------------------------------------------
# In Excel: File → Save As → CSV (Comma delimited) and name it "study_mlr.csv".
df = pd.read_csv("study_mlr.csv")
print("First few rows of the data:")
print(df.head(), "\n")
# 2) Run the multiple regression ------------------------------------------
model = smf.ols("FinalGrade ~ StudyHW + TutorHours + ExamPrep", data=df).fit()
print("OLS regression results:")
print(model.summary())
print()
# 3) Correlation matrix ---------------------------------------------------
print("Correlation matrix:")
print(df[["StudyHW", "TutorHours", "ExamPrep", "FinalGrade"]].corr())
print()
# 4) VIF ------------------------------------------------------------------
X = df[["StudyHW", "TutorHours", "ExamPrep"]]
X = sm.add_constant(X)
vif_table = []
for i in range(X.shape[1]):
vif_table.append({
"variable": X.columns[i],
"VIF": variance_inflation_factor(X.values, i)
})
vif_df = pd.DataFrame(vif_table)
print("VIF table:")
print(vif_df)
print()
print("Hints:")
print("- Compare β, SE, t, p, and R² with your Excel LINEST output.")
print("- Check that the VIF values match your Excel VIF formulas.")
if __name__ == "__main__":
main()
Our model is
FinalGrade = β₀ + β₁·StudyHW + β₂·TutorHours + β₃·ExamPrep + ε.
In matrix form:
Y = n×1 column of FinalGrade values
X = n×p design matrix (columns: 1, StudyHW, TutorHours, ExamPrep)
β = p×1 column of unknown coefficients
ε = n×1 column of errors
Model: Y = Xβ + ε
Ordinary Least Squares (OLS) chooses β to minimize Σ (yᵢ − ŷᵢ)². The solution:
β̂ = (XᵀX)⁻¹ XᵀY
LINEST is essentially computing this β̂ plus SE, R², F, etc., for you.
The sample correlation between variables X and Y is:
r(X,Y) = [ Σ (xᵢ − x̄)(yᵢ − ȳ) ] / [ (n−1)·sₓ·sᵧ ]
where:
x̄ = mean of X
ȳ = mean of Y
sₓ² = Σ (xᵢ − x̄)² / (n−1)
sᵧ² = Σ (yᵢ − ȳ)² / (n−1)
Excel’s CORREL function computes exactly this number.
For each predictor Xj, let Rj2 be the R² from regressing Xj on the other predictors. Then:
VIFⱼ = 1 / (1 − Rⱼ²)
If there were no collinearity, the variance of β̂j would look like
σ² / Σ(xⱼᵢ − x̄ⱼ)². With collinearity:
Var(β̂ⱼ) = [σ² / Σ(xⱼᵢ − x̄ⱼ)²] · VIFⱼ
So VIFⱼ is literally the factor by which the variance (and standard error) of β̂j is blown up by multicollinearity.
These videos are optional. They help connect the Excel work to the big-picture ideas in regression.
This video shows how to use LINEST, F.DIST.RT,
T.DIST.2T, and ABS to do significance tests
in Excel. It matches the way we build t- and p-values in this ICE.
© Ch12 – Simple MLR (Excel LINEST only; matrix OLS optional) • Single-file, offline.