In multiple linear regression (MLR), writing out the equations term by term becomes cumbersome as the number of predictors grows. The matrix form gives a compact, efficient way to represent and compute the model, and it maps directly onto software tools such as Excel, R, or Python.
The least squares estimator minimizes the residual sum of squares:

$$RSS(\beta) = (Y - X\beta)'(Y - X\beta), \qquad \hat{\beta} = (X'X)^{-1}X'Y$$
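For reference, the closed form follows from setting the gradient of the RSS to zero, which produces the normal equations:

$$\frac{\partial RSS}{\partial \beta} = -2X'Y + 2X'X\beta = 0 \;\Longrightarrow\; X'X\hat{\beta} = X'Y \;\Longrightarrow\; \hat{\beta} = (X'X)^{-1}X'Y$$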
We model GPA as a function of study hours (x₁), sleep hours (x₂), and attendance (x₃):
| Student | GPA (Y) | Study (x₁) | Sleep (x₂) | Attendance (x₃) | 
|---|---|---|---|---|
| 1 | 3.5 | 15 | 7 | 40 | 
| 2 | 3.8 | 20 | 8 | 42 | 
| 3 | 2.9 | 10 | 6 | 35 | 
| 4 | 3.2 | 12 | 6.5 | 38 | 
| 5 | 3.7 | 18 | 7.5 | 41 | 
Design Matrix X (5×4):
$$X = \begin{bmatrix} 1 & 15 & 7.0 & 40 \\ 1 & 20 & 8.0 & 42 \\ 1 & 10 & 6.0 & 35 \\ 1 & 12 & 6.5 & 38 \\ 1 & 18 & 7.5 & 41 \end{bmatrix}$$
Response Vector Y (5×1):
$$Y = \begin{bmatrix} 3.5 \\ 3.8 \\ 2.9 \\ 3.2 \\ 3.7 \end{bmatrix}$$
Cross-Product Matrix X′X (4×4):

$$X'X = \begin{bmatrix} 5 & 75 & 35.0 & 196 \\ 75 & 1193 & 538.0 & 2984 \\ 35 & 538.0 & 247.5 & 1380.5 \\ 196 & 2984 & 1380.5 & 7714 \end{bmatrix}$$

Cross-Product Vector X′Y (4×1):

$$X'Y = \begin{bmatrix} 17.1 \\ 262.5 \\ 120.85 \\ 674.4 \end{bmatrix}$$
Solving the normal equations X′X β̂ = X′Y gives:

$$\hat{\beta} = (X'X)^{-1}X'Y = \begin{bmatrix} 0.8 \\ 0.1 \\ -0.4 \\ 0.1 \end{bmatrix}$$
Final model: GPA = 0.8 + 0.1×Study − 0.4×Sleep + 0.1×Attendance. (With five observations and four coefficients, this toy dataset happens to be fit exactly, so every residual is zero; with such a small sample, individual coefficients, such as the negative sign on Sleep, should not be over-interpreted.)
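As a check on the hand computation, here is a minimal NumPy sketch (variable names are illustrative) that builds X and Y from the table and solves the normal equations:

```python
import numpy as np

# Design matrix: intercept column, study hours, sleep hours, attendance
X = np.array([
    [1, 15, 7.0, 40],
    [1, 20, 8.0, 42],
    [1, 10, 6.0, 35],
    [1, 12, 6.5, 38],
    [1, 18, 7.5, 41],
])
Y = np.array([3.5, 3.8, 2.9, 3.2, 3.7])

XtX = X.T @ X  # 4x4 cross-product matrix
XtY = X.T @ Y  # 4x1 vector

# Solve X'X beta = X'Y directly; solving the linear system is
# numerically preferable to forming the inverse explicitly
beta_hat = np.linalg.solve(XtX, XtY)
print(beta_hat)  # approximately [0.8, 0.1, -0.4, 0.1]
```

In practice, `np.linalg.lstsq(X, Y, rcond=None)` is the more robust route, since it avoids forming X′X at all.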
Assumptions: Errors εᵢ are independent, E(εᵢ) = 0, and Var(εᵢ) = σ².
Unbiasedness:
E[β̂] = E[(X′X)⁻¹X′Y] = E[(X′X)⁻¹X′(Xβ + ε)] = β + (X′X)⁻¹X′E[ε] = β

because E(ε) = 0 and (X′X)⁻¹X′X = I.
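A quick simulation makes the claim concrete: holding X fixed and redrawing the errors many times, the average of β̂ across replications approaches the true β. The true β and σ below are arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed design matrix (same as the example) and an assumed "true" beta
X = np.array([
    [1, 15, 7.0, 40],
    [1, 20, 8.0, 42],
    [1, 10, 6.0, 35],
    [1, 12, 6.5, 38],
    [1, 18, 7.5, 41],
])
beta_true = np.array([0.8, 0.1, -0.4, 0.1])
sigma = 0.2  # illustrative error standard deviation

# Redraw errors, refit, and average the estimates
estimates = []
for _ in range(20_000):
    eps = rng.normal(0.0, sigma, size=X.shape[0])
    Y = X @ beta_true + eps
    estimates.append(np.linalg.solve(X.T @ X, X.T @ Y))

print(np.mean(estimates, axis=0))  # close to beta_true, since E[beta_hat] = beta
```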
Covariance Matrix of β̂:
Cov(β̂) = σ² (X′X)⁻¹ = σ² C
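This follows from the linearity rule Cov(AY) = A·Cov(Y)·A′ applied with A = (X′X)⁻¹X′ and Cov(Y) = σ²I:

$$\operatorname{Cov}(\hat{\beta}) = (X'X)^{-1}X'\,(\sigma^2 I)\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$$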
Standard Error:
se(β̂ⱼ) = √(σ̂² Cⱼⱼ), where σ̂² = RSS/(n − p) estimates σ² (n observations, p estimated coefficients; here p = 4).
Interpretation: a small se(β̂ⱼ) means β̂ⱼ is estimated precisely. Statistical software reports these standard errors alongside each coefficient estimate.
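These formulas translate directly into NumPy. Because the five-student example fits exactly (RSS = 0, so σ̂² = 0), the sketch below uses simulated data instead; the sample size, predictor ranges, and noise level are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: n observations, intercept plus three predictors
n = 50
X = np.column_stack([
    np.ones(n),
    rng.uniform(5, 25, n),   # study hours
    rng.uniform(5, 9, n),    # sleep hours
    rng.uniform(30, 45, n),  # attendance
])
beta_true = np.array([0.8, 0.1, -0.4, 0.1])
Y = X @ beta_true + rng.normal(0.0, 0.3, size=n)

p = X.shape[1]                  # number of estimated coefficients
C = np.linalg.inv(X.T @ X)      # C = (X'X)^-1
beta_hat = C @ X.T @ Y

resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)   # sigma-hat^2 = RSS / (n - p)

se = np.sqrt(sigma2_hat * np.diag(C))  # se(beta_j) = sqrt(sigma2_hat * C_jj)
for name, b, s in zip(["Intercept", "Study", "Sleep", "Attendance"], beta_hat, se):
    print(f"{name:>10}: {b: .4f}  (se = {s:.4f})")
```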