📘 Sessions 12.1.3–12.1.4 – Matrix Form of the Multiple Linear Regression Model

📌 Purpose of Matrix Representation

In multiple linear regression (MLR), writing out a separate scalar equation for every observation becomes cumbersome as the number of predictors grows. The matrix form provides a compact, efficient way to represent and compute the model, and it is the representation that software such as Excel, R, or Python works with.

🧮 General MLR Model in Matrix Notation

Y = Xβ + ε

🧾 Component Breakdown

Y vector (n×1 response vector):

Y = [y₁, y₂, ..., yₙ]ᵀ

X matrix (n×(k+1) design matrix; the first column of 1s carries the intercept):

X =
[1  x₁₁  x₁₂  ...  x₁ₖ]
[1  x₂₁  x₂₂  ...  x₂ₖ]
[⋮   ⋮    ⋮          ⋮]
[1  xₙ₁  xₙ₂  ...  xₙₖ]

β vector ((k+1)×1 coefficient vector):

β = [β₀, β₁, ..., βₖ]ᵀ

ε vector (n×1 error vector):

ε = [ε₁, ε₂, ..., εₙ]ᵀ
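
To make the shapes concrete, here is a minimal numpy sketch that simulates one data set from Y = Xβ + ε. The sample size, coefficient values, and error standard deviation are made-up illustrations, not values from this section.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 5, 3                             # 5 observations, 3 predictors (illustrative)
x = rng.uniform(0, 10, size=(n, k))     # raw predictor values
X = np.column_stack([np.ones(n), x])    # design matrix: column of 1s, then predictors
beta = np.array([1.0, 0.5, -0.2, 0.1])  # (k+1)-vector of coefficients (made up)
eps = rng.normal(0.0, 0.3, size=n)      # errors: independent, mean 0, sd 0.3
Y = X @ beta + eps                      # the model Y = X*beta + eps, all rows at once
```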

🔢 Least Squares Estimator

The least squares estimator minimizes the residual sum of squares:

β̂ = (XᵀX)⁻¹XᵀY
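
In software this estimator is a single linear solve. Below is a minimal sketch, assuming the predictors arrive as an n×k array without the intercept column (the helper name ols_beta_hat is ours); it solves the normal equations with np.linalg.solve rather than forming (XᵀX)⁻¹ explicitly, which is the numerically safer choice.

```python
import numpy as np

def ols_beta_hat(x, y):
    """Least squares estimate of beta via the normal equations (X'X) b = X'Y."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])   # prepend the column of 1s for beta_0
    # Mathematically (X'X)^(-1) X'Y, but solve() is preferred over inv().
    return np.linalg.solve(X.T @ X, X.T @ y)
```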

🎯 Model: Student GPA

We model GPA as a function of study hours (x₁), sleep hours (x₂), and attendance (x₃):

Y = Xβ + ε

📊 Student Data

| Student | GPA (Y) | Study (x₁) | Sleep (x₂) | Attendance (x₃) |
|---------|---------|------------|------------|-----------------|
| 1       | 3.5     | 15         | 7.0        | 40              |
| 2       | 3.8     | 20         | 8.0        | 42              |
| 3       | 2.9     | 10         | 6.0        | 35              |
| 4       | 3.2     | 12         | 6.5        | 38              |
| 5       | 3.7     | 18         | 7.5        | 41              |

🧮 Matrix Setup

Design Matrix X (5×4):

X =
[1  15  7.0   40]
[1  20  8.0   42]
[1  10  6.0   35]
[1  12  6.5   38]
[1  18  7.5   41]

Response Vector Y (5×1):

Y = [3.5, 3.8, 2.9, 3.2, 3.7]ᵀ

🔢 Step-by-Step Least Squares

1. Compute XᵀX

[5      75      35.0    196   ]
[75     1193    538.0   2984  ]
[35     538.0   247.5   1380.5]
[196    2984    1380.5  7714  ]

2. Compute XᵀY

[17.1  ]
[262.5 ]
[120.85]
[674.4 ]

3. Compute β̂ = (XᵀX)⁻¹ XᵀY

β̂ = [0.8, 0.1, −0.4, 0.1]ᵀ

Final model: GPA = 0.8 + 0.1×Study − 0.4×Sleep + 0.1×Attendance
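
The three steps above can be checked with a few lines of numpy; this sketch rebuilds X and Y from the table and reproduces XᵀX, XᵀY, and β̂.

```python
import numpy as np

X = np.array([[1, 15, 7.0, 40],
              [1, 20, 8.0, 42],
              [1, 10, 6.0, 35],
              [1, 12, 6.5, 38],
              [1, 18, 7.5, 41]])
Y = np.array([3.5, 3.8, 2.9, 3.2, 3.7])

XtX = X.T @ X                         # step 1: X'X
XtY = X.T @ Y                         # step 2: X'Y
beta_hat = np.linalg.solve(XtX, XtY)  # step 3: solve (X'X) b = X'Y

print(XtX)
print(XtY)       # [ 17.1  262.5  120.85  674.4 ]
print(beta_hat)  # approximately [ 0.8  0.1  -0.4  0.1 ]
```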

✅ Interpretation

Each slope is the expected change in GPA for a one-unit increase in that predictor, holding the others fixed: +0.1 per study hour, −0.4 per sleep hour, and +0.1 per attendance unit. The intercept 0.8 is the predicted GPA when all predictors are zero, which lies far outside the observed data. Note that with n = 5 observations and k + 1 = 4 coefficients, this small illustrative data set is fit exactly (all residuals are zero), so the counterintuitive negative sleep coefficient is an artifact of the toy example rather than a general finding.

📐 Section 12.1.4 – Properties of Least Squares Estimators

Assumptions: Errors εᵢ are independent, E(εᵢ) = 0, and Var(εᵢ) = σ².

Unbiasedness:

Substituting Y = Xβ + ε into the estimator:

β̂ = (XᵀX)⁻¹XᵀY = (XᵀX)⁻¹Xᵀ(Xβ + ε) = β + (XᵀX)⁻¹Xᵀε

since (XᵀX)⁻¹XᵀX = I. Taking expectations and using E(ε) = 0 gives E[β̂] = β.

Covariance Matrix of β̂:

Cov(β̂) = σ²(XᵀX)⁻¹ = σ²C

Standard Error:

se(β̂ⱼ) = √(σ̂² Cⱼⱼ), where σ̂² = SSE/(n − k − 1) is the unbiased estimate of σ² and Cⱼⱼ is the j-th diagonal element of C.

Interpretation: A small se(β̂ⱼ) means the coefficient is estimated precisely. Standard regression output (Excel, R, Python) reports a standard error next to each estimated coefficient.
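
A minimal sketch of these formulas, assuming the same array layout as the earlier snippets (the helper name ols_standard_errors is ours): it estimates σ² from the residuals and reads the Cⱼⱼ terms off the diagonal of (XᵀX)⁻¹.

```python
import numpy as np

def ols_standard_errors(x, y):
    """se(b_j) = sqrt(sigma2_hat * C_jj), with C = (X'X)^(-1)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    p = X.shape[1]                      # number of coefficients, k + 1
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta                # residuals e = Y - X*beta_hat
    sigma2 = resid @ resid / (n - p)    # sigma2_hat = SSE / (n - k - 1)
    C = np.linalg.inv(X.T @ X)          # C matrix from Cov(beta_hat)
    return np.sqrt(sigma2 * np.diag(C))
```

On the GPA example this degenerates: the five points are fit exactly, so σ̂² and every standard error come out 0. Realistic data with n well above k + 1 gives meaningful values.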