📘 Section 12.1.1.3 – Matrix Form of the Multiple Linear Regression Model
📌 Purpose of Matrix Representation
In multiple linear regression (MLR), writing out a separate equation for every observation becomes cumbersome as the number of predictors grows. The matrix form provides a compact, efficient way to represent and compute the model, and it maps directly onto how software tools such as Excel, R, and Python carry out the calculations.
🧮 General MLR Model in Matrix Notation
Y = Xβ + ε
- Y: n×1 response vector (e.g., GPA)
- X: n×(k+1) matrix of predictors (including intercept)
- β: (k+1)×1 vector of regression coefficients
- ε: n×1 vector of errors, E(ε) = 0, Var(ε) = σ²I
🧾 Component Breakdown
Y vector
Y = [ y₁ y₂ ... yₙ ]ᵀ
X matrix
X =
[ 1  x₁₁  x₁₂  ...  x₁ₖ ]
[ 1  x₂₁  x₂₂  ...  x₂ₖ ]
[ ⋮    ⋮    ⋮          ⋮  ]
[ 1  xₙ₁  xₙ₂  ...  xₙₖ ]
β vector
β = [ β₀ β₁ ... βₖ ]ᵀ
ε vector
ε = [ ε₁ ε₂ ... εₙ ]ᵀ
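In code, the design matrix is assembled by prepending a column of ones to the predictor columns. A minimal NumPy sketch, using small hypothetical predictor values:

```python
import numpy as np

# Two hypothetical predictor columns (k = 2) observed on n = 4 units
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([5.0, 3.0, 6.0, 2.0])

# Prepend a column of ones so that β₀ acts as the intercept
X = np.column_stack([np.ones(len(x1)), x1, x2])  # shape (4, 3) = n×(k+1)
print(X)
```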
🔢 Least Squares Estimator
The least squares estimator minimizes the residual sum of squares:
β̂ = (XᵀX)⁻¹XᵀY
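A minimal sketch of this computation in Python with NumPy (one of the tools mentioned above); the function name ols_beta is just illustrative, and X is assumed to already contain the intercept column:

```python
import numpy as np

def ols_beta(X, Y):
    """Least squares coefficients for Y = Xβ + ε."""
    # Solve the normal equations (XᵀX)β = XᵀY; this is more numerically
    # stable than forming the explicit inverse (XᵀX)⁻¹.
    return np.linalg.solve(X.T @ X, X.T @ Y)
```

For ill-conditioned X, np.linalg.lstsq(X, Y, rcond=None) is the more robust choice, since it solves the problem through an SVD rather than the normal equations.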
  🎯 Model: Student GPA
  We model GPA as a function of study hours (x₁), sleep hours (x₂), and attendance (x₃):
  Y = Xβ + ε
 
  📊 Student Data
  
| Student | GPA (Y) | Study hours (x₁) | Sleep hours (x₂) | Classes attended (x₃) |
|---------|---------|------------------|------------------|-----------------------|
| 1 | 3.5 | 15 | 7.0 | 40 |
| 2 | 3.8 | 20 | 8.0 | 42 |
| 3 | 2.9 | 10 | 6.0 | 35 |
| 4 | 3.2 | 12 | 6.5 | 38 |
| 5 | 3.7 | 18 | 7.5 | 41 |
  
 
  🧮 Matrix Setup
  X =
[1  15  7.0   40]
[1  20  8.0   42]
[1  10  6.0   35]
[1  12  6.5   38]
[1  18  7.5   41]
Y = [3.5 3.8 2.9 3.2 3.7]ᵀ
 
  🔢 Step-by-Step Least Squares
  1. Compute XᵀX
[  5    75     35.0    196  ]
[ 75  1193    538.0   2984  ]
[ 35   538.0  247.5   1380.5]
[196  2984   1380.5   7714  ]
  2. Compute XᵀY
[ 17.1 ]
[262.5 ]
[120.85]
[674.4 ]
  3. Compute β̂ = (XᵀX)⁻¹ XᵀY
β̂ = [0.8 0.1 −0.4 0.1]ᵀ
Final model: GPA = 0.8 + 0.1×Study − 0.4×Sleep + 0.1×Attendance
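All three steps can be checked in a few lines of NumPy, using the student table above as input:

```python
import numpy as np

# Design matrix (intercept, study, sleep, attendance) and response vector
X = np.array([
    [1, 15, 7.0, 40],
    [1, 20, 8.0, 42],
    [1, 10, 6.0, 35],
    [1, 12, 6.5, 38],
    [1, 18, 7.5, 41],
])
Y = np.array([3.5, 3.8, 2.9, 3.2, 3.7])

print(X.T @ X)  # step 1: XᵀX
print(X.T @ Y)  # step 2: XᵀY
beta = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta)     # step 3: [0.8, 0.1, -0.4, 0.1]
```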
 
  ✅ Interpretation
  
- Holding sleep and attendance fixed, each additional study hour raises predicted GPA by 0.1 points
- Holding study and attendance fixed, each additional sleep hour lowers predicted GPA by 0.4 points
- Holding study and sleep fixed, each additional class attended raises predicted GPA by 0.1 points

Note that these five students happen to lie exactly on the fitted plane (SSE = 0), so the model reproduces every GPA in the table; the negative sleep coefficient is a reminder that MLR coefficients are partial effects, holding the other predictors fixed, and can take counterintuitive signs in small samples.
 
  📐 Section 12.1.4 – Properties of Least Squares Estimators
  Unbiasedness: E(β̂) = β
Covariance Matrix: Cov(β̂) = σ²(XᵀX)⁻¹
  
- Diagonal → variances: Var(β̂ⱼ) = σ² Cⱼⱼ, where C = (XᵀX)⁻¹
- Off-diagonal → covariances: Cov(β̂ᵢ, β̂ⱼ) = σ² Cᵢⱼ

Standard Errors: se(β̂ⱼ) = √(σ̂² Cⱼⱼ), where σ̂² = SSE/(n−k−1) estimates σ²
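A sketch of these formulas in NumPy; the helper name ols_se is illustrative. Note that it needs data with nonzero residuals, so it would return zero standard errors on the exact-fit GPA example above:

```python
import numpy as np

def ols_se(X, Y):
    """Coefficient estimates and their standard errors."""
    n, p = X.shape                         # p = k + 1 parameters
    C = np.linalg.inv(X.T @ X)             # C = (XᵀX)⁻¹
    beta = C @ X.T @ Y
    resid = Y - X @ beta
    sigma2_hat = resid @ resid / (n - p)   # σ̂² = SSE / (n − k − 1)
    se = np.sqrt(sigma2_hat * np.diag(C))  # se(β̂ⱼ) = √(σ̂² Cⱼⱼ)
    return beta, se
```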
 
  🧪 Section 12.2 – Hypothesis Tests for Regression Coefficients
  We test:
  H₀: βⱼ = 0 vs H₁: βⱼ ≠ 0
  t-statistic:
  t = β̂ⱼ / se(β̂ⱼ)
Compare with tα/2, n−k−1. If |t| > tα/2, n−k−1, reject H₀.
  This helps test whether each predictor (e.g., study, sleep, attendance) significantly affects GPA.
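A sketch of this test using SciPy's t distribution, building on the ols_se helper sketched in Section 12.1.4 (again assuming data with nonzero residuals):

```python
import numpy as np
from scipy import stats

def t_tests(X, Y, alpha=0.05):
    """Two-sided t-test of H₀: βⱼ = 0 for every coefficient."""
    n, p = X.shape
    beta, se = ols_se(X, Y)                        # helper from Section 12.1.4
    t_stats = beta / se
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)  # t_{α/2, n−k−1}
    reject = np.abs(t_stats) > t_crit
    return t_stats, t_crit, reject
```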
  F-Test: Global Significance
  We test:
  H₀: β₁ = β₂ = β₃ = 0 vs H₁: At least one βⱼ ≠ 0
  F-statistic:
  F = (SSR/k) / (SSE/(n−k−1))
  
- SSR = regression sum of squares
- SSE = error sum of squares
- k = number of predictors

If F > Fα, k, n−k−1, reject H₀ → the model is significant overall.
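A sketch of the global F-test with SciPy, under the same assumption of nonzero SSE:

```python
import numpy as np
from scipy import stats

def f_test(X, Y, alpha=0.05):
    """Global F-test of H₀: β₁ = ... = βₖ = 0."""
    n, p = X.shape
    k = p - 1                                # number of predictors
    beta = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ beta
    sse = resid @ resid                      # error sum of squares
    sst = np.sum((Y - Y.mean()) ** 2)        # total sum of squares
    ssr = sst - sse                          # regression sum of squares
    F = (ssr / k) / (sse / (n - k - 1))
    F_crit = stats.f.ppf(1 - alpha, k, n - k - 1)
    return F, F_crit, F > F_crit
```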