📘 Section 12.6 – Aspects of Multiple Regression Modeling

📊 Example: Student GPA Model

We are modeling a student's GPA based on four predictors, x₁ through x₄.

Suppose the estimated regression model is:

GPA = 2.5 + 0.05x₁ + 0.1x₂ − 0.08x₃ − 0.03x₄
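To make the arithmetic concrete, here is a quick sketch of computing a fitted value from this equation. The predictor values are hypothetical, since the section does not say what x₁ through x₄ measure:

```python
# Fitted coefficients from the estimated model above
coefs = {"intercept": 2.5, "x1": 0.05, "x2": 0.1, "x3": -0.08, "x4": -0.03}

# Hypothetical predictor values for one student (placeholders, not from the text)
x = {"x1": 10, "x2": 3, "x3": 2, "x4": 5}

# Predicted GPA = intercept + sum of coefficient * predictor
gpa_hat = coefs["intercept"] + sum(coefs[k] * x[k] for k in x)
print(round(gpa_hat, 2))  # 2.99
```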

🔎 12.6.1 Multicollinearity

Multicollinearity arises when predictors are strongly correlated with one another. It inflates the standard errors of the coefficient estimates, making individual coefficients unstable and hard to interpret even when the overall fit looks good. A standard diagnostic is the variance inflation factor (VIF); values above about 10 are usually taken as a warning.
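As a minimal sketch of the two-predictor case: each predictor's VIF equals 1/(1 − r²), where r is the sample correlation between the two predictors. The data below are made up to show a strongly collinear pair:

```python
# Two made-up predictor columns that move together
x1 = [2, 4, 6, 8, 10]
x2 = [1, 2, 3, 4, 6]

def corr(a, b):
    """Sample (Pearson) correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

r = corr(x1, x2)
vif = 1 / (1 - r ** 2)  # with two predictors, VIF is the same for both
print(f"r = {r:.3f}, VIF = {vif:.1f}")  # VIF > 10 signals serious multicollinearity
```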

🎯 12.6.2 Use of Dummy Variables

Convert categorical variables (e.g., major = {STEM, non-STEM}) into 0/1 variables.

Example: STEM = 1 if STEM major, 0 otherwise.

GPA = 2.4 + 0.12x₁ + 0.06·STEM
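A small sketch of how the dummy enters prediction; the function name and inputs are illustrative, not from the text:

```python
def predict_gpa(x1, major):
    """Predicted GPA from the dummy-variable model above."""
    stem = 1 if major == "STEM" else 0  # 0/1 dummy for major
    return 2.4 + 0.12 * x1 + 0.06 * stem

# The STEM dummy shifts the intercept by 0.06 for STEM majors
print(round(predict_gpa(5, "STEM"), 2))      # 3.06
print(round(predict_gpa(5, "non-STEM"), 2))  # 3.0
```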

📀 12.6.3 Model Selection – Mallows’ Cp

Cp = SSEₚ / MSE(full) − (n − 2p)

Here SSEₚ is the error sum of squares of a candidate model with p parameters (including the intercept), MSE(full) is the mean squared error of the full model, and n is the sample size. A well-specified candidate model has Cp close to p.
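A quick numeric sketch of the Cp formula; the values for SSEₚ and MSE(full) are made up for illustration:

```python
n = 30           # sample size (assumed)
mse_full = 0.20  # MSE of the full model (assumed)
sse_p = 5.8      # SSE of a candidate model (assumed)
p = 3            # parameters in the candidate model, including intercept

cp = sse_p / mse_full - (n - 2 * p)
print(cp)  # compare against p: Cp near p suggests little bias
```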

⚙️ 12.6.4 Adjusted R²

Adjusted R² penalizes the number of fitted parameters p (including the intercept):

R²_adj = 1 − [(1 − R²)(n − 1)/(n − p)]

Unlike R², which never decreases when a predictor is added, adjusted R² can fall when an irrelevant predictor enters the model, which makes it a useful guard against overfitting.
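The penalty is easy to see numerically. The R² values below are invented to mimic adding one irrelevant predictor (R² creeps up slightly, adjusted R² drops):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2; p counts fitted parameters, including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p)

r2_before = adjusted_r2(0.80, 30, 4)   # original model
r2_after = adjusted_r2(0.805, 30, 5)   # one irrelevant predictor added
print(round(r2_before, 3), round(r2_after, 3))  # adjusted R^2 went down
```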

💡 Summary

Check for multicollinearity, encode categorical predictors as 0/1 dummy variables, and compare candidate models with Mallows’ Cp and adjusted R² rather than R² alone.

📊 Student GPA Example Continued

Suppose we want to model a student's GPA (y) as a function of the four predictors x₁, x₂, x₃, and x₄.

Model:

GPA = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε
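Fitting a model like this starts with ordinary least squares. Below is a minimal single-predictor sketch; the full model solves the normal equations XᵀXβ̂ = Xᵀy in the same spirit. The data are fabricated so the fit is exact (y = 1 + 2x):

```python
# Made-up data generated exactly by y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

b1 = sxy / sxx       # OLS slope estimate
b0 = my - b1 * mx    # OLS intercept estimate
print(b0, b1)        # recovers 1.0 and 2.0
```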

📝 Final Checklist – How to Approach Multiple Regression Carefully

  1. Fit Full Model using OLS: Y = Xβ + ε
  2. Assess Goodness of Fit: Look at R² and Adjusted R²
  3. Check Residuals: Look for nonlinearity or outliers
  4. Check Multicollinearity: Compute VIF for each predictor
  5. Model Simplification: Use Mallows’ Cp and Adjusted R²
  6. Detect Influential Observations: Use Cook’s Distance (D > 1 is a warning)
  7. Use Dummy Variables: For categorical features
  8. Report Final Model: Include coefficients, R²_adj, and diagnostics
  9. Validate: Use test data or cross-validation if possible
  10. Interpret Carefully: Emphasize practical significance, not just p-values
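The validation step above can be sketched as a simple train/holdout split; all numbers below are invented (roughly y = 2x plus noise):

```python
# Made-up data split into a training set and a holdout set
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
test_x, test_y = [5.0, 6.0], [10.1, 11.9]

# Fit simple OLS on the training split
n = len(train_x)
mx, my = sum(train_x) / n, sum(train_y) / n
sxx = sum((xi - mx) ** 2 for xi in train_x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(train_x, train_y))
b1 = sxy / sxx
b0 = my - b1 * mx

# Score on the holdout: out-of-sample R^2
preds = [b0 + b1 * xi for xi in test_x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(test_y, preds))
mt = sum(test_y) / len(test_y)
ss_tot = sum((yi - mt) ** 2 for yi in test_y)
r2_test = 1 - ss_res / ss_tot
print(round(r2_test, 3))  # high out-of-sample R^2 supports the model
```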