📘 Section 12.6 – Aspects of Multiple Regression Modeling
  📊 Example: Student GPA Model
  We are modeling a student's GPA based on:
  
    - x₁: Average study hours per week
    - x₂: Hours of sleep per night
    - x₃: Number of courses taken
    - x₄: Part-time job hours per week

  Suppose the estimated regression model is:
  
    GPA = 2.5 + 0.05x₁ + 0.1x₂ − 0.08x₃ − 0.03x₄
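  To make the fitted equation concrete, here is a minimal sketch of plugging in values for a hypothetical student (the input values below are made up for illustration):

```python
# Coefficients from the fitted equation above
intercept, b1, b2, b3, b4 = 2.5, 0.05, 0.1, -0.08, -0.03

# Hypothetical student: 20 study hours, 7 hours of sleep,
# 5 courses, 10 part-time job hours per week
x1, x2, x3, x4 = 20, 7, 5, 10

gpa_hat = intercept + b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4
print(f"Predicted GPA: {gpa_hat:.2f}")  # 2.5 + 1.0 + 0.7 - 0.4 - 0.3 = 3.50
```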
  
 
  🔎 12.6.1 Multicollinearity
  
    - Occurs when predictor variables are highly correlated with one another
    - Can inflate the standard errors of the estimated coefficients
    - Check the Variance Inflation Factor (VIF): VIF > 10 is generally considered problematic (see the sketch below)
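  A minimal sketch of the VIF check using statsmodels, with synthetic stand-in data (the column names and distributions are assumptions for illustration, not taken from the example):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic stand-in for the four predictors (illustrative only)
rng = np.random.default_rng(0)
n = 100
X = pd.DataFrame({
    "study_hours": rng.normal(15, 5, n),
    "sleep_hours": rng.normal(7, 1, n),
    "num_courses": rng.integers(3, 7, n).astype(float),
    "job_hours":   rng.normal(10, 4, n),
})

Xc = sm.add_constant(X)  # include an intercept so the VIFs are not distorted
vif = pd.Series(
    [variance_inflation_factor(Xc.values, i) for i in range(Xc.shape[1])],
    index=Xc.columns,
)
print(vif.drop("const"))  # flag any predictor with VIF > 10
```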
 
  🎯 12.6.2 Use of Dummy Variables
  Convert categorical variables (e.g., major = {STEM, non-STEM}) into 0/1 variables.
  Example: STEM = 1 if STEM major, 0 otherwise.
  
    GPA = 2.4 + 0.12x₁ + 0.06STEM
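  A minimal sketch of the dummy coding and of applying the equation above, assuming a hypothetical `major` column (all values are illustrative):

```python
import pandas as pd

# Hypothetical students with a categorical major
df = pd.DataFrame({
    "study_hours": [5, 8, 10, 12],
    "major": ["STEM", "non-STEM", "STEM", "non-STEM"],
})

# STEM = 1 if STEM major, 0 otherwise
df["STEM"] = (df["major"] == "STEM").astype(int)

# Predicted GPA from the example equation above
df["gpa_hat"] = 2.4 + 0.12 * df["study_hours"] + 0.06 * df["STEM"]
print(df[["major", "STEM", "gpa_hat"]])
```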
  
 
  📀 12.6.3 Model Selection – Mallows’ Cp
  
    - Mallows’ Cp is used to assess subset models
    - Choose a model where Cp ≈ p, the number of parameters in the subset model (including the intercept)
  
    Cp = SSEₚ / MSE(full) − (n − 2p)
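  A minimal sketch of the Cp calculation with made-up summary numbers, just to show how the terms combine:

```python
# Hypothetical summary values, for illustration only
n = 100           # number of observations
p = 3             # parameters in the candidate subset model (incl. intercept)
sse_p = 19.5      # SSE of the subset model
mse_full = 0.20   # MSE of the full model (all predictors)

cp = sse_p / mse_full - (n - 2 * p)
print(f"Cp = {cp:.1f}, compare with p = {p}")  # Cp close to p suggests little bias
```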
  
 
  ⚙️ 12.6.4 Adjusted R²
  Adjusted R² accounts for the number of predictors:
  
    R²_adj = 1 − [(1 − R²)(n − 1)/(n − p)]
  
  where p is the number of parameters (including the intercept). Unlike R², Adjusted R² can decrease when you add irrelevant predictors, which helps guard against overfitting.
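  A minimal sketch of the arithmetic with made-up numbers (on a fitted statsmodels OLS model, the same quantity is available as `rsquared_adj`):

```python
# Hypothetical fit summary, for illustration only
n, p = 100, 5        # observations; parameters incl. intercept (4 predictors + 1)
r2 = 0.42            # ordinary R² of the fitted model

r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)
print(f"Adjusted R² = {r2_adj:.3f}")  # 1 - 0.58 * 99 / 95 ≈ 0.396
```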
 
  💡 Summary
  
    - Check for multicollinearity using VIF
    - Use dummy variables for categorical predictors
    - Apply Mallows’ Cp for model selection
    - Evaluate models using Adjusted R²
    - Monitor influential observations with Cook’s Distance
 
  📊 GPA Student Example Continued
  Suppose we want to model a student's GPA (y) based on:
  
    - x₁ = Hours studied per week
    - x₂ = Number of courses
    - x₃ = Hours of sleep
    - x₄ = Dummy variable for part-time job

  Model:
  GPA = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε
  
    - Run the full model → check VIF
    - Use Adjusted R² and Cp to choose a reduced model
    - Examine residual plots
    - Use Cook’s Distance to identify influential students (the sketch below walks through these steps)
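  A minimal end-to-end sketch of this workflow with statsmodels, using synthetic data in place of real student records (every column name, coefficient, and noise level below is an assumption for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

# --- Synthetic stand-in for the student data (illustrative only) ---
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "study_hours": rng.normal(15, 5, n),
    "num_courses": rng.integers(3, 7, n).astype(float),
    "sleep_hours": rng.normal(7, 1, n),
    "has_job":     rng.integers(0, 2, n).astype(float),   # dummy variable
})
df["gpa"] = (2.5 + 0.05 * df["study_hours"] - 0.08 * df["num_courses"]
             + 0.10 * df["sleep_hours"] - 0.15 * df["has_job"]
             + rng.normal(0, 0.3, n))

# 1. Fit the full model
full = smf.ols("gpa ~ study_hours + num_courses + sleep_hours + has_job", data=df).fit()

# 2. Check VIF for each predictor (same idea as the 12.6.1 sketch)
X = sm.add_constant(df[["study_hours", "num_courses", "sleep_hours", "has_job"]])
for i, col in enumerate(X.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(X.values, i), 2))

# 3. Compare a reduced model using Adjusted R²
reduced = smf.ols("gpa ~ study_hours + sleep_hours", data=df).fit()
print("Adj. R² full:   ", round(full.rsquared_adj, 3))
print("Adj. R² reduced:", round(reduced.rsquared_adj, 3))

# 4. Residual check: e.g. plot full.resid against full.fittedvalues for patterns

# 5. Cook's Distance to flag influential students (D > 1 is a common warning level)
cooks_d = full.get_influence().cooks_distance[0]
print("Rows with D > 1:", np.where(cooks_d > 1)[0])
```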
 
  📝 Final Checklist – How to Approach Multiple Regression Carefully
  
    - Fit Full Model using OLS: Y = Xβ + ε
    - Assess Goodness of Fit: Look at R² and Adjusted R²
    - Check Residuals: Look for nonlinearity or outliers
    - Check Multicollinearity: Compute VIF for each predictor
    - Model Simplification: Use Mallows’ Cp and Adjusted R²
    - Detect Influential Observations: Use Cook’s Distance (D > 1 is a warning)
    - Use Dummy Variables: For categorical features
    - Report Final Model: Include coefficients, R²_adj, and diagnostics
    - Validate: Use test data or cross-validation if possible (see the holdout sketch below)
    - Interpret Carefully: Emphasize practical significance, not just p-values
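  For the validation step, a minimal holdout sketch under the same illustrative synthetic-data setup as the workflow sketch above (the 80/20 split is an assumption, not a prescription from the checklist):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative synthetic data, as in the workflow sketch above
rng = np.random.default_rng(7)
n = 200
df = pd.DataFrame({
    "study_hours": rng.normal(15, 5, n),
    "num_courses": rng.integers(3, 7, n).astype(float),
    "sleep_hours": rng.normal(7, 1, n),
    "has_job":     rng.integers(0, 2, n).astype(float),
})
df["gpa"] = (2.5 + 0.05 * df["study_hours"] - 0.08 * df["num_courses"]
             + 0.10 * df["sleep_hours"] - 0.15 * df["has_job"]
             + rng.normal(0, 0.3, n))

# Simple 80/20 holdout: fit on the training rows, score the held-out rows
train = df.sample(frac=0.8, random_state=0)
test = df.drop(train.index)

model = smf.ols("gpa ~ study_hours + num_courses + sleep_hours + has_job",
                data=train).fit()
pred = model.predict(test)

rmse = np.sqrt(np.mean((test["gpa"] - pred) ** 2))
print(f"Holdout RMSE: {rmse:.3f}")  # compare against the residual SD from training
```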