📘 Section 11.7 – Adequacy of the Regression Model
1. Overview
Fitting a regression model requires verifying assumptions about error terms, model structure, and variance. Section 11.7 discusses how to examine model adequacy using residual analysis and the coefficient of determination (R²).
2. Key Assumptions in Simple Linear Regression
- Errors are uncorrelated random variables with mean zero and constant variance.
- Errors are normally distributed (for hypothesis testing and confidence intervals).
- The model is correctly specified (i.e., linear if a linear model is used).
3. Residual Analysis (Section 11.7.1)
Residuals (eᵢ = yᵢ − ŷᵢ) help detect non-normality, non-constant variance, and model misspecification. Residuals should ideally appear as random scatter with no pattern when plotted against predicted values or x-values.
Key Diagnostic Plots:
- Residuals vs. Predicted Values (ŷᵢ) → Random scatter indicates good fit.
- Residuals vs. x-values → No pattern or curvature.
- Normal Probability Plot → Linear shape suggests normality.
- Standardized Residuals (dᵢ = eᵢ / √σ̂²) → If the errors are normally distributed, ~95% should lie between −2 and +2.
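The quantities behind these plots can be computed directly. The following is a minimal sketch, assuming a least-squares fit and the common definition of a standardized residual, dᵢ = eᵢ / √σ̂² with σ̂² = SSE / (n − 2). The function names and the data values are hypothetical and chosen only for illustration.

```python
import math

def fit_simple_linear(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - y_hat_i."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def standardized_residuals(e):
    """d_i = e_i / sqrt(sigma2_hat), with sigma2_hat = SSE / (n - 2)."""
    sigma2_hat = sum(ei ** 2 for ei in e) / (len(e) - 2)
    return [ei / math.sqrt(sigma2_hat) for ei in e]

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

b0, b1 = fit_simple_linear(x, y)
e = residuals(x, y, b0, b1)
d = standardized_residuals(e)

# Fraction of standardized residuals inside the [-2, +2] band
share_within_2 = sum(abs(di) <= 2 for di in d) / len(d)
```

Plotting `e` against the fitted values or against `x`, and `d` on a normal probability plot, would give the diagnostic plots listed above. With a least-squares fit that includes an intercept, the residuals always sum to zero, so the plots show pattern rather than overall level.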
Common Residual Patterns:
- (a) Random scatter – Good fit
- (b) Funnel shape – Non-constant variance
- (c) Double bow – Unequal variance
- (d) Curved – Model misspecification
Tips:
- Use transformations (e.g., √y, ln(y), 1/y) if residuals show unequal variance.
- Don’t discard outliers without investigation; they might be meaningful.
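As a rough sketch of the transformation tip, the helper below applies one of the three transformations to the response before refitting. The name `transform_response` and the restriction to positive y values are assumptions made here so that all three transformations stay defined.

```python
import math

# Candidate variance-stabilizing transformations named in the tips
TRANSFORMS = {
    "sqrt": math.sqrt,
    "log": math.log,
    "reciprocal": lambda v: 1.0 / v,
}

def transform_response(y, kind):
    """Return the transformed responses; assumes every y value is positive."""
    if any(v <= 0 for v in y):
        raise ValueError("all y values must be positive for these transformations")
    return [TRANSFORMS[kind](v) for v in y]
```

The model is then refit with the transformed response in place of y, and the residual plots are checked again to see whether the funnel or bow pattern has disappeared.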
4. Example – Oxygen Purity Residuals
Model: ŷ = 74.283 + 14.947x
- Normal probability plot: Residuals align along a straight line → Normality assumed.
- Residuals vs. predicted values and x-values: No visible pattern → Model is adequate.
Sample data point: At x = 1.02, y = 89.05, predicted ŷ = 89.53 → residual = −0.48
Another example: At x = 1.55, y = 99.42, predicted ŷ = 97.45 → residual = 1.97
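The two residuals above can be reproduced from the fitted model. This is a small sketch using only the coefficients and the two data points quoted in this example; the function name `predict_purity` is a hypothetical label.

```python
def predict_purity(x):
    """Fitted value from the model y_hat = 74.283 + 14.947x."""
    return 74.283 + 14.947 * x

# (x, y) pairs quoted in the example
points = [(1.02, 89.05), (1.55, 99.42)]

for x, y in points:
    y_hat = predict_purity(x)
    residual = y - y_hat  # e = y - y_hat
    print(f"x={x:.2f}  y={y:.2f}  y_hat={y_hat:.2f}  residual={residual:.2f}")

# x=1.02  y=89.05  y_hat=89.53  residual=-0.48
# x=1.55  y=99.42  y_hat=97.45  residual=1.97
```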
5. Coefficient of Determination (R²) – Section 11.7.2
Formula: R² = SSR / SST = 1 − SSE / SST
- Measures how well the regression model explains variability in y.
- Ranges from 0 to 1. A higher R² means the model accounts for a larger share of the variability in y.
- Example: R² = 152.13 / 173.38 ≈ 0.877 → 87.7% of variation in y is explained.
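The two forms of the formula give the same value, which can be checked with the sums of squares from the example. In this sketch, SSE is not quoted in the notes; it is derived from the identity SST = SSR + SSE.

```python
ss_regression = 152.13  # SSR from the example
ss_total = 173.38       # SST from the example

ss_error = ss_total - ss_regression    # SSE = SST - SSR (derived, not quoted)
r2_from_ssr = ss_regression / ss_total
r2_from_sse = 1 - ss_error / ss_total

print(round(ss_error, 2), round(r2_from_ssr, 3), round(r2_from_sse, 3))
# 21.25 0.877 0.877
```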
Limitations & Misconceptions:
- R² never decreases when more variables are added, even irrelevant ones – use adjusted R² for fair comparison.
- High R² ≠ good model – poor fits can still yield high R².
- R² does not measure the magnitude of the slope; a high R² can accompany a shallow slope.
- R² does not ensure accurate future predictions.
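To illustrate why adjusted R² is the fairer comparison, the sketch below uses one common definition, R²adj = 1 − [SSE / (n − p)] / [SST / (n − 1)], where p counts all estimated coefficients including the intercept. The sample size and sums of squares are hypothetical values chosen for illustration, not results from the oxygen purity example.

```python
def r_squared(ss_error, ss_total):
    """R^2 = 1 - SSE / SST."""
    return 1 - ss_error / ss_total

def adjusted_r_squared(ss_error, ss_total, n, p):
    """Adjusted R^2; p counts all coefficients, including the intercept."""
    return 1 - (ss_error / (n - p)) / (ss_total / (n - 1))

# Hypothetical values: an extra variable trims SSE only slightly
n = 12
ss_total = 100.0

r2_a = r_squared(20.0, ss_total)                      # model A: p = 2
adj_a = adjusted_r_squared(20.0, ss_total, n, p=2)
r2_b = r_squared(19.5, ss_total)                      # model B: p = 3
adj_b = adjusted_r_squared(19.5, ss_total, n, p=3)

print(f"A: R2={r2_a:.3f}  adjusted={adj_a:.3f}")
print(f"B: R2={r2_b:.3f}  adjusted={adj_b:.3f}")
# A: R2=0.800  adjusted=0.780
# B: R2=0.805  adjusted=0.762
```

In this made-up case R² rises from model A to model B while adjusted R² falls, because the small drop in SSE does not offset the lost degree of freedom.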