📘 Section 11.7 – Adequacy of the Regression Model

1. Overview

Fitting a regression model rests on assumptions about the error terms (zero mean, constant variance, no correlation) and about the form of the model, and these should be checked once the model has been fit. Section 11.7 discusses how to examine model adequacy using residual analysis and the coefficient of determination (R²).

2. Key Assumptions in Simple Linear Regression

Errors are uncorrelated random variables with mean zero and constant variance.
Errors are normally distributed (for hypothesis testing and confidence intervals).
The model is correctly specified (i.e., linear if a linear model is used).

3. Residual Analysis (Section 11.7.1)

Residuals (eᵢ = yᵢ − ŷᵢ) help detect non-normality, non-constant variance, and model misspecification. Residuals should ideally appear as random scatter with no pattern when plotted against predicted values or x-values.

Key Diagnostic Plots:

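Residuals vs. Predicted Values (ŷᵢ) → Random scatter indicates no obvious model defect.
Residuals vs. x-values → There should be no pattern or curvature.
Normal Probability Plot → An approximately linear shape suggests the normality assumption is reasonable.
Standardized Residuals (dᵢ = eᵢ / √σ̂²) → If the errors are normal, ~95% should lie between −2 and +2; values far outside this range may be outliers.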
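
The standardized-residual check in the list above can be sketched in a few lines of Python. This is a minimal illustration, not code from the text: the data values are hypothetical, the helper names `fit_line` and `standardized_residuals` are made up for this sketch, and σ̂² is assumed to be SSE / (n − 2), the usual estimate for simple linear regression.

```python
import math

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def standardized_residuals(x, y):
    """Return (residuals e_i, standardized residuals d_i = e_i / sqrt(sigma2_hat))."""
    b0, b1 = fit_line(x, y)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2_hat = sum(ei ** 2 for ei in e) / (len(x) - 2)  # assumed: SSE / (n - 2)
    d = [ei / math.sqrt(sigma2_hat) for ei in e]
    return e, d

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

e, d = standardized_residuals(x, y)
share_within_2 = sum(abs(di) <= 2 for di in d) / len(d)
print(f"share of standardized residuals in [-2, 2]: {share_within_2:.2f}")
```

With a real data set, the same `d` values could be plotted against ŷᵢ or x to look for the patterns described in this section.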

Common Residual Patterns:

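(a) Random scatter – Satisfactory; no obvious model defect
(b) Funnel shape – Non-constant variance (spread grows or shrinks with ŷ or x)
(c) Double bow – Unequal variance (spread is largest in the middle of the range)
(d) Curved – Model misspecification; consider higher-order terms or a transformation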

Tips:

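Consider transforming the response (e.g., √y, ln(y), 1/y) if the residuals show unequal variance, then refit and recheck the residual plots.
Don't discard outliers without investigation. They may be recording errors, but they may also point to a region where the model does not hold.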
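
As a rough illustration of the transformation tip, the sketch below compares the residual spread in the upper and lower halves of the x-range before and after taking ln(y). Everything here is an assumption made for the example: the data are synthetic (an exponential trend with a ±20% multiplicative error, chosen so that ln(y) both straightens the trend and evens out the spread), and `residuals` and `spread_ratio` are hypothetical helper names. A ratio near 1 suggests roughly constant variance.

```python
import math
import statistics

def residuals(x, y):
    """Residuals from a least-squares straight-line fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def spread_ratio(res):
    """Std. dev. of residuals in the upper half of x divided by the lower half.

    Assumes the residuals are ordered by increasing x.
    """
    half = len(res) // 2
    return statistics.pstdev(res[half:]) / statistics.pstdev(res[:half])

# Synthetic data: exponential trend, error proportional to the mean
x = [1 + 9 * i / 39 for i in range(40)]
y = [math.exp(0.3 * xi) * (1 + 0.2 * (1 if i % 2 == 0 else -1))
     for i, xi in enumerate(x)]

ratio_raw = spread_ratio(residuals(x, y))                          # y on x
ratio_log = spread_ratio(residuals(x, [math.log(v) for v in y]))   # ln(y) on x
print(f"spread ratio, raw y:  {ratio_raw:.2f}")
print(f"spread ratio, ln(y):  {ratio_log:.2f}")
```

A numeric ratio like this is only a crude stand-in for looking at the residual plot itself, and the right transformation depends on how the variance actually changes with the mean.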

4. Example – Oxygen Purity Residuals

Model: ŷ = 74.283 + 14.947x

Normal probability plot: Residuals fall approximately along a straight line → No serious departure from normality.
Residuals vs. predicted values and x-values: No visible pattern → No evidence of model inadequacy.

Sample data point: At x = 1.02, y = 89.05, predicted ŷ = 89.53 → residual = −0.48

Another example: At x = 1.55, y = 99.42, predicted ŷ = 97.45 → residual = 1.97
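
The two residuals above can be reproduced directly from the fitted model. A short Python check follows; the function names `predict` and `residual` are illustrative, and the coefficients and data points are the ones quoted in this example.

```python
# Fitted oxygen purity model from the example: y_hat = 74.283 + 14.947 * x
B0, B1 = 74.283, 14.947

def predict(x):
    """Predicted value y_hat at x."""
    return B0 + B1 * x

def residual(x, y):
    """Residual e = y - y_hat."""
    return y - predict(x)

for x, y in [(1.02, 89.05), (1.55, 99.42)]:
    print(f"x = {x:.2f}: y_hat = {predict(x):.2f}, residual = {residual(x, y):.2f}")
# x = 1.02: y_hat = 89.53, residual = -0.48
# x = 1.55: y_hat = 97.45, residual = 1.97
```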

5. Coefficient of Determination (R²) – Section 11.7.2

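Formula: R² = SSR / SST = 1 − SSE / SST, where SST = Σ(yᵢ − ȳ)² is the total sum of squares, SSR = Σ(ŷᵢ − ȳ)² is the regression sum of squares, SSE = Σ(yᵢ − ŷᵢ)² is the error sum of squares, and SST = SSR + SSE.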

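Measures the proportion of the variability in y that the regression model accounts for.
Ranges from 0 to 1. A higher R² means more of the variability is explained, which is not the same as the model being adequate (see the limitations below).
Example: R² = SSR / SST = 152.13 / 173.38 ≈ 0.877 → 87.7% of the variation in y is explained.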
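
The arithmetic in the example can be checked with both forms of the formula. In this small Python sketch the variable names are illustrative; SSE is not quoted in the example and is obtained here from the identity SST = SSR + SSE.

```python
# Sums of squares quoted in the example
ssr = 152.13   # regression sum of squares
sst = 173.38   # total sum of squares
sse = sst - ssr  # error sum of squares, from SST = SSR + SSE

r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print(f"SSE = {sse:.2f}")                  # SSE = 21.25
print(f"R^2 = SSR/SST     = {r2_from_ssr:.3f}")   # 0.877
print(f"R^2 = 1 - SSE/SST = {r2_from_sse:.3f}")   # 0.877
```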

Limitations & Misconceptions:

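R² never decreases when a variable is added, whether or not the variable is useful – use adjusted R² for fair comparison.
High R² ≠ appropriate model – R² can be large even when y and x are related nonlinearly, and it can be inflated by adding higher-order terms.
R² does not measure the magnitude of the slope – a large R² does not imply a steep slope.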
R² does not ensure accurate future predictions.
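
To make the first limitation concrete, here is a minimal sketch of adjusted R², assuming the common definition R²adj = 1 − [SSE / (n − p)] / [SST / (n − 1)], where p is the number of fitted parameters including the intercept. The function name, the sample size n = 20, and the second model's R² of 0.880 are hypothetical values chosen only for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p parameters (intercept included), n observations.

    Equivalent to 1 - [SSE / (n - p)] / [SST / (n - 1)].
    """
    return 1 - (1 - r2) * (n - 1) / (n - p)

n = 20  # hypothetical sample size

# Model A: one predictor (p = 2), R^2 as in the example above
adj_a = adjusted_r2(0.877, n, p=2)

# Model B (hypothetical): two extra predictors (p = 4) raise R^2 only slightly
adj_b = adjusted_r2(0.880, n, p=4)

print(f"Model A: R^2 = 0.877, adjusted R^2 = {adj_a:.3f}")
print(f"Model B: R^2 = 0.880, adjusted R^2 = {adj_b:.3f}")
```

Under these assumed numbers, Model B has the higher R² but the lower adjusted R², which is the situation the adjusted statistic is meant to flag.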