📘 Section 11.7 – Adequacy of the Regression Model 
  1. Overview
  Fitting a regression model requires verifying assumptions about error terms, model structure, and variance. Section 11.7 discusses how to examine model adequacy using residual analysis and the coefficient of determination (R²).
  2. Key Assumptions in Simple Linear Regression
  
    - Errors are uncorrelated random variables with mean zero and constant variance.
- Errors are normally distributed (for hypothesis testing and confidence intervals).
- The model is correctly specified (i.e., linear if a linear model is used).
3. Residual Analysis (Section 11.7.1)
  Residuals (eᵢ = yᵢ − ŷᵢ) help detect non-normality, non-constant variance, and model misspecification. Residuals should ideally appear as random scatter with no pattern when plotted against predicted values or x-values.
  Key Diagnostic Plots:
  
    - Residuals vs. Predicted Values (ŷᵢ) → Random scatter indicates good fit.
- Residuals vs. x-values → No pattern or curvature.
- Normal Probability Plot → Linear shape suggests normality.
- Standardized Residuals (dᵢ = eᵢ / √σ̂²) → If the errors are normal, ~95% should lie between −2 and +2; values far outside this range may be outliers.
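A minimal sketch of how residuals and standardized residuals could be computed for a simple linear regression, using only the Python standard library. The data values and the function names (fit_line, standardized_residuals) are made up for illustration and are not from the section; σ̂² is estimated as SSE / (n − 2).

```python
import math

def fit_line(x, y):
    """Least-squares estimates (intercept, slope) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def standardized_residuals(x, y):
    """Return (residuals, standardized residuals) for a fitted line."""
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sigma2_hat = sse / (len(x) - 2)  # error variance estimate, n - 2 degrees of freedom
    d = [e / math.sqrt(sigma2_hat) for e in residuals]
    return residuals, d

# Illustrative data only (not the data set used in the section).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

residuals, d = standardized_residuals(x, y)

# Share of standardized residuals inside [-2, +2]; roughly 95% is expected
# when the errors are normally distributed.
share_within_2 = sum(abs(di) <= 2 for di in d) / len(d)
```

Plotting residuals against the fitted values or against x, and d against normal quantiles, would give the diagnostic plots listed above.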
Common Residual Patterns:
  
    - (a) Random scatter – Good fit
- (b) Funnel shape – Non-constant variance
- (c) Double bow – Unequal variance
- (d) Curved – Model misspecification
Tips:
  
    - Use transformations (e.g., √y, ln(y), 1/y) if residuals show unequal variance.
- Don’t discard outliers without investigation—they might be meaningful.
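As a rough sketch of the first tip, the response can be transformed before refitting the model and re-checking the residual plots. The helper name transform_response and the sample values are hypothetical, and the check for strictly positive y is a simplifying assumption made here (ln(y) and 1/y need it; √y would also accept zero).

```python
import math

# Candidate variance-stabilizing transformations named in the tip.
TRANSFORMS = {
    "sqrt": math.sqrt,
    "log": math.log,
    "reciprocal": lambda v: 1.0 / v,
}

def transform_response(y, name):
    """Apply one of the named transformations to every response value."""
    if any(v <= 0 for v in y):
        raise ValueError("this sketch assumes strictly positive y values")
    return [TRANSFORMS[name](v) for v in y]

# Illustrative values only.
y = [1.0, 4.0, 9.0, 16.0]
y_sqrt = transform_response(y, "sqrt")
y_recip = transform_response(y, "reciprocal")
```

After transforming, the model is refitted to the new response and the residual plots are examined again to see whether the funnel or double-bow pattern has disappeared.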
4. Example – Oxygen Purity Residuals
  Model: ŷ = 74.283 + 14.947x
  
    - Normal probability plot: Residuals align along a straight line → Normality assumed.
- Residuals vs. predicted values and x-values: No visible pattern → Model is adequate.
Sample data point: At x = 1.02, y = 89.05, predicted ŷ = 89.53 → residual = −0.48
  Another example: At x = 1.55, y = 99.42, predicted ŷ = 97.45 → residual = 1.97
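The two residuals above can be reproduced with a short calculation. This is only a sketch: the coefficients and the two (x, y) pairs are taken from the example, while the function name predict is an illustrative choice.

```python
def predict(x, b0=74.283, b1=14.947):
    """Fitted oxygen purity from the model y_hat = 74.283 + 14.947 x."""
    return b0 + b1 * x

# The two (x, y) observations quoted in the example.
observations = [(1.02, 89.05), (1.55, 99.42)]

for x, y in observations:
    y_hat = predict(x)
    residual = y - y_hat  # e_i = y_i - y_hat_i
    print(f"x={x:.2f}  y={y:.2f}  y_hat={y_hat:.2f}  residual={residual:+.2f}")
```

Rounded to two decimals, this gives fitted values 89.53 and 97.45 and residuals −0.48 and +1.97, matching the figures quoted.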
  5. Coefficient of Determination (R²) – Section 11.7.2
  
    Formula: R² = SSR / SST = 1 − SSE / SST
   
  
    - Measures how well the regression model explains variability in y.
- Ranges from 0 to 1. Values closer to 1 mean the model accounts for more of the variability in y.
- Example: R² = 152.13 / 173.38 ≈ 0.877 → 87.7% of variation in y is explained.
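The arithmetic in this example can be checked with a few lines of Python. SSR and SST are the values quoted above; SSE is derived here as SST − SSR, which follows from the identity SST = SSR + SSE behind the two forms of the formula.

```python
ssr = 152.13  # regression sum of squares, from the example
sst = 173.38  # total sum of squares, from the example
sse = sst - ssr  # error sum of squares, derived from SST = SSR + SSE

r2_from_ssr = ssr / sst      # R² = SSR / SST
r2_from_sse = 1 - sse / sst  # R² = 1 - SSE / SST

print(f"SSE = {sse:.2f}")
print(f"R² = {r2_from_ssr:.3f} (SSR form), {r2_from_sse:.3f} (SSE form)")
```

Both forms give R² ≈ 0.877, with SSE = 21.25.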
Limitations & Misconceptions:
  
    - R² never decreases when more variables are added, even useless ones – use adjusted R² for fair comparison.
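- High R² ≠ good model – a straight line fitted to a curved relationship can still yield a high R², so check the residual plots as well.
- R² does not measure the magnitude of the slope – a large R² does not imply a steep slope.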
- R² does not ensure accurate future predictions.
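To illustrate the first limitation, the sketch below compares R² with adjusted R², computed as 1 − [SSE / (n − p)] / [SST / (n − 1)], where p counts the estimated parameters including the intercept. SST and the two-parameter SSE come from the example above (SSE = 173.38 − 152.13); the sample size n = 20 and the slightly smaller SSE for a three-parameter model are assumptions made up purely for illustration.

```python
def r_squared(sse, sst):
    return 1 - sse / sst

def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R²; p is the number of parameters including the intercept."""
    return 1 - (sse / (n - p)) / (sst / (n - 1))

sst = 173.38
n = 20  # ASSUMED sample size, for illustration only

# Simple linear model (p = 2): SSE derived from the example above.
sse_small = 21.25
# HYPOTHETICAL larger model (p = 3) whose extra variable barely reduces SSE.
sse_large = 21.20

r2_small = r_squared(sse_small, sst)
r2_large = r_squared(sse_large, sst)
adj_small = adjusted_r_squared(sse_small, sst, n, p=2)
adj_large = adjusted_r_squared(sse_large, sst, n, p=3)

print(f"R²:          {r2_small:.4f} -> {r2_large:.4f}")
print(f"adjusted R²: {adj_small:.4f} -> {adj_large:.4f}")
```

Under these assumed numbers R² rises slightly when the extra variable is added, while adjusted R² falls, because the small drop in SSE does not make up for the lost degree of freedom.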