📘 Sections 11.2 – 11.6 Simple Linear Regression

🎯 What Is Simple Linear Regression?

Simple linear regression helps us understand how a dependent variable Y is related to an independent variable x. It finds the best-fitting straight line to predict Y from x.

Common uses include predicting an outcome from a measured input, estimating how strongly two variables are related, and quantifying the expected change in Y per unit change in x.

📌 Model Structure

Y = β₀ + β₁x + ε

Here β₀ is the intercept, β₁ is the slope, and ε is a random error term assumed to have mean 0 and variance σ².
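As a quick illustration, the sketch below generates data from this model. The intercept, slope, and error standard deviation are assumed example values, not numbers from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustration values (not from the text): intercept, slope, error SD
beta0, beta1, sigma = 2.0, 0.5, 0.3

x = np.linspace(0, 10, 25)            # fixed predictor values
eps = rng.normal(0.0, sigma, x.size)  # random errors: mean 0, variance sigma**2
y = beta0 + beta1 * x + eps           # responses generated by the model
```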

📐 Objective: Minimize Squared Errors

L = Σ (yᵢ − β₀ − β₁xᵢ)²

This is called the least squares criterion. We choose β₀ and β₁ to minimize this function.
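A minimal sketch of this criterion as a function, assuming the data are stored in NumPy arrays:

```python
import numpy as np

def sse(b0, b1, x, y):
    """Least squares criterion L(b0, b1): sum of squared residuals."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    resid = y - (b0 + b1 * x)
    return float(np.sum(resid ** 2))
```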

🧮 Derivation: Normal Equations

∂L/∂β₀ = −2 Σ (yᵢ − β₀ − β₁xᵢ) = 0
∂L/∂β₁ = −2 Σ (yᵢ − β₀ − β₁xᵢ) xᵢ = 0

Rearranging these conditions gives the normal equations:

nβ₀ + β₁ Σxᵢ = Σyᵢ
β₀ Σxᵢ + β₁ Σxᵢ² = Σxᵢyᵢ
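The normal equations are a 2×2 linear system in β₀ and β₁, so one way to obtain the estimates is to solve that system directly. A possible sketch, assuming the data are in NumPy arrays:

```python
import numpy as np

def fit_via_normal_equations(x, y):
    """Solve the 2x2 normal equations for (beta0_hat, beta1_hat)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    A = np.array([[n,       x.sum()],
                  [x.sum(), np.sum(x ** 2)]])
    b = np.array([y.sum(), np.sum(x * y)])
    beta0_hat, beta1_hat = np.linalg.solve(A, b)
    return beta0_hat, beta1_hat
```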

✅ Final Least Squares Estimates

β̂₁ = [ Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n ] / [ Σxᵢ² − (Σxᵢ)²/n ] = Sxy / Sxx
β̂₀ = ȳ − β̂₁x̄

📏 Definitions

x̄ = (1/n) Σxᵢ,  ȳ = (1/n) Σyᵢ
Sxx = Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)²/n
Sxy = Σ(xᵢ − x̄)(yᵢ − ȳ) = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n
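Equivalently, the closed-form estimates can be computed straight from Sxx and Sxy. A sketch under the same assumptions:

```python
import numpy as np

def least_squares_estimates(x, y):
    """Closed-form estimates: b1_hat = Sxy / Sxx, b0_hat = ybar - b1_hat * xbar."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    Sxy = np.sum((x - xbar) * (y - ybar))
    b1_hat = Sxy / Sxx
    b0_hat = ybar - b1_hat * xbar
    return b0_hat, b1_hat
```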

📊 Baseball Training Data

x: Weekly training hours
y: Batting average

📈 Step-by-Step Calculations and Explanation of Regression Steps
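The sketch below walks through the calculations and prints each intermediate quantity. The training-hours and batting-average values are hypothetical placeholders for illustration, not the actual course dataset.

```python
import numpy as np

# Hypothetical values for illustration only (not the course dataset)
x = np.array([4, 6, 8, 10, 12, 14], dtype=float)           # weekly training hours
y = np.array([0.220, 0.238, 0.255, 0.270, 0.278, 0.295])   # batting average

n = x.size
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))
b1_hat = Sxy / Sxx
b0_hat = ybar - b1_hat * xbar

print(f"n = {n}, xbar = {xbar:.3f}, ybar = {ybar:.4f}")
print(f"Sxx = {Sxx:.3f}, Sxy = {Sxy:.4f}")
print(f"slope = {b1_hat:.5f}, intercept = {b0_hat:.5f}")
print(f"fitted line: y_hat = {b0_hat:.4f} + {b1_hat:.5f} * x")
```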

📏 Confidence Intervals: Slope, Intercept, and Mean Response

These intervals use the t-distribution with n − 2 degrees of freedom and the estimated error variance σ̂² = SSE / (n − 2), where SSE = Σ(yᵢ − ŷᵢ)² is the sum of squared residuals:

Slope: β̂₁ ± tα/2, n−2 × √( σ̂² / Sxx )
Intercept: β̂₀ ± tα/2, n−2 × √[ σ̂² × (1/n + x̄² / Sxx) ]
Mean response at x₀: ŷ(x₀) ± tα/2, n−2 × √[ σ̂² × (1/n + (x₀ − x̄)² / Sxx) ]

Try choosing an x₀ and using the regression results to compute a confidence interval for the mean response!

📉 Example: 95% Confidence Intervals for Coefficients

Given the fitted model's summary statistics (n, β̂₀, β̂₁, σ̂², x̄, and Sxx), we plug them into the slope and intercept formulas above with the t critical value tα/2, n−2 for 95% confidence.

✅ Interpretation: These intervals tell us the range of plausible values for the true slope and intercept with 95% confidence. If the slope CI does not include 0, it supports a statistically significant linear relationship.
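As a concrete illustration, both intervals can be computed from summary statistics alone. The sketch below uses the oxygen purity values given in Section 11.6 (n = 20, β̂₀ = 74.283, β̂₁ = 14.947, σ̂² = 1.18, x̄ = 1.1960, Sxx = 0.68088); scipy is assumed for the t quantile.

```python
from math import sqrt
from scipy import stats

# Summary statistics from the oxygen purity example in Section 11.6
n = 20
b0_hat, b1_hat = 74.283, 14.947
sigma2_hat = 1.18          # sigma_hat^2 = SSE / (n - 2)
Sxx, xbar = 0.68088, 1.1960

t = stats.t.ppf(0.975, df=n - 2)                 # ~= 2.101 for 18 df

se_slope = sqrt(sigma2_hat / Sxx)
se_intercept = sqrt(sigma2_hat * (1 / n + xbar ** 2 / Sxx))

print(f"slope:     {b1_hat:.3f} +/- {t * se_slope:.3f}")      # ~= 14.947 +/- 2.766
print(f"intercept: {b0_hat:.3f} +/- {t * se_intercept:.3f}")  # ~= 74.283 +/- 3.347
```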

📏 Interactive Confidence Interval for Mean Response

This tool computes a 95% confidence interval for the mean response, that is, the average value of Y at a user-defined x₀, using the fitted regression line.
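A possible sketch of that computation, assuming the summary statistics are already available and scipy is used for the t quantile:

```python
from math import sqrt
from scipy import stats

def mean_response_ci(x0, b0_hat, b1_hat, sigma2_hat, n, xbar, Sxx, conf=0.95):
    """Confidence interval for the mean response E[Y | x0], from summary statistics."""
    y0_hat = b0_hat + b1_hat * x0
    t = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)
    half_width = t * sqrt(sigma2_hat * (1 / n + (x0 - xbar) ** 2 / Sxx))
    return y0_hat - half_width, y0_hat + half_width

# Example: oxygen purity summary statistics (Section 11.6) at x0 = 1.00
print(mean_response_ci(1.00, 74.283, 14.947, 1.18, 20, 1.1960, 0.68088))
```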

🔮 Interactive Prediction Interval for a New Observation

This section computes a 95% prediction interval for a single new future observation at a specific x₀ using the fitted regression line.

It differs from the confidence interval for the mean response by including extra uncertainty from a new observation.

Prediction Interval = ŷ(x₀) ± tα/2, n−2 × √[σ̂² × (1 + 1/n + (x₀ − x̄)² / Sxx)]

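A sketch of the corresponding computation, under the same assumptions as the mean-response interval above; note that the only change is the extra 1 inside the square root:

```python
from math import sqrt
from scipy import stats

def prediction_interval(x0, b0_hat, b1_hat, sigma2_hat, n, xbar, Sxx, conf=0.95):
    """Prediction interval for a single new observation at x0, from summary statistics."""
    y0_hat = b0_hat + b1_hat * x0
    t = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)
    # The extra "1 +" term adds the variance of the new observation itself
    half_width = t * sqrt(sigma2_hat * (1 + 1 / n + (x0 - xbar) ** 2 / Sxx))
    return y0_hat - half_width, y0_hat + half_width
```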

📘 11.6 – Prediction of New Observations

In regression analysis, it's important not only to estimate the average response at a given value of the predictor variable (i.e., a confidence interval for the mean response), but also to predict an actual new observation. This is where prediction intervals come in. They provide a range within which we expect a new, single observation to fall, given a value of x₀.

A prediction interval is always wider than a confidence interval because it accounts for both the uncertainty in estimating the mean response at x₀ and the additional variability of a single new observation around that mean.

This makes it useful when you want to predict what will actually happen the next time you measure Y at a given x₀.

Mathematical Formula for 95% Prediction Interval:

ŷ₀ ± tα/2, n−2 × √[ σ̂² × (1 + 1/n + (x₀ − x̄)² / Sxx) ]
  

✅ Example – Oxygen Purity

Suppose we have a regression model based on 20 observations. The fitted regression line is:

ŷ = 74.283 + 14.947x

We want to predict the next value of oxygen purity at hydrocarbon level x₀ = 1.00%.

Using the formula above with ŷ₀ = 74.283 + 14.947(1.00) = 89.23, σ̂² = 1.18, n = 20, x̄ = 1.1960, Sxx = 0.68088, and tα/2, n−2 = 2.101 (α = 0.05, 18 degrees of freedom), the margin of error is:

2.101 × √[ 1.18 × (1 + 1/20 + (1.00 − 1.1960)² / 0.68088) ] ≈ 2.40
  

🔮 So the 95% prediction interval is: 86.83 ≤ Y₀ ≤ 91.63

This tells us we are 95% confident that the next observed purity level at x₀ = 1.00% will fall within this range.
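The arithmetic can be checked numerically with a short sketch using the same summary statistics (scipy assumed for the t quantile):

```python
from math import sqrt
from scipy import stats

# Oxygen purity example: n = 20, x0 = 1.00
n, x0 = 20, 1.00
y0_hat = 74.283 + 14.947 * x0                      # point prediction ~= 89.23
t = stats.t.ppf(0.975, df=n - 2)                   # ~= 2.101
moe = t * sqrt(1.18 * (1 + 1 / n + (x0 - 1.1960) ** 2 / 0.68088))
print(f"95% PI: ({y0_hat - moe:.2f}, {y0_hat + moe:.2f})")   # ~= (86.83, 91.63)
```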