Simple linear regression helps us understand how a dependent variable Y is related to an independent variable x. It finds the best-fitting straight line to predict Y from x.
Common Uses:
Predicting house prices from size or location
Forecasting sales based on advertising budget
Estimating student test scores from hours studied
Example: In a baseball team, you want to predict player performance (e.g., batting average) based on training hours. If more training leads to better results, a linear trend may exist.
📌 Model Structure
Y = β₀ + β₁x + ε
β₀ is the intercept (value of Y when x = 0)
β₁ is the slope (change in Y for one-unit change in x)
ε is the random error term, with mean zero and variance σ²
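To make the model concrete, here is a minimal Python sketch that simulates data from it. The parameter values and the grid of training hours are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

beta0, beta1, sigma = 0.230, 0.003, 0.005    # hypothetical "true" parameters
x = np.linspace(4, 18, 8)                    # hypothetical training hours
eps = rng.normal(0.0, sigma, size=x.size)    # ε: mean-zero noise with variance σ²
y = beta0 + beta1 * x + eps                  # Y = β₀ + β₁x + ε
print(np.column_stack([x, y]))
```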
📐 Objective: Minimize Squared Errors
L = Σ (yᵢ − β₀ − β₁xᵢ)²
This is called the least squares criterion. We choose β₀ and β₁ to minimize this function.
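As a quick sketch of the criterion, you can evaluate L for any candidate pair of coefficients. The dataset below (training hours vs. batting average) is made up for illustration:

```python
import numpy as np

# Hypothetical data: training hours vs. batting average
x = np.array([4, 6, 8, 10, 12, 14, 16, 18], dtype=float)
y = np.array([0.240, 0.248, 0.255, 0.262, 0.266, 0.275, 0.279, 0.288])

def least_squares_loss(b0, b1):
    # L = Σ (yᵢ − b0 − b1·xᵢ)²
    return np.sum((y - b0 - b1 * x) ** 2)

print(least_squares_loss(0.230, 0.003))   # a line close to the trend → small L
print(least_squares_loss(0.300, -0.01))   # a poor line → much larger L
```

The least squares estimates are simply the pair (β̂₀, β̂₁) that makes this loss as small as possible.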
Step 1: Compute Sums
We calculate the totals needed by the later formulas: n, Σxᵢ, Σyᵢ, Σxᵢ², and Σxᵢyᵢ, along with the sample means x̄ and ȳ. These are the building blocks of the regression equation.
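A minimal sketch of these building blocks, again on the hypothetical dataset:

```python
import numpy as np

x = np.array([4, 6, 8, 10, 12, 14, 16, 18], dtype=float)   # hypothetical data
y = np.array([0.240, 0.248, 0.255, 0.262, 0.266, 0.275, 0.279, 0.288])

n = len(x)
sum_x, sum_y = x.sum(), y.sum()                  # Σxᵢ, Σyᵢ
sum_xx, sum_xy = (x * x).sum(), (x * y).sum()    # Σxᵢ², Σxᵢyᵢ
x_bar, y_bar = sum_x / n, sum_y / n              # x̄, ȳ
```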
Step 2: Compute Sxx and Sxy
Sxx = Σ(xᵢ − x̄)² measures the variation in the x values. Sxy = Σ(xᵢ − x̄)(yᵢ − ȳ) captures the joint variation between x and y. Together they determine the slope of the line.
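In code, both quantities are one-liners (same hypothetical data as above):

```python
import numpy as np

x = np.array([4, 6, 8, 10, 12, 14, 16, 18], dtype=float)
y = np.array([0.240, 0.248, 0.255, 0.262, 0.266, 0.275, 0.279, 0.288])

Sxx = np.sum((x - x.mean()) ** 2)                # Σ(xᵢ − x̄)²
Sxy = np.sum((x - x.mean()) * (y - y.mean()))    # Σ(xᵢ − x̄)(yᵢ − ȳ)
print(Sxx, Sxy)
```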
Step 3: Compute Coefficients
The slope β̂₁ = Sxy / Sxx shows how much y changes, on average, for each one-unit increase in x. The intercept β̂₀ = ȳ − β̂₁x̄ is the expected value of y when x = 0.
These form the regression line: ŷ = β̂₀ + β̂₁x
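Here is a sketch of the coefficient formulas, with a sanity check against NumPy's built-in fit:

```python
import numpy as np

x = np.array([4, 6, 8, 10, 12, 14, 16, 18], dtype=float)   # hypothetical data
y = np.array([0.240, 0.248, 0.255, 0.262, 0.266, 0.275, 0.279, 0.288])

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))

b1 = Sxy / Sxx                   # slope β̂₁
b0 = y.mean() - b1 * x.mean()    # intercept β̂₀
print(f"ŷ = {b0:.4f} + {b1:.4f}·x")

# Sanity check: np.polyfit returns [slope, intercept] for degree 1
print(np.polyfit(x, y, 1))
```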
Step 4: Compute Variance Measures
SST (Total Sum of Squares): total variation in the y values, SST = Σ(yᵢ − ȳ)².
SSE (Error Sum of Squares): unexplained variation, SSE = Σ(yᵢ − ŷᵢ)².
SSR (Regression Sum of Squares): variation explained by the model, SSR = SST − SSE.
σ̂² = SSE / (n − 2): the estimated variance of the errors (how much predictions typically deviate from the actual data).
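A sketch of all four quantities, continuing with the hypothetical data:

```python
import numpy as np

x = np.array([4, 6, 8, 10, 12, 14, 16, 18], dtype=float)
y = np.array([0.240, 0.248, 0.255, 0.262, 0.266, 0.275, 0.279, 0.288])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)        # slope, intercept
y_hat = b0 + b1 * x                 # fitted values ŷᵢ

SST = np.sum((y - y.mean()) ** 2)   # total variation
SSE = np.sum((y - y_hat) ** 2)      # unexplained variation
SSR = SST - SSE                     # explained variation
sigma2_hat = SSE / (n - 2)          # σ̂²
```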
R² (R-squared)
R² = SSR / SST = 1 − SSE / SST is the proportion of the variation in y that is explained by the regression model. Values closer to 1 indicate a better fit.
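Computed directly from the sums of squares above:

```python
import numpy as np

x = np.array([4, 6, 8, 10, 12, 14, 16, 18], dtype=float)   # hypothetical data
y = np.array([0.240, 0.248, 0.255, 0.262, 0.266, 0.275, 0.279, 0.288])

b1, b0 = np.polyfit(x, y, 1)
SSE = np.sum((y - (b0 + b1 * x)) ** 2)
SST = np.sum((y - y.mean()) ** 2)
print(f"R² = {1 - SSE / SST:.4f}")   # R² = 1 − SSE/SST
```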
📏 Confidence Intervals: Slope, Intercept, and Mean Response
Confidence Interval for Slope β̂₁:
CI = β̂₁ ± tα/2, n−2 × √(σ̂² / Sxx)
This interval tells us where the true slope might lie. If it doesn't include 0, the relationship is statistically significant.
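A minimal sketch of this interval, assuming SciPy is available for the t quantile (the data remain the hypothetical training-hours example):

```python
import numpy as np
from scipy import stats

x = np.array([4, 6, 8, 10, 12, 14, 16, 18], dtype=float)
y = np.array([0.240, 0.248, 0.255, 0.262, 0.266, 0.275, 0.279, 0.288])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
sigma2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # σ̂²

t = stats.t.ppf(0.975, df=n - 2)    # t_{α/2, n−2} for a 95% interval
half = t * np.sqrt(sigma2 / Sxx)
print(f"95% CI for slope: [{b1 - half:.5f}, {b1 + half:.5f}]")
```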
Confidence Interval for Intercept β̂₀:
CI = β̂₀ ± tα/2, n−2 × √(σ̂² × (1/n + x̄² / Sxx))
This shows the possible range for the intercept.
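The intercept interval only changes the standard error term; a sketch under the same assumptions:

```python
import numpy as np
from scipy import stats

x = np.array([4, 6, 8, 10, 12, 14, 16, 18], dtype=float)   # hypothetical data
y = np.array([0.240, 0.248, 0.255, 0.262, 0.266, 0.275, 0.279, 0.288])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
sigma2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

t = stats.t.ppf(0.975, df=n - 2)
half = t * np.sqrt(sigma2 * (1 / n + x.mean() ** 2 / Sxx))   # SE of β̂₀
print(f"95% CI for intercept: [{b0 - half:.5f}, {b0 + half:.5f}]")
```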
Confidence Interval for Mean Response at x₀:
Let x₀ be a chosen input. Then
CI for μY|x₀ = ŷ(x₀) ± tα/2, n−2 × √[σ̂² × (1/n + (x₀ − x̄)² / Sxx)]
This estimates the average y value at a specific x.
These intervals use the t-distribution with n − 2 degrees of freedom. Try choosing an x₀ and using the regression results to compute the confidence interval for the mean response!
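Here is a sketch of that exercise as a reusable function; pick any x₀ you like (the 10.0 below is just an example value):

```python
import numpy as np
from scipy import stats

x = np.array([4, 6, 8, 10, 12, 14, 16, 18], dtype=float)   # hypothetical data
y = np.array([0.240, 0.248, 0.255, 0.262, 0.266, 0.275, 0.279, 0.288])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
sigma2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
t = stats.t.ppf(0.975, df=n - 2)

def mean_response_ci(x0):
    """95% CI for the mean response μ_{Y|x₀} at a chosen x₀."""
    y0 = b0 + b1 * x0
    half = t * np.sqrt(sigma2 * (1 / n + (x0 - x.mean()) ** 2 / Sxx))
    return y0 - half, y0 + half

print(mean_response_ci(10.0))
```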
📉 Example: 95% Confidence Interval for the Intercept
95% CI for β̂₀ = 82.88679 ± 2.447 × 0.60220 = [81.41320, 84.36038]
✅ Interpretation: This interval gives the range of plausible values for the true intercept with 95% confidence. The slope interval is constructed the same way; if it does not include 0, it supports a statistically significant linear relationship.
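You can verify the arithmetic directly; any difference in the last digit comes from rounding of the displayed t value and standard error:

```python
est, t, se = 82.88679, 2.447, 0.60220    # values quoted in the example above
print(est - t * se, est + t * se)        # ≈ 81.4132, 84.3604
```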
📏 Interactive Confidence Interval for Mean Response
This tool computes a 95% confidence interval for the mean response (the average y-value) at a user-defined x₀ using the regression line.
🔮 Interactive Prediction Interval for a New Observation
This section computes a 95% prediction interval for a single new future observation at a specific x₀ using the fitted regression line.
It differs from the confidence interval for the mean response by including extra uncertainty from a new observation.
In regression analysis, it's important not only to estimate the average response at a given value of the predictor variable (i.e., a confidence interval for the mean response), but also to predict an actual new observation.
This is where prediction intervals come in. They provide a range within which we expect a new, single observation to fall, given a value of x₀:
PI for Y at x₀ = ŷ(x₀) ± tα/2, n−2 × √[σ̂² × (1 + 1/n + (x₀ − x̄)² / Sxx)]
A prediction interval is always wider than the corresponding confidence interval because it includes both:
Uncertainty in the fitted regression line (like the CI),
And the variability of a future individual outcome (the extra "1" under the square root).
This makes it useful when you want to predict what will actually happen next time you measure Y at a given x₀.
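A sketch of the prediction interval, built on the same hypothetical data and assumptions as the confidence interval snippets above; note the only change from the mean response CI is the extra 1 inside the square root:

```python
import numpy as np
from scipy import stats

x = np.array([4, 6, 8, 10, 12, 14, 16, 18], dtype=float)   # hypothetical data
y = np.array([0.240, 0.248, 0.255, 0.262, 0.266, 0.275, 0.279, 0.288])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
sigma2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
t = stats.t.ppf(0.975, df=n - 2)

def prediction_interval(x0):
    """95% PI for a single new observation at x₀ (note the extra '1 +')."""
    y0 = b0 + b1 * x0
    half = t * np.sqrt(sigma2 * (1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx))
    return y0 - half, y0 + half

print(prediction_interval(10.0))   # wider than mean_response_ci(10.0)
```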