📘 11.1 – Simple Linear Regression Summary

📈 Scatter Plot – Observing the Pattern

Let's start with a scatter plot of real data. Below is the relationship between hydrocarbon level (x) and oxygen purity (y):

The plot shows a general pattern: as the hydrocarbon level increases, oxygen purity tends to increase. This suggests an approximately linear relationship, even though the points do not fall exactly on a straight line.

📌 Regression Model

Y = β₀ + β₁x + ε
E(Y|x) = β₀ + β₁x
Var(Y|x) = σ²

📘 Deriving E(Y|x) and Var(Y|x)

Let the model be:
Y = β₀ + β₁x + ε

Expected value:
E(Y|x) = E(β₀ + β₁x + ε) = β₀ + β₁x + E(ε) = β₀ + β₁x
(Assuming E(ε) = 0)

Variance:
Var(Y|x) = Var(β₀ + β₁x + ε) = Var(β₀ + β₁x) + Var(ε) = 0 + σ² = σ²
(For a fixed x, β₀ + β₁x is a constant, so its variance is 0; Var(ε) = σ² by assumption)
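The two results above can be checked numerically. The following is a minimal simulation sketch, not part of the original notes: it borrows the values β₀ = 75, β₁ = 15, σ² = 2 from the model example in this summary and additionally assumes that ε is normally distributed, which the derivation itself does not require. The function name simulate_y and the choice x = 1.0 are illustrative only.

```python
import random
import statistics

# Illustrative parameter values, taken from the model example in this summary.
BETA0, BETA1, SIGMA2 = 75.0, 15.0, 2.0


def simulate_y(x, n, seed=0):
    """Draw n values of Y = beta0 + beta1*x + eps at a fixed x.

    Assumption: eps ~ Normal(0, sigma^2). The derivation only needs
    E(eps) = 0 and Var(eps) = sigma^2; normality is chosen for convenience.
    """
    rng = random.Random(seed)
    sigma = SIGMA2 ** 0.5
    return [BETA0 + BETA1 * x + rng.gauss(0.0, sigma) for _ in range(n)]


ys = simulate_y(x=1.0, n=100_000)
sample_mean = statistics.fmean(ys)     # should be close to beta0 + beta1*x
sample_var = statistics.variance(ys)   # should be close to sigma^2

print(f"sample mean     = {sample_mean:.3f}  (theory: {BETA0 + BETA1 * 1.0})")
print(f"sample variance = {sample_var:.3f}  (theory: {SIGMA2})")
```

With a large n, the sample mean should land near β₀ + β₁x and the sample variance near σ², regardless of which fixed x is used.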

🧪 Model Example

μ_{Y|x} = 75 + 15x,    σ² = 2
Hydrocarbon Level (x): 1.23
Predicted Oxygen Purity (μ_{Y|x}): 75 + 15(1.23) = 93.45%
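As a quick check of the arithmetic in this example, here is a small sketch that evaluates the mean response. The function name mean_response is hypothetical, and the default coefficients are simply the example values 75 and 15.

```python
def mean_response(x, beta0=75.0, beta1=15.0):
    """Return E(Y|x) = beta0 + beta1 * x for the example model."""
    return beta0 + beta1 * x


mu = mean_response(1.23)
print(f"Mean oxygen purity at x = 1.23: {mu:.2f}%")

# Raising x by one unit changes the mean response by the slope beta1.
slope_check = mean_response(2.23) - mean_response(1.23)
print(f"Change in mean for a one-unit increase in x: {slope_check:.2f}")
```

Note that 93.45% is the mean of the distribution of Y at x = 1.23; individual observations still scatter around it with variance σ² = 2.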

⚠️ Warnings

🧠 Key Vocabulary (with Examples)

| Term | Definition | Example |
| --- | --- | --- |
| Scatterplot | A graph that displays paired data points (x, Y). | Plotting hydrocarbon level (x) vs. oxygen purity (Y) to observe a pattern. |
| Regression | Describes how one variable (Y) depends on another (x). | Modeling oxygen purity based on hydrocarbon level. |
| Simple Linear Regression | A linear model with one independent variable (x). | Y = β₀ + β₁x + ε |
| Slope (β₁) | The change in the mean of Y for each one-unit increase in x. | If β₁ = 15, then each one-unit increase in x (1 percentage point of hydrocarbon level) raises the mean of Y by 15 percentage points. |
| Intercept (β₀) | The mean of Y when x = 0. | If β₀ = 75, then the mean of Y is 75 when x = 0. |
| Error Term (ε) | Represents variation in Y not explained by x. | Noise from temperature or equipment that affects oxygen purity. |