📘 Section 11.3 – Properties of the Least Squares Estimators
  📌 Model Assumptions
  We assume the simple linear regression model:
  
    Y = β₀ + β₁x + ε
    E(ε) = 0, Var(ε) = σ², with the errors εᵢ i.i.d.
  
  Then the least squares estimators are:
  
    β̂₁ = Sxy / Sxx
    β̂₀ = ȳ − β̂₁ x̄
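
  As a concrete illustration, here is a minimal Python sketch of these two formulas; the hours-studied/exam-score data are made up for the example:

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam score (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 73.0])

x_bar, y_bar = x.mean(), y.mean()
Sxx = np.sum((x - x_bar) ** 2)           # Σ(xᵢ − x̄)²
Sxy = np.sum((x - x_bar) * (y - y_bar))  # Σ(xᵢ − x̄)(yᵢ − ȳ)

beta1_hat = Sxy / Sxx                    # β̂₁ = Sxy / Sxx
beta0_hat = y_bar - beta1_hat * x_bar    # β̂₀ = ȳ − β̂₁·x̄

print(f"slope = {beta1_hat:.4f}, intercept = {beta0_hat:.4f}")
```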
  
 
  ✅ Unbiasedness of Estimators
  
    E(β̂₁) = β₁
    E(β̂₀) = β₀
  
  Therefore, both β̂₀ and β̂₁ are unbiased estimators of the true coefficients.
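
  Unbiasedness can also be checked empirically: simulate many datasets from a model with known coefficients and average the estimates. A minimal sketch, where the true values β₀ = 5, β₁ = 2, and σ = 1.5 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 5.0, 2.0, 1.5    # assumed true parameters
x = np.linspace(0, 10, 20)             # fixed design points

slopes, intercepts = [], []
for _ in range(10_000):
    eps = rng.normal(0.0, sigma, size=x.size)  # i.i.d. errors
    y = beta0 + beta1 * x + eps
    Sxx = np.sum((x - x.mean()) ** 2)
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))
    b1 = Sxy / Sxx
    slopes.append(b1)
    intercepts.append(y.mean() - b1 * x.mean())

# The averages should land very close to the true β₁ = 2 and β₀ = 5
print(np.mean(slopes), np.mean(intercepts))
```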
 
  📐 Variance of Estimators
  
    Var(β̂₁) = σ² / Sxx
    Var(β̂₀) = σ² × (1/n + x̄² / Sxx)
  
 
  📊 Covariance Between Estimators
  
    Cov(β̂₀, β̂₁) = −σ² × x̄ / Sxx
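
  Note that these formulas depend only on σ² and the x values, so they can be evaluated before any y data are observed. A small sketch, with σ² and the design points chosen arbitrarily:

```python
import numpy as np

sigma2 = 2.25                                  # assumed error variance σ²
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical design points

n = x.size
x_bar = x.mean()
Sxx = np.sum((x - x_bar) ** 2)

var_b1 = sigma2 / Sxx                          # Var(β̂₁)
var_b0 = sigma2 * (1 / n + x_bar**2 / Sxx)     # Var(β̂₀)
cov_b0_b1 = -sigma2 * x_bar / Sxx              # Cov(β̂₀, β̂₁)

print(var_b1, var_b0, cov_b0_b1)
```

  The covariance is negative whenever x̄ > 0: if a sample happens to produce a slope that is too steep, the intercept tends to come out too low, and vice versa.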
  
 
  🧮 Estimated Standard Errors
  If σ² is unknown, we estimate it by:
  
    σ̂² = SSE / (n − 2)
  
  Then the estimated standard errors are:
  
    se(β̂₁) = √(σ̂² / Sxx)
    se(β̂₀) = √[σ̂² × (1/n + x̄² / Sxx)]
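
  Continuing the same hypothetical hours/score data from above, a sketch that estimates σ² from the residuals and then forms both standard errors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 73.0])

n = x.size
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)            # residuals yᵢ − ŷᵢ
sse = np.sum(resid ** 2)             # SSE
sigma2_hat = sse / (n - 2)           # σ̂² = SSE / (n − 2)

se_b1 = np.sqrt(sigma2_hat / Sxx)
se_b0 = np.sqrt(sigma2_hat * (1 / n + x.mean()**2 / Sxx))
print(se_b1, se_b0)
```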
  
 
  📝 Notes
  
    - Sxx = Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)² / n
    - Sxy = Σ(xᵢ − x̄)(yᵢ − ȳ)
    - SSE = Σ(yᵢ − ŷᵢ)² = SST − SSR
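
  The computational shortcut for Sxx is algebraically identical to its definition; a quick numeric check in Python:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
definition = np.sum((x - x.mean()) ** 2)               # Σ(xᵢ − x̄)²
shortcut = np.sum(x ** 2) - np.sum(x) ** 2 / x.size    # Σxᵢ² − (Σxᵢ)² / n
print(np.isclose(definition, shortcut))                # True
```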
 
  🧠 What Students Should Know (No Book Needed)
  
    - Why do we use least squares?
      It gives us the "best-fitting" line by minimizing the sum of squared vertical distances between the predictions and the actual data points.
    - What does β̂₁ mean?
      It's the slope. It tells us how much the predicted y changes when x increases by 1. If β̂₁ = 2, every extra hour studied raises the predicted score by 2 points.
    - What does β̂₀ mean?
      It's the intercept: the predicted value of y when x = 0. Sometimes it has no practical meaning, but it's still needed to draw the line.
    - Why do we care about unbiasedness?
      An unbiased estimator gives the true value on average across repeated samples. That's exactly what we want when we use data to learn!
    - Why do we care about variance?
      Lower variance means the estimator is more stable and less affected by random noise in the data. It's like a more "confident" estimate.
    - What's the role of σ̂²?
      It estimates how noisy the data are. If σ̂² is big, predictions will be less accurate. We use it to compute standard errors and test hypotheses.
    - How does Sxx affect our results?
      Sxx measures the spread of the x values. More spread means a more reliable slope estimate (smaller standard error); see the sketch after this list.
    - In short:
      All these properties help us trust the regression model. They tell us whether the line we drew is meaningful, stable, and worth interpreting.
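
  To see the Sxx point concretely, here is a minimal Python sketch comparing the theoretical se(β̂₁) for a tightly clustered versus a spread-out set of x values, holding σ² fixed (both designs and the σ² value are made up for illustration):

```python
import numpy as np

sigma2 = 4.0                                    # assumed error variance σ²
x_narrow = np.array([4.0, 4.5, 5.0, 5.5, 6.0])  # tightly clustered x values
x_wide   = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # spread-out x values

for name, x in [("narrow", x_narrow), ("wide", x_wide)]:
    Sxx = np.sum((x - x.mean()) ** 2)           # spread of the x values
    print(name, np.sqrt(sigma2 / Sxx))          # theoretical se(β̂₁) = √(σ²/Sxx)
```

  The wide design has a much larger Sxx, so its slope standard error is several times smaller for the same noise level.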
 
  📋 Full Regression Output with Math Details
  Once you click "Update Graph", the following section will compute and display:
  
    - β̂₁ (slope), β̂₀ (intercept)
    - SST, SSR, SSE
    - σ̂² (estimated variance), se(β̂₁), se(β̂₀)
    - t-test for slope, F-test, R²
    - ANOVA table
 
  📘 What Do These Numbers Mean? (With Full Math)
  
    - β̂₁ (slope):
      β̂₁ = Sxy / Sxx, where Sxy = Σ(xᵢ − x̄)(yᵢ − ȳ) and Sxx = Σ(xᵢ − x̄)².
      This tells us how much the predicted y changes when x increases by 1.
    - β̂₀ (intercept):
      β̂₀ = ȳ − β̂₁·x̄. This is the predicted value of y when x = 0.
    - SST (Total Sum of Squares):
      SST = Σ(yᵢ − ȳ)². It measures the total variation in y.
    - SSR (Regression Sum of Squares):
      SSR = Σ(ŷᵢ − ȳ)² = β̂₁·Sxy. It measures how much of the variation is explained by the model.
    - SSE (Error Sum of Squares):
      SSE = Σ(yᵢ − ŷᵢ)² = SST − SSR. It measures the remaining unexplained variation.
    - σ̂² (Estimated Error Variance):
      σ̂² = SSE / (n − 2). This estimates how spread out the errors are; dividing by n − 2 accounts for the two estimated coefficients.
    - Standard Errors:
      se(β̂₁) = √(σ̂² / Sxx), se(β̂₀) = √[σ̂²·(1/n + x̄² / Sxx)].
      These show how precise the estimates β̂₀ and β̂₁ are.
    - t-Test for β̂₁:
      t = β̂₁ / se(β̂₁), with n − 2 degrees of freedom. Used to test whether the slope differs significantly from 0.
    - F-Test:
      F = MSR / MSE = (SSR / 1) / σ̂², with 1 and n − 2 degrees of freedom. Used to test whether the model as a whole is significant.
    - R² (R-squared):
      R² = SSR / SST. It shows what proportion of the variation in y is explained by x.
      (A consolidated sketch computing all of these follows below.)
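
  Finally, a consolidated sketch that reproduces all of the quantities above from raw data. The dataset is hypothetical, and the p-values use scipy's t and F distributions:

```python
import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([53.0, 56.0, 60.0, 63.0, 65.0, 70.0, 72.0, 78.0])
n = x.size

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx                        # β̂₁
b0 = y.mean() - b1 * x.mean()         # β̂₀

y_hat = b0 + b1 * x
SST = np.sum((y - y.mean()) ** 2)     # total variation
SSR = np.sum((y_hat - y.mean()) ** 2) # explained variation
SSE = SST - SSR                       # unexplained variation

sigma2_hat = SSE / (n - 2)            # σ̂²
se_b1 = np.sqrt(sigma2_hat / Sxx)
se_b0 = np.sqrt(sigma2_hat * (1 / n + x.mean()**2 / Sxx))

t_stat = b1 / se_b1                            # t-test for the slope
p_t = 2 * stats.t.sf(abs(t_stat), df=n - 2)
F_stat = (SSR / 1) / sigma2_hat                # F = MSR / MSE
p_F = stats.f.sf(F_stat, 1, n - 2)
R2 = SSR / SST

print(f"slope={b1:.3f}  intercept={b0:.3f}  R²={R2:.3f}")
print(f"t={t_stat:.3f} (p={p_t:.4g})  F={F_stat:.3f} (p={p_F:.4g})")
```

  In simple linear regression the two tests agree exactly: F = t², and their p-values match.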