📘 Section 11.3 – Properties of the Least Squares Estimators
  📌 Model Assumptions
  We assume the simple linear regression model:
  
    Y = β₀ + β₁x + ε
    E(ε) = 0, Var(ε) = σ², with the errors εᵢ i.i.d.
  
  Then the least squares estimators are:
  
    β̂₁ = Sxy / Sxx
    β̂₀ = ȳ − β̂₁ x̄
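
  As a concrete illustration, here is a minimal Python sketch of these two formulas; the hours-studied/exam-score data are made up for the example:

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam score (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 73.0])

x_bar, y_bar = x.mean(), y.mean()
Sxx = np.sum((x - x_bar) ** 2)           # Σ(xᵢ − x̄)²
Sxy = np.sum((x - x_bar) * (y - y_bar))  # Σ(xᵢ − x̄)(yᵢ − ȳ)

beta1_hat = Sxy / Sxx                    # β̂₁ = Sxy / Sxx
beta0_hat = y_bar - beta1_hat * x_bar    # β̂₀ = ȳ − β̂₁·x̄

print(f"slope = {beta1_hat:.4f}, intercept = {beta0_hat:.4f}")
```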
  
 
  ✅ Unbiasedness of Estimators
  
    E(β̂₁) = β₁
    E(β̂₀) = β₀
  
  Therefore, both β̂₀ and β̂₁ are unbiased estimators of the true coefficients.
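
  Unbiasedness can also be checked empirically: simulate many datasets from a model with known coefficients and average the estimates. A minimal sketch, where the true values β₀ = 5, β₁ = 2, and σ = 1.5 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 5.0, 2.0, 1.5    # assumed true parameters
x = np.linspace(0, 10, 20)             # fixed design points

slopes, intercepts = [], []
for _ in range(10_000):
    eps = rng.normal(0.0, sigma, size=x.size)  # i.i.d. errors
    y = beta0 + beta1 * x + eps
    Sxx = np.sum((x - x.mean()) ** 2)
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))
    b1 = Sxy / Sxx
    slopes.append(b1)
    intercepts.append(y.mean() - b1 * x.mean())

# The averages should land very close to the true β₁ = 2 and β₀ = 5
print(np.mean(slopes), np.mean(intercepts))
```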
 
  📐 Variance of Estimators
  
    Var(β̂₁) = σ² / Sxx
    Var(β̂₀) = σ² × (1/n + x̄² / Sxx)
  
 
  📊 Covariance Between Estimators
  
    Cov(β̂₀, β̂₁) = −σ² × x̄ / Sxx
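
  Note that these formulas depend only on σ² and the x values, so they can be evaluated before any y data are observed. A small sketch, with σ² and the design points chosen arbitrarily:

```python
import numpy as np

sigma2 = 2.25                                  # assumed error variance σ²
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical design points

n = x.size
x_bar = x.mean()
Sxx = np.sum((x - x_bar) ** 2)

var_b1 = sigma2 / Sxx                          # Var(β̂₁)
var_b0 = sigma2 * (1 / n + x_bar**2 / Sxx)     # Var(β̂₀)
cov_b0_b1 = -sigma2 * x_bar / Sxx              # Cov(β̂₀, β̂₁)

print(var_b1, var_b0, cov_b0_b1)
```

  The covariance is negative whenever x̄ > 0: if a sample happens to produce a slope that is too steep, the intercept tends to come out too low, and vice versa.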
  
 
  🧮 Estimated Standard Errors
  If σ² is unknown, we estimate it by:
  
    σ̂² = SSE / (n − 2)
  
  Then the estimated standard errors are:
  
    se(β̂₁) = √(σ̂² / Sxx)
    se(β̂₀) = √[σ̂² × (1/n + x̄² / Sxx)]
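
  Continuing the same hypothetical hours/score data from above, a sketch that estimates σ² from the residuals and then forms both standard errors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 73.0])

n = x.size
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)            # residuals yᵢ − ŷᵢ
sse = np.sum(resid ** 2)             # SSE
sigma2_hat = sse / (n - 2)           # σ̂² = SSE / (n − 2)

se_b1 = np.sqrt(sigma2_hat / Sxx)
se_b0 = np.sqrt(sigma2_hat * (1 / n + x.mean()**2 / Sxx))
print(se_b1, se_b0)
```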
  
 
  📝 Notes
  
    - Sxx = Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)² / n
    - Sxy = Σ(xᵢ − x̄)(yᵢ − ȳ)
    - SSE = Σ(yᵢ − ŷᵢ)² = SST − SSR
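
  The computational shortcut for Sxx is algebraically identical to its definition; a quick numeric check in Python:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
definition = np.sum((x - x.mean()) ** 2)               # Σ(xᵢ − x̄)²
shortcut = np.sum(x ** 2) - np.sum(x) ** 2 / x.size    # Σxᵢ² − (Σxᵢ)² / n
print(np.isclose(definition, shortcut))                # True
```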
 
  🧠 What Students Should Know (No Book Needed)
  
    - Why do we use least squares?
      It gives us the "best-fitting" line by minimizing the sum of squared vertical distances between the predictions and the actual data points.
    - What does β̂₁ mean?
      It's the slope. It tells us how much the predicted y changes when x increases by 1. If β̂₁ = 2, every extra hour studied raises the predicted score by 2 points.
    - What does β̂₀ mean?
      It's the intercept: the predicted value of y when x = 0. Sometimes it has no practical meaning, but it's still needed to draw the line.
    - Why do we care about unbiasedness?
      An unbiased estimator gives the true value on average across repeated samples. That's exactly what we want when we use data to learn!
    - Why do we care about variance?
      Lower variance means the estimator is more stable and less affected by random noise in the data. It's like a more "confident" estimate.
    - What's the role of σ̂²?
      It estimates how noisy the data are. If σ̂² is big, predictions will be less accurate. We use it to compute standard errors and test hypotheses.
    - How does Sxx affect our results?
      Sxx measures the spread of the x values. More spread means a more reliable slope estimate (smaller standard error); see the sketch after this list.
    - In short:
      All these properties help us trust the regression model. They tell us whether the line we drew is meaningful, stable, and worth interpreting.
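
  To see the Sxx point concretely, here is a minimal Python sketch comparing the theoretical se(β̂₁) for a tightly clustered versus a spread-out set of x values, holding σ² fixed (both designs and the σ² value are made up for illustration):

```python
import numpy as np

sigma2 = 4.0                                    # assumed error variance σ²
x_narrow = np.array([4.0, 4.5, 5.0, 5.5, 6.0])  # tightly clustered x values
x_wide   = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # spread-out x values

for name, x in [("narrow", x_narrow), ("wide", x_wide)]:
    Sxx = np.sum((x - x.mean()) ** 2)           # spread of the x values
    print(name, np.sqrt(sigma2 / Sxx))          # theoretical se(β̂₁) = √(σ²/Sxx)
```

  The wide design has a much larger Sxx, so its slope standard error is several times smaller for the same noise level.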
 
  📋 Full Regression Output with Math Details
  Once you click "Update Graph", the following section will compute and display:
  
    - β̂₁ (slope), β̂₀ (intercept)
    - SST, SSR, SSE
    - σ̂² (estimated variance), se(β̂₁), se(β̂₀)
    - t-test for slope, F-test, R²
    - ANOVA table
 
  📘 What Do These Numbers Mean? (With Full Math)
  
    - β̂₁ (slope):
      β̂₁ = Sxy / Sxx, where Sxy = Σ(xᵢ − x̄)(yᵢ − ȳ) and Sxx = Σ(xᵢ − x̄)².
      This tells us how much the predicted y changes when x increases by 1.
    - β̂₀ (intercept):
      β̂₀ = ȳ − β̂₁·x̄. This is the predicted value of y when x = 0.
    - SST (Total Sum of Squares):
      SST = Σ(yᵢ − ȳ)². It measures the total variation in y.
    - SSR (Regression Sum of Squares):
      SSR = Σ(ŷᵢ − ȳ)² = β̂₁·Sxy. It measures how much of the variation is explained by the model.
    - SSE (Error Sum of Squares):
      SSE = Σ(yᵢ − ŷᵢ)² = SST − SSR. It measures the remaining unexplained variation.
    - σ̂² (Estimated Error Variance):
      σ̂² = SSE / (n − 2). This estimates how spread out the errors are; dividing by n − 2 accounts for the two estimated coefficients.
    - Standard Errors:
      se(β̂₁) = √(σ̂² / Sxx), se(β̂₀) = √[σ̂²·(1/n + x̄² / Sxx)].
      These show how precise the estimates β̂₀ and β̂₁ are.
    - t-Test for β̂₁:
      t = β̂₁ / se(β̂₁), with n − 2 degrees of freedom. Used to test whether the slope differs significantly from 0.
    - F-Test:
      F = MSR / MSE = (SSR / 1) / σ̂², with 1 and n − 2 degrees of freedom. Used to test whether the model as a whole is significant.
    - R² (R-squared):
      R² = SSR / SST. It shows what proportion of the variation in y is explained by x.
      (A consolidated sketch computing all of these follows below.)
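
  Finally, a consolidated sketch that reproduces all of the quantities above from raw data. The dataset is hypothetical, and the p-values use scipy's t and F distributions:

```python
import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([53.0, 56.0, 60.0, 63.0, 65.0, 70.0, 72.0, 78.0])
n = x.size

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx                        # β̂₁
b0 = y.mean() - b1 * x.mean()         # β̂₀

y_hat = b0 + b1 * x
SST = np.sum((y - y.mean()) ** 2)     # total variation
SSR = np.sum((y_hat - y.mean()) ** 2) # explained variation
SSE = SST - SSR                       # unexplained variation

sigma2_hat = SSE / (n - 2)            # σ̂²
se_b1 = np.sqrt(sigma2_hat / Sxx)
se_b0 = np.sqrt(sigma2_hat * (1 / n + x.mean()**2 / Sxx))

t_stat = b1 / se_b1                            # t-test for the slope
p_t = 2 * stats.t.sf(abs(t_stat), df=n - 2)
F_stat = (SSR / 1) / sigma2_hat                # F = MSR / MSE
p_F = stats.f.sf(F_stat, 1, n - 2)
R2 = SSR / SST

print(f"slope={b1:.3f}  intercept={b0:.3f}  R²={R2:.3f}")
print(f"t={t_stat:.3f} (p={p_t:.4g})  F={F_stat:.3f} (p={p_F:.4g})")
```

  In simple linear regression the two tests agree exactly: F = t², and their p-values match.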