📘 Section 11.8 – Correlation & Bivariate Normal Regression

Goal: Understand regression when both variables \(X\) and \(Y\) are random, assuming a bivariate normal model.

🔍 Core Concepts

📌 Key Formulas

Population / model relationships

\[ \rho \;=\; \frac{\sigma_{XY}}{\sigma_X\,\sigma_Y},\qquad E(Y\mid X=x) \;=\; \mu_Y \;+\; \rho\frac{\sigma_Y}{\sigma_X}\,(x-\mu_X), \]

\[ \operatorname{Var}(Y\mid X=x) \;=\; \sigma_Y^2(1-\rho^2),\qquad \beta_1 \;=\; \rho \frac{\sigma_Y}{\sigma_X},\qquad R^2 \;=\; \rho^2 . \]

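Under the bivariate normal model, the simple linear regression line \(E(Y\mid X=x)=\beta_0+\beta_1 x\) is exactly the conditional mean line, with slope \(\beta_1=\rho\,\sigma_Y/\sigma_X\) and intercept \(\beta_0=\mu_Y-\beta_1\mu_X\). The conditional variance does not depend on \(x\).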
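The claim that the least-squares line recovers the conditional mean line can be checked numerically. The following is a minimal simulation sketch, not part of the original notes: the parameter values, the seed, and the helper names `simulate_bivariate_normal` and `least_squares` are all illustrative assumptions. It draws pairs from a bivariate normal and compares the fitted slope, intercept, and residual variance with the model formulas above.

```python
import math
import random


def simulate_bivariate_normal(n, mu_x, mu_y, sigma_x, sigma_y, rho, seed=0):
    """Draw n pairs (x, y) from a bivariate normal using two independent N(0,1) draws."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(mu_x + sigma_x * z1)
        # rho*z1 + sqrt(1 - rho^2)*z2 has unit variance and correlation rho with z1
        ys.append(mu_y + sigma_y * (rho * z1 + math.sqrt(1.0 - rho**2) * z2))
    return xs, ys


def least_squares(xs, ys):
    """Return (intercept, slope, residual variance) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse / (n - 2)


# Illustrative parameter values (assumptions, not taken from the notes)
mu_x, mu_y, sigma_x, sigma_y, rho = 10.0, 30.0, 2.0, 6.0, 0.8
xs, ys = simulate_bivariate_normal(50_000, mu_x, mu_y, sigma_x, sigma_y, rho)
b0, b1, s2 = least_squares(xs, ys)

beta1 = rho * sigma_y / sigma_x          # model slope
beta0 = mu_y - beta1 * mu_x              # model intercept
cond_var = sigma_y**2 * (1.0 - rho**2)   # Var(Y | X = x)

print("slope     fitted:", round(b1, 3), " model:", round(beta1, 3))
print("intercept fitted:", round(b0, 3), " model:", round(beta0, 3))
print("resid var fitted:", round(s2, 3), " model:", round(cond_var, 3))
```

With a large sample the fitted and model values should agree closely. The remaining gap is sampling error, which shrinks as the sample size grows.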

Sample counterparts

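\[ \hat\rho \;=\; \frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}},\qquad R^2 \;=\; \hat\rho^{\,2}, \]

where \(S_{XX}=\sum_i (x_i-\bar x)^2\), \(S_{YY}=\sum_i (y_i-\bar y)^2\), and \(S_{XY}=\sum_i (x_i-\bar x)(y_i-\bar y)\).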
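As a small check of the sample identity \(R^2=\hat\rho^{\,2}\), the sketch below computes \(\hat\rho\) from the corrected sums of squares and compares its square with \(1-\mathrm{SSE}/\mathrm{SST}\) from the least-squares fit. The six data points and the function names `sample_correlation` and `r_squared` are hypothetical, chosen only for illustration.

```python
import math


def sample_correlation(xs, ys):
    """rho_hat = S_XY / sqrt(S_XX * S_YY), using corrected sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


def r_squared(xs, ys):
    """Coefficient of determination 1 - SSE/SST for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sse / sst


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

rho_hat = sample_correlation(xs, ys)
print("rho_hat^2:", rho_hat**2)
print("R^2      :", r_squared(xs, ys))
```

The two printed values should match up to floating-point rounding, since the identity is algebraic and does not rely on the normality assumption.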

🧪 Example (sketch)

Suppose \(n=25\) wire samples with length \(X\) and pull strength \(Y\) yield \( \hat\beta_1=2.903\), \( \hat\beta_0=5.115\), and \( \hat\rho=0.982\Rightarrow R^2=0.964\). Interpretation: ~96% of the variation in strength is explained by length; the linear relationship is very strong.

📉 Visualization: Strong vs Weak Correlation

📊 Inference for Correlation

Test for zero correlation

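\[ H_0:\rho=0,\quad T = \frac{\hat\rho\sqrt{n-2}}{\sqrt{1-\hat\rho^{\,2}}}\;\sim\;t_{n-2}\ \text{ under } H_0. \] For the two-sided alternative \(H_1:\rho\neq 0\), reject \(H_0\) at level \(\alpha\) when \(|T| > t_{\alpha/2,\,n-2}\).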
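To make the test concrete, here is a minimal sketch that evaluates \(T\) for the wire example in this section (\(\hat\rho=0.982\), \(n=25\)). The function name `correlation_t_statistic` is an assumption for illustration, and the critical value 2.069 is the tabulated upper 2.5% point of \(t_{23}\), hard-coded here rather than computed.

```python
import math


def correlation_t_statistic(rho_hat, n):
    """T = rho_hat * sqrt(n - 2) / sqrt(1 - rho_hat^2), the statistic for H0: rho = 0."""
    if n <= 2:
        raise ValueError("need n > 2 observations")
    if not -1.0 < rho_hat < 1.0:
        raise ValueError("rho_hat must lie strictly between -1 and 1")
    return rho_hat * math.sqrt(n - 2) / math.sqrt(1.0 - rho_hat**2)


# Values from the wire example in this section
t_stat = correlation_t_statistic(0.982, 25)

# Upper 2.5% point of t with 23 degrees of freedom, taken from a standard t table
t_crit = 2.069

print("T =", round(t_stat, 2))
print("reject H0 at alpha = 0.05:", abs(t_stat) > t_crit)
```

The statistic comes out at roughly 25, far beyond the critical value, so the data give very strong evidence against \(\rho=0\).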

Confidence interval for \( \rho \) (Fisher’s \(z\))

\[ z=\operatorname{arctanh}(\hat\rho)=\tfrac12\ln\!\frac{1+\hat\rho}{1-\hat\rho}, \qquad z \approx N\!\Big(\operatorname{arctanh}(\rho),\,\frac1{n-3}\Big). \] An approximate \(100(1-\alpha)\%\) CI, where \(z_{\alpha/2}\) is the upper \(\alpha/2\) quantile of the standard normal (not to be confused with the transformed statistic \(z\)): \[ \rho \in \left[\tanh\!\Big(z - z_{\alpha/2}\sqrt{\tfrac1{n-3}}\Big),\; \tanh\!\Big(z + z_{\alpha/2}\sqrt{\tfrac1{n-3}}\Big)\right]. \]
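The interval can be computed with the Python standard library alone. The sketch below applies it to the wire example (\(\hat\rho=0.982\), \(n=25\)). The function name `fisher_z_interval` and the default \(\alpha=0.05\) are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def fisher_z_interval(rho_hat, n, alpha=0.05):
    """Approximate 100(1 - alpha)% confidence interval for rho via Fisher's z transform."""
    if n <= 3:
        raise ValueError("need n > 3 observations")
    if not -1.0 < rho_hat < 1.0:
        raise ValueError("rho_hat must lie strictly between -1 and 1")
    z = math.atanh(rho_hat)                          # transformed estimate
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2)   # upper alpha/2 normal quantile
    half_width = z_crit / math.sqrt(n - 3)
    # Back-transform the endpoints to the correlation scale
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Values from the wire example in this section
lo, hi = fisher_z_interval(0.982, 25)
print("95% CI for rho:", round(lo, 3), "to", round(hi, 3))
```

For these inputs the interval runs from roughly 0.96 to 0.99. It is not symmetric about \(\hat\rho\), because the back-transformation compresses the scale near 1.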