Chapter 5 . Session 5.4 — Covariance, Correlation, and OLS (EXCEL)
Data: ENGR200_Term_Project_Data_GPA • n = students. Choose a variable to compare with Statics_Grade. Download CSV
1) Explore with your data
Mean(Statics)–
          Var(Statics)–
          Std(Statics)–
        Mean()–
          Var()–
          Std()–
        Cov(, Statics)–
          Corr(, Statics)–
          OLS slope β₁–
        OLS intercept β₀–
          R² (simple OLS)–
          ρ² (should = R²)–
        2) What are Covariance and Correlation?
Covariance measures the direction of linear association. Units are the product of units of \(X\) and \(Y\).
Correlation rescales covariance to a unitless number in \([-1,1]\) indicating strength + direction of linear association.
From a joint distribution \(f_{X,Y}\): \(E[XY]=\sum_x\sum_y xy\,f_{X,Y}(x,y)\) (discrete) or \(E[XY]=\int\!\!\int xy\,f_{X,Y}(x,y)\,dx\,dy\) (continuous).
        \( \textbf{Covariance:}\quad \sigma_{XY}=\operatorname{Cov}(X,Y)
           = \mathbb{E}[(X-\mu_X)(Y-\mu_Y)] = \mathbb{E}[XY]-\mu_X\mu_Y \;\; \tag{5.15} \)
\( \textbf{Correlation:}\quad \rho_{XY} = \dfrac{\operatorname{Cov}(X,Y)}{\sqrt{V(X)\,V(Y)}} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y} \;\; \tag{5.16} \)
\( \text{Because }\sigma_X>0,\ \sigma_Y>0,\ \text{sign}(\rho_{XY})=\text{sign}(\sigma_{XY}).\ \text{Also,}\ -1 \le \rho_{XY} \le +1. \;\; \tag{5.17} \)
      \( \textbf{Correlation:}\quad \rho_{XY} = \dfrac{\operatorname{Cov}(X,Y)}{\sqrt{V(X)\,V(Y)}} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y} \;\; \tag{5.16} \)
\( \text{Because }\sigma_X>0,\ \sigma_Y>0,\ \text{sign}(\rho_{XY})=\text{sign}(\sigma_{XY}).\ \text{Also,}\ -1 \le \rho_{XY} \le +1. \;\; \tag{5.17} \)
Correlation simply rescales covariance by the product of standard deviations, making it dimensionless for fair comparison across different units and scales.
3) OLS vs. Correlation — what's the difference?
Correlation (ρ)
- Symmetric: Corr(X,Y) = Corr(Y,X)
- Unitless in \([-1,1]\); scale-invariant
- Summarizes strength & direction of linear association only
OLS Regression (β₀, β₁)
- Asymmetric: regress Y on X → vertical residuals minimized in Y
- Units matter: β₁ in “units of Y per unit X”
- Best-fitting line: \( \hat Y = β_0 + β_1 X\) with \( β_1 = \dfrac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}\)
- In simple regression, \(R^2 = \rho^2\)
4) Excel — Step-by-step
A) Built-in functions (fast)
- Put Statics_Grade in column A (A2:A) and the other variable in column B (B2:B).
- Means: =AVERAGE(A2:A),=AVERAGE(B2:B)
- Variance (population): =VAR.P(A2:A),=VAR.P(B2:B)
- Std (population): =STDEV.P(A2:A),=STDEV.P(B2:B)
- Covariance (population): =COVARIANCE.P(A2:A, B2:B)
- Correlation: =CORREL(A2:A, B2:B)
Use the .S versions (VAR.S, STDEV.S, COVARIANCE.S) for sample formulas.
B) By-hand (shows the math)
- Means: \( \bar x=\text{AVERAGE}(B),\ \bar y=\text{AVERAGE}(A)\)
- Deviations columns: \(x_i-\bar x\) and \(y_i-\bar y\)
- Product column: \((x_i-\bar x)(y_i-\bar y)\)
- Population covariance: average the products; sample covariance: sum/(n−1)
- Std devs from VAR/STD; then \( \rho = \text{cov}/(s_X s_Y)\)
5) Numeric table (live from data)
| Metric | Statics_Grade | |
|---|---|---|
| Mean | ||
| Variance | ||
| Std Dev | ||
| Covariance | ||
| Correlation | ||
| OLS: Statics = β₀ + β₁·X | ||
| R² vs ρ² | ||