📘 Chapter 12: Multiple Linear Regression
Most useful topics in Chapter 12:
Session 12.2 (and its Excel ICE) — reading MLR output (β, SE, t, p, F, R², Adjusted R²),
using VIF to spot multicollinearity, and checking residual plots, leverage, and Cook’s distance.
📅 Session Plan (What We Will Do in Class)
We will walk through these two ICE apps step by step in class. All other HTML apps,
slides, and notes below are optional support materials.
🎮 Interactive Apps (HTML/JS)
- Excel ICE (GPA, functions only) — β, SE, t, p, CI/PI, VIF [ICE]
  After using this ICE in Excel, read 12.2 – Hypothesis Tests in Multiple Regression for the theory version of the same output.
- Excel ToolPak ICE (StudyHW/TutorHours/ExamPrep → FinalGrade) [ICE]
  This ToolPak ICE produces the ANOVA table and the t/F tests. See 12.2 – Hypothesis Tests in Multiple Regression to understand the formulas behind those numbers.
- 12.1.1 – Multiple Linear Regression Intro
- 12.1.2 – Least Squares Estimation
- 12.1.3 – Matrix Form of the Multiple Linear Regression Model
- 12.2 – Hypothesis Tests in Multiple Regression
- 12.3 – Confidence Intervals & 12.4 – Model Utility
- 12.5 – Model Adequacy Checking (Residual, Leverage, Cook)
- 12.6 – Aspects of Multiple Regression Modeling (VIF, Dummy, Cp, Adj R²)
📚 Homework
Everyone uses the same dataset (student stress study). You will fit a small MLR model
with two predictors, compute the correlation between the predictors and their VIFs, and write a short conclusion using the template.
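If you want to sanity-check your Excel numbers, the sketch below does the same steps in Python with statsmodels. The values and the column names (StressScore, StudyHW, SleepHours) are made-up stand-ins, not the actual stress-study data.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Made-up stand-in data (10 students); replace with the real stress dataset.
df = pd.DataFrame({
    "StressScore": [62, 55, 70, 48, 66, 58, 74, 51, 69, 60],
    "StudyHW":     [10, 14,  6, 16,  8, 12,  5, 15,  7, 11],
    "SleepHours":  [ 6,  7,  5,  8,  6,  7,  4,  8,  5,  6],
})

# Correlation between the two predictors (a first collinearity check)
print(df[["StudyHW", "SleepHours"]].corr())

# Fit StressScore = b0 + b1*StudyHW + b2*SleepHours
X = sm.add_constant(df[["StudyHW", "SleepHours"]])
model = sm.OLS(df["StressScore"], X).fit()
print(model.summary())   # beta, SE, t, p, F, R-squared, Adjusted R-squared

# VIF for each predictor (column 0 is the constant, so start at 1)
for i, name in enumerate(X.columns[1:], start=1):
    print(name, variance_inflation_factor(X.values, i))
```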
💬 Excel
- Chapter 12 — Excel ICE (GPA): compute β, SE, t, p, CI/PI, VIF [IN-CLASS ICE]
  For the theory behind these numbers (F-test, t-tests, sums of squares), see 12.2 – Hypothesis Tests in Multiple Regression.
- Chapter 12 — Excel for Multiple Linear Regression (MLR), VIF, Cook’s Distance, and Adjusted R²
📢 Student Q&A
Part 1 – Student perspective (MLR basics)
- Q1: How do I interpret a coefficient in MLR?
A: It shows the expected change in Y for a 1-unit increase in that X, holding other X’s constant.
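A tiny sketch of that reading, using a hypothetical fitted equation (the coefficients are invented for illustration):

```python
# Hypothetical fitted model: FinalGrade-hat = 20 + 3.5*StudyHW + 1.2*SleepHours
b0, b_study, b_sleep = 20.0, 3.5, 1.2

def predict(study_hw, sleep_hours):
    return b0 + b_study * study_hw + b_sleep * sleep_hours

# One more hour of StudyHW, holding SleepHours fixed at 7, raises the
# prediction by exactly b_study = 3.5 points:
print(predict(11, 7) - predict(10, 7))   # 3.5
```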
- Q2: What does the intercept mean in a regression model?
A: It is the predicted value of Y when all X’s are 0. Sometimes this is meaningful, sometimes it’s just a mathematical anchor.
- Q3: Why do we keep saying “holding other X’s constant”?
A: In multiple regression, each slope is a partial effect. We change one X at a time while pretending the others stay fixed, so we don’t mix their effects together.
- Q4: What is a “response” variable and what are “predictors”?
A: The response (Y) is what we want to predict or explain (e.g., FinalGrade). Predictors (X’s) are the inputs we use to predict Y (e.g., StudyHW, SleepHours).
- Q5: What is R² in simple terms?
A: R² is the fraction of the variation in Y explained by the model. Example: R² = 0.70 means the model explains about 70% of the variation in Y.
- Q6: What is the difference between R² and Adjusted R²?
A: R² never goes down when you add predictors. Adjusted R² penalizes extra predictors and can go down if a new X doesn’t help much. Use Adjusted R² to compare models with different numbers of predictors.
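The penalty is easy to see from the formula Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). A quick illustration with made-up numbers:

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(0.70, n=30, p=2))   # ~0.678
print(adjusted_r2(0.71, n=30, p=5))   # ~0.650: R^2 rose, Adjusted R^2 fell
```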
- Q7: What does a p-value for a slope tell me?
A: It measures how strong the evidence is that the true slope is not 0. A small p (like < 0.05) suggests that X is useful for explaining Y in that model.
- Q8: What does the overall F-test check?
A: It tests “all slopes = 0” vs “at least one slope ≠ 0.” A small F p-value says the model, as a whole, explains Y better than using just the mean.
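A worked version of that test with made-up sums of squares (scipy is used only for the F tail probability):

```python
from scipy import stats

n, k = 30, 2              # observations, predictors (made-up values)
ssr, sse = 420.0, 180.0   # regression and error sums of squares
f = (ssr / k) / (sse / (n - k - 1))   # F = MSR / MSE
p_value = stats.f.sf(f, k, n - k - 1)
print(f, p_value)                     # F = 31.5, p << 0.05
```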
- Q9: What is a residual?
A: Residual = actual − predicted (e = y − ŷ). It’s how far off the model was for that observation.
- Q10: What should a good residual plot look like?
A: A random cloud around 0, with no clear pattern, curve, or funnel shape. That suggests the linear model and constant spread assumptions are reasonable.
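One way to draw that plot, on simulated data that satisfies the assumptions by construction:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(12)
x1 = rng.uniform(0, 15, 60)
x2 = rng.uniform(4, 9, 60)
y = 20 + 3.5 * x1 + 1.2 * x2 + rng.normal(0, 2, 60)   # model holds by design

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()   # should look like a patternless cloud around 0
```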
- Q11: What is multicollinearity in plain language?
A: It means that two or more predictors are highly correlated with each other, so the model has trouble deciding which one gets credit for explaining Y.
- Q12: What is VIF and how do I read it?
A: VIF (Variance Inflation Factor) tells you how much the variance of a slope is inflated by collinearity. Rough guide: VIF ≈ 1–2, no problem; 2–5, noticeable; above 5, keep an eye on it; above 10, serious concern.
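Behind the number, VIF for predictor j is 1/(1 − R²_j), where R²_j comes from regressing predictor j on the other predictors. A small simulated illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.5, size=100)   # built to overlap with x1

aux = sm.OLS(x1, sm.add_constant(x2)).fit()   # auxiliary regression: x1 on x2
vif_x1 = 1 / (1 - aux.rsquared)
print(vif_x1)   # well above 1 because x1 and x2 share information
```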
- Q13: What if VIF is high?
A: It suggests multicollinearity. You might drop one of two very similar X’s, combine them into an index, or collect more data. High VIF does not ruin predictions but makes slopes unstable and hard to interpret.
- Q14: What is the difference between correlation and regression?
A: Correlation is a single number that measures linear association between two variables. Regression fits an equation that predicts Y from one or more X’s and separates the effects of multiple predictors.
- Q15: Does a high R² or high correlation mean X causes Y?
A: No. Regression and correlation show association, not causation. To argue causality you need a good research design (experiments, controls, time ordering, etc.).
- Q16: Should I always drop a predictor with p > 0.05?
A: Not automatically. That variable might still be important conceptually, or jointly significant with others. Look at theory, VIF, and how the whole model behaves, not just one p-value.
- Q17: What is a dummy variable?
A: A dummy (indicator) variable is coded 0/1 to represent categories such as gender (0 = male, 1 = female) or major (0 = non-engineering, 1 = engineering).
- Q18: How do I interpret the coefficient on a dummy variable?
A: It is the average difference in Y between the group coded 1 and the baseline group coded 0, holding other X’s constant.
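A simulated sketch of that reading (the variable names EngMajor and StudyHW are made up for the example):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 80
study = rng.uniform(2, 12, n)
eng = rng.integers(0, 2, n)   # 1 = engineering, 0 = baseline group
grade = 55 + 2.0 * study + 5.0 * eng + rng.normal(0, 3, n)

X = sm.add_constant(pd.DataFrame({"StudyHW": study, "EngMajor": eng}))
fit = sm.OLS(grade, X).fit()
print(fit.params["EngMajor"])   # ~5: the group-1 vs group-0 gap in grade
```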
- Q19: What is an interaction term and when do I need one?
A: An interaction term is a product like X1*X2. You use it when the effect of one predictor depends on the level of another (e.g., extra study hours help more for students who attend class regularly).
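A minimal simulated example using the statsmodels formula interface, where the true effect of StudyHW really does depend on Attend (a made-up 0/1 attendance variable):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 80
df = pd.DataFrame({"StudyHW": rng.uniform(2, 12, n),
                   "Attend": rng.integers(0, 2, n)})
df["Grade"] = (50 + 1.0 * df["StudyHW"] + 4.0 * df["Attend"]
               + 1.5 * df["StudyHW"] * df["Attend"] + rng.normal(0, 3, n))

# "StudyHW * Attend" expands to StudyHW + Attend + StudyHW:Attend
fit = smf.ols("Grade ~ StudyHW * Attend", data=df).fit()
print(fit.params)   # StudyHW slope is ~1.0 when Attend = 0, ~2.5 when Attend = 1
```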
- Q20: What is the difference between a confidence interval and a prediction interval?
A: A confidence interval is for the mean response at a given X. A prediction interval is for an individual observation and is wider because it includes random noise for that one student.
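A sketch comparing the two intervals at the same x value on simulated data; the obs_ci columns (the PI) come out wider than the mean_ci columns (the CI):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 15, 60)
y = 20 + 3.5 * x + rng.normal(0, 4, 60)

fit = sm.OLS(y, sm.add_constant(x)).fit()

# Both intervals at the same point, x = 10
new = sm.add_constant(np.array([10.0]), has_constant="add")
frame = fit.get_prediction(new).summary_frame(alpha=0.05)
print(frame[["mean_ci_lower", "mean_ci_upper"]])   # CI for the mean response
print(frame[["obs_ci_lower", "obs_ci_upper"]])     # PI for one new student: wider
```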
Part 2 – Advanced: common OLS mistakes in research
- Q21: Biggest mistake #1 — is “significant” the same as “causal”?
A: No. A small p-value in OLS shows an association, not a causal effect. Without a good identification strategy (experiment, natural experiment, IV, etc.), calling it “impact” or “effect” is usually too strong.
- Q22: What is omitted variable bias?
A: It happens when an important variable that affects both X and Y is left out of the model. The slopes on the included X’s can be badly biased and misleading.
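A small simulation of the problem: ability affects both StudyHW and Grade, so leaving it out biases the StudyHW slope (all numbers invented):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
ability = rng.normal(size=n)
study = 5 + 2.0 * ability + rng.normal(size=n)   # ability drives study too
grade = 60 + 1.0 * study + 4.0 * ability + rng.normal(size=n)

full = sm.OLS(grade, sm.add_constant(np.column_stack([study, ability]))).fit()
short = sm.OLS(grade, sm.add_constant(study)).fit()
print(full.params[1])    # ~1.0: the true study slope
print(short.params[1])   # ~2.6: biased upward because ability was omitted
```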
- Q23: Is it safe to rely only on “starred” coefficients and ignore the rest?
A: Not really. Some variables may not be individually significant but are jointly important. Also, focus on effect sizes, confidence intervals, and theory—not only on stars.
- Q24: What is “kitchen sink” regression and why is it a problem?
A: It’s when you throw every possible variable into the model. With limited n, this can create overfitting, high VIFs, unstable coefficients, and p-values that bounce around across specifications.
- Q25: Why is stepwise selection based only on p-values risky?
A: Stepwise methods chase noise in the sample and ignore theory. They can inflate Type I error, produce biased coefficients, and give models that don’t replicate in new data.
- Q26: What happens if I ignore nonlinearity?
A: If the relationship is curved but you force a straight line, predictions and slopes can be wrong. Often you need transformations (like log) or polynomial/interaction terms.
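A quick simulated comparison of a straight-line fit vs adding a squared term when the truth is curved:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 100)
y = 5 + 2.0 * x - 0.15 * x**2 + rng.normal(0, 1, 100)   # the truth is curved

linear = sm.OLS(y, sm.add_constant(x)).fit()
quad = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()
print(linear.rsquared_adj, quad.rsquared_adj)   # the squared term wins
```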
- Q27: How can high multicollinearity mislead researchers?
A: With high VIF, individual p-values may look large even when the predictors are important jointly. Signs can flip when you add/drop a variable. Over-interpreting individual slopes in that situation is a common mistake.
- Q28: What if I ignore clustered or repeated-measures data?
A: If observations are not independent (e.g., students inside schools, same firm over time) and you still use plain OLS SEs, the p-values can be far too optimistic. Clustered or robust SEs (or panel methods) are needed.
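A simulated sketch: students share a school-level shock, and the plain OLS standard error on the slope comes out noticeably smaller than the cluster-robust one:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
schools = np.repeat(np.arange(20), 15)   # 20 schools, 15 students each
x = rng.uniform(0, 10, 20)[schools] + rng.normal(0, 0.5, schools.size)
y = 2.0 * x + rng.normal(0, 3, 20)[schools] + rng.normal(0, 1, schools.size)

X = sm.add_constant(x)
plain = sm.OLS(y, X).fit()
clustered = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": schools})
print(plain.bse[1], clustered.bse[1])   # clustered slope SE is clearly larger
```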
- Q29: Why is it dangerous to interpret log coefficients without thinking?
A: In log models, slopes are elasticities or semi-elasticities (percent changes, not raw units). A common error is to read them as “X units” instead of “X percent,” or to misinterpret log–log vs log–level forms.
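A few lines that make the unit difference concrete (the coefficients 0.08 and 0.6 are hypothetical):

```python
import numpy as np

b = 0.08   # log(Y) = ... + 0.08 * X  (log-level form)
print(100 * b)                 # approximate: +8% in Y per one unit of X
print(100 * (np.exp(b) - 1))   # exact: +8.33%

e = 0.6    # log(Y) = ... + 0.6 * log(X)  (log-log form)
print(e)   # elasticity: a 1% rise in X goes with about a 0.6% rise in Y
```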
- Q30: What is p-hacking in the context of OLS?
A: Running many regressions and only reporting the ones that “work” (p < 0.05), often without saying how many models were tried. This inflates false positives and makes results look stronger than they are.
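The arithmetic behind that inflation: if you test 20 unrelated predictors at the 5% level, the chance that at least one looks “significant” is already about 64%:

```python
# Probability of at least one false positive across 20 independent 5% tests
print(1 - 0.95 ** 20)   # ~0.64
```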
- Q31: Why should researchers always check residual and influence diagnostics?
A: Ignoring residual plots, leverage, and Cook’s D can hide model misfit or a few influential points driving the entire result. Robust work always reports basic diagnostics.
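A minimal sketch of pulling those diagnostics out of a fit, with one deliberately influential point added to simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = np.append(rng.uniform(0, 10, 40), 25.0)                # one far-out x
y = np.append(3.0 * x[:40] + rng.normal(0, 2, 40), 30.0)   # ...that misfits

fit = sm.OLS(y, sm.add_constant(x)).fit()
infl = fit.get_influence()
print(infl.hat_matrix_diag.argmax())    # leverage flags observation 40
print(infl.cooks_distance[0].argmax())  # Cook's distance flags it too
```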
- Q32: Is a big sample size a free pass to ignore model assumptions?
A: No. A large n can make tiny, unimportant effects statistically significant, and systematic violations (wrong functional form, omitted variables) don’t disappear just because n is large.