Chapter 12 — Homework: Study & Sleep vs Next-Day Grade
Use a small data set of 23 students to practice multiple linear regression,
correlation, and VIF. You may use either
Excel Analysis ToolPak or the Excel functions from our class ICE apps.
Model for this homework:
NextDayGrade = β₀ + β₁·StudyHours + β₂·SleepHours + ε
A. Data: Study, Sleep, and Next-Day Grade
The night before a short quiz, 23 students recorded how many hours they studied
and how many hours they slept. The next day they took the quiz, and we recorded
their NextDayGrade (0–100 scale).
StudyHours — hours of study the evening before
SleepHours — hours of sleep that night
NextDayGrade — quiz grade the next day
n = 23 students
Copy the dataset into Excel (use it exactly as given, no editing):
Hint: you can use either the functions-only ICE (GPA) or the
ToolPak ICE (Study/Tutor/Prep → FinalGrade) as a pattern for what outputs
to report and how to interpret them.
B. Task 1 — Fit the multiple regression model
Use Excel (ToolPak or functions) to fit the model
NextDayGrade = β₀ + β₁·StudyHours + β₂·SleepHours + ε.
Run a multiple linear regression of NextDayGrade on
StudyHours and SleepHours.
Write down the estimated equation using your numbers:
NextDayGradê = b₀ + b₁·StudyHours + b₂·SleepHours
Report:
R² and Adjusted R²
Standard error of the regression (Root MSE)
For each slope (StudyHours, SleepHours): estimate, standard error,
t, and p-value
The overall F statistic and its p-value
In words, interpret:
What does b₁ (StudyHours) mean in this context?
What does b₂ (SleepHours) mean in this context?
According to the F-test, are the predictors useful as a group?
Need a hint? Regression in Excel (ToolPak or functions)
Option 1 — ToolPak (quick):
Put the data in three columns with headers:
StudyHours, SleepHours,
NextDayGrade.
Data → Data Analysis → Regression.
Input Y Range: the NextDayGrade column (including header).
Input X Range: the StudyHours and SleepHours columns (including headers).
Check Labels, choose an output range or new sheet, click OK.
Use the Coefficients table for b₀, b₁, b₂, and the ANOVA / R² section for
R², F, etc.
Option 2 — Functions (matches the GPA ICE):
Build an X matrix with an intercept column (all 1’s), StudyHours, SleepHours.
Use MMULT and MINVERSE as in the
class ICE to compute β̂ = (XᵀX)⁻¹XᵀY (you can literally copy the GPA ICE block and
just change the ranges to StudyHours, SleepHours, NextDayGrade).
Compute fitted values, residuals, SSE, df, MSE, Root MSE, SE(β̂), t, p using the
same pattern as the ICE.
C. Task 2 — Correlation among Study, Sleep, and Grade
Use Excel (ToolPak correlation or CORREL) to compute the
3×3 correlation matrix for:
StudyHours
SleepHours
NextDayGrade
Report the correlations:
corr(StudyHours, NextDayGrade)
corr(SleepHours, NextDayGrade)
corr(StudyHours, SleepHours)
Briefly describe:
Does more study tend to be associated with higher grades?
Does more sleep tend to be associated with higher grades?
Are StudyHours and SleepHours strongly correlated with each other?
Need a hint? Correlation in Excel (ToolPak or CORREL)
Option 1 — ToolPak Correlation:
Select the three columns (StudyHours, SleepHours, NextDayGrade) including headers.
Data → Data Analysis → Correlation.
Input Range = the 3-column block; choose Columns and check Labels in first row.
Output to a new sheet or range. Excel prints a 3×3 correlation matrix.
You can arrange these in a little 3×3 table if you like (same pattern as the ICE).
D. Task 3 — VIF and multicollinearity
Use Excel formulas (for example, LINEST as in the ICE) to compute
the VIF for StudyHours and SleepHours.
Compute:
VIF for StudyHours
VIF for SleepHours
Based on the usual rules of thumb
(VIF ≈ 1–2 OK, >5 keep an eye on it, >10 serious problem),
comment on whether multicollinearity is a concern in this model.
In one or two sentences, explain what can go wrong when VIF is high
(for example: unstable slopes, large standard errors, signs that don’t
match intuition).
List at least one strategy to deal with high VIF in general
(for example: dropping one of two very similar predictors, combining
them into a single index, collecting more data, etc.).
Need a hint? VIF in Excel (functions)
The ToolPak does not give VIF directly, so we use formulas.
For each X, we compute R² from regressing that X on the other X, then
use VIF = 1 / (1 - R²).
Step 1 — VIF for StudyHours:
Let Y = StudyHours, X = SleepHours.
Use LINEST with Y_range =
StudyHours, X_range = SleepHours.
One simple way (match our ICE):
=1/(1-INDEX(LINEST(StudyHours_range, SleepHours_range, TRUE, TRUE), 3, 1))