Chapter 12 — Homework: Study & Sleep vs Next-Day Grade

Use a small data set of 23 students to practice multiple linear regression, correlation, and VIF. You may use either Excel Analysis ToolPak or the Excel functions from our class ICE apps.

Model for this homework:
NextDayGrade = β₀ + β₁·StudyHours + β₂·SleepHours + ε

A. Data: Study, Sleep, and Next-Day Grade

The night before a short quiz, 23 students recorded how many hours they studied and how many hours they slept. The next day they took the quiz, and we recorded their NextDayGrade (0–100 scale).

Copy the dataset into Excel (use it exactly as given, no editing):

Hint: you can use either the functions-only ICE (GPA) or the ToolPak ICE (Study/Tutor/Prep → FinalGrade) as a pattern for what outputs to report and how to interpret them.

B. Task 1 — Fit the multiple regression model

Use Excel (ToolPak or functions) to fit the model NextDayGrade = β₀ + β₁·StudyHours + β₂·SleepHours + ε.

  1. Run a multiple linear regression of NextDayGrade on StudyHours and SleepHours.
  2. Write down the estimated equation using your numbers:
    NextDayGradê = b₀ + b₁·StudyHours + b₂·SleepHours
  3. Report:
    • R² and Adjusted R²
    • Standard error of the regression (Root MSE)
    • For each slope (StudyHours, SleepHours): estimate, standard error, t, and p-value
    • The overall F statistic and its p-value
  4. In words, interpret:
    • What does b₁ (StudyHours) mean in this context?
    • What does b₂ (SleepHours) mean in this context?
    • According to the F-test, are the predictors useful as a group?
Need a hint? Regression in Excel (ToolPak or functions)

Option 1 — ToolPak (quick):

  1. Put the data in three columns with headers: StudyHours, SleepHours, NextDayGrade.
  2. Data → Data Analysis → Regression.
  3. Input Y Range: the NextDayGrade column (including header).
  4. Input X Range: the StudyHours and SleepHours columns (including headers).
  5. Check Labels, choose an output range or new sheet, click OK.
  6. Use the Coefficients table for b₀, b₁, b₂, and the ANOVA / R² section for R², F, etc.

Option 2 — Functions (matches the GPA ICE):

  1. Build an X matrix with an intercept column (all 1’s), StudyHours, SleepHours.
  2. Use MMULT and MINVERSE as in the class ICE to compute β̂ = (XᵀX)⁻¹XᵀY (you can literally copy the GPA ICE block and just change the ranges to StudyHours, SleepHours, NextDayGrade).
  3. Compute fitted values, residuals, SSE, df, MSE, Root MSE, SE(β̂), t, p using the same pattern as the ICE.

C. Task 2 — Correlation among Study, Sleep, and Grade

Use Excel (ToolPak correlation or CORREL) to compute the 3×3 correlation matrix for:

  1. Report the correlations:
    • corr(StudyHours, NextDayGrade)
    • corr(SleepHours, NextDayGrade)
    • corr(StudyHours, SleepHours)
  2. Briefly describe:
    • Does more study tend to be associated with higher grades?
    • Does more sleep tend to be associated with higher grades?
    • Are StudyHours and SleepHours strongly correlated with each other?
Need a hint? Correlation in Excel (ToolPak or CORREL)

Option 1 — ToolPak Correlation:

  1. Select the three columns (StudyHours, SleepHours, NextDayGrade) including headers.
  2. Data → Data Analysis → Correlation.
  3. Input Range = the 3-column block; choose Columns and check Labels in first row.
  4. Output to a new sheet or range. Excel prints a 3×3 correlation matrix.

Option 2 — CORREL function:

  • corr(StudyHours, NextDayGrade):
    =CORREL(StudyHours_range, NextDayGrade_range)
  • corr(SleepHours, NextDayGrade):
    =CORREL(SleepHours_range, NextDayGrade_range)
  • corr(StudyHours, SleepHours):
    =CORREL(StudyHours_range, SleepHours_range)

You can arrange these in a little 3×3 table if you like (same pattern as the ICE).

D. Task 3 — VIF and multicollinearity

Use Excel formulas (for example, LINEST as in the ICE) to compute the VIF for StudyHours and SleepHours.

  1. Compute:
    • VIF for StudyHours
    • VIF for SleepHours
  2. Based on the usual rules of thumb (VIF ≈ 1–2 OK, >5 keep an eye on it, >10 serious problem), comment on whether multicollinearity is a concern in this model.
  3. In one or two sentences, explain what can go wrong when VIF is high (for example: unstable slopes, large standard errors, signs that don’t match intuition).
  4. List at least one strategy to deal with high VIF in general (for example: dropping one of two very similar predictors, combining them into a single index, collecting more data, etc.).
Need a hint? VIF in Excel (functions)

The ToolPak does not give VIF directly, so we use formulas. For each X, we compute R² from regressing that X on the other X, then use VIF = 1 / (1 - R²).

Step 1 — VIF for StudyHours:

  1. Let Y = StudyHours, X = SleepHours.
  2. Use LINEST with Y_range = StudyHours, X_range = SleepHours.
  3. One simple way (match our ICE):
    =1/(1-INDEX(LINEST(StudyHours_range, SleepHours_range, TRUE, TRUE), 3, 1))
  4. That formula gives VIF(StudyHours).

Step 2 — VIF for SleepHours:

  1. Now let Y = SleepHours, X = StudyHours.
  2. Use:
    =1/(1-INDEX(LINEST(SleepHours_range, StudyHours_range, TRUE, TRUE), 3, 1))
  3. That formula gives VIF(SleepHours).

You can put these in a small table, for example:

  • A1: "Variable"    B1: "VIF"
  • A2: "StudyHours"   B2: (formula for VIF StudyHours)
  • A3: "SleepHours"   B3: (formula for VIF SleepHours)

E. Short report template (turn this in)

Type your answers in Word (or similar). You may use the template below and fill in the brackets with your numbers and comments.