📊 Chapter 6 – Descriptive Statistics

📚 Homework

Use the dataset file chapter6_homework_data.xlsx. Focus on Statics_Grade as the primary variable. If you need a second variable, pick any numeric column (e.g., Class_Participation). Submit plots and brief answers; no solutions are provided on this page.

📄 Dataset: Open Excel file in browser · Download Excel File

⚠️ Can’t find chapter6_homework_data.xlsx. If you’re viewing this file locally (file://), the check may be blocked—use the download link above or open the page from the course website.

🔧 Need step-by-step Excel help? Open Chapter 6 — Excel Help (copy-ready formulas). Reference video: YouTube Q–Q Plot walkthrough.

Descriptive stats. For Statics_Grade, compute: n, mean, median, sample stdev (s), min, max, range, and IQR.
Histogram (bins = 8). Make a histogram of Statics_Grade with 8 bins. Describe the shape (symmetric/skewed), and any unusual features.
Histogram (compare bins). Rebuild the histogram with a different bin rule (e.g., Sturges or Freedman–Diaconis). Compare to the previous one—what changed?
Box plot + outliers. Create a box plot for Statics_Grade. Using the 1.5×IQR rule, list any suspected outliers (by row ID or value).
Normal Q–Q plot. Make a normal probability (Q–Q) plot for Statics_Grade. Comment on whether normality seems reasonable and where deviations occur.
z-scores. Compute z-scores for Statics_Grade. How many observations have |z| > 2? Briefly interpret.
Scatter plot. Plot Statics_Grade (y-axis) vs a numeric variable of your choice (x-axis; e.g., Class_Participation). Describe the trend and note any unusual points.

📢 Student Q&A

Why do engineers start with descriptive statistics?

Every design/test decision begins with “what does the data look like?” Center, spread, and shape tell you if the process meets specs, if sensors are stable, and where problems likely are—before modeling.

First step before regression?

A quick EDA pass: check units/ranges, missing values, histograms/box plots, scatter Y vs each X, correlation scan for multicollinearity, consider transforms for symmetry.

Fast column summary in Excel?

Use =AVERAGE(), =MEDIAN(), =STDEV.S(), =MIN(), =MAX(), IQR via =QUARTILE.EXC(r,3)-QUARTILE.EXC(r,1). Or Insert → PivotTable for grouped summaries.

Mean & SD vs Median & IQR—when?

Mean & SD for symmetric data; Median & IQR when outliers/skew are present (robust). In practice, report both if extremes are plausible.

What is a “robust” summary?

A summary that resists bad points. Median/IQR/MAD barely move if one measurement is wrong—handy for sensors and student grade summaries alike.

How to catch data-entry errors quickly?

Sort ascending; build a box plot; flag impossible values. Excel: Conditional Formatting → “Highlight Cell Rules…”, plus =COUNTIF(range, value) to spot duplicates.

Build a good histogram and choose bins?

Insert → Statistic Chart → Histogram. Start with √n or Sturges (≈1+log₂n). For finer control, Freedman–Diaconis width: w=2*IQR*n^(-1/3), bins ≈ (max−min)/w.

How do I read a box plot?

Box = IQR (Q1—Q3), line = median; whiskers reach typical non-outliers; dots beyond are outliers. Compare groups by median (shift) and IQR (spread).

Histogram vs time (run) chart—when to use?

Histogram shows overall distribution; a run chart shows when shifts happened (drift, cycles). For instruments, always glance at a run chart first.

Moving average in Excel?

Simple: =AVERAGE(B3:B7) for a 5-point mean, fill down. Or Data → Data Analysis → Moving Average. Keep raw series visible.

Population vs sample formulas in Excel?

STDEV.P for full populations (σ), STDEV.S for samples (s, divides by n−1). Usually use STDEV.S.

How to standardize scores (z-scores)?

z=(x−mean)/SD. If mean in F2 and SD in F3: =(A2-$F$2)/$F$3 then fill. Now different units are comparable.

Quick correlation matrix in Excel?

Use pairwise =CORREL(col1,col2) or Data → Data Analysis → Correlation. Treat |r|>0.8 as a multicollinearity warning pre-regression.

Risk of eyeballing correlation?

Hidden groups (Simpson’s paradox) can fake a trend. Color by group or facet plots to see true relationships.

When to transform (log/sqrt)?

Use logs for positive right-skewed data (fail times, costs, counts). Excel: =LN(x) or =LOG10(x). Re-plot to confirm symmetry and stabilized spread.

Compare groups fast?

Side-by-side box plots or overlaid histograms. In Excel: Insert → Box & Whisker, or Histogram with multiple series (Excel 365).

Why can line charts mislead?

Lines imply time/sequence. If X is not time (e.g., part length), a line suggests trends that aren’t there—use a scatter plot.

What to do with missing values?

Decide if “missing” is truly missing or zero. Exclude from numeric summaries (AVERAGEIF(range,"<>"&"")) and report the count used. Don’t auto-fill with zeros.

Make Excel analyses reproducible?

Convert data to a Table (Ctrl+T), use named ranges, keep a “Summary” block with formulas, and document chart steps. For simulations, fix the random seed or paste values before sharing.

How does Chapter 6 prep me for regression?

You’ll know Y’s distribution, which X’s are promising, where outliers/transformations are needed, and whether normality/constant-variance look plausible—making model building smoother.

📊 Chapter 6: Descriptive Statistics

🎮 Interactive Apps

📽️ PPT Slides

📝 Quiz

📚 Homework

💬 Excel

📢 Student Q&A