Descriptive Statistics Explained: Summarizing Data the Right Way
📊 Quick Answer
Descriptive statistics summarize and describe the main features of a dataset—they tell you what your data looks like right now. This includes measures of center (mean, median, mode), measures of spread (range, standard deviation, variance), and measures of position (percentiles, z-scores). Unlike inferential statistics, descriptive stats don’t make predictions or test hypotheses—they simply organize and present data in meaningful ways.
What Are Descriptive Statistics?
Descriptive statistics are tools for summarizing and organizing data so you can understand what you’re looking at. Think of them as the “snapshot” of your dataset—they answer the question “What happened?” rather than “What does this mean for the future?”
The key distinction is between descriptive and inferential statistics:
| Descriptive Statistics | Inferential Statistics |
|---|---|
| Summarizes data you already have | Makes predictions about larger populations |
| Mean, median, standard deviation | Hypothesis tests, confidence intervals |
| Describes “what is” | Estimates “what might be” |
| No probability statements | Uses probability to draw conclusions |
Every statistics course starts with descriptive statistics because you need to understand your data before you can analyze it. If you skip this foundation, hypothesis testing and confidence intervals won’t make sense.
Measures of Central Tendency
Central tendency measures identify the “center” of your data—the typical or representative value. The three main measures are mean, median, and mode, and knowing when to use each one is critical for exams.
Mean (Average)
The mean is the sum of all values divided by the number of values. It’s the most commonly used measure of center, but it has a major weakness: outliers pull the mean toward extreme values.
Formula: x̄ = Σx / n
When to use: Symmetric data without extreme outliers (exam scores, heights, weights in normal populations).
When NOT to use: Income data, home prices, or any dataset with extreme values on one end.
Median (Middle Value)
The median is the middle value when data is arranged in order. If you have an even number of values, it’s the average of the two middle values. The median is “resistant” to outliers—extreme values don’t affect it much.
When to use: Skewed data, income distributions, home prices, any data with outliers.
Example: If five houses sell for $200K, $220K, $240K, $260K, and $2.1 million, the mean is $604K (misleading), but the median is $240K (much more representative).
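The house-price example can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original example.

```python
from statistics import mean, median

# Sale prices from the example above, in thousands of dollars
prices = [200, 220, 240, 260, 2100]

avg_price = mean(prices)    # sum of the values divided by how many there are
mid_price = median(prices)  # middle value once the data is sorted

print("mean:  ", avg_price)   # 604
print("median:", mid_price)   # 240

# Dropping the $2.1 million sale shows how far one outlier moved the mean
typical = prices[:-1]
print("mean without outlier:  ", mean(typical))    # 230
print("median without outlier:", median(typical))  # 230.0
```

Removing a single sale moves the mean from 604 to 230, while the median only moves from 240 to 230. That is what "resistant to outliers" means in practice.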
Mode (Most Frequent)
The mode is the value that appears most often. It’s the only measure of center that works for categorical data (like favorite colors or political party).
Unimodal: One peak. Bimodal: Two peaks. Multimodal: Multiple peaks. No mode: Every value appears equally often.
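As a small sketch, Python's `statistics.multimode` returns every value tied for most frequent, which makes it easy to spot unimodal and bimodal data. The two lists below are made-up illustrations.

```python
from statistics import multimode

# Categorical data: the mode is the only measure of center that applies
colors = ["blue", "red", "blue", "green", "red", "blue"]

# Numeric data with a tie for most frequent value (bimodal)
scores = [70, 85, 85, 90, 95, 95]

print(multimode(colors))  # ['blue']
print(multimode(scores))  # [85, 95]
```

If every value appears equally often, `multimode` returns all of them, which corresponds to the "no mode" case.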
💡 Quick Decision Guide
- Symmetric data, no outliers → Use mean
- Skewed data or outliers → Use median
- Categorical data → Use mode
- Need to describe distribution shape → Report mean AND median (the gap between them indicates skewness)
Measures of Spread (Variability)
Central tendency tells you where the center is, but spread tells you how much the data varies around that center. Two datasets can have the same mean but look completely different—one might be tightly clustered while the other is spread out.
Standard deviation measures the typical distance of data points from the mean
Range
The range is simply the maximum minus the minimum. It’s easy to calculate but highly sensitive to outliers—one extreme value changes the entire range.
Range = Maximum – Minimum
Variance and Standard Deviation
These are the most important measures of spread. Variance (σ² for populations, s² for samples) measures the average of the squared distances from the mean. Standard deviation (σ or s) is the square root of variance, returning the measurement to original units.
Standard deviation calculation: find the mean, subtract from each value, square, average, then square root
Key distinction for exams:
- Population standard deviation (σ): Divide by N
- Sample standard deviation (s): Divide by n-1 (this is called Bessel’s correction)
Most homework problems use sample standard deviation (n-1) because you’re typically working with a sample, not an entire population.
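The two divisors can be compared side by side. The sketch below follows the steps by hand (mean, subtract, square, average, square root) and then checks the results against the standard library. The data values are illustrative only.

```python
from math import isclose, sqrt
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from a real dataset

n = len(data)
center = sum(data) / n                             # step 1: find the mean
squared_devs = [(x - center) ** 2 for x in data]   # steps 2-3: subtract, then square

pop_sd = sqrt(sum(squared_devs) / n)               # population: divide by N
sample_sd = sqrt(sum(squared_devs) / (n - 1))      # sample: divide by n-1

print(pop_sd)               # 2.0
print(round(sample_sd, 3))  # 2.138

# The standard library agrees: pstdev divides by N, stdev divides by n-1
print(isclose(pstdev(data), pop_sd), isclose(stdev(data), sample_sd))  # True True
```

The sample value is always the larger of the two, because dividing by n-1 instead of N makes the variance slightly bigger.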
Same mean, different spread: standard deviation captures what the mean alone cannot
Interquartile Range (IQR)
The IQR is the range of the middle 50% of data: Q3 – Q1. Like the median, it’s resistant to outliers, making it useful for skewed distributions.
IQR = Q3 – Q1
The IQR is also used to identify outliers: any value below Q1 – 1.5(IQR) or above Q3 + 1.5(IQR) is considered an outlier.
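Here is a hedged sketch of the 1.5 × IQR rule applied to the house prices from the median example. It assumes the "inclusive" quartile method of Python's `statistics.quantiles`. Textbooks, calculators, and homework platforms use several different quartile conventions, so Q1 and Q3 may differ slightly from what your course expects.

```python
from statistics import quantiles

# House prices in thousands of dollars, reused from the median example
prices = [200, 220, 240, 260, 2100]

# Assumption: "inclusive" quartiles; other conventions can give other cut points
q1, q2, q3 = quantiles(prices, n=4, method="inclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in prices if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)               # 220.0 260.0 40.0
print(lower_fence, upper_fence)  # 160.0 320.0
print(outliers)                  # [2100]
```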
For a deeper dive into standard deviation with worked examples, see our complete standard deviation guide.
Measures of Position
Position measures tell you where a specific value falls relative to the rest of the dataset. These are crucial for understanding individual data points in context.
Percentiles and Quartiles
A percentile tells you what percentage of data falls below a given value. If you score in the 85th percentile on an exam, 85% of test-takers scored below you.
Quartiles divide data into four equal parts:
- Q1 (25th percentile): 25% of data falls below
- Q2 (50th percentile): The median—50% falls below
- Q3 (75th percentile): 75% of data falls below
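A percentile rank can be computed directly from that definition. The function name `percentile_rank` and the scores below are hypothetical, and the sketch assumes the "percentage strictly below" definition used above. Some textbooks count values at or below instead.

```python
def percentile_rank(values, x):
    """Return the percentage of values strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Ten made-up exam scores
scores = [52, 58, 61, 64, 66, 68, 70, 71, 73, 75]

print(percentile_rank(scores, 75))  # 90.0
print(percentile_rank(scores, 66))  # 40.0
```

A raw score of 75 sits at the 90th percentile here because nine of the ten scores fall below it. The percentile describes rank, not the score itself.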
Z-Scores (Standard Scores)
A z-score tells you how many standard deviations a value is from the mean. It’s the foundation for working with the normal distribution.
Formula: z = (x – μ) / σ
Z-scores standardize data: z = 0 is the mean, z = 1 is one standard deviation above
Interpretation:
- z = 0: Value equals the mean
- z = 1: One standard deviation above the mean
- z = -2: Two standard deviations below the mean
- |z| > 2: Unusual values (about 5% of values in a normal distribution, roughly 2.5% in each tail)
- |z| > 3: Very unusual (about 0.3% of values, roughly 0.15% in each tail)
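Z-scores let you compare values from different distributions. A score of 85 on a test with mean 70 and SD 10 gives z = (85 – 70) / 10 = 1.5, while a score of 90 on a test with mean 80 and SD 5 gives z = (90 – 80) / 5 = 2.0. The 90 is the more impressive result: it is only 10 points above its mean, compared with 15 points for the 85, but those 10 points are worth two standard deviations on a test where scores vary less. This is exactly why z-scores matter.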
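The two-test comparison can be worked through with the z-score formula. This is a minimal sketch; `z_score` is a helper name chosen for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_test_a = z_score(85, mean=70, sd=10)  # score of 85, class mean 70, SD 10
z_test_b = z_score(90, mean=80, sd=5)   # score of 90, class mean 80, SD 5

print(z_test_a)  # 1.5
print(z_test_b)  # 2.0

# The higher z-score is the stronger result relative to its own class
print("Test B" if z_test_b > z_test_a else "Test A")  # Test B
```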
Measures of Shape
Shape describes how data is distributed. The two main aspects are skewness and kurtosis.
Skewness
Skewness measures asymmetry in the distribution:
- Right-skewed (positive): Long tail to the right, mean > median. Examples: income, home prices, reaction times.
- Left-skewed (negative): Long tail to the left, mean < median. Examples: age at retirement, easy exam scores.
- Symmetric: Mean ≈ median. Examples: heights, IQ scores, measurement errors.
The mean gets pulled toward the tail; use median for skewed data
⚠️ Common Exam Trap
The tail points the direction of the skew, not the bulk of the data. A “right-skewed” distribution has most data on the LEFT with a long tail stretching RIGHT.
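The sign of the skew can also be computed. The sketch below assumes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second), which is one of several formulas in use. Calculators and homework platforms may apply a sample-size adjustment and report slightly different numbers. The function name and datasets are illustrative.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (population form, no sample adjustment)."""
    n = len(data)
    center = sum(data) / n
    m2 = sum((x - center) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - center) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

right_tail = [200, 220, 240, 260, 2100]  # long tail to the right
left_tail = [-x for x in right_tail]     # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tail) > 0)  # True: positive skew
print(skewness(left_tail) < 0)   # True: negative skew
print(skewness(symmetric))       # 0.0
```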
Kurtosis
Kurtosis measures how heavy the tails are compared to a normal distribution:
- Leptokurtic (kurtosis > 3): Heavier tails, more outliers than normal
- Mesokurtic (kurtosis ≈ 3): Similar to normal distribution
- Platykurtic (kurtosis < 3): Lighter tails, fewer outliers
Most intro courses focus on skewness. Kurtosis typically appears in more advanced statistics courses.
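For readers who do meet kurtosis, here is a hedged sketch using the moment-based definition (fourth central moment divided by the squared second moment), which matches the "normal ≈ 3" convention above. Some software reports excess kurtosis instead, which subtracts 3 so that a normal distribution scores 0. The datasets are made up to show the contrast.

```python
def kurtosis(data):
    """Moment-based kurtosis: m4 / m2**2 (a normal distribution gives about 3)."""
    n = len(data)
    center = sum(data) / n
    m2 = sum((x - center) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - center) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

flat = [1, 2, 3, 4, 5]                          # evenly spread, light tails
heavy = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]       # mostly central, two extreme values

print(round(kurtosis(flat), 2))   # 1.7  (below 3: platykurtic)
print(round(kurtosis(heavy), 2))  # 5.0  (above 3: leptokurtic)
```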
Data Visualization
Descriptive statistics often involve creating visual displays that reveal patterns numbers alone might miss.
Histograms
Show the distribution shape by grouping data into bins. Great for seeing skewness, multiple modes, and overall spread.
Box Plots (Box-and-Whisker)
Display the five-number summary (minimum, Q1, median, Q3, maximum) and clearly show outliers. Excellent for comparing multiple groups.
Box plots display the five-number summary and identify outliers at a glance
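The five numbers behind a box plot can be computed as follows. This sketch again assumes the "inclusive" quartile method from Python's `statistics.quantiles`, so quartiles from a TI calculator or StatCrunch may differ slightly. `five_number_summary` is a hypothetical helper name and the scores are made up.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # Assumption: "inclusive" quartiles; other conventions shift Q1 and Q3 a little
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

scores = [55, 62, 68, 70, 74, 78, 81, 85, 93]
print(five_number_summary(scores))  # (55, 68.0, 74.0, 81.0, 93)
```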
Stem-and-Leaf Plots
Preserve original data values while showing distribution shape. Common in intro courses but rarely used in practice.
Dot Plots
Show individual data points—useful for small datasets.
The Empirical Rule (68-95-99.7): applies to approximately normal distributions
For approximately normal distributions, the Empirical Rule states:
- ~68% of data falls within 1 standard deviation of the mean
- ~95% falls within 2 standard deviations
- ~99.7% falls within 3 standard deviations
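These percentages come from the normal curve itself and can be reproduced with `statistics.NormalDist`. The helper `share_within` is an illustrative name, not a library function.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal: mean 0, SD 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The rule rounds these to 68, 95, and 99.7. It only holds when the data is approximately normal, so it should not be applied to strongly skewed data.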
Learn more in our normal distribution guide.
Common Student Mistakes
These are the errors that cost students the most points on exams and homework.
❌ Mistake #1: Using the mean for skewed data
When data is skewed, the mean gets pulled toward the tail. Always check the distribution shape first. If skewed, report the median instead (or report both and explain why they differ).
❌ Mistake #2: Confusing σ and s notation
σ (sigma) = population standard deviation (divide by N). s = sample standard deviation (divide by n-1). Most problems use sample statistics, so use n-1 unless told otherwise.
❌ Mistake #3: Forgetting to square root variance
Variance is in squared units (like “square dollars” or “square inches”), which are hard to interpret. Standard deviation returns to original units. If a problem asks for spread in interpretable terms, it wants standard deviation.
❌ Mistake #4: Misinterpreting percentiles
The 90th percentile means 90% scored BELOW that value, not that you scored 90%. Scoring in the 90th percentile with a score of 75 means 75 was better than 90% of scores—the actual number doesn’t matter.
❌ Mistake #5: Calculator mode errors
TI calculators show “σx” for population SD and “Sx” for sample SD. Most students need Sx. ALEKS and MyStatLab are strict about which one you enter—double-check before submitting.
Platform-Specific Tips
ALEKS
ALEKS requires exact formatting. For standard deviation, round to the decimal place specified in the problem. If it says “round to two decimal places,” don’t give three. ALEKS also distinguishes between population and sample formulas—read carefully.
MyStatLab (Pearson)
MyStatLab often provides calculator instructions within problems. Follow their rounding rules exactly—they usually specify in the problem or help section. StatCrunch is integrated and will give you sample vs. population statistics clearly labeled.
WebAssign
WebAssign accepts answers within a tolerance range, but descriptive statistics answers should still match closely. Watch for problems that want you to use specific data from tables rather than entering formulas.
Struggling with these platforms? Our tutors handle ALEKS statistics, MyStatLab, and WebAssign daily.
Frequently Asked Questions
What’s the difference between descriptive and inferential statistics?
Descriptive statistics summarize data you already have (mean, median, standard deviation). Inferential statistics use sample data to make predictions about larger populations (hypothesis tests, confidence intervals). Think of descriptive as “describing what is” and inferential as “predicting what might be.”
When should I use mean vs. median?
Use the mean for symmetric data without outliers (like exam scores with no extreme values). Use the median for skewed data or when outliers are present (like income, home prices, or any data with extreme values). When in doubt, report both—if they’re very different, that tells you the data is skewed.
Why do we divide by n-1 instead of n for sample standard deviation?
This is called Bessel’s correction. When calculating from a sample, dividing by n tends to underestimate the true population variance, because deviations are measured from the sample mean rather than the true population mean. Dividing by n-1 corrects this bias, making the sample variance an “unbiased estimator” of the population variance. For large samples, the difference is minimal, but for small samples, it matters significantly.
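The bias can be checked exactly on a tiny case. The sketch below takes a made-up population of three values, lists every possible sample of size 2 (drawn with replacement), and averages the sample variance under each divisor. Exact fractions are used so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # made-up population for illustration

def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by the given divisor."""
    center = Fraction(sum(values), len(values))
    return sum((x - center) ** 2 for x in values) / divisor

pop_var = variance(population, len(population))  # true population variance

# All 9 ordered samples of size 2, drawn with replacement
samples = list(product(population, repeat=2))
avg_divide_by_n = sum(variance(s, 2) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)

print(pop_var)                  # 2/3
print(avg_divide_by_n)          # 1/3
print(avg_divide_by_n_minus_1)  # 2/3
```

Averaged over all samples, dividing by n gives only half the true variance in this case, while dividing by n-1 recovers it exactly.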
How do I know if my data is skewed?
Compare the mean and median: if mean > median, data is usually right-skewed (positive skew); if mean < median, data is usually left-skewed (negative skew). This is a rule of thumb rather than a guarantee, so confirm it with a histogram, where the tail points in the direction of the skew. The mode is typically at the peak, while the mean gets pulled toward the tail.
What’s a z-score and why does it matter?
A z-score tells you how many standard deviations a value is from the mean. A z-score of 2 means the value is 2 standard deviations above the mean; z = -1.5 means 1.5 standard deviations below. Z-scores let you compare values from different distributions and are essential for working with normal distribution problems.
How do I identify outliers?
The most common method uses the IQR rule: any value below Q1 – 1.5(IQR) or above Q3 + 1.5(IQR) is an outlier. Alternatively, z-scores beyond ±2 are often considered unusual, and beyond ±3 are typically considered outliers. Box plots visually display outliers as individual points beyond the whiskers.
What’s the five-number summary?
The five-number summary consists of: minimum, Q1 (25th percentile), median (Q2), Q3 (75th percentile), and maximum. These five values are used to construct box plots and give a quick picture of the distribution’s center, spread, and potential skewness without calculating mean or standard deviation.
Can you help with my descriptive statistics homework?
Yes—descriptive statistics problems are one of the most common requests we handle. Whether you need help understanding concepts, checking your work, or completing assignments, our tutors work with ALEKS, MyStatLab, WebAssign, and other platforms daily. Get a free quote to see how we can help.
Related Resources
Statistics Foundations
- Standard Deviation Explained
- Normal Distribution Guide
- Confidence Intervals Explained
- Hypothesis Testing Guide
Need Help With Statistics?
Our tutors handle descriptive statistics assignments daily—from basic measures of center to five-number summaries and beyond.