



Measures of Dispersion: Range and Mean Deviation




Describing the Dispersion (Variability of Data)


Beyond Central Tendency

Measures of central tendency (mean, median, mode) provide a single value that represents the "center" or typical value of a dataset. They tell us where the data is located. However, they do not give any information about how spread out or scattered the data values are around this center. Two datasets can have the exact same mean, median, and mode but exhibit vastly different levels of variability.

Consider the following two small datasets representing, for instance, the heights (in cm) of students in two different groups:

For Set A:

For Set B:

Both datasets have the same mean (50) and the same median (50). However, visually inspecting the numbers, it is clear that the values in Set A are very close to the mean, while the values in Set B are much more spread out. A single measure of central tendency alone is insufficient to fully describe these datasets.

Therefore, describing the dispersion or variability of the data is just as crucial as describing its center for a complete understanding of the dataset's characteristics.
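The point can be illustrated with a short sketch. The values of Set A and Set B are not reproduced above, so the two lists below are hypothetical stand-ins chosen only so that both have a mean and median of 50; they are not the original data.

```python
from statistics import mean, median

# Hypothetical values (assumption): the original Set A and Set B are not shown.
set_a = [48, 49, 50, 51, 52]   # tightly clustered around 50
set_b = [10, 30, 50, 70, 90]   # widely scattered around 50

for name, data in (("Set A", set_a), ("Set B", set_b)):
    spread = max(data) - min(data)  # simplest possible view of the spread
    print(name, "mean:", mean(data), "median:", median(data), "max - min:", spread)
```

Both lists report the same center, yet the gap between the largest and smallest value differs widely, which is exactly the information an average cannot convey.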

[Figure: Two frequency distribution curves with the same central point but different widths]

What is Dispersion?

Dispersion, also referred to as variability, scatter, or spread, is a statistical concept that quantifies the extent to which the data values in a distribution are spread out or clustered together. It measures how much the individual observations deviate or vary from the central value (an average like the mean or median).

Measures of dispersion complement measures of central tendency by providing context about the variability within the dataset. Averages alone can be misleading if the dispersion is not also considered.



Measures of Dispersion: Definition and Purpose


Definition

Measures of Dispersion are statistical indicators that quantify the amount of variation or spread within a set of data. They provide a numerical summary of how much the individual data points tend to differ from the average or from each other. A measure of dispersion summarizes the scatter of observations into a single value.

In simple terms, a measure of dispersion tells us how representative the average is or how homogeneous (similar) the data points are. A large value of dispersion indicates that the data is widely spread, while a small value indicates that the data is clustered closely around the average.


Purpose and Importance

Calculating measures of dispersion is essential for a comprehensive understanding of data and serves several important purposes:

  1. Judging the Reliability of Averages: Measures of dispersion help in assessing how well an average (like the mean) represents the entire dataset. If the dispersion is small, it means most data points are close to the average, and the average is a reliable representative value. If the dispersion is large, the average is less reliable as a single representative because the data is widely scattered, and many values are far from the average.
  2. Comparing Variability: They allow for a direct comparison of the spread or consistency of two or more different datasets, even if they have different averages or units (when using relative measures). For example, comparing the dispersion of scores in two different exams can tell us which exam had more consistent performance among students. Comparing the variability in returns of two investments helps assess their relative riskiness.
  3. Basis for Quality Control: In fields like manufacturing and quality control, minimizing variability is often a key objective. Measures of dispersion are used to monitor process consistency and identify when variability exceeds acceptable limits.
  4. Foundation for Further Statistical Analysis: Measures of dispersion, particularly variance and standard deviation, are fundamental components of many more advanced statistical techniques, including hypothesis testing, confidence interval estimation, correlation, regression, and analysis of variance (ANOVA).
  5. Understanding Distribution Shape: Along with measures of central tendency, measures of dispersion help in describing the shape of a distribution. For instance, a distribution with high dispersion will have a flatter shape than one with low dispersion, assuming they have similar central tendencies.

Common Measures of Dispersion

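Various measures have been developed to quantify dispersion, each with its own strengths and applications. They can be broadly categorized into absolute measures, which are expressed in the same units as the data, and relative measures, which are unit-free ratios suited to comparing datasets with different units or averages.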




Range: Definition and Calculation


Definition

The Range is the simplest and most easily understood measure of dispersion. It quantifies the total spread of the data by calculating the difference between the highest and lowest values in the dataset.

In simple terms, it tells us the extent of variation from the minimum to the maximum value.


Calculation

Let $L$ denote the minimum (smallest) value and $H$ denote the maximum (largest) value in a dataset.

Formula for Ungrouped Data:

Range $= H - L$

... (1)

For Grouped Data:

For data presented in a grouped frequency distribution, the exact minimum and maximum values are usually unknown. The range is estimated as the difference between the upper boundary of the highest class interval and the lower boundary of the lowest class interval.

Range (Grouped Data) $\approx$ Upper Boundary of Highest Class - Lower Boundary of Lowest Class

... (2)

Ensure you use class boundaries, especially if dealing with inclusive intervals. For example, if the lowest class is $10-19$ and the highest is $90-99$ (with class boundaries $9.5-19.5$ and $89.5-99.5$), the estimated range would be $99.5 - 9.5 = 90$.
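Formulas (1) and (2) can be sketched in a few lines. The function names and the `gap` parameter are illustrative assumptions, not part of the text; `gap` is the distance between the upper limit of one inclusive class and the lower limit of the next (1 for classes such as $10-19$, $20-29$).

```python
def data_range(values):
    """Range of ungrouped data: H - L."""
    return max(values) - min(values)

def inclusive_to_boundaries(lower_limit, upper_limit, gap=1):
    """Convert inclusive class limits to class boundaries (hypothetical helper)."""
    half = gap / 2
    return (lower_limit - half, upper_limit + half)

def grouped_range(class_boundaries):
    """Estimated range for grouped data.

    class_boundaries is a list of (lower_boundary, upper_boundary) pairs.
    """
    lowest_lower = min(lower for lower, _ in class_boundaries)
    highest_upper = max(upper for _, upper in class_boundaries)
    return highest_upper - lowest_lower

# Lowest class 10-19 and highest class 90-99, as in the example above.
classes = [inclusive_to_boundaries(10, 19), inclusive_to_boundaries(90, 99)]
print(classes)                 # [(9.5, 19.5), (89.5, 99.5)]
print(grouped_range(classes))  # 90.0
```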


Advantages of Range


Disadvantages of Range

Despite its simplicity, the range has significant limitations as a measure of dispersion:

Due to these limitations, the range is often used only for preliminary data analysis or for comparing the spread of very small datasets where simplicity is prioritized over precision.


Example

Example 1. Find the range of the following set of scores: 15, 20, 25, 18, 22, 95.

Answer:

Given: Dataset: 15, 20, 25, 18, 22, 95.

To Find: The range.

Solution:

We need to identify the maximum (largest) and minimum (smallest) values in the dataset.

  • Maximum value ($H$) = 95.
  • Minimum value ($L$) = 15.

Using the formula for range:

Range $= H - L$

... (i)

Substitute the maximum and minimum values:

Range $= 95 - 15$

Range $= 80$

... (ii)

The range of the scores is 80.

Note on Outliers:

Consider the dataset without the value 95: {15, 20, 25, 18, 22}. The maximum value is 25 and the minimum is 15. The range is $25 - 15 = 10$. The inclusion of a single outlier (95) drastically increased the range from 10 to 80, illustrating the range's high sensitivity to extreme values.
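The effect of the outlier can be checked directly. This is a minimal sketch using the scores from Example 1; the variable names are illustrative.

```python
scores = [15, 20, 25, 18, 22, 95]

# Range with the outlier included.
range_with_outlier = max(scores) - min(scores)

# Range after dropping the single extreme value 95.
trimmed = [s for s in scores if s != 95]
range_without_outlier = max(trimmed) - min(trimmed)

print(range_with_outlier, range_without_outlier)  # 80 10
```

Only two observations ever enter the calculation, so changing one of them can change the result completely.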



Mean Deviation: Definition and Calculation (from Mean, Median)


Definition and Concept

The Mean Deviation (often abbreviated as MD, and also known as the Mean Absolute Deviation, MAD) is a measure of dispersion that quantifies the average amount by which individual observations in a dataset differ from a measure of central tendency. It is calculated as the arithmetic mean of the absolute values of the deviations of the observations from the chosen central value (typically the mean or the median).

The use of absolute values, denoted by $|...|$, is essential. If we simply summed the signed deviations from the mean, $(x_i - \bar{x})$, the total would always be zero, because the positive and negative deviations cancel exactly. Taking absolute values ensures that the measure reflects the total distance of the data points from the center, regardless of direction.

The Mean Deviation provides a direct measure of the average distance of each data point from the center.


Calculation for Ungrouped Data

For a set of $n$ individual observations $x_1, x_2, \dots, x_n$:

1. Mean Deviation about the Mean ($\text{MD}_{\bar{x}}$):

This measures the average deviation of observations from the arithmetic mean.

  1. Calculate the arithmetic mean, $\bar{x} = \frac{\sum x_i}{n}$.
  2. For each observation $x_i$, calculate its deviation from the mean: $x_i - \bar{x}$.
  3. Take the absolute value of each deviation: $|x_i - \bar{x}|$.
  4. Sum all these absolute deviations: $\sum_{i=1}^{n} |x_i - \bar{x}|$.
  5. Divide the sum by the total number of observations, $n$.
  6. Formula:

    $\text{MD}_{\bar{x}} = \frac{\sum\limits_{i=1}^{n} |x_i - \bar{x}|}{n}$

    ... (1)
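The six steps above can be sketched as a small function. The function name and the sample values are illustrative assumptions, not taken from the text.

```python
def mean_deviation_about_mean(values):
    """Mean deviation about the arithmetic mean for ungrouped data."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one observation is required")
    x_bar = sum(values) / n                              # step 1: the mean
    abs_deviations = [abs(x - x_bar) for x in values]    # steps 2-3
    return sum(abs_deviations) / n                       # steps 4-5

sample = [2, 4, 6, 8, 10]  # hypothetical data with mean 6

# Signed deviations cancel, which is why absolute values are needed.
print(sum(x - 6 for x in sample))         # 0
print(mean_deviation_about_mean(sample))  # 2.4
```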

2. Mean Deviation about the Median ($\text{MD}_{\text{M}}$):

This measures the average deviation of observations from the median.

  1. Arrange the data in ascending or descending order and find the median, $M$.
  2. For each observation $x_i$, calculate its deviation from the median: $x_i - M$.
  3. Take the absolute value of each deviation: $|x_i - M|$.
  4. Sum these absolute deviations: $\sum_{i=1}^{n} |x_i - M|$.
  5. Divide the sum by the total number of observations, $n$.
  6. Formula:

    $\text{MD}_{\text{M}} = \frac{\sum\limits_{i=1}^{n} |x_i - M|}{n}$

    ... (2)
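The same procedure about the median can be sketched as follows. It relies on `statistics.median` from the Python standard library, which sorts the data and averages the two middle values when $n$ is even; the function name and sample values are illustrative assumptions.

```python
from statistics import median

def mean_deviation_about_median(values):
    """Mean deviation about the median for ungrouped data."""
    if not values:
        raise ValueError("at least one observation is required")
    m = median(values)                                  # step 1: the median
    return sum(abs(x - m) for x in values) / len(values)

sample = [1, 2, 3, 4, 10]  # hypothetical data with median 3
print(mean_deviation_about_median(sample))  # (2 + 1 + 0 + 1 + 7) / 5 = 2.2
```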

A significant property is that the sum of absolute deviations, $\sum |x_i - c|$, is minimized when $c$ is the median. Consequently, the Mean Deviation about the Median is always less than or equal to the Mean Deviation about the Mean ($\text{MD}_{\text{M}} \le \text{MD}_{\bar{x}}$).
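A numerical check of this property, offered as an illustration rather than a proof: the sketch below scans a grid of candidate centers $c$ for a hypothetical dataset and confirms that the total absolute deviation is smallest at the median. The data and the grid spacing are assumptions chosen for the example.

```python
from statistics import mean, median

def total_absolute_deviation(values, c):
    """Sum of |x_i - c| over all observations."""
    return sum(abs(x - c) for x in values)

data = [1, 2, 3, 4, 10]                      # hypothetical: median 3, mean 4
candidates = [i / 2 for i in range(0, 21)]   # 0.0, 0.5, ..., 10.0

# Candidate center with the smallest total absolute deviation.
best_c = min(candidates, key=lambda c: total_absolute_deviation(data, c))

print(best_c)                                        # 3.0, the median
print(total_absolute_deviation(data, median(data)))  # 11
print(total_absolute_deviation(data, mean(data)))    # 12
```

Dividing both totals by $n = 5$ gives $\text{MD}_{\text{M}} = 2.2$ and $\text{MD}_{\bar{x}} = 2.4$, consistent with the inequality.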


Calculation for Frequency Distributions (Ungrouped or Grouped)

Data may be presented in a frequency distribution, either ungrouped with distinct values or grouped with class intervals. Let $x_1, x_2, \dots, x_k$ be the distinct values or class marks, $f_1, f_2, \dots, f_k$ their corresponding frequencies, and $N = \sum\limits_{i=1}^{k} f_i$ the total frequency.

The calculation weights each absolute deviation by its frequency.

1. Mean Deviation about the Mean ($\text{MD}_{\bar{x}}$):

  1. Calculate the mean, $\bar{x} = \frac{\sum f_i x_i}{N}$ (using the appropriate formula for ungrouped or grouped data).
  2. For each distinct value or class mark $x_i$, calculate its absolute deviation from the mean: $|x_i - \bar{x}|$.
  3. Multiply each absolute deviation by its corresponding frequency $f_i$: $f_i |x_i - \bar{x}|$.
  4. Sum these products: $\sum_{i=1}^{k} f_i |x_i - \bar{x}|$.
  5. Divide the sum by the total frequency $N$.
  6. Formula:

    $\text{MD}_{\bar{x}} = \frac{\sum\limits_{i=1}^{k} f_i |x_i - \bar{x}|}{N}$

    ... (3)

2. Mean Deviation about the Median ($\text{MD}_{\text{M}}$):

  1. Calculate the median, $M$ (using the appropriate method for ungrouped or grouped frequency data).
  2. For each distinct value or class mark $x_i$, calculate its absolute deviation from the median: $|x_i - M|$.
  3. Multiply each absolute deviation by its corresponding frequency $f_i$: $f_i |x_i - M|$.
  4. Sum these products: $\sum_{i=1}^{k} f_i |x_i - M|$.
  5. Divide by the total frequency $N$.
  6. Formula:

    $\text{MD}_{\text{M}} = \frac{\sum\limits_{i=1}^{k} f_i |x_i - M|}{N}$

    ... (4)
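Formulas (3) and (4) can be sketched for an ungrouped frequency distribution. The values and frequencies below are hypothetical, and the median is found here by expanding the distribution into individual observations, which is a simplification: for grouped class intervals the median would need the appropriate grouped-data method, which this sketch does not implement.

```python
from statistics import median

def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / N."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def frequency_median(values, freqs):
    """Median of an ungrouped frequency distribution (expands the data)."""
    expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
    return median(expanded)

def weighted_mean_deviation(values, freqs, center):
    """sum(f_i * |x_i - center|) / N, for center = mean or median."""
    total_frequency = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / total_frequency

# Hypothetical distribution (assumption): N = 40.
x = [2, 5, 6, 8, 10, 12]
f = [2, 8, 10, 7, 8, 5]

x_bar = frequency_mean(x, f)    # 300 / 40 = 7.5
m = frequency_median(x, f)      # 20th and 21st observations are 6 and 8

print(x_bar, weighted_mean_deviation(x, f, x_bar))
print(m, weighted_mean_deviation(x, f, m))
```

For grouped data, the same `weighted_mean_deviation` call applies with `values` set to the class marks.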


Example

Example 1. Find the mean deviation about the mean for the data: 6, 7, 10, 12, 13, 4, 8, 12.

Answer:

Given: Dataset: 6, 7, 10, 12, 13, 4, 8, 12.

To Find: Mean deviation about the mean.

Solution:

This is ungrouped data (individual observations). The number of observations is $n=8$.

Step 1: Calculate the Mean ($\bar{x}$).

Sum of observations = $6+7+10+12+13+4+8+12 = 72$.

$\bar{x} = \frac{\sum x_i}{n} = \frac{72}{8} = 9$

... (i)

The mean is 9.

Step 2: Calculate Absolute Deviations $|x_i - \bar{x}| = |x_i - 9|$.

We calculate the absolute difference between each observation and the mean (9).

$x_i$    $x_i - \bar{x}$ ($x_i - 9$)    Absolute Deviation $|x_i - \bar{x}|$
6        $6 - 9 = -3$                   $|-3| = 3$
7        $7 - 9 = -2$                   $|-2| = 2$
10       $10 - 9 = 1$                   $|1| = 1$
12       $12 - 9 = 3$                   $|3| = 3$
13       $13 - 9 = 4$                   $|4| = 4$
4        $4 - 9 = -5$                   $|-5| = 5$
8        $8 - 9 = -1$                   $|-1| = 1$
12       $12 - 9 = 3$                   $|3| = 3$
Total                                   $\sum |x_i - \bar{x}| = 3+2+1+3+4+5+1+3 = 22$

Step 3: Calculate Mean Deviation about Mean.

Using the formula $\text{MD}_{\bar{x}} = \frac{\sum |x_i - \bar{x}|}{n}$:

$\text{MD}_{\bar{x}} = \frac{22}{8}$

... (ii)

$\text{MD}_{\bar{x}} = 2.75$

... (iii)

The mean deviation about the mean is 2.75. This means that, on average, each observation deviates from the mean of 9 by 2.75 units.
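The arithmetic of this example can be reproduced with a short script. This is a minimal check of the three steps above; the variable names are illustrative.

```python
data = [6, 7, 10, 12, 13, 4, 8, 12]
n = len(data)

# Step 1: the mean.
x_bar = sum(data) / n

# Step 2: absolute deviations from the mean.
abs_deviations = [abs(x - x_bar) for x in data]

# Step 3: mean deviation about the mean.
md_mean = sum(abs_deviations) / n

print(x_bar)                # 9.0
print(sum(abs_deviations))  # 22.0
print(md_mean)              # 2.75
```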


Advantages and Disadvantages of Mean Deviation

Mean deviation offers some advantages and disadvantages compared to other measures of dispersion:

Due to the mathematical difficulties associated with absolute values, the Mean Deviation is not as widely used in inferential statistics as the Variance and Standard Deviation, which are based on squared deviations.