



Measures of Central Tendency: Mode and Relationship




Mode: Definition and Calculation for Ungrouped Data


Definition and Nature

The Mode is defined as the value or category that occurs most frequently in a given dataset. It represents the observation with the highest frequency. Unlike the mean and median, which are typically used for numerical data, the mode can be determined for both numerical (quantitative) and non-numerical (categorical or qualitative) data.

The mode indicates the most popular or common outcome in a dataset. For example, if we survey people about their favourite colour, the colour chosen by the largest number of people is the mode.

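Key aspects of the mode:

  • It is based on how often each value or category occurs, not on the magnitude of the values.
  • It can be found for numerical, ordinal, and categorical data.
  • It is not affected by extreme values (outliers).
  • It may not exist, or may not be unique.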


Calculation for Ungrouped Data

Calculating the mode for ungrouped data (either a simple list of observations or an ungrouped frequency distribution) is straightforward:

  1. List the Observations (if raw data):

    If the data is just a list of values, examine each distinct value and count how many times it appears in the list.

  2. Identify Frequencies:

    For raw data, tally the frequency of each distinct value. If the data is already in an ungrouped frequency distribution table, the frequencies are given directly.

  3. Find the Highest Frequency:

    Identify the maximum frequency among all the distinct values or categories.

  4. Determine the Mode(s):

    The value(s) or category(ies) corresponding to the highest frequency are the mode(s).

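Possible outcomes when finding the mode:

  • Unimodal: exactly one value has the highest frequency, so there is a single mode.
  • Bimodal: two values share the highest frequency, so there are two modes.
  • Multimodal: more than two values share the highest frequency.
  • No mode: all values occur with the same frequency.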


Example

Example 1. Find the mode of the following datasets:

(a) 2, 5, 3, 5, 1, 5, 3, 4, 5, 2

(b) Favourite colours of students: Red, Blue, Green, Blue, Red, Red, Yellow

(c) 7, 8, 10, 10, 12, 13, 12, 15

(d) 1, 2, 3, 4, 5, 6, 7

Answer:

Solution:

(a) Dataset: 2, 5, 3, 5, 1, 5, 3, 4, 5, 2

Let's count the frequency of each distinct value:

  • Value 1: occurs 1 time
  • Value 2: occurs 2 times
  • Value 3: occurs 2 times
  • Value 4: occurs 1 time
  • Value 5: occurs 4 times

The highest frequency is 4, which corresponds to the value 5.

The mode is 5. This is a unimodal distribution.


(b) Dataset: Red, Blue, Green, Blue, Red, Red, Yellow

Let's count the frequency of each colour:

  • Colour Red: occurs 3 times
  • Colour Blue: occurs 2 times
  • Colour Green: occurs 1 time
  • Colour Yellow: occurs 1 time

The highest frequency is 3, which corresponds to the colour Red.

The mode is Red. This is a unimodal distribution.


(c) Dataset: 7, 8, 10, 10, 12, 13, 12, 15

Let's count the frequency of each distinct value:

  • Value 7: occurs 1 time
  • Value 8: occurs 1 time
  • Value 10: occurs 2 times
  • Value 12: occurs 2 times
  • Value 13: occurs 1 time
  • Value 15: occurs 1 time

The highest frequency is 2, and it occurs for two values: 10 and 12.

The modes are 10 and 12. This is a bimodal distribution.


(d) Dataset: 1, 2, 3, 4, 5, 6, 7

Let's count the frequency of each distinct value:

  • Value 1: occurs 1 time
  • Value 2: occurs 1 time
  • Value 3: occurs 1 time
  • Value 4: occurs 1 time
  • Value 5: occurs 1 time
  • Value 6: occurs 1 time
  • Value 7: occurs 1 time

All values occur with the same frequency (1).

There is no mode for this dataset.
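The counting procedure used in parts (a) to (d) can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name `find_modes` is hypothetical, and the "no mode" rule (several distinct values all sharing one frequency) follows the convention used in part (d).

```python
from collections import Counter


def find_modes(observations):
    """Return the list of modes, or an empty list when there is no mode."""
    counts = Counter(observations)  # frequency of each distinct value
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: if several distinct values all share the same
    # frequency, the dataset is treated as having no mode.
    if len(counts) > 1 and all(f == highest for f in counts.values()):
        return []
    # Keep every value whose frequency equals the highest frequency.
    return [value for value, f in counts.items() if f == highest]


print(find_modes([2, 5, 3, 5, 1, 5, 3, 4, 5, 2]))  # [5]
print(find_modes(["Red", "Blue", "Green", "Blue", "Red", "Red", "Yellow"]))  # ['Red']
print(find_modes([7, 8, 10, 10, 12, 13, 12, 15]))  # [10, 12]
print(find_modes([1, 2, 3, 4, 5, 6, 7]))  # []
```

Because the function only counts occurrences, it works unchanged for categorical data such as the colours in part (b).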



Mode of Grouped Data (Formula and Calculation)


Introduction and Modal Class

For data that is grouped into class intervals, we cannot determine the exact value of the mode because the individual observations are not known. Instead, we can identify the class interval that contains the mode. This class is called the modal class and is defined as the class interval with the highest frequency.

Once the modal class is identified, we estimate the value of the mode within that class using a formula. This formula is based on the assumption that the mode is located within the modal class and its exact position is influenced by the frequencies of the classes immediately preceding and succeeding the modal class.


Steps to Calculate the Mode of Grouped Data

To estimate the mode from a grouped frequency distribution table:

  1. Identify the Modal Class:

    Examine the frequency column of the grouped frequency distribution table. The class interval corresponding to the maximum frequency is the modal class.

  2. Determine Values for the Formula:

    From the identified modal class and the frequency table, extract the following values needed for the mode formula:

    • $l$: The lower class boundary of the modal class. If the class intervals are exclusive (like $10-20, 20-30$), the lower limit is the boundary. If they are inclusive (like $10-19, 20-29$), convert the lower limit to a boundary by subtracting the adjustment factor (half the gap between classes).
    • $f_m$: The frequency of the modal class.
    • $f_1$: The frequency of the class immediately preceding the modal class.
    • $f_2$: The frequency of the class immediately succeeding the modal class.
    • $h$: The class width (size) of the modal class. (Assumed to be constant). $h = \text{Upper Boundary} - \text{Lower Boundary}$.
  3. Apply the Mode Formula:

    The estimated mode of grouped data is calculated using the formula:

    Mode $= l + \left( \frac{f_m - f_1}{2f_m - f_1 - f_2} \right) \times h$

    ... (1)

    Alternatively, the denominator can be written as $(f_m - f_1) + (f_m - f_2)$, highlighting the differences in frequency between the modal class and its neighbours:

    Mode $= l + \left( \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \right) \times h$

    ... (2)

Important Notes:

Visual Representation:

The mode formula can be intuitively understood from a histogram. If lines are drawn from the top-right corner of the modal class bar to the top-right corner of the preceding bar, and from the top-left corner of the modal class bar to the top-left corner of the succeeding bar, the intersection of these lines projected onto the x-axis gives an estimate of the mode.

[Figure: Histogram illustrating the mode formula components and graphical estimation]
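The graphical construction can be checked algebraically. As a sketch, assume the horizontal axis carries the class boundaries and the vertical axis the frequencies, so that the modal bar spans $l$ to $l + h$ with height $f_m$. The coordinates $(x, y)$ are notation introduced here for illustration only.

Line from $(l + h, f_m)$ to $(l, f_1)$: $y = f_1 + \frac{f_m - f_1}{h}(x - l)$

Line from $(l, f_m)$ to $(l + h, f_2)$: $y = f_m - \frac{f_m - f_2}{h}(x - l)$

Equating the two expressions for $y$ at the intersection:

$\frac{(f_m - f_1) + (f_m - f_2)}{h}(x - l) = f_m - f_1$

$x = l + \left( \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \right) \times h$

The $x$-coordinate of the intersection is therefore the same expression as the mode formula given above.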

Example

Example 1. Find the mode for the following weight distribution data:

| Weight (kg) | Frequency (f) |
|---|---|
| 40 - 45 | 2 |
| 45 - 50 | 5 |
| 50 - 55 | 5 |
| 55 - 60 | 7 |
| 60 - 65 | 5 |
| 65 - 70 | 4 |
| 70 - 75 | 2 |
| Total | 30 |

Answer:

Given: Grouped frequency distribution of student weights.

To Find: The mode weight.

Solution:

Step 1: Identify Modal Class.

We look for the class interval with the maximum frequency in the table.

| Weight (kg) | Frequency (f) |
|---|---|
| 40 - 45 | 2 |
| 45 - 50 | 5 |
| 50 - 55 | 5 |
| 55 - 60 | 7 (Maximum Frequency) |
| 60 - 65 | 5 |
| 65 - 70 | 4 |
| 70 - 75 | 2 |
| Total | 30 |

The maximum frequency is 7, which occurs in the class interval 55 - 60. This is the modal class.

Step 2: Determine Values for Formula.

From the modal class (55 - 60) and the table, we extract the necessary values:

  • $l$: Lower class boundary of the modal class. The class intervals are exclusive, so the lower boundary is the lower limit. $l = 55$.
  • $f_m$: Frequency of the modal class. $f_m = 7$.
  • $f_1$: Frequency of the class preceding the modal class (50 - 55). $f_1 = 5$.
  • $f_2$: Frequency of the class succeeding the modal class (60 - 65). $f_2 = 5$.
  • $h$: Class width of the modal class. $h = 60 - 55 = 5$.

Step 3: Apply the Mode Formula.

Mode $= l + \left( \frac{f_m - f_1}{2f_m - f_1 - f_2} \right) \times h$

... (i)

Substitute the values into the formula:

Mode $= 55 + \left( \frac{7 - 5}{2(7) - 5 - 5} \right) \times 5$

Mode $= 55 + \left( \frac{2}{14 - 10} \right) \times 5$

Mode $= 55 + \left( \frac{2}{4} \right) \times 5$

Mode $= 55 + \left( \frac{\cancel{2}^{1}}{\cancel{4}_{2}} \right) \times 5$

(Cancelling the fraction)

Mode $= 55 + \left(\frac{1}{2}\right) \times 5$

Mode $= 55 + 2.5$

Mode $= 57.5$

... (ii)

The estimated mode weight is 57.5 kg. This value lies within the modal class 55-60 kg, as expected.




Relationship Between Mean, Median and Mode (Empirical Formula)


Empirical Relationship

While the mean, median, and mode are distinct measures of central tendency, for distributions that are **unimodal** (having a single peak) and **moderately skewed** (not extremely asymmetrical), there exists an approximate empirical relationship between them. This relationship is based on observations from many real-world datasets that exhibit this type of distribution.

The approximate empirical relationship is given by:

Mean - Mode $\approx$ 3 $\times$ (Mean - Median)

... (1)

This formula suggests that the difference between the mean and the mode is roughly three times the difference between the mean and the median.

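This relationship can be rearranged algebraically to express one measure in terms of the other two:

Mode $\approx$ 3 Median - 2 Mean

Median $\approx \frac{2 \text{ Mean} + \text{Mode}}{3}$

Mean $\approx \frac{3 \text{ Median} - \text{Mode}}{2}$

The form Mode $\approx$ 3 Median - 2 Mean is the most commonly cited form of the empirical relationship.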


Conditions for Applicability and Interpretation

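It is crucial to understand when and how this empirical relationship applies:

  • It is an approximation based on observation, not an exact identity, so it gives an estimate rather than an exact value.
  • It applies to unimodal, moderately skewed distributions and becomes unreliable for strongly skewed or multimodal data.
  • In a positively skewed distribution, the expected order is Mean > Median > Mode.
  • In a negatively skewed distribution, the expected order is Mean < Median < Mode.
  • In a perfectly symmetrical unimodal distribution, the three measures coincide.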


Example

Example 1. In a moderately skewed distribution, the mean is 35.4 and the median is 34.3. Estimate the mode.

Answer:

Given: Mean ($\bar{x}$) = 35.4, Median (M) = 34.3.

The distribution is moderately skewed.

To Estimate: The mode (Z).

Solution:

We can use the empirical formula relating Mean, Median, and Mode for moderately skewed distributions. Let's use the form Mode $\approx$ 3 Median - 2 Mean.

Mode $\approx$ 3 $\times$ Median - 2 $\times$ Mean

... (i)

Substitute the given values into the formula:

Mode $\approx$ 3 $\times$ (34.3) - 2 $\times$ (35.4)

Perform the multiplications:

3 $\times$ 34.3 = 102.9

... (ii)

2 $\times$ 35.4 = 70.8

... (iii)

Substitute these products back into the formula (i):

Mode $\approx$ 102.9 - 70.8

Perform the subtraction:

Mode $\approx$ 32.1

... (iv)

The estimated mode is 32.1.

Interpretation:

Given that Mean (35.4) > Median (34.3), this is consistent with a positively skewed distribution. In a positively skewed distribution, the expected order of the measures is Mean > Median > Mode. Our estimated mode (32.1) follows this pattern (35.4 > 34.3 > 32.1), which supports the appropriateness of using the empirical formula in this case.
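The estimate and the ordering check can be reproduced with a short Python sketch. The function name `estimate_mode` is hypothetical, and the result is only meaningful under the unimodal, moderately skewed assumption stated earlier.

```python
def estimate_mode(mean, median):
    """Empirical estimate: Mode is approximately 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


mean, median = 35.4, 34.3
mode = estimate_mode(mean, median)
print(round(mode, 1))        # 32.1
print(mean > median > mode)  # True, the order expected under positive skew
```

Rounding is applied only for display, because floating-point arithmetic may leave a tiny error in the last digits.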



Comparing Mean, Median, and Mode


Comparison of the Three Measures

The mean, median, and mode are the three most common measures of central tendency, but they represent the "center" of the data in different ways. Each measure has its strengths and weaknesses, and the most appropriate measure to use depends on the nature of the data, the shape of its distribution, and the objective of the analysis.

Here is a comparison of their key features:

| Feature | Mean ($\bar{x}$) | Median | Mode |
|---|---|---|---|
| Definition | The sum of all observations divided by the number of observations. It's the arithmetical average. | The middle value when the data is arranged in ascending or descending order. It's the positional average. | The value that occurs most frequently in the dataset. It's the value with the highest frequency. |
| Calculation Basis | Takes into account the magnitude of every observation. | Based on the positional rank of the middle value(s) in the ordered data. Does not directly use the magnitude of all values. | Based on the frequency of occurrence of each value or category. |
| Type of Data Applicable | Suitable only for **numerical** data (interval or ratio scales). Requires mathematical operations (addition, division). | Suitable for **numerical** (interval or ratio scales) and **ordinal** data (data that can be ranked). Requires ordering. | Suitable for **numerical** (interval or ratio scales), **ordinal**, and **categorical** (nominal) data. Does not require ordering or mathematical operations. |
| Effect of Extreme Values (Outliers) | **Highly affected**. Extreme values pull the mean towards them. This can distort the representation of the typical value in skewed distributions. | **Not affected** (or minimally affected). Its value is only dependent on the position of the middle observation(s), making it robust to outliers. | **Not affected**. Outliers (rare extreme values) by definition do not occur with the highest frequency. |
| Existence and Uniqueness | Always exists and is always unique for any numerical dataset. | Always exists and is always unique for any dataset that can be ordered. | May not exist (if all values have the same frequency), or may not be unique (bimodal or multimodal distributions). |
| Mathematical Properties | Possesses desirable mathematical properties. Amenable to further algebraic treatment and is the foundation for many other statistical techniques (e.g., variance, standard deviation, correlation). | Less amenable to complex algebraic manipulation compared to the mean. | Least amenable to further algebraic treatment. Primarily a descriptive statistic. |
| When to Prefer Use | Preferred for symmetrical distributions without significant outliers, when every observation's value should contribute to the central measure, or when further parametric statistical analysis is intended. | Preferred for skewed distributions or distributions with significant outliers, when the focus is on the typical value based on rank, or when dealing with open-ended classes in grouped data. It represents the true middle point of the ordered data. | Preferred for categorical data, identifying the most frequent observation or category, or when describing the peak(s) in a frequency distribution. Useful when a quick estimate of the center is needed. |
| Stability | Generally stable (less fluctuation) across different samples from the same population, but this stability is compromised by outliers. | More stable than the mean in the presence of outliers or skewness. | Can be unstable, especially in small datasets, where adding or removing a few values can drastically change the mode. It can also be highly sensitive to the grouping of data in frequency distributions. |
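The effect of an extreme value on each measure can be illustrated with Python's standard `statistics` module. The dataset and the helper name `summarise` are made up for illustration. They are a sketch of the comparison, not data from this page.

```python
import statistics


def summarise(data):
    """Return the mean, median, and mode(s) of a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # list of most frequent values
    }


typical = [2, 3, 3, 4, 5]        # hypothetical small dataset
with_outlier = typical + [100]   # same data plus one extreme value

print(summarise(typical))        # mean 3.4, median 3, modes [3]
print(summarise(with_outlier))   # mean 19.5, median 3.5, modes [3]
```

The single outlier moves the mean from 3.4 to 19.5, shifts the median only from 3 to 3.5, and leaves the mode at 3. Note that `statistics.multimode` returns every value when all frequencies are equal, which differs from the "no mode" convention used in the examples on this page.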

Choosing the Right Measure

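Selecting the most appropriate measure of central tendency is a crucial step in data analysis. The "When to Prefer Use" row of the comparison above summarises the main guidelines: the mean for symmetrical numerical data without significant outliers, the median for skewed data or data with outliers, and the mode for categorical data or for describing the most frequent value.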

In practice, it is often beneficial to calculate and report more than one measure of central tendency to provide a more complete description of the data's center and shape. Examining all three measures together can reveal important characteristics of the distribution.