


Chapter 2 Data Processing



Measures Of Central Tendency

After collecting and organising data, the next step in data processing is to analyse it using statistical techniques. These techniques help in extracting meaningful insights and summarising the data. Measures of central tendency are a set of statistical methods used for this purpose.


Definition and Purpose

Measures of central tendency aim to identify a single, representative value that best describes the center or typical value of a dataset. When a variable takes many different values (for example rainfall, elevation, population density, or test scores), a single number that summarises all the observations is often needed to understand the dataset efficiently. This representative value usually lies near the middle of the data distribution, where observations tend to cluster.

These measures are also known as statistical averages. Because each one stands in for the whole collection of data points with just one number, it makes a dataset easier to grasp and to compare with others.


Types Of Measures

There are three common measures of central tendency:

  • Mean
  • Median
  • Mode

Each of these measures uses a different method to determine the 'center' of a distribution and is suitable for different types of data or analytical purposes.



Mean

The Mean is the most commonly used measure of central tendency. It represents the simple arithmetic average of a set of values. The method for calculating the mean differs depending on whether the data is ungrouped (individual values) or grouped (data sorted into classes or intervals).


Computing Mean From Ungrouped Data

Ungrouped data consists of individual observations that have not been sorted into frequency classes.


Direct Method

The direct method for calculating the mean of ungrouped data involves summing all the individual values and dividing the sum by the total number of observations.

The formula for the direct method is:

$ \bar{X} = \frac{\sum x}{N} $

Where:

  • $\bar{X}$ = mean
  • $\sum x$ = sum of the individual observations
  • $N$ = number of observations

Example 2.1: Calculate the mean rainfall for Malwa Plateau districts from the data given below:

| District (Malwa Plateau) | Normal rainfall (mm) |
| --- | --- |
| Indore | 979 |
| Dewas | 1083 |
| Dhar | 833 |
| Ratlam | 896 |
| Ujjain | 891 |
| Mandsaur | 825 |
| Shajapur | 977 |

Answer:

Here, the individual observations (x) are the rainfall values for each district, and the number of observations (N) is the number of districts, which is 7.

Sum of rainfall values ($\sum x$) = $979 + 1083 + 833 + 896 + 891 + 825 + 977 = 6484$ mm

$ N = 7 $

$ \bar{X} = \frac{\sum x}{N} = \frac{6484}{7} = 926.29 \text{ mm} $

The mean rainfall for the Malwa Plateau in this example is 926.29 mm.
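The arithmetic above can be checked with a short Python sketch. The function name `mean_direct` and the variable names are illustrative choices, not part of the source; the data are the seven rainfall values from Example 2.1.

```python
# Rainfall values (mm) from Example 2.1, in the order listed in the table.
rainfall_mm = [979, 1083, 833, 896, 891, 825, 977]

def mean_direct(values):
    """Direct method: sum of the observations divided by their count."""
    return sum(values) / len(values)

mean_rainfall = mean_direct(rainfall_mm)
print(round(mean_rainfall, 2))  # 926.29
```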


Indirect Method

The indirect method is often used for ungrouped data when dealing with a large number of observations or large values, as it simplifies calculations. It involves choosing an 'assumed mean' (A) and calculating the deviations (d) of each observation from this assumed mean ($d = x - A$). The mean is then calculated based on the assumed mean and the average of these deviations.

The formula for the indirect method is:

$ \bar{X} = A + \frac{\sum d}{N} $

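Where:

  • $A$ = assumed mean
  • $\sum d$ = sum of the deviations of the observations from the assumed mean ($d = x - A$)
  • $N$ = number of observations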

Example 2.1 (Continued): Calculate the mean rainfall using the indirect method, taking 800 as the assumed mean.

| District (Malwa Plateau) | Normal rainfall, x (mm) | Deviation, d = x - 800 |
| --- | --- | --- |
| Indore | 979 | 179 |
| Dewas | 1083 | 283 |
| Dhar | 833 | 33 |
| Ratlam | 896 | 96 |
| Ujjain | 891 | 91 |
| Mandsaur | 825 | 25 |
| Shajapur | 977 | 177 |
| Sum ($\sum$) | 6484 | 884 |

Answer:

$ A = 800 $

Sum of deviations ($\sum d$) = $179 + 283 + 33 + 96 + 91 + 25 + 177 = 884$

$ N = 7 $

$ \bar{X} = A + \frac{\sum d}{N} = 800 + \frac{884}{7} = 800 + 126.29 = 926.29 \text{ mm} $

As expected, the mean calculated by the indirect method is the same as that calculated by the direct method.
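The two methods agree because each observation can be written as $x = A + d$, so averaging gives $A + \frac{\sum d}{N}$ whatever value of $A$ is chosen. A minimal Python sketch of this, with the hypothetical helper name `mean_indirect`, tries two different assumed means on the Example 2.1 data:

```python
# Rainfall values (mm) from Example 2.1.
rainfall_mm = [979, 1083, 833, 896, 891, 825, 977]

def mean_indirect(values, assumed_mean):
    """Indirect method: assumed mean plus the average deviation from it."""
    deviations = [x - assumed_mean for x in values]  # d = x - A
    return assumed_mean + sum(deviations) / len(values)

# The choice of assumed mean does not change the result.
print(round(mean_indirect(rainfall_mm, 800), 2))  # 926.29
print(round(mean_indirect(rainfall_mm, 900), 2))  # 926.29
```

The second call uses 900 purely as an illustration; the source example only uses 800.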


Computing Mean From Grouped Data

When data is presented as a frequency distribution (grouped into classes), the individual values within each class are not known. Instead, they are represented by the midpoint of their respective class interval.


Direct Method

In the direct method for grouped data, the midpoint of each class is multiplied by its frequency. These products (fx) are summed up, and the total sum is divided by the total number of observations (N, which is the sum of all frequencies).

The formula is:

$ \bar{X} = \frac{\sum fx}{N} $

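Where:

  • $f$ = frequency of a class
  • $x$ = midpoint of the class interval
  • $\sum fx$ = sum of the products of each midpoint and its frequency
  • $N$ = total number of observations ($\sum f$)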

Example 2.2: Compute the average wage rate of factory workers using the data given in Table 2.2:

| Wage rate (Rs./day) | Number of workers (f) |
| --- | --- |
| 50 - 70 | 10 |
| 70 - 90 | 20 |
| 90 - 110 | 25 |
| 110 - 130 | 35 |
| 130 - 150 | 9 |
| Total | 99 |

Answer:

To calculate the mean, we first need to find the midpoint (x) of each class and then the product (fx) of the midpoint and frequency for each class.

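| Class (wage rate, Rs./day) | Frequency (f) | Midpoint (x) | fx |
| --- | --- | --- | --- |
| 50-70 | 10 | 60 | 600 |
| 70-90 | 20 | 80 | 1600 |
| 90-110 | 25 | 100 | 2500 |
| 110-130 | 35 | 120 | 4200 |
| 130-150 | 9 | 140 | 1260 |
| Sum ($\sum$) | N = 99 | | $\sum fx = 10160$ |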

$ \bar{X} = \frac{\sum fx}{N} = \frac{10160}{99} = 102.63 \text{ Rs./day (approx)} $

The average wage rate of the factory workers is approximately ₹102.63 per day.
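As a rough check of the table, the grouped direct method can be sketched in Python. Representing each class as a `(lower, upper, frequency)` tuple and the name `grouped_mean_direct` are assumptions made for this illustration.

```python
# Wage classes from Example 2.2 as (lower limit, upper limit, frequency).
wage_classes = [(50, 70, 10), (70, 90, 20), (90, 110, 25), (110, 130, 35), (130, 150, 9)]

def grouped_mean_direct(classes):
    """Direct method for grouped data: sum(f * midpoint) / sum(f)."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

print(round(grouped_mean_direct(wage_classes), 2))  # 102.63
```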


Indirect Method

The indirect method for grouped data also uses an assumed mean to simplify calculations, especially useful when midpoints or frequencies are large numbers. An assumed mean (A) is selected from the midpoint of one of the classes (often the class near the center). Deviations (d) are calculated for each class midpoint from the assumed mean ($d = x - A$). Alternatively, if class intervals are equal, deviations can be coded (u) by dividing 'd' by the class interval width (i) ($u = d/i$).

The formula for the indirect method using deviations (d) is:

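$ \bar{X} = A + \frac{\sum fd}{N} $

The formula using coded deviations (u) is:

$ \bar{X} = A + \frac{\sum fu}{N} \times i $

Because $d$ and $u$ carry their own signs, the correction term is added algebraically: it raises the assumed mean when the sum is positive and lowers it when the sum is negative.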

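Where:

  • $A$ = assumed mean (the midpoint of a chosen class)
  • $d$ = deviation of a class midpoint from the assumed mean ($d = x - A$)
  • $u$ = coded deviation ($u = d/i$)
  • $i$ = class interval width
  • $f$ = frequency of a class
  • $N$ = total number of observations ($\sum f$)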

Example 2.2 (Continued): Compute the average wage rate using the indirect method, taking the midpoint of the 90-110 class (which is 100) as the assumed mean. Also, use coded deviations as the class interval width is 20.

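| Class (wage rate, Rs./day) | Frequency (f) | Midpoint (x) | Deviation (d = x - 100) | fd | Coded deviation (u = d/20) | fu |
| --- | --- | --- | --- | --- | --- | --- |
| 50-70 | 10 | 60 | -40 | -400 | -2 | -20 |
| 70-90 | 20 | 80 | -20 | -400 | -1 | -20 |
| 90-110 | 25 | 100 | 0 | 0 | 0 | 0 |
| 110-130 | 35 | 120 | 20 | 700 | 1 | 35 |
| 130-150 | 9 | 140 | 40 | 360 | 2 | 18 |
| Sum ($\sum$) | N = 99 | | | $\sum fd = 260$ | | $\sum fu = 13$ |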

Answer:

Using the formula with deviations (d):

$ A = 100 $

$ \sum fd = 260 $

$ N = 99 $

$ \bar{X} = A + \frac{\sum fd}{N} = 100 + \frac{260}{99} = 100 + 2.63 = 102.63 \text{ Rs./day (approx)} $

Using the formula with coded deviations (u):

$ A = 100 $

$ \sum fu = 13 $

$ N = 99 $

$ i = 20 $

$ \bar{X} = A + \frac{\sum fu}{N} \times i = 100 + \frac{13}{99} \times 20 = 100 + 0.1313 \times 20 = 100 + 2.63 = 102.63 \text{ Rs./day (approx)} $

Both indirect methods yield the same mean as the direct method.
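The coded-deviation calculation can be sketched in Python as follows. The tuple layout for classes and the function name `grouped_mean_coded` are illustrative assumptions; the sketch also assumes equal class widths, as the method itself does.

```python
# Wage classes from Example 2.2 as (lower limit, upper limit, frequency).
wage_classes = [(50, 70, 10), (70, 90, 20), (90, 110, 25), (110, 130, 35), (130, 150, 9)]

def grouped_mean_coded(classes, assumed_mean, width):
    """Indirect method with coded deviations u = (x - A) / i."""
    total_f = 0
    total_fu = 0.0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2
        u = (midpoint - assumed_mean) / width  # signed coded deviation
        total_f += f
        total_fu += f * u
    return assumed_mean + (total_fu / total_f) * width

print(round(grouped_mean_coded(wage_classes, assumed_mean=100, width=20), 2))  # 102.63
```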



Median

The Median (M) is a positional average. It represents the value of the middle observation in a dataset that has been arranged in ascending or descending order. The median divides the data into two equal halves: 50% of the observations are below the median, and 50% are above it.

The median is independent of the actual values of extreme observations, making it a suitable measure when the data is skewed or contains outliers.


Computing Median For Ungrouped Data

For ungrouped data, the steps to calculate the median are:

  1. Arrange the data in either ascending or descending order.
  2. Locate the position of the median: it is the $ (\frac{N+1}{2})^{\text{th}} $ item of the ordered series.
  3. If N is odd, this position is a whole number, and the median is the value of the item at that position.
  4. If N is even, this position falls halfway between two items, so the median is the average of the values of the two middle items (at positions $ N/2 $ and $ (N/2) + 1 $).

Example 2.3: Calculate median height of mountain peaks in parts of the Himalayas using the following data (in meters):

8,126; 8,611; 7,817; 8,172; 8,076; 8,848; 8,598

Answer:

There are 7 observations (N=7), which is an odd number.

1. Arrange the data in ascending order:

7,817; 8,076; 8,126; 8,172; 8,598; 8,611; 8,848

2. Locate the median position: $ (\frac{N+1}{2})^{\text{th}} \text{ item} = (\frac{7+1}{2})^{\text{th}} \text{ item} = (\frac{8}{2})^{\text{th}} \text{ item} = 4^{\text{th}} \text{ item} $

3. The 4th item in the arranged series is 8,172.

$ M = 8,172 \text{ m} $

If there were an even number of observations, say 8, you would average the values at the $ 8/2 = 4^{\text{th}} $ and $ (8/2)+1 = 5^{\text{th}} $ positions.
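The odd and even cases can be sketched together in Python. The function name `median_ungrouped` is an illustrative choice, and the six-value call simply drops the last peak from the list to show the even case; it is not part of the source example.

```python
# Peak heights (m) from Example 2.3, in the order given.
peaks_m = [8126, 8611, 7817, 8172, 8076, 8848, 8598]

def median_ungrouped(values):
    """Median of raw observations: middle item, or mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # The (N+1)/2-th item in 1-based counting is index n // 2 in 0-based counting.
        return ordered[mid]
    # Even N: average the N/2-th and (N/2 + 1)-th items.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_ungrouped(peaks_m))      # 8172
print(median_ungrouped(peaks_m[:6]))  # 8149.0 (illustration of the even case only)
```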


Computing Median For Grouped Data

For grouped data (frequency distribution), the median is found by first locating the median class and then using a formula to interpolate the median value within that class. The median class is the class interval where the cumulative frequency first exceeds or equals $ N/2 $.

The formula for calculating the median from grouped data is:

$ M = l + \frac{\frac{N}{2} - c}{f} \times i $

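Where:

  • $M$ = median
  • $l$ = lower limit of the median class
  • $N$ = total frequency
  • $c$ = cumulative frequency of the class preceding the median class
  • $f$ = frequency of the median class
  • $i$ = class interval width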

Example 2.4: Calculate the median for the following distribution:

| Class | f |
| --- | --- |
| 50-60 | 3 |
| 60-70 | 7 |
| 70-80 | 11 |
| 80-90 | 16 |
| 90-100 | 8 |
| 100-110 | 5 |
| Total | N = 50 |

Answer:

1. Create a cumulative frequency (F) column.

| Class | Frequency (f) | Cumulative frequency (F) |
| --- | --- | --- |
| 50-60 | 3 | 3 |
| 60-70 | 7 | $3 + 7 = 10$ |
| 70-80 | 11 | $10 + 11 = 21$ (c) |
| 80-90 | 16 (f) | $21 + 16 = 37$ |
| 90-100 | 8 | $37 + 8 = 45$ |
| 100-110 | 5 | $45 + 5 = 50$ (N) |
| Total | N = 50 | |

2. Calculate $ N/2 $: $ N/2 = 50/2 = 25 $

3. Find the median class: Look in the cumulative frequency column for the value that is just greater than or equal to 25. This value is 37, which corresponds to the class 80-90. So, the median class is 80-90.

4. Identify the values for the formula:

  • $l$ (Lower limit of median class) = 80
  • $N$ (Total frequency) = 50
  • $c$ (Cumulative frequency of pre-median class, i.e., the class before 80-90) = 21
  • $f$ (Frequency of the median class 80-90) = 16
  • $i$ (Class interval width) = $90 - 80 = 10$

5. Substitute the values into the median formula:

$ M = l + \frac{\frac{N}{2} - c}{f} \times i = 80 + \frac{25 - 21}{16} \times 10 $

$ M = 80 + \frac{4}{16} \times 10 = 80 + 0.25 \times 10 = 80 + 2.5 $

$ M = 82.5 $

The median of the distribution is 82.5.
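The steps of Example 2.4 can be sketched in Python as a single pass over the classes. The `(lower, upper, frequency)` tuple layout and the name `median_grouped` are assumptions made for this illustration; the interpolation is the formula $M = l + \frac{N/2 - c}{f} \times i$ given above.

```python
# Distribution from Example 2.4 as (lower limit, upper limit, frequency).
distribution = [(50, 60, 3), (60, 70, 7), (70, 80, 11),
                (80, 90, 16), (90, 100, 8), (100, 110, 5)]

def median_grouped(classes):
    """Interpolated median: M = l + ((N/2 - c) / f) * i."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("distribution has no observations")
    half = n / 2
    c = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if c + f >= half:
            # First class whose cumulative frequency reaches N/2: the median class.
            return lower + (half - c) / f * (upper - lower)
        c += f

print(median_grouped(distribution))  # 82.5
```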



Mode

The Mode (Z or Mo) is defined as the value that appears most frequently in a dataset. It is the observation with the highest frequency of occurrence. Compared to the mean and median, the mode is generally less used in statistical analysis, but it is useful for identifying the most typical or common value in a distribution.

A dataset can have one mode (unimodal), two modes (bimodal), more than two modes (multimodal), or no mode at all if no value is repeated.


Computing Mode For Ungrouped Data

To compute the mode for ungrouped data, simply identify the value that occurs with the highest frequency. Arranging the data in ascending or descending order can help in easily counting the frequency of each distinct value.

Example 2.5: Calculate mode for the following test scores in geography for ten students:

61, 10, 88, 37, 61, 72, 55, 61, 46, 22

Answer:

List the unique scores and count their frequencies:

  • 10: occurs once
  • 22: occurs once
  • 37: occurs once
  • 46: occurs once
  • 55: occurs once
  • 61: occurs three times
  • 72: occurs once
  • 88: occurs once

The score 61 occurs most frequently (3 times). Therefore, the mode is 61.

$ \text{Mode} = 61 $

This distribution is unimodal.

Example 2.6: Calculate the mode using a different sample of ten other students, who scored:

82, 11, 57, 82, 08, 11, 82, 95, 41, 11.

Answer:

List the unique scores and count their frequencies:

  • 08: occurs once
  • 11: occurs three times
  • 41: occurs once
  • 57: occurs once
  • 82: occurs three times
  • 95: occurs once

Both scores 11 and 82 occur with the highest frequency (3 times). Therefore, this dataset has two modes: 11 and 82.

$ \text{Modes} = 11, 82 $

This distribution is bimodal.
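Counting frequencies by hand, as in Examples 2.5 and 2.6, can be sketched with `collections.Counter` from the Python standard library. The function name `modes` and the convention of returning an empty list when no value repeats are choices made for this illustration.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency (empty if none repeats)."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)

# Scores from Example 2.5 (unimodal) and Example 2.6 (bimodal); 08 is written as 8.
print(modes([61, 10, 88, 37, 61, 72, 55, 61, 46, 22]))  # [61]
print(modes([82, 11, 57, 82, 8, 11, 82, 95, 41, 11]))   # [11, 82]
```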



Comparison Of Mean, Median And Mode

Comparing the mean, median, and mode helps in understanding the characteristics of a data distribution, especially its shape (symmetry or skewness).


Normal Distribution

In a normal distribution (often represented graphically as a symmetrical, bell-shaped curve), the mean, median, and mode all coincide and are equal to the same value. The highest frequency occurs exactly at the center of the distribution, where the mean, median, and mode are located. In a normal distribution, data is symmetrically distributed around the center, with frequencies gradually decreasing as you move towards the extreme values.

[Figure: Normal distribution curve showing the mean, median, and mode at the center]

Skewed Distributions

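If a dataset is skewed (asymmetrical), the mean, median, and mode will generally not be equal. The relative positions of these measures indicate the direction of the skew:

  • Positive (right) skew: the longer tail lies toward the higher values, and typically mode < median < mean.
  • Negative (left) skew: the longer tail lies toward the lower values, and typically mean < median < mode.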

The relationship between these measures provides insights into the shape of the distribution. The mean is sensitive to extreme values (outliers), while the median is not. The mode is useful for categorical data or identifying the most common category or value.
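As a rough illustration, the test scores from Example 2.5 can be compared using Python's standard `statistics` module. The helper `skew_direction` is hypothetical and applies only the common rule of thumb that the mean is pulled toward the longer tail; with a sample of ten values the result is indicative at best.

```python
from statistics import mean, median

def skew_direction(values):
    """Rule-of-thumb skew direction from the positions of the mean and median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"  # mean pulled toward high values
    if m < med:
        return "negative"  # mean pulled toward low values
    return "symmetric"

# Test scores from Example 2.5 (mode = 61).
scores = [61, 10, 88, 37, 61, 72, 55, 61, 46, 22]
print(mean(scores), median(scores))  # 51.3 58.0
print(skew_direction(scores))        # negative
```

Here the mean (51.3) lies below the median (58), which lies below the mode (61), the ordering usually associated with a negative skew: a few low scores pull the mean down while the median and mode stay put.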