Chapter 8 Index Numbers and Time Based Data (Concepts)
Welcome to this crucial chapter exploring specialized statistical tools predominantly employed within economics, business, and finance: Index Numbers and Time Series Analysis. These techniques are fundamental in Applied Mathematics for quantifying and interpreting changes over time, analyzing economic trends, and making informed forecasts. While descriptive statistics provides snapshots, the methods covered here focus explicitly on dynamics and comparisons across different periods or conditions. Understanding index numbers allows us to measure relative changes in complex variables like price levels (inflation/deflation) or industrial production, while time series analysis equips us with methods to dissect data collected chronologically, identify underlying patterns, and predict future values – essential skills for economic modeling, business planning, and policy making.
The first major focus is on Index Numbers, sophisticated statistical measures designed to represent the average change in a variable (or group of related variables) over time, often relative to a specific base period. We move beyond simple observation to quantify change systematically. We begin with Simple Index Numbers, exploring the straightforward Simple Aggregative method and the Simple Average of Price Relatives. However, recognizing that different items often hold varying degrees of importance (e.g., staple foods vs. luxury goods in a cost of living index), we delve deeply into Weighted Index Numbers. Key methodologies involving weights (typically based on quantity consumed or value) are examined, including:
- Laspeyres' Price Index: Utilizes base year quantities as weights ($P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$).
- Paasche's Price Index: Employs current year quantities as weights ($P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$).
- Fisher's Ideal Index: Often considered superior as it satisfies certain statistical tests (like Time Reversal and Factor Reversal tests), calculated as the geometric mean of Laspeyres' and Paasche's indices ($P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$).
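The three formulas above can be checked on a small data set. The following is a minimal sketch in Python, assuming hypothetical prices and quantities for three commodities; the function name `price_indices` and all data values are illustrative and not part of the chapter.

```python
from math import sqrt

def price_indices(p0, q0, p1, q1):
    """Return (Laspeyres, Paasche, Fisher) price indices with base = 100."""
    sum_p1q0 = sum(p * q for p, q in zip(p1, q0))
    sum_p0q0 = sum(p * q for p, q in zip(p0, q0))
    sum_p1q1 = sum(p * q for p, q in zip(p1, q1))
    sum_p0q1 = sum(p * q for p, q in zip(p0, q1))
    laspeyres = sum_p1q0 / sum_p0q0 * 100   # base-year quantities as weights
    paasche = sum_p1q1 / sum_p0q1 * 100     # current-year quantities as weights
    fisher = sqrt(laspeyres * paasche)      # geometric mean of the two
    return laspeyres, paasche, fisher

# Hypothetical data: base-year (0) and current-year (1) prices and quantities
p0, q0 = [10, 20, 5], [4, 3, 10]
p1, q1 = [12, 25, 6], [5, 2, 12]

laspeyres, paasche, fisher = price_indices(p0, q0, p1, q1)
print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```

Because Fisher's index is a geometric mean, it always lies between the Laspeyres and Paasche values.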
We also discuss the significance and interpretation of widely published indices like the Consumer Price Index (CPI) and Wholesale Price Index (WPI), understanding their critical role in measuring inflation, adjusting wages and contracts (often involving $\textsf{₹}$ values), and guiding economic policy decisions.
The second part of this chapter shifts focus to Time Series Analysis, which deals specifically with data points collected sequentially over regular intervals (e.g., monthly sales, annual GDP, daily stock prices). The primary objective is to analyze historical data to identify underlying patterns and components, ultimately enabling us to make informed forecasts about future values. A time series is typically considered to be composed of several distinct components:
- Secular Trend (T): The smooth, long-term direction or general tendency of the series (e.g., consistent growth or decline over years).
- Seasonal Variation (S): Patterns that repeat predictably over a fixed period, usually within a year (e.g., higher sales during holidays, weather-related fluctuations).
- Cyclical Variation (C): Longer-term fluctuations or oscillations around the secular trend, often associated with business cycles or economic expansions and contractions (periods typically longer than one year).
- Irregular / Random Variation (I): Unpredictable, erratic fluctuations due to random events like strikes, natural disasters, or unforeseen factors.
While understanding all components is important, a primary focus in introductory analysis is often on identifying and measuring the Secular Trend (T), as it represents the fundamental long-term movement. We explore common methods for achieving this: the Method of Moving Averages, which helps smooth out short-term fluctuations to reveal the underlying trend, and the Method of Least Squares, a statistical technique used to fit a mathematical trend line (most commonly a straight line, $Y_c = a + bX$) to the data, allowing for quantitative trend description and basic extrapolation for forecasting. Mastering these tools provides a powerful quantitative basis for economic analysis, business strategy, and future planning.
Time Series
In many fields of study and areas of application, data are collected sequentially over time. This type of data, where observations are recorded at successive points or periods in time, is known as a time series. The chronological order of the observations is a crucial characteristic of time series data, distinguishing it from other types of data where the order of collection doesn't matter. Time series analysis involves methods for analyzing such data to extract meaningful statistics and other characteristics.
A time series is formally defined as a sequence of observations $\{Y_t\}$ measured at specified time points $\{t_1, t_2, \dots, t_n\}$. The time points are typically equally spaced, such as hourly, daily, weekly, monthly, quarterly, or annually. The variable being measured, $Y$, is observed over time $t$, and its value at time $t$ is denoted by $Y_t$.
Examples of Time Series Data
Time series data is abundant in various disciplines. Here are some examples:
- Economic Data: Annual Gross Domestic Product (GDP) of a country, monthly Consumer Price Index (CPI), quarterly unemployment rates, daily stock market closing prices, yearly inflation rates.
- Business Data: Monthly sales revenue of a company, weekly footfall in a retail store, quarterly production volume, annual profit figures.
- Environmental Data: Hourly temperature readings, daily rainfall amounts, annual average sea level, monthly pollution levels in a city.
- Medical Data: Daily number of new reported cases of a disease, patient's heart rate recorded over time, annual birth rates.
- Social Data: Yearly population census data, monthly number of traffic accidents, number of website visits per hour.
In these examples, time is the independent variable (often denoted simply by an index like $t$), and the measured quantity (GDP, sales, temperature, etc.) is the dependent variable, whose value changes over time.
Importance and Objectives of Time Series Analysis
Analyzing time series data is vital for understanding past phenomena, monitoring current states, and forecasting future developments. The main objectives of time series analysis include:
- Description: Identifying and describing the salient features, patterns, and components present in the time series data, such as trends, seasonal variations, and cycles.
- Explanation: Understanding the underlying forces or factors that influence the behavior of the time series and explain the observed patterns.
- Forecasting (Prediction): Using the patterns identified from past data to predict future values of the time series. This is a primary application in areas like economic planning, inventory management, and financial forecasting.
- Control: Using the insights gained from time series analysis to monitor a process and make adjustments or interventions to keep the variable within desired bounds or achieve specific targets.
Effective time series analysis allows us to make informed decisions based on data that evolves over time.
Graphical Representation of Time Series
The most fundamental step in analyzing a time series is often to visualize it using a graph. A time series plot is a simple yet powerful way to display the data and visually inspect for patterns such as trends, seasonality, or irregular fluctuations.
A time series plot is typically a line graph where the horizontal axis represents time (with sequential time points or intervals) and the vertical axis represents the value of the variable being measured at each time point. The data points are plotted and usually connected by lines to show the progression over time.
[Figure: a simple time series plot showing hypothetical monthly sales over a few years]
Visual inspection of such a graph can reveal:
- Whether there is an upward or downward trend over the long term.
- Whether there are recurring patterns within specific periods (like months or quarters), indicating seasonality.
- Whether there are longer-term cycles that are not strictly seasonal.
- Whether there are unusual spikes or drops that seem random or due to specific events (irregular movements).
Components of Time Series
Observed time series data often exhibit various patterns and fluctuations when plotted. These movements are typically not due to a single factor but are the result of the combined influence of several distinct forces operating over different time horizons. Time series analysis often involves decomposing the observed series into these underlying components to understand their individual behaviors and impacts.
Traditionally, the movements in a time series are categorized into four principal components:
- Secular Trend (T)
- Seasonal Variation (S)
- Cyclical Variation (C)
- Irregular or Random Variation (I)
The observed value of the time series at any point in time ($Y_t$) is considered to be a composite effect of these four components at that specific time.
Explanation of Components
1. Secular Trend ($T_t$)
The Secular Trend, or simply Trend, represents the smooth, long-term general movement of the time series data over a considerable period. It indicates the underlying direction or tendency of the series to increase, decrease, or remain relatively constant over time, disregarding short-term ups and downs. Trends reflect the fundamental systematic forces that influence the variable in the long run.
Characteristics of Trend:
- It is a smooth, persistent, and unidirectional movement.
- It is observed over a long period (years, decades).
- It is typically caused by factors that evolve slowly but steadily, such as:
  - Population growth or decline.
  - Technological advancements.
  - Changes in consumer preferences.
  - Large-scale economic development or decline.
  - Changes in infrastructure (e.g., road networks, communication).
Examples: The increasing trend in India's population over the past century, the decreasing trend in the price of electronic goods over the years due to technological advancements, the increasing trend in global temperatures.
2. Seasonal Variation ($S_t$)
Seasonal Variation refers to patterns of change in a time series that repeat with regularity over a fixed and known period, typically within a year. These fluctuations are predictable in their timing and duration and are often influenced by calendar-related factors.
Characteristics of Seasonal Variation:
- It is a regular, periodic fluctuation.
- The period of repetition is constant and less than or equal to one year (e.g., quarterly, monthly, weekly, daily).
- It is caused by factors such as:
  - Climate and weather conditions (e.g., sales of umbrellas during monsoon, sale of winter clothing).
  - Customs and traditions (e.g., increased retail sales during festivals like Diwali, Eid, Christmas).
  - Holidays and school vacations (e.g., increased tourism during summer holidays).
  - Fixed calendar events (e.g., tax filing deadlines, academic year schedules).
Examples: Higher sales of ice cream during summer months, peak electricity consumption during specific seasons, increased passenger traffic on railways during festival periods, daily patterns in traffic flow.
3. Cyclical Variation ($C_t$)
Cyclical Variation (or Cycles) refers to oscillations or swings in the time series that occur over periods longer than a year, and are not strictly periodic like seasonal variations. These fluctuations are often associated with the alternating phases of expansion and contraction in overall economic activity, commonly known as business cycles.
Characteristics of Cyclical Variation:
- It is a wave-like oscillation around the trend.
- The period of oscillation is generally longer than one year (typically 2 to 10 years or more).
- The variations are not as regular in terms of timing or magnitude as seasonal variations.
- It is often caused by macro-economic factors, such as:
  - Changes in the business cycle (prosperity, recession, depression, recovery).
  - Major technological innovations.
  - Changes in government policies affecting the economy.
Examples: Fluctuations in employment rates over several years reflecting economic cycles, cycles in the stock market over periods longer than a year, real estate market cycles.
4. Irregular or Random Variation ($I_t$)
Irregular or Random Variation (also called Erratic or Chance Variation) represents the unsystematic, unpredictable fluctuations in the time series that are not accounted for by trend, seasonality, or cyclical components. These variations are residual effects caused by random or unforeseen events.
Characteristics of Irregular Variation:
- It is unpredictable, erratic, and does not follow any discernible pattern.
- It is typically short-lived.
- It is caused by sudden, unexpected events such as:
  - Natural disasters (earthquakes, floods, droughts).
  - Wars or political upheavals.
  - Strikes or lockouts.
  - Sudden changes in government policy.
  - Unusual weather events.
Examples: A sudden drop in tourism revenue due to a terrorist attack, a sharp increase in the price of a commodity due to a major supply disruption, a spike in demand caused by an unexpected event.
In some analyses, seasonal and irregular movements are difficult to separate, so all variation other than trend and cycle is grouped under "short-term fluctuations." In others, the cyclical and irregular components are combined, even though they operate on different time scales.
Models for Time Series Decomposition
The relationship between the observed time series value ($Y_t$) and its four components ($T_t, S_t, C_t, I_t$) can be represented by different mathematical models. The two most common decomposition models are the additive model and the multiplicative model. The choice of model depends on the nature of the data and how the components are believed to interact.
Additive Model
In the Additive Model, it is assumed that the observed value of the time series is the sum of its four components. This model is appropriate when the magnitude of the seasonal, cyclical, and irregular variations remains relatively constant over time, regardless of the level of the trend.
$\mathbf{Y_t = T_t + S_t + C_t + I_t}$
... (1)
Here, $Y_t$ is the observed value, $T_t$ is the trend component, $S_t$ is the seasonal component, $C_t$ is the cyclical component, and $I_t$ is the irregular component at time $t$. The units of $S_t, C_t,$ and $I_t$ are the same as the units of $Y_t$ and $T_t$.
Multiplicative Model
In the Multiplicative Model, it is assumed that the observed value of the time series is the product of its four components. This model is more common when the magnitude of the seasonal, cyclical, and irregular variations is proportional to the level of the trend. This means that the fluctuations tend to become larger as the overall level of the series increases.
$\mathbf{Y_t = T_t \times S_t \times C_t \times I_t}$
... (2)
In this model, $T_t$ is typically in the units of $Y_t$, while $S_t, C_t,$ and $I_t$ are expressed as ratios around 1 or, equivalently, percentages around 100 (e.g., a seasonal index of 1.20, or 120%, means the value is 20% above the trend/cyclical average for that season).
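To see the practical difference between the two models, the sketch below builds one series of each kind from the same trend. It is a simplified illustration: the cyclical and irregular components are left out, and the trend values, the seasonal effects (±15 units), and the seasonal indices (1.15 and 0.85) are all hypothetical.

```python
# Hypothetical rising trend over 8 periods
trend = [100 + 10 * t for t in range(8)]

# Additive seasonal effect, in the same units as Y (alternating +15 / -15)
seasonal_add = [15, -15] * 4
# Multiplicative seasonal index, a ratio around 1 (alternating 1.15 / 0.85)
seasonal_mul = [1.15, 0.85] * 4

additive = [T + S for T, S in zip(trend, seasonal_add)]
multiplicative = [T * S for T, S in zip(trend, seasonal_mul)]

# Size of the seasonal swing away from the trend in each period
swing_add = [abs(y - T) for y, T in zip(additive, trend)]
swing_mul = [abs(y - T) for y, T in zip(multiplicative, trend)]

print(swing_add)                          # constant swing
print([round(s, 1) for s in swing_mul])   # swing grows with the trend level
```

Under the additive model the swing stays the same however high the trend rises; under the multiplicative model it stays at 15% of the trend and therefore widens as the series grows.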
The multiplicative model can be transformed into an additive model by taking the logarithm of both sides:
$\log(Y_t) = \log(T_t \times S_t \times C_t \times I_t)$
$\log(Y_t) = \log(T_t) + \log(S_t) + \log(C_t) + \log(I_t)$
[Using $\log(abcd) = \log a + \log b + \log c + \log d$]
This logarithmic transformation can be useful because many statistical techniques are designed for additive models.
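A quick numerical check of this identity, using hypothetical component values (a trend of 200 units and three ratios close to 1), might look like this:

```python
import math

# Hypothetical component values for one time period
T, S, C, I = 200.0, 1.20, 0.95, 1.01

Y = T * S * C * I                                  # multiplicative model
log_sum = math.log(T) + math.log(S) + math.log(C) + math.log(I)

print(round(Y, 2))
print(math.isclose(math.log(Y), log_sum))          # log Y equals the sum of the logs
```

All components must be positive for the logarithms to exist, which holds here because the non-trend components are ratios around 1.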
In some basic time series analysis contexts, the cyclical and irregular components are combined into a residual component ($R_t$). The models then become:
- Additive Model: $Y_t = T_t + S_t + R_t$
- Multiplicative Model: $Y_t = T_t \times S_t \times R_t$
where $R_t$ contains the combined effects of cyclical and irregular variations. The specific methods used for time series analysis often depend on which decomposition model is assumed.
Time Series Analysis for Univariate Data
Building upon the concept of a time series, we now turn our attention to analysing such data. Univariate time series data refers to a time series dataset that consists of observations of a single variable recorded over time. The analysis of univariate time series focuses solely on the patterns and characteristics within this single sequence of data points, without explicitly considering the influence of other external variables. This is in contrast to multivariate time series analysis, where multiple variables are recorded over time and their relationships are analyzed.
Analyzing univariate time series data is a fundamental aspect of forecasting and understanding dynamic phenomena. It involves identifying the structure and patterns within the data itself to model its behaviour and predict its future values.
Goals of Univariate Time Series Analysis
The primary objectives of analyzing a univariate time series are typically geared towards understanding the underlying process that generated the data and using this understanding to make informed decisions or predictions. The main goals include:
- Description: To identify, isolate, and quantify the different components of the time series (Trend, Seasonality, Cyclical, and Irregular). This helps in understanding the historical behaviour and characteristics of the variable.
- Explanation: To build a model that explains how the values of the time series are generated. While univariate analysis focuses on the series itself, the explanation often involves understanding the types of underlying forces (economic, seasonal, random) represented by the identified components.
- Forecasting (Prediction): To predict future values of the time series based on the patterns and relationships observed in the past data. This is a critical application in business planning, economic forecasting, and resource management.
- Control: To use the insights gained from the analysis to monitor a process over time and detect deviations from expected behavior, allowing for timely interventions or adjustments.
Process of Univariate Time Series Analysis
Analyzing a univariate time series typically involves a systematic process. While different methodologies exist (like classical decomposition, ARIMA models, etc.), a general process often includes the following steps:
- Data Collection and Preliminary Analysis:
  - Gather the time series data for the variable of interest, ensuring the data is recorded at regular time intervals.
  - Plotting the Data: Create a time series plot (line graph of the variable vs. time). This initial visualization is crucial for identifying prominent patterns like overall trend, repeating seasonal fluctuations, and unusual observations (outliers).
  - Inspect for missing values, data errors, or structural breaks in the series.
- Model Specification/Identification:
  - Based on the visual inspection and possibly other statistical tools (like autocorrelation plots), determine the presence and nature of the time series components.
  - Decide on an appropriate model for decomposition (additive or multiplicative), considering whether the magnitude of fluctuations changes with the level of the series.
  - Select a statistical model or method suitable for the identified patterns (e.g., simple moving averages, weighted moving averages, method of least squares for trend; methods for seasonal index calculation).
- Model Fitting/Estimation:
  - Apply the chosen statistical method to the data to estimate the parameters of the model.
  - Decomposition: This step involves separating the observed time series into its estimated components ($T_t, S_t, C_t, I_t$ or $T_t, S_t, R_t$) based on the chosen model (additive or multiplicative). Methods like calculating moving averages to estimate the trend, or calculating seasonal indices, fall under this step.
- Model Evaluation/Diagnostic Checking:
  - Assess how well the fitted model describes the historical data.
  - Examine the residual component (what's left after removing trend, seasonality, and cycle) to see if it behaves like random noise (as expected for the irregular component). Statistical tests for randomness can be used here.
  - Check if the assumptions underlying the chosen method are reasonably met.
  - If the model is not satisfactory, refine the model specification or try alternative methods, and repeat the fitting and evaluation steps.
- Forecasting (Prediction):
  - Once a satisfactory model is obtained, use it to generate forecasts for future time periods.
  - Forecasting typically involves projecting the trend and seasonal components into the future and potentially accounting for cyclical patterns. The irregular component is usually unpredictable and assumed to average out to zero in the long run.
  - Provide forecast intervals (prediction intervals) along with point forecasts to indicate the uncertainty associated with the predictions.
At the level of Class 12 Applied Maths, the focus is primarily on the initial steps of univariate time series analysis, particularly the description of components and the fundamental methods for measuring the secular trend. Understanding these concepts is essential before delving into more complex forecasting models.
Secular Trend
As discussed in the previous section, a time series can be broken down into several components that represent different types of movements. Among these, the Secular Trend, often simply referred to as the Trend, is the most fundamental component representing the long-term direction of the time series.
The Secular Trend describes the underlying smooth, persistent movement of a time series over a considerable period of time. It reflects the general tendency of the variable to increase, decrease, or remain stable over the long run, effectively smoothing out the temporary fluctuations caused by seasonal, cyclical, and irregular factors. It captures the evolutionary pattern of the series driven by fundamental, long-acting forces.
Characteristics of Secular Trend
The key characteristics that define a secular trend are:
- Long-Term Nature: A secular trend reflects changes that occur over an extended period, typically spanning several years or even decades. Short-term fluctuations (like monthly ups and downs within a year) are considered deviations from this long-term movement.
- Smoothness: When plotted, the trend is usually represented by a smooth line or curve that captures the overall direction of the data while averaging out or bypassing the short-term variability. It is a gradual, evolutionary change.
- Reflection of Fundamental Forces: The secular trend is caused by deep-rooted factors that affect the underlying structure of the phenomenon being observed. These are influences that change gradually over time. Examples include:
  - Population changes (growth, migration).
  - Technological advancements and innovations.
  - Changes in consumer tastes, preferences, or lifestyles.
  - Economic development, industrial growth, or decline.
  - Changes in infrastructure, education levels, or healthcare.
  - Inflation or deflation over long periods.
The secular trend is a crucial component because it often represents the most predictable part of the time series and is typically the primary basis for long-term forecasting.
Types of Trends
While the secular trend is a smooth, long-term movement, it can exhibit different forms depending on how the variable changes over time. The main types of trends are classified based on the shape of the trend line or curve:
1. Linear Trend
A Linear Trend occurs when the time series values tend to increase or decrease by a roughly constant amount in each unit of time. When plotted, the trend appears to follow a straight line.
Mathematically, a linear trend can be represented by a linear equation relating the trend value $T_t$ to time $t$:
$\mathbf{T_t = a + bt}$
... (1)
where:
- $T_t$ is the trend value at time $t$.
- $t$ is the time period index (e.g., $t=1, 2, 3, \dots$).
- $a$ is the intercept, representing the trend value at time $t=0$ (often the base period).
- $b$ is the slope, representing the constant amount of increase (if $b > 0$) or decrease (if $b < 0$) in the trend value per unit of time.
Example: If annual sales increase by approximately ₹50 Lakh each year, the series follows a linear trend with slope $b = 50$ (in ₹ Lakh per year).
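One way to estimate $a$ and $b$ from data is the Method of Least Squares mentioned in the chapter introduction. The sketch below assumes the time index $t = 1, 2, \dots, n$ and uses hypothetical annual sales figures (in ₹ Lakh); the function name `fit_linear_trend` is illustrative.

```python
def fit_linear_trend(y):
    """Least-squares estimates (a, b) for T_t = a + b*t with t = 1, 2, ..., n."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared time deviations
    b = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
         / sum((ti - t_mean) ** 2 for ti in t))
    a = y_mean - b * t_mean      # the fitted line passes through the point of means
    return a, b

# Hypothetical annual sales in ₹ Lakh for five consecutive years
sales = [210, 255, 310, 355, 410]

a, b = fit_linear_trend(sales)
forecast_t6 = a + b * 6          # extrapolated trend value for the next year
print(a, b, forecast_t6)
```

For these figures the fitted slope works out to 50, matching the "about ₹50 Lakh each year" pattern, and the line can be extended one period ahead as a basic forecast.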
2. Non-linear Trend
A Non-linear Trend occurs when the time series values do not increase or decrease by a constant amount each period. The rate of change itself might be increasing or decreasing over time. When plotted, the trend appears as a curve rather than a straight line. Non-linear trends can take various functional forms:
- Polynomial Trend: The trend can be represented by a polynomial function of time. The most common is a quadratic trend (polynomial of degree 2), which represents a trend that is accelerating or decelerating.
$\mathbf{T_t = a + bt + ct^2}$
... (2)
- Exponential Trend: An exponential trend is appropriate when the series increases or decreases by a relatively constant percentage each period, rather than a constant amount. This is common in phenomena exhibiting percentage growth rates (like population growth or inflation).
$\mathbf{T_t = ab^t}$
... (3)
Taking logarithms of both sides converts this into a form that is linear in $t$: $\log T_t = \log a + t \log b$
... (4)
- Other non-linear forms can include Gompertz curves or logistic curves, which model growth that approaches an upper limit (saturation level), often used in areas like biological growth or product life cycles.
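Because the logarithmic form of the exponential trend is linear in $t$, the same least-squares arithmetic used for a straight line can be applied to $\log Y_t$. The following is a minimal sketch under that assumption, with $t = 1, 2, \dots, n$ and a hypothetical series growing by exactly 10% per period; note that this fits the line in log units, which is not the same as minimising squared errors in the original units.

```python
import math

def fit_exponential_trend(y):
    """Estimate (a, b) in T_t = a * b**t by fitting log T_t = log a + t*log b."""
    n = len(y)
    t = list(range(1, n + 1))
    log_y = [math.log(v) for v in y]          # requires all values to be positive
    t_mean = sum(t) / n
    ly_mean = sum(log_y) / n
    slope = (sum((ti - t_mean) * (li - ly_mean) for ti, li in zip(t, log_y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = ly_mean - slope * t_mean
    return math.exp(intercept), math.exp(slope)   # back-transform to a and b

# Hypothetical series: starts from a = 100 and grows 10% each period
series = [100 * 1.1 ** t for t in range(1, 6)]

a, b = fit_exponential_trend(series)
print(round(a, 4), round(b, 4))
```

Here $b$ is the growth factor per period, so $b - 1$ gives the constant percentage growth rate.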
Identifying the appropriate type of trend and measuring it accurately is a crucial step in time series analysis. The methods used to measure the trend help in isolating this long-term pattern, which can then be used for forecasting or for further analyzing the other components of the time series.
Methods of Measuring Trend
Identifying and quantifying the secular trend is a crucial step in time series analysis. Once the trend is measured, it can be used for forecasting or removed from the original series to analyse other components like seasonality and cyclical variations. Various methods exist for measuring trend, each with its own advantages and disadvantages. These methods range from simple visual approaches to more analytical and mathematical techniques.
1. Method of Freehand Curve (Graphical Method)
This is the simplest and most subjective method for estimating the trend. It involves plotting the original time series data on a graph. After plotting all the data points, a smooth curve or straight line is drawn through the points, visually attempting to follow the general direction of the data and average out the short-term fluctuations (seasonal, cyclical, and irregular movements). The drawn line represents the estimated secular trend.
Procedure:
- Plot the time series data on a graph with time on the x-axis and the variable value on the y-axis.
- Observe the general direction of the plotted points over the entire period.
- Draw a smooth curve or line that passes through the center of the plotted points, aiming to balance the points above and below the line.
Pros:
- It is very easy to understand and quick to implement.
- It can be used to represent both linear and non-linear trends visually.
- It is flexible and doesn't require prior assumptions about the mathematical form of the trend.
Cons:
- It is highly subjective; different individuals may draw different trend lines for the same data.
- It does not provide a mathematical equation for the trend, making it difficult to calculate precise trend values for specific time periods or to extrapolate for forecasting.
- Its accuracy depends heavily on the judgment and experience of the person drawing the curve.
2. Method of Semi-Averages
This method provides a more objective way to determine a linear trend. It involves dividing the time series data into two equal halves and calculating the arithmetic mean (semi-average) for the data values in each half. A straight line connecting these two semi-averages, plotted at the midpoint of their respective time periods, is considered the trend line.
Procedure:
- Divide the entire time series data into two equal parts chronologically.
  - If the number of periods ($n$) is even, divide the data exactly into two halves of $n/2$ periods each.
  - If the number of periods ($n$) is odd, the middle period's data point is usually omitted, and the remaining $n-1$ periods are divided into two equal halves of $(n-1)/2$ periods each.
- Calculate the arithmetic mean (semi-average) of the data values for the first half ($\bar{Y}_1$) and the second half ($\bar{Y}_2$).
- Determine the time point corresponding to the midpoint of the first half's period and the time point corresponding to the midpoint of the second half's period.
- Plot the point corresponding to the first semi-average at the midpoint of its period $(\text{Midpoint}_1, \bar{Y}_1)$ and the point corresponding to the second semi-average at the midpoint of its period $(\text{Midpoint}_2, \bar{Y}_2)$.
- Draw a straight line connecting these two plotted points. This line is the estimated linear trend line.
- Optionally, find the equation of this line to represent the trend mathematically, typically in the form $T_t = a + bt$, where $t$ is the time index.
Example illustrating midpoints:
If years are 2015, 2016, 2017, 2018, 2019, 2020 ($n=6$, even).
Half 1: 2015, 2016, 2017. Midpoint time: $\frac{2015+2016+2017}{3} = 2016$, which is simply the middle year of the half.
Half 2: 2018, 2019, 2020. Midpoint time: $\frac{2018+2019+2020}{3} = 2019$.
Plot $(\text{2016}, \bar{Y}_1)$ and $(\text{2019}, \bar{Y}_2)$.
If years are 2015, 2016, 2017, 2018, 2019, 2020, 2021 ($n=7$, odd), the middle year 2018 is omitted.
Half 1: 2015, 2016, 2017. Midpoint time: 2016.
Half 2: 2019, 2020, 2021. Midpoint time: 2020.
Plot $(\text{2016}, \bar{Y}_1)$ and $(\text{2020}, \bar{Y}_2)$.
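The procedure can be sketched in a few lines of Python. The function name `semi_average_trend` and the data values are hypothetical; the years match the odd-length example above, so the two midpoints should come out as 2016 and 2020.

```python
def semi_average_trend(years, values):
    """Return the two semi-average points and (a, b) of the line T = a + b*year."""
    n = len(values)
    half = n // 2                       # for odd n, the middle observation is omitted
    first_t, first_y = years[:half], values[:half]
    second_t, second_y = years[n - half:], values[n - half:]
    t1, y1 = sum(first_t) / half, sum(first_y) / half      # midpoint and mean, half 1
    t2, y2 = sum(second_t) / half, sum(second_y) / half    # midpoint and mean, half 2
    b = (y2 - y1) / (t2 - t1)           # slope of the line through the two points
    a = y1 - b * t1                     # intercept (trend value at year 0)
    return (t1, y1), (t2, y2), a, b

years = list(range(2015, 2022))         # 2015 ... 2021, n = 7 (odd)
values = [20, 22, 24, 30, 32, 34, 36]   # hypothetical observations

point1, point2, a, b = semi_average_trend(years, values)
trend_2018 = a + b * 2018               # trend value for the omitted middle year
print(point1, point2, b, trend_2018)
```

Since the calendar year is used directly as the time variable, the intercept $a$ is a large negative number with no practical meaning; in hand calculations it is more common to shift the origin to one of the midpoints.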
Pros:
- It is simpler and more objective than the freehand curve method.
- It provides a linear equation that can be used for basic forecasting (extrapolation).
Cons:
- It assumes that the underlying trend is strictly linear, which is often not true for real-world data.
- It is sensitive to extreme values in the first or second half of the data.
- It does not use all the information in the data as effectively as methods like Least Squares.
3. Method of Moving Averages
The Method of Moving Averages is a technique used to smooth out short-term fluctuations (seasonal, cyclical, and irregular) in a time series to reveal the underlying trend. It does this by calculating a series of averages of different subsets of the full data set. A moving average is the average of data points within a fixed-size window that moves forward one period at a time.
Procedure:
- Choose the Order ($m$) of the Moving Average: This is the number of periods to include in each average calculation. The order is typically chosen to be equal to the period of the seasonal variation you want to remove (e.g., 4 for quarterly data, 12 for monthly data, 7 for daily data with weekly seasonality) or the approximate length of a cycle.
- Calculate the Moving Totals: Sum the data values for the first $m$ periods ($Y_1, Y_2, \dots, Y_m$). Then, shift the window one period forward and sum the values for the next $m$ periods ($Y_2, Y_3, \dots, Y_{m+1}$), and so on, until the last possible group of $m$ periods ($Y_{n-m+1}, \dots, Y_n$).
- Calculate the Moving Averages: Divide each moving total by the order $m$ to get the moving average.
- Centering the Moving Averages (if $m$ is even): If the order $m$ is an odd number, the calculated moving average corresponds to the time period at the center of the $m$-period window. If $m$ is an even number, the moving average falls between two periods. To center it to an actual time period, you calculate a 2-period moving average of the moving averages (this is called a centered moving average).
Example (3-Year Moving Average, $m=3$ is odd):
Year ($t$) | Value ($Y_t$) | 3-Year Moving Total | 3-Year Moving Average (Trend Value at $t$) |
---|---|---|---|
2018 | $Y_{2018}$ | - | - |
2019 | $Y_{2019}$ | $Y_{2018}+Y_{2019}+Y_{2020}$ | $(Y_{2018}+Y_{2019}+Y_{2020})/3$ (Placed at Year 2019) |
2020 | $Y_{2020}$ | $Y_{2019}+Y_{2020}+Y_{2021}$ | $(Y_{2019}+Y_{2020}+Y_{2021})/3$ (Placed at Year 2020) |
2021 | $Y_{2021}$ | ... | ... |
... | ... | ... | ... |
Example (4-Quarter Moving Average, $m=4$ is even - Requires Centering):
Year & Quarter ($t$) | Value ($Y_t$) | 4-Quarter Moving Total | 4-Quarter Moving Average (at midpoint) | 2-term Centered Moving Total | 4-Quarter Centered Moving Average (Trend Value at $t$) |
---|---|---|---|---|---|
2020 Q1 | $Y_1$ | - | - | - | - |
2020 Q2 | $Y_2$ | - | - | - | - |
2020 Q3 | $Y_3$ | $Y_1+Y_2+Y_3+Y_4$ | $MA_1 = (Y_1+...+Y_4)/4$ (Placed between Q2 & Q3 2020) | $MA_1 + MA_2$ | $(MA_1 + MA_2)/2$ (Placed at 2020 Q3) |
2020 Q4 | $Y_4$ | $Y_2+Y_3+Y_4+Y_5$ | $MA_2 = (Y_2+...+Y_5)/4$ (Placed between Q3 & Q4 2020) | $MA_2 + MA_3$ | $(MA_2 + MA_3)/2$ (Placed at 2020 Q4) |
2021 Q1 | $Y_5$ | $Y_3+Y_4+Y_5+Y_6$ | $MA_3 = (Y_3+...+Y_6)/4$ (Placed between Q4 2020 & Q1 2021) | ... | ... |
... | ... | ... | ... | ... | ... |
The resulting moving averages represent the estimated trend values, smoothed by the averaging process.
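The procedure, including the centering step for an even order, might be sketched as follows. The function names and the sample series are assumptions chosen for illustration; periods for which no trend value can be computed are marked with `None`.

```python
def moving_averages(values, m):
    """Simple m-period moving averages; element i averages values[i:i+m]."""
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]


def trend_by_moving_average(values, m):
    """Return trend values aligned with the input series (None where undefined)."""
    ma = moving_averages(values, m)
    if m % 2 == 0:
        # Even order: average adjacent moving averages so each lands on a period.
        ma = [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]
    lost = m // 2  # periods without a trend value at each end of the series
    trend = [None] * len(values)
    for i, value in enumerate(ma):
        trend[i + lost] = value
    return trend


# Hypothetical series of six observations.
series = [2, 4, 6, 8, 10, 12]
print(trend_by_moving_average(series, 3))  # odd order, no centering needed
print(trend_by_moving_average(series, 4))  # even order, centered
```

For $m=3$ one period is lost at each end, and for $m=4$ two periods are lost at each end, which is visible as the `None` entries in the printed lists.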
Pros:
- Relatively simple concept and calculation (though centering for even order requires an extra step).
- Effective at smoothing out variations and revealing the underlying trend.
- By choosing the order of the moving average equal to the period of seasonality (e.g., 12 for monthly data), it effectively removes the seasonal component.
Cons:
- Trend values cannot be calculated for the first $(m-1)/2$ periods and the last $(m-1)/2$ periods of the series when $m$ is odd. For even $m$, centering leaves the first $m/2$ and the last $m/2$ periods without trend values.
- It does not provide a mathematical equation for the trend, making it less suitable for precise long-term forecasting (extrapolation).
- The choice of the order $m$ can be subjective if the periodic variations are not clearly defined.
4. Method of Least Squares (Fitting a Mathematical Curve)
The Method of Least Squares is a mathematical and objective technique for finding the best-fitting trend line or curve to a time series dataset. It works by minimizing the sum of the squared vertical distances between the observed data points and the points on the fitted curve. This method provides a mathematical equation for the trend, which is highly useful for forecasting and further analysis.
Fitting a Straight Line Trend
This method assumes that the secular trend can be represented by a straight line. The equation for the trend line is:
$\mathbf{Y_t = a + bX_t}$
... (1)
where:
- $Y_t$ is the estimated trend value at time $t$.
- $X_t$ is the time variable (e.g., year, month, or a coded time index).
- $a$ is the intercept (the estimated trend value when $X_t = 0$).
- $b$ is the slope (the estimated average change in the trend value per unit increase in $X_t$).
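The constants $a$ and $b$ are chosen to minimize the sum of squared errors $\sum (Y - a - bX)^2$, where $Y$ denotes the observed values. This leads to two normal equations:
$\mathbf{\sum Y = na + b \sum X}$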
... (2)
$\mathbf{\sum XY = a \sum X + b \sum X^2}$
... (3)
where $n$ is the number of data points (periods). To find the trend line, you calculate the sums $\sum Y$, $\sum X$, $\sum XY$, $\sum X^2$ from the data, substitute them into these two equations, and solve for $a$ and $b$ simultaneously.
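Solving equations (2) and (3) simultaneously can be sketched in Python as below. The function name `fit_line` and the sample data are assumptions for illustration; the closed-form expressions come from eliminating $a$ between the two normal equations.

```python
def fit_line(xs, ys):
    """Solve the normal equations (2) and (3) for the intercept a and slope b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    # Eliminating a between (2) and (3) gives b; then (2) gives a.
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Here $X$ is left uncoded, so $\sum X \neq 0$ and both equations are needed to recover $a$ and $b$.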
Simplifying Calculations by Shifting the Origin (Coding Time)
Solving the normal equations can be simplified significantly by coding the time variable $X_t$ such that the sum of the coded time values is zero ($\sum X = 0$). This is achieved by choosing the origin of the coded time variable appropriately.
Case 1: Number of periods ($n$) is odd. Let the middle period be the origin (coded as 0). The periods before the middle are coded as $-1, -2, -3, \dots$ and the periods after are coded as $1, 2, 3, \dots$. For example, if the years are 2018, 2019, 2020, 2021, 2022 ($n=5$), the middle year is 2020. The coded time $X_t$ for these years would be -2, -1, 0, 1, 2 respectively. The sum $\sum X = -2 + (-1) + 0 + 1 + 2 = 0$. When $\sum X = 0$, the normal equations simplify:
$\sum Y = na + b(0) \implies \mathbf{a = \frac{\sum Y}{n} = \bar{Y}}$
[From Eq (2) with $\sum X=0$] ... (4)
$\sum XY = a(0) + b \sum X^2 \implies \mathbf{b = \frac{\sum XY}{\sum X^2}}$
[From Eq (3) with $\sum X=0$] ... (5)
Calculating $a$ and $b$ becomes much simpler.
Case 2: Number of periods ($n$) is even. The middle falls between two periods. The origin is taken as the midpoint between the two middle periods. To get integer coded values that sum to zero, the time units are typically defined as half of the interval between periods. For example, if years are 2018, 2019, 2020, 2021 ($n=4$), the midpoint is between 2019 and 2020. We code 2019 as -1 and 2020 as +1 (representing -0.5 and +0.5 years from the midpoint, scaled by 2). The coded time $X_t$ for the years 2018, 2019, 2020, 2021 would be -3, -1, 1, 3. The sum $\sum X = -3 + (-1) + 1 + 3 = 0$. The unit of $X_t$ in this case is half a year. With $\sum X = 0$, the simplified normal equations (4) and (5) are still valid for calculating $a$ and $b$. Remember that $b$ will represent the change per unit of $X_t$ (e.g., per half-year), which might need to be converted back to change per year if required.
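Both coding cases can be generated mechanically. The following sketch assumes equally spaced periods; the names `coded_time` and `fit_coded_line` are hypothetical helpers introduced only for this illustration.

```python
def coded_time(n):
    """Return integer time codes that sum to zero for n equally spaced periods."""
    if n % 2 == 1:
        mid = n // 2
        return [i - mid for i in range(n)]      # Case 1: unit = one period
    return [2 * i - (n - 1) for i in range(n)]  # Case 2: unit = half a period


def fit_coded_line(ys):
    """Apply the simplified equations (4) and (5) using coded time."""
    xs = coded_time(len(ys))
    a = sum(ys) / len(ys)
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return a, b, xs


print(coded_time(5))  # codes for an odd number of periods
print(coded_time(4))  # codes for an even number of periods
```

For an even number of periods the slope returned by `fit_coded_line` is per half-period, so it would need to be doubled to express the change per full period.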
Fitting a Quadratic Trend
If the trend appears non-linear, a quadratic trend curve can be fitted using the method of least squares. The equation for a quadratic trend is:
$\mathbf{Y_t = a + bX_t + cX_t^2}$
... (6)
where $a, b, c$ are constants to be determined. Minimizing the sum of squared errors $\sum (Y_t^{observed} - (a + bX_t + cX_t^2))^2$ leads to a system of three normal equations:
$\sum Y = na + b \sum X + c \sum X^2$
... (7)
$\sum XY = a \sum X + b \sum X^2 + c \sum X^3$
... (8)
$\sum X^2Y = a \sum X^2 + b \sum X^3 + c \sum X^4$
... (9)
Calculations for a quadratic trend are also simplified by coding the time variable $X_t$ such that $\sum X = 0$ (using the same method as for the linear trend, Cases 1 and 2). If the time periods are symmetrically coded around the origin (which they are with the standard odd/even period coding), then $\sum X^3 = 0$. The normal equations then become:
$\sum Y = na + c \sum X^2$
[From Eq (7) with $\sum X=0$]
$\sum XY = b \sum X^2$
[From Eq (8) with $\sum X=0, \sum X^3=0$]
$\sum X^2Y = a \sum X^2 + c \sum X^4$
[From Eq (9) with $\sum X^3=0$]
From the second simplified equation, $b = \frac{\sum XY}{\sum X^2}$. The first and third simplified equations form a system of two linear equations in $a$ and $c$, which can be solved simultaneously to find their values.
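The three simplified equations can be solved directly once the time values are symmetrically coded. This is a minimal sketch under that assumption; the function name `fit_quadratic_coded` and the sample data are illustrative only.

```python
def fit_quadratic_coded(xs, ys):
    """Fit Y = a + bX + cX^2 using the simplified normal equations.

    Assumes xs is symmetrically coded so that sum(X) = 0 and sum(X^3) = 0.
    """
    if sum(xs) != 0 or sum(x ** 3 for x in xs) != 0:
        raise ValueError("xs must be coded so that sum(X) = sum(X^3) = 0")
    n = len(xs)
    s2 = sum(x ** 2 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    denom = n * s4 - s2 ** 2
    if denom == 0:
        raise ValueError("too few distinct X values for a quadratic fit")
    b = sxy / s2                     # second simplified equation
    c = (n * sx2y - s2 * sy) / denom  # from the first and third equations
    a = (sy - c * s2) / n            # back-substitute into the first equation
    return a, b, c


# Hypothetical data generated from Y = 1 + 2X + 3X^2 with X = -2, ..., 2.
print(fit_quadratic_coded([-2, -1, 0, 1, 2], [9, 2, 1, 6, 17]))
```

The expressions for $a$ and $c$ are simply the solution of the two-equation system in $a$ and $c$ by elimination.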
The method of least squares can be extended to fit higher-degree polynomials or other functional forms if the trend requires it.
Pros:
- It is an objective method; the trend equation is uniquely determined by the data and the chosen model type.
- It provides a mathematical equation for the trend, enabling precise calculation of trend values for any time period (including future periods for forecasting).
- It utilizes all data points in the series to determine the trend.
- Can fit linear or various non-linear trend shapes by choosing the appropriate function.
Cons:
- It is more complex computationally than the simpler methods, especially for non-linear trends.
- It assumes that the chosen mathematical form (e.g., linear, quadratic) is appropriate for the underlying trend, which might not always be the case.
- The calculated trend line can be sensitive to extreme values (outliers) in the data.
Summary of Methods
The choice of method for measuring trend depends on the nature of the data, the desired level of accuracy, and the purpose of the analysis. The freehand method is for quick visual assessment. Semi-averages is a step towards objectivity for linear trends. Moving averages are effective for smoothing and isolating trend by removing seasonality. The method of least squares is the most analytical and provides a mathematical model for forecasting.
Examples (Method of Least Squares)
Example 1. Fit a straight line trend to the following data using the method of least squares and estimate the trend value for the year 2024.
Year | Sales (in $\textsf{₹}$ Lakhs) |
---|---|
2019 | 12 |
2020 | 15 |
2021 | 17 |
2022 | 20 |
2023 | 21 |
Answer:
Given:
Time series data for Sales for the years 2019 to 2023. Number of years, $n=5$.
To Find:
1. Fit a straight line trend using the method of least squares. 2. Estimate the trend value for the year 2024.
Solution:
We will fit a linear trend equation of the form $Y_t = a + bX_t$, where $Y_t$ is the trend value of sales at time $t$, and $X_t$ is the coded time variable. The number of years is $n=5$ (odd). We choose the middle year, 2021, as the origin for the coded time variable $X$. The coded values for the years will be:
- 2019: $2019 - 2021 = -2$
- 2020: $2020 - 2021 = -1$
- 2021: $2021 - 2021 = 0$
- 2022: $2022 - 2021 = 1$
- 2023: $2023 - 2021 = 2$
We need to calculate $\sum Y$, $\sum X$, $\sum XY$, and $\sum X^2$.
Year | Sales ($Y$) | Coded Time ($X$) | $XY$ | $X^2$ |
---|---|---|---|---|
2019 | 12 | -2 | -24 | 4 |
2020 | 15 | -1 | -15 | 1 |
2021 | 17 | 0 | 0 | 0 |
2022 | 20 | 1 | 20 | 1 |
2023 | 21 | 2 | 42 | 4 |
Total | $\sum Y = 85$ | $\sum X = 0$ | $\sum XY = 23$ | $\sum X^2 = 10$ |
Since $\sum X = 0$, the normal equations simplify to:
$\sum Y = na$
[Simplified Eq (4)]
$\sum XY = b \sum X^2$
[Simplified Eq (5)]
From the first equation:
$85 = 5a$
[Substitute $\sum Y=85, n=5$]
$a = \frac{85}{5} = 17$
... (1)
From the second equation:
$23 = b \times 10$
[Substitute $\sum XY=23, \sum X^2=10$]
$b = \frac{23}{10} = 2.3$
... (2)
The fitted straight line trend equation is $Y_t = a + bX_t$. Substitute the values of $a$ and $b$:
$\mathbf{Y_t = 17 + 2.3X_t}$
[Trend equation with origin at 2021, unit = 1 year]
... (3)
To estimate the trend value for the year 2024, we need to find the coded value of $X$ for 2024. Origin year = 2021.
$X_{2024} = 2024 - 2021 = 3$
Substitute $X_t = 3$ into the trend equation (3):
$Y_{2024} = 17 + 2.3 \times 3$
$Y_{2024} = 17 + 6.9 = 23.9$
... (4)
The estimated trend value of sales for the year 2024 is $\textsf{₹}$ 23.9 Lakhs.
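As a quick check (an addition to the worked solution, not part of it), substituting each coded $X$ into $Y_t = 17 + 2.3X_t$ gives the trend values for the observed years. For a least-squares line, these trend values should add up to $\sum Y$:
$Y_{2019} = 17 + 2.3(-2) = 12.4$
$Y_{2020} = 17 + 2.3(-1) = 14.7$
$Y_{2021} = 17 + 2.3(0) = 17.0$
$Y_{2022} = 17 + 2.3(1) = 19.3$
$Y_{2023} = 17 + 2.3(2) = 21.6$
$12.4 + 14.7 + 17.0 + 19.3 + 21.6 = 85 = \sum Y$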
Example 2. Fit a straight line trend to the following data using the method of least squares and estimate the trend value for the year 2025.
Year | Production (in '000 Units) |
---|---|
2018 | 40 |
2019 | 45 |
2020 | 42 |
2021 | 50 |
2022 | 55 |
2023 | 52 |
Answer:
Given:
Time series data for Production for the years 2018 to 2023. Number of years, $n=6$.
To Find:
1. Fit a straight line trend using the method of least squares. 2. Estimate the trend value for the year 2025.
Solution:
We fit a linear trend equation $Y_t = a + bX_t$. The number of years is $n=6$ (even). We choose the midpoint between the two middle years (2020 and 2021) as the origin for the coded time variable $X$. The time interval is 1 year, so we code $X$ in units of half-years, which keeps the coded values whole numbers while giving $\sum X = 0$. The coded values for the years will be:
- 2018: Corresponds to -2.5 years from origin $\implies X = -5$ (unit = 0.5 year)
- 2019: Corresponds to -1.5 years from origin $\implies X = -3$
- 2020: Corresponds to -0.5 years from origin $\implies X = -1$
- 2021: Corresponds to +0.5 years from origin $\implies X = 1$
- 2022: Corresponds to +1.5 years from origin $\implies X = 3$
- 2023: Corresponds to +2.5 years from origin $\implies X = 5$
We need to calculate $\sum Y$, $\sum X$, $\sum XY$, and $\sum X^2$.
Year | Production ($Y$) | Coded Time ($X$) | $XY$ | $X^2$ |
---|---|---|---|---|
2018 | 40 | -5 | -200 | 25 |
2019 | 45 | -3 | -135 | 9 |
2020 | 42 | -1 | -42 | 1 |
2021 | 50 | 1 | 50 | 1 |
2022 | 55 | 3 | 165 | 9 |
2023 | 52 | 5 | 260 | 25 |
Total | $\sum Y = 284$ | $\sum X = 0$ | $\sum XY = 98$ | $\sum X^2 = 70$ |
Since $\sum X = 0$, the normal equations simplify as in the previous example:
$\sum Y = na$
[Simplified Eq (4)]
$\sum XY = b \sum X^2$
[Simplified Eq (5)]
From the first equation:
$284 = 6a$
[Substitute $\sum Y=284, n=6$]
$a = \frac{284}{6} = \frac{142}{3} \approx 47.33$
... (1)
From the second equation:
$98 = b \times 70$
[Substitute $\sum XY=98, \sum X^2=70$]
$b = \frac{98}{70} = \frac{\cancel{98}^{49}}{\cancel{70}_{35}} = \frac{\cancel{49}^7}{\cancel{35}_5} = \frac{7}{5} = 1.4$
... (2)
The fitted straight line trend equation is $Y_t = a + bX_t$. Substitute the values of $a$ and $b$:
$\mathbf{Y_t = 47.33 + 1.4X_t}$
[Trend equation with origin at midpoint of 2020 & 2021, unit = 0.5 year]
... (3)
To estimate the trend value for the year 2025, we need the coded value of $X$ for 2025. The origin is the midpoint between 2020 and 2021, and 2021 is coded as 1. From 2021 to 2025 is 4 years, which is $4 \times 2 = 8$ half-year units, so the coded value for 2025 is $1 + 8 = 9$. Equivalently, 2025 is 4.5 years from the origin, giving $4.5 \times 2 = 9$.
$X_{2025} = 9$
Substitute $X_t = 9$ into the trend equation (3):
$Y_{2025} = 47.33 + 1.4 \times 9$
$Y_{2025} = 47.33 + 12.6 = 59.93$
... (4)
The estimated trend value of Production for the year 2025 is approximately $59.93 \times 1000 = 59,930$ units.
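Because $X$ is measured in half-year units here, the slope $b = 1.4$ is a change per half-year. As a brief sketch of the conversion to a yearly rate (the symbol $b_{\text{year}}$ is introduced only for this illustration):
$b_{\text{year}} = 2 \times 1.4 = 2.8$ ('000 units per year)
$1.4 \times 9 = 2.8 \times 4.5 = 12.6$
The second line shows that the estimated rise from the origin to 2025 is the same whether time is counted as 9 half-year units or as 4.5 years.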