Chapter 13 Probability (Class 12 - Maths NCERT Concept Notes)
Welcome to Chapter 13: Probability! This chapter significantly advances our study of uncertainty by moving from basic counting to sophisticated analytical models. We begin with Conditional Probability, which calculates the likelihood of an event occurring given that another related event has already happened. The fundamental formula for this is: $$P(A|B) = \frac{P(A \cap B)}{P(B)}$$ where $P(B) \neq 0$. We also explore the Multiplication Rule and define Independent Events, where $P(A \cap B) = P(A) \times P(B)$.
A major highlight is Bayes' Theorem, an essential tool for statistical inference that allows us to reverse conditional probabilities to find the probability of a cause given an effect. Furthermore, we introduce Random Variables and their probability distributions, learning to calculate the Mean (Expected Value) and Variance to summarize random numerical outcomes.
Finally, we master Bernoulli Trials and the Binomial Distribution, which models the probability of $k$ successes in $n$ independent trials using: $$P(X=k) = {^nC_k} p^k q^{n-k}$$ where $p$ is the probability of success in a single trial and $q = 1 - p$. To enhance the understanding of these concepts, this page includes visualizations, flowcharts, mindmaps, and practical examples. This page is prepared by learningspot.co to provide a structured and comprehensive learning experience for every student.
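The binomial formula is easy to check numerically. Here is a minimal Python sketch that evaluates $P(X=k)$ exactly. It assumes `math.comb` stands in for ${^nC_k}$ and uses `fractions.Fraction` to avoid rounding. The function name `binomial_pmf` and the example values ($n = 4$ tosses of a fair coin) are our own illustrative choices.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n: int, k: int, p: Fraction) -> Fraction:
    """P(X = k) = nCk * p^k * q^(n-k), where q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Example: 4 tosses of a fair coin, probability of exactly 2 heads
print(binomial_pmf(4, 2, Fraction(1, 2)))  # 3/8

# The probabilities for k = 0, 1, ..., n must add up to 1
assert sum(binomial_pmf(4, k, Fraction(1, 2)) for k in range(5)) == 1
```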
Conditional Probability
In the study of probability, we often encounter situations where the likelihood of an event occurring changes because some related event has already taken place. This is known as Conditional Probability. It is the probability of an event $A$, given that another event $B$ has already occurred.
The core idea is that the original sample space is reduced to a new sample space consisting only of the outcomes of the event that has already occurred.
Illustration: Playing Cards
Consider a standard deck of 52 playing cards. Let us analyze the probability of drawing a Diamond card under different conditions.
Scenario 1: No prior information
Total number of cards in the sample space $n(S) = 52$.
Number of Diamond cards $n(D) = 13$.
The probability $P(D)$ is calculated as:
$P(D) = \frac{n(D)}{n(S)} = \frac{13}{52} = \frac{1}{4}$
Scenario 2: Information given that the card is Red
Suppose we are told that the card drawn is Red. Let this event be $R$.
Since we know the card is red, the outcomes are limited to Hearts and Diamonds only.
Reduced Sample Space $n(R) = 26$.
Favourable outcomes (Diamonds) within this red set $n(D \cap R) = 13$.
The new probability (Conditional Probability) is:
$P(D|R) = \frac{n(D \cap R)}{n(R)} = \frac{13}{26} = \frac{1}{2}$
Illustration: Rolling a Die
Consider the experiment of rolling a single die. The sample space is:
$S = \{1, 2, 3, 4, 5, 6\}$
Let us analyze the probability of getting the number '4' under different conditions.
Scenario 1: No prior information
In a normal throw, any of the six faces can appear. Let event $A$ be getting the number 4.
Total number of outcomes in the sample space $n(S) = 6$.
Number of favourable outcomes $n(A) = 1$.
The probability $P(A)$ is calculated as:
$P(A) = \frac{n(A)}{n(S)} = \frac{1}{6}$
Scenario 2: Information given that the outcome is an Even Number
Now, suppose the die is rolled and you are told that the result is an Even Number, but you haven't seen the face yet. Let this event be $B$.
The outcomes for event $B$ (Even Numbers) are $\{2, 4, 6\}$.
Since we know the result is even, the sample space $S$ is reduced to $B$.
Reduced Sample Space $n(B) = 3$.
Within this new sample space, the only outcome favourable to event $A$ (getting a 4) is $\{4\}$.
Favourable outcomes $n(A \cap B) = 1$.
The new probability (Conditional Probability) is calculated as:
$P(A|B) = \frac{n(A \cap B)}{n(B)} = \frac{1}{3}$
Observation: We can see that $\frac{1}{3} > \frac{1}{6}$. The additional information that the number is even has increased the probability of it being a '4'.
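The reduced-sample-space idea translates directly into counting. Below is a minimal sketch that assumes a finite sample space of equally likely outcomes, represented as Python sets. The helper name `conditional_probability` is our own and does not come from any library.

```python
from fractions import Fraction

def conditional_probability(a: set, b: set) -> Fraction:
    """P(A|B) = n(A ∩ B) / n(B), assuming equally likely outcomes."""
    if not b:
        # n(B) = 0 means P(B) = 0, so P(A|B) is undefined
        raise ValueError("the conditioning event B must be non-empty")
    return Fraction(len(a & b), len(b))

S = {1, 2, 3, 4, 5, 6}              # sample space for one roll
A = {4}                             # event: getting a 4
B = {x for x in S if x % 2 == 0}    # event: getting an even number

print(Fraction(len(A), len(S)))        # P(A) without information: 1/6
print(conditional_probability(A, B))   # P(A|B) in the reduced space: 1/3
```

The same helper reproduces the playing-card result. For example, conditioning a single Diamond-like outcome on a two-outcome "red" set gives $\frac{1}{2}$.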
Meaning of $P(A|B)$
The notation $P(A|B)$ represents the conditional probability of event $A$ occurring, given that event $B$ has already occurred.
Key Characteristics:
1. Condition: Event $B$ is the "evidence" or the known fact.
2. Sample Space: The original sample space $S$ is replaced by the set $B$.
3. Focus: We are looking for the portion of $B$ that also belongs to $A$.
$P(A|B) = \frac{P(A \cap B)}{P(B)}$
[Formula for A given B]
Meaning of $P(B|A)$
The notation $P(B|A)$ represents the conditional probability of event $B$ occurring, given that event $A$ has already occurred.
Key Characteristics:
1. Condition: Event $A$ is now the "evidence" or the known fact.
2. Sample Space: The original sample space $S$ is replaced by the set $A$.
3. Focus: We are looking for the portion of $A$ that also belongs to $B$.
$P(B|A) = \frac{P(A \cap B)}{P(A)}$
[Formula for B given A]
Detailed Derivation of Conditional Probability
The concept of Conditional Probability is rooted in the idea of a Reduced Sample Space. When we are given that event $B$ has already occurred, the outcomes that were originally in the sample space $S$ but not in $B$ become irrelevant. The set $B$ now acts as the "new" universe or sample space for any further probability calculations.
Visualizing the Concept with Venn Diagrams
To understand the derivation, we look at the relationship between events $A$ and $B$ using a Venn diagram. In this diagram, the rectangle represents the total sample space $S$, and the circles represent events $A$ and $B$.
When we say "given $B$," we are focusing our attention entirely within the circle of $B$. Within this circle, the only way event $A$ can occur is if the outcome falls in the overlapping region, which is $A \cap B$.
Mathematical Derivation Steps
Let us consider a random experiment with a finite sample space $S$ consisting of equally likely outcomes.
Step 1: Define the Cardinalities
Let the number of elements in each set be represented as follows:
| Notation | Description |
|---|---|
| $n(S)$ | Total number of outcomes in the sample space |
| $n(A)$ | Number of outcomes favourable to event $A$ |
| $n(B)$ | Number of outcomes favourable to event $B$ |
| $n(A \cap B)$ | Number of outcomes common to both $A$ and $B$ |
Step 2: Define Probability in the Reduced Space
If we are given that event $B$ has occurred, $B$ becomes the New Sample Space. The only outcomes in $A$ that can now occur are those which are also in $B$. These are the elements of the intersection $A \cap B$.
Therefore, the conditional probability of $A$ given $B$ is the ratio of favourable outcomes in the intersection to the total outcomes in the new sample space $B$:
$P(A|B) = \frac{n(A \cap B)}{n(B)}$
[By definition of reduced space]
Step 3: Conversion to Probability Terms
To express this formula in terms of the original probabilities $P(A \cap B)$ and $P(B)$, we divide both the numerator and the denominator of the right-hand side by the total number of outcomes $n(S)$:
$P(A|B) = \frac{\frac{n(A \cap B)}{n(S)}}{\frac{n(B)}{n(S)}}$
We know from the basic definition of probability that:
$\frac{n(A \cap B)}{n(S)} = P(A \cap B)$ and $\frac{n(B)}{n(S)} = P(B)$
Substituting these values back into the equation, we get the standard formula:
$P(A|B) = \frac{P(A \cap B)}{P(B)}$
[Provided $P(B) \neq 0$]
Symmetry: Derivation for $P(B|A)$
Similarly, if we are given that event $A$ has already occurred, the sample space is reduced to $A$. The outcomes favourable to $B$ within this reduced space are again the outcomes in $A \cap B$.
Following the same logic:
$P(B|A) = \frac{n(A \cap B)}{n(A)}$
Dividing numerator and denominator by $n(S)$:
$P(B|A) = \frac{n(A \cap B) / n(S)}{n(A) / n(S)}$
$P(B|A) = \frac{P(A \cap B)}{P(A)}$
[Provided $P(A) \neq 0$]
Important Conclusion
From both equations provided above, we can observe that both conditional probabilities depend on the probability of the Joint Event ($A \cap B$). This leads to the Multiplication Theorem of Probability:
$P(A \cap B) = P(B) \cdot P(A|B) = P(A) \cdot P(B|A)$
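The multiplication theorem can be checked by brute-force counting on the card illustration from earlier. This sketch models the 52-card deck as (rank, suit) pairs and assumes every card is equally likely. The helpers `prob` and `cond` are hypothetical names used only here.

```python
from fractions import Fraction
from itertools import product

suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs, all equally likely

def prob(event: set) -> Fraction:
    return Fraction(len(event), len(deck))

def cond(a: set, b: set) -> Fraction:
    # P(A|B) = P(A ∩ B) / P(B)
    return prob(a & b) / prob(b)

D = {card for card in deck if card[1] == "Diamonds"}               # a Diamond
R = {card for card in deck if card[1] in ("Hearts", "Diamonds")}   # a Red card

joint = prob(D & R)
# Both forms of the multiplication theorem give the same joint probability
assert joint == prob(R) * cond(D, R) == prob(D) * cond(R, D)
print(joint)  # 1/4
```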
Important Remarks
1: The Condition for Existence
In the definition of conditional probability $P(A|B)$, it is mathematically and logically required that $P(B) \neq 0$.
1. Mathematical Reason: The formula involves dividing by $P(B)$. Since division by zero is undefined in mathematics, $P(B)$ must be a non-zero value.
2. Logical Reason: If $P(B) = 0$, then event $B$ is an impossible event. To say "given that $B$ has occurred" when $B$ cannot possibly occur is a logical contradiction. Therefore, the conditioning event $B$ must be a possible event.
2: The Concept of Reduced Sample Space
One of the most important takeaways is that conditional probability effectively redefines the universe of the experiment. When we are given that event $B$ has occurred, we move from the original sample space $S$ to a restricted or reduced sample space $B$.
For example, if we are looking for a student from a class (Sample Space $S$) who plays cricket (Event $A$), but we are told the student is from Section A (Event $B$), we no longer care about students in Section B or C. Our "new world" is only Section A.
3: Conditional Probability of the Sample Space
The probability of the entire sample space $S$, given that any event $A$ with $P(A) \neq 0$ has occurred, is always 1 (certainty).
Proof:
By the definition of conditional probability:
$P(S|A) = \frac{P(S \cap A)}{P(A)}$
Since event $A$ is a subset of the sample space $S$ ($A \subset S$), their intersection is simply $A$ itself, i.e., $S \cap A = A$. Substituting this:
$P(S|A) = \frac{P(A)}{P(A)} = 1$
[Since $S \cap A = A$]
This logically means that if we know $A$ has happened, it is 100% certain that an outcome from the total sample space $S$ has happened.
4: Probability of an Event Given Itself
The conditional probability of an event $A$, given that event $A$ has already occurred, is always 1.
Proof:
Using the formula:
$P(A|A) = \frac{P(A \cap A)}{P(A)}$
Since the intersection of a set with itself is the set itself ($A \cap A = A$):
$P(A|A) = \frac{P(A)}{P(A)} = 1$
[Identity Property]
5: Event A Given the Sample Space S
The conditional probability of an event $A$, given the sample space $S$, is equal to the unconditional probability of $A$.
Proof:
If the "additional information" given is simply that the outcome belongs to the sample space $S$, it tells us nothing new, because we already know that any outcome must belong to $S$.
$P(A|S) = \frac{P(A \cap S)}{P(S)}$
Since $A \cap S = A$ and $P(S) = 1$ (the probability of a certain event):
$P(A|S) = \frac{P(A)}{1} = P(A)$
[Unconditional Probability]
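Remarks 3 to 5 can be confirmed on a concrete case. The sketch below assumes a single fair die with $A$ = 'an even number' (an illustrative choice, not from the text). It computes each conditional probability by counting in the reduced sample space.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}            # illustrative event: an even number

def cond(x: set, y: set) -> Fraction:
    # P(X|Y) = n(X ∩ Y) / n(Y) for equally likely outcomes
    return Fraction(len(x & y), len(y))

print(cond(S, A))  # P(S|A) = 1
print(cond(A, A))  # P(A|A) = 1
print(cond(A, S))  # P(A|S) = P(A) = 1/2
```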
Properties of Conditional Probability
Conditional probability follows several fundamental properties that are analogous to the properties of unconditional probability. Let $S$ be the sample space and $A, B, F$ be events such that $P(F) \neq 0$.
1: Range of Conditional Probability
The conditional probability of an event $A$ given that $B$ has occurred always lies between $0$ and $1$ inclusive.
Proof:
We know that the intersection of two sets $A$ and $B$ is a subset of $B$. Therefore, $A \cap B \subseteq B$.
Taking probabilities on both sides:
$P(A \cap B) \leq P(B)$
Dividing both sides by $P(B)$ (where $P(B) > 0$):
$\frac{P(A \cap B)}{P(B)} \leq 1$
[Substituting the definition of $P(A|B)$]
Thus, $P(A|B) \leq 1$.
Also, since $P(A \cap B) \geq 0$ and $P(B) > 0$, the ratio must be non-negative:
$P(A|B) \geq 0$
Hence, we conclude:
$0 \leq P(A|B) \leq 1$
2: Addition Theorem for Conditional Probability
If $A$ and $B$ are any two events and $F$ is an event such that $P(F) \neq 0$, then:
$P(A \cup B|F) = P(A|F) + P(B|F) - P(A \cap B|F)$
Proof:
By the definition of conditional probability:
$P(A \cup B|F) = \frac{P((A \cup B) \cap F)}{P(F)}$
Using the distributive law of set theory, $(A \cup B) \cap F = (A \cap F) \cup (B \cap F)$. Thus:
$P((A \cup B) \cap F) = P((A \cap F) \cup (B \cap F))$
Applying the addition rule for probabilities, $P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)$:
$P((A \cup B) \cap F) = P(A \cap F) + P(B \cap F) - P(A \cap F \cap B \cap F)$
$P((A \cup B) \cap F) = P(A \cap F) + P(B \cap F) - P(A \cap B \cap F)$
Substituting this into the expression for $P(A \cup B|F)$:
$P(A \cup B|F) = \frac{P(A \cap F) + P(B \cap F) - P(A \cap B \cap F)}{P(F)}$
Separating the terms:
$P(A \cup B|F) = \frac{P(A \cap F)}{P(F)} + \frac{P(B \cap F)}{P(F)} - \frac{P(A \cap B \cap F)}{P(F)}$
$P(A \cup B|F) = P(A|F) + P(B|F) - P(A \cap B|F)$
Special Case: Disjoint Events
If $A$ and $B$ are disjoint (mutually exclusive) events, then $A \cap B = \phi$. This implies $P(A \cap B|F) = 0$.
$P(A \cup B|F) = P(A|F) + P(B|F)$
(For disjoint events)
3: Complementary Event Rule
The probability of the complement of an event $A$ given $B$ is equal to 1 minus the probability of $A$ given $B$.
$P(A'|B) = 1 - P(A|B)$
Proof:
We know that the sample space $S$ given $B$ has a probability of 1:
$P(S|B) = 1$
Since the sample space $S$ can be written as the union of an event $A$ and its complement $A'$ ($S = A \cup A'$), and since $A$ and $A'$ are disjoint:
$P(A \cup A'|B) = 1$
Using the addition property for disjoint events from Property 2 above:
$P(A|B) + P(A'|B) = 1$
Rearranging the terms to find $P(A'|B)$:
$P(A'|B) = 1 - P(A|B)$
(Hence Proved)
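All three properties can be spot-checked numerically. The following sketch assumes one roll of a fair die and conditions on $F$ = 'an even number'. The events $A = \{1, 2, 3, 4\}$ and $B = \{4, 5, 6\}$ are chosen only for illustration. The asserts fail loudly if any property is violated.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # one roll of a fair die

def cond(a, b):
    # P(A|B) by counting in the reduced sample space B
    return Fraction(len(a & b), len(b))

F = {2, 4, 6}        # conditioning event, P(F) != 0
A = {1, 2, 3, 4}
B = {4, 5, 6}
A_complement = S - A

# Property 1: 0 <= P(A|F) <= 1
assert 0 <= cond(A, F) <= 1
# Property 2: addition theorem for conditional probability
assert cond(A | B, F) == cond(A, F) + cond(B, F) - cond(A & B, F)
# Property 3: complement rule
assert cond(A_complement, F) == 1 - cond(A, F)
print(cond(A | B, F), cond(A_complement, F))  # 1 1/3
```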
Example 1. A fair die is rolled. Consider the events $E = \{1, 3, 5\}$ and $F = \{2, 3\}$. Find the conditional probability of $E$ given that $F$ has already occurred.
Answer:
Given:
Sample space $S = \{1, 2, 3, 4, 5, 6\}$, so $n(S) = 6$.
Event $E = \{1, 3, 5\}$
Event $F = \{2, 3\}$
To Find: $P(E|F)$
Solution:
First, we find the intersection of $E$ and $F$:
$E \cap F = \{3\}$
Thus, $n(E \cap F) = 1$ and $n(F) = 2$.
Calculating the probabilities:
$P(F) = \frac{n(F)}{n(S)} = \frac{2}{6}$
$P(E \cap F) = \frac{n(E \cap F)}{n(S)} = \frac{1}{6}$
Using the formula for conditional probability:
$P(E|F) = \frac{P(E \cap F)}{P(F)}$
[Formula] ... (i)
Substituting the values:
$P(E|F) = \frac{1/6}{2/6}$
$P(E|F) = \frac{1}{2}$
(Ans)
Example 2. Three coins are tossed simultaneously. Let $A$ be the event 'at least two heads appear' and $B$ be the event 'first coin shows tail'. Find $P(A|B)$.
Answer:
Given:
The sample space $S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$, so $n(S) = 8$.
Event $A$ (at least two heads) = $\{HHH, HHT, HTH, THH\}$
Event $B$ (first coin is tail) = $\{THH, THT, TTH, TTT\}$
To Find: $P(A|B)$
Solution:
The intersection $A \cap B$ represents outcomes where the first coin is a tail AND there are at least two heads:
$A \cap B = \{THH\}$
Now, $n(B) = 4$ and $n(A \cap B) = 1$.
Probabilities are:
$P(B) = \frac{4}{8} = \frac{1}{2}$
$P(A \cap B) = \frac{1}{8}$
Using the conditional probability formula:
$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/8}{4/8}$
$P(A|B) = \frac{1}{4}$
(Ans)
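For experiments with a few coins, it is easy to list the sample space by machine rather than by hand. A minimal sketch, assuming Python's `itertools.product` generates the 8 outcomes as strings such as 'THH':

```python
from fractions import Fraction
from itertools import product

S = ["".join(t) for t in product("HT", repeat=3)]  # HHH, HHT, ..., TTT

A = {s for s in S if s.count("H") >= 2}  # at least two heads
B = {s for s in S if s[0] == "T"}        # first coin shows tail

p_A_given_B = Fraction(len(A & B), len(B))
print(sorted(A & B), p_A_given_B)  # ['THH'] 1/4
```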
Example 3. In a school in Mumbai, there are 1000 students, out of which 430 are girls. It is known that out of 430 girls, 10% study in class XII. What is the probability that a student chosen randomly studies in class XII, given that the chosen student is a girl?
Answer:
Given:
Total students $n(S) = 1000$
Let $G$ be the event that the student is a girl. $n(G) = 430$.
Let $XII$ be the event that the student studies in class XII.
The number of girls in class XII is 10% of 430.
Solution:
$n(XII \cap G) = 10\% \text{ of } 430 = \frac{10}{100} \times 430 = 43$
We need to find $P(XII | G)$ (Probability of being in class XII given the student is a girl).
$P(XII|G) = \frac{n(XII \cap G)}{n(G)}$
[Using reduced sample space] ... (ii)
$P(XII|G) = \frac{43}{430}$
$P(XII|G) = 0.1$
(Ans)
Example 4. Consider a survey of a group of people regarding their preferred beverage. The data is as follows:
| Gender | Tea | Coffee | Total |
|---|---|---|---|
| Male | 40 | 60 | 100 |
| Female | 50 | 30 | 80 |
| Total | 90 | 90 | 180 |
If a person is selected at random and it is found that they prefer Coffee, what is the probability that the person is a Male?
Answer:
Given:
Let $M$ be the event 'Person is Male' and $C$ be the event 'Person prefers Coffee'.
From the table:
$n(C) = 90$ (Total people who prefer coffee)
$n(M \cap C) = 60$ (Males who prefer coffee)
To Find: $P(M|C)$
Solution:
Using the conditional probability formula:
$P(M|C) = \frac{n(M \cap C)}{n(C)}$
$P(M|C) = \frac{60}{90}$
Dividing numerator and denominator by 30:
$P(M|C) = \frac{2}{3}$
(Ans)
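When the data come as a two-way table, $P(M|C)$ is just a cell count divided by its column total. The sketch below stores the survey counts in a dictionary, a representation we chose for illustration. It then computes a conditional probability for any gender and beverage.

```python
from fractions import Fraction

# Survey counts from the table: (gender, beverage) -> number of people
counts = {
    ("Male", "Tea"): 40, ("Male", "Coffee"): 60,
    ("Female", "Tea"): 50, ("Female", "Coffee"): 30,
}

def p_given(gender: str, beverage: str) -> Fraction:
    """P(gender | beverage) = n(gender ∩ beverage) / n(beverage)."""
    column_total = sum(n for (g, b), n in counts.items() if b == beverage)
    return Fraction(counts[(gender, beverage)], column_total)

print(p_given("Male", "Coffee"))  # 2/3
```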
Example 5. If $P(A) = 0.8$, $P(B) = 0.5$ and $P(B|A) = 0.4$, find (i) $P(A \cap B)$ and (ii) $P(A|B)$.
Answer:
To Find: (i) $P(A \cap B)$ and (ii) $P(A|B)$
Solution (i):
Using the Multiplication Rule of probability:
$P(A \cap B) = P(A) \cdot P(B|A)$
... (iii)
$P(A \cap B) = 0.8 \times 0.4$
$P(A \cap B) = 0.32$
(Ans i)
Solution (ii):
Using the conditional probability formula for $A$ given $B$:
$P(A|B) = \frac{P(A \cap B)}{P(B)}$
... (iv)
$P(A|B) = \frac{0.32}{0.5}$
$P(A|B) = \frac{32}{50}$
$P(A|B) = 0.64$
(Ans ii)
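The arithmetic of Example 5 can be reproduced with exact fractions, which avoids floating-point rounding in the intermediate step. Converting to `float` only at the end gives the decimal answers.

```python
from fractions import Fraction

P_A = Fraction(8, 10)
P_B = Fraction(5, 10)
P_B_given_A = Fraction(4, 10)

P_A_and_B = P_A * P_B_given_A      # multiplication rule
P_A_given_B = P_A_and_B / P_B      # definition of P(A|B)

print(float(P_A_and_B), float(P_A_given_B))  # 0.32 0.64
```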
Multiplication Theorem On Probability
In the study of probability, we often encounter scenarios where we need to determine the likelihood of the simultaneous occurrence of two or more events. If $A$ and $B$ are two events associated with a random experiment, the intersection of these events, denoted by $A \cap B$ (or simply $AB$), represents the compound event that both $A$ and $B$ occur together. The Multiplication Theorem provides a formal mathematical framework to calculate this joint probability using conditional probabilities.
For example, consider the deck of 52 playing cards. If we draw two cards sequentially, we might want to find the probability of getting a King in the first draw and a Spade in the second draw. The outcome of the second event is often dependent on the outcome of the first, especially when the experiment is conducted without replacement.
Derivation for Two Events
The theorem is derived directly from the definition of conditional probability. By definition, the probability of event $A$ occurring, given that event $B$ has already occurred, is:
$P(A|B) = \frac{P(A \cap B)}{P(B)}$
[Condition: $P(B) \neq 0$]
By applying simple algebraic cross-multiplication to the above equation, we isolate the joint probability $P(A \cap B)$:
$P(A \cap B) = P(B) \cdot P(A|B)$
... (i)
Conversely, the probability of event $B$ occurring, given that event $A$ has already occurred, is defined as:
$P(B|A) = \frac{P(A \cap B)}{P(A)}$
[Condition: $P(A) \neq 0$]
Again, by cross-multiplying, we obtain another expression for the joint probability:
$P(A \cap B) = P(A) \cdot P(B|A)$
... (ii)
General Rule for Two Events
By equating (i) and (ii), we establish the Multiplication Rule of Probability. It states that the probability of the simultaneous occurrence of two events is the product of the probability of one event and the conditional probability of the other, relative to the first.
$P(A \cap B) = P(A) \cdot P(B|A) = P(B) \cdot P(A|B)$
This rule is particularly useful when dealing with dependent events. If the events were independent, then $P(B|A) = P(B)$, and the formula would simplify to $P(A \cap B) = P(A) \cdot P(B)$.
Extension to Multiple Events
(i) Extension to Three Events
For three events $A, B,$ and $C$, the theorem expands to account for the sequential dependence of each event on all preceding ones. To prove this, let us treat $(A \cap B)$ as a single event $E$. Then:
$P(A \cap B \cap C) = P(E \cap C)$
$P(E \cap C) = P(E) \cdot P(C|E)$
Substituting $E = A \cap B$ back into the equation:
$P(A \cap B \cap C) = P(A \cap B) \cdot P(C | A \cap B)$
Now, replacing $P(A \cap B)$ with $P(A) \cdot P(B|A)$ from equation (ii):
$P(A \cap B \cap C) = P(A) \cdot P(B|A) \cdot P(C | A \cap B)$
Logic: First, event $A$ happens. Then, $B$ happens given $A$ occurred. Finally, $C$ happens given both $A$ and $B$ have occurred.
(ii) Generalization to n Events
This logic can be mathematically extended to any finite number of events $A_1, A_2, A_3, \dots, A_n$. This is known as the Chain Rule of Probability:
$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) \cdot P(A_2|A_1) \cdot P(A_3 | A_1 \cap A_2) \cdots P(A_n | A_1 \cap \dots \cap A_{n-1})$
Using the product notation, it is written as:
$P\left(\bigcap\limits_{i=1}^{n} A_i\right) = P(A_1) \cdot \prod\limits_{k=2}^{n} P(A_k | \bigcap\limits_{j=1}^{k-1} A_j)$
Example. An urn contains 10 black and 5 white balls. Two balls are drawn from the urn one after the other without replacement.
Find the probability that both drawn balls are black.
Answer:
Given:
Total number of balls = $10 \text{ (Black)} + 5 \text{ (White)} = 15$ balls.
Let $A$ be the event that the first ball drawn is black, and $B$ be the event that the second ball drawn is black.
To Find: $P(A \cap B)$
Solution:
The probability that the first ball drawn is black is:
$P(A) = \frac{10}{15} = \frac{2}{3}$
Since the drawing is without replacement, after the first black ball is drawn, there are 9 black balls and 14 total balls left in the urn.
Therefore, the conditional probability of drawing a black ball in the second draw, given that the first was black, is:
$P(B|A) = \frac{9}{14}$
By the multiplication theorem of probability:
$P(A \cap B) = P(A) \cdot P(B|A)$
$P(A \cap B) = \frac{\cancel{10}^2}{\cancel{15}_3} \times \frac{9}{14}$
[Substituting values]
$P(A \cap B) = \frac{2 \times 9}{3 \times 14} = \frac{18}{42}$
$P(A \cap B) = \frac{3}{7}$
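The chain rule turns this into a loop over draws. At step $k$, both the numerator and the denominator drop by one because a black ball has been removed. This sketch applies the rule for any number of draws; the function name `p_all_black` is our own. It then cross-checks the two-draw answer by enumerating all ordered pairs of distinct balls.

```python
from fractions import Fraction
from itertools import permutations

def p_all_black(black: int, total: int, draws: int) -> Fraction:
    """Chain rule for drawing `draws` balls without replacement, all black."""
    p = Fraction(1)
    for k in range(draws):
        # P(k-th ball black | all previous balls black)
        p *= Fraction(black - k, total - k)
    return p

# Brute-force check: every ordered pair of distinct balls is equally likely
balls = ["black"] * 10 + ["white"] * 5
pairs = list(permutations(range(len(balls)), 2))
both_black = sum(1 for i, j in pairs if balls[i] == balls[j] == "black")

print(p_all_black(10, 15, 2))              # 3/7
print(Fraction(both_black, len(pairs)))    # 3/7
```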
Independent Events
The concept of Independent Events is a cornerstone of probability theory. Two events, $A$ and $B$, are classified as independent if the probability of occurrence of one event is completely unaffected by the occurrence or non-occurrence of the other. In the language of conditional probability, knowledge about one event does not revise the likelihood of the second event.
In practical scenarios, independence implies that the trials are conducted under identical conditions where the outcome of a previous trial is not 'remembered' by the system. For instance, consider a fair coin tossed multiple times. The physical properties of the coin do not change based on previous results. If we obtain a 'Head' in the first five tosses, the coin does not become "due" for a 'Tail'; the probability remains $\frac{1}{2}$ for every individual toss.
A familiar example comes from cricket. If the Indian cricket captain has lost the toss in the last ten consecutive matches, many fans might feel that "luck" must change in the eleventh match. However, mathematically, the probability of winning the next toss remains exactly $\frac{1}{2}$. Each toss is a fresh, independent trial.
Illustration: Drawing Balls
To further elaborate on how independence works (and how it can be lost), let us analyze an experiment involving a bag containing colored balls. This is a classic example used to distinguish between Independent and Dependent events.
Initial Setup:
Number of Red balls in the bag = 4
Number of Blue balls in the bag = 3
$n(S) = 4 + 3 = 7$
(Total outcomes in first draw)
Let event $A$ be 'getting a red ball on the 1st draw' and event $B$ be 'getting a red ball on the 2nd draw'.
Case (i): Drawing With Replacement (Independent)
In this case, after the first ball is drawn, its color is noted, and it is placed back into the bag before the second draw. This action restores the bag to its original state.
1. Probability of drawing a red ball in the 1st draw ($A$):
$P(A) = \frac{4}{7}$
2. Since the ball is replaced, the total number of balls remains 7 and the number of red balls remains 4 for the second draw.
3. Probability of drawing a red ball in the 2nd draw ($B$):
$P(B) = \frac{4}{7}$
Because $P(B)$ remains $\frac{4}{7}$ regardless of whether the first ball was red or blue, the events $A$ and $B$ are Independent.
Case (ii): Drawing Without Replacement (Dependent)
In this case, the first ball drawn is not put back into the bag. This changes the sample space for the second draw, making the events Dependent.
1. Probability of drawing a red ball in the 1st draw ($A$):
$P(A) = \frac{4}{7}$
2. If event $A$ has occurred (a red ball was removed), the remaining balls in the bag are 3 Red and 3 Blue.
$n(S)_{new} = 7 - 1 = 6$
(Reduced sample space)
3. The conditional probability of drawing a red ball in the 2nd draw, given that the first was red ($B|A$):
$P(B|A) = \frac{\cancel{3}^{1}}{\cancel{6}_{2}} = \frac{1}{2}$
[Probability changes from $\frac{4}{7}$ to $\frac{1}{2}$]
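Since $P(B|A) = \frac{1}{2}$ differs from the unconditional probability $P(B) = \frac{4}{7}$ (before any ball is seen, each of the 7 balls is equally likely to be the one drawn second), the occurrence of $A$ has affected the probability of $B$. Therefore, these events are Dependent.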
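Both cases can be modelled exactly by listing every ordered pair of draws. In this sketch, the ball labels and the helper `analyse` are our own. `itertools.product` models drawing with replacement, and `itertools.permutations` models drawing without replacement. The helper returns the unconditional $P(B)$ and the conditional $P(B|A)$.

```python
from fractions import Fraction
from itertools import permutations, product

bag = ["red"] * 4 + ["blue"] * 3   # 4 red and 3 blue balls, indexed 0..6

def analyse(draws):
    """Return (P(B), P(B|A)) for A = red on 1st draw, B = red on 2nd draw."""
    draws = list(draws)  # every ordered (first, second) pair is equally likely
    A = [(i, j) for i, j in draws if bag[i] == "red"]
    B = [(i, j) for i, j in draws if bag[j] == "red"]
    A_and_B = [(i, j) for i, j in A if bag[j] == "red"]
    return Fraction(len(B), len(draws)), Fraction(len(A_and_B), len(A))

balls = range(len(bag))
print(analyse(product(balls, repeat=2)))   # with replacement
print(analyse(permutations(balls, 2)))     # without replacement
```

With replacement, both values come out as $\frac{4}{7}$. Without replacement, $P(B|A)$ drops to $\frac{1}{2}$ while $P(B)$ itself stays at $\frac{4}{7}$.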
Mathematical Definition and Multiplication Rule
Two events $A$ and $B$ are independent if:
$P(A|B) = P(A)$, provided $P(B) \neq 0$
$P(B|A) = P(B)$, provided $P(A) \neq 0$
Derivation of the Multiplication Rule for Independence
By the general multiplication rule of probability for any two events $A$ and $B$, we have:
$P(A \cap B) = P(A) \cdot P(B|A)$
…(i)
If $A$ and $B$ are independent events, by the definition of independence:
$P(B|A) = P(B)$
…(ii)
Substituting the value of (ii) into (i), we get the formula for independent events:
$P(A \cap B) = P(A) \cdot P(B)$
[Condition for Independence]
Extension to Multiple Events
If three events $A, B$, and $C$ are mutually independent, then the probability of their intersection is the product of their individual probabilities:
$P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C)$
In general, for $n$ independent events $A_1, A_2, \dots, A_n$:
$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) \cdot P(A_2) \cdot \dots \cdot P(A_n)$
Example. A fair die is rolled twice. Let event $A$ be 'getting an even number on the first throw' and event $B$ be 'getting an odd number on the second throw'. Check if $A$ and $B$ are independent events.
Answer:
Given:
Sample space of a die $S = \{1, 2, 3, 4, 5, 6\}$
Total outcomes in one roll = 6
To Find: Whether $A$ and $B$ are independent.
Solution:
Event $A$ (Even number on 1st throw) $= \{2, 4, 6\}$
$P(A) = \frac{3}{6} = \frac{1}{2}$
Event $B$ (Odd number on 2nd throw) $= \{1, 3, 5\}$
$P(B) = \frac{3}{6} = \frac{1}{2}$
By the fundamental principle of counting, two rolls have $6 \times 6 = 36$ equally likely ordered outcomes.
Number of outcomes in $(A \cap B) = 3 \times 3 = 9$
$P(A \cap B) = \frac{9}{36} = \frac{1}{4}$
Now, checking the condition $P(A) \cdot P(B)$:
$P(A) \cdot P(B) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$
Since $P(A \cap B) = P(A) \cdot P(B)$, the events $A$ and $B$ are Independent.
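Instead of multiplying counts by hand, one can enumerate all 36 ordered outcomes (first throw, second throw) and test the multiplication rule directly. A minimal sketch, assuming all pairs from two fair die rolls are equally likely:

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # 36 pairs (first throw, second throw)

def prob(event: set) -> Fraction:
    return Fraction(len(event), len(S))

A = {o for o in S if o[0] % 2 == 0}   # even number on the first throw
B = {o for o in S if o[1] % 2 == 1}   # odd number on the second throw

print(prob(A), prob(B), prob(A & B))      # 1/2 1/2 1/4
print(prob(A & B) == prob(A) * prob(B))   # True
```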
Mathematical Verification of Independent Events
In probability theory, it is often difficult to determine if two events are independent simply by looking at the physical nature of the experiment. The only rigorous method to establish independence is through the verification of the Multiplication Rule. Two events $E$ and $F$ are said to be independent if and only if the probability of their simultaneous occurrence equals the product of their individual probabilities.
$P(E \cap F) = P(E) \cdot P(F)$
[Condition for Independence]
If the above condition is not satisfied, i.e., $P(E \cap F) \neq P(E) \cdot P(F)$, then the events are considered dependent.
Case I: Tossing a Coin and a Die Together
Consider the experiment where a fair coin and a six-faced die are tossed simultaneously. The sample space $S$ consists of 12 equally likely outcomes:
$S = \{H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6\}$
Let $E$ be the event 'head on the coin' and $F$ be the event 'number six on the die'.
1. Outcomes for event $E = \{H1, H2, H3, H4, H5, H6\}$
$P(E) = \frac{6}{12} = \frac{1}{2}$
2. Outcomes for event $F = \{H6, T6\}$
$P(F) = \frac{2}{12} = \frac{1}{6}$
3. The common outcome $E \cap F = \{H6\}$
$P(E \cap F) = \frac{1}{12}$
Now, checking the product of individual probabilities:
$P(E) \cdot P(F) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12}$
Since $P(E \cap F) = P(E) \cdot P(F)$, events $E$ and $F$ are Independent.
Case II: Drawing from a Deck of Cards
In a standard pack of 52 cards, let event $E$ be 'a card of spades is drawn' and $F$ be 'an ace is drawn'. The event $E \cap F$ denotes drawing the 'Ace of Spades'.
1. Since there are 13 spades in a deck:
$P(E) = \frac{13}{52} = \frac{1}{4}$
2. Since there are 4 aces in a deck:
$P(F) = \frac{4}{52} = \frac{1}{13}$
3. The Ace of Spades is a single unique card:
$P(E \cap F) = \frac{1}{52}$
By calculation, $P(E) \cdot P(F) = \frac{1}{4} \cdot \frac{1}{13} = \frac{1}{52}$. Thus, the events $E$ and $F$ are Independent.
Case III: Throwing a Single Die
When a die is thrown, let $E$ be the event 'the number is a multiple of 3' and $F$ be the event 'the number appearing is even'.
$S = \{1, 2, 3, 4, 5, 6\}$
$E = \{3, 6\}$ and $F = \{2, 4, 6\}$. Therefore, $E \cap F = \{6\}$.
$P(E) = \frac{2}{6} = \frac{1}{3}$
$P(F) = \frac{3}{6} = \frac{1}{2}$
$P(E \cap F) = \frac{1}{6}$
As $P(E) \cdot P(F) = \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{6} = P(E \cap F)$, the condition for independence is satisfied, so $E$ and $F$ are Independent.
Case IV: Tossing Two Coins
Let the sample space be $S = \{HH, HT, TH, TT\}$. We evaluate two different sets of events.
Example of Independent Outcomes
Let $E$ be 'head on first coin' and $F$ be 'head on second coin'.
$E = \{HH, HT\}$ and $F = \{HH, TH\}$. Thus, $E \cap F = \{HH\}$.
$P(E) = \frac{2}{4} = \frac{1}{2}$
$P(F) = \frac{2}{4} = \frac{1}{2}$
$P(E \cap F) = \frac{1}{4}$
Since $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$, $E$ and $F$ are Independent.
Example of Dependent Outcomes
Now let $A$ be 'at least one head' and $B$ be 'at least one tail'.
$A = \{HH, HT, TH\}$ and $B = \{HT, TH, TT\}$. Thus, $A \cap B = \{HT, TH\}$.
$P(A) = \frac{3}{4}$
$P(B) = \frac{3}{4}$
$P(A \cap B) = \frac{2}{4} = \frac{1}{2}$
Calculating the product of probabilities:
$P(A) \cdot P(B) = \frac{3}{4} \cdot \frac{3}{4} = \frac{9}{16}$
Since $P(A \cap B) \neq P(A) \cdot P(B)$ (because $1/2 \neq 9/16$), the events $A$ and $B$ are Not Independent.
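Because the test is purely numerical, it is easy to automate. The sketch below defines a hypothetical helper `is_independent` that checks $P(E \cap F) = P(E) \cdot P(F)$ on a finite set of equally likely outcomes. It then reruns Cases I to IV; the encodings of cards and coins are our own.

```python
from fractions import Fraction
from itertools import product

def is_independent(S: set, E: set, F: set) -> bool:
    """True if P(E ∩ F) == P(E) * P(F), assuming equally likely outcomes in S."""
    def p(X: set) -> Fraction:
        return Fraction(len(X), len(S))
    return p(E & F) == p(E) * p(F)

# Case I: a coin and a die together
coin_die = set(product("HT", range(1, 7)))
print(is_independent(coin_die, {o for o in coin_die if o[0] == "H"},
                     {o for o in coin_die if o[1] == 6}))             # True

# Case II: spade vs ace in a 52-card deck (rank 1 stands for the ace)
deck = set(product(range(1, 14), "SHDC"))
print(is_independent(deck, {c for c in deck if c[1] == "S"},
                     {c for c in deck if c[0] == 1}))                 # True

# Case III: multiple of 3 vs even number on one die
die = set(range(1, 7))
print(is_independent(die, {3, 6}, {2, 4, 6}))                         # True

# Case IV: two coins
coins = {"HH", "HT", "TH", "TT"}
print(is_independent(coins, {"HH", "HT"}, {"HH", "TH"}))              # True
print(is_independent(coins, {"HH", "HT", "TH"}, {"HT", "TH", "TT"}))  # False
```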
Conclusion
It is important to emphasize that physical separation of trials does not always guarantee independence, and single trials can sometimes contain independent events. Only through numerical calculation of probabilities and checking if $P(E \cap F) = P(E) \cdot P(F)$ can one accurately conclude whether events are independent.
Further Remarks on Independent Events
In the study of probability, understanding the nuances between independence and other set relations is crucial. While we have established the basic definition of independence, further classification is required when dealing with multiple events or comparing independence with mutual exclusivity.
1. Dependent Events
Two events $A$ and $B$ are said to be dependent if they are not independent. Mathematically, this means the probability of their simultaneous occurrence does not equal the product of their individual probabilities.
$P(A \cap B) \neq P(A) \cdot P(B)$
[Condition for Dependence]
2. Independence of Three or More Events
Three events $A, B$, and $C$ are said to be independent (or mutually independent) if and only if they satisfy the following four conditions simultaneously:
$P(A \cap B) = P(A) \cdot P(B)$
$P(A \cap C) = P(A) \cdot P(C)$
$P(B \cap C) = P(B) \cdot P(C)$
$P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C)$
If at least one of the above four conditions is not true for the three given events, the events are not independent. It is a common mistake to check only the fourth condition; however, pairwise independence (the first three conditions) is also mandatory for mutual independence.
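The four conditions really are separate requirements. The sketch below evaluates them for two counterexamples we chose for illustration; the helper name `independence_checks` is our own. In the first, all three pairwise conditions hold but the triple product fails. In the second, only the triple product holds.

```python
from fractions import Fraction
from itertools import combinations

def independence_checks(S: set, A: set, B: set, C: set) -> list:
    """Checks in the order (A,B), (A,C), (B,C), (A,B,C), equally likely outcomes."""
    def p(X: set) -> Fraction:
        return Fraction(len(X), len(S))
    pairwise = [p(X & Y) == p(X) * p(Y) for X, Y in combinations((A, B, C), 2)]
    return pairwise + [p(A & B & C) == p(A) * p(B) * p(C)]

# Two fair coins: head on 1st, head on 2nd, both coins show the same face
coins = {"HH", "HT", "TH", "TT"}
print(independence_checks(coins, {"HH", "HT"}, {"HH", "TH"}, {"HH", "TT"}))
# [True, True, True, False]: pairwise independent, not mutually independent

# Eight equally likely outcomes: the triple product holds, every pair fails
S8 = set(range(1, 9))
print(independence_checks(S8, {1, 2, 3, 4}, {1, 5, 6, 7}, {1, 5, 6, 8}))
# [False, False, False, True]
```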
3. Independent Events vs Mutually Exclusive Events
Students often confuse these two concepts, but they are fundamentally different in nature:
Mutually Exclusive Events: These are defined in terms of subsets and outcomes. Two events $A$ and $B$ are mutually exclusive if they have no common outcomes, i.e., $A \cap B = \emptyset$. This implies that $P(A \cap B) = 0$.
Independent Events: These are defined in terms of probabilities. Two events $A$ and $B$ are independent if the occurrence of one does not affect the probability of the other, i.e., $P(A \cap B) = P(A) \cdot P(B)$.
| Basis of Comparison | Mutually Exclusive Events | Independent Events |
|---|---|---|
| Definition | No common outcomes ($A \cap B = \emptyset$) | $P(A \cap B) = P(A) \cdot P(B)$ |
| Focus | Commonality of elements | Probability of occurrence |
| Key Equation | $P(A \cap B) = 0$ | $P(A \cap B) = P(A) \cdot P(B)$ |
4. Relationship between Non-Zero Probability Events
If two events $A$ and $B$ have non-zero probabilities ($P(A) > 0$ and $P(B) > 0$), then they cannot be both independent and mutually exclusive at the same time.
If they are independent, then $P(A \cap B) = P(A) \cdot P(B)$. Since $P(A) > 0$ and $P(B) > 0$, their product must be greater than zero. Thus, $P(A \cap B) \neq 0$, which means they must have some common outcomes ($A \cap B \neq \emptyset$). Therefore, they cannot be mutually exclusive.
Formal Proof: Mutually Exclusive events are not Independent
Given:
Two events $A$ and $B$ such that $P(A) > 0$ and $P(B) > 0$.
$A$ and $B$ are mutually exclusive, i.e., $A \cap B = \emptyset$.
To Prove:
$A$ and $B$ cannot be independent.
Proof:
Since $A$ and $B$ are mutually exclusive:
$P(A \cap B) = 0$
For $A$ and $B$ to be independent, the following must hold:
$P(A \cap B) = P(A) \cdot P(B)$
However, we are given $P(A) > 0$ and $P(B) > 0$. The product of two positive numbers is always positive:
$P(A) \cdot P(B) > 0$
From the above equations, we see that:
$P(A \cap B) \neq P(A) \cdot P(B)$
[Since $0 \neq$ Positive Value]
Hence, if two events with non-zero probabilities are mutually exclusive, they are necessarily dependent.
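A quick numerical illustration of the proof, assuming one roll of a fair die and two hypothetical events $A = \{1, 2\}$ and $B = \{3, 4\}$ chosen only for this sketch:

```python
from fractions import Fraction

S = set(range(1, 7))  # outcomes of one roll of a fair die

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

A = {1, 2}  # hypothetical event: roll is 1 or 2
B = {3, 4}  # hypothetical event: roll is 3 or 4

print(A & B)                             # set(): mutually exclusive
print(prob(A & B))                       # 0
print(prob(A) * prob(B))                 # 1/9
print(prob(A & B) == prob(A) * prob(B))  # False: the events are dependent
```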
Independence of Complementary Events
An essential theorem in probability states that if two events are independent, then each event is also independent of the complement of the other (the event that the other does not occur), and the two complements are independent of each other.
The Theorem
If $A$ and $B$ are independent events associated with a random experiment, then the following pairs of events are also independent:
(i) $A$ and $\overline{B}$
(ii) $\overline{A}$ and $B$
(iii) $\overline{A}$ and $\overline{B}$
Proof of the Theorem
Given: $A$ and $B$ are independent events.
$P(A \cap B) = P(A) \cdot P(B)$
…(i)
Part (i): Independence of $A$ and $\overline{B}$
To prove that $A$ and $\overline{B}$ are independent, we must show that $P(A \cap \overline{B}) = P(A) \cdot P(\overline{B})$.
From the set theory perspective, the event $A$ can be decomposed into two mutually disjoint sets: $A \cap B$ and $A \cap \overline{B}$ (often denoted as $A-B$).
Mathematically, we express this as:
$A = (A \cap B) \cup (A \cap \overline{B})$
Since $(A \cap B)$ and $(A \cap \overline{B})$ are mutually disjoint, we apply the addition rule of probability:
$P(A) = P(A \cap B) + P(A \cap \overline{B})$
…(ii)
Rearranging the equation to isolate $P(A \cap \overline{B})$:
$P(A \cap \overline{B}) = P(A) - P(A \cap B)$
Substituting the condition for independence from (i):
$P(A \cap \overline{B}) = P(A) - P(A) \cdot P(B)$
Factoring out $P(A)$:
$P(A \cap \overline{B}) = P(A) [1 - P(B)]$
Using the property of complements where $1 - P(B) = P(\overline{B})$:
$P(A \cap \overline{B}) = P(A) \cdot P(\overline{B})$
[Events A and not B are Independent]
Part (ii): Independence of $\overline{A}$ and $B$
This proof follows a similar logic by decomposing event $B$ instead of event $A$:
$B = (B \cap A) \cup (B \cap \overline{A})$
$P(B) = P(B \cap A) + P(B \cap \overline{A})$
$P(\overline{A} \cap B) = P(B) - P(A \cap B)$
Applying $P(A \cap B) = P(A)P(B)$:
$P(\overline{A} \cap B) = P(B) - P(A) \cdot P(B)$
$P(\overline{A} \cap B) = P(B) [1 - P(A)]$
$P(\overline{A} \cap B) = P(\overline{A}) \cdot P(B)$
[Hence proved]
Part (iii): Independence of $\overline{A}$ and $\overline{B}$
This can be proved using the results from Parts (i) and (ii). Since $A$ and $B$ are independent, Part (i) shows that $A$ and $\overline{B}$ are independent. Now apply Part (ii) to the independent pair $A$ and $\overline{B}$ (with $\overline{B}$ playing the role of $B$): it follows that $\overline{A}$ and $\overline{B}$ are independent.
Alternatively, using De Morgan's Law:
$P(\overline{A} \cap \overline{B}) = P(\overline{A \cup B}) = 1 - P(A \cup B)$
Using the addition theorem together with (i): $P(A \cup B) = P(A) + P(B) - P(A)P(B)$
$P(\overline{A} \cap \overline{B}) = 1 - [P(A) + P(B) - P(A)P(B)]$
$P(\overline{A} \cap \overline{B}) = 1 - P(A) - P(B) + P(A)P(B)$
$P(\overline{A} \cap \overline{B}) = (1 - P(A))(1 - P(B))$
$P(\overline{A} \cap \overline{B}) = P(\overline{A}) \cdot P(\overline{B})$
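The theorem can be checked on one concrete case (a check, not a proof). This sketch assumes a fair die with the illustrative events $A$ = "even number" and $B$ = "number at most 2", which are independent since $P(A \cap B) = \frac{1}{6} = \frac{1}{2} \cdot \frac{1}{3}$:

```python
from fractions import Fraction

S = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(S))

def complement(event):
    """The event 'not E' within the sample space S."""
    return S - event

def independent(X, Y):
    """True when P(X and Y) = P(X) P(Y)."""
    return prob(X & Y) == prob(X) * prob(Y)

A = {2, 4, 6}  # illustrative: even number
B = {1, 2}     # illustrative: number at most 2

assert independent(A, B)  # the hypothesis of the theorem
results = {
    "A, not B": independent(A, complement(B)),
    "not A, B": independent(complement(A), B),
    "not A, not B": independent(complement(A), complement(B)),
}
print(results)  # all three pairs report True
```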
The Law of Total Probability
The Law of Total Probability is a fundamental rule in probability theory that relates marginal probabilities to conditional probabilities. It provides a method to calculate the probability of an event $A$ by considering all possible ways it can occur through a set of mutually exclusive and exhaustive events.
In many real-world scenarios, an event $A$ can happen under several different conditions or "paths." If we know the probability of each condition and the probability of $A$ occurring under each condition, we can find the total probability of $A$.
Partition of a Sample Space
To apply the Law of Total Probability, we first need a partition of the sample space $S$. A set of events $E_1, E_2, \dots, E_n$ is said to form a partition of $S$ if they satisfy the following three conditions:
| Condition Type | Mathematical Requirement |
|---|---|
| Mutually Exclusive | $E_i \cap E_j = \phi$ for all $i \neq j$ |
| Exhaustive | $E_1 \cup E_2 \cup \dots \cup E_n = S$ |
| Non-zero Probability | $P(E_i) > 0$ for all $i$ |
Statement of the Theorem
Let $\{E_1, E_2, \dots, E_n\}$ be a partition of the sample space $S$. Let $A$ be any event associated with $S$. Then the probability of event $A$ is given by:
$P(A) = P(E_1)P(A|E_1) + P(E_2)P(A|E_2) + \dots + P(E_n)P(A|E_n)$
Using the summation notation, this can be expressed as:
$P(A) = \sum\limits_{i=1}^{n} P(E_i)P(A|E_i)$
Derivation of the Law of Total Probability
Step 1: Since $E_1, E_2, \dots, E_n$ are exhaustive events of $S$, we have:
$S = E_1 \cup E_2 \cup \dots \cup E_n$
…(i)
Step 2: We can write event $A$ as the intersection of $A$ and the sample space $S$:
$A = A \cap S$
(Property of Sets)
Substituting the value of $S$ from equation (i):
$A = A \cap (E_1 \cup E_2 \cup \dots \cup E_n)$
Step 3: Using the distributive law of intersection over union:
$A = (A \cap E_1) \cup (A \cap E_2) \cup \dots \cup (A \cap E_n)$
…(ii)
Step 4: Since $E_i$ and $E_j$ are disjoint for $i \neq j$, the intersections $(A \cap E_i)$ and $(A \cap E_j)$ are also disjoint. Applying the addition rule for mutually exclusive events:
$P(A) = P(A \cap E_1) + P(A \cap E_2) + \dots + P(A \cap E_n)$
…(iii)
Step 5: By the Multiplication Rule of probability, we know that for each $i$:
$P(A \cap E_i) = P(E_i)P(A|E_i)$
Substituting these values back into equation (iii), we get the final formula:
$P(A) = P(E_1)P(A|E_1) + P(E_2)P(A|E_2) + \dots + P(E_n)P(A|E_n)$
Example. In a mobile manufacturing plant in Noida, three machines $M_1, M_2$, and $M_3$ produce 25%, 35%, and 40% of the total handsets respectively. Out of their outputs, 5%, 4%, and 2% are defective. If a handset is chosen at random, find the total probability that it is defective.
Answer:
Given:
Let $E_1, E_2, E_3$ be the events that the handset is produced by machines $M_1, M_2, M_3$ respectively. Let $A$ be the event that the handset is defective.
$P(E_1) = 0.25, \;\; P(E_2) = 0.35, \;\; P(E_3) = 0.40$
The conditional probabilities of producing a defective handset are:
$P(A|E_1) = 0.05, \;\; P(A|E_2) = 0.04, \;\; P(A|E_3) = 0.02$
Solution:
By the Law of Total Probability:
$P(A) = P(E_1)P(A|E_1) + P(E_2)P(A|E_2) + P(E_3)P(A|E_3)$
Substituting the values:
$P(A) = (0.25 \times 0.05) + (0.35 \times 0.04) + (0.40 \times 0.02)$
$P(A) = 0.0125 + 0.0140 + 0.0080$
$P(A) = 0.0345$
The total probability that a randomly chosen handset is defective is 0.0345 (or 3.45%).
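The arithmetic above can be reproduced with a short sketch. The function name `total_probability` is hypothetical; exact fractions are used (an assumption of this sketch) so that $0.0345$ appears as $\frac{69}{2000}$ without rounding.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(A) = sum of P(E_i) * P(A|E_i) over a partition E_1, ..., E_n."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per event of the partition")
    if sum(priors) != 1:
        raise ValueError("priors must sum to 1 (the E_i must be exhaustive)")
    return sum(p * l for p, l in zip(priors, likelihoods))

# Noida handset data: machine shares P(E_i) and defect rates P(A|E_i).
priors = [Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)]
likelihoods = [Fraction(5, 100), Fraction(4, 100), Fraction(2, 100)]

p_defective = total_probability(priors, likelihoods)
print(p_defective, float(p_defective))  # 69/2000 0.0345
```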
Bayes' Theorem
Bayes' Theorem, named after the British mathematician Thomas Bayes, provides a way to revise existing probabilities based on new evidence. It is fundamentally used to find the "probability of causes" given that an effect has already occurred. This is often referred to as finding the posteriori probability.
Statement of the Theorem
If $E_1, E_2, E_3, \dots, E_n$ are $n$ mutually exclusive and exhaustive events associated with a random experiment, and $A$ is any event of non-zero probability associated with the same experiment, then for any $i = 1, 2, 3, \dots, n$:
$P(E_i|A) = \frac{P(E_i) P(A|E_i)}{\sum\limits_{j=1}^{n} P(E_j) P(A|E_j)}$
Derivation and Proof
Given:
(i) $E_1, E_2, \dots, E_n$ form a partition of the sample space $S$. Therefore, $\bigcup\limits_{i=1}^{n} E_i = S$ and $E_i \cap E_j = \phi$ for $i \neq j$.
(ii) $A$ is an event such that $P(A) > 0$.
To Prove:
$P(E_i|A) = \frac{P(E_i) P(A|E_i)}{\sum\limits_{j=1}^{n} P(E_j) P(A|E_j)}$
Proof:
By the definition of conditional probability, we have:
$P(E_i|A) = \frac{P(E_i \cap A)}{P(A)}$
... (i)
According to the Multiplication Law of Probability, we know that:
$P(E_i \cap A) = P(E_i) P(A|E_i)$
... (ii)
By the Law of Total Probability, the probability of event $A$ is the sum of the probabilities of $A$ occurring together with each member $E_j$ of the partition:
$P(A) = \sum\limits_{j=1}^{n} P(E_j \cap A)$
[Law of Total Probability]
Substituting the multiplication law into the total probability expression:
$P(A) = \sum\limits_{j=1}^{n} P(E_j) P(A|E_j)$
... (iii)
Finally, substituting equations (ii) and (iii) into equation (i), we obtain the formula for Bayes' Theorem:
$P(E_i|A) = \frac{P(E_i) P(A|E_i)}{\sum\limits_{j=1}^{n} P(E_j) P(A|E_j)}$
[Hence Proved]
Important Terminology
In the study of Probability Theory, especially when applying Bayes' Theorem, it is crucial to understand how information evolves. The theorem acts as a mathematical bridge between initial beliefs and updated knowledge based on observed evidence.
1. A Priori Probability (Prior Probability)
The term A Priori is a Latin phrase meaning "from the earlier." In the context of Bayes' Theorem, these are the probabilities assigned to the events $E_1, E_2, \dots, E_n$ before any additional information or evidence (event $A$) is obtained.
These probabilities are usually based on historical data, past experience, or existing knowledge. For example, if a doctor in a clinic in Mumbai knows from medical records that $1\%$ of the population generally has a specific condition, that $1\%$ is the a priori probability.
$P(E_i)$
[Initial belief/Prior knowledge]
2. Likelihood (Conditional Probability)
The term $P(A|E_i)$ is known as the Likelihood. It represents the probability of observing the evidence $A$ given that the specific event $E_i$ has already occurred. It measures how well the evidence "fits" a particular hypothesis.
$P(A|E_i)$
[Likelihood of evidence $A$ under $E_i$]
3. Evidence or Marginal Likelihood
The denominator in Bayes' formula, $\sum P(E_j)P(A|E_j)$, is the Total Probability of the event $A$ occurring across all possible mutually exclusive scenarios. It acts as a normalizing constant to ensure that the sum of all posterior probabilities equals $1$.
$P(A) = \sum\limits_{j=1}^{n} P(E_j) P(A|E_j)$
[Total Evidence]
4. Posteriori Probability (Posterior Probability)
The term Posteriori is a Latin phrase meaning "from the later." This is the revised or updated probability of event $E_i$ occurring after the evidence $A$ has been observed. It is the core output of Bayes' Theorem.
Consider the Indian Meteorological Department (IMD) predicting rainfall. The initial prediction is the a priori probability. Once they observe certain atmospheric pressure changes (event $A$), the revised chance of rain becomes the posteriori probability.
$P(E_i|A)$
[Revised belief/Updated knowledge]
Comparative Analysis of Priori vs Posteriori
| Feature | A Priori Probability | Posteriori Probability |
|---|---|---|
| Meaning | "From the earlier" (Before observation) | "From the later" (After observation) |
| Basis | Historical records or general statistics. | Specific evidence or test results. |
| Mathematical Symbol | $P(E_i)$ | $P(E_i|A)$ |
| Use of Evidence | General; does not use the observed evidence. | Specific; incorporates the observed evidence $A$. |
| Role in Bayes' Theorem | Acts as an Input. | Acts as the Output (Result). |
Step-by-Step Evolution of Knowledge
The process of moving from A Priori to Posteriori can be visualized in the following logical sequence:
Step 1: Identify all mutually exclusive and exhaustive causes ($E_1, E_2, \dots$).
Step 2: Assign initial weights to these causes based on history $\rightarrow$ A Priori Probabilities.
Step 3: Determine how likely the new evidence $A$ is for each cause $\rightarrow$ Likelihoods.
Step 4: Calculate the total probability of observing $A$ $\rightarrow$ Evidence.
Step 5: Use Bayes' formula to calculate the revised probability $\rightarrow$ Posteriori Probability.
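The five steps translate directly into a small function. This is a sketch with a hypothetical name (`bayes_posteriors`); it assumes the causes are given as parallel lists of priors and likelihoods and uses exact fractions. The sample data are the machine shares and defect rates from the Noida handset example in the Law of Total Probability section.

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Return (P(A), [P(E_i|A), ...]) for a partition E_1..E_n and evidence A."""
    # Step 2 check: a priori probabilities of a partition must sum to 1.
    if sum(priors) != 1:
        raise ValueError("priors must sum to 1")
    # Steps 3-4: joint terms P(E_i) P(A|E_i) and their sum P(A), the evidence.
    joints = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joints)
    if evidence == 0:
        raise ValueError("P(A) must be non-zero")
    # Step 5: each posterior is its joint term divided by the evidence.
    return evidence, [j / evidence for j in joints]

# Shares 25%, 35%, 40% and defect rates 5%, 4%, 2% as exact fractions.
priors = [Fraction(1, 4), Fraction(7, 20), Fraction(2, 5)]
likelihoods = [Fraction(1, 20), Fraction(1, 25), Fraction(1, 50)]

evidence, posteriors = bayes_posteriors(priors, likelihoods)
print(evidence)                      # 69/2000
print([str(p) for p in posteriors])  # ['25/69', '28/69', '16/69']
```

The posteriors always sum to 1, which is exactly the normalizing role of the evidence $P(A)$ described above.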
Example 1. In a mobile manufacturing factory in Noida, three machines $M_1$, $M_2$, and $M_3$ produce $25\%$, $35\%$, and $40\%$ of the total mobile handsets respectively. Out of their outputs, $5\%$, $4\%$, and $2\%$ are defective respectively. A handset is chosen at random and is found to be defective.
What is the probability that it was manufactured by machine $M_2$?
Answer:
Let $E_1, E_2, E_3$ be the events that the handset is manufactured by machines $M_1, M_2, M_3$ respectively. Let $A$ be the event that the handset is defective.
Given:
$P(E_1) = 25\% = 0.25$
$P(E_2) = 35\% = 0.35$
$P(E_3) = 40\% = 0.40$
The conditional probabilities of producing a defective handset are:
$P(A|E_1) = 5\% = 0.05$
$P(A|E_2) = 4\% = 0.04$
$P(A|E_3) = 2\% = 0.02$
To Find: $P(E_2|A)$
Using Bayes' Theorem:
$P(E_2|A) = \frac{P(E_2)P(A|E_2)}{P(E_1)P(A|E_1) + P(E_2)P(A|E_2) + P(E_3)P(A|E_3)}$
$P(E_2|A) = \frac{0.35 \times 0.04}{(0.25 \times 0.05) + (0.35 \times 0.04) + (0.40 \times 0.02)}$
$P(E_2|A) = \frac{0.0140}{0.0125 + 0.0140 + 0.0080}$
$P(E_2|A) = \frac{0.0140}{0.0345}$
$P(E_2|A) = \frac{140}{345} = \frac{28}{69}$
Result: The probability that the defective handset was manufactured by machine $M_2$ is $\frac{28}{69}$.
Example 2. In a manufacturing unit, two machines $M_1$ and $M_2$ are used to produce items. Machine $M_1$ produces $60\%$ of the total products, while Machine $M_2$ produces $40\%$. It is known from past data that $1\%$ of items produced by $M_1$ are defective and $2\%$ of items produced by $M_2$ are defective.
If an item is picked at random and found to be defective, what is the probability that it was produced by Machine $M_1$?
Answer:
Step 1: Define the Events
Let $E_1$ be the event that the product is manufactured by Machine $M_1$.
Let $E_2$ be the event that the product is manufactured by Machine $M_2$.
Let $A$ be the event that the product is defective.
Step 2: State the Given Information
The A Priori probabilities (initial production shares) are:
$P(E_1) = 60\% = 0.60$
... (i)
$P(E_2) = 40\% = 0.40$
... (ii)
The Likelihoods (conditional defect rates) are:
$P(A|E_1) = 1\% = 0.01$
[Defect rate of $M_1$]
$P(A|E_2) = 2\% = 0.02$
[Defect rate of $M_2$]
Step 3: Calculate the Total Probability of Defect
Before finding the specific cause, we calculate the total probability that any randomly selected item is defective using the Law of Total Probability:
$P(A) = P(E_1) \cdot P(A|E_1) + P(E_2) \cdot P(A|E_2)$
Substituting the values:
$P(A) = (0.60)(0.01) + (0.40)(0.02)$
$P(A) = 0.006 + 0.008 = 0.014$
$P(A) = 1.4\%$
[Total defect probability]
Step 4: Applying Bayes' Theorem (Finding Posteriori Probability)
We need to find $P(E_1|A)$, which is the probability that the item came from $M_1$ given that it is defective.
$P(E_1|A) = \frac{P(E_1) \cdot P(A|E_1)}{P(A)}$
$P(E_1|A) = \frac{0.60 \times 0.01}{0.014}$
$P(E_1|A) = \frac{0.006}{0.014} = \frac{6}{14} = \frac{3}{7}$
Converting to percentage:
$P(E_1|A) \approx 0.4286 \approx 43\%$
Similarly, for Machine $M_2$:
$P(E_2|A) = \frac{0.008}{0.014} = \frac{8}{14} = \frac{4}{7} \approx 57\%$
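The calculation in Example 2 can be checked with a few lines, assuming the given percentages are stored as exact fractions (the dictionary keys are illustrative labels):

```python
from fractions import Fraction

# Example 2: M1 makes 60% with 1% defective; M2 makes 40% with 2% defective.
priors = {"M1": Fraction(60, 100), "M2": Fraction(40, 100)}
defect_rates = {"M1": Fraction(1, 100), "M2": Fraction(2, 100)}

# Law of Total Probability: P(A) = sum of P(E_i) P(A|E_i).
p_defect = sum(priors[m] * defect_rates[m] for m in priors)

# Bayes' Theorem: P(E_i|A) = P(E_i) P(A|E_i) / P(A).
posterior = {m: priors[m] * defect_rates[m] / p_defect for m in priors}

print(p_defect)                          # 7/500
print(posterior["M1"], posterior["M2"])  # 3/7 4/7
print(round(float(posterior["M1"]), 4))  # 0.4286
```

Note that although $M_1$ produces more items, a defective item is more likely to have come from $M_2$, because $M_2$'s higher defect rate outweighs its smaller share.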
Random Variables
In the study of probability, we often want to assign numerical values to the outcomes of a random experiment. A Random Variable (also known as a stochastic variable) serves this purpose by acting as a mathematical bridge between sample space outcomes and real numbers.
Formal Definition
In probabilistic modeling, the transition from qualitative outcomes to quantitative values is governed by specific mathematical constructs. Below are the formal definitions for these terms as they appear in statistical theory.
1. Random Function (Stochastic Function)
When we conduct a random experiment, the results are often non-numerical (e.g., "Pass" or "Fail"). To analyze these results mathematically, we assign a real number to each individual outcome in the sample space. This assignment creates a function defined on the sample space.
A Random Function (also known as a Stochastic Function) is a rule that assigns a unique real number to every sample point $w$ belonging to the sample space $S$.
$f: S \to \mathbb{R}$
[Technical Notation]
2. Random Variable (Stochastic Variable)
The term Random Variable is the common name used in modern probability for a Random Function. Although the word "variable" is used, it is mathematically a function. It serves to quantify the outcomes of a random experiment.
Formal Definition:
A Random Variable is a function whose domain is the sample space $S$ of a random experiment and whose range is the set of real numbers $\mathbb{R}$ or a specific subset of real numbers.
1. The Domain: This is the Sample Space ($S$), which is the set of all possible qualitative or quantitative outcomes of an experiment.
2. The Range: This is a subset of Real Numbers ($\mathbb{R}$).
Mathematically, we denote this relationship as:
$X: S \to \mathbb{R}$
If $w$ is a specific outcome in $S$, then $X(w) = x$, where $x$ is a real number. This mapping is visualized below:
As shown in the image, the random variable $X$ acts as a bridge, taking an abstract outcome $w$ from the sample space and "mapping" it to a specific point on the real number line.
Mathematical Requirements for a Valid Definition
For a mapping to be considered a proper Random Variable, it must satisfy the following conditions:
1. Single-Valued: For every outcome $w \in S$, there must be exactly one real number $X(w)$. For example, if $X$ denotes the runs scored off a single ball in an IPL match, the same ball cannot be assigned both 4 runs and 6 runs.
2. Domain Coverage: The function must be defined for every possible outcome in the sample space $S$.
3. Measurability: For any real number $x$, the set of outcomes $\{w : X(w) \leq x\}$ must be an event whose probability can be calculated.
$P(X \leq x)$
[Probability Distribution Function]
Key Characteristics
1. Reliance on Chance
The value $X$ takes is determined only after the experiment is performed. Since the outcome of a random experiment cannot be predicted with certainty, the value of the random variable is governed by probability.
2. Real-Valued Output
The requirement that the range must be a subset of $\mathbb{R}$ is vital. It allows us to apply arithmetic operations, find averages, and use calculus to analyze probabilities. Without this numerical conversion, we could not calculate the "average" of outcomes like "Rain" or "No Rain".
$X(w) \in (-\infty, \infty)$
[Condition for Real-Valued Function]
Multiple Variables on One Sample Space
A single sample space can support an infinite number of random variables. Each variable represents a different way of looking at the same data.
Suppose a student appears for a Mock Test consisting of 5 Multiple Choice Questions (MCQs) and answers every question. The sample space $S$ contains all possible combinations of Correct ($C$) and Incorrect ($I$) answers.
We can define two different random variables on this $S$:
1. Variable $X$: The number of correct answers (Values: $0, 1, 2, 3, 4, 5$).
2. Variable $Y$: The total marks obtained (if $+4$ for correct and $-1$ for incorrect).
$Y = 4X - 1(5 - X)$
[Relationship between two variables]
Even though $X$ and $Y$ describe the same test performance, they are different functions because they map the same outcomes to different real numbers.
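Both random variables can be defined as ordinary functions on the same sample space. This sketch assumes, as the $C$/$I$ sample space implies, that every question is answered; the function names `X` and `Y` simply mirror the notation above.

```python
from itertools import product

# Sample space: each of 5 MCQs is Correct (C) or Incorrect (I); 2**5 = 32 outcomes.
S = list(product("CI", repeat=5))

def X(w):
    """Number of correct answers in outcome w."""
    return w.count("C")

def Y(w):
    """Total marks: +4 per correct answer, -1 per incorrect answer."""
    return 4 * w.count("C") - 1 * w.count("I")

print(len(S))                     # 32
print(sorted({X(w) for w in S}))  # [0, 1, 2, 3, 4, 5]
print(sorted({Y(w) for w in S}))  # [-5, 0, 5, 10, 15, 20]
# On every outcome, Y = 4X - (5 - X) = 5X - 5.
print(all(Y(w) == 5 * X(w) - 5 for w in S))  # True
```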
Illustrative Examples:
Example 1: Tossing Two Coins
Consider an experiment where two unbiased coins are tossed simultaneously. The sample space $S$ is given by:
$S = \{HH, HT, TH, TT\}$
Case 1: Let $X$ be the number of heads
In this case, $X$ is a random variable that maps each outcome to the count of 'H'.
| Outcome ($w$) | Value of $X(w)$ |
|---|---|
| $HH$ | $2$ |
| $HT$ | $1$ |
| $TH$ | $1$ |
| $TT$ | $0$ |
Here, the Range of $X$ is $\{0, 1, 2\}$.
Case 2: Let $Y$ be the number of tails
Similarly, $Y$ maps each outcome to the count of 'T'.
$Y(HH) = 0$
$Y(HT) = 1$
$Y(TH) = 1$
$Y(TT) = 2$
Observation: Even though the ranges of $X$ and $Y$ are identical (both are $\{0, 1, 2\}$), the functions are not the same ($X \neq Y$) because they assign different values to the same outcome. For instance, $X(HH) = 2$ while $Y(HH) = 0$.
Example 2: Drawing White Balls from a Bag
Consider a bag in a school laboratory in Kolkata containing balls of three different colors:
• Red Balls ($R$): 5
• Black Balls ($B$): 3
• White Balls ($W$): 4
Total number of balls in the bag = $5 + 3 + 4 = 12$
The experiment consists of drawing three balls simultaneously from the bag. This is a problem of Combinations, as the order of drawing does not matter.
1. The Domain (Sample Space $S$)
The domain consists of all possible sets of 3 balls that can be picked from the 12 available. The total number of outcomes (elements in the domain) is given by:
$n(S) = {}^{12}C_{3}$
Calculating the value:
$n(S) = \frac{12 \times 11 \times 10}{3 \times 2 \times 1} = 220$
[Total sample points]
Each sample point $w$ is a triplet, such as $\{R_1, R_2, W_1\}$ or $\{W_1, W_2, W_3\}$.
2. The Mapping Rule
The function $X$ looks at each triplet $w$ and counts how many white balls ($W$) are present in that specific triplet. The Range of $X$ is determined by the maximum and minimum number of white balls possible in a draw of three.
• Minimum: 0 (if all three balls are Red or Black).
• Maximum: 3 (since we are drawing 3 balls and 4 white balls are available).
Thus, the Range of $X = \{0, 1, 2, 3\}$.
Functional Mapping Examples
Let us look at how the random variable $X$ maps specific outcomes $w$ to real numbers $x$:
Case A: No White Balls Drawn
If we draw 2 Red balls and 1 Black ball, represented as $w = \{R_1, R_2, B_1\}$:
$X(R_1, R_2, B_1) = 0$
[Since count of $W$ is zero]
Case B: Exactly One White Ball Drawn
If we draw 1 White ball and 2 Black balls, represented as $w = \{W_1, B_1, B_2\}$:
$X(W_1, B_1, B_2) = 1$
[Since count of $W$ is one]
Case C: All White Balls Drawn
If we draw 3 White balls, represented as $w = \{W_1, W_2, W_3\}$:
$X(W_1, W_2, W_3) = 3$
[Since count of $W$ is three]
Visual Representation: The following diagram illustrates the mapping from the set of all possible triplets (Sample Space) to the numerical values (Range).
As observed in the image, many different outcomes from the sample space map to the same real number. For example, both $\{R_1, R_2, R_3\}$ and $\{B_1, B_2, B_3\}$ map to $0$. This is a many-to-one function, which is a common characteristic of random variables.
Summary Table of the Random Variable $X$
| Attribute | Description |
|---|---|
| Experiment | Drawing 3 balls from 12 (5R, 3B, 4W) |
| Random Variable $X$ | Number of white balls in the draw |
| Domain | Set of ${}^{12}C_3$ = 220 possible triplets |
| Range | $\{0, 1, 2, 3\}$ |
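The domain and range in the table can be confirmed by brute-force enumeration. This sketch assumes the balls are labelled $R_1, \dots, R_5$, $B_1, B_2, B_3$, $W_1, \dots, W_4$ as in the notes; it also counts how many triplets map to each value, which shows the many-to-one nature of $X$ numerically.

```python
from collections import Counter
from itertools import combinations

# Label the 12 balls: R1..R5, B1..B3, W1..W4.
balls = ([f"R{i}" for i in range(1, 6)]
         + [f"B{i}" for i in range(1, 4)]
         + [f"W{i}" for i in range(1, 5)])

# Domain: every unordered triplet of distinct balls.
S = list(combinations(balls, 3))

def X(w):
    """Number of white balls in the triplet w."""
    return sum(1 for ball in w if ball.startswith("W"))

counts = Counter(X(w) for w in S)
print(len(S))                        # 220
print(sorted(counts))                # [0, 1, 2, 3]
print(dict(sorted(counts.items())))  # {0: 56, 1: 112, 2: 48, 3: 4}
```

The counts agree with combinations: for example, $X = 0$ needs all three balls from the 8 non-white ones, giving ${}^{8}C_{3} = 56$ triplets.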
Example 3: Numerical Transformation (Half the Square of the Die Face)
The experiment consists of rolling a six-sided die once and observing the number on the uppermost face.
Domain (Sample Space $S$):
The set of all possible outcomes is:
$S = \{1, 2, 3, 4, 5, 6\}$
Defining the Transformation Function $Z$
In this specific case, the random variable $Z$ is defined by a mathematical rule rather than a direct count. The rule is: "The value of $Z$ is equal to half of the square of the number appearing on the die."
Mathematically, the rule (function) can be expressed as:
$Z(w) = \frac{w^2}{2}$
where $w \in S$
Step-by-Step Mapping and Calculation
By applying the rule in the above equation to every element in the domain $S$, we calculate the following real-valued outputs:
$Z(1) = \frac{1^2}{2} = \frac{1}{2} = 0.5$
(For outcome 1)
$Z(2) = \frac{2^2}{2} = \frac{4}{2} = 2$
(For outcome 2)
$Z(3) = \frac{3^2}{2} = \frac{9}{2} = 4.5$
(For outcome 3)
$Z(4) = \frac{4^2}{2} = \frac{16}{2} = 8$
(For outcome 4)
$Z(5) = \frac{5^2}{2} = \frac{25}{2} = 12.5$
(For outcome 5)
$Z(6) = \frac{6^2}{2} = \frac{36}{2} = 18$
(For outcome 6)
Visual Mapping and Range: The Range of $Z$ is the set of all calculated values:
$Range(Z) = \{0.5, 2, 4.5, 8, 12.5, 18\}$
As seen in the visual representation, each outcome $w$ on the left side (Sample Space) is linked to a unique real number on the right side (Real Line) via the specific rule $Z(w)$. This confirms that $Z$ is a well-defined function.
Tabular Summary
The relationship between the die face and the random variable $Z$ can be summarized in the following table:
| Die Face ($w$) | Calculation ($\frac{w^2}{2}$) | Value of $Z(w)$ |
|---|---|---|
| 1 | $1 \div 2$ | $0.5$ |
| 2 | $4 \div 2$ | $2$ |
| 3 | $9 \div 2$ | $4.5$ |
| 4 | $16 \div 2$ | $8$ |
| 5 | $25 \div 2$ | $12.5$ |
| 6 | $36 \div 2$ | $18$ |
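The rule $Z(w) = \frac{w^2}{2}$ can be applied to every face in a short sketch; `Fraction` is used here (a choice of this sketch) so that values such as $\frac{9}{2}$ stay exact.

```python
from fractions import Fraction

S = [1, 2, 3, 4, 5, 6]  # faces of the die

def Z(w):
    """Half the square of the face value, kept exact with Fraction."""
    return Fraction(w * w, 2)

mapping = {w: Z(w) for w in S}
print([str(z) for z in mapping.values()])    # ['1/2', '2', '9/2', '8', '25/2', '18']
print([float(z) for z in mapping.values()])  # [0.5, 2.0, 4.5, 8.0, 12.5, 18.0]
```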
Example 4: Success and Failure (Even/Odd)
Consider the experiment of rolling a single fair die. The sample space $S$ consists of six equally likely outcomes:
$S = \{1, 2, 3, 4, 5, 6\}$
We define our criteria for Success based on the parity (even or odd nature) of the number appearing on the upper face.
Defining the Random Variable $Y$
Let the random variable $Y$ be defined such that:
• Success (1): The number on the die is Even.
• Failure (0): The number on the die is Odd.
Using the notation for piecewise functions, we can represent $Y$ as:
$Y(w) = \begin{cases} 1 & , & w \in \{2, 4, 6\} \\ 0 & , & w \in \{1, 3, 5\} \end{cases}$
1. The Domain
The domain of the function $Y$ is the complete set of outcomes in the sample space:
$Domain(Y) = \{1, 2, 3, 4, 5, 6\}$
(Sample Space)
2. The Range
Since every outcome is mapped to either 0 or 1, the range (the set of possible values the variable can take) is significantly smaller than the domain:
$Range(Y) = \{0, 1\}$
3. Visual Representation
This is a classic example of a Many-to-One mapping, where multiple distinct outcomes from the sample space correspond to the same real number in the range.
Functional Values Table
The following table summarizes the mapping for each individual outcome $w$ of the experiment:
| Outcome ($w$) | Classification | Value $Y(w)$ |
|---|---|---|
| 1 | Odd (Failure) | $0$ |
| 2 | Even (Success) | $1$ |
| 3 | Odd (Failure) | $0$ |
| 4 | Even (Success) | $1$ |
| 5 | Odd (Failure) | $0$ |
| 6 | Even (Success) | $1$ |
Probability Calculation
For a fair die, we can calculate the probability of each value in the range of $Y$:
$P(Y=1) = \frac{3}{6} = \frac{1}{2}$
[Probability of Success]
$P(Y=0) = \frac{3}{6} = \frac{1}{2}$
[Probability of Failure]
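The same probabilities follow from counting the outcomes mapped to each value of $Y$. A minimal sketch, assuming equally likely faces of a fair die:

```python
from fractions import Fraction

S = [1, 2, 3, 4, 5, 6]  # equally likely faces of a fair die

def Y(w):
    """Indicator of success: 1 if the face is even, else 0."""
    return 1 if w % 2 == 0 else 0

def prob_Y_equals(y):
    """P(Y = y): number of outcomes mapped to y, divided by |S|."""
    return Fraction(sum(1 for w in S if Y(w) == y), len(S))

print(prob_Y_equals(1), prob_Y_equals(0))  # 1/2 1/2
```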
Types of Random Variables
1. Discrete Random Variable
A random variable is called Discrete if it assumes only a finite or countably infinite number of values. These values are distinct and can be listed or counted.
Examples:
• The number of heads in $n$ tosses of a coin.
• The number of runs scored off a single ball by an Indian batsman in an IPL match (e.g. 0, 1, 2, 3, 4, 6).
• The number of children in a family in Mumbai.
2. Continuous Random Variable
A random variable which can assume a non-countably infinite number of values is called a Continuous Random Variable. It can take any value within a specified interval or range of real numbers.
Examples:
• The height of a student in a class in Delhi (e.g., $165.45$ cm).
• The time taken by a commuter to reach New Delhi Railway Station (could be 30.5 mins, 30.55 mins, etc.).
• The weight of a 5 kg bag of Basmati Rice, which may vary slightly at the milligram level.
| Attribute | Discrete | Continuous |
|---|---|---|
| Values | Countable (Isolated points) | Uncountable (Intervals) |
| Probability at a point | Can be non-zero: $P(X=x) > 0$ | Always zero: $P(X=x) = 0$ |
| Example | Number of cars in a parking lot. | Temperature in Rajasthan. |
Example 1. Consider the experiment of rolling a fair die. Define a random variable $X$ as the square of the number appearing on the top face. Determine the domain and range of $X$.
Answer:
Step 1: Identify the Sample Space (Domain)
The outcomes of rolling a die are $S = \{1, 2, 3, 4, 5, 6\}$.
Step 2: Apply the Rule of the Random Variable
The rule is $X(w) = w^2$.
$X(1) = 1^2 = 1$
$X(2) = 2^2 = 4$
$X(3) = 3^2 = 9$
$X(4) = 4^2 = 16$
$X(5) = 5^2 = 25$
$X(6) = 6^2 = 36$
Step 3: State the Range
The range of the random variable $X$ is $\{1, 4, 9, 16, 25, 36\}$.
Example 2. A fruit seller in a Mandi (local market) picks 3 apples from a crate to check for quality. Suppose $10\%$ of the apples in the crate are bruised. Let $X$ denote the number of bruised apples found by the seller.
Identify the Random Variable $X$ and its possible values.
Answer:
Let $B$ represent a bruised apple and $G$ represent a good apple. The seller picks 3 apples, so the sample space $S$ consists of $2^3 = 8$ outcomes:
$S = \{GGG, GGB, GBG, BGG, GBB, BGB, BBG, BBB\}$
The random variable $X$ is the "number of bruised apples." The mapping is as follows:
$X(GGG) = 0$
(No bruised apples)
$X(GGB) = X(GBG) = X(BGG) = 1$
(One bruised apple)
$X(GBB) = X(BGB) = X(BBG) = 2$
(Two bruised apples)
$X(BBB) = 3$
(Three bruised apples)
The Range of $X$ is $\{0, 1, 2, 3\}$. The actual value $X$ takes depends on the random selection (chance) from the crate.
Probability Distribution of Random Variables
Just as a frequency distribution provides a summary of how often different values occur in a dataset, a Probability Distribution provides a summary of the probabilities associated with all possible values of a random variable. It is a complete description of a random phenomenon in numerical terms.
Formal Definition
If $X$ is a discrete random variable that can assume values $x_1, x_2, x_3, \dots, x_n$ with corresponding probabilities $P_1, P_2, P_3, \dots, P_n$, then the collection of these values and their probabilities is called the Probability Distribution of $X$.
It is traditionally represented in a tabular form as follows:
| $X$ | $x_1$ | $x_2$ | $x_3$ | ... | $x_n$ |
|---|---|---|---|---|---|
| $P(X = x_i)$ | $P_1$ | $P_2$ | $P_3$ | ... | $P_n$ |
Essential Properties of Probability Distribution
For any valid probability distribution, two fundamental conditions must be satisfied:
1. Non-negativity Condition
Each individual probability must be non-negative and cannot exceed 1: a probability can be neither negative nor greater than 1 (certainty).
$0 \leq P_k \leq 1$
for $k = 1, 2, \dots, n$
2. Summation Condition
The sum of all probabilities in the distribution must be exactly equal to 1. This is because the values $x_1, x_2, \dots, x_n$ represent all possible mutually exclusive and exhaustive outcomes of the sample space.
$\sum\limits_{i=1}^{n} P_i = 1$
[Sum of total probability]
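Both conditions can be checked mechanically. This sketch uses a hypothetical helper `is_valid_distribution` and exact fractions, so the "sum equals 1" test is not affected by floating-point rounding; the sample lists are illustrative.

```python
from fractions import Fraction

def is_valid_distribution(probs):
    """Check both conditions: each 0 <= P_k <= 1, and the P_k sum to exactly 1."""
    each_in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = sum(probs) == 1
    return each_in_range and sums_to_one

print(is_valid_distribution([Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]))  # True
print(is_valid_distribution([Fraction(1, 2), Fraction(1, 3)]))                  # False: sum is 5/6
print(is_valid_distribution([Fraction(3, 2), Fraction(-1, 2)]))                 # False: values outside [0, 1]
```

The last list sums to 1 yet is invalid, which is why the first condition is needed in addition to the second.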
Mathematical Notations
In advanced statistics, specific notation is used to avoid confusion:
• Capital Letters ($X, Y, Z$): Used to denote the Random Variable itself (the function).
• Small Letters ($x, y, z$): Used to denote the specific values that the random variable can assume.
Example: $P(X = x)$ reads as "The probability that the random variable $X$ takes the specific value $x$."
Cumulative and Tail Probabilities
Often, we are interested in the probability of a range of values rather than a single point.
A. Cumulative Probability ($X \leq x_i$)
This is the sum of probabilities of all values less than or equal to $x_i$.
$P(X \leq x_i) = P_1 + P_2 + \dots + P_i$
B. Tail Probability ($X \geq x_i$)
This is the sum of probabilities of all values greater than or equal to $x_i$.
$P(X \geq x_i) = P_i + P_{i+1} + \dots + P_n$
Example. Two unbiased coins are tossed. Let $X$ denote the number of heads obtained. Construct the probability distribution table for $X$.
Answer:
Step 1: Identify the Sample Space ($S$)
$S = \{HH, HT, TH, TT\}$. The total number of outcomes is $4$.
Step 2: Determine possible values of $X$
$X$ can be 0 (no heads), 1 (one head), or 2 (two heads).
Step 3: Calculate individual probabilities
1. For $X = 0$: Only the outcome $\{TT\}$ satisfies this.
$P(X = 0) = P(TT) = \frac{1}{4}$
2. For $X = 1$: The outcomes $\{HT, TH\}$ satisfy this.
$P(X = 1) = P(HT) + P(TH) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$
3. For $X = 2$: Only the outcome $\{HH\}$ satisfies this.
$P(X = 2) = P(HH) = \frac{1}{4}$
Step 4: Verification
Check if $\sum P(X) = 1$:
$\frac{1}{4} + \frac{1}{2} + \frac{1}{4} = \frac{1+2+1}{4} = \frac{4}{4} = 1$. (Condition satisfied)
Step 5: Probability Distribution Table
| $X$ | $0$ | $1$ | $2$ |
|---|---|---|---|
| $P(X)$ | $\frac{1}{4}$ | $\frac{1}{2}$ | $\frac{1}{4}$ |
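The same table can be produced by listing the sample space and counting heads in each outcome. A minimal Python sketch, assuming (as in the example) that the coins are unbiased so all four outcomes are equally likely:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of two tosses: ('H','H'), ('H','T'), ('T','H'), ('T','T')
sample_space = list(product("HT", repeat=2))

# X = number of heads in each outcome; each outcome has probability 1/4
counts = Counter(outcome.count("H") for outcome in sample_space)
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
# 0 1/4
# 1 1/2
# 2 1/4
```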
Mean and Variance of Probability Distribution
In statistics, just as we calculate the average and spread of data in a frequency distribution, we can calculate the central tendency and dispersion of a random variable. These measures help us predict the long-term behavior of a random experiment.
1. Mean of a Random Variable (Mathematical Expectation)
The Mean of a random variable is the weighted average of all possible values, where the weights are the respective probabilities. It is also called the Expected Value or Expectation and is denoted by $\mu$ or $E(X)$.
Derivation:
In a frequency distribution, the mean is calculated as:
$\bar{x} = \frac{\sum\limits_{i=1}^{n} f_i x_i}{\sum\limits_{i=1}^{n} f_i}$
[Frequency Mean]
In a probability distribution, each frequency $f_i$ is replaced by the corresponding probability $p_i$, so the total frequency $\sum f_i$ becomes the total probability $\sum p_i$. Since we know that $\sum\limits_{i=1}^{n} p_i = 1$, the formula simplifies to:
$E(X) = \mu = \sum\limits_{i=1}^{n} p_i x_i$
Expanding the summation:
$\mu = p_1 x_1 + p_2 x_2 + p_3 x_3 + \dots + p_n x_n$
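The weighted average is a single sum over the table. A minimal Python sketch (the function name `expected_value` is hypothetical) applied to the two-coin distribution constructed earlier; it assumes the probabilities already sum to 1, so no division is needed.

```python
from fractions import Fraction

def expected_value(values, probs):
    """E(X) = sum of p_i * x_i (probabilities assumed to sum to 1)."""
    return sum(p * x for x, p in zip(values, probs))

# Two-coin distribution: X = number of heads
values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(expected_value(values, probs))  # 1
```

On average, one head is expected per pair of tosses.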
2. Variance of a Random Variable
The Variance measures the spread of the random variable values around the mean. It is denoted by $Var(X)$ or $\sigma^2$ (sigma squared). It is defined as the expectation of the squared deviations from the mean.
$\sigma^2 = Var(X) = E[(X - \mu)^2]$
Derivation:
Using the definition of expectation, we can write:
$\sigma^2 = \sum\limits_{i=1}^{n} p_i (x_i - \mu)^2$
Expanding $(x_i - \mu)^2$:
$\sigma^2 = \sum\limits_{i=1}^{n} p_i (x_i^2 - 2\mu x_i + \mu^2)$
Distributing the summation and $p_i$:
$\sigma^2 = \sum p_i x_i^2 - \sum p_i (2\mu x_i) + \sum p_i \mu^2$
Since $\mu$ is a constant, we can take it outside the summation:
$\sigma^2 = \sum p_i x_i^2 - 2\mu \sum p_i x_i + \mu^2 \sum p_i$
Now, substituting the known values $\sum p_i x_i = \mu$ and $\sum p_i = 1$:
$\sigma^2 = \sum p_i x_i^2 - 2\mu(\mu) + \mu^2(1)$
$\sigma^2 = \sum p_i x_i^2 - 2\mu^2 + \mu^2$
$\sigma^2 = \sum p_i x_i^2 - \mu^2$
Using the $E(X)$ notation where $E(X^2) = \sum p_i x_i^2$, we get the standard computational formula:
$Var(X) = E(X^2) - [E(X)]^2$
[Hence Proved]
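Because every step of the derivation is an exact identity, the definition and the computational formula must give the same number for any valid distribution. A short Python check (the function names are hypothetical) using exact fractions on the two-coin distribution:

```python
from fractions import Fraction

def variance_by_definition(values, probs):
    """Var(X) = sum of p_i * (x_i - mu)^2."""
    mu = sum(p * x for x, p in zip(values, probs))
    return sum(p * (x - mu) ** 2 for x, p in zip(values, probs))

def variance_shortcut(values, probs):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mu = sum(p * x for x, p in zip(values, probs))
    e_x2 = sum(p * x ** 2 for x, p in zip(values, probs))
    return e_x2 - mu ** 2

values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(variance_by_definition(values, probs), variance_shortcut(values, probs))  # 1/2 1/2
```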
3. Standard Deviation (S.D.)
The Standard Deviation is the positive square root of the variance. It is expressed in the same units as the random variable $X$.
$\sigma = \sqrt{Var(X)} = \sqrt{\sum p_i x_i^2 - \mu^2}$
Example. A local vendor in Jaipur estimates his daily profit based on weather conditions. The probability distribution of his profit $X$ (in $\textsf{₹}$ hundreds) is given below:
| Profit ($X$) | 10 | 15 | 20 |
|---|---|---|---|
| $P(X)$ | 0.2 | 0.5 | 0.3 |
Calculate the Mean profit and the Variance.
Answer:
I. Calculation for Mean ($E(X)$):
$E(X) = \sum p_i x_i$
$E(X) = (10 \times 0.2) + (15 \times 0.5) + (20 \times 0.3)$
$E(X) = 2.0 + 7.5 + 6.0 = 15.5$
Mean Profit = $\textsf{₹} 15.5 \times 100 = \textsf{₹} 1,550$.
II. Calculation for Variance ($Var(X)$):
First, we find $E(X^2)$:
$E(X^2) = \sum p_i x_i^2$
$E(X^2) = (10^2 \times 0.2) + (15^2 \times 0.5) + (20^2 \times 0.3)$
$E(X^2) = (100 \times 0.2) + (225 \times 0.5) + (400 \times 0.3)$
$E(X^2) = 20 + 112.5 + 120 = 252.5$
Now, applying the Variance formula:
$Var(X) = E(X^2) - [E(X)]^2$
$Var(X) = 252.5 - (15.5)^2$
$Var(X) = 252.5 - 240.25 = 12.25$
Standard Deviation ($\sigma$):
$\sigma = \sqrt{12.25} = 3.5$, i.e., $\textsf{₹} 3.5 \times 100 = \textsf{₹} 350$.
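The hand calculation can be reproduced in a few lines. A Python sketch using the vendor's table (profit in units of rupees hundreds); results are rounded only for display.

```python
import math

# Profit X in units of rupees hundreds, with its probabilities
profit = [10, 15, 20]
prob = [0.2, 0.5, 0.3]

mean = sum(p * x for x, p in zip(profit, prob))          # E(X)
e_x2 = sum(p * x ** 2 for x, p in zip(profit, prob))     # E(X^2)
var = e_x2 - mean ** 2                                   # Var(X) = E(X^2) - [E(X)]^2
sd = math.sqrt(var)                                      # standard deviation

print(round(mean, 4), round(e_x2, 4), round(var, 4), round(sd, 4))
# 15.5 252.5 12.25 3.5
print("Mean profit in rupees:", round(mean * 100))
# Mean profit in rupees: 1550
```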
Comparison Summary
| Measure | Symbol | Formula |
|---|---|---|
| Mean | $E(X)$ or $\mu$ | $\sum\limits p_i x_i$ |
| Variance | $Var(X)$ or $\sigma^2$ | $E(X^2) - \mu^2$ |
| Standard Deviation | $\sigma$ | $\sqrt{E(X^2) - \mu^2}$ |
Binomial Experiment
A Binomial Experiment is a special type of random experiment consisting of a series of repeated trials, each of which has exactly two possible outcomes. These individual trials are often called Bernoulli Trials, named after the Swiss mathematician Jacob Bernoulli.
The Concept of Success and Failure
In probability theory, the two possible outcomes of a trial are traditionally labeled as "Success" and "Failure". It is important to note that "Success" does not necessarily imply a positive result in real-life terms. It simply refers to the occurrence of the specific event we are tracking.
Examples:
1. In an Indian Court Trial, the outcomes are 'Guilty' or 'Not Guilty'. We might define 'Guilty' as a success for statistical tracking.
2. In a study of road accidents on the Mumbai-Pune Expressway, "meeting with an accident" would be labeled as a Success if that is the event under study.
3. In a CBSE Board Exam, a student may score any mark from 0 to 100. However, if we define "Passing" as scoring more than $33$ marks, each result reduces to just two outcomes, so it can be treated as a Bernoulli trial: Success (Marks $> 33$) and Failure (Marks $\leq 33$).
Conditions for a Binomial Experiment
For a sequence of trials to be classified as a Binomial Experiment, it must strictly satisfy the following four conditions:
1. Finite Number of Trials ($n$)
The number of trials, denoted by $n$, must be fixed and determined before the experiment begins.
2. Dichotomous Outcomes
Each trial must result in only two possible outcomes: Success ($S$) or Failure ($F$).
3. Independence of Trials
The outcome of any single trial must not affect the outcome of any other trial; that is, the trials are mutually independent.
4. Constant Probability
The probability of success, denoted by $p$, must remain identical for every trial. Consequently, the probability of failure, denoted by $q$, also remains constant.
$p + q = 1$
$q = 1 - p$
Comparison: Binomial vs. Non-Binomial Experiments
Understanding what is not a binomial experiment is equally important.
| Experiment Description | Classification | Reason / Violation |
|---|---|---|
| Tossing a fair coin 20 times to count Heads. | Binomial | $n=20$, $p=0.5$ (constant), Independent. |
| Rolling a die until a '6' is obtained. | Non-Binomial | The number of trials $n$ is not fixed beforehand. |
| Drawing 5 cards with replacement to count Aces. | Binomial | Replacement ensures $p$ remains constant and trials stay independent. |
| Drawing 5 cards without replacement to count Aces. | Non-Binomial | Probability $p$ changes after each draw; trials are dependent. |
| Recording the suit (Spade, Heart, Diamond, or Club) of a drawn card. | Non-Binomial | Each trial has four outcomes instead of two. |
Example. A fair die is rolled 10 times. A "Success" is defined as getting a number greater than 4. Identify if this is a Binomial Experiment and find $p$ and $q$.
Answer:
Step 1: Check conditions
• $n = 10$ (Fixed and finite).
• Each roll is independent of the others.
• Only two outcomes: Success ($>4$) or Failure ($\leq 4$).
• The probability of success is the same on every roll.
Therefore, it is a Binomial Experiment.
Step 2: Calculate $p$ (Probability of Success)
Success outcomes in a single roll = $\{5, 6\}$. Total outcomes = $\{1, 2, 3, 4, 5, 6\}$.
$p = \frac{2}{6} = \frac{1}{3}$
Step 3: Calculate $q$ (Probability of Failure)
$q = 1 - p = 1 - \frac{1}{3} = \frac{2}{3}$
Result: The experiment is Binomial with $n=10, p=\frac{1}{3}, q=\frac{2}{3}$.
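The values of $p$ and $q$ can also be found by listing the faces. A tiny Python sketch, assuming a fair six-sided die so that every face is equally likely:

```python
from fractions import Fraction

faces = range(1, 7)
successes = [f for f in faces if f > 4]   # "Success" = a number greater than 4

n = 10                                    # number of rolls
p = Fraction(len(successes), len(faces))  # favourable / total
q = 1 - p

print(successes, p, q)  # [5, 6] 1/3 2/3
```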
Derivation of Binomial Distribution
Let us consider a sequence of $n$ independent trials. Let $X$ be the random variable representing the number of successes. The value of $X$ can range from $0$ (no success) to $n$ (all successes).
Let:
$p = \text{Probability of success in a single trial}$
$q = \text{Probability of failure in a single trial}$
$p + q = 1$
(By Definition)
1. Probability of a Single Specific Sequence
If we want exactly $r$ successes and $(n-r)$ failures in a specific order (e.g., all successes first, then all failures), the sequence looks like: $\underbrace{S, S, \dots, S}_{r \text{ times}}, \underbrace{F, F, \dots, F}_{n-r \text{ times}}$.
Since the trials are independent, the probability of this specific sequence is:
$\underbrace{p \cdot p \cdot \dots \cdot p}_{r \text{ times}} \times \underbrace{q \cdot q \cdot \dots \cdot q}_{n-r \text{ times}} = p^r q^{n-r}$
2. Number of Such Sequences
However, the $r$ successes can occupy any $r$ of the $n$ trial positions. The number of ways to choose these $r$ positions out of $n$ is given by the combination formula:
${}^nC_r = \frac{n!}{r!(n-r)!}$
[Combinatorics Rule]
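For small $n$ the counting argument can be checked by brute force: list every ordered sequence of S and F, keep those with exactly $r$ successes, and confirm that there are ${}^nC_r$ of them, each with probability $p^r q^{n-r}$. A Python sketch with illustrative values $n = 4$, $r = 2$, $p = \frac{1}{3}$, chosen only for the demonstration:

```python
from fractions import Fraction
from itertools import product
from math import comb

n, r = 4, 2            # illustrative values
p = Fraction(1, 3)     # illustrative success probability
q = 1 - p

# All ordered S/F sequences of length n with exactly r successes
sequences = [s for s in product("SF", repeat=n) if s.count("S") == r]

# By independence, each sequence has probability p^(#S) * q^(#F)
seq_probs = {p ** s.count("S") * q ** s.count("F") for s in sequences}

# The sequences are mutually exclusive, so their probabilities add
total = sum(p ** r * q ** (n - r) for _ in sequences)

print(len(sequences), comb(n, r))  # 6 6
print(seq_probs)                   # {Fraction(4, 81)}
print(total)                       # 8/27
```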
3. General Formula (Probability Mass Function)
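Each of these ${}^nC_r$ sequences has the same probability $p^r q^{n-r}$, and the sequences are mutually exclusive. Adding their probabilities, that is, multiplying the probability of one sequence by the number of such sequences, gives the probability of obtaining exactly $r$ successes: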
$P(X = r) = {}^nC_r p^r q^{n-r}$
where $r = 0, 1, 2, \dots, n$.
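The formula translates directly into code. A minimal Python sketch (the name `binomial_pmf` is hypothetical) that uses `math.comb` for ${}^nC_r$ and exact fractions; three tosses of a fair coin serve only as an illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, r, p):
    """P(X = r) = nCr * p^r * q^(n - r) for X ~ B(n, p)."""
    if not 0 <= r <= n:
        return 0
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)

# Number of heads in 3 tosses of a fair coin
for r in range(4):
    print(r, binomial_pmf(3, r, Fraction(1, 2)))
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```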
The Probability Distribution Table
The successive probabilities $P(X=0), P(X=1), \dots, P(X=n)$ correspond to the terms in the binomial expansion of $(q+p)^n$.
| Number of Successes ($X$) | Probability $P(X = r)$ |
|---|---|
| $0$ | ${}^nC_0 p^0 q^n = q^n$ |
| $1$ | ${}^nC_1 p^1 q^{n-1}$ |
| $2$ | ${}^nC_2 p^2 q^{n-2}$ |
| ... | ... |
| $n$ | ${}^nC_n p^n q^0 = p^n$ |
Verification: The sum of all probabilities is $\sum\limits_{r=0}^{n} {}^nC_r p^r q^{n-r} = (q+p)^n = (1)^n = 1$.
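This verification can be repeated numerically for any parameters. A short Python check with illustrative values $n = 6$, $p = \frac{2}{5}$; the first and last terms also match the table entries $q^n$ and $p^n$.

```python
from fractions import Fraction
from math import comb

n, p = 6, Fraction(2, 5)   # illustrative parameters
q = 1 - p

# Terms of the binomial expansion of (q + p)^n, one per value of X
terms = [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]

print(sum(terms))                   # 1
print(sum(terms) == (q + p) ** n)   # True
```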
Parameters and Notations
A Binomial Distribution is completely determined by two values, known as its Parameters:
1. $n$: The number of trials.
2. $p$: The probability of success in each trial.
In standard statistical notation, we write:
$X \sim B(n, p)$
This is read as: "Random variable $X$ follows a Binomial distribution with parameters $n$ and $p$."
Key Results
The following shortcut cases are frequently used:
A. All Trials are Successes
$P(X = n) = p^n$
B. All Trials are Failures
$P(X = 0) = q^n = (1 - p)^n$
C. At least one Success
Instead of adding $P(X=1) + P(X=2) + \dots + P(X=n)$, we use the Complementary Rule:
$P(X \geq 1) = 1 - P(X = 0)$
$P(X \geq 1) = 1 - q^n$
D. At least $r$ Successes
$P(X \geq r) = \sum\limits_{k=r}^{n} {}^nC_k p^k q^{n-k}$
E. At most $r$ Successes
$P(X \leq r) = \sum\limits_{k=0}^{r} {}^nC_k p^k q^{n-k}$
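Cases D and E are partial sums of the same terms, and Cases A to C are special cases of them. A Python sketch (helper names hypothetical) that checks these shortcuts with exact fractions, using $n = 10$, $p = \frac{1}{3}$ from the die example above:

```python
from fractions import Fraction
from math import comb

def pmf(n, k, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def at_least(n, r, p):
    """P(X >= r): sum of the terms k = r, ..., n."""
    return sum((pmf(n, k, p) for k in range(r, n + 1)), Fraction(0))

def at_most(n, r, p):
    """P(X <= r): sum of the terms k = 0, ..., r."""
    return sum((pmf(n, k, p) for k in range(0, r + 1)), Fraction(0))

n, p = 10, Fraction(1, 3)   # die example: success = number greater than 4
q = 1 - p

print(at_least(n, n, p) == p ** n)       # True  (Case A)
print(at_most(n, 0, p) == q ** n)        # True  (Case B)
print(at_least(n, 1, p) == 1 - q ** n)   # True  (Case C, complementary rule)
```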
Important Property: Mean and Variance
For a Binomial distribution $X \sim B(n, p)$:
• Mean ($\mu$): $np$
• Variance ($\sigma^2$): $npq$
• Standard Deviation ($\sigma$): $\sqrt{npq}$
Example. A student takes a multiple-choice test consisting of 5 questions. Each question has 4 options, with only one correct answer. If the student guesses randomly, find the probability of getting exactly 3 correct answers.
Answer:
This is a Binomial experiment: the number of questions is fixed, each guess is either correct or incorrect, the guesses are independent, and $p$ is the same for every question.
Given:
$n = 5$ (Total questions)
$p = \frac{1}{4} = 0.25$ (Probability of a correct guess)
$q = 1 - p = \frac{3}{4} = 0.75$ (Probability of an incorrect guess)
To Find: $P(X = 3)$
Using the Binomial formula:
$P(X = 3) = {}^5C_3 \left(\frac{1}{4}\right)^3 \left(\frac{3}{4}\right)^{5-3}$
$P(X = 3) = 10 \times \left(\frac{1}{64}\right) \times \left(\frac{9}{16}\right)$
$P(X = 3) = \frac{90}{1024} = \frac{45}{512} \approx 0.0879$
Result: The probability of getting exactly 3 correct answers is approximately $8.79\%$.
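The arithmetic can be confirmed with exact fractions. A short Python check; it shows that $\frac{90}{1024}$ reduces to $\frac{45}{512} = 0.087890625$, which rounds to $0.0879$.

```python
from fractions import Fraction
from math import comb

n, k = 5, 3
p = Fraction(1, 4)   # one correct option out of four
q = 1 - p

prob = comb(n, k) * p ** k * q ** (n - k)
print(prob, float(prob))   # 45/512 0.087890625
```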
Mean and Variance of Binomial Distribution
To analyze the behavior of a Binomial Distribution, we look at its central tendency and spread. These are represented by the Mean (Expected Value) and Variance.
1. Mean of Binomial Distribution
The Mean of a random variable $X$ following a binomial distribution is the average number of successes we expect in $n$ trials. It is denoted by $E(X)$ or $\mu$.
Derivation:
For a binomial distribution, the probability of $r$ successes is $P(r) = {}^nC_r p^r q^{n-r}$. The expected value is given by:
$E(X) = \sum\limits_{r=0}^{n} r \cdot P(r) = \sum\limits_{r=0}^{n} r \cdot {}^nC_r p^r q^{n-r}$
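Since the term for $r=0$ is zero, we can start the summation from $r=1$ and use the identity $r \cdot {}^nC_r = n \cdot {}^{n-1}C_{r-1}$ (valid for $r \geq 1$), i.e., ${}^nC_r = \frac{n}{r}\,{}^{n-1}C_{r-1}$: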
$E(X) = \sum\limits_{r=1}^{n} r \cdot \frac{n}{r} {}^{n-1}C_{r-1} p^r q^{n-r}$
Taking $np$ common outside the summation:
$E(X) = np \sum\limits_{r=1}^{n} {}^{n-1}C_{r-1} p^{r-1} q^{(n-1)-(r-1)}$
Let $k = r-1$. As $r$ goes from $1$ to $n$, $k$ goes from $0$ to $n-1$:
$E(X) = np \sum\limits_{k=0}^{n-1} {}^{n-1}C_{k} p^{k} q^{(n-1)-k}$
Using the binomial expansion property, the summation equals $(q+p)^{n-1}$. Since $q+p = 1$:
$E(X) = np(1)^{n-1} = np$
[Hence Proved]
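The result $E(X) = np$ can be spot-checked by computing the mean straight from the definition $\sum r \cdot P(r)$. A Python sketch (the function name is hypothetical) for a few illustrative parameter pairs, using exact fractions so the comparison is exact:

```python
from fractions import Fraction
from math import comb

def binomial_mean_direct(n, p):
    """E(X) from the definition: sum of r * P(X = r) over r = 0..n."""
    q = 1 - p
    return sum(r * comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1))

for n, p in [(5, Fraction(1, 4)), (10, Fraction(1, 3)), (400, Fraction(1, 2))]:
    print(n, p, binomial_mean_direct(n, p) == n * p)   # True each time
```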
2. Variance of Binomial Distribution
The Variance, denoted by $Var(X)$ or $\sigma^2$, measures the dispersion of the number of successes around the mean.
Derivation:
We use the formula: $Var(X) = E(X^2) - [E(X)]^2$. To find $E(X^2)$, we rewrite $r^2$ as $r(r-1) + r$:
$E(X^2) = \sum\limits_{r=0}^{n} r(r-1) P(r) + \sum\limits_{r=0}^{n} r P(r)$
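The second part is simply $E(X) = np$. For the first part, the terms for $r=0$ and $r=1$ vanish, and we use the identity ${}^nC_r = \frac{n(n-1)}{r(r-1)}\,{}^{n-2}C_{r-2}$ (valid for $r \geq 2$):
$E(X^2) = \sum\limits_{r=2}^{n} r(r-1) \frac{n(n-1)}{r(r-1)} {}^{n-2}C_{r-2} p^r q^{n-r} + np$
$E(X^2) = n(n-1)p^2 \sum\limits_{r=2}^{n} {}^{n-2}C_{r-2} p^{r-2} q^{(n-2)-(r-2)} + np$
Putting $k = r-2$, the summation again equals $(q+p)^{n-2} = 1$: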
$E(X^2) = n(n-1)p^2 + np$
Now, calculating Variance:
$Var(X) = [n(n-1)p^2 + np] - (np)^2$
$Var(X) = n^2p^2 - np^2 + np - n^2p^2$
$Var(X) = np - np^2 = np(1-p)$
$Var(X) = npq$
[Hence Proved]
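Both intermediate results of this derivation, $E(X^2) = n(n-1)p^2 + np$ and $Var(X) = npq$, can be checked numerically. A Python sketch (the function name `binomial_moments` is hypothetical) using the multiple-choice example's values $n = 5$, $p = \frac{1}{4}$:

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Return (E(X), E(X^2), Var(X)) computed directly from the pmf."""
    q = 1 - p
    probs = [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]
    e_x = sum(r * pr for r, pr in enumerate(probs))
    e_x2 = sum(r * r * pr for r, pr in enumerate(probs))
    return e_x, e_x2, e_x2 - e_x ** 2

n, p = 5, Fraction(1, 4)
q = 1 - p
e_x, e_x2, var = binomial_moments(n, p)

print(e_x2 == n * (n - 1) * p ** 2 + n * p)   # True
print(var == n * p * q, var)                  # True 15/16
```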
The Standard Deviation (S.D.) is the square root of variance:
$\sigma = \sqrt{npq}$
Summary of Formulas
| Property | Formula |
|---|---|
| Mean ($\mu$) | $np$ |
| Variance ($\sigma^2$) | $npq$ |
| Standard Deviation ($\sigma$) | $\sqrt{npq}$ |
Testing for Bias (Statistical Significance)
In real-world applications, like checking whether a ₹10 coin or a die used in a board game in India is biased, we compare the observed number of successes with the mean $\mu = np$ and standard deviation $\sigma = \sqrt{npq}$ expected if the coin or die were fair.
Rules for Bias Detection:
| Observation Range | Conclusion |
|---|---|
| Value lies within $\mu \pm 2\sigma$ | Unbiased (Fair) |
| Value lies between $\mu \pm 2\sigma$ and $\mu \pm 3\sigma$ | Perhaps Biased |
| Value lies outside $\mu \pm 3\sigma$ | Definitely Biased |
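The rule in the table can be written as a small function. A Python sketch (the name `classify` is hypothetical); it assumes that the boundary values $\mu \pm 2\sigma$ and $\mu \pm 3\sigma$ belong to the inner band, which the table does not specify, and uses 100 tosses of a coin assumed fair as an illustration ($\mu = 50$, $\sigma = 5$).

```python
import math

def classify(observed, mu, sigma):
    """Apply the 2-sigma / 3-sigma rule (boundaries assumed inclusive)."""
    deviation = abs(observed - mu)
    if deviation <= 2 * sigma:
        return "Unbiased (Fair)"
    if deviation <= 3 * sigma:
        return "Perhaps Biased"
    return "Definitely Biased"

n, p = 100, 0.5                       # illustrative: 100 tosses of a coin assumed fair
mu = n * p                            # 50.0
sigma = math.sqrt(n * p * (1 - p))    # 5.0

for heads in (58, 63, 70):
    print(heads, classify(heads, mu, sigma))
# 58 Unbiased (Fair)
# 63 Perhaps Biased
# 70 Definitely Biased
```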
Example. A coin is tossed 400 times (e.g., during a sequence of cricket match tosses). Assuming the coin is fair, calculate the expected number of heads and the standard deviation.
Answer:
Given:
$n = 400$
$p = \frac{1}{2}$ (Probability of Heads)
$q = \frac{1}{2}$ (Probability of Tails)
I. Calculation for Mean ($\mu$):
$\mu = np = 400 \times \frac{1}{2} = 200$
II. Calculation for Standard Deviation ($\sigma$):
First, find Variance:
$\sigma^2 = npq = 400 \times \frac{1}{2} \times \frac{1}{2} = 100$
Now, Standard Deviation:
$\sigma = \sqrt{100} = 10$
Conclusion: We expect 200 heads. If the actual number of heads in a test is 215, it lies within $200 \pm 2(10)$, i.e., $[180, 220]$, so by the rule above the coin is regarded as Unbiased.
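The numbers in this example can be reproduced directly. A Python sketch for $n = 400$, $p = \frac{1}{2}$; the observed count of 215 is the test value mentioned in the conclusion, and the last line shows it lies only 1.5 standard deviations above the mean.

```python
import math

n, p = 400, 0.5
q = 1 - p
mu = n * p                      # expected number of heads
sigma = math.sqrt(n * p * q)    # standard deviation

observed = 215                  # test value from the conclusion
low, high = mu - 2 * sigma, mu + 2 * sigma

print(mu, sigma, (low, high), low <= observed <= high)
# 200.0 10.0 (180.0, 220.0) True
print((observed - mu) / sigma)  # 1.5
```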