Binomial Distribution
Binomial Experiment: Definition and Bernoulli Trials
Bernoulli Trial
The foundation of the binomial distribution is a simple type of random experiment called a **Bernoulli trial**. A Bernoulli trial is a single experiment with the following characteristics:
- It has exactly **two** possible outcomes. These outcomes are conventionally labeled as "success" and "failure". These labels are arbitrary and depend on the event of interest (e.g., getting a head can be "success", or getting a number greater than 4 on a die roll can be "success").
- The probability of the outcome labeled "success" is constant for each trial and is denoted by $p$.
- The probability of the outcome labeled "failure" is constant for each trial and is denoted by $q$.
- Since there are only two possible outcomes and they are mutually exclusive and exhaustive, the probability of success and the probability of failure must sum to 1:
$$p + q = 1$$
... (1)
This also means $q = 1 - p$.
Examples of Bernoulli Trials:
- Tossing a fair coin once (Success = Head, $p=0.5$; Failure = Tail, $q=0.5$).
- Rolling a standard six-sided die once (Success = Rolling a 6, $p=1/6$; Failure = Not rolling a 6, $q=5/6$).
- Checking if a randomly selected manufactured item is defective (Success = Defective, $p$ = probability of defective; Failure = Not Defective, $q = 1-p$).
- Asking a randomly selected person if they approve of a policy (Success = Approve, $p$; Failure = Not Approve, $q$).
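A single Bernoulli trial is easy to simulate. The sketch below (Python; the helper name `bernoulli_trial` is our own, not a standard library function) draws one success/failure outcome with success probability $p$:

```python
import random

def bernoulli_trial(p, rng=random):
    """Run one Bernoulli trial: return 1 for "success" (probability p),
    0 for "failure" (probability q = 1 - p)."""
    return 1 if rng.random() < p else 0

random.seed(0)  # fixed seed so repeated runs agree
outcome = bernoulli_trial(0.5)  # one toss of a fair coin
```

Any of the examples above fits this pattern by choosing the appropriate $p$ (e.g. $p = 1/6$ for "rolling a 6").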
Binomial Experiment
A **binomial experiment** is a specific type of statistical experiment that consists of a fixed number of repeated, independent Bernoulli trials. For an experiment to be classified as binomial, it must satisfy the following four conditions:
- **Fixed Number of Trials ($n$):** The experiment consists of a predetermined, fixed number of trials, denoted by $n$. The number of trials does not change during the experiment.
- **Two Outcomes per Trial:** Each trial must have only two possible outcomes, which can be classified as "success" (S) or "failure" (F), as in a Bernoulli trial.
- **Constant Probability of Success ($p$):** The probability of obtaining a "success", denoted by $p$, must remain exactly the same from one trial to the next. Consequently, the probability of "failure", $q = 1 - p$, is also constant for all trials.
- **Independent Trials:** The outcome of any single trial must be independent of the outcomes of all other trials. That is, knowing the result of one trial does not change the probability of success or failure on any other trial.
In a binomial experiment, the random variable of interest is typically defined as the **total number of successes** that occur over the $n$ trials. The possible values for this random variable are $0, 1, 2, \dots, n$.
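Under these four conditions, a binomial experiment is just $n$ independent Bernoulli trials with the successes counted. A minimal simulation sketch (Python; the function name `binomial_experiment` is ours):

```python
import random

def binomial_experiment(n, p, rng=random):
    """Run n independent Bernoulli trials with success probability p
    and return the total number of successes (an integer in 0..n)."""
    return sum(1 for _ in range(n) if rng.random() < p)

random.seed(42)  # fixed seed for reproducibility
successes = binomial_experiment(10, 0.5)  # 10 fair-coin tosses, count heads
```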
Example
Example 1. Consider tossing a fair coin 5 times. Let success be getting a Head. Is this a binomial experiment?
Answer:
Given: Tossing a fair coin 5 times. Success = getting a Head.
To Determine: If this is a binomial experiment.
Solution:
We check if the experiment meets the four conditions for a binomial experiment:
- Fixed Number of Trials ($n$): The coin is tossed exactly 5 times. So, $n=5$, which is a fixed number. This condition is met.
- Two Outcomes: Each individual coin toss has only two possible outcomes: Head or Tail. We have defined getting a Head as "Success" and getting a Tail as "Failure". This condition is met.
- Constant Probability of Success ($p$): The coin is fair. The probability of getting a Head on any single toss is $P(\text{Head}) = 0.5$. This probability remains the same for each of the 5 tosses. So, $p=0.5$. The probability of failure is $q = 1 - 0.5 = 0.5$, which is also constant. This condition is met.
- Independent Trials: The outcome of one coin toss does not influence or affect the outcome of any other coin toss. Each toss is independent of the others. This condition is met.
Since all four conditions are satisfied, this experiment is a **binomial experiment**.
The random variable associated with this experiment would be the number of heads obtained in the 5 tosses. This random variable can take values $0, 1, 2, 3, 4, 5$.
Binomial Distribution: Definition and Probability Mass Function $P(X=k) = \binom{n}{k} p^k q^{n-k}$
Definition
The **Binomial Distribution** is a discrete probability distribution that models the number of "successes" in a binomial experiment. It describes the probability of obtaining exactly $k$ successes in a fixed number of $n$ independent Bernoulli trials, where each trial has the same probability of success $p$.
If a random variable $X$ represents the number of successes in a binomial experiment with $n$ trials and probability of success $p$, we say that $X$ follows a binomial distribution and denote it as $X \sim B(n, p)$. The parameters of the binomial distribution are $n$ and $p$.
The possible values for the random variable $X$ (number of successes) are integers from 0 to $n$: $0, 1, 2, \dots, n$.
Probability Mass Function (PMF)
For a discrete random variable $X$ that follows a binomial distribution with parameters $n$ and $p$, the probability of obtaining exactly $k$ successes is given by the **Binomial Probability Formula**, which is also the **Probability Mass Function (PMF)** of the binomial distribution.
The formula is:
$$P(X=k) = \binom{n}{k} p^k q^{n-k}$$
... (1)
This formula is valid for $k = 0, 1, 2, \dots, n$.
Where:
- $n$: The total number of trials in the binomial experiment.
- $k$: The specific number of successes we are interested in (must be an integer between 0 and $n$).
- $p$: The probability of success on a single trial ($0 \le p \le 1$).
- $q$: The probability of failure on a single trial ($q = 1 - p$).
- $\binom{n}{k}$: The **binomial coefficient**, read as "n choose k". It represents the number of different ways to choose exactly $k$ successes from $n$ trials without regard to the order of successes. It is calculated using the factorial function:
$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$
... (2)
where $n!$ (n factorial) is the product of all positive integers up to $n$ ($n! = n \times (n-1) \times \dots \times 2 \times 1$), and $0!$ is defined as 1.
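The factorial formula (2) can be checked directly against Python's built-in `math.comb`, which computes the same quantity (the helper name `binom_coeff` is our own):

```python
from math import comb, factorial

def binom_coeff(n, k):
    """Binomial coefficient "n choose k" via the factorial formula (2)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# math.comb implements the same quantity directly.
assert binom_coeff(4, 3) == comb(4, 3) == 4
```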
Explanation of the Binomial Probability Formula
The binomial probability formula $P(X=k) = \binom{n}{k} p^k q^{n-k}$ combines two components:
- $(p^k q^{n-k})$: This part represents the probability of obtaining one **specific sequence** of $n$ outcomes that contains exactly $k$ successes and $(n-k)$ failures. For example, if $n=4$ and $k=3$ (like SSFS), the probability is $p \cdot p \cdot q \cdot p = p^3 q^1 = p^3 q^{4-3}$. Since the trials are independent, we multiply the probabilities of the individual outcomes.
- $\binom{n}{k}$: This part represents the **number of different possible sequences** of $n$ outcomes that result in exactly $k$ successes and $(n-k)$ failures. For example, with $n=3$ trials and $k=2$ successes, the possible sequences are SSF, SFS, FSS. The number of such sequences is $\binom{3}{2} = \frac{3!}{2!(3-2)!} = \frac{6}{2 \times 1} = 3$. The binomial coefficient counts these arrangements.
By multiplying these two components, the formula gives the total probability of obtaining $k$ successes, considering all the different ways those $k$ successes can be arranged among the $n$ trials.
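The two components combine into a one-line PMF. A minimal sketch in Python, using `math.comb` for the binomial coefficient (the function name `binomial_pmf` is our own):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p**k * q**(n - k), with q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)
```

For instance, `binomial_pmf(3, 4, 0.5)` reproduces the coin example worked out below.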
Example
Example 1. A fair coin is tossed 4 times. What is the probability of getting exactly 3 heads?
Answer:
Given: Tossing a fair coin 4 times. Event of interest: getting exactly 3 heads.
To Find: The probability of getting exactly 3 heads.
Solution:
This scenario meets the conditions of a binomial experiment:
- Fixed number of trials: $n=4$ (the coin is tossed 4 times).
- Two outcomes per trial: Success = Head (H), Failure = Tail (T).
- Constant probability of success: The coin is fair, so $P(\text{Head}) = p = 0.5$ for each toss. $P(\text{Tail}) = q = 1 - p = 0.5$.
- Independent trials: The outcome of one toss does not affect the others.
We are interested in the probability of getting exactly $k=3$ successes (heads).
Using the binomial probability formula $P(X=k) = \binom{n}{k} p^k q^{n-k}$ (Formula 1):
Substitute $n=4$, $k=3$, $p=0.5$, $q=0.5$:
$$P(X=3) = \binom{4}{3} (0.5)^3 (0.5)^{4-3}$$
... (iii)
First, calculate the binomial coefficient $\binom{4}{3}$:
$$\binom{4}{3} = \frac{4!}{3!(4-3)!} = \frac{4!}{3!1!}$$
... (iv)
$$\binom{4}{3} = \frac{4 \times \cancel{3!}}{\cancel{3!} \times 1!} = \frac{4}{1} = 4$$
($4! = 4 \times 3!, 1! = 1$)
$$\binom{4}{3} = 4$$
... (v)
Now, calculate the powers of $p$ and $q$:
- $p^k = (0.5)^3 = 0.5 \times 0.5 \times 0.5 = 0.125$.
- $q^{n-k} = (0.5)^{4-3} = (0.5)^1 = 0.5$.
Substitute these values and the binomial coefficient back into formula (iii):
$$P(X=3) = 4 \times (0.125) \times (0.5)$$
... (vi)
$$P(X=3) = 4 \times 0.0625$$
($0.125 \times 0.5 = 0.0625$)
$$P(X=3) = 0.25$$
... (vii)
The probability of getting exactly 3 heads in 4 tosses of a fair coin is 0.25 or 1/4.
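Because $n = 4$ is small, this answer can also be confirmed by brute force: enumerate all $2^4 = 16$ equally likely head/tail sequences and count those with exactly 3 heads. A sketch:

```python
from itertools import product

# All 2**4 = 16 equally likely sequences of H/T for 4 tosses.
sequences = list(product("HT", repeat=4))
# Sequences with exactly 3 heads (HHHT, HHTH, HTHH, THHH).
favourable = [s for s in sequences if s.count("H") == 3]
prob = len(favourable) / len(sequences)  # 4 / 16 = 0.25
```

The count of favourable sequences, 4, is exactly the binomial coefficient $\binom{4}{3}$.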
Mean and Variance of Binomial Distribution
For a random variable $X$ that follows a binomial distribution, denoted as $X \sim B(n, p)$, where $n$ is the number of trials and $p$ is the probability of success on a single trial, the expected value (mean) and variance can be calculated using straightforward formulas derived from the properties of expected value and variance.
Mean (Expected Value) of a Binomial Distribution
The mean or expected value of a binomial random variable $X$ represents the theoretical average number of successes one would expect to observe over $n$ independent trials, each with success probability $p$.
Formula for the Mean of a Binomial Distribution:
$$E(X) = \mu = np$$
... (1)
Where:
- $n$ is the number of trials.
- $p$ is the probability of success on a single trial.
Intuition: The formula $np$ is intuitive. If you toss a fair coin ($p=0.5$) 10 times ($n=10$), you would expect, on average, $10 \times 0.5 = 5$ heads. If you roll a die ($p=1/6$ for rolling a 6) 12 times ($n=12$), you would expect, on average, $12 \times (1/6) = 2$ sixes.
Derivation Outline: The formula $E(X) = np$ can be formally derived from the definition of expected value for a discrete random variable $E(X) = \sum_{k=0}^{n} k \cdot P(X=k)$. Substituting the binomial PMF, we get $E(X) = \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n-k}$. This sum involves manipulating binomial coefficients and can be shown to simplify to $np$.
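While the algebraic simplification is omitted here, the identity $E(X) = np$ is easy to verify numerically by computing the defining sum directly (the helper name `binomial_mean_by_sum` is our own):

```python
from math import comb

def binomial_mean_by_sum(n, p):
    """E(X) = sum over k of k * C(n, k) * p**k * q**(n - k)."""
    q = 1 - p
    return sum(k * comb(n, k) * p**k * q**(n - k) for k in range(n + 1))

# Matches np for the coin and die examples above (up to float rounding).
```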
Variance of a Binomial Distribution
The variance of a binomial random variable $X$ measures the spread or variability in the number of successes around the mean ($np$). It quantifies how much the actual number of successes is likely to deviate from the expected number.
Let $q = 1 - p$ be the probability of failure on a single trial.
Formula for the Variance of a Binomial Distribution:
$$Var(X) = \sigma^2 = npq$$
... (2)
Where:
- $n$ is the number of trials.
- $p$ is the probability of success on a single trial.
- $q = 1 - p$ is the probability of failure on a single trial.
Derivation Outline: The variance can be derived using the formula $Var(X) = E(X^2) - [E(X)]^2$. This requires first calculating $E(X^2) = \sum_{k=0}^{n} k^2 \binom{n}{k} p^k q^{n-k}$. This summation is more involved than the calculation for $E(X)$ and typically relies on advanced combinatorial identities or moment generating functions. The result of $E(X^2)$ simplifies to $npq + (np)^2$. Substituting this and $E(X)=np$ into the variance formula gives $Var(X) = (npq + (np)^2) - (np)^2 = npq$.
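The same numerical check works for the variance: compute $E(X^2)$ and $E(X)$ by direct summation of the PMF and apply $Var(X) = E(X^2) - [E(X)]^2$ (the helper name `binomial_variance_by_sum` is our own):

```python
from math import comb

def binomial_variance_by_sum(n, p):
    """Var(X) = E(X^2) - E(X)^2, both computed by summing over the PMF."""
    q = 1 - p
    pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    second_moment = sum(k * k * w for k, w in enumerate(pmf))
    return second_moment - mean**2

# Agrees with npq (up to float rounding), e.g. n=12, p=1/6 gives 5/3.
```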
Standard Deviation of a Binomial Distribution
The standard deviation ($\sigma$) of a binomial random variable is the positive square root of its variance. It is a measure of spread in the same units as the number of successes.
Formula for the Standard Deviation of a Binomial Distribution:
$$SD(X) = \sigma = \sqrt{npq}$$
... (3)
Example
Example 1. If a fair six-sided die is rolled 12 times, what is the mean and standard deviation of the number of times a '6' appears?
Answer:
Given: Rolling a fair six-sided die 12 times. Random variable X = number of times a '6' appears.
To Find: The mean and standard deviation of X.
Solution:
This experiment meets the conditions of a binomial experiment:
- Fixed number of trials: $n=12$.
- Two outcomes per trial: Success = Rolling a '6', Failure = Not rolling a '6'.
- Constant probability of success: For a fair die, $P(\text{rolling a 6}) = 1/6$. So, $p = 1/6$.
- Constant probability of failure: $q = 1 - p = 1 - 1/6 = 5/6$.
- Independent trials: Each roll is independent.
The random variable $X$, the number of times a '6' appears, follows a binomial distribution $X \sim B(12, 1/6)$.
Calculate the Mean (Expected Value):
Using the formula $E(X) = np$ (Formula 1):
$$E(X) = 12 \times \frac{1}{6}$$
... (iv)
$$E(X) = \frac{12}{6} = 2$$
... (v)
The mean number of times a '6' appears in 12 rolls is 2. This means, on average, you expect to roll a '6' twice.
Calculate the Variance:
Using the formula $Var(X) = npq$ (Formula 2):
$$Var(X) = 12 \times \frac{1}{6} \times \frac{5}{6}$$
... (vi)
$$Var(X) = \cancel{12}^{2} \times \frac{1}{\cancel{6}} \times \frac{5}{6} = 2 \times \frac{5}{6}$$
$$Var(X) = \frac{10}{6} = \frac{5}{3}$$
... (vii)
The variance of the number of times a '6' appears is $\frac{5}{3}$.
Calculate the Standard Deviation:
Using the formula $SD(X) = \sqrt{Var(X)}$ (Formula 3):
$$SD(X) = \sqrt{\frac{5}{3}}$$
... (viii)
We can leave the answer as $\sqrt{5/3}$ or rationalize the denominator:
$$SD(X) = \frac{\sqrt{5}}{\sqrt{3}} = \frac{\sqrt{5} \times \sqrt{3}}{\sqrt{3} \times \sqrt{3}} = \frac{\sqrt{15}}{3}$$
... (ix)
Numerically, $\sqrt{5} \approx 2.236$, $\sqrt{3} \approx 1.732$, $\sqrt{15} \approx 3.873$.
$SD(X) \approx \sqrt{1.666...} \approx 1.291$.
$$SD(X) \approx 1.291$$
... (x)
Mean = 2.
Standard Deviation = $\sqrt{5/3}$ or $\frac{\sqrt{15}}{3}$ (approximately 1.291).
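These theoretical values can be sanity-checked by Monte Carlo simulation. The sketch below (Python; sample size and seed are arbitrary choices of ours) repeats the 12-roll experiment many times and compares the sample mean and standard deviation with $np = 2$ and $\sqrt{npq} \approx 1.291$:

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed for reproducibility
# 20,000 runs of "roll a fair die 12 times and count the sixes".
counts = [sum(1 for _ in range(12) if random.randint(1, 6) == 6)
          for _ in range(20_000)]
sample_mean = mean(counts)   # theory: np = 2
sample_sd = pstdev(counts)   # theory: sqrt(npq) ≈ 1.291
```

With this many runs, both sample statistics should land within a few hundredths of the theoretical values.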
Properties and Applications of Binomial Distribution
Properties of the Binomial Distribution
The binomial distribution $X \sim B(n, p)$ has several important properties:
- **Parameters:** The binomial distribution is completely defined by its two parameters: the number of trials, $n$, and the probability of success on a single trial, $p$. Once $n$ and $p$ are known, the entire probability distribution (the PMF) is determined.
- **Discrete Nature:** It is a discrete probability distribution because the random variable $X$, representing the number of successes, can only take on a finite set of integer values from 0 to $n$ ($0, 1, 2, \dots, n$).
- **Shape:** The shape of the binomial distribution is determined by the values of $n$ and $p$.
- If $p = 0.5$, the distribution is perfectly **symmetric** (like the number of heads in coin tosses). The histogram of probabilities will be balanced around the mean ($n/2$).
- If $p < 0.5$, the distribution is typically **skewed to the right** (positively skewed). This means the tail of the distribution is longer on the right side, towards higher values of $X$.
- If $p > 0.5$, the distribution is typically **skewed to the left** (negatively skewed). This means the tail is longer on the left side, towards lower values of $X$.
- As the number of trials $n$ increases, the shape of the binomial distribution becomes increasingly bell-shaped, resembling the normal distribution, regardless of the value of $p$. This approximation is generally considered good when both $np \ge 5$ (or 10) and $nq \ge 5$ (or 10).
- **Sum of Probabilities:** As with any valid probability distribution, the sum of the probabilities of all possible outcomes must be equal to 1. For the binomial distribution, this is guaranteed by the binomial theorem:
$$\sum_{k=0}^{n} P(X=k) = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} = (p+q)^n$$
... (i)
Since $p+q = 1$, $\sum P(X=k) = (1)^n = 1$.
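This binomial-theorem identity is easy to confirm numerically for any $n$ and $p$ by summing the PMF over all $k$ (the helper name `pmf_total` is our own):

```python
from math import comb

def pmf_total(n, p):
    """Sum of the binomial PMF over k = 0..n; equals (p + q)**n = 1."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(n + 1))
```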
Applications of the Binomial Distribution
The binomial distribution is one of the most frequently used discrete probability distributions in various fields because many real-world scenarios can be modeled as a series of independent Bernoulli trials. It is applicable in any situation where an experiment consists of a fixed number of independent trials, each resulting in one of two outcomes with constant probabilities.
Common Applications:
- **Quality Control and Manufacturing:** Estimating the probability of finding a specific number of defective items in a random sample from a production run, assuming the probability of a defect is constant for each item and defects are independent.
- **Medicine and Biology:** Calculating the probability of a certain number of patients recovering from a disease after a treatment, given a known recovery rate. Modeling the number of individuals with a specific trait in a population.
- **Market Research and Surveys:** Predicting the probability of getting a certain number of 'yes' or 'no' responses in a survey, assuming responses are independent.
- **Sports Analytics:** Modeling the number of successful free throws in a game, the number of successful hits in baseball, etc., assuming constant probability and independence.
- **Genetics:** Modeling the inheritance of simple genetic traits following Mendelian principles.
- **Reliability Engineering:** Estimating the probability of a certain number of components failing in a system within a specific time period, assuming independent failures.
- **Finance:** Modeling the number of successful outcomes in a series of independent investment decisions.
The key to applying the binomial distribution correctly is verifying that the four underlying conditions of a binomial experiment (fixed $n$, two outcomes, constant $p$, independent trials) are reasonably met by the real-world scenario.