---
title: Bernoulli and Binomial Distributions
sidebar_label: Binomial
description: "Understanding the foundations of binary outcomes: The Bernoulli trial and the Binomial distribution, essential for classification models."
tags:
---
In Machine Learning, we often ask "Yes/No" questions: Will a user click this ad? Is this transaction fraudulent? Does the image contain a cat? These binary outcomes are modeled using the Bernoulli and Binomial distributions.
A Bernoulli Distribution is the simplest discrete distribution. It represents a single trial with exactly two possible outcomes: Success (1) and Failure (0).
If $X \sim \text{Bernoulli}(p)$, then $P(X = 1) = p$ and $P(X = 0) = 1 - p$, giving:

- Mean ($\mu$): $p$
- Variance ($\sigma^2$): $p(1-p)$
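These two formulas are easy to verify numerically. A quick sketch using NumPy (the library choice and the value $p = 0.3$ are illustrative assumptions) to simulate Bernoulli trials:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
p = 0.3

# Simulate 100,000 Bernoulli(p) trials: 1 with probability p, else 0
samples = rng.binomial(n=1, p=p, size=100_000)

print(samples.mean())  # close to p = 0.3
print(samples.var())   # close to p * (1 - p) = 0.21
```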
The Binomial Distribution is the sum of $n$ independent Bernoulli trials: it counts the total number of successes in $n$ repeated trials, each with the same success probability $p$.
For a variable to follow a Binomial distribution, it must meet these criteria:
- Binary: Only two outcomes per trial (Success/Failure).
- Independent: The outcome of one trial doesn't affect the next.
- Number: The number of trials ($n$) is fixed in advance.
- Same: The probability of success ($p$) is the same for every trial.
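Under these criteria, summing $n$ independent Bernoulli draws is distributionally the same as a single Binomial draw. A minimal sketch with NumPy (library choice and parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n, p = 10, 0.5

# Option 1: draw n Bernoulli trials and sum the successes
trials = rng.binomial(n=1, p=p, size=n)
successes = trials.sum()

# Option 2: draw from Binomial(n, p) directly
direct = rng.binomial(n=n, p=p)

# Both are counts between 0 and n, drawn from the same distribution
print(successes, direct)
```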
The Probability Mass Function (PMF) is:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Where $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ counts the number of ways to arrange $k$ successes among $n$ trials.
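The PMF can be computed directly with Python's standard library; `binomial_pmf` is a hypothetical helper name used here for illustration:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 5 fair coin flips
print(binomial_pmf(3, 5, 0.5))  # 10 / 32 = 0.3125

# Sanity check: the PMF sums to 1 over k = 0..n
print(sum(binomial_pmf(k, 5, 0.5) for k in range(6)))  # 1.0
```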
```mermaid
graph TD
    Start["$$n$$ Independent Trials"] --> Success["Success (p)"]
    Start --> Failure["Failure (1-p)"]
    Success --> Binomial["Binomial Distribution: $$X \sim B(n, p)$$"]
    style Binomial fill:#f3f,color:#333,stroke:#333,stroke-width:2px
```
If we have $n = 2$ coin flips, with $P(\text{H}) = p$ and $P(\text{T}) = q = 1 - p$, we can enumerate every possible sequence:
```mermaid
graph LR
    %% Main Tree Structure
    Root([Start]) --> H1["H ($$p$$)"]
    Root --> T1["T ($$q$$)"]
    H1 --> H2["HH ($$p^2$$)"]
    H1 --> T2["HT ($$pq$$)"]
    T1 --> H3["TH ($$qp$$)"]
    T1 --> T3["TT ($$q^2$$)"]

    %% Using a Subgraph to represent the "Note"
    subgraph Logic ["The Binomial distribution"]
        H2
        T2
        H3
        T3
    end

    %% Styling for clarity
    style Logic fill:#f5f5f5,stroke:#333,color:#333,stroke-dasharray: 5 5
    style Root fill:#e1f5fe,color:#333,stroke:#01579b
```
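The tree above can be enumerated in a few lines (the value $p = 0.6$ is illustrative). Grouping the four sequences by their number of heads recovers $p^2$, $2pq$, and $q^2$:

```python
from itertools import product

p = 0.6       # P(heads), illustrative
q = 1 - p     # P(tails)

# All 2-flip sequences with their probabilities
outcomes = {}
for seq in product("HT", repeat=2):
    prob = 1.0
    for flip in seq:
        prob *= p if flip == "H" else q
    outcomes["".join(seq)] = prob

print(outcomes)  # HH: p^2, HT: pq, TH: qp, TT: q^2

# Order doesn't matter to the Binomial: HT and TH both count as "1 head"
p_one_head = outcomes["HT"] + outcomes["TH"]
print(p_one_head)  # 2 * p * q
```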
When you train a Logistic Regression model, you are essentially assuming your target variable follows a Bernoulli distribution. The model outputs the parameter $p$: the predicted probability that the label is 1 for a given input.
If you show an ad to $n$ users, each with an independent click probability $p$, the total number of clicks follows a Binomial distribution $B(n, p)$. This lets you compute, for example, the probability of receiving at least a given number of clicks.
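As a sketch of this kind of calculation (the figures of 1,000 impressions and a 2% click rate are illustrative assumptions, not from the text), using only the standard library:

```python
from math import comb

n, p = 1000, 0.02   # illustrative: 1000 ad impressions, 2% click probability

def pmf(k: int) -> float:
    """P(exactly k clicks) under Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

expected_clicks = n * p        # mean: np = 20
spread = n * p * (1 - p)       # variance: np(1-p) = 19.6

# P(at least 30 clicks) = 1 - P(X <= 29)
p_at_least_30 = 1 - sum(pmf(k) for k in range(30))
print(expected_clicks, spread, p_at_least_30)
```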
The loss function used in most binary-classification neural networks, Binary Cross-Entropy, is derived directly from the likelihood of a Bernoulli distribution. Minimizing this loss is equivalent to finding the $p$ that best fits your binary data.
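This equivalence is concrete: the per-example binary cross-entropy is exactly the negative log of the Bernoulli likelihood. A minimal sketch (the function name is hypothetical):

```python
import math

def bernoulli_nll(y: int, p: float) -> float:
    """Negative log-likelihood of label y under Bernoulli(p).
    This is exactly the binary cross-entropy for one example:
    -[y * log(p) + (1 - y) * log(1 - p)]."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A confident correct prediction has low loss...
print(bernoulli_nll(1, 0.9))   # ~0.105
# ...while a confident wrong prediction is penalized heavily
print(bernoulli_nll(1, 0.1))   # ~2.303
```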
| Feature | Bernoulli | Binomial |
|---|---|---|
| Number of Trials | $1$ | $n$ |
| Outcomes | $0$ or $1$ | $0, 1, 2, \dots, n$ |
| Mean | $p$ | $np$ |
| Variance | $p(1-p)$ | $np(1-p)$ |
The Binomial distribution counts discrete successes across a fixed number of trials. But what if we are counting the number of events happening over a fixed interval of time or space? For that, we turn to the Poisson distribution.