I. Entropy › II. Joint Entropy › III. Conditional Entropy › IV. Mutual Information › V. Information Gain
📊 The Math Behind Entropy - Example
We will use a Healthcare Scenario: predicting whether a patient has a Fever (Y) based on whether they have a Cough (X).
| Patient | Cough (X) 😮💨 | Fever (Y) 🤒 |
|---|---|---|
| 1-4 | Yes | Yes |
| 5-6 | Yes | No |
| 7 | No | Yes |
| 8-10 | No | No |
1. Summary Statistics:
- Total Sample Space (N) = 10 patients.
- For Cough (X): P(X = Yes) = 6/10 = 0.6, P(X = No) = 4/10 = 0.4
- For Fever (Y): P(Y = Yes) = 5/10 = 0.5, P(Y = No) = 5/10 = 0.5
- Joint Probabilities: P(Yes, Yes) = 4/10 = 0.4, P(Yes, No) = 2/10 = 0.2, P(No, Yes) = 1/10 = 0.1, P(No, No) = 3/10 = 0.3
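These numbers can be cross-checked with a short Python sketch (the `patients` list and the variable names are mine, built from the table above):

```python
from collections import Counter

# (Cough, Fever) for each of the 10 patients in the table above
patients = [("Yes", "Yes")] * 4 + [("Yes", "No")] * 2 \
         + [("No", "Yes")] * 1 + [("No", "No")] * 3

n = len(patients)  # total sample space N = 10
joint   = {pair: cnt / n for pair, cnt in Counter(patients).items()}
p_cough = {x: cnt / n for x, cnt in Counter(x for x, _ in patients).items()}
p_fever = {y: cnt / n for y, cnt in Counter(y for _, y in patients).items()}

print(joint[("Yes", "Yes")], joint[("Yes", "No")],
      joint[("No", "Yes")], joint[("No", "No")])  # 0.4 0.2 0.1 0.3
print(p_cough["Yes"], p_fever["Yes"])             # 0.6 0.5
```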
2. Shannon Entropy
- Formula: H(V) = -Σ p(v) log₂ p(v)
- Entropy of Fever: H(Y) = -(0.5 log₂ 0.5 + 0.5 log₂ 0.5) = 1.0 bit
- Entropy of Cough: H(X) = -(0.6 log₂ 0.6 + 0.4 log₂ 0.4) ≈ 0.971 bits
Breakdown of the results
- H(Y) = 1.0: a perfect 50/50 split, i.e. maximum uncertainty. You are essentially flipping a coin to guess if a patient has a fever.
- H(X) ≈ 0.971: high uncertainty. The distribution is slightly biased (60/40), so there is slightly less "surprise" than a 50/50 split.
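The two entropy values can be reproduced in Python (the `entropy` helper is mine, not part of the original):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in probs if p > 0)

H_Y = entropy([0.5, 0.5])  # Fever: 50/50 split
H_X = entropy([0.6, 0.4])  # Cough: 60/40 split

print(round(H_Y, 3), round(H_X, 3))  # 1.0 0.971
```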
3. Joint Entropy
Joint Entropy is a measure of the uncertainty associated with a set of variables (Cough and Fever together):
H(X, Y) = -Σ p(x, y) log₂ p(x, y) = -(0.4 log₂ 0.4 + 0.2 log₂ 0.2 + 0.1 log₂ 0.1 + 0.3 log₂ 0.3) ≈ 1.846 bits
Breakdown of the result
- If H(X, Y) is close to the sum of the individual entropies (H(X) + H(Y) ≈ 1.971), the variables are mostly independent. At 1.846, they share some overlap, but there is still significant independent "noise" in the system.
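A quick numeric check of both numbers, using the joint probabilities from the summary statistics (the variable names are mine):

```python
from math import log2

# Joint probabilities P(Cough, Fever): (Yes,Yes), (Yes,No), (No,Yes), (No,No)
joint_probs = [0.4, 0.2, 0.1, 0.3]

H_XY = -sum(p * log2(p) for p in joint_probs)
print(round(H_XY, 3))  # 1.846

# For comparison, the sum of the marginal entropies H(X) + H(Y)
H_sum = -sum(p * log2(p) for p in [0.6, 0.4]) - sum(p * log2(p) for p in [0.5, 0.5])
print(round(H_sum, 3))  # 1.971
```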
4. Conditional Entropy
H(Y | X) = H(X, Y) - H(X) = 1.846 - 0.971 = 0.875 bits
Breakdown of the result
- This is the "noise" that remains. After checking for a cough, the uncertainty about the fever drops from 1.0 to 0.875.
- Cough would be a strong predictor if this value were close to 0; it is useless if it stays near 1.
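The subtraction is a one-liner to verify (a sketch with my own variable names):

```python
from math import log2

H_XY = -sum(p * log2(p) for p in [0.4, 0.2, 0.1, 0.3])  # joint entropy ~1.846
H_X  = -sum(p * log2(p) for p in [0.6, 0.4])            # entropy of Cough ~0.971

# Chain rule: H(Y|X) = H(X,Y) - H(X)
H_Y_given_X = H_XY - H_X
print(round(H_Y_given_X, 3))  # 0.875
```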
5. Mutual Information and Information Gain
I(X; Y) = H(Y) - H(Y | X) = 1.0 - 0.875 = 0.125 bits
Interpretation of the result
This value is small. Meaning:
- Cough provides some information about Fever.
- But it is a weak predictor of Fever.
- (In a decision tree, the Information Gain from splitting on Cough is exactly this mutual information: 0.125 bits.)
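Putting the pieces together, the 0.125 bits can be computed end to end (the `entropy` helper and names are mine):

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

H_Y  = entropy([0.5, 0.5])            # 1.0
H_X  = entropy([0.6, 0.4])            # ~0.971
H_XY = entropy([0.4, 0.2, 0.1, 0.3])  # ~1.846

# I(X;Y) = H(Y) - H(Y|X), with H(Y|X) = H(X,Y) - H(X)
I_XY = H_Y - (H_XY - H_X)
print(round(I_XY, 3))  # 0.125
```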
Summary
★ How They Connect: The Information Flow
1. Joint Probability is the Foundation
- From the patient counts, we compute the joint probabilities: P(Yes, Yes) = 0.4, P(Yes, No) = 0.2, P(No, Yes) = 0.1, P(No, No) = 0.3.
- These define the full probabilistic structure of the system.
- Everything — entropy, conditional entropy, mutual information — is derived from these probabilities.
2. Joint Entropy Measures Total Uncertainty
- Using H(X, Y) = -Σ p(x, y) log₂ p(x, y), we get H(X, Y) ≈ 1.846 bits.
- This measures the total uncertainty of the combined system (Cough + Fever).
3. Chain Rule of Entropy Gives Conditional Entropy
- From information theory: H(X, Y) = H(X) + H(Y | X)
- Rearranging: H(Y | X) = H(X, Y) - H(X)
- So we subtract H(X) from the joint entropy because of the chain rule, not arbitrarily.
4. Mutual Information = Reduction in Uncertainty
- Mutual Information is defined as: I(X; Y) = H(Y) - H(Y | X)
- Using the chain rule substitution: I(X; Y) = H(Y) - (H(X, Y) - H(X))
- Rearranging: I(X; Y) = H(X) + H(Y) - H(X, Y)
Mutual Information is the amount by which the joint entropy is smaller than the sum of the individual entropies: (0.971 + 1.0) - 1.846 ≈ 0.125 bits.
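The rearrangement above can be sanity-checked numerically: both forms of I(X; Y) should give the same answer. A minimal sketch (helper and names are mine):

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

H_X  = entropy([0.6, 0.4])
H_Y  = entropy([0.5, 0.5])
H_XY = entropy([0.4, 0.2, 0.1, 0.3])

via_conditional = H_Y - (H_XY - H_X)  # H(Y) - H(Y|X)
via_sum         = H_X + H_Y - H_XY    # H(X) + H(Y) - H(X,Y)

print(abs(via_conditional - via_sum) < 1e-12)  # True: the two forms agree
print(round(via_sum, 3))                       # 0.125
```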