I. Entropy 》II. Joint Entropy 》III. Conditional Entropy 》IV. Mutual Information 》V. Information Gain
I. Entropy (Shannon Entropy)
In information theory, Shannon Entropy $H(X)$ measures the average amount of uncertainty, or "surprise", associated with the outcomes of a random variable.
In other words, Shannon Entropy is the fundamental measure of randomness or impurity in your data. In Data Science, we use it to understand how "spread out" or "mixed" a distribution is.
It forms the conceptual foundation for Mutual Information, which measures how much of this entropy is reduced when we know another variable.
- High Entropy: More randomness, heterogeneous. The data is unpredictable (e.g., a fair coin toss).
- Low Entropy: Less randomness, homogeneous. The data is predictable (e.g., a coin that almost always lands heads).
★ The Formula
For a discrete random variable $X$ with possible outcomes $x_1, \dots, x_n$:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_b p(x_i)$$

Where:
- Probability $p(x_i)$: This is the likelihood of a specific outcome occurring.
- Surprise/Information Content $-\log_b p(x_i)$:
  - Information is inversely proportional to probability. If an event is 100% certain, it provides zero information when it happens.
  - If an event is rare, its occurrence is very "surprising" and contains high information.
- The Negative Sign ($-$): Since probabilities are between 0 and 1, their logarithms are negative. The leading negative sign ensures the final entropy value is non-negative.
- The Base ($b$):
  - If $b = 2$, entropy is measured in bits.
  - If $b = e$ (natural log), it is measured in nats.
  - If $b = 10$, it is measured in hartleys.
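The formula above translates directly into a few lines of Python. This is a minimal sketch (the helper name `shannon_entropy` is my own, not a standard library function):

```python
import math

def shannon_entropy(probs, base=2):
    """Shannon entropy: sum of -p * log_b(p) over a discrete distribution."""
    # Skip p = 0 terms: by convention 0 * log(0) = 0 (the limit as p -> 0).
    return sum(-p * math.log(p, base) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # fair coin -> 1.0 bit
print(shannon_entropy([1.0]))       # certain event -> 0.0 bits
```

Passing `base=math.e` yields the same quantity in nats instead of bits.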
★ How to Interpret "Entropy" Value?
Think of this as the "Chaos Level" of your target variable before you know anything else.
- High Entropy (close to 1.0 bit): The target is unpredictable (e.g., a 50/50 split from tossing a fair coin).
- Low Entropy (close to 0 bits): The target is highly predictable even without extra features (e.g., 99% of patients have no fever).
- Goal:
- To quantify the level of uncertainty or "surprise" associated with a random variable.
- You want features that can significantly reduce this number.
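The two interpretation extremes above can be checked numerically (a quick sketch using the stdlib; the function name is illustrative):

```python
import math

def entropy_bits(probs):
    # H(X) in bits; zero-probability outcomes contribute nothing.
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))   # 50/50 split: maximum chaos, 1.0 bit
print(entropy_bits([0.99, 0.01])) # 99/1 split: nearly predictable, close to 0 bits
```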
★ The Weather Example
Imagine you live in a place where the weather is almost always Sunny.
Dataset: 9 days of Sun, 1 day of Rain.

$$H(X) = -(0.9 \log_2 0.9 + 0.1 \log_2 0.1) \approx 0.469 \text{ bits}$$

The Intuition: The entropy of 0.469 bits indicates the weather is quite predictable. If you tell someone "It's sunny today," they aren't very surprised. However, notice that the "Rain" term ($-0.1 \log_2 0.1 \approx 0.332$) contributes more to the total than the "Sun" term ($-0.9 \log_2 0.9 \approx 0.137$), even though Rain occurs only 10% of the time.
Why? Because the rare event (Rain) carries more "surprise" or "information value" when it actually happens!
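The weather example can be worked through term by term, separating each outcome's surprise ($-\log_2 p$) from its weighted contribution to the total entropy (a small sketch; variable names are my own):

```python
import math

# Weather dataset: 9 sunny days, 1 rainy day.
counts = {"Sunny": 9, "Rain": 1}
total = sum(counts.values())

entropy = 0.0
for outcome, n in counts.items():
    p = n / total
    surprise = -math.log2(p)      # information content of one occurrence
    contribution = p * surprise   # this outcome's share of H(X)
    print(f"{outcome}: p={p:.1f}, surprise={surprise:.3f} bits, "
          f"contribution={contribution:.3f} bits")
    entropy += contribution

print(f"H(Weather) = {entropy:.3f} bits")  # -> 0.469 bits
```

Note how Rain's surprise (about 3.322 bits) dwarfs Sun's (about 0.152 bits), which is why the rare event dominates the entropy despite its low probability.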