I. Entropy 》II. Joint Entropy 》III. Conditional Entropy 》IV. Mutual Information 》V. Information Gain
II. Joint Entropy
Joint entropy is an information-theory measure of the total uncertainty, randomness, or average information associated with a set of multiple random variables (e.g., a pair X and Y).
- Joint Entropy measures the total uncertainty of the entire system. It considers X and Y as a single combined unit.
- Symmetric. It treats X and Y as equal partners: H(X, Y) = H(Y, X).
- It tells you how much the combination of two variables fluctuates.

Simplest Term: Joint Entropy is the entropy of the pair (X, Y).
★ The Formula

H(X, Y) = -Σ_x Σ_y P(x, y) log₂ P(x, y)

where P(x, y) is the joint probability that X = x and Y = y. The sum runs over every possible (x, y) pair, and terms with P(x, y) = 0 contribute zero.
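The formula above can be sketched directly in Python. The function name and the example distribution below are illustrative, not from the original text:

```python
import math

def joint_entropy(p_xy):
    """Joint entropy H(X, Y) in bits, given a dict {(x, y): probability}."""
    # Skip zero-probability outcomes: lim p->0 of p*log2(p) is 0.
    return -sum(p * math.log2(p) for p in p_xy.values() if p > 0)

# A uniform distribution over 4 (x, y) outcomes carries log2(4) = 2 bits.
p_xy = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.25, ("b", 1): 0.25}
print(joint_entropy(p_xy))  # -> 2.0
```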
★ How to Interpret "Joint Entropy" Value?
1. The "Redundancy" Check
Joint Entropy is the most direct way to see if two features are providing unique information or are redundant.
- If Joint Entropy is High: X and Y are likely independent. They both bring unique, messy information to the table. Knowing one tells you almost nothing about the other.
- If Joint Entropy is Low: X and Y are highly synchronized or redundant. The "total mess" of the two together isn't much higher than the mess of just one of them.
2. The Relationship to Individual Entropies
The most useful thing about Joint Entropy is how it compares to the individual entropies H(X) and H(Y).
- Independence: H(X, Y) = H(X) + H(Y)
  - If the joint entropy equals the sum of the individual entropies, the variables are perfectly independent. Adding the second variable adds its full share of unique information you must track.
- Dependence: H(X, Y) < H(X) + H(Y)
  - If the joint entropy is less than the sum, there is an overlap. Some information is shared, meaning the "sum of the parts" is greater than the whole.
  - In the extreme, if H(X, Y) = H(X), then Y is completely determined by X and adds no new information at all.
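These two regimes can be checked numerically. The following sketch (helper names are illustrative) compares H(X, Y) against H(X) + H(Y) for an independent pair and for a fully dependent pair where Y simply copies X:

```python
import math

def H(p):
    """Entropy in bits of a dict {outcome: probability}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def marginals(p_xy):
    """Marginal distributions P(x) and P(y) from a joint table."""
    px, py = {}, {}
    for (x, y), p in p_xy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

# Independent: P(x, y) = P(x) * P(y), so H(X, Y) = H(X) + H(Y).
indep = {(x, y): 0.5 * 0.5 for x in "ab" for y in "cd"}
# Dependent: Y copies X, so the pair holds no more information than X alone.
dep = {("a", "a"): 0.5, ("b", "b"): 0.5}

for name, p_xy in [("independent", indep), ("dependent", dep)]:
    px, py = marginals(p_xy)
    print(name, round(H(p_xy), 3), "vs", round(H(px) + H(py), 3))
# independent: 2.0 vs 2.0, dependent: 1.0 vs 2.0
```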
★ Example: Commuting to Work
Imagine we track 100 days of commuting. We want to know how unpredictable the combination of "Weather" and "Traffic" is.
| Weather (Y) | Traffic (X) | Probability P(x,y) |
|---|---|---|
| Sunny | Clear | 0.60 (60%) |
| Sunny | Jam | 0.10 (10%) |
| Rainy | Clear | 0.05 (5%) |
| Rainy | Jam | 0.25 (25%) |
The Calculation

The formula for Joint Entropy is:

H(X, Y) = -Σ P(x, y) log₂ P(x, y)

Let’s plug in our probabilities:
- Sunny & Clear: -0.60 × log₂(0.60) ≈ -0.60 × (-0.737) ≈ 0.442 bits
- Sunny & Jam: -0.10 × log₂(0.10) ≈ -0.10 × (-3.322) ≈ 0.332 bits
- Rainy & Clear: -0.05 × log₂(0.05) ≈ -0.05 × (-4.322) ≈ 0.216 bits
- Rainy & Jam: -0.25 × log₂(0.25) = -0.25 × (-2) = 0.500 bits

Total Joint Entropy:

H(X, Y) ≈ 0.442 + 0.332 + 0.216 + 0.500 ≈ 1.49 bits
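The worked example can be verified in a few lines of Python:

```python
import math

# Joint probabilities from the Weather/Traffic table above.
p_xy = {
    ("Sunny", "Clear"): 0.60,
    ("Sunny", "Jam"): 0.10,
    ("Rainy", "Clear"): 0.05,
    ("Rainy", "Jam"): 0.25,
}

# H(X, Y) = -sum of p * log2(p) over all four (weather, traffic) pairs.
h_joint = -sum(p * math.log2(p) for p in p_xy.values())
print(round(h_joint, 2))  # -> 1.49
```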
Interpretation
- System Complexity: This number represents the total "information content" of the Weather-Traffic pair. If you wanted to send a text message every day reporting both the weather and the traffic, you would need about 1.49 bits on average to encode that message efficiently.
- Comparison to Individual Entropy:
  - If you calculated the entropy of Weather alone, it might be 0.8 bits.
  - If you calculated the entropy of Traffic alone, it might be 0.9 bits.
  - The Joint Entropy (1.49) is less than the sum of the two (0.8 + 0.9 = 1.7) because Weather and Traffic are related. Knowing one tells you something about the other, so the "combined chaos" is lower than if they were totally independent.
Visualizing the "1.49 Bits"

Imagine you are texting your boss every morning. To save on data, you decide:
- Since it's usually Sunny & Clear (60%), you just text a single `0`. (1 bit)
- For the rarer events, you send longer codes like `10`, `110`, or `111`. (2 or 3 bits)

On most days you send only 1 bit; on some days, 2 or 3. The average length of your messages over 100 days works out to about 1.55 bits per day, just above the 1.49-bit joint entropy, which is the theoretical minimum average length any code for this pair can achieve.
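The average message length of such a scheme is easy to compute. The exact code assignment below (which rare event gets `10` versus `110` versus `111`) is an illustrative assumption, chosen so the shortest of the longer codes goes to the most likely rare event:

```python
# Hypothetical prefix code: shortest codes for the most likely outcomes.
code_lengths = {
    ("Sunny", "Clear"): 1,  # "0"
    ("Rainy", "Jam"): 2,    # "10"
    ("Sunny", "Jam"): 3,    # "110"
    ("Rainy", "Clear"): 3,  # "111"
}
probs = {
    ("Sunny", "Clear"): 0.60,
    ("Rainy", "Jam"): 0.25,
    ("Sunny", "Jam"): 0.10,
    ("Rainy", "Clear"): 0.05,
}

# Expected bits per day: sum of probability * code length.
avg = sum(probs[k] * code_lengths[k] for k in probs)
print(avg)  # 0.6*1 + 0.25*2 + 0.1*3 + 0.05*3 = 1.55 bits
```

The 1.55-bit average sits slightly above the 1.49-bit joint entropy, as it must: entropy is the floor that no lossless code can beat.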