I. Entropy
II. Joint Entropy
III. Conditional Entropy
IV. Mutual Information
V. Information Gain


II. Joint Entropy H(X,Y)

Joint entropy is an information-theoretic measure of the total uncertainty, randomness, or average information associated with a set of multiple random variables (e.g., X and Y).

Simplest Term: Joint Entropy is the entropy of the pair (X, Y).

★ The Formula

H(X,Y) = −Σ_{x∈X} Σ_{y∈Y} P(x,y) log₂ P(x,y)

where x and y are particular values of X and Y, respectively, and P(x,y) is their joint probability.
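The formula translates directly into a few lines of Python. As a minimal sketch (the helper name `joint_entropy` is mine, not a standard library function), applied here to two independent fair coins, whose joint entropy should be exactly 2 bits:

```python
import math

def joint_entropy(p_xy):
    """H(X,Y) = -sum of P(x,y) * log2 P(x,y), skipping zero-probability cells."""
    return -sum(p * math.log2(p) for p in p_xy.values() if p > 0)

# Two independent fair coins: four equally likely (x, y) pairs
coins = {("H", "H"): 0.25, ("H", "T"): 0.25, ("T", "H"): 0.25, ("T", "T"): 0.25}
print(joint_entropy(coins))  # -> 2.0
```

Each coin alone carries 1 bit, and because the coins are independent the joint entropy is the full sum, 1 + 1 = 2 bits.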

★ How to Interpret "Joint Entropy" Value?

1. The "Redundancy" Check

Joint Entropy is the most direct way to see whether two features provide unique information or are redundant.

2. The Relationship to Individual Entropies

The most useful thing about Joint Entropy is how it compares to the individual entropies H(X) and H(Y):

H(X,Y) ≤ H(X) + H(Y)

with equality if and only if X and Y are independent.

★ Example: Commuting to Work

Imagine we track 100 days of commuting. We want to know how unpredictable the combination of "Weather" and "Traffic" is.

| Weather (Y) | Traffic (X) | Probability P(x,y) |
|-------------|-------------|--------------------|
| Sunny       | Clear       | 0.60 (60%)         |
| Sunny       | Jam         | 0.10 (10%)         |
| Rainy       | Clear       | 0.05 (5%)          |
| Rainy       | Jam         | 0.25 (25%)         |

The Calculation

The formula for Joint Entropy is:

H(X,Y) = −Σ_{x∈X} Σ_{y∈Y} P(x,y) log₂ P(x,y)

Let’s plug in our probabilities, one term per cell of the table:

  1. Sunny & Clear: −(0.60 × log₂ 0.60) = −(0.60 × −0.737) ≈ 0.442
  2. Sunny & Jam: −(0.10 × log₂ 0.10) = −(0.10 × −3.322) ≈ 0.332
  3. Rainy & Clear: −(0.05 × log₂ 0.05) = −(0.05 × −4.322) ≈ 0.216
  4. Rainy & Jam: −(0.25 × log₂ 0.25) = −(0.25 × −2.000) ≈ 0.500

Total Joint Entropy: 0.442 + 0.332 + 0.216 + 0.500 ≈ 1.49 bits
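The same four terms can be reproduced in Python. A quick sketch that walks the commuting table cell by cell and accumulates the total:

```python
import math

# The commuting table from the example above
joint = {
    ("Sunny", "Clear"): 0.60,
    ("Sunny", "Jam"):   0.10,
    ("Rainy", "Clear"): 0.05,
    ("Rainy", "Jam"):   0.25,
}

total = 0.0
for (weather, traffic), p in joint.items():
    term = -p * math.log2(p)            # each cell's contribution in bits
    total += term
    print(f"{weather} & {traffic}: {term:.3f}")
print(f"H(X,Y) = {total:.2f} bits")     # -> H(X,Y) = 1.49 bits
```

The printed terms match the hand calculation: 0.442, 0.332, 0.216, and 0.500 bits.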

Interpretation
  1. System Complexity: This number represents the total "information content" of the Weather-Traffic pair. If you wanted to send a text message every day reporting both the weather and the traffic, you would need about 1.49 bits on average to encode that message efficiently.
  2. Comparison to Individual Entropy:
    • The entropy of Weather alone (P(Sunny) = 0.70, P(Rainy) = 0.30) is about 0.88 bits.
    • The entropy of Traffic alone (P(Clear) = 0.65, P(Jam) = 0.35) is about 0.93 bits.
    • The Joint Entropy (1.49) is less than the sum of the two (0.88 + 0.93 ≈ 1.82) because Weather and Traffic are related. Knowing one tells you something about the other, so the "combined chaos" is lower than if they were totally independent.
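The marginal entropies above can be verified from the same table. A short sketch (probabilities read off by summing the table's rows and columns):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

# Marginals from the commuting table
p_weather = [0.70, 0.30]              # Sunny, Rainy
p_traffic = [0.65, 0.35]              # Clear, Jam
p_joint   = [0.60, 0.10, 0.05, 0.25]

h_w, h_t, h_wt = entropy(p_weather), entropy(p_traffic), entropy(p_joint)
print(f"H(Weather)            = {h_w:.2f}")        # -> 0.88
print(f"H(Traffic)            = {h_t:.2f}")        # -> 0.93
print(f"H(Weather)+H(Traffic) = {h_w + h_t:.2f}")  # -> 1.82
print(f"H(Weather,Traffic)    = {h_wt:.2f}")       # -> 1.49, lower because they are related
```

The gap between 1.82 and 1.49 (about 0.33 bits) is exactly the information the two variables share, which is the subject of the Mutual Information section.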
Visualizing the "1.49 Bits"
Imagine you are texting your boss every morning. To save on data, you decide:
- Since it's usually Sunny & Clear (60%), you just text a single `0`. (1 bit)
- For the rarer events, you send longer codes like `10`, `110`, or `111`. (2 or 3 bits)
On most days you only send 1 bit; on some days you send 3. The average length of your messages over 100 days works out to about 1.55 bits per day, just above the 1.49-bit floor set by the Joint Entropy: no lossless code can average fewer bits than H(X,Y).
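This texting scheme can be checked directly. A sketch that assigns the shortest codeword to the most common outcome (a Huffman-style prefix code) and computes the average message length:

```python
# Prefix code for the four commuting outcomes: shorter codes for commoner days
code = {
    ("Sunny", "Clear"): "0",    # most common day: 1 bit
    ("Rainy", "Jam"):   "10",
    ("Sunny", "Jam"):   "110",
    ("Rainy", "Clear"): "111",
}
prob = {
    ("Sunny", "Clear"): 0.60,
    ("Rainy", "Jam"):   0.25,
    ("Sunny", "Jam"):   0.10,
    ("Rainy", "Clear"): 0.05,
}

# Expected bits per day = sum of P(outcome) * length of its codeword
avg = sum(prob[k] * len(code[k]) for k in prob)
print(f"average message length = {avg:.2f} bits")  # -> 1.55 bits
```

The average of 1.55 bits sits slightly above the 1.49-bit entropy because codeword lengths must be whole numbers; entropy is the theoretical lower bound that no lossless code can beat.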