I. Entropy 》II. Joint Entropy 》III. Conditional Entropy 》IV. Mutual Information 》V. Information Gain

📊 The Math Behind Entropy - Example

We will use a Healthcare Scenario: Predicting if a patient has a Fever (Y) based on the presence of a Cough (X).

| Patient | Cough (X) 😮‍💨 | Fever (Y) 🤒 |
|---------|----------------|--------------|
| 1-4     | Yes            | Yes          |
| 5-6     | Yes            | No           |
| 7       | No             | Yes          |
| 8-10    | No             | No           |

1. Summary Statistics

  • P(Fever = Yes) = 5/10 = 0.5, P(Fever = No) = 5/10 = 0.5
  • P(Cough = Yes) = 6/10 = 0.6, P(Cough = No) = 4/10 = 0.4
  • Joint: P(Yes, Yes) = 0.4, P(Yes, No) = 0.2, P(No, Yes) = 0.1, P(No, No) = 0.3
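These statistics can be tallied directly from the patient table. A minimal plain-Python sketch (the `patients` list is just the table re-encoded; variable names are illustrative):

```python
from collections import Counter

# The 10-patient table re-encoded as (cough, fever) pairs
patients = ([("Yes", "Yes")] * 4 + [("Yes", "No")] * 2
            + [("No", "Yes")] * 1 + [("No", "No")] * 3)
n = len(patients)

cough_counts = Counter(c for c, _ in patients)   # Cough: 6 Yes, 4 No
fever_counts = Counter(f for _, f in patients)   # Fever: 5 Yes, 5 No
joint_counts = Counter(patients)                 # e.g. (Yes, Yes) -> 4

print({k: v / n for k, v in fever_counts.items()})  # {'Yes': 0.5, 'No': 0.5}
print({k: v / n for k, v in cough_counts.items()})  # {'Yes': 0.6, 'No': 0.4}
```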

2. Shannon Entropy

H(Fever):  H(Y) = -Σ P(y) log2 P(y) = -[0.5 log2(0.5) + 0.5 log2(0.5)] = 1.0 bit
H(Cough):  H(X) = -Σ P(x) log2 P(x) = -[0.6 log2(0.6) + 0.4 log2(0.4)] = 0.971 bits

Breakdown of the results:

  • H(Fever): A value of 1.0 indicates a perfect 50/50 split—maximum uncertainty. You are essentially flipping a coin to guess if a patient has a fever.
  • H(Cough): A value of 0.971 indicates high uncertainty. The distribution is slightly biased (60/40), so there is slightly less "surprise" than in a 50/50 split.
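The two entropies above can be reproduced in a few lines of plain Python (a sketch; `entropy` is a hypothetical helper, not a library function):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), with 0*log(0) treated as 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

h_fever = entropy([0.5, 0.5])  # maximum uncertainty for two outcomes
h_cough = entropy([0.6, 0.4])  # slightly biased split
print(round(h_fever, 3), round(h_cough, 3))  # 1.0 0.971
```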

3. Joint Entropy H(X,Y)

Joint Entropy is a measure of the uncertainty associated with a set of variables (Cough and Fever together).

H(X,Y) = -ΣΣ P(x,y) log2 P(x,y) = -[0.4 log2(0.4) + 0.2 log2(0.2) + 0.1 log2(0.1) + 0.3 log2(0.3)] ≈ 1.846 bits

Breakdown of the results:

  • If H(X,Y) were close to the sum of the individual entropies (1.0 + 0.971 = 1.971), the variables would be mostly independent. At 1.846, they share some overlap, but there is still significant independent "noise" in the system.
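The same calculation can be sketched in plain Python, using the joint probabilities read off the table:

```python
from math import log2

# Joint distribution P(Cough, Fever) from the patient table
joint = {("Yes", "Yes"): 0.4, ("Yes", "No"): 0.2,
         ("No", "Yes"): 0.1, ("No", "No"): 0.3}

# Joint entropy: sum over all four (cough, fever) combinations
h_xy = -sum(p * log2(p) for p in joint.values())
print(round(h_xy, 3))  # 1.846
```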

4. Conditional Entropy

H(Y|X) = H(X,Y) - H(X) = 1.8465 - 0.971 = 0.8755 bits
Similarly, H(X|Y) = H(X,Y) - H(Y) ≈ 0.846 bits

Breakdown of the results:

  • This is the "noise" that remains. After checking for a cough, the uncertainty about the fever dropped from 1.0 to 0.875.
  • Cough is a strong predictor if this value is close to 0; it is useless if the value stays near the original 1.0.
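The chain rule makes this a one-line subtraction once the other entropies are known. A sketch (`entropy` is an illustrative helper):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

h_xy = entropy([0.4, 0.2, 0.1, 0.3])  # H(Cough, Fever)
h_x = entropy([0.6, 0.4])             # H(Cough)

h_y_given_x = h_xy - h_x  # chain rule: H(Y|X) = H(X,Y) - H(X)
print(round(h_y_given_x, 4))  # 0.8755
```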

5. Mutual Information and Information Gain

I(X;Y) = H(Y) - H(Y|X) = 1.0 - 0.8755 = 0.1245 bits
Interpretation of the result

This value is small.

Meaning:

  • Cough provides some information about Fever
  • But it is a weak predictor of Fever

Summary

★ How They Connect: The Information Flow

1. Joint Probability is the Foundation
2. Joint Entropy Measures Total Uncertainty
3. Chain Rule of Entropy Gives Conditional Entropy
4. Mutual Information = Reduction in Uncertainty

Mutual Information is the amount by which the joint entropy is smaller than the sum of the individual entropies: I(X;Y) = H(X) + H(Y) - H(X,Y) = 0.971 + 1.0 - 1.8465 = 0.1245 bits.
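That statement is the identity I(X;Y) = H(X) + H(Y) - H(X,Y), and it agrees with the I(X;Y) = H(Y) - H(Y|X) form used earlier. A quick numeric check (sketch; `entropy` is an illustrative helper):

```python
from math import log2, isclose

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

h_x = entropy([0.6, 0.4])             # H(Cough)
h_y = entropy([0.5, 0.5])             # H(Fever)
h_xy = entropy([0.4, 0.2, 0.1, 0.3])  # H(Cough, Fever)

# How far the joint entropy falls short of the independent sum
mi = h_x + h_y - h_xy
assert isclose(mi, h_y - (h_xy - h_x))  # same as H(Y) - H(Y|X)
print(round(mi, 4))  # 0.1245
```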