I. Entropy
II. Joint Entropy
III. Conditional Entropy
IV. Mutual Information
V. Information Gain


I. Entropy (Shannon Entropy)

In information theory, Shannon Entropy (H) is the mathematical measure of the average uncertainty, surprise, or information contained within a random variable.

In other words, Shannon Entropy is the fundamental measure of randomness or impurity in your data. In Data Science, we use it to understand how "spread out" or "mixed" a distribution is.

It forms the conceptual foundation for Mutual Information, which measures how much of this entropy is reduced when we know another variable.

Example

  • High Entropy → More Randomness → Heterogeneous. The data is unpredictable (e.g., a fair coin toss).
  • Low Entropy → Less Randomness → Homogeneous. The data is predictable (e.g., a coin that almost always lands heads).
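To make the contrast concrete, here is a minimal sketch in Python (the helper `entropy_bits` is illustrative, not from the article):

```python
import math

def entropy_bits(probs):
    # Shannon entropy in bits; zero-probability outcomes contribute nothing
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))    # fair coin: 1.0 bit (high entropy, unpredictable)
print(entropy_bits([0.99, 0.01]))  # heavily biased coin: ~0.08 bits (low entropy, predictable)
```

The fair coin hits the maximum possible entropy for two outcomes (1 bit), while the biased coin is nearly deterministic, so its entropy is close to zero.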

★ The Formula

For a discrete random variable X with possible outcomes {x_1, ..., x_n}, the formula is:

H(X) = − Σ_{i=1}^{n} P(x_i) log_b P(x_i)

Where

  1. Probability P(x_i): This is the likelihood of a specific outcome x_i occurring.
  2. Surprise/Information Content log_b P(x_i):
    • Information is inversely proportional to probability. If an event is 100% certain, it provides zero information when it happens.
    • If an event is rare, its occurrence is very "surprising" and contains high information.
  3. The Negative Sign (−): Since probabilities are between 0 and 1, their logarithms are negative. The leading negative sign ensures the final entropy value is positive.
  4. The Base (b):
    • If b=2, entropy is measured in bits.
    • If b=e (natural log), it is measured in nats.
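The formula above translates almost line-for-line into code. The sketch below (the function name `shannon_entropy` is my own, not from the article) supports both bases from point 4:

```python
import math

def shannon_entropy(probs, base=2):
    """H(X) = -sum over i of P(x_i) * log_b P(x_i).

    base=2 gives bits; base=math.e gives nats.
    Terms with P(x_i) == 0 are skipped (their limit contribution is 0).
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair 4-sided die: four equally likely outcomes
print(shannon_entropy([0.25] * 4))          # 2.0 bits
print(shannon_entropy([0.25] * 4, math.e))  # ~1.386 nats (= ln 4)
```

Note the two results describe the same uncertainty; they differ only by the constant factor ln 2, exactly as switching logarithm bases predicts.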

★ How to Interpret "Entropy" Value?

Think of this as the "Chaos Level" of your target variable before you know anything else.

★ The Weather Example

Imagine you live in a place where the weather is almost always Sunny.
Dataset: 9 days of Sun, 1 day of Rain.

H(Weather) = −(0.9 · log_2 0.9) − (0.1 · log_2 0.1) ≈ (0.9 × 0.152) + (0.1 × 3.32) ≈ 0.137 + 0.332 = 0.469 bits

The Intuition: An entropy of 0.469 bits is low (the maximum for two outcomes is 1 bit), which indicates the weather is predictable. If you tell someone "It's sunny today," they aren't very surprised. However, notice that the "Rain" term in the math (0.332) actually contributes more to the total entropy than the "Sun" term (0.137).
Why? Because the rare event (Rain) carries more "surprise" or "information value" when it actually happens!
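The per-outcome contributions can be checked directly. This short sketch reproduces the weather calculation and separates the two terms (variable names are my own):

```python
import math

# Weather dataset from the example: 9 sunny days, 1 rainy day
p_sun, p_rain = 0.9, 0.1

# Each outcome's contribution to H(Weather), in bits
sun_term = -p_sun * math.log2(p_sun)     # ~0.137 bits
rain_term = -p_rain * math.log2(p_rain)  # ~0.332 bits

# The rare event (Rain) contributes more, despite occurring far less often
print(round(sun_term, 3), round(rain_term, 3), round(sun_term + rain_term, 3))
# prints: 0.137 0.332 0.469
```

Running this confirms the hand calculation: the single rainy day accounts for roughly 70% of the total entropy, because its surprise (−log_2 0.1 ≈ 3.32 bits) is so much larger than the sun's (−log_2 0.9 ≈ 0.152 bits).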