What is a Logit?

A logit is the natural logarithm of the odds. It's a transformation that maps a probability value from [0, 1] to the entire real number line (-∞, +∞).

Mathematical Definition
$$\mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right) = z$$

Where:

- $p$ is a probability in $(0, 1)$
- $z$ is the logit, i.e. the corresponding log-odds
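The definition above can be written directly as a small Python function (a minimal sketch using only the standard library; the name `logit` is ours, not a library API):

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

# p = 0.5 means even odds, so the logit is exactly 0.
print(logit(0.5))  # 0.0

# Probabilities above 0.5 give positive logits, below 0.5 negative ones.
print(logit(0.9) > 0, logit(0.1) < 0)
```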

Analogy: Horse Betting

To understand why this transformation is useful, think about horse betting.

In horse betting, there's a commonly used term called odds. When we say the odds of horse number 5 winning are 3/8, we're actually saying that out of 11 races, we expect the horse to win 3 of them and lose 8.

Mathematically, odds are expressed as:

$$\text{odds} = \frac{p(x)}{1 - p(x)}$$

The odds can take any value in $[0, +\infty)$. However, if we take the log of the odds, the range changes to $(-\infty, +\infty)$. This is called the logit function.
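The horse-betting numbers above can be checked directly. This short sketch (plain Python, no libraries beyond the standard `math` module) recovers the bookmaker's 3/8 odds from the win probability and shows the log-odds is negative because the horse is expected to lose more often than win:

```python
import math

wins, losses = 3, 8
p = wins / (wins + losses)   # probability of winning: 3/11
odds = p / (1 - p)           # recovers the bookmaker's odds, 3/8
log_odds = math.log(odds)    # the logit; negative, since p < 0.5
print(odds, log_odds)
```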

Why is this useful?
Linear models (like neural networks before the final activation) produce outputs on the entire real number line $(-\infty, +\infty)$. By predicting logits instead of probabilities directly, the model doesn't have to worry about constraining its output to be between 0 and 1. We can then convert the logit back to a probability using the Sigmoid function.
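To illustrate, here is a minimal sigmoid sketch (standard library only): whatever real number the model outputs, the result always lands strictly between 0 and 1, so the model never has to enforce that constraint itself.

```python
import math

def sigmoid(z: float) -> float:
    """Map a logit z from the real line into (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Any real-valued model output becomes a valid probability:
for z in (-30.0, -2.0, 0.0, 2.0, 30.0):
    p = sigmoid(z)
    assert 0 < p < 1

print(sigmoid(0.0))  # 0.5 -- a logit of 0 means "no preference"
```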

Deriving the Sigmoid Function from Logit

If we set the logit to a variable z (the raw output of a model), we can solve for the probability p:

$$
\begin{aligned}
z &= \log\left(\frac{p}{1-p}\right) \\
e^z &= \frac{p}{1-p} && \text{(exponentiate both sides)} \\
e^z(1-p) &= p && \text{(multiply both sides by } 1-p\text{)} \\
e^z - e^z p &= p \\
e^z &= p + e^z p \\
e^z &= p(1 + e^z) \\
p &= \frac{e^z}{1 + e^z}
\end{aligned}
$$

By dividing the numerator and denominator by ez, we get the familiar sigmoid formula:

$$p = \frac{e^z / e^z}{(1 + e^z) / e^z} = \frac{1}{1/e^z + 1} = \frac{1}{1 + e^{-z}}$$

This shows that the Sigmoid function is the inverse of the Logit function. It converts a logit back into a probability.
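The inverse relationship can be verified numerically. The round-trip below (a small sketch with hypothetical helper names, using only the standard library) applies the logit and then the sigmoid and recovers the original probability up to floating-point error:

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))

# sigmoid(logit(p)) should give back p for any p in (0, 1):
for p in (0.1, 0.25, 0.5, 0.9):
    assert math.isclose(sigmoid(logit(p)), p)
```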

Context in Machine Learning