Gaussian Distribution (Normal Distribution)

The Normal distribution, also called the Gaussian distribution, is one of the most important distributions in statistics and machine learning.

Many of us have heard, during a yearly performance review, that management adjusted our rating to fit a bell curve. Below is that bell curve 🤣. This is the shape of the normal distribution!

Learning/images/bell-1.png|600

1. What is the Normal Distribution?

A normal distribution is a continuous probability distribution that is symmetric about its mean, showing that data near the mean are more frequent in occurrence than data far from the mean. In graph form, the normal distribution will appear as a bell curve.

2. Key Properties

3. Visualizing the Parameters

4. The Empirical Rule (68-95-99.7 Rule)

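This rule tells us how data is distributed in a normal distribution:

  • About 68% of values fall within 1 standard deviation of the mean (μ ± σ).
  • About 95% of values fall within 2 standard deviations of the mean (μ ± 2σ).
  • About 99.7% of values fall within 3 standard deviations of the mean (μ ± 3σ).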
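The 68-95-99.7 figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `coverage_within` is made up for this illustration and is not part of any library.

```python
from statistics import NormalDist


def coverage_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    # Area under the curve between mu - k*sigma and mu + k*sigma.
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

The result does not depend on the particular mean and standard deviation: `coverage_within(1, mu=170, sigma=10)` gives the same 0.6827.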

5. Standard Normal Distribution

The standard normal distribution is a special case where:

  • the mean is μ = 0
  • the standard deviation is σ = 1

The PDF becomes:

$$f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}$$

where $z = \frac{x - \mu}{\sigma}$ is called the z-score.
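As a minimal sketch, the z-score and the standard normal density (the exponential of minus z squared over two, divided by the square root of 2π) can be written as two small functions. The names `z_score` and `standard_normal_pdf` are chosen for this illustration only.

```python
import math


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mu) / sigma


def standard_normal_pdf(z):
    """Density of the standard normal distribution (mean 0, standard deviation 1) at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


print(z_score(185, mu=170, sigma=10))        # 1.5
print(round(standard_normal_pdf(0.0), 4))    # 0.3989, the peak of the bell curve
```

Because the density depends on z only through z squared, `standard_normal_pdf(-1.5)` equals `standard_normal_pdf(1.5)`, which is the symmetry about the mean described earlier.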

ND-4.png|600

6. Example: Heights of Students

Suppose the heights of students in a school are normally distributed with a mean (μ) of 170 cm and a standard deviation (σ) of 10 cm.

(a) What is the probability that a randomly chosen student is taller than 185 cm?

Step 1: Convert to z-score

$$z = \frac{x - \mu}{\sigma} = \frac{185 - 170}{10} = 1.5$$

Step 2: Find the probability using the standard normal table

$$P(X > 185) = P(Z > 1.5) = 1 - P(Z < 1.5) = 1 - 0.9332 = 0.0668$$

Final Answer:
There is a 6.68% chance that a randomly chosen student is taller than 185 cm.
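The table lookup in part (a) can be cross-checked in code. This is a small sketch using `statistics.NormalDist` from the Python standard library, with the mean of 170 cm and standard deviation of 10 cm from the example. The variable names are illustrative.

```python
from statistics import NormalDist

# Heights modelled as normal with mean 170 cm and standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Step 1: z-score of 185 cm.
z = (185 - heights.mean) / heights.stdev

# Step 2: upper-tail probability, P(X > 185) = 1 - P(X <= 185).
p_taller = 1 - heights.cdf(185)

print(f"z = {z:.2f}")                     # z = 1.50
print(f"P(X > 185) = {p_taller:.4f}")     # P(X > 185) = 0.0668
```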

(b) What percentage of students are between 160 cm and 180 cm?

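Step 1: Convert both values to z-scores

$$z_1 = \frac{160 - 170}{10} = -1, \qquad z_2 = \frac{180 - 170}{10} = 1$$

Step 2: Find probabilities from the z-table

$$P(Z < 1) = 0.8413, \qquad P(Z < -1) = 0.1587$$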

Step 3: Subtract to find the probability between

$$P(160 < X < 180) = P(Z < 1) - P(Z < -1) = 0.8413 - 0.1587 = 0.6826$$

Final Answer:
About 68.26% of students are between 160 cm and 180 cm tall.
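Part (b) can be checked the same way. This is a sketch under the same assumptions (mean 170 cm, standard deviation 10 cm), again using `statistics.NormalDist`.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)

# P(160 < X < 180) is the difference of the two cumulative probabilities.
p_between = heights.cdf(180) - heights.cdf(160)

print(f"P(160 < X < 180) = {p_between:.4f}")   # P(160 < X < 180) = 0.6827
```

The computed value, 0.6827, differs from the hand calculation in the last digit because the z-table entries are rounded to four decimal places before subtracting.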

7. Why is the Normal Distribution Important?

  • Many natural phenomena (heights, test scores, measurement errors) approximately follow a normal distribution.
  • The Central Limit Theorem states that the suitably standardized sum (or average) of many independent, identically distributed random variables with finite variance tends toward a normal distribution, even if the original variables themselves are not normally distributed.
  • Used in hypothesis testing, confidence intervals, and many machine learning algorithms.
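The Central Limit Theorem point can be illustrated with a small simulation. The sketch below averages uniform random numbers, which are clearly not bell-shaped, and checks what fraction of the averages falls within one standard deviation of their centre. The sample size (30), the number of repetitions (10,000), the seed, and the function name `sample_means` are all arbitrary choices made for this illustration.

```python
import random
import statistics


def sample_means(n_per_mean=30, n_means=10_000, seed=0):
    """Return n_means averages, each taken over n_per_mean uniform(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.random() for _ in range(n_per_mean))
        for _ in range(n_means)
    ]


means = sample_means()
center = statistics.fmean(means)
spread = statistics.stdev(means)

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the mean; the averages should come close to that.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of averages:     {center:.3f}")
print(f"std dev of averages:  {spread:.3f}")
print(f"fraction within 1 sd: {within_one_sd:.3f}")
```

The averages should cluster around 0.5, and the fraction within one standard deviation should land near the 68% expected for a normal distribution, even though each individual draw is uniform.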