Histogram Plot

Purpose

Visualize the frequency distribution of a single continuous variable to understand its shape, central tendency, and spread.

Histogram = binned (stepwise) view of the distribution
KDE = smoothed view of the distribution
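
To make the histogram/KDE distinction concrete, here is a minimal sketch that computes both estimates numerically instead of plotting them. The synthetic `sample` array and the choice of 30 bins are assumptions for illustration, and SciPy is assumed to be installed.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic sample (assumption): stands in for a real continuous column.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)

# Histogram: count observations per bin, scaled so the bar areas sum to 1.
density, edges = np.histogram(sample, bins=30, density=True)
hist_area = float(np.sum(density * np.diff(edges)))

# KDE: a smooth density estimate evaluated on a fine grid.
kde = gaussian_kde(sample)
grid = np.linspace(sample.min() - 3.0, sample.max() + 3.0, 1000)
step = grid[1] - grid[0]
kde_area = float(np.sum(kde(grid)) * step)

print(f"histogram area: {hist_area:.3f}, KDE area: {kde_area:.3f}")
```

Both estimates describe the same density; the histogram depends on the bin count, while the KDE depends on its bandwidth.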

Analysis Type

Univariate

Documentation

What to Look For

1. Distribution Shape

Symmetric: Data evenly distributed around the center (potential normal distribution)
Skewed Right (Positive Skew): Long tail on the right side
Skewed Left (Negative Skew): Long tail on the left side
Bimodal/Multimodal: Multiple peaks indicating subgroups
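
A visual read of the shape can be cross-checked with sample skewness. The sketch below uses synthetic data, and the helper `describe_shape` with its 0.5 threshold is a hypothetical rule of thumb, not a standard cutoff.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): stand-ins for real columns.
rng = np.random.default_rng(0)
symmetric = pd.Series(rng.normal(loc=50, scale=10, size=5000))
right_skewed = pd.Series(rng.exponential(scale=10, size=5000))
left_skewed = -right_skewed  # mirror image: long tail on the left


def describe_shape(values, threshold=0.5):
    """Label a series by sample skewness (threshold is a rule of thumb)."""
    skew = values.skew()
    if skew > threshold:
        return "skewed right"
    if skew < -threshold:
        return "skewed left"
    return "roughly symmetric"


for name, series in [("symmetric", symmetric),
                     ("right_skewed", right_skewed),
                     ("left_skewed", left_skewed)]:
    print(f"{name}: skew={series.skew():.2f} -> {describe_shape(series)}")
```

Skewness does not detect multiple peaks; bimodal data can have skewness near zero, so the histogram itself remains the primary check for that case.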

2. Normality Check

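A bell-shaped curve suggests an approximately normal distribution
Relevant for methods that assume normality, such as Gaussian Naive Bayes, linear discriminant analysis, or inference on linear-regression residuals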
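
A bell shape in a histogram is only suggestive, so a numeric check can help. This sketch assumes SciPy is available and uses synthetic samples; `normality_summary` is a hypothetical helper that combines the Q-Q correlation from `scipy.stats.probplot` with the D'Agostino-Pearson test from `scipy.stats.normaltest`.

```python
import numpy as np
from scipy import stats

# Synthetic samples (assumption): one normal, one clearly skewed.
rng = np.random.default_rng(1)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=2000)
skewed_sample = rng.exponential(scale=1.0, size=2000)


def normality_summary(values):
    """Return (Q-Q correlation, normaltest p-value) for a 1-D sample."""
    # probplot fits a line to the Q-Q points; r near 1 means close to normal.
    (_, _), (_, _, r) = stats.probplot(values, dist="norm")
    # D'Agostino-Pearson test: a small p-value is evidence against normality.
    _, p_value = stats.normaltest(values)
    return r, p_value


r_normal, p_normal = normality_summary(normal_sample)
r_skewed, p_skewed = normality_summary(skewed_sample)
print(f"normal sample: r={r_normal:.4f}, p={p_normal:.3g}")
print(f"skewed sample: r={r_skewed:.4f}, p={p_skewed:.3g}")
```

With large samples, formal tests flag even small departures from normality, so the Q-Q correlation and the histogram are worth reading alongside the p-value.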

3. Outliers

Values far from the main distribution
Appear as isolated bars at extremes
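
Isolated bars can be confirmed numerically. The sketch below applies the common 1.5 x IQR rule to synthetic data with two planted extreme values; the rule and the data are illustrative assumptions, and other thresholds may suit a given dataset better.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): 500 typical values plus two planted extremes.
rng = np.random.default_rng(42)
values = pd.Series(np.concatenate([rng.normal(100, 5, size=500), [160.0, 175.0]]))

# Interquartile range and the conventional 1.5 * IQR fences.
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers.
outliers = values[(values < lower) | (values > upper)]
print(f"fences: [{lower:.1f}, {upper:.1f}], flagged: {len(outliers)}")
```

Flagged values are candidates for inspection, not automatic removal.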

4. Range and Spread

Width of distribution indicates variability
Narrow distribution = low variance
Wide distribution = high variance
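
The link between histogram width and variability can be checked with a few summary statistics. This is a minimal sketch on synthetic data; `spread_summary` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): same center, different spread.
rng = np.random.default_rng(7)
narrow = pd.Series(rng.normal(loc=0.0, scale=1.0, size=2000))
wide = pd.Series(rng.normal(loc=0.0, scale=5.0, size=2000))


def spread_summary(values):
    """Return range, variance, and interquartile range of a series."""
    q1, q3 = values.quantile([0.25, 0.75])
    return {
        "range": values.max() - values.min(),
        "variance": values.var(),
        "iqr": q3 - q1,
    }


narrow_stats = spread_summary(narrow)
wide_stats = spread_summary(wide)
print("narrow:", narrow_stats)
print("wide:  ", wide_stats)
```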

5. Linearity

Linear:
- A roughly symmetric or bell-shaped (normal) histogram often pairs well with linear models (after standardization).
- If both the feature and the target look fairly well-behaved (no extreme skew), linear relationships are easier to spot.
Non-Linear:
- A heavy-tailed or skewed histogram often indicates exponential or multiplicative behavior, so the feature may relate non-linearly to other variables. Such shapes often suggest a transformation (for example, a log transform) that linearizes the relationship.
- Multimodal histograms (two or more peaks) can signal hidden groups, which often create non-linear patterns in scatter plots.
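
The point about skewed features and linearizing transformations can be illustrated with a small sketch. The data here is synthetic and deliberately multiplicative (an assumption chosen so that a log transform helps); real features may need a different transformation or none at all.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): a right-skewed feature and a target that
# depends on it multiplicatively, with multiplicative noise.
rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=1.0, size=2000)
y = 2.0 * x**1.5 * rng.lognormal(mean=0.0, sigma=0.1, size=2000)

# Skewness of the feature before and after a log transform.
skew_raw = pd.Series(x).skew()
skew_log = pd.Series(np.log(x)).skew()

# Pearson correlation measures linear association only.
corr_raw = np.corrcoef(x, y)[0, 1]
corr_log = np.corrcoef(np.log(x), np.log(y))[0, 1]

print(f"skew: raw={skew_raw:.2f}, log={skew_log:.2f}")
print(f"correlation: raw={corr_raw:.3f}, log-log={corr_log:.3f}")
```

After the transform the feature's histogram is close to symmetric and the relationship with the target is close to a straight line, which is the situation where linear models tend to fit well.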

Code Example

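import matplotlib.pyplot as plt
import seaborn as sns

# Assumes df is an existing pandas DataFrame with a numeric column named 'column'

# Basic histogram (dropna avoids errors from missing values)
plt.hist(df['column'].dropna(), bins=30, edgecolor='black', alpha=0.7)
plt.title("Distribution of Variable")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

# Seaborn with KDE overlay
sns.histplot(df['column'], kde=True, bins=30)
plt.title("Distribution with KDE")
plt.show()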

Pro Tip

Use kde=True in sns.histplot() to overlay a kernel density estimate curve, which smooths the distribution and makes patterns easier to see. A bell-shaped, symmetric KDE curve suggests the data may be approximately normal, but confirm with a Q-Q plot or a normality test before relying on that assumption.