Distribution Plot (KDE Plot)
Purpose
Show the smooth, continuous probability density of a variable using kernel density estimation (KDE). An alternative to histograms with a smoother appearance.
Histogram = binned (bar) view of the distribution
KDE = smooth (curve) view of the distribution
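As a rough sketch of what a KDE computes (not seaborn's actual implementation): every observation contributes a small Gaussian bump, and the curve is the average of those bumps. The helper name gaussian_kde_1d, the synthetic data, and the use of Scott's rule for the bandwidth are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] = standardized distance from grid point i to sample j
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)
samples = rng.normal(loc=20.0, scale=5.0, size=200)  # synthetic data

# Scott's rule of thumb for the bandwidth (an assumption; other rules exist)
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

grid = np.linspace(samples.min() - 3 * bandwidth,
                   samples.max() + 3 * bandwidth, 400)
density = gaussian_kde_1d(samples, grid, bandwidth)

# A density should integrate to roughly 1 over the plotted range
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"Area under the curve: {area:.3f}")
```

A smaller bandwidth makes each bump narrower, so the curve follows individual points more closely; a larger one smooths them together.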
Analysis Type
Univariate
What to Look For
1. Distribution Shape
- Smooth curve instead of histogram bars
- Easier to see overall shape
- Better for presentations
2. Peaks (Modes)
- One peak: Unimodal
- Two peaks: Bimodal (two subgroups)
- Multiple peaks: Multimodal, which often signals hidden subgroups
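Counting peaks by eye can be backed by a quick numeric check. The sketch below evaluates a Gaussian KDE on a grid and counts local maxima; kde_on_grid and count_peaks are hypothetical helpers, the data is synthetic, and the 5% height threshold is an arbitrary assumption used to ignore tiny bumps in the tails.

```python
import numpy as np

def kde_on_grid(samples, grid, bandwidth):
    """Gaussian KDE evaluated on a grid (hypothetical helper)."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

def count_peaks(density, min_height_ratio=0.05):
    """Count interior local maxima that reach min_height_ratio of the tallest point."""
    interior = density[1:-1]
    is_peak = (interior > density[:-2]) & (interior > density[2:])
    is_tall = interior >= min_height_ratio * density.max()
    return int(np.sum(is_peak & is_tall))

rng = np.random.default_rng(42)
# Synthetic bimodal data: two subgroups centred at 10 and 20
samples = np.concatenate([rng.normal(10, 1, 300), rng.normal(20, 1, 300)])

bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)  # Scott's rule (assumed)
grid = np.linspace(samples.min() - 3 * bandwidth,
                   samples.max() + 3 * bandwidth, 500)
density = kde_on_grid(samples, grid, bandwidth)

n_peaks = count_peaks(density)
print(f"Detected peaks: {n_peaks}")
```

The count depends on the bandwidth: an undersmoothed curve can show spurious peaks, so treat the result as a hint rather than proof of subgroups.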
3. Symmetry
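- A symmetric, single-peaked curve is consistent with a normal distribution (symmetry alone does not prove normality)
- An asymmetric curve, with one tail longer than the other, indicates skewness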
4. Tail Behavior
- Long tails indicate extreme values
- Short tails indicate concentrated data
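The symmetry and tail observations above can be cross-checked with two summary numbers: skewness (about 0 for symmetric data) and excess kurtosis (about 0 for normal-like tails, larger for heavier tails). This is a minimal sketch on synthetic data; the helper names are hypothetical and the three distributions are chosen only for illustration.

```python
import numpy as np

def skewness(x):
    """Fisher-Pearson skewness: about 0 if symmetric, > 0 for a long right tail."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**3))

def excess_kurtosis(x):
    """Excess kurtosis: about 0 for normal data, > 0 for heavier tails."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)

rng = np.random.default_rng(1)
samples = {
    "normal (symmetric, light tails)": rng.normal(size=5000),
    "exponential (right-skewed)": rng.exponential(size=5000),
    "laplace (symmetric, heavy tails)": rng.laplace(size=5000),
}

summary = {name: (skewness(x), excess_kurtosis(x)) for name, x in samples.items()}
for name, (s, k) in summary.items():
    print(f"{name}: skewness={s:.2f}, excess kurtosis={k:.2f}")
```

Both statistics are sensitive to a few extreme values, so they complement the plot rather than replace it.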
5. Comparing Distributions
- Overlay multiple KDEs to compare groups
- Look for separation between groups
- Useful for understanding class distributions
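Separation between overlaid KDEs can also be expressed as a single number. One possible measure (an assumption here, not something the plot itself reports) is the overlap coefficient: the area shared by the two density curves, near 0 for well-separated groups and near 1 for nearly identical ones. The helpers and the synthetic groups below are hypothetical.

```python
import numpy as np

def kde_on_grid(samples, grid):
    """Gaussian KDE on a grid with a Scott's-rule bandwidth (hypothetical helper)."""
    bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

def overlap_coefficient(a, b, n_grid=512):
    """Area shared by the two KDE curves, evaluated on a common grid."""
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    pad = 0.25 * (hi - lo)  # extend the grid so the tails are included
    grid = np.linspace(lo - pad, hi + pad, n_grid)
    shared = np.minimum(kde_on_grid(a, grid), kde_on_grid(b, grid))
    return float(np.sum(shared) * (grid[1] - grid[0]))

rng = np.random.default_rng(7)
group_a = rng.normal(0.0, 1.0, 500)
group_b = rng.normal(5.0, 1.0, 500)  # clearly shifted away from group_a
group_c = rng.normal(0.2, 1.0, 500)  # almost the same as group_a

overlap_ab = overlap_coefficient(group_a, group_b)
overlap_ac = overlap_coefficient(group_a, group_c)
print(f"A vs B overlap: {overlap_ab:.2f}")
print(f"A vs C overlap: {overlap_ac:.2f}")
```

A low overlap for a feature across classes suggests that the feature helps tell the classes apart.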
6. Linearity
- Linear:
  - In a 2D KDE (feature vs target), density contours look like tilted ellipses (stretched ovals).
  - In a 1D KDE, a single smooth, unimodal shape is compatible with simpler (often more linear) relationships, but a 1D plot cannot confirm linearity on its own.
- Non-Linear:
  - In a 2D KDE, contours that bend (banana shape), show multiple lobes, or curve around suggest strong non-linear structure.
  - In a 1D KDE, multiple peaks (bimodal/multimodal) or a very long tail stretching far to one side can hint at non-linear structure.
7. Gaussian Distribution
Interpreting a histogram with a KDE overlay:
- Bell-shaped curve = likely normal
- Skewed left/right = not normal
- Multiple peaks = multimodal distribution, not normal
Code Example
# Example: Distribution Plot (KDE Plot) using seaborn's tips dataset
import seaborn as sns
import matplotlib.pyplot as plt
# Load sample dataset
tips = sns.load_dataset('tips')
# KDE plot for total_bill
sns.kdeplot(tips['total_bill'])
plt.title("Kernel Density Estimate of Total Bill")
plt.xlabel("Total Bill")
plt.ylabel("Density")
plt.show()
# Multiple KDE plots for comparison (by sex)
sns.kdeplot(tips[tips['sex']=='Male']['total_bill'], label='Male')
sns.kdeplot(tips[tips['sex']=='Female']['total_bill'], label='Female')
plt.title("Distribution Comparison by Sex")
plt.xlabel("Total Bill")
plt.legend()
plt.show()
# KDE with shading
sns.kdeplot(tips['total_bill'], fill=True, alpha=0.5)
plt.title("KDE with Shading (Total Bill)")
plt.xlabel("Total Bill")
plt.show()
# Bivariate KDE: total_bill vs tip
sns.kdeplot(x='total_bill', y='tip', data=tips, fill=True, cmap='Blues')
plt.title("Bivariate KDE: Total Bill vs Tip")
plt.xlabel("Total Bill")
plt.ylabel("Tip")
plt.show()




Pro Tip
- Use the bw_adjust parameter to control smoothness: sns.kdeplot(df['var'], bw_adjust=0.5) gives a less smooth curve (more detail), and bw_adjust=2 gives a smoother one. The default is 1.
- For comparing groups, use different colors with transparency: sns.kdeplot(data=df, x='value', hue='group', fill=True, alpha=0.5, common_norm=False) to see overlapping distributions clearly. With common_norm=False, each group's curve is normalized separately so that it integrates to 1.
- Set cut=0 to limit the KDE to the actual data range instead of extending beyond it.
Documentation