Distribution Plot (KDE Plot)

Purpose

Shows a smooth, continuous estimate of a variable's probability density using kernel density estimation (KDE). It is an alternative to a histogram: the curve avoids binning artifacts, but its shape depends on the chosen bandwidth.

Histogram = discrete distribution view
KDE = smooth distribution view
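
To make the "smooth view" concrete, here is a minimal sketch of what a Gaussian KDE computes: one small bell curve per observation, averaged into a single curve. The function name gaussian_kde_manual, the synthetic sample, and the grid are assumptions made for illustration; the bandwidth uses Scott's rule of thumb, which is also the default bw_method in sns.kdeplot.

```python
import numpy as np


def gaussian_kde_manual(samples, grid, bandwidth):
    """Evaluate a Gaussian KDE on `grid`: the average of one bell curve per sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardised distance from every grid point to every sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=1)


# Synthetic stand-in for a numeric column (assumption, not the tips data).
rng = np.random.default_rng(42)
samples = rng.normal(loc=20.0, scale=8.0, size=200)

# Scott's rule of thumb in one dimension: h = std * n^(-1/5).
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

# Evaluate well past the data so both tails are included.
grid = np.linspace(samples.min() - 30.0, samples.max() + 30.0, 2000)
density = gaussian_kde_manual(samples, grid, bandwidth)

# A density should integrate to roughly 1 (simple Riemann sum).
area = density.sum() * (grid[1] - grid[0])
print(f"bandwidth={bandwidth:.2f}, area under curve={area:.4f}")
```

A larger bandwidth widens every bell curve and flattens the result; a smaller one lets individual observations show through as bumps.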

Analysis Type

Univariate (a bivariate form for two variables also exists; see the last plot in the code example)

What to Look For

1. Distribution Shape
2. Peaks (Modes)
3. Symmetry
4. Tail Behavior
5. Comparing Distributions
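6. Linearity (bivariate KDE only: whether the density contours follow a straight-line trend)
7. Normality (how closely the curve resembles a Gaussian bell shape)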

Interpreting Histograms with a KDE Overlay

Code Example

# Example: Distribution Plot (KDE Plot) using seaborn's tips dataset
import seaborn as sns
import matplotlib.pyplot as plt

# Load sample dataset
tips = sns.load_dataset('tips')

# KDE plot for total_bill
sns.kdeplot(tips['total_bill'])
plt.title("Kernel Density Estimate of Total Bill")
plt.xlabel("Total Bill")
plt.ylabel("Density")
plt.show()

# Multiple KDE plots for comparison (by sex)
sns.kdeplot(tips.loc[tips['sex'] == 'Male', 'total_bill'], label='Male')
sns.kdeplot(tips.loc[tips['sex'] == 'Female', 'total_bill'], label='Female')
plt.title("Distribution Comparison by Sex")
plt.xlabel("Total Bill")
plt.ylabel("Density")
plt.legend()
plt.show()

# KDE with shading
sns.kdeplot(tips['total_bill'], fill=True, alpha=0.5)
plt.title("KDE with Shading (Total Bill)")
plt.xlabel("Total Bill")
plt.show()

# Bivariate KDE: total_bill vs tip
sns.kdeplot(x='total_bill', y='tip', data=tips, fill=True, cmap='Blues')
plt.title("Bivariate KDE: Total Bill vs Tip")
plt.xlabel("Total Bill")
plt.ylabel("Tip")
plt.show()

ML_AI/_feature_engineering/images/kde-1.png
ML_AI/_feature_engineering/images/kde-2.png
ML_AI/_feature_engineering/images/kde-3.png
ML_AI/_feature_engineering/images/kde-4.png

Pro Tip

Use the bw_adjust parameter to control smoothness. The default is 1; sns.kdeplot(df['var'], bw_adjust=0.5) gives a less smooth curve with more detail, and bw_adjust=2 gives a smoother one.

To compare groups, draw them in different colors with transparency so overlapping regions stay visible: sns.kdeplot(data=df, x='value', hue='group', fill=True, alpha=0.5, common_norm=False). With common_norm=False each group's density is normalized on its own instead of being scaled by group size.

Set cut=0 to limit the curve to the observed data range instead of extending it past the smallest and largest values.

Documentation