Pair Plot (Scatter Plot Matrix)

Purpose

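Visualize pairwise relationships between all numerical variables in a dataset at once. Each off-diagonal panel is a scatter plot of one variable against another, and each diagonal panel shows the distribution of a single variable. Essential for initial exploratory data analysis, and useful for spotting collinearity between features and possible interactions.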

Analysis Type

Multivariate

What to Look For

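1. Correlation Patterns: linear or monotonic trends in the scatter panels
2. Diagonal Plots: the distribution of each variable (skew, multiple peaks)
3. Redundant Features: pairs whose points fall almost on a straight line carry nearly the same information
4. Feature Selection: variables that show a clear relationship with the target
5. Data Quality: outliers, isolated clusters, and implausible values
6. Class Separation: with hue set, how well the classes separate in each panel
7. Linearity: whether relationships are straight-line or curved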
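
The "Redundant Features" check can be backed by numbers as well as by eye. The following is a minimal sketch, not part of the original notes: it lists column pairs whose absolute Pearson correlation exceeds a cutoff. The helper name highly_correlated_pairs, the 0.9 threshold, and the synthetic columns are all illustrative assumptions.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.9):
    """Return (col_a, col_b, r) for numeric column pairs with |Pearson r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()
    cols = corr.columns
    pairs = []
    # Walk the upper triangle so each pair is reported once
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Synthetic demo data (hypothetical column names)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "var1": x,
    "var2": 2.0 * x + rng.normal(scale=0.05, size=200),  # near-duplicate of var1
    "var3": rng.normal(size=200),                        # independent of var1
})

flagged = highly_correlated_pairs(demo, threshold=0.9)
print(flagged)
```

Pairs flagged this way should appear in the pair plot as tight, nearly straight bands. Pearson correlation only captures linear association, so curved relationships still need the visual check.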

Code Example

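# Imports used by all examples below
import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be a pandas DataFrame that is already loaded

# Basic pair plot of every numeric column
sns.pairplot(df)
plt.show()

# Pair plot colored by the target variable, with KDE curves on the diagonal
sns.pairplot(df, hue='target_variable', diag_kind='kde')
plt.show()

# Pair plot restricted to specific columns
sns.pairplot(df[['var1', 'var2', 'var3', 'target']], hue='target')
plt.show()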

Pro Tip

When hue is not set, the diagonal defaults to histograms; pass diag_kind='kde' for smoother density curves: sns.pairplot(df, diag_kind='kde'). With large datasets, pass plot_kws={'alpha': 0.5, 's': 10} to make points smaller and semi-transparent, which reduces overplotting. Add corner=True to draw only the lower triangle and save space: sns.pairplot(df, corner=True).

ML_AI/_feature_engineering/images/pairplot-1.png

Documentation