Heatmap (Correlation Matrix)

I. Purpose

A correlation heatmap displays the pairwise correlation coefficients between numerical variables as a color-coded matrix. It is useful for identifying multicollinearity and relationships between features. The correlation coefficient (Pearson by default in pandas) measures linear relationships only.

⚠️ A non-linear relationship such as Y = X^2 can show near-zero correlation even though a relationship exists, for example when X is distributed symmetrically around zero.
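
The warning above can be checked numerically. The following is a minimal sketch using NumPy only; the sample range and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# x is symmetric about zero, so the negative and positive halves cancel out
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2  # y is fully determined by x, but not linearly

# Pearson correlation over the full, symmetric range
r_full = np.corrcoef(x, y)[0, 1]

# Restricting x to positive values makes the relationship monotonic,
# so the linear correlation becomes strong
pos = x > 0
r_pos = np.corrcoef(x[pos], y[pos])[0, 1]

print(f"full range: r = {r_full:.3f}")
print(f"x > 0 only: r = {r_pos:.3f}")
```

A near-zero cell in a heatmap therefore does not rule out a relationship; a scatter plot of the two variables is the usual follow-up.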

II. Analysis Type

Multivariate

III. What to Look For

1. Correlation Strength
2. Multicollinearity
3. Target Variable Relationships
4. Redundant Features
5. Feature Groups
6. Linearity
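
Item 3 (target variable relationships) can also be read off numerically by ranking features by their absolute correlation with the target. This is a rough sketch on synthetic data; the column names `feature_a`, `feature_b`, and `target` are hypothetical, and a real dataset would replace the generated DataFrame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic data (assumption): the target depends on feature_a but not feature_b
df = pd.DataFrame({
    "feature_a": rng.normal(size=n),
    "feature_b": rng.normal(size=n),
})
df["target"] = 2.0 * df["feature_a"] + rng.normal(scale=0.5, size=n)

corr = df.corr()

# Take the target column, drop its self-correlation, sort by absolute value
target_corr = (
    corr["target"]
    .drop("target")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(target_corr)
```

Sorting by absolute value keeps strong negative correlations near the top, since they are as informative as strong positive ones.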

IV. Common Patterns and Their Meanings

| Pattern | Visual Cue | Interpretation | Action |
| --- | --- | --- | --- |
| Strong positive correlation | Dark red/hot color, value near +1 | Linear relationship, features move together | Use for feature selection, beware multicollinearity |
| Strong negative correlation | Dark blue/cold color, value near -1 | Linear relationship, features move in opposite directions | Use for feature selection, beware multicollinearity |
| No correlation | Neutral color, value near 0 | No linear relationship | May be non-linear, check scatter plot |
| Multicollinearity | Multiple strong correlations among predictors | Predictors highly related | Remove or combine correlated features |
| Feature clusters | Blocks of similar color | Groups of related features | Consider dimensionality reduction |
| Redundant features | Value near 1.0 or -1.0 | Features nearly identical or inverse | Keep only one from each pair |
| Target relationships | Strong color in target row/column | Feature important for prediction | Use for feature selection |
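
The "keep only one from each pair" action for redundant features can be sketched as follows. The data is synthetic, the column names `x1`, `x2`, `x3` are hypothetical, and the 0.9 cutoff is an assumed threshold rather than a rule.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
base = rng.normal(size=n)

# Synthetic data (assumption): x2 is a near-duplicate of x1, x3 is unrelated
df = pd.DataFrame({
    "x1": base,
    "x2": base + rng.normal(scale=0.05, size=n),
    "x3": rng.normal(size=n),
})

threshold = 0.9  # assumed cutoff; choose one that suits the data and model

abs_corr = df.corr().abs()

# Keep only the strict upper triangle so each pair appears once
# and the diagonal (self-correlation of 1.0) is skipped
upper = abs_corr.where(np.triu(np.ones(abs_corr.shape, dtype=bool), k=1))

# List the pairs whose absolute correlation exceeds the threshold
high_pairs = upper.stack()
high_pairs = high_pairs[high_pairs > threshold]
print(high_pairs)

# Drop the later column of each highly correlated pair
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = df.drop(columns=to_drop)
print(list(reduced.columns))
```

Which member of a pair to keep is a judgment call; this sketch simply keeps the column that appears first.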

V. Advantages of Heatmaps

VII. Disadvantages

VIII. Code Example

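# Assumes `df` is an existing pandas DataFrame
import matplotlib.pyplot as plt
import seaborn as sns

# Basic correlation heatmap (numeric columns only, since correlation
# is undefined for text columns)
corr_matrix = df.select_dtypes(include="number").corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0)
plt.title("Correlation Heatmap")
plt.show()

# With better formatting: larger figure, two-decimal labels, square cells
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm',
            square=True, linewidths=0.5, center=0,
            cbar_kws={"shrink": 0.8})
plt.title("Feature Correlation Matrix")
plt.tight_layout()
plt.show()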

VI. Best Practices for Effective Heatmaps

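import numpy as np
corr = df.corr()
# Use a diverging colormap centered at 0
sns.heatmap(corr, cmap='coolwarm', center=0)
# Mask the upper triangle (including the diagonal), which mirrors the lower one
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, cmap='coolwarm', center=0)
# Annotate each cell with its value, rounded to two decimals
sns.heatmap(corr, annot=True, fmt='.2f')
# Use square cells
sns.heatmap(corr, square=True)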

[Image: ML_AI/_feature_engineering/images/heatmap-1.png]

IX. Documentation & External References