Residual Plot

Purpose

Diagnose regression model assumptions by plotting the residuals (actual minus predicted values) against the predicted values. This is a key check when validating a linear regression.

Analysis Type

Bivariate (model evaluation)

What to Look For

1. Random Pattern (👍 GOOD)
2. Linearity
3. Non-linearity (🚫 BAD)
4. Heteroscedasticity (🚫 BAD)
5. Outliers
6. Systematic Bias
7. Ideal Pattern
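
To make the patterns in the list above concrete, here is a minimal sketch on synthetic data. It is an illustration only: the three scenarios, the helper names (linear_fit_residuals, curvature_score, spread_score, plot_patterns) and the noise levels are assumptions made for this example, and the two scores are rough heuristics rather than formal statistical tests.

```python
import numpy as np

def linear_fit_residuals(x, y):
    """Fit y = slope * x + intercept by least squares; return (fitted, residuals)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = slope * x + intercept
    return fitted, y - fitted

def curvature_score(fitted, residuals):
    """|corr| between residuals and squared, centered fitted values.
    Values near 0 suggest no U-shaped pattern; values near 1 suggest non-linearity."""
    centered_sq = (fitted - fitted.mean()) ** 2
    return abs(np.corrcoef(centered_sq, residuals)[0, 1])

def spread_score(fitted, residuals):
    """corr between |residuals| and fitted values.
    A clearly positive value suggests the spread fans out (heteroscedasticity)."""
    return np.corrcoef(fitted, np.abs(residuals))[0, 1]

# Hypothetical synthetic scenarios, one per pattern
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 500)
scenarios = {
    "random (good)": 2 * x + 1 + rng.normal(0, 1, x.size),
    "non-linearity": 0.5 * x**2 + rng.normal(0, 1, x.size),
    "heteroscedasticity": 2 * x + 1 + rng.normal(0, 1, x.size) * (0.2 + 0.5 * x),
}
results = {name: linear_fit_residuals(x, y) for name, y in scenarios.items()}

def plot_patterns(results):
    """Draw one residual plot per scenario, side by side."""
    import matplotlib.pyplot as plt
    fig, axes = plt.subplots(1, len(results), figsize=(12, 3.5))
    for ax, (name, (fitted, residuals)) in zip(axes, results.items()):
        ax.scatter(fitted, residuals, alpha=0.5, s=10)
        ax.axhline(0, color="red", linestyle="--")
        ax.set_title(name)
        ax.set_xlabel("Predicted Values")
    axes[0].set_ylabel("Residuals")
    fig.tight_layout()
    return fig
```

Calling plot_patterns(results) followed by plt.show() should display an even band around zero for the first scenario, a U-shaped curve for the second, and a funnel that widens to the right for the third.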

Code Example

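import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

# Fit model (df is assumed to be an existing pandas DataFrame with these columns)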
X = df[['feature1', 'feature2']]
y = df['target']
model = LinearRegression()
model.fit(X, y)

# Calculate residuals
y_pred = model.predict(X)
residuals = y - y_pred

# Residual plot
plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(y=0, color='red', linestyle='--', linewidth=2)
plt.title("Residual Plot")
plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.show()

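# Seaborn version: residplot regresses y on x itself and plots those residuals,
# so pass the actual y here, not the precomputed residuals
sns.residplot(x=y_pred, y=y, lowess=True, line_kws={'color': 'red'})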
plt.title("Residual Plot with LOWESS")
plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.show()

Pro Tip

Use sns.residplot(..., lowess=True, line_kws={'color': 'red'}) to add a LOWESS smoothing line. If the red line stays roughly flat at zero, there is no obvious sign of misspecification in the mean. If it curves or trends, the model is likely missing a non-linear relationship. Also plot a histogram of the residuals: they should look approximately normal and be centered at zero.
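
The histogram check from the tip can be sketched as follows. This is one possible version, not a definitive implementation: the helper names (residual_summary, plot_residual_histogram) and the demo_residuals array are hypothetical stand-ins, and in practice you would pass the residuals computed from your own model.

```python
import numpy as np

def residual_summary(residuals):
    """Return mean, sample std, and skewness of the residuals.
    For a well-behaved model the mean and skewness should both be near 0."""
    r = np.asarray(residuals, dtype=float)
    mean = r.mean()
    skew = np.mean(((r - mean) / r.std()) ** 3)
    return {"mean": mean, "std": r.std(ddof=1), "skew": skew}

def plot_residual_histogram(residuals, bins=30):
    """Histogram of residuals with a normal curve of matching mean and std."""
    import matplotlib.pyplot as plt
    r = np.asarray(residuals, dtype=float)
    mu, sigma = r.mean(), r.std(ddof=1)
    grid = np.linspace(r.min(), r.max(), 200)
    normal_pdf = np.exp(-0.5 * ((grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    fig, ax = plt.subplots()
    ax.hist(r, bins=bins, density=True, alpha=0.6, edgecolor="black")
    ax.plot(grid, normal_pdf, color="red", label="Normal reference")
    ax.axvline(0, color="black", linestyle="--", linewidth=1)
    ax.set_title("Histogram of Residuals")
    ax.set_xlabel("Residuals")
    ax.set_ylabel("Density")
    ax.legend()
    return fig

# Illustrative stand-in data (assumption): replace with residuals = y - y_pred
rng = np.random.default_rng(42)
demo_residuals = rng.normal(0.0, 1.5, size=1000)
summary = residual_summary(demo_residuals)
```

A mean far from zero points to systematic bias, and a clearly non-zero skewness points to a lopsided error distribution. Both are worth investigating before trusting inference from the model.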

ML_AI/_feature_engineering/images/residual-1.png

Documentation