Residual Plot
Purpose
Diagnose regression model assumptions by plotting the residuals (residual = actual - predicted value), usually against the predicted values. Critical for validating the assumptions of a linear regression model.
Analysis Type
Bivariate (model evaluation)
What to Look For
1. Random Pattern (GOOD)
- Residuals scattered randomly around zero line
- No clear pattern or structure
- Indicates model assumptions are met
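A number can back up the visual check for randomness. Here is a minimal sketch using NumPy and synthetic data. It sorts the residuals by predicted value and counts how often their sign changes, which is a Wald-Wolfowitz runs test. Random scatter gives a z-score near zero. A curve leaves long stretches of same-sign residuals, so the z-score becomes strongly negative. The helpers `sign_runs_z` and `ols_residuals` are hypothetical names for this illustration. The model is fit with `np.linalg.lstsq` instead of scikit-learn, so the snippet needs nothing beyond NumPy.

```python
import numpy as np

def sign_runs_z(y_pred, residuals):
    """Runs test on residual signs, ordered by predicted value (hypothetical helper).

    z near 0: consistent with random scatter.
    Strongly negative z: too few sign changes, i.e. structure in the residuals.
    """
    order = np.argsort(y_pred)
    signs = np.sign(residuals[order])
    signs = signs[signs != 0]                      # drop exact zeros
    n_pos = np.sum(signs > 0)
    n_neg = np.sum(signs < 0)
    n = n_pos + n_neg
    runs = 1 + np.count_nonzero(signs[1:] != signs[:-1])
    expected = 1 + 2 * n_pos * n_neg / n
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n**2 * (n - 1))
    return (runs - expected) / np.sqrt(variance)

def ols_residuals(X, y):
    """Fit ordinary least squares and return (predictions, residuals)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_pred = X @ coef
    return y_pred, y - y_pred

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
X = np.column_stack([np.ones_like(x), x])          # intercept + one feature

# Well-specified: y really is linear in x
y_lin = 1 + 2 * x + rng.normal(0, 1, x.size)
z_good = sign_runs_z(*ols_residuals(X, y_lin))

# Misspecified: y is quadratic but the model is a straight line
y_quad = 1 + 0.5 * x**2 + rng.normal(0, 1, x.size)
z_bad = sign_runs_z(*ols_residuals(X, y_quad))

print(f"random pattern z = {z_good:.2f}, curved pattern z = {z_bad:.2f}")
```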
2. Linearity
- Linear:
  - Residuals look like random noise around zero, with no visible structure.
  - The spread is fairly constant across predictions (no funnel shape).
  - Points are balanced evenly above and below the zero line.
- Non-linear:
  - Residuals show a pattern: a curve, a wave, or systematic drift (e.g., positive, then negative).
  - A "funnel" shape (variance increasing with the prediction) often signals a missing transformation or non-linear effects.
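One way to make the linear vs. non-linear distinction concrete is to average the residuals within bins of the predicted value. This is a coarse version of a smoothing line. The sketch below works on simulated data only. `binned_residual_means` and `fit_and_residuals` are hypothetical helpers, and the choice of ten equal-count bins is arbitrary.

```python
import numpy as np

def binned_residual_means(y_pred, residuals, n_bins=10):
    """Mean residual within equal-count bins of the predicted values (hypothetical helper).

    For a well-specified linear model every bin mean sits near zero.
    A curve or drift shows up as bin means that change sign systematically.
    """
    order = np.argsort(y_pred)
    bins = np.array_split(residuals[order], n_bins)
    return np.array([b.mean() for b in bins])

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 1000)
X = np.column_stack([np.ones_like(x), x])          # intercept + one feature

def fit_and_residuals(y):
    """Straight-line least-squares fit; returns (predictions, residuals)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_pred = X @ coef
    return y_pred, y - y_pred

y_linear = 3 + 2 * x + rng.normal(0, 1, x.size)     # truly linear
y_curved = 3 + 0.5 * x**2 + rng.normal(0, 1, x.size)  # truly quadratic

linear_means = binned_residual_means(*fit_and_residuals(y_linear))
curved_means = binned_residual_means(*fit_and_residuals(y_curved))

print(np.round(linear_means, 2))   # bin means should all be near 0
print(np.round(curved_means, 2))   # positive at both ends, negative in the middle
```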
3. Non-linearity (BAD)
- U-shape or inverted U-shape: the relationship is non-linear
- A curved pattern indicates the model is missing non-linear terms
- Solution: add polynomial features or transform variables
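The polynomial-feature fix can be sketched on simulated quadratic data. A straight-line fit leaves positive residuals at both ends and negative ones in the middle, which is the U-shape. Adding an x² column removes that structure. The sketch uses plain NumPy least squares as a stand-in for the model. In scikit-learn, `sklearn.preprocessing.PolynomialFeatures` generates the same kind of squared columns.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 500)
y = 3 + 0.5 * x**2 + rng.normal(0, 1, x.size)   # true relationship is quadratic

def residuals_for(design):
    """Least-squares fit of y on the given design matrix; returns residuals."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

ones = np.ones_like(x)
resid_linear = residuals_for(np.column_stack([ones, x]))        # straight line
resid_poly = residuals_for(np.column_stack([ones, x, x**2]))    # adds an x^2 term

# Mean residual for small, middle and large x, plus overall residual SD
for name, r in [("linear", resid_linear), ("with x^2", resid_poly)]:
    low = r[x < 2].mean()
    mid = r[(x > 4) & (x < 6)].mean()
    high = r[x > 8].mean()
    print(f"{name:9s} low={low:6.2f} mid={mid:6.2f} high={high:6.2f} sd={r.std():.2f}")
```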
4. Heteroscedasticity (BAD)
- Fan or funnel shape: the spread of the residuals grows (or shrinks) with the predicted values, i.e., the variance is not constant
- Solution: transform the target variable (log, sqrt) or use weighted least squares
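A rough numeric check for a fan shape is to compare the residual spread in the upper half of the predictions with the spread in the lower half. The sketch below uses simulated data whose noise grows with x. It also shows the weighted-regression fix, under one assumption: the noise SD is proportional to x, so weighting by 1/x² (equivalently, dividing each row by x) is appropriate. In real data that variance structure has to be estimated or guessed. `spread_ratio` is a hypothetical helper.

```python
import numpy as np

def spread_ratio(y_pred, residuals):
    """Residual SD in the upper half of predictions / SD in the lower half (hypothetical helper).

    About 1 for constant variance.
    Well above 1 suggests a fan or funnel shape.
    """
    order = np.argsort(y_pred)
    lower, upper = np.array_split(residuals[order], 2)
    return upper.std() / lower.std()

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 2000)
X = np.column_stack([np.ones_like(x), x])
y = 2 + 3 * x + rng.normal(0, 0.5 * x)   # noise SD grows with x: heteroscedastic

# Ordinary least squares: the residual spread fans out
coef_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_ols = X @ coef_ols
ratio_ols = spread_ratio(pred_ols, y - pred_ols)

# Weighted least squares, ASSUMING the noise SD is proportional to x:
# dividing each row by x is equivalent to weights 1/x^2.
w = 1 / x
coef_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
pred_wls = X @ coef_wls
ratio_wls = spread_ratio(pred_wls, (y - pred_wls) * w)   # rescaled residuals

print(f"OLS spread ratio: {ratio_ols:.2f}, WLS rescaled-residual ratio: {ratio_wls:.2f}")
```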
5. Outliers
- Points far from zero line
- Large absolute residuals (far above or below zero) indicate poor predictions for those observations
- May need investigation or removal
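Candidate outliers can also be listed with code. One common approach is a robust z-score built from the median and the median absolute deviation (MAD). Because both are robust, the outliers do not inflate the scale they are judged against. The 3.5 cut-off is a widely used rule of thumb, not a hard rule. `flag_outliers` is a hypothetical helper, and the two injected errors are synthetic.

```python
import numpy as np

def flag_outliers(residuals, threshold=3.5):
    """Indices whose robust z-score exceeds `threshold` (hypothetical helper).

    Uses the median and the median absolute deviation (MAD) so that outliers
    do not inflate the scale estimate. The 0.6745 factor rescales the MAD to
    match the SD of normally distributed data.
    """
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    robust_z = 0.6745 * (residuals - med) / mad
    return np.flatnonzero(np.abs(robust_z) > threshold)

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 200)
y = 1 + 2 * x + rng.normal(0, 1, x.size)
y[[10, 50]] += 15          # inject two gross errors (illustration only)

X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

flagged = flag_outliers(residuals)
print(flagged)             # should include indices 10 and 50
```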
6. Systematic Bias
- Most residuals fall on one side of zero (mostly above, or mostly below)
- Indicates systematic under-prediction (residuals above zero) or over-prediction (residuals below zero)
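An ordinary least squares model with an intercept has residuals that average exactly zero on its own training data. Overall bias therefore usually shows up on new data, or within part of the prediction range. The sketch below simulates a hypothetical upward shift in the outcome between the training data and new data. It reports the mean residual and the share of residuals above zero for each.

```python
import numpy as np

rng = np.random.default_rng(5)

def make_data(n, intercept):
    """Simulated data: y = intercept + 2x + noise."""
    x = rng.uniform(0, 10, n)
    y = intercept + 2 * x + rng.normal(0, 0.5, n)
    return np.column_stack([np.ones(n), x]), y

X_train, y_train = make_data(500, intercept=1.0)
X_new, y_new = make_data(500, intercept=3.0)   # hypothetical drift: outcomes shift up by 2

coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
resid_train = y_train - X_train @ coef   # mean is ~0 by construction
resid_new = y_new - X_new @ coef         # model now under-predicts

for name, r in [("train", resid_train), ("new data", resid_new)]:
    print(f"{name:8s} mean residual={r.mean():6.3f}  share above zero={np.mean(r > 0):.2f}")
```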
7. Ideal Pattern
- Residuals centered at zero
- Equal spread across all predicted values
- No patterns, trends, or curves
Code Example
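import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
# Fit model (assumes df is a pandas DataFrame with columns 'feature1', 'feature2', 'target')
X = df[['feature1', 'feature2']]
y = df['target']
model = LinearRegression()
model.fit(X, y)
# Calculate residuals (actual minus predicted)
y_pred = model.predict(X)
residuals = y - y_pred
# Residual plot: residuals against predicted values, with a zero reference line
plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(y=0, color='red', linestyle='--', linewidth=2)
plt.title("Residual Plot")
plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.show()
# Seaborn version: residplot fits y ~ x itself and plots the residuals of that fit.
# With x=y_pred from an OLS model with an intercept, that in-sample refit has slope 1
# and intercept 0, so the plotted residuals match the model's residuals.
# lowess=True adds a LOWESS smoothing line and requires statsmodels.
sns.residplot(x=y_pred, y=y, lowess=True, line_kws={'color': 'red'})
plt.title("Residual Plot with LOWESS")
plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.show()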
Pro Tip
Use sns.residplot(..., lowess=True, line_kws={'color': 'red'}) to add a LOWESS smoothing line (this option requires statsmodels). If the red line stays flat and close to zero, there is no sign of missing non-linear structure. If it curves or shows a trend, the model is likely missing non-linear terms. Also plot a histogram of the residuals. They should be centered at zero and, if you rely on p-values or confidence intervals, approximately normally distributed.
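The normality check from the tip can also be done as a Q-Q comparison. Sort the standardized residuals and pair each one with the quantile a normal distribution would give at that position. Points near a straight line (correlation close to 1) are consistent with normality. This is a minimal sketch using only NumPy and the standard library's `statistics.NormalDist`. `qq_points` is a hypothetical helper, and the residuals are simulated. For an actual plot, pass the returned pairs to `plt.scatter`, or let `scipy.stats.probplot` draw one directly.

```python
import numpy as np
from statistics import NormalDist

def qq_points(residuals):
    """Theoretical normal quantiles vs. sorted standardized residuals (hypothetical helper)."""
    r = np.sort((residuals - residuals.mean()) / residuals.std())
    n = r.size
    probs = (np.arange(1, n + 1) - 0.5) / n          # plotting positions in (0, 1)
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, r

rng = np.random.default_rng(6)
normal_resid = rng.normal(0, 1, 500)
skewed_resid = rng.exponential(1, 500) - 1           # right-skewed, mean about 0

t_norm, s_norm = qq_points(normal_resid)
t_skew, s_skew = qq_points(skewed_resid)
corr_normal = np.corrcoef(t_norm, s_norm)[0, 1]
corr_skewed = np.corrcoef(t_skew, s_skew)[0, 1]

print(f"Q-Q correlation: normal={corr_normal:.3f}, skewed={corr_skewed:.3f}")
```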

Documentation