Scatter Plot

Purpose

Visualize the relationship between two continuous variables to identify patterns, correlations, trends, and outliers.

Analysis Type

Bivariate

What to Look For

1. Linearity
2. Correlation Strength
3. Direction
4. Outliers
5. Heteroscedasticity

ML_AI/_feature_engineering/images/heteroscedasticity-1.png
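
The items above can also be checked numerically alongside the plot. The following is a minimal sketch, not a definitive procedure: the synthetic data, the split at the median of x, and the 3-standard-deviation outlier threshold are all assumptions chosen for illustration. Spearman's rho is computed as the Pearson correlation of the ranks.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): y is linear in x,
# with noise whose spread grows with x, i.e. heteroscedastic.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 500)
y = 2.0 * x + rng.normal(0, 0.5 * x)
df = pd.DataFrame({"x_var": x, "y_var": y})

# Direction and strength: sign and magnitude of the correlation.
pearson = df["x_var"].corr(df["y_var"])                  # linear association
spearman = df["x_var"].rank().corr(df["y_var"].rank())   # monotonic association

# Residuals from a straight-line fit, used for the two checks below.
slope, intercept = np.polyfit(df["x_var"], df["y_var"], 1)
resid = df["y_var"] - (slope * df["x_var"] + intercept)

# Heteroscedasticity: compare residual spread in the lower and upper half of x.
median_x = df["x_var"].median()
spread_low = resid[df["x_var"] <= median_x].std()
spread_high = resid[df["x_var"] > median_x].std()

# Outliers: flag points whose residual is more than 3 standard deviations out.
outliers = df[np.abs(resid) > 3 * resid.std()]

print(f"Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}")
print(f"Residual spread: low x = {spread_low:.2f}, high x = {spread_high:.2f}")
print(f"Flagged outliers: {len(outliers)}")
```

A residual spread that differs markedly between the two halves suggests heteroscedasticity, which would show up in the scatter plot as a funnel shape.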

Code Example

import matplotlib.pyplot as plt
import seaborn as sns

# Assumes df is a pandas DataFrame with numeric columns 'x_var' and 'y_var'.

# Basic scatter plot
plt.scatter(df['x_var'], df['y_var'], alpha=0.6)
plt.title("Scatter Plot: X vs Y")
plt.xlabel("X Variable")
plt.ylabel("Y Variable")
plt.show()

# Seaborn scatter with a linear regression line (regplot's default fit)
sns.regplot(x='x_var', y='y_var', data=df, line_kws={"color": "red"})
plt.title("Scatter Plot with Linear Fit")
plt.show()

Pro Tip - Linearity Detection

Use sns.regplot(x='x_var', y='y_var', data=df, lowess=True, line_kws={"color": "red"}) to fit a LOWESS (locally weighted scatterplot smoothing) curve instead of a straight line. The lowess option requires statsmodels to be installed. If the red curve is roughly straight, the relationship is approximately linear. If it bends noticeably, the relationship is non-linear and you may need polynomial features or transformations.
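
As a numeric complement to the visual LOWESS check, one option is to compare how much a quadratic fit improves on a straight line. This is a sketch of a different technique (polynomial least squares via numpy, not LOWESS). The helper r_squared and the synthetic data are hypothetical, introduced only for illustration.

```python
import numpy as np

def r_squared(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic examples (assumptions): one linear and one curved relationship.
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, 300)
y_linear = 1.5 * x + rng.normal(0, 1, 300)
y_curved = x ** 2 + rng.normal(0, 1, 300)

for name, y in [("linear", y_linear), ("curved", y_curved)]:
    r2_line = r_squared(x, y, 1)
    r2_quad = r_squared(x, y, 2)
    print(f"{name}: R^2 line = {r2_line:.2f}, R^2 quadratic = {r2_quad:.2f}")
```

A large jump in R^2 from the line to the quadratic points to curvature, in which case polynomial features or a transformation may help. A negligible jump is consistent with a linear relationship.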

ML_AI/_feature_engineering/images/scatter-1.png

Documentation