Q-Q Plot (Quantile-Quantile Plot)

Purpose

Check whether a variable follows a normal distribution by plotting its sample quantiles against the quantiles of a theoretical normal distribution. If the data are normal, the points fall approximately on a straight line.
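The comparison of quantiles can be sketched by hand, without any plotting. This is a minimal illustration on synthetic data, not how scipy or statsmodels compute their plots internally: the helper name qq_points and the plotting positions (i - 0.5) / n are assumptions chosen for simplicity, and libraries use slightly different conventions.

```python
import numpy as np
from scipy import stats

def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    observed = np.sort(np.asarray(sample, dtype=float))
    n = observed.size
    # Plotting positions (i - 0.5) / n: one common convention (assumed here).
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = stats.norm.ppf(probs)
    return theoretical, observed

# Synthetic normal sample with mean 10 and standard deviation 2.
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=500)
theoretical, observed = qq_points(sample)

# For normal data the pairs lie close to a straight line: the slope
# estimates the standard deviation and the intercept estimates the mean.
slope, intercept = np.polyfit(theoretical, observed, 1)
r = np.corrcoef(theoretical, observed)[0, 1]
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.4f}")
```

Plotting observed against theoretical gives the Q-Q plot itself. A correlation r close to 1 corresponds to points hugging the line.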

Analysis Type

Univariate

What to Look For

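With theoretical quantiles on the x-axis and sample quantiles on the y-axis (the layout used by both scipy and statsmodels):

1. Normal (Gaussian) Distribution (GOOD): points fall close to the straight reference line.
2. Heavy Tails: points fall below the line at the left end and above it at the right end.
3. Light Tails: points fall above the line at the left end and below it at the right end.
4. Right Skew (Positive Skew): points curve above the line at both ends (a convex, upward-bending curve).
5. Left Skew (Negative Skew): points curve below the line at both ends (a concave, downward-bending curve).
6. S-Shape: the tails differ from normal tails. The direction of the bend tells you whether they are heavy (item 2) or light (item 3).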
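The patterns listed above can be reproduced on synthetic data. The sketch below is only illustrative: the helpers tail_deviations and describe_shape, the reference line through the quartiles, the 1% tail window, and the 0.4 threshold are all assumptions made for this example, not a standard diagnostic. It measures how far the extreme points sit above or below the reference line and maps the two signs to a pattern.

```python
import numpy as np
from scipy import stats

def tail_deviations(sample, tail=0.01):
    """Mean offset from the reference line in the lower and upper tails."""
    observed = np.sort(np.asarray(sample, dtype=float))
    n = observed.size
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = stats.norm.ppf(probs)
    # Reference line through the first and third quartiles (assumed choice).
    obs_q1, obs_q3 = np.percentile(observed, [25, 75])
    theo_q1, theo_q3 = stats.norm.ppf([0.25, 0.75])
    slope = (obs_q3 - obs_q1) / (theo_q3 - theo_q1)
    intercept = obs_q1 - slope * theo_q1
    # Offsets from the line, rescaled so they do not depend on the data's units.
    offsets = (observed - (intercept + slope * theoretical)) / slope
    k = max(1, int(tail * n))
    return offsets[:k].mean(), offsets[-k:].mean()

# (sign of lower-tail offset, sign of upper-tail offset) -> pattern
PATTERNS = {
    (0, 0): "approximately normal",
    (-1, 1): "heavy tails",
    (1, -1): "light tails",
    (1, 1): "right skew",
    (-1, -1): "left skew",
}

def describe_shape(sample, threshold=0.4):
    lower, upper = tail_deviations(sample)
    signs = tuple(0 if abs(d) < threshold else int(np.sign(d))
                  for d in (lower, upper))
    return PATTERNS.get(signs, "mixed / unclear")

rng = np.random.default_rng(42)
n = 5000
samples = {
    "normal": rng.normal(size=n),
    "heavy": rng.standard_t(df=3, size=n),    # Student t, 3 degrees of freedom
    "light": rng.uniform(-1.0, 1.0, size=n),  # bounded, so tails are light
    "right": rng.exponential(1.0, size=n),
    "left": -rng.exponential(1.0, size=n),
}
labels = {name: describe_shape(values) for name, values in samples.items()}
for name, label in labels.items():
    print(f"{name:>6}: {label}")
```

Real data rarely match one pattern this cleanly, so treat the labels as a reading aid for the plot rather than a replacement for looking at it.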

Code Example

import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm

# Load the longley dataset
data = sm.datasets.longley.load_pandas()
data = data.exog  # Use explanatory variables

# Q-Q plot with scipy (draws onto the current matplotlib figure)
stats.probplot(data['GNP'], dist="norm", plot=plt)
plt.title("Q-Q Plot (with scipy)")
plt.show()

# Using statsmodels; line='s' draws a standardized reference line
sm.qqplot(data['GNP'], line='s')
plt.title("Q-Q Plot (with statsmodels)")
plt.show()

Pro Tip

Create Q-Q plots before and after a transformation to verify that it helped: compare stats.probplot(df['original'], plot=plt) with stats.probplot(np.log(df['original']), plot=plt). If the points deviate from the line at the ends (tails), try a log transformation for right-skewed data, a square transformation for left-skewed data, or a Box-Cox transformation for general non-normality. The log and Box-Cox transformations require strictly positive values. Pair the plot with the Shapiro-Wilk test (scipy.stats.shapiro) for a formal normality test.
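The before/after comparison can also be checked numerically. The following is a rough sketch on synthetic log-normal data (the array original and the helper qq_fit are made up for this example). It relies on stats.probplot returning the fit correlation r when called without plot=, and reports the Shapiro-Wilk p-value alongside it.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed, strictly positive data (assumption for the demo).
rng = np.random.default_rng(0)
original = rng.lognormal(mean=0.0, sigma=1.0, size=300)

def qq_fit(values):
    """Correlation of the Q-Q points with their least-squares line."""
    # Without plot=, probplot returns ((osm, osr), (slope, intercept, r)).
    (_, _), (_, _, r) = stats.probplot(values, dist="norm")
    return r

# Box-Cox estimates its own lambda by maximum likelihood when none is given.
boxcox_values, boxcox_lambda = stats.boxcox(original)

candidates = {
    "original": original,
    "log": np.log(original),
    "box-cox": boxcox_values,
}

results = {}
for name, values in candidates.items():
    r = qq_fit(values)
    w_stat, p_value = stats.shapiro(values)
    results[name] = (r, p_value)
    print(f"{name:>8}: Q-Q r = {r:.4f}, Shapiro-Wilk p = {p_value:.4g}")
```

A transformation that helps moves r closer to 1. A small Shapiro-Wilk p-value is evidence against normality, while a large one only means the test found no evidence against it.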

ML_AI/_feature_engineering/images/qq-1.png
ML_AI/_feature_engineering/images/qq-2.png

Documentation