Kolmogorov-Smirnov Test (K-S Test)

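Interpretation

The null hypothesis (H0) is that the sample comes from the reference distribution, here a normal distribution. A p-value above the chosen significance level (0.05 in the example below) means H0 is not rejected; a p-value at or below it means H0 is rejected.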

⚠️ Note: The K-S test is sensitive to sample size. With very large samples, even minor, practically irrelevant deviations from the reference distribution may be flagged as statistically significant.
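
The sample-size effect described in the note can be illustrated with a minimal sketch. It assumes a setup that is not part of the original example: data drawn from N(0.05, 1) and tested against the standard normal N(0, 1). The helper name `ks_against_standard_normal`, the shift, the sample sizes, and the seed are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.stats import kstest


def ks_against_standard_normal(n, shift=0.05, seed=0):
    """Draw n points from N(shift, 1) and test them against N(0, 1).

    The function name, shift, and seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    sample = rng.normal(loc=shift, scale=1.0, size=n)
    result = kstest(sample, 'norm')  # 'norm' defaults to mean 0, std 1
    return result.statistic, result.pvalue


# Same small deviation from H0, two very different sample sizes
for n in (100, 200_000):
    stat, p = ks_against_standard_normal(n)
    print(f"n={n}: KS Statistic={stat:.4f}, p-value={p:.4g}")
```

With the larger sample the p-value should fall far below 0.05 even though the shift is only 0.05 standard deviations, while at n = 100 a shift this small will usually go undetected. Reporting the K-S statistic alongside the p-value helps judge whether a significant deviation is also large enough to matter.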

Python Example

# Example: Test for Normality using Kolmogorov-Smirnov Test (scipy.stats.kstest)
from scipy.stats import kstest
import numpy as np
import matplotlib.pyplot as plt

# Generate sample data: normal and non-normal
data_normal = np.random.normal(loc=0, scale=1, size=10000)
data_non_normal = np.random.exponential(scale=2, size=10000)

# Plot histograms for visual inspection
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].hist(data_normal, bins=30, color='skyblue', edgecolor='black')
axes[0].set_title('Normal Data Histogram')
axes[1].hist(data_non_normal, bins=30, color='salmon', edgecolor='black')
axes[1].set_title('Non-Normal Data Histogram')
plt.tight_layout()
plt.show()

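# Kolmogorov-Smirnov test for normality
# Caveat: the mean and standard deviation are estimated from the same sample being tested.
# The standard K-S p-value assumes a fully specified reference distribution, so these
# p-values are conservative (too large); the Lilliefors test corrects for this.
ks_stat_norm, ks_p_norm = kstest(data_normal, 'norm', args=(np.mean(data_normal), np.std(data_normal)))
ks_stat_non_norm, ks_p_non_norm = kstest(data_non_normal, 'norm', args=(np.mean(data_non_normal), np.std(data_non_normal)))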

print(f"Normal Data: KS Statistic={ks_stat_norm:.4f}, p-value={ks_p_norm:.4f}")
print(f"Non-Normal Data: KS Statistic={ks_stat_non_norm:.4f}, p-value={ks_p_non_norm:.4f}")

if ks_p_norm > 0.05:
    print("Normal Data: Likely Gaussian (fail to reject H0)")
else:
    print("Normal Data: Not Gaussian (reject H0)")

if ks_p_non_norm > 0.05:
    print("Non-Normal Data: Likely Gaussian (fail to reject H0)")
else:
    print("Non-Normal Data: Not Gaussian (reject H0)")

Output
[Figure: side-by-side histograms titled "Normal Data Histogram" and "Non-Normal Data Histogram" (ML_AI/_feature_engineering/images/kolmogorov-1.png)]
Normal Data: KS Statistic=0.0059, p-value=0.8709
Non-Normal Data: KS Statistic=0.1572, p-value=0.0000
Normal Data: Likely Gaussian (fail to reject H0)
Non-Normal Data: Not Gaussian (reject H0)
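
To make the "KS Statistic" values in the output above more concrete, the following sketch computes the statistic by hand as the largest vertical gap between the empirical CDF of the sample and the reference CDF, then compares it with the value from `kstest`. The helper `ks_statistic_manual`, the sample size, and the seed are hypothetical and chosen only for illustration; this is not a claim about how SciPy implements the test internally.

```python
import numpy as np
from scipy.stats import kstest, norm


def ks_statistic_manual(sample, cdf):
    """Largest vertical gap between the empirical CDF and a reference CDF."""
    x = np.sort(np.asarray(sample))
    n = x.size
    f = cdf(x)
    # The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted point,
    # so the gap has to be checked on both sides of each jump.
    d_plus = np.max(np.arange(1, n + 1) / n - f)
    d_minus = np.max(f - np.arange(0, n) / n)
    return max(d_plus, d_minus)


rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
sample = rng.normal(size=500)

manual_stat = ks_statistic_manual(sample, norm.cdf)
scipy_stat = kstest(sample, 'norm').statistic
print(f"manual={manual_stat:.6f}, scipy={scipy_stat:.6f}")
```

The two values should agree to floating-point precision. A statistic near 0 means the empirical and reference CDFs almost coincide; a larger value, such as the one reported for the exponential data, means they differ substantially at some point.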