Dispersion Ratio

If I told you the average temperature of a lake was 20°C, you might jump in—but if the "dispersion" is high, one end might be frozen and the other boiling!

What is Dispersion Ratio?

Dispersion Ratio (a.k.a. Index of Dispersion or Variance-to-Mean Ratio) is a filter-based feature selection technique that measures how spread out a feature’s values are relative to its mean.

🧮 Mathematical Intuition

It is defined as the ratio of the variance $\sigma^2$ to the mean $\mu$:

$$DR = \frac{\sigma^2}{\mu}$$

★ Range of Dispersion Ratio

For a feature with a positive mean, the value ranges from 0 (a perfectly constant feature) to +∞.

★ Usage

It is used to separate useful signals (features with high variance) from noisy or constant ones.
Think of it this way: a higher dispersion ratio implies higher feature importance, making it a useful, target-agnostic measure for selecting features.

In short, the higher the Dispersion Ratio, the more discriminative the feature might be.
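In code, the definition is a one-liner. A minimal sketch with NumPy (the toy values below are purely illustrative):

```python
import numpy as np

def dispersion_ratio(x):
    """Variance-to-mean ratio: DR = var(x) / mean(x)."""
    x = np.asarray(x, dtype=float)
    return np.var(x) / np.mean(x)

# A near-constant feature barely moves around its mean -> DR close to 0.
flat = dispersion_ratio([5.0, 5.1, 4.9, 5.0])

# A spread-out feature varies a lot relative to its mean -> higher DR.
spread = dispersion_ratio([1.0, 4.0, 9.0, 2.0])

print(flat, spread)  # flat is far smaller than spread
```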

🧠 Requirements & Data Compatibility for Feature Selection

  1. Linearity: ✅ Works regardless of linearity (measures spread only).
  2. Normalization: ⚠️ Yes, recommended — scale features to comparable ranges; otherwise large-valued features dominate.
  3. Ordinal (Ranked): ✅ Works fine — as ranks still produce meaningful variance.
  4. Numeric Encoded / Discretized: ⚠️ Use with care — if encoding changes the scale drastically, normalize after encoding.
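Point 2 matters because the ratio is not unit-free: rescaling a feature by a constant k scales its variance by k² but its mean only by k, so the DR itself scales by k. A small sketch with made-up height data:

```python
import numpy as np

def dr(x):
    return np.var(x) / np.mean(x)

heights_m = np.array([1.5, 1.6, 1.7, 1.8])
heights_cm = heights_m * 100  # same information, different unit

dr_m, dr_cm = dr(heights_m), dr(heights_cm)
print(dr_m, dr_cm)  # the centimetre version is exactly 100x larger
```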

🏆 Strategic Advantages

  1. Simple and fast — Just variance and mean; computationally light.
  2. No target needed — Works for unsupervised feature filtering.
  3. Identifies inactive features — Helps eliminate near-zero variance columns easily.
  4. Works across domains — Can be applied to regression, classification, or clustering tasks.
  5. Highlights between-class variability — computed per class, it surfaces features that differ meaningfully between classes, which is especially useful for classification (and can still offer insight for regression).
  6. Broad applicability — works directly on numeric data and, after encoding, on ordinal or categorical data; related dispersion measures are also used to choose splitting attributes in decision trees.

⚠️ Constraints

  1. Ignores relationships with the target — A feature may vary a lot but still be irrelevant.
  2. Sensitive to outliers — Extreme values can distort variance and inflate DR.
  3. Scale-dependent — Features with larger numeric ranges will dominate unless normalized.
  4. Not ideal for categorical data — Without encoding, variance is undefined for pure categories.
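Constraint 2 (outlier sensitivity) is easy to demonstrate: a single extreme value enters the variance as a squared deviation, so it can inflate the ratio dramatically. Illustrative values only:

```python
import numpy as np

def dr(x):
    return np.var(x) / np.mean(x)

clean = np.array([10.0, 11.0, 9.0, 10.0, 10.0])
with_outlier = np.array([10.0, 11.0, 9.0, 10.0, 100.0])  # one extreme value

dr_clean, dr_out = dr(clean), dr(with_outlier)
print(dr_clean, dr_out)  # the outlier inflates DR by orders of magnitude
```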

🚨 Caution — Common Misconceptions

  1. “High DR = good feature” — Not always. A feature might vary a lot but still have no predictive power.
  2. Forgetting to scale — If features have different units (e.g., meters vs. dollars), DR becomes misleading.
  3. Blind filtering — Always pair DR with correlation or mutual information to confirm feature relevance.
  4. Mixed-sign features — when values span positive and negative, the mean can land near zero and inflate the ratio; shift or transform such features first.
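The mixed-sign problem comes from the mean sitting in the denominator: a feature roughly symmetric around zero has a tiny mean, so the ratio explodes even though the feature is perfectly ordinary. A minimal illustration:

```python
import numpy as np

def dr(x):
    return np.var(x) / np.mean(x)

# Values centred almost exactly on zero: variance is ~1, mean is ~0.
centered = np.array([-1.0, 1.0, -1.0, 1.0001])
print(np.mean(centered), dr(centered))  # tiny mean -> huge, meaningless DR

# A common workaround: shift the feature into a positive range first.
shifted = centered - centered.min() + 1.0
print(dr(shifted))  # now a sensible, finite score
```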

Comparing Dispersion Ratio and Variance Threshold

| Aspect | Dispersion Ratio | Variance Threshold |
| --- | --- | --- |
| Definition | Measures the relative spread of feature values via the variance-to-mean ratio. | Measures the absolute variance of feature values. |
| Formula | $\frac{\sigma^2}{\mu}$ | $\sigma^2$ |
| Relationship with target | Target-free in its basic form; computed per class, it compares a feature's spread across target classes. | Ignores the target entirely; a feature could have low variance but still be highly relevant for classification. |
| Interpretation | Higher values indicate greater variability relative to the mean. | Higher variance means more spread-out data. |
| Scaling | Less scale-sensitive than raw variance, since it divides by the mean, though normalization is still recommended. | Sensitive to scale; may require normalization. |
| Best for | Ensuring features differ meaningfully between classes (classification);<br>features with varying scales (e.g., financial, demographic, real-estate data). | Identifying features that are constant or near-constant;<br>situations where scale is not a concern. |
| When to avoid | Features with zero mean (division errors);<br>highly skewed data, where the mean is not meaningful;<br>features mixing positive and negative values, since a mean near zero inflates the ratio. | When feature scales differ, making variances incomparable;<br>when relative variability matters more than absolute variance. |
| Use case example | Datasets where some features have small values but high significance (e.g., income data, disease biomarkers). | Removing constant or low-variance features in image processing, text analysis, or high-dimensional datasets. |
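The contrast above can be made concrete on a toy DataFrame, assuming scikit-learn is available; the column names are invented for illustration:

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

X = pd.DataFrame({
    "near_constant":     [1.0, 1.0, 1.0, 1.01],             # barely moves
    "small_but_varying": [0.1, 0.5, 0.9, 0.3],              # small scale, high relative spread
    "large_scale":       [1000.0, 1001.0, 999.0, 1000.0],   # big numbers, tiny relative spread
})

# Variance Threshold keeps features by absolute variance.
vt = VarianceThreshold(threshold=0.01).fit(X)
kept = X.columns[vt.get_support()].tolist()
print(kept)  # large_scale survives on raw variance alone

# Dispersion Ratio scores variance relative to the mean.
dr = X.var(ddof=0) / X.mean()
print(dr.sort_values(ascending=False))  # small_but_varying ranks highest
```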

The Dispersion Ratio is like your data’s “energy meter.” If a feature’s energy (variance) is low, it’s not contributing much to the conversation.
Use it early in your feature selection process to filter out weak or flat features — but don’t stop there. Pair it with target-aware techniques like Pearson correlation, Mutual Information, or ReliefF for a complete and balanced selection.
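A sketch of that two-stage workflow, with a dispersion filter first and mutual information as the target-aware check (the median cut-off is an arbitrary choice for illustration; assumes scikit-learn):

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif

data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Stage 1: dispersion-ratio filter -- drop the flattest half of the features.
dr = X.var(ddof=0) / X.mean()
candidates = dr[dr > dr.median()].index

# Stage 2: target-aware check -- rank the survivors by mutual information.
mi = pd.Series(
    mutual_info_classif(X[candidates], y, random_state=0),
    index=candidates,
).sort_values(ascending=False)
print(mi)
```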


Coding Examples

1: Regression — Predicting House Prices

You’re building a model to predict house prices based on features like sqft, num_rooms, and distance_to_city.

```python
import pandas as pd
import numpy as np

# Example dataset
data = {
    'sqft': [1000, 1500, 1800, 2000, 2500],
    'num_rooms': [2, 3, 3, 4, 4],
    'distance_to_city': [10, 8, 12, 6, 5],
    'price': [200000, 250000, 270000, 300000, 350000]
}

df = pd.DataFrame(data)

# Dispersion ratio for numeric features.
# Note: dividing by mean**2 (the squared coefficient of variation) rather
# than the plain mean makes the score unit-free, so features on different
# scales can be compared directly.
def dispersion_ratio(series):
    return np.var(series) / (np.mean(series) ** 2)

dr_values = df.drop(columns=['price']).apply(dispersion_ratio)
print(dr_values)
```

Output

```
sqft                0.080837
num_rooms           0.054687
distance_to_city    0.097561
dtype: float64
```

Here, distance_to_city has the highest DR, meaning it varies most relative to its mean — potentially a strong candidate for predicting house price.

2: Classification — Per-Class Dispersion on the Wine Dataset

Here the idea extends to a supervised setting: using scikit-learn’s wine dataset, each feature is scored by how much it varies within each target class (a per-class coefficient of variation, averaged across the classes).

```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_wine

# Load the dataset
data = load_wine()

# Convert to a DataFrame and a target array
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

def dispersion_ratio(df, feature_col, target):
    """Average per-class dispersion (std / mean) of a feature.

    Groups the feature by target class, computes the coefficient of
    variation within each class, and averages across classes.
    """
    class_groups = df.groupby(target)[feature_col]

    class_std = class_groups.std()    # spread within each class
    class_mean = class_groups.mean()  # centre of each class

    # Avoid division by zero when a class mean is exactly 0
    dispersion_values = class_std / class_mean.replace(0, np.nan)

    return dispersion_values.mean()

dispersion_scores = {col: dispersion_ratio(X, col, y) for col in X.columns}

# Collect into a sorted DataFrame
ratio = pd.DataFrame(
    {"dispersion_ratio": np.round(list(dispersion_scores.values()), 2)},
    index=list(dispersion_scores.keys()),
).sort_values("dispersion_ratio")

# display
ratio
```

Output

```
                              dispersion_ratio
alcohol                                   0.04
ash                                       0.10
magnesium                                 0.13
alcalinity_of_ash                         0.14
od280/od315_of_diluted_wines              0.15
hue                                       0.16
total_phenols                             0.19
proline                                   0.23
flavanoids                                0.28
color_intensity                           0.28
nonflavanoid_phenols                      0.29
proanthocyanins                           0.31
malic_acid                                0.40
```