Dispersion Ratio

If I told you the average temperature of a lake was 20°C, you might jump in—but if the "dispersion" is high, one end might be frozen and the other boiling!

What is Dispersion Ratio?

Dispersion Ratio (a.k.a. Index of Dispersion or Variance-to-Mean Ratio) is a filter-based feature selection technique that measures how spread out a feature’s values are relative to its mean.

🧮 Mathematical Intuition

It is defined as the ratio of the variance $\sigma^2$ to the mean $\mu$:

$$DR = \frac{\sigma^2}{\mu}$$

★ Range of Dispersion Ratio

For a feature with a positive mean, the value ranges from 0 (a perfectly constant feature) to +∞.

★ Usage

It is used to separate useful signals (features with high variance) from noisy or constant ones.
Think of it this way: a higher dispersion ratio implies higher feature importance, making it a useful, target-agnostic measure for selecting features.

In short, the higher the Dispersion Ratio, the more discriminative the feature might be.
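In code, the definition is a one-liner. A minimal sketch with NumPy (the toy values below are purely illustrative):

```python
import numpy as np

def dispersion_ratio(x):
    """Variance-to-mean ratio: DR = var(x) / mean(x)."""
    x = np.asarray(x, dtype=float)
    return np.var(x) / np.mean(x)

# A near-constant feature barely moves around its mean -> DR close to 0.
flat = dispersion_ratio([5.0, 5.1, 4.9, 5.0])

# A spread-out feature varies a lot relative to its mean -> higher DR.
spread = dispersion_ratio([1.0, 4.0, 9.0, 2.0])

print(flat, spread)  # flat is far smaller than spread
```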

🧠 Requirements & Data Compatibility for Feature Selection

  1. Linearity: ✅ Works regardless of linearity (measures spread only).
  2. Normalization: ⚠️ Yes, recommended — scale features to comparable ranges; otherwise large-valued features dominate.
  3. Ordinal (Ranked): ✅ Works fine — as ranks still produce meaningful variance.
  4. Numeric Encoded / Discretized: ⚠️ Use with care — if encoding changes the scale drastically, normalize after encoding.
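Point 2 matters because the ratio is not unit-free: rescaling a feature by a constant k scales its variance by k² but its mean only by k, so the DR itself scales by k. A small sketch with made-up height data:

```python
import numpy as np

def dr(x):
    return np.var(x) / np.mean(x)

heights_m = np.array([1.5, 1.6, 1.7, 1.8])
heights_cm = heights_m * 100  # same information, different unit

dr_m, dr_cm = dr(heights_m), dr(heights_cm)
print(dr_m, dr_cm)  # the centimetre version is exactly 100x larger
```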

🏆 Strategic Advantages

  1. Simple and fast — Just variance and mean; computationally light.
  2. No target needed — Works for unsupervised feature filtering.
  3. Identifies inactive features — Helps eliminate near-zero variance columns easily.
  4. Works across domains — Can be applied to regression, classification, or clustering tasks.
  5. Highlights between-class variability — computed per class, it surfaces features that differ meaningfully between classes, which is especially useful for classification (and can still offer insight for regression).
  6. Broad applicability — works directly on numeric data and, after encoding, on ordinal or categorical data; related dispersion measures are also used to choose splitting attributes in decision trees.

⚠️ Constraints

  1. Ignores relationships with the target — A feature may vary a lot but still be irrelevant.
  2. Sensitive to outliers — Extreme values can distort variance and inflate DR.
  3. Scale-dependent — Features with larger numeric ranges will dominate unless normalized.
  4. Not ideal for categorical data — Without encoding, variance is undefined for pure categories.
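Constraint 2 (outlier sensitivity) is easy to demonstrate: a single extreme value enters the variance as a squared deviation, so it can inflate the ratio dramatically. Illustrative values only:

```python
import numpy as np

def dr(x):
    return np.var(x) / np.mean(x)

clean = np.array([10.0, 11.0, 9.0, 10.0, 10.0])
with_outlier = np.array([10.0, 11.0, 9.0, 10.0, 100.0])  # one extreme value

dr_clean, dr_out = dr(clean), dr(with_outlier)
print(dr_clean, dr_out)  # the outlier inflates DR by orders of magnitude
```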

🚨 Caution — Common Misconceptions

  1. “High DR = good feature” — Not always. A feature might vary a lot but still have no predictive power.
  2. Forgetting to scale — If features have different units (e.g., meters vs. dollars), DR becomes misleading.
  3. Blind filtering — Always pair DR with correlation or mutual information to confirm feature relevance.
  4. Mixed-sign features — when values span positive and negative, the mean can land near zero and inflate the ratio; shift or transform such features first.
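The mixed-sign problem comes from the mean sitting in the denominator: a feature roughly symmetric around zero has a tiny mean, so the ratio explodes even though the feature is perfectly ordinary. A minimal illustration:

```python
import numpy as np

def dr(x):
    return np.var(x) / np.mean(x)

# Values centred almost exactly on zero: variance is ~1, mean is ~0.
centered = np.array([-1.0, 1.0, -1.0, 1.0001])
print(np.mean(centered), dr(centered))  # tiny mean -> huge, meaningless DR

# A common workaround: shift the feature into a positive range first.
shifted = centered - centered.min() + 1.0
print(dr(shifted))  # now a sensible, finite score
```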

Comparing Dispersion Ratio and Variance Threshold

| Aspect | Dispersion Ratio | Variance Threshold |
| --- | --- | --- |
| Definition | Measures the relative spread of feature values via the variance-to-mean ratio. | Measures the absolute variance of feature values. |
| Formula | $\frac{\sigma^2}{\mu}$ | $\sigma^2$ |
| Relationship with target | Target-free in its basic form; computed per class, it compares a feature's spread across target classes. | Ignores the target entirely; a feature could have low variance but still be highly relevant for classification. |
| Interpretation | Higher values indicate greater variability relative to the mean. | Higher variance means more spread-out data. |
| Scaling | Less scale-sensitive than raw variance, since it divides by the mean, though normalization is still recommended. | Sensitive to scale; may require normalization. |
| Best for | Ensuring features differ meaningfully between classes (classification);<br>features with varying scales (e.g., financial, demographic, real-estate data). | Identifying features that are constant or near-constant;<br>situations where scale is not a concern. |
| When to avoid | Features with zero mean (division errors);<br>highly skewed data, where the mean is not meaningful;<br>features mixing positive and negative values, since a mean near zero inflates the ratio. | When feature scales differ, making variances incomparable;<br>when relative variability matters more than absolute variance. |
| Use case example | Datasets where some features have small values but high significance (e.g., income data, disease biomarkers). | Removing constant or low-variance features in image processing, text analysis, or high-dimensional datasets. |
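The contrast above can be made concrete on a toy DataFrame, assuming scikit-learn is available; the column names are invented for illustration:

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

X = pd.DataFrame({
    "near_constant":     [1.0, 1.0, 1.0, 1.01],             # barely moves
    "small_but_varying": [0.1, 0.5, 0.9, 0.3],              # small scale, high relative spread
    "large_scale":       [1000.0, 1001.0, 999.0, 1000.0],   # big numbers, tiny relative spread
})

# Variance Threshold keeps features by absolute variance.
vt = VarianceThreshold(threshold=0.01).fit(X)
kept = X.columns[vt.get_support()].tolist()
print(kept)  # large_scale survives on raw variance alone

# Dispersion Ratio scores variance relative to the mean.
dr = X.var(ddof=0) / X.mean()
print(dr.sort_values(ascending=False))  # small_but_varying ranks highest
```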

The Dispersion Ratio is like your data’s “energy meter.” If a feature’s energy (variance) is low, it’s not contributing much to the conversation.
Use it early in your feature selection process to filter out weak or flat features — but don’t stop there. Pair it with target-aware techniques like Pearson correlation, Mutual Information, or ReliefF for a complete and balanced selection.
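A sketch of that two-stage workflow, with a dispersion filter first and mutual information as the target-aware check (the median cut-off is an arbitrary choice for illustration; assumes scikit-learn):

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif

data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Stage 1: dispersion-ratio filter -- drop the flattest half of the features.
dr = X.var(ddof=0) / X.mean()
candidates = dr[dr > dr.median()].index

# Stage 2: target-aware check -- rank the survivors by mutual information.
mi = pd.Series(
    mutual_info_classif(X[candidates], y, random_state=0),
    index=candidates,
).sort_values(ascending=False)
print(mi)
```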


Coding Examples

1: Regression — Predicting House Prices

You’re building a model to predict house prices based on features like sqft, num_rooms, and distance_to_city.

```python
import pandas as pd
import numpy as np

# Example dataset
data = {
    'sqft': [1000, 1500, 1800, 2000, 2500],
    'num_rooms': [2, 3, 3, 4, 4],
    'distance_to_city': [10, 8, 12, 6, 5],
    'price': [200000, 250000, 270000, 300000, 350000]
}

df = pd.DataFrame(data)

# Dispersion ratio for numeric features.
# Note: dividing by mean**2 (the squared coefficient of variation) rather
# than the plain mean makes the score unit-free, so features on different
# scales can be compared directly.
def dispersion_ratio(series):
    return np.var(series) / (np.mean(series) ** 2)

dr_values = df.drop(columns=['price']).apply(dispersion_ratio)
print(dr_values)
```

Output

```
sqft                0.080837
num_rooms           0.054687
distance_to_city    0.097561
dtype: float64
```

Here, distance_to_city has the highest DR, meaning it varies most relative to its mean — potentially a strong candidate for predicting house price.

2: Classification — Per-Class Dispersion on the Wine Dataset

Here the idea extends to a supervised setting: using scikit-learn’s wine dataset, each feature is scored by how much it varies within each target class (a per-class coefficient of variation, averaged across the classes).

```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_wine

# Load the dataset
data = load_wine()

# Convert to a DataFrame and a target array
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

def dispersion_ratio(df, feature_col, target):
    """Average per-class dispersion (std / mean) of a feature.

    Groups the feature by target class, computes the coefficient of
    variation within each class, and averages across classes.
    """
    class_groups = df.groupby(target)[feature_col]

    class_std = class_groups.std()    # spread within each class
    class_mean = class_groups.mean()  # centre of each class

    # Avoid division by zero when a class mean is exactly 0
    dispersion_values = class_std / class_mean.replace(0, np.nan)

    return dispersion_values.mean()

dispersion_scores = {col: dispersion_ratio(X, col, y) for col in X.columns}

# Collect into a sorted DataFrame
ratio = pd.DataFrame(
    {"dispersion_ratio": np.round(list(dispersion_scores.values()), 2)},
    index=list(dispersion_scores.keys()),
).sort_values("dispersion_ratio")

# display
ratio
```

Output

```
                              dispersion_ratio
alcohol                                   0.04
ash                                       0.10
magnesium                                 0.13
alcalinity_of_ash                         0.14
od280/od315_of_diluted_wines              0.15
hue                                       0.16
total_phenols                             0.19
proline                                   0.23
flavanoids                                0.28
color_intensity                           0.28
nonflavanoid_phenols                      0.29
proanthocyanins                           0.31
malic_acid                                0.40
```