Mean Absolute Percentage Error (MAPE)

Definition

MAPE measures the average percentage error between predicted and actual values. It expresses the error as a percentage of the actual value, making it highly interpretable, especially for business stakeholders.

Formula:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
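The formula translates directly into a few lines of NumPy; this minimal sketch assumes all actual values are nonzero:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, returned as a percentage.

    Assumes every element of y_true is nonzero (see the disadvantages below).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

print(mape([100, 200, 300], [110, 180, 330]))  # 10.0
```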

Advantages

1. Easy to Interpret (Percentage-Based Understanding)
2. Scale-Independent (Cross-Dataset Comparison)
3. Business-Friendly (Stakeholder Communication)
4. Intuitive Relative Magnitude

Disadvantages

1. Undefined for Zero Values (Division by Zero Problem)
2. Asymmetric Penalty (Biased Toward Underprediction)
3. Biased Toward Low Values (Small Denominator Problem)
4. Can Be Misleading (Equal Percentage ≠ Equal Importance)
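Disadvantages 1 and 3 are easy to demonstrate: a single near-zero actual value dominates the whole average. The numbers below are illustrative:

```python
import numpy as np

# One tiny actual value among otherwise well-predicted points
y_true = np.array([0.1, 100.0, 100.0])
y_pred = np.array([1.1, 101.0, 101.0])

# Per-point percentage errors
errors = np.abs((y_true - y_pred) / y_true) * 100
print(errors)         # the near-zero actual contributes 1000%; the others 1% each
print(errors.mean())  # 334.0: one tiny denominator swamps the average
```

With an actual of exactly 0 the ratio is undefined outright, which is disadvantage 1.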

When to Use MAPE

When to Avoid MAPE

Scaling and Practical Considerations

1. Does MAPE Need Scaled Data?

The short answer: No—MAPE is inherently scale-invariant.
The real answer: Feature scaling helps model training, but MAPE should ALWAYS be calculated on original-scale data. Never scale the target when using MAPE.
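That scale-invariance claim can be checked numerically, using a hand-rolled `mape` helper (an assumption, not a library function): expressing the same data in cents instead of dollars leaves MAPE untouched:

```python
import numpy as np

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

y_true = np.array([5.0, 10.0, 50.0])   # dollars
y_pred = np.array([6.0, 11.0, 55.0])

# Same data expressed in cents: every value multiplied by 100
print(mape(y_true, y_pred))              # dollars
print(mape(y_true * 100, y_pred * 100))  # cents: same value in both unit systems
```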

2. Key Insight: MAPE is Scale-Invariant by Design

Why MAPE is different: the numerator and denominator share the target's units, so the units cancel in the ratio |y − ŷ| / y. Unlike MAE or RMSE, the result is a pure percentage regardless of the target's scale.

Why this matters: MAPE values can be compared across datasets and targets of different magnitudes without any preprocessing of the target; rescaling the target buys you nothing and can only distort the metric.

Analogy: MAPE is like grading on a curve—it automatically adjusts for the "difficulty" (scale) of each prediction. But just like curves can be unfair to edge cases, MAPE struggles with small values.

3. When Does Scaling Help?

Always scale features, but keep target unscaled
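As a sketch of that split, the pipeline below standardizes only the feature matrix; the toy data and model choice are illustrative assumptions. The target is never touched, so MAPE is computed on original-scale dollars:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data: the two features live on wildly different scales
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 100), rng.uniform(0, 10_000, 100)])
y = 50 + 20 * X[:, 0] + 0.01 * X[:, 1]  # target stays in dollars

# Scale the FEATURES inside the pipeline; the target y is left alone
model = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# MAPE evaluated on the original dollar scale, exactly as intended
print(mean_absolute_percentage_error(y, model.predict(X)) * 100)  # effectively 0% (exact linear fit)
```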

★ Target Scaling - DANGEROUS with MAPE

This is where things go wrong

The critical warning: MAPE's formula |y − ŷ| / y requires the actual values (y) to be in their original, meaningful scale.

What happens with target scaling:

Problem 1: MinMax Scaling (0-1)

# Original: Actual = $5, Predicted = $6
actual, predicted = 5.0, 6.0
print(abs(actual - predicted) / actual * 100)  # 20.0%

# After MinMax scaling the $1-$100 range: x -> (x - 1) / 99
actual_s, predicted_s = (5 - 1) / 99, (6 - 1) / 99  # ≈ 0.0404, ≈ 0.0505
print(abs(actual_s - predicted_s) / actual_s * 100)  # 25.0% ← DIFFERENT!

# The denominator changed, so MAPE changed!

Problem 2: Standardization (mean=0, std=1)

# Actual = $50 -> 0 after standardization (mean = $50, std = $20)
# Predicted = $55 -> 0.25
actual_s, predicted_s = 0.0, 0.25

# abs(actual_s - predicted_s) / actual_s  -> ZeroDivisionError!

# Even when the denominator is not exactly zero, negative values give nonsense:
print(abs(-2.25 - (-2.20)) / -2.25 * 100)  # ≈ -2.22% ← negative percentage!?

Problem 3: Log Transformation

# MAPE computed on log-transformed targets has no interpretable meaning:
# log(100) ≈ 4.6, and a "10% error" on log values is NOT a 10% error in dollars

4. Effect of Scaling on MAPE

| Scaling Type | Effect on MAPE | Recommendation |
|---|---|---|
| Feature scaling only | No direct effect on MAPE; improves model training | ✅ Recommended |
| Target MinMax (0-1) | Changes MAPE values; amplifies errors on small values | ❌ Don't use |
| Target standardization | Creates zero or negative denominators; MAPE becomes meaningless | ❌ Never use |
| Target log transform | MAPE percentages lose meaning; not on original scale | ❌ Avoid |
| No target scaling | MAPE works naturally as intended | ✅ Best for MAPE |

5. Why MAPE Breaks with Target Scaling

The mathematical reason: under an affine transform y → a·y + b, the error ratio becomes a·|y − ŷ| / (a·y + b). The multiplicative factor a cancels only when b = 0; any shift b (MinMax's offset, standardization's mean subtraction) changes the denominator and therefore the metric.

The intuition: MAPE asks "how large is the error relative to the true value?" Rescaling the target silently redefines what the "true value" is, so the metric ends up answering a different question.
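That reasoning can be checked numerically: a pure rescaling cancels out of the ratio, while a shift does not. The `mape` function here is a hand-rolled helper, not a library call:

```python
import numpy as np

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

y_true, y_pred = np.array([5.0, 50.0]), np.array([6.0, 55.0])

# Pure rescaling (a*y): the factor a cancels from a*|y - yhat| / (a*y)
print(np.isclose(mape(2 * y_true, 2 * y_pred), mape(y_true, y_pred)))    # True

# Any shift (y + b): the denominator changes, so MAPE changes
print(np.isclose(mape(y_true + 10, y_pred + 10), mape(y_true, y_pred)))  # False
```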

6. Best Practice for MAPE

7. When Scaling Creates Issues: A Complete Example

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import mean_absolute_percentage_error

# Original data
y_test = np.array([5, 10, 50, 100, 500])
y_pred = np.array([6, 11, 55, 110, 550])

# MAPE on original scale (CORRECT)
mape_original = mean_absolute_percentage_error(y_test, y_pred)
print(f"MAPE (original scale): {mape_original * 100:.2f}%")  # 12.00%

# WRONG: MAPE after standardization
scaler = StandardScaler()
y_test_scaled = scaler.fit_transform(y_test.reshape(-1, 1)).ravel()
y_pred_scaled = scaler.transform(y_pred.reshape(-1, 1)).ravel()
# The scaled actuals are near zero or negative, so any "MAPE" computed on them
# is meaningless (sklearn clips |y_true| to machine epsilon rather than erroring)

# WRONG: MAPE after MinMax scaling
scaler2 = MinMaxScaler()
y_test_minmax = scaler2.fit_transform(y_test.reshape(-1, 1)).ravel()
y_pred_minmax = scaler2.transform(y_pred.reshape(-1, 1)).ravel()
mape_minmax = mean_absolute_percentage_error(y_test_minmax, y_pred_minmax)
# The smallest actual scales to exactly 0, so sklearn clips that denominator
# to machine epsilon and the "MAPE" explodes to an astronomically large number
print(f"MAPE (MinMax scaled): {mape_minmax * 100:.2f}%")

# CORRECT: Inverse transform before MAPE
y_pred_original = scaler2.inverse_transform(y_pred_minmax.reshape(-1, 1)).ravel()
mape_correct = mean_absolute_percentage_error(y_test, y_pred_original)
print(f"MAPE (inverse transformed): {mape_correct * 100:.2f}%")  # back to 12.00%
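One common practical workaround for the zero-denominator problem, sketched here as a hypothetical helper (not part of scikit-learn), is to drop rows whose actuals are effectively zero rather than let epsilon-clipping explode the metric:

```python
import numpy as np

def safe_mape(y_true, y_pred, min_actual=1e-8):
    """MAPE over only those points whose actuals are meaningfully nonzero.

    Hypothetical helper: rows with |y_true| <= min_actual are excluded
    instead of being clipped to machine epsilon as sklearn does.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.abs(y_true) > min_actual
    return 100.0 * np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask]))

print(safe_mape([0.0, 100.0, 200.0], [1.0, 110.0, 220.0]))  # 10.0
```

Report the number of excluded rows alongside the metric, since silently dropping data can hide systematic problems with near-zero targets.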