Regression Loss Functions

Introduction: What is a Loss Function?

When building regression models, we often face a critical question: How do we measure the quality of our predictions? This is where loss functions come in.

A loss function numerically quantifies how "wrong" your model's predictions are. It is the mathematical way to measure the difference between what your model predicts (ŷ) and what actually happened (y).
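As a minimal sketch of that idea, here are two common loss functions (MSE and MAE, both covered below) computed directly from their definitions in plain Python; the function names and sample values are our own illustration:

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.5]   # what actually happened (y)
y_pred = [2.5, 5.0, 4.0]   # what the model predicted (ŷ)

print(mse(y_true, y_pred))  # squares the errors, so the 1.5 miss dominates
print(mae(y_true, y_pred))  # weights every error linearly
```

Note how the single large error (1.5) inflates MSE more than MAE, which is exactly the outlier sensitivity discussed throughout this article.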

Key Concepts:


Decision Flowchart: How to Choose the Right Loss Function

```mermaid
graph TD
    Start["Start: Choose Loss Function"] --> Q1{"Do you have<br/>outliers in data?"}
    Q1 -->|Yes, many outliers| Q2{"Can you remove<br/>or treat outliers?"}
    Q1 -->|No outliers| Q3{"What type of<br/>error distribution?"}
    Q2 -->|No, keep outliers| Q4{"How sensitive should<br/>model be to outliers?"}
    Q2 -->|Yes, can remove| Q3
    Q4 -->|Very robust| MAE["MAE<br/>Mean Absolute Error"]
    Q4 -->|Moderately robust| HUBER["Huber Loss<br/>Smooth MAE"]
    Q4 -->|Some robustness| LOGCOSH["Log-Cosh Loss"]
    Q3 -->|Normal distribution| Q5{"Do you need<br/>interpretability?"}
    Q3 -->|Skewed distribution| Q6{"Predicting what<br/>kind of values?"}
    Q3 -->|Want to check bias| MBE["MBE<br/>Mean Bias Error"]
    Q5 -->|Yes, same units as target| RMSE["RMSE<br/>Root Mean Squared Error"]
    Q5 -->|No, just optimization| MSE["MSE<br/>Mean Squared Error"]
    Q6 -->|Exponential growth<br/>or percentages| MSLE["MSLE<br/>Mean Squared Log Error"]
    Q6 -->|Relative errors<br/>matter most| MAPE["MAPE<br/>Mean Absolute % Error"]
    style Start fill:#e1f5ff
    style MSE fill:#90EE90
    style RMSE fill:#90EE90
    style MAE fill:#FFD700
    style HUBER fill:#FFD700
    style LOGCOSH fill:#FFD700
    style MSLE fill:#FFA07A
    style MAPE fill:#FFA07A
    style MBE fill:#DDA0DD
```

Summary: Choosing the Right Loss Function

Here's a quick reference table to help you choose the right loss function for your regression problem:

| Loss Function | Best For | Avoid When | Key Characteristic | Outlier Sensitivity | Scaling Requirements |
|---|---|---|---|---|---|
| Mean Squared Error (MSE) | Normal distribution, few outliers, gradient-based optimization | Many outliers present | Penalizes large errors heavily | ⚠️⚠️⚠️ Very High | ✅ Features: Yes, for gradients/regularization; ⚠️ Target: Optional |
| Root Mean Squared Error (RMSE) | Need interpretable metric in original units | Many outliers present | Same as MSE, but in original units | ⚠️⚠️⚠️ Very High | ✅ Features: Yes, for gradients/regularization; ⚠️ Target: Report in original units |
| Mean Absolute Error (MAE) | Robust to outliers, equal treatment of all errors | Need to penalize large errors | Linear penalty, all errors weighted equally | ✅ Low | ✅ Features: Recommended; ⚠️ Target: Optional |
| Huber Loss | Some outliers, need smooth optimization | No outliers (use MSE) or extreme outliers (use MAE) | Hybrid of MSE + MAE, tunable threshold | ⚠️ Medium | ✅ Features: Essential (delta is scale-dependent); ⚠️ Target: Optional |
| Mean Squared Logarithmic Error (MSLE) | Exponential growth, multiple scales, percentage errors matter | Negative values, equal over/under prediction | Cares about relative errors | ⚠️ Medium | ✅ Features: Yes; ❌ Target: Don't scale (loses relative-error property) |
| Mean Absolute Percentage Error (MAPE) | Business reporting, percentage-based interpretation | Zeros in data, need symmetry | Easy to explain (% error) | ⚠️⚠️ High | ✅ Features: Yes; ❌ Target: Never scale (always compute on original) |
| Mean Bias Error (MBE) | Detecting systematic bias in predictions | Used as sole evaluation metric | Shows prediction bias direction | N/A (diagnostic) | ✅ Features: Yes; ✅ Target: Report in original units for clarity |
| Log-Cosh | Need smooth gradients with outlier robustness | Computational efficiency critical | Smooth approximation of MAE | ⚠️ Medium-Low | ✅ Features: Essential; ✅ Target: Recommended (keeps errors in [-3, 3] range) |
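To make the "hybrid of MSE + MAE" behavior in the Huber row concrete, here is a minimal sketch; the function name and the default `delta` value are our own choices, and as the table notes, `delta` is scale-dependent, so it should be tuned after feature scaling:

```python
def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones.

    `delta` is the tunable threshold where the loss switches from
    MSE-like to MAE-like behavior.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = abs(t - p)
        if e <= delta:
            total += 0.5 * e ** 2                 # MSE-like region: smooth gradient near zero
        else:
            total += delta * (e - 0.5 * delta)    # MAE-like region: linear, robust to outliers
    return total / len(y_true)
```

With `delta=1.0`, an error of 0.5 is penalized quadratically (0.125), while an error of 3.0 is penalized only linearly (2.5) instead of the 4.5 that MSE's half-squared error would give.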

Best Practices

1. Don't Rely on a Single Metric

Always use multiple loss functions to get a complete picture of your model's performance. For example, report RMSE as an interpretable headline number, compare it with MAE to see how much outliers inflate it, and check MBE for systematic bias.
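One concrete pattern is a small report that computes several of the metrics above side by side (the helper name and sample data are our own illustration):

```python
import math

def report(y_true, y_pred):
    """Compute several regression metrics at once for a fuller picture."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),                   # same units as the target
        "MAE": sum(abs(e) for e in errors) / n,   # robust to outliers
        "MBE": sum(errors) / n,                   # sign reveals systematic bias
    }

print(report([10.0, 12.0, 14.0], [11.0, 12.0, 13.0]))
```

Here a gap between RMSE and MAE hints at a few large errors, while an MBE near zero says the over- and under-predictions cancel out, so there is no systematic bias.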

2. Match the Loss to Your Business Goal

3. Visualize Your Residuals

Always plot your prediction errors. Residual plots help you spot systematic bias, detect outliers, and check whether the error distribution matches your loss function's assumptions.
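As a quick numeric stand-in for a residual plot (useful in a pipeline with no display), the sketch below summarizes the residuals; the function name and the 3-standard-deviation outlier rule are our own conventions:

```python
import statistics

def residual_summary(y_true, y_pred):
    """Summarize residuals: bias, spread, and candidate outliers."""
    res = [p - t for t, p in zip(y_true, y_pred)]
    mean = statistics.mean(res)     # far from 0 -> systematic bias (this is the MBE)
    sd = statistics.stdev(res)      # spread of the errors
    # Flag residuals more than 3 standard deviations from the mean.
    outliers = [r for r in res if abs(r - mean) > 3 * sd]
    return mean, sd, outliers

mean, sd, outliers = residual_summary([1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 2.9, 4.0])
print(mean, sd, outliers)
```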

4. Consider the Scale

5. Training vs. Evaluation

6. Scaling Best Practices

| Loss Function | Scale Target? | Reason |
|---|---|---|
| MSE/RMSE | Optional | Helps neural-network convergence; report in original units |
| MAE | Optional | Doesn't change relative errors |
| Huber | Optional | But tune delta after scaling |
| MSLE | No | Already handles scale via the log; scaling breaks the relative-error property |
| MAPE | Never | Scale-invariant; scaling produces wrong percentages |
| MBE | Optional | But report in original units for interpretability |
| Log-Cosh | Yes | Keeps errors in the optimal [-3, 3] range for balanced MSE/MAE behavior |
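The MAPE row can be checked directly: because MAPE is built from relative errors, rescaling the target and the predictions by the same factor leaves it unchanged, so there is nothing for scaling to improve. A sketch, with our own function name and sample values:

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error (assumes no zeros in y_true)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 420.0]

base = mape(y_true, y_pred)
# Rescale both series by the same factor: MAPE is unchanged (scale-invariant).
scaled = mape([t * 0.001 for t in y_true], [p * 0.001 for p in y_pred])
print(base, scaled)
```

Scaling only one of the two series, by contrast, would produce meaningless percentages, which is why the table says never to scale the target for MAPE.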

Conclusion

  • Start with MSE/RMSE as a baseline
  • Switch to MAE if you have outliers
  • Use Huber or Log-Cosh if you need both smoothness and robustness
  • Use MSLE or MAPE for percentage-based problems
  • Always check for bias using MBE
  • Remember to scale appropriately based on your chosen loss function!