Regression Loss Functions
Introduction: What is a Loss Function?
When building regression models, we often face a critical question: How do we measure the quality of our predictions? This is where loss functions come in.
A loss function numerically quantifies how "wrong" your model's predictions are. It is the mathematical way to measure the difference between what your model predicts (ŷ) and the actual value (y).
Key Concepts:
- Lower loss = Better model: A model with less error is preferred.
- Different loss functions emphasize different things: Some care more about outliers, some treat all errors equally, some penalize underprediction vs. overprediction differently.
- Choosing the right loss function is critical: The loss function you choose directly influences what your model learns to optimize for.
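To make this concrete, here is a minimal sketch (with made-up numbers) showing how two common loss functions judge the same predictions very differently once an outlier is present:

```python
import numpy as np

# Hypothetical data: predictions are close everywhere except one outlier target.
y_true = np.array([3.0, 5.0, 7.0, 9.0, 30.0])   # last point is an outlier
y_pred = np.array([3.1, 4.8, 7.2, 9.1, 10.0])

mse = np.mean((y_true - y_pred) ** 2)    # squares the 20-unit miss, so it dominates
mae = np.mean(np.abs(y_true - y_pred))   # weights every error linearly

print(f"MSE: {mse:.2f}")
print(f"MAE: {mae:.2f}")
```

One large miss inflates MSE far more than MAE, which is exactly the outlier-sensitivity trade-off discussed below.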
Decision Flowchart: How to Choose the Right Loss Function
```mermaid
graph TD
    Start[Start: Choose Loss Function] --> Q1{Do you have<br/>outliers in data?}
    Q1 -->|Yes, many outliers| Q2{Can you remove<br/>or treat outliers?}
    Q1 -->|No outliers| Q3{What type of<br/>error distribution?}
    Q2 -->|No, keep outliers| Q4{How sensitive should<br/>model be to outliers?}
    Q2 -->|Yes, can remove| Q3
    Q4 -->|Very robust| MAE[MAE<br/>Mean Absolute Error]
    Q4 -->|Moderately robust| HUBER[Huber Loss<br/>Smooth MAE]
    Q4 -->|Some robustness| LOGCOSH[Log-Cosh Loss]
    Q3 -->|Normal distribution| Q5{Do you need<br/>interpretability?}
    Q3 -->|Skewed distribution| Q6{Predicting what<br/>kind of values?}
    Q5 -->|Yes, same units as target| RMSE[RMSE<br/>Root Mean Squared Error]
    Q5 -->|No, just optimization| MSE[MSE<br/>Mean Squared Error]
    Q6 -->|Exponential growth<br/>or percentages| MSLE[MSLE<br/>Mean Squared Log Error]
    Q6 -->|Relative errors<br/>matter most| MAPE[MAPE<br/>Mean Absolute % Error]
    Q3 -->|Want to check bias| MBE[MBE<br/>Mean Bias Error]
    style Start fill:#e1f5ff
    style MSE fill:#90EE90
    style RMSE fill:#90EE90
    style MAE fill:#FFD700
    style HUBER fill:#FFD700
    style MSLE fill:#FFA07A
    style MAPE fill:#FFA07A
    style MBE fill:#DDA0DD
    style LOGCOSH fill:#FFD700
```

Summary: Choosing the Right Loss Function
Here's a quick reference table to help you choose the right loss function for your regression problem:
| Loss Function | Best For | Avoid When | Key Characteristic | Outlier Sensitivity | Scaling Requirements |
|---|---|---|---|---|---|
| Mean Squared Error | Normal distribution, few outliers, gradient-based optimization | Many outliers present | Penalizes large errors heavily | ⚠️⚠️⚠️ Very High | ✅ Features: Yes for gradients/regularization ⚠️ Target: Optional |
| Root Mean Squared Error | Need interpretable metric in original units | Many outliers present | Same as MSE, but in original units | ⚠️⚠️⚠️ Very High | ✅ Features: Yes for gradients/regularization ⚠️ Target: Report in original units |
| Mean Absolute Error | Robust to outliers, equal treatment of all errors | Need to penalize large errors | Linear penalty, all errors weighted equally | ✅ Low | ✅ Features: Recommended ⚠️ Target: Optional |
| Huber Loss | Some outliers, need smooth optimization | No outliers (use MSE) or extreme outliers (use MAE) | Hybrid of MSE + MAE, tunable threshold | ⚠️ Medium | ✅ Features: Essential (delta is scale-dependent) ⚠️ Target: Optional |
| Mean Squared Logarithmic Error | Exponential growth, multiple scales, percentage errors matter | Negative values, equal over/under prediction | Cares about relative errors | ⚠️ Medium | ✅ Features: Yes ❌ Target: Don't scale (loses relative error property) |
| Mean Absolute Percentage Error | Business reporting, percentage-based interpretation | Zeros in data, need symmetry | Easy to explain (% error) | ⚠️⚠️ High | ✅ Features: Yes ❌ Target: Never scale (always compute on original) |
| Mean Bias Error | Detecting systematic bias in predictions | As sole evaluation metric | Shows prediction bias direction | N/A (diagnostic) | ✅ Features: Yes ✅ Target: Report in original units for clarity |
| LogCosh | Need smooth gradients with outlier robustness | Computational efficiency critical | Quadratic for small errors (like MSE), linear for large errors (like MAE) | ⚠️ Medium-Low | ✅ Features: Essential ✅ Target: Recommended (keeps errors in the small range where the quadratic/linear blend is active) |
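The losses in the table above can all be written in a few lines of NumPy. The sketch below uses illustrative definitions; conventions vary in the wild (e.g., the sign of MBE and the default Huber delta), so treat these as reference formulas rather than any library's canonical API:

```python
import numpy as np

def mse(y, yhat):      return np.mean((y - yhat) ** 2)
def rmse(y, yhat):     return np.sqrt(mse(y, yhat))
def mae(y, yhat):      return np.mean(np.abs(y - yhat))
def mbe(y, yhat):      return np.mean(yhat - y)         # sign convention: positive = overprediction
def msle(y, yhat):     return np.mean((np.log1p(y) - np.log1p(yhat)) ** 2)  # requires y, yhat >= 0
def mape(y, yhat):     return np.mean(np.abs((y - yhat) / y)) * 100         # undefined when y == 0
def log_cosh(y, yhat): return np.mean(np.log(np.cosh(yhat - y)))

def huber(y, yhat, delta=1.0):
    """Quadratic where |error| <= delta, linear beyond it."""
    err = np.abs(y - yhat)
    quad = 0.5 * err ** 2
    lin = delta * (err - 0.5 * delta)
    return np.mean(np.where(err <= delta, quad, lin))
```

Note that `delta` in `huber` is expressed in the units of the target, which is why the table flags Huber's threshold as scale-dependent.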
Best Practices
1. Don't Rely on a Single Metric
Always use multiple loss functions to get a complete picture of your model's performance. For example:
- MSE/RMSE for overall error magnitude
- MAE for typical error size (optimizing MAE targets the median)
- MBE to check for systematic bias
2. Match the Loss to Your Business Goal
- If underpredicting is worse than overpredicting → Use asymmetric losses (e.g., MSLE, which penalizes underprediction more heavily)
- If outliers are measurement errors → Use robust losses (MAE, Huber)
- If outliers are real and important → Use MSE
3. Visualize Your Residuals
Always plot your prediction errors. This helps you:
- Identify patterns the metrics might miss
- Detect heteroscedasticity (varying error across prediction range)
- Spot systematic bias
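Plots are the primary tool here, but the same two checks can also be run numerically. The sketch below fabricates residuals (assumed synthetic data) whose spread grows with the prediction: a nonzero mean residual signals systematic bias, and a growing spread signals heteroscedasticity:

```python
import numpy as np

rng = np.random.default_rng(0)
y_pred = np.linspace(1, 10, 200)
# Synthetic residuals: shifted mean (bias) and spread that grows with the prediction
residuals = rng.normal(loc=0.5, scale=0.2 * y_pred)

# Bias check: an unbiased model should have a mean residual near zero
print("mean residual:", residuals.mean())

# Heteroscedasticity check: compare spread in the low vs high prediction range
low, high = residuals[:100], residuals[100:]
print("std (low predictions):", low.std())
print("std (high predictions):", high.std())
```

In practice you would scatter-plot `residuals` against `y_pred`; the widening funnel shape that these numbers hint at is immediately visible in such a plot.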
4. Consider the Scale
- For comparing models on the same dataset: MSE is fine
- For comparing across datasets: Use MAPE or standardized metrics
- For business reporting: Use RMSE or MAPE (easier to interpret)
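A quick illustration of why MAPE travels across datasets while MSE does not, using two made-up datasets that differ only by a factor of 100 in scale:

```python
import numpy as np

# Two hypothetical datasets: same relative errors, scales 100x apart
y_a, pred_a = np.array([10.0, 20.0, 30.0]), np.array([11.0, 19.0, 33.0])
y_b, pred_b = np.array([1000.0, 2000.0, 3000.0]), np.array([1100.0, 1900.0, 3300.0])

def mape(y, yhat):
    return np.mean(np.abs((y - yhat) / y)) * 100

# MSE explodes with the scale of the target; MAPE stays identical
print("MSE  A:", np.mean((y_a - pred_a) ** 2), "  B:", np.mean((y_b - pred_b) ** 2))
print("MAPE A:", mape(y_a, pred_a), "  B:", mape(y_b, pred_b))
```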
5. Training vs. Evaluation
- Training loss: Often use MSE for smooth optimization
- Evaluation metric: Use what matters to your business (could be different!)
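For example, a model can be trained by minimizing MSE (for its smooth gradients) and then judged by MAE. The sketch below fits a toy linear model with hand-rolled gradient descent on synthetic data (all numbers are assumptions for illustration):

```python
import numpy as np

# Synthetic data: y ≈ 2x + 1 plus Gaussian noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 5, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 100)

# Training loss: MSE, minimized by plain gradient descent
w, b = 0.0, 0.0
lr = 0.01
for _ in range(2000):
    err = (w * x + b) - y
    w -= lr * 2 * np.mean(err * x)   # d(MSE)/dw
    b -= lr * 2 * np.mean(err)       # d(MSE)/db

# Evaluation metric: whatever the business cares about, here MAE
y_hat = w * x + b
mae_eval = np.mean(np.abs(y - y_hat))
print(f"fitted w={w:.2f}, b={b:.2f}, evaluation MAE={mae_eval:.3f}")
```

The optimizer never sees MAE; it only shapes the reported number you act on.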
6. Scaling Best Practices ⭐ NEW
- ✅ Always scale features for gradient-based models (Neural Networks, Linear/Logistic Regression)
- ✅ Always scale features for regularized models (Ridge, Lasso, Elastic Net) - it's mandatory
- ❌ No need to scale features for tree-based models (Random Forest, XGBoost, Decision Trees) - splits are invariant to monotonic transformations, so it's unnecessary (though harmless)
- Target Variable Scaling: 👇
| Loss Function | Scale Target? | Reason |
|---|---|---|
| MSE/RMSE | Optional | Helps neural network convergence; report in original units |
| MAE | Optional | Doesn't change relative errors |
| Huber | Optional | But tune delta after scaling |
| MSLE | ❌ No | Already handles scale via log; scaling breaks relative error property |
| MAPE | ❌ Never | Scale-invariant; scaling produces wrong percentages |
| MBE | Optional | But report in original units for interpretability |
| Log-Cosh | ✅ Yes | Keeps errors in the small range where the loss still blends MSE-like and MAE-like behavior |
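As a sketch of the "scale for training, report in original units" workflow from the table (with made-up prices as the target and a stand-in for the model's output):

```python
import numpy as np

# Hypothetical target in large units (e.g., house prices)
y_train = np.array([150_000.0, 220_000.0, 310_000.0, 95_000.0])

# Standardize the target for training (helps e.g. neural-network convergence)
mu, sigma = y_train.mean(), y_train.std()
y_scaled = (y_train - mu) / sigma

# The model predicts in scaled space; pred_scaled here is a stand-in for model output
pred_scaled = y_scaled + 0.05

# Invert the scaling before reporting the metric in original units
pred = pred_scaled * sigma + mu
rmse_original = np.sqrt(np.mean((y_train - pred) ** 2))
print(f"RMSE (original units): {rmse_original:.0f}")
```

The key step is the inverse transform: computing RMSE on `pred_scaled` directly would give a number in meaningless standardized units.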
Conclusion
- Start with MSE/RMSE as a baseline
- Switch to MAE if you have outliers
- Use Huber or Log-Cosh if you need both smoothness and robustness
- Use MSLE or MAPE for percentage-based problems
- Always check for bias using MBE
- Remember to scale appropriately based on your chosen loss function!