Bagging (Bootstrap Aggregating)

Bagging, short for Bootstrap Aggregating, is a parallel ensemble learning technique that mainly aims to reduce variance and prevent overfitting. It trains multiple models independently on different bootstrap samples of the training data and then aggregates their predictions.

By training models on different bootstrap samples (random samples with replacement), each model sees a slightly different view of the data, capturing different patterns and making different errors. When we aggregate their predictions, individual mistakes tend to cancel out, while correct predictions reinforce each other.

Key Distinctions

Homogeneous Algorithms

Different Model Versions

Independent Training

Equal Weighting

Aggregation Strategy

The Bagging Process

Phase 1: Data Preparation

Step 1: Train-Test Split

Split the main dataset into a training set (typically 80%) and a test set (remaining 20%). The test set is reserved exclusively for final evaluation and is never used during the training or validation process.
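The split in Step 1 can be sketched with scikit-learn's train_test_split. This is a minimal illustration, not a prescribed setup: the synthetic dataset, the variable names, and the use of stratification are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the "main dataset" (an assumption for illustration).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out 20% as the test set; stratify keeps class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(X_train.shape, X_test.shape)  # (800, 20) (200, 20)
```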

Phase 2: Bootstrap Sampling and Model Training

Step 2: Create Bootstrap Samples

For each model in the ensemble (e.g., Model 1 through Model B, where B is typically 50-500), generate a bootstrap sample by randomly sampling n instances from the training set with replacement (where n is the size of the training set).

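Key Property: Each bootstrap sample contains approximately 63.2% of the distinct training instances (some drawn more than once); the remaining 36.8% or so are left out and serve as that model's out-of-bag (OOB) samples.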

[Figure: ML_AI/images/bagging-1.png]
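A quick simulation can make the bootstrap draw in Step 2 concrete. The sketch below uses NumPy only; the training-set size and variable names are illustrative assumptions. It draws n indices with replacement and measures what fraction of the training set ends up in the sample versus left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # size of the training set (illustrative value)

# Draw n indices with replacement: this is one bootstrap sample.
boot_idx = rng.integers(0, n, size=n)

# In-bag instances were drawn at least once; the rest are out-of-bag (OOB).
in_bag = np.unique(boot_idx)
oob_mask = np.ones(n, dtype=bool)
oob_mask[in_bag] = False

unique_fraction = in_bag.size / n
oob_fraction = oob_mask.mean()
print(f"unique: {unique_fraction:.3f}, OOB: {oob_fraction:.3f}")
# Theory: 1 - (1 - 1/n)^n approaches 1 - 1/e, about 0.632, as n grows.
```

The roughly 63/37 split is a property of sampling n items from n with replacement, so it does not depend on the data or the base model.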

Step 3: Train Individual Models

Train one model on each bootstrap sample independently and in parallel:

Important Characteristics:

Repeat Steps 2-3 for all B models in the ensemble.
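Steps 2 and 3 can be sketched as a plain loop. This is a minimal illustration under stated assumptions: decision trees as the base model, synthetic data, a small B to keep it fast, and hypothetical names such as models and in_bag_indices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic training data (an assumption for illustration).
X_train, y_train = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
n = len(X_train)
B = 25  # number of models; kept small here so the example runs quickly

models, in_bag_indices = [], []
for b in range(B):
    idx = rng.integers(0, n, size=n)        # Step 2: bootstrap sample of size n
    tree = DecisionTreeClassifier(random_state=b)
    tree.fit(X_train[idx], y_train[idx])    # Step 3: train on that sample only
    models.append(tree)
    in_bag_indices.append(idx)              # kept so OOB samples can be derived later

print(len(models), "models trained")
```

The loop runs sequentially for clarity. Because no iteration depends on another, the same work could be distributed across processes or machines.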

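Phase 3: Out-of-Bag (OOB) Evaluation

After all models are trained, we can perform internal validation using out-of-bag (OOB) samples, the training instances a given model did not see in its bootstrap sample, without needing a separate validation set.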

Step 4: Generate OOB Predictions for Each Sample

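For each training sample (x_i, y_i), collect the models whose bootstrap samples did not include it and aggregate only their predictions to obtain its OOB prediction.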

Step 5: Calculate OOB Error

Aggregate the OOB predictions across all training samples and compare with true labels to compute the OOB Error.

Why OOB is valuable: This provides a validation estimate similar to cross-validation but without the computational cost of retraining models. It tells you how well your ensemble generalizes before ever touching the test set.
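The OOB procedure in Steps 4 and 5 can be sketched by hand for binary classification. This is an illustrative sketch, not a reference implementation: it assumes decision trees, synthetic data, majority voting among OOB models, and hypothetical variable names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
rng = np.random.default_rng(0)
n, B = len(X), 50

# votes[i, c] counts how many models that never saw sample i predicted class c.
votes = np.zeros((n, 2), dtype=int)
for b in range(B):
    idx = rng.integers(0, n, size=n)            # bootstrap sample for model b
    oob = np.setdiff1d(np.arange(n), idx)       # samples model b did not train on
    tree = DecisionTreeClassifier(random_state=b).fit(X[idx], y[idx])
    votes[oob, tree.predict(X[oob])] += 1       # Step 4: record OOB votes only

covered = votes.sum(axis=1) > 0                 # samples with at least one OOB vote
oob_pred = votes.argmax(axis=1)                 # majority vote among OOB models
oob_error = np.mean(oob_pred[covered] != y[covered])  # Step 5: OOB error rate
print(f"OOB error: {oob_error:.3f} on {covered.sum()} samples")
```

In practice, scikit-learn's BaggingClassifier and RandomForestClassifier expose the same idea through oob_score=True and the fitted oob_score_ attribute, so the manual loop is mainly useful for understanding the mechanics.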

Phase 4: Final Evaluation on Test Set

Step 6: Generate Test Set Predictions

Once all models are trained, each model makes predictions on the entire test set (data never seen during training).

Step 7: Aggregate Predictions

Combine predictions from all B models to produce the final ensemble prediction:

[Figure: ML_AI/images/bagging-2.png]
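The two most common aggregation rules are a majority vote for classification and a simple average for regression. The sketch below shows both with NumPy; the prediction arrays are made-up values, and majority_vote is a hypothetical helper name.

```python
import numpy as np

# Hypothetical predictions from B = 5 classifiers on 4 test samples (rows = models).
class_preds = np.array([
    [0, 1, 1, 2],
    [0, 1, 2, 2],
    [1, 1, 1, 2],
    [0, 0, 1, 2],
    [0, 1, 1, 0],
])

def majority_vote(preds, n_classes):
    # Count votes per class for each sample (column), then pick the most frequent class.
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return counts.argmax(axis=0)

final_classes = majority_vote(class_preds, n_classes=3)
print(final_classes)  # [0 1 1 2]

# Hypothetical predictions from B = 3 regressors on 2 test samples.
reg_preds = np.array([[2.0, 3.0], [4.0, 5.0], [6.0, 1.0]])
final_values = reg_preds.mean(axis=0)  # equal-weight average across models
print(final_values)  # [4. 3.]
```

When the base classifiers output class probabilities, averaging those probabilities (soft voting) is a common alternative to counting hard votes.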

Step 8: Compute Final Performance Metrics

Compare the ensemble's final predictions against the true labels of the test set to calculate performance metrics (accuracy, F1-score, RMSE, etc.).
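As a small sketch of Step 8, the metrics named above can be computed with scikit-learn. The label and prediction arrays are made-up values for illustration; in a real run they would be the test-set labels and the ensemble's aggregated predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error

# Classification: hypothetical true labels and ensemble predictions.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])
accuracy = accuracy_score(y_true, y_pred)   # 5 of 6 correct
f1 = f1_score(y_true, y_pred)               # harmonic mean of precision and recall

# Regression: RMSE is the square root of the mean squared error.
y_true_reg = np.array([3.0, 5.0, 2.0])
y_pred_reg = np.array([2.0, 5.0, 4.0])
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))

print(f"accuracy={accuracy:.3f}, f1={f1:.3f}, rmse={rmse:.3f}")
```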

Mathematical Intuition: Why Bagging Reduces Variance

Consider B independent models, each with variance σ². When we average their predictions:

Var(average) = σ² / B

This shows that averaging reduces variance by a factor of B. In practice, models aren't perfectly independent (they're trained on overlapping data), so the reduction is less dramatic, but still significant.
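A short simulation can check both cases. The sketch below is illustrative only: it models each "prediction" as Gaussian noise with variance σ², and for the dependent case it assumes a constant pairwise correlation ρ between models, for which the variance of the average is ρσ² + (1 − ρ)σ²/B. The parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, B, trials = 4.0, 25, 50_000

# Independent case: B predictions per trial, each with variance sigma2.
indep = rng.normal(0.0, np.sqrt(sigma2), size=(trials, B))
var_indep = indep.mean(axis=1).var()

# Correlated case: a shared noise component gives pairwise correlation rho.
rho = 0.3
shared = rng.normal(0.0, np.sqrt(rho * sigma2), size=(trials, 1))
own = rng.normal(0.0, np.sqrt((1 - rho) * sigma2), size=(trials, B))
var_corr = (shared + own).mean(axis=1).var()

expected_indep = sigma2 / B
expected_corr = rho * sigma2 + (1 - rho) * sigma2 / B
print(f"independent: simulated {var_indep:.3f}, theory {expected_indep:.3f}")
print(f"correlated:  simulated {var_corr:.3f}, theory {expected_corr:.3f}")
```

With correlated models the first term, ρσ², does not shrink as B grows, which is why adding more models eventually stops helping and why methods that decorrelate the models can reduce variance further.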

Advantages of Bagging

Variance Reduction

Parallelization

Noise Reduction

Minimal Hyperparameter Tuning Required

Out-of-Bag Evaluation

Feature Importance

Probabilistic Predictions

Limitations of Bagging

Bias Not Addressed

Interpretability Loss

Computational Cost

Memory Requirements

Not Ideal for Linear Models

Inference Time

When to Use Bagging

✅ Best Suited For:

High-Variance Base Models

Complex Models

Sufficient Data

Noisy Data

Parallel Computing Available

Feature Importance Needed

❌ Avoid When:

Base Model Has High Bias

Interpretability Critical

Extremely Limited Resources

Linear Relationships Dominate

Small Dataset

Imbalanced Dataset

Common Pitfalls and How to Avoid Them

| Pitfall | Problem | Solution |
|---|---|---|
| Bagging Linear Models | Minimal benefit from bagging models that are already stable | Use bagging with high-variance models (trees, neural networks) |
| Too Few Estimators | B=10 trees won't provide enough variance reduction | Start with B=100 minimum, increase until performance plateaus |
| Overly Restricted Base Models | Very shallow trees (max_depth=3) → each model has high bias | Allow trees to grow deeper than for single-tree models |
| Ignoring Class Imbalance | Majority class dominates bootstrap samples | Use stratified sampling or balanced bagging methods |
| Not Using OOB Evaluation | Wasting data on separate validation set | Use OOB scores for model validation and selection |
| Forgetting Scaling for Some Base Models | Bagging KNN or SVM without feature scaling | Scale features appropriately for distance-based models |
| Memory Issues in Production | 500 decision trees require substantial memory | Reduce B, compress models, or use model distillation |
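The advice on "Too Few Estimators" and "Not Using OOB Evaluation" can be combined: grow B and watch the OOB error until it plateaus. The sketch below is one possible way to do this, assuming a random forest on synthetic data; the candidate values of B are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Track OOB error for increasing ensemble sizes; no separate validation set is used.
oob_errors = {}
for B in (50, 100, 200, 400):
    forest = RandomForestClassifier(n_estimators=B, oob_score=True, random_state=0)
    forest.fit(X, y)
    oob_errors[B] = 1.0 - forest.oob_score_

for B, err in oob_errors.items():
    print(f"B={B:>3}: OOB error = {err:.3f}")
```

Once the error stops improving meaningfully, additional models mostly add training time, memory, and inference cost.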

Common Bagging Algorithms

  1. Random Forest
  2. Extra Trees (Extremely Randomized Trees)
  3. Bagged Decision Trees
  4. General Bagging Wrapper
    • Bagged SVMs
    • Bagged Neural Networks
    • Bagged Logistic Regression
    • Bagged KNN

Scikit-learn's BaggingClassifier and BaggingRegressor can bag any base estimator.
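A brief sketch of the general wrapper follows, using synthetic data and arbitrary ensemble sizes as assumptions. The base estimator is passed positionally because its keyword name has differed between scikit-learn versions. For the KNN case, scaling is placed inside the base estimator via a pipeline, which is one reasonable design and not the only one.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import BaggingClassifier, BaggingRegressor
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

Xc, yc = make_classification(n_samples=500, n_features=10, random_state=0)
Xr, yr = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Bagged KNN: each ensemble member scales and fits its own bootstrap sample.
bagged_knn = BaggingClassifier(
    make_pipeline(StandardScaler(), KNeighborsClassifier()),
    n_estimators=20,
    random_state=0,
).fit(Xc, yc)

# Bagged decision trees for regression: predictions are averaged across trees.
bagged_trees = BaggingRegressor(
    DecisionTreeRegressor(),
    n_estimators=50,
    random_state=0,
).fit(Xr, yr)

print(bagged_knn.predict(Xc[:5]))
print(bagged_trees.predict(Xr[:3]))
```

Both wrappers also accept n_jobs to train the ensemble members in parallel.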

Advanced Techniques

| Technique | Description | Use Case |
|---|---|---|
| Pasting | Like bagging but sampling without replacement, so no instance appears twice within a model's subset. | Very large datasets where bootstrap sampling is unnecessary. |
| Random Subspaces | Sample features instead of (or in addition to) samples. | High-dimensional data where feature redundancy is high. |
| Random Patches | Combine both row (sample) and column (feature) sampling. | Very high-dimensional datasets (images, text). |
| Weighted Bagging | Weight samples during bootstrap sampling based on importance or difficulty. | Imbalanced datasets or when some samples are more reliable. |
| Balanced Bagging | Ensure each bootstrap sample has a balanced class distribution. | Highly imbalanced classification problems. |
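Pasting, Random Subspaces, and Random Patches can all be approximated with the sampling options of scikit-learn's BaggingClassifier. The sketch below is illustrative: the sampling fractions, dataset, and the make_bag helper are assumptions, and weighted or balanced bagging are not covered here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=30, random_state=0)

def make_bag(**kwargs):
    # Hypothetical helper so the variants differ only in their sampling options.
    return BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=30, random_state=0, **kwargs
    )

variants = {
    # Rows sampled with replacement, all features used.
    "bagging": make_bag(bootstrap=True),
    # Rows sampled without replacement (here half of them per model).
    "pasting": make_bag(bootstrap=False, max_samples=0.5),
    # All rows kept, each model sees a random half of the features.
    "random_subspaces": make_bag(bootstrap=False, max_samples=1.0, max_features=0.5),
    # Both rows and features are subsampled.
    "random_patches": make_bag(bootstrap=False, max_samples=0.5, max_features=0.5),
}

for name, model in variants.items():
    model.fit(X, y)
    n_feats = len(model.estimators_features_[0])
    print(f"{name}: first model uses {n_feats} of {X.shape[1]} features")
```

Which variant works best depends on the data, so comparing them with OOB scores (where bootstrap sampling is enabled) or cross-validation is a sensible next step.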