Types of Voting

1. Hard Voting (Majority Voting)

For Classification Only

Each model predicts a class label, and the final prediction is the class that receives the most votes.

i. Example

Binary Classification (Fraud Detection):

Model 1 (Random Forest): Predicts "Fraud"
Model 2 (Logistic Regression): Predicts "Not Fraud"
Model 3 (SVM): Predicts "Fraud"
Model 4 (Neural Network): Predicts "Fraud"

Final Prediction: "Fraud" (3 votes vs. 1 vote)

Multi-Class Classification (Animal Recognition):

Model 1: Predicts "Cat"
Model 2: Predicts "Dog"
Model 3: Predicts "Cat"
Model 4: Predicts "Cat"
Model 5: Predicts "Bird"

Final Prediction: "Cat" (3 votes out of 5)
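
The counting rule in both examples can be sketched in a few lines of Python. This is a minimal illustration rather than a library implementation: the helper name `hard_vote` is made up here, and ties go to whichever label appears first in the list, which is only one possible policy.

```python
from collections import Counter

def hard_vote(predictions):
    """Return the most common label and its vote count.

    Counter.most_common orders equal counts by first appearance,
    so a tie goes to the label that was predicted earliest in the list.
    """
    label, count = Counter(predictions).most_common(1)[0]
    return label, count

# Votes taken from the two examples above
fraud_votes = ["Fraud", "Not Fraud", "Fraud", "Fraud"]
animal_votes = ["Cat", "Dog", "Cat", "Cat", "Bird"]

print(hard_vote(fraud_votes))   # ('Fraud', 3)
print(hard_vote(animal_votes))  # ('Cat', 3)
```

Note that in the multi-class case the winner needs only a plurality (the most votes), not more than half of them.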

ii. Tie Breaking

When there's a tie:

Random selection: Pick randomly among tied classes
Order-based: Apply a fixed rule, e.g. the tied class predicted by the earliest model in the list wins
Confidence-based: Use soft voting as tiebreaker
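
The three tie-breaking strategies can be sketched as one function with a `strategy` switch. All names here (`tied_classes`, `vote_with_tiebreak`, `avg_probs`) are hypothetical, and the confidence branch assumes the averaged class probabilities have already been computed elsewhere.

```python
import random
from collections import Counter

def tied_classes(votes):
    """Return the classes that share the highest vote count, in first-seen order."""
    counts = Counter(votes)
    top = max(counts.values())
    return [label for label, n in counts.items() if n == top]

def vote_with_tiebreak(votes, strategy="order", avg_probs=None, rng=None):
    """Hard vote with an explicit tie-breaking policy.

    votes:     one predicted label per model, in model order
    avg_probs: dict of label -> averaged probability (only for "confidence")
    rng:       random.Random instance (only for "random")
    """
    tied = tied_classes(votes)
    if len(tied) == 1:
        return tied[0]
    if strategy == "random":
        return (rng or random).choice(tied)
    if strategy == "order":
        # The earliest model whose vote is among the tied classes decides
        return next(v for v in votes if v in tied)
    if strategy == "confidence":
        # Fall back to soft voting, restricted to the tied classes
        return max(tied, key=lambda label: avg_probs[label])
    raise ValueError(f"unknown strategy: {strategy}")

votes = ["Dog", "Cat", "Cat", "Dog"]  # 2-2 tie
print(vote_with_tiebreak(votes, "order"))  # Dog
print(vote_with_tiebreak(votes, "confidence",
                         avg_probs={"Dog": 0.48, "Cat": 0.52}))  # Cat
```

Random selection is unbiased but makes predictions non-reproducible unless the generator is seeded; the order-based rule is deterministic but favors whichever model happens to come first.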

iii. Characteristics

Advantages:

Simple and intuitive
Fast (just counting votes)
Works with any classifier (doesn't need probability estimates)
Robust to individual model errors

Disadvantages:

Treats all models equally by default (a weak model's vote counts as much as a strong one's)
Ignores prediction confidence (90% sure vs. 51% sure treated the same)
Discards information that soft voting uses, since only each model's top label is kept
Requires all models to predict over the same set of class labels

2. Soft Voting (Averaging of Probabilities)

For Classification Only

Each model outputs class probabilities, and the final prediction is based on the averaged probabilities across all models.

How It Works

Given $M$ classifiers, each outputting probability estimates for $K$ classes:

For class $k$ and input $x$:

$$P(y = k \mid x) = \frac{1}{M} \sum_{i=1}^{M} p_{i}(y = k \mid x)$$

Where $p_{i}(y = k \mid x)$ is the probability that model $i$ assigns to class $k$.

Final prediction:

$$H(x) = \arg\max_{k} P(y = k \mid x)$$

Choose the class with the highest averaged probability.

i. Example

Binary Classification (Is this email spam?):

Model	P(Spam)	P(Not Spam)
Random Forest	0.8	0.2
Logistic Regression	0.6	0.4
SVM	0.9	0.1

Averaged Probabilities:

P(Spam) = (0.8 + 0.6 + 0.9) / 3 = 0.767
P(Not Spam) = (0.2 + 0.4 + 0.1) / 3 = 0.233

Final Prediction: Spam (higher averaged probability)

Multi-Class Example (Digit Recognition):

Model	P(0)	P(1)	P(2)	P(3)	...
CNN-1	0.1	0.7	0.1	0.05	...
CNN-2	0.15	0.6	0.15	0.05	...
CNN-3	0.2	0.5	0.2	0.05	...
Average	0.15	0.6	0.15	0.05	...

Final Prediction: Class 1 (highest averaged probability)
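
The averaging and arg max steps can be sketched with NumPy, reusing the numbers from the spam table above. The variable names are chosen here for illustration; the same two calls work unchanged for any number of classes.

```python
import numpy as np

# Rows: Random Forest, Logistic Regression, SVM
# Columns: P(Spam), P(Not Spam)
classes = ["Spam", "Not Spam"]
probs = np.array([
    [0.8, 0.2],
    [0.6, 0.4],
    [0.9, 0.1],
])

avg_probs = probs.mean(axis=0)                   # average over models
prediction = classes[int(np.argmax(avg_probs))]  # arg max over classes

print(avg_probs.round(3))  # [0.767 0.233]
print(prediction)          # Spam
```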

ii. Weighted Soft Voting

Assign different weights to models based on their reliability:

$$P(y = k \mid x) = \frac{\sum_{i=1}^{M} w_{i} \cdot p_{i}(y = k \mid x)}{\sum_{i=1}^{M} w_{i}}$$

Where $w_{i}$ is the weight for model $i$ (typically based on validation performance).

Example:

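import numpy as np

# Model 1 has 80% accuracy → weight = 0.8
# Model 2 has 70% accuracy → weight = 0.7
# Model 3 has 90% accuracy → weight = 0.9
weights = [0.8, 0.7, 0.9]

# One row of class probabilities per model, shape (n_models, n_classes); values are illustrative
all_probs = np.array([[0.8, 0.2], [0.6, 0.4], [0.9, 0.1]])

# np.average normalizes by sum(weights), matching the formula above
weighted_probs = np.average(all_probs, axis=0, weights=weights)
predicted_class = int(np.argmax(weighted_probs))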

iii. Characteristics

Advantages:

Accounts for prediction confidence
Often more accurate than hard voting when the models' probabilities are well calibrated
Can weight models by their reliability
Smooths probability estimates

Disadvantages:

Requires every model to output class probabilities
Slightly more complex than hard voting
Works best when the probabilities are well calibrated; a single overconfident model can dominate the average
Not all models provide good probability estimates
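
To make the confidence point concrete, here is a small case in which hard and soft voting disagree. The probabilities are invented purely for illustration.

```python
import numpy as np
from collections import Counter

classes = ["A", "B"]
# Invented probabilities: two models lean weakly toward A, one is very sure of B
probs = np.array([
    [0.55, 0.45],
    [0.55, 0.45],
    [0.05, 0.95],
])

# Hard voting: each model casts one vote for its top class
hard_votes = [classes[i] for i in probs.argmax(axis=1)]
hard_prediction = Counter(hard_votes).most_common(1)[0][0]

# Soft voting: average the probabilities, then take the top class
avg_probs = probs.mean(axis=0)
soft_prediction = classes[int(avg_probs.argmax())]

print(hard_prediction)  # A  (2 votes to 1)
print(soft_prediction)  # B  (average P(B) is about 0.617)
```

Which answer is better depends on whether the third model's 0.95 can be trusted, which is why calibration matters for soft voting.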

3. Averaging

For Regression Only

Each model predicts a continuous value, and the final prediction is the average (or weighted average) of all predictions.

Simple Averaging

$$\hat{y} = \frac{1}{M} \sum_{i=1}^{M} \hat{y}_{i}$$

Weighted Averaging

$$\hat{y} = \frac{\sum_{i=1}^{M} w_{i} \cdot \hat{y}_{i}}{\sum_{i=1}^{M} w_{i}}$$

i. Example

House Price Prediction:

Model 1 (Linear Regression): $350,000
Model 2 (Random Forest): $370,000
Model 3 (XGBoost): $365,000

Simple Average: ($350k + $370k + $365k) / 3 = $361,667

Weighted Average (if Model 3 is most reliable):

Weights: [0.2, 0.3, 0.5]
Prediction: 0.2×$350k + 0.3×$370k + 0.5×$365k = $363,500
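
The same arithmetic can be checked with NumPy. This is a minimal sketch using the three predictions and the weights from the example above; the variable names are chosen here for illustration.

```python
import numpy as np

# Predictions from Linear Regression, Random Forest, XGBoost (in dollars)
predictions = np.array([350_000, 370_000, 365_000])
weights = np.array([0.2, 0.3, 0.5])  # weights from the example above

simple_average = predictions.mean()
# np.average divides by the sum of the weights, so they need not sum to 1
weighted_average = np.average(predictions, weights=weights)

print(f"{simple_average:,.0f}")    # 361,667
print(f"{weighted_average:,.0f}")  # 363,500
```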

ii. Alternative Aggregation Methods

Median (Robust to outliers):

$$\hat{y} = \operatorname{median}(\hat{y}_{1}, \hat{y}_{2}, \ldots, \hat{y}_{M})$$

Trimmed Mean (Remove extremes):

Sort predictions
Remove top and bottom 10-20%
Average remaining predictions
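
Both robust aggregators can be sketched briefly. The `trimmed_mean` helper is written here for illustration, and the ten predictions are invented, with one deliberate outlier to show the effect.

```python
import numpy as np

def trimmed_mean(predictions, trim_fraction=0.1):
    """Average after dropping the lowest and highest trim_fraction of values."""
    values = np.sort(np.asarray(predictions, dtype=float))
    k = int(len(values) * trim_fraction)  # number dropped from each end
    return values[k:len(values) - k].mean()

# Invented predictions from ten models; the last one is an outlier
predictions = [360, 362, 358, 365, 361, 359, 363, 364, 357, 900]

print(np.mean(predictions))            # 414.9
print(np.median(predictions))          # 361.5
print(trimmed_mean(predictions, 0.1))  # 361.5
```

The plain mean is pulled far from the bulk of the predictions by the single outlier, while the median and the trimmed mean are not.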

iii. Characteristics

Advantages:

Reduces variance (smooths individual predictions)
Simple and interpretable
Robust to individual model errors
Median provides outlier resistance

Disadvantages:

Doesn't reduce shared bias (if every model errs in the same direction, so does the average)
Treats all models equally by default
Uses one fixed combination for every input, so it cannot adapt when one model is clearly better on some inputs

Advanced Voting Techniques

1. Dynamic Voting

Adjust weights based on input characteristics:

# Pseudocode
def dynamic_voting(x):
    if feature_A(x) > threshold:
        # Model 1 is better for high feature_A
        weights = [2, 1, 1]
    else:
        # Model 2 is better for low feature_A
        weights = [1, 2, 1]
    
    return weighted_vote(x, weights)
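
A runnable version of the pseudocode above might look like the following. It is a sketch under stated assumptions: the "models" are plain callables that return a class-probability vector, `feature_A` is taken to be the first element of `x`, and `weighted_vote` is implemented as a weighted soft vote. None of these choices come from a particular library.

```python
import numpy as np

def weighted_vote(prob_rows, weights):
    """Weighted soft vote: returns the index of the winning class."""
    avg = np.average(np.asarray(prob_rows), axis=0, weights=weights)
    return int(np.argmax(avg))

def dynamic_voting(x, models, threshold=0.5):
    """Pick the weight vector from the first feature of x, then vote.

    models: callables mapping x to a class-probability vector (an assumption
    made here so the sketch runs without any ML library).
    """
    weights = [2, 1, 1] if x[0] > threshold else [1, 2, 1]
    return weighted_vote([m(x) for m in models], weights)

# Three stand-in "models" with fixed outputs, invented for illustration
models = [
    lambda x: [0.7, 0.3],  # leans toward class 0
    lambda x: [0.2, 0.8],  # leans toward class 1
    lambda x: [0.5, 0.5],  # undecided
]

print(dynamic_voting([0.9], models))  # 0: model 1 counts double
print(dynamic_voting([0.1], models))  # 1: model 2 counts double
```

In practice the rule that maps inputs to weights would be chosen from validation data rather than hard-coded.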

2. Confidence-Based Voting

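Only count predictions whose confidence exceeds a threshold:

# Sketch assuming scikit-learn-style models (predict_proba, classes_),
# a single sample x, and a majority_vote helper defined elsewhere
threshold = 0.8

valid_predictions = []
for model in models:
    probs = model.predict_proba([x])[0]  # class probabilities for the single sample x
    if probs.max() > threshold:
        valid_predictions.append(model.classes_[probs.argmax()])

# Fall back to every model's vote if none is confident enough
if not valid_predictions:
    valid_predictions = [model.predict([x])[0] for model in models]

final = majority_vote(valid_predictions)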

3. Hierarchical Voting

Multi-stage voting:

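# Pseudocode: fast_models, slow_models, and majority_vote are assumed to exist,
# and predict is assumed to return a single label
def hierarchical_vote(x):
    # Stage 1: Fast models vote
    fast_votes = [model.predict(x) for model in fast_models]
    if len(set(fast_votes)) == 1:
        # Unanimous agreement: skip the slow models
        return fast_votes[0]
    # Stage 2: Add slow but accurate models
    slow_votes = [model.predict(x) for model in slow_models]
    return majority_vote(fast_votes + slow_votes)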
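
The two-stage idea can be made runnable with stand-in models. In this sketch the models are plain callables that map an input to a label, and the function also reports which stage produced the answer; both are assumptions made for illustration.

```python
from collections import Counter

def hierarchical_vote(x, fast_models, slow_models):
    """Two-stage vote: return (label, stage) where stage is 1 or 2.

    Models are plain callables mapping x to a label (an assumption made
    so the sketch runs on its own).
    """
    fast_votes = [m(x) for m in fast_models]
    if len(set(fast_votes)) == 1:
        # Stage 1: fast models are unanimous, slow models are never called
        return fast_votes[0], 1
    # Stage 2: add the slow models and take the majority of all votes
    all_votes = fast_votes + [m(x) for m in slow_models]
    return Counter(all_votes).most_common(1)[0][0], 2

# Invented stand-in models: threshold rules on a single number
fast_models = [lambda x: x > 0.3, lambda x: x > 0.5]
slow_models = [lambda x: x > 0.45, lambda x: x > 0.42, lambda x: x > 0.48]

print(hierarchical_vote(0.9, fast_models, slow_models))  # (True, 1)
print(hierarchical_vote(0.4, fast_models, slow_models))  # (False, 2)
```

The saving comes from inputs like the first one, where the expensive models are skipped entirely.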

4. Selective Voting

Choose different subsets of models for different samples:

# Pseudocode: estimate_difficulty, threshold, vote, and the models are assumed to exist
# For easy examples, use 3 models
# For hard examples, use all 7 models
def selective_vote(x):
    difficulty_score = estimate_difficulty(x)
    if difficulty_score < threshold:
        models_to_use = [model1, model2, model3]
    else:
        models_to_use = all_models

    return vote(x, models_to_use)

5. Voting with Rejection Option

Reject predictions when models disagree strongly:

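# Pseudocode: predict is assumed to return a single hashable label
from collections import Counter

def vote_or_reject(x, models, min_agreement=0.6):
    votes = [model.predict(x) for model in models]
    top_label, top_count = Counter(votes).most_common(1)[0]
    agreement = top_count / len(votes)  # share of models backing the top label

    if agreement < min_agreement:
        return "UNCERTAIN - MANUAL REVIEW"
    return top_label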