Bubble Plot (Bubble Chart)

I. Purpose

Visualize relationships among three or four variables at once. A bubble plot extends the scatter plot by encoding a third continuous variable as bubble size and, optionally, a fourth variable (continuous or categorical) as bubble color (hue). This makes it particularly useful for spotting three-way relationships, clusters, and outliers.
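The encoding described above maps directly onto matplotlib's `Axes.scatter`, where `s` sets each marker's area (in points²) and `c` sets its color. The sketch below is a minimal illustration on synthetic data; the variable names and the scale factor of 30 points² per unit are arbitrary assumptions, not part of any standard recipe.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical synthetic data: four continuous variables for 50 observations
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(scale=0.5, size=50)
size_var = rng.uniform(1, 10, size=50)   # third variable, encoded as bubble area
color_var = rng.uniform(0, 1, size=50)   # fourth variable, encoded as color

fig, ax = plt.subplots(figsize=(8, 5))
bubbles = ax.scatter(
    x, y,
    s=size_var * 30,      # `s` is marker area in points^2, so area is proportional to size_var
    c=color_var,          # numeric values are mapped through the colormap
    cmap="viridis",
    alpha=0.6,            # translucency so overlapping bubbles stay visible
    edgecolors="black",
    linewidths=0.5,
)
fig.colorbar(bubbles, ax=ax, label="color_var")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("x vs y, size = size_var, color = color_var")
```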

II. Analysis Type

Multivariate (three numeric features, optionally a fourth numeric or categorical feature encoded as color)

III. What to Look For

1. Three-Way Relationships
2. Clusters and Groupings
3. Outliers
4. Distribution Patterns
5. Proportionality

IV. Advantages of Bubble Plots

Relationship | How to Spot It in a Bubble Plot
Linear | The bubbles form a straight diagonal "string." Crucially, bubble size should also stay constant or grow/shrink at a steady rate along that line.
Non-Linear | The bubbles follow a curved path. You may also see a "growth pattern" where bubbles start small in one corner and become massive in another, which suggests an exponential relationship.
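To see what the two rows above look like in practice, the sketch below draws synthetic data for each pattern side by side. The data-generating formulas (a straight line with linearly growing size, and an exponential curve with exponentially growing size) are invented purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 30)

# Linear pattern: y rises along a straight line, bubble size grows at a steady rate
y_lin = 2 * x + rng.normal(scale=1.0, size=x.size)
size_lin = 20 * x

# Growth pattern: y and bubble size both grow exponentially with x
y_exp = np.exp(0.4 * x) + rng.normal(scale=1.0, size=x.size)
size_exp = 5 * np.exp(0.5 * x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4), sharex=True)
ax1.scatter(x, y_lin, s=size_lin, alpha=0.5)
ax1.set_title("Linear: straight string, steady size change")
ax2.scatter(x, y_exp, s=size_exp, alpha=0.5)
ax2.set_title("Non-linear: curved path, small-to-massive bubbles")
for ax in (ax1, ax2):
    ax.set_xlabel("x")
    ax.set_ylabel("y")
fig.tight_layout()
```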

V. Drawbacks of Bubble Plots


VI. When to Use Bubble Plots


VII. When to Avoid Bubble Plots


VIII. Code Example

import matplotlib.pyplot as plt
import seaborn as sns

# Load a dataset with multiple numeric features
df = sns.load_dataset('iris')

# Bubble plot: sepal length vs sepal width, size = petal length, color = species
plt.figure(figsize=(10, 6))
ax = sns.scatterplot(
    data=df,
    x='sepal_length',
    y='sepal_width',
    size='petal_length',      # Third dimension: bubble size (marker area)
    hue='species',            # Fourth dimension: color (categorical here)
    sizes=(50, 500),          # Range of marker areas, in points^2
    alpha=0.6,                # Transparency for overlap visibility
    palette='Set2'
)
ax.set_title("Iris Dataset: Bubble Plot of Sepal Dimensions", fontsize=14, fontweight='bold')
ax.set_xlabel("Sepal Length (cm)", fontsize=12)
ax.set_ylabel("Sepal Width (cm)", fontsize=12)
# Move seaborn's combined hue/size legend outside the axes (keeps its section titles)
sns.move_legend(ax, "upper left", bbox_to_anchor=(1.05, 1), fontsize=10)
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()

ML_AI/_feature_engineering/images/bubble-1.png

Interactive Bubble Plot with Plotly

import plotly.express as px

# Load sample data
df = px.data.gapminder().query("year == 2007")

# Create interactive bubble plot
fig = px.scatter(
    df,
    x='gdpPercap',
    y='lifeExp',
    size='pop',              # Bubble size: population
    color='continent',       # Color by continent
    hover_name='country',    # Show country on hover
    log_x=True,              # Log scale for GDP
    size_max=60,             # Maximum bubble size
    title='Global Development: GDP vs Life Expectancy (2007)<br><sub>Bubble size represents population</sub>'
)

fig.update_layout(
    xaxis_title='GDP per Capita (log scale)',
    yaxis_title='Life Expectancy (years)',
    font=dict(size=12),
    hovermode='closest'
)

fig.show()

ML_AI/_feature_engineering/images/bubble-2.png


IX. Interpretation Guide

1. Reading Bubble Size
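A bubble can only be read correctly if its area is proportional to the value it encodes, so that a bubble with twice the area means twice the value. The helper below is a hypothetical sketch of such a mapping onto matplotlib's `s` (an area in points²); `max_area` is an arbitrary assumption. Note that seaborn's `sizes=(50, 500)` in the example above normalizes the size column onto that range by default, so the smallest bubble is not zero-sized and area ratios between bubbles do not equal data ratios.

```python
import numpy as np

def proportional_areas(values, max_area=500.0):
    """Hypothetical helper: marker areas (points^2) strictly proportional to values.

    The largest value gets `max_area`; zero maps to zero area.
    """
    v = np.asarray(values, dtype=float)
    if np.any(v < 0):
        raise ValueError("bubble size requires non-negative values")
    vmax = v.max()
    if vmax == 0:
        return np.zeros_like(v)
    return max_area * v / vmax

areas = proportional_areas([1, 2, 4])   # -> 125, 250, 500 points^2
```

The resulting array can be passed directly as `s=` to `Axes.scatter`.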

2. Common Patterns and Their Meanings

Pattern | Visual Cue | Interpretation | Action
Linear relationship | Bubbles form a straight diagonal string | Linear correlation between X and Y | Consider linear regression
Non-linear relationship | Bubbles follow a curve or cluster in arcs | Non-linear or polynomial relationship | Consider a polynomial or other non-linear model
Clustered bubbles | Groups of bubbles in specific regions | Subgroups or latent classes | Investigate group characteristics
Large-size outliers | One or a few bubbles much larger than the others | Extreme values in the size variable | Check for outliers or data errors
Overplotting/occlusion | Many bubbles overlap, forming dense regions | Too many points to interpret | Use transparency, faceting, or filtering
Category separation (hue) | Colors cluster in different plot regions | Category effect on X, Y, or size | Explore category-specific trends

3. Visual Cues for Proportionality
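Proportions are hard to judge without a size key, because readers cannot translate a bubble's area back into data units. One way to add reference bubbles is matplotlib's `PathCollection.legend_elements(prop="sizes", func=...)`, where `func` inverts the size mapping. In this sketch the data, the `population` name, and the `SCALE` factor are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.uniform(0, 10, 40), rng.uniform(0, 10, 40)
population = rng.uniform(1, 100, 40)   # hypothetical size variable

SCALE = 5.0                            # points^2 per unit of population (arbitrary)
fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=population * SCALE, alpha=0.5)

# Reference bubbles: legend_elements picks a few representative sizes and
# `func` converts each marker area back into data units for the label text.
handles, labels = sc.legend_elements(prop="sizes", num=4, func=lambda s: s / SCALE)
ax.legend(handles, labels, title="population", loc="upper left",
          bbox_to_anchor=(1.02, 1), labelspacing=1.5)
fig.tight_layout()
```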

4. Outlier and Cluster Detection
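For the outlier half of this check, a common numeric complement to eyeballing oversized bubbles is Tukey's IQR fence applied to the size variable. This is a minimal sketch; the helper name `iqr_outliers`, the sample values, and the conventional `k=1.5` multiplier are illustrative choices.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Hypothetical helper: mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return (v < q1 - k * iqr) | (v > q3 + k * iqr)

sizes = np.array([3.1, 2.8, 3.5, 4.0, 3.3, 2.9, 3.7, 41.0])  # hypothetical size column
mask = iqr_outliers(sizes)
print(np.flatnonzero(mask))   # indices of bubbles worth checking -> [7]
```

A flagged bubble is a candidate for inspection, not automatically an error.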


X. Pro Tips for Bubble Plots

1. Size Overplotting (Occlusion)
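Matplotlib draws scatter markers in array order, so sorting by size (largest first) keeps small bubbles from being hidden behind large ones; translucency and thin outlines help further. A sketch on hypothetical, right-skewed synthetic data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
n = 300
x, y = rng.normal(size=n), rng.normal(size=n)
size_var = rng.lognormal(mean=2.0, sigma=0.8, size=n)  # hypothetical, right-skewed

# Draw the largest bubbles first so smaller ones are painted on top and stay visible
order = np.argsort(size_var)[::-1]

fig, ax = plt.subplots()
ax.scatter(
    x[order], y[order],
    s=size_var[order] * 5,
    alpha=0.4,              # translucency reveals stacked bubbles
    edgecolors="black",     # outlines keep overlapping bubbles distinguishable
    linewidths=0.5,
)
```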

2. Scaling Issues (Invisible or Exploding Bubbles)
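With a heavy-tailed size variable, one extreme value can shrink every other bubble to a dot, while a tiny range can make bubbles vanish. The hypothetical helper below sketches one mitigation: cap values at a percentile, then map onto an area range with a visible floor. Both the cap and the floor are assumptions to tune, and both break strict proportionality, so the caption should say so.

```python
import numpy as np

def clipped_areas(values, min_area=15.0, max_area=600.0, upper_pct=99):
    """Hypothetical helper: cap extreme values, then map linearly to [min_area, max_area].

    - Values above the `upper_pct` percentile are clipped, so one giant value
      cannot shrink every other bubble to an invisible dot.
    - `min_area` is a floor that keeps the smallest bubbles visible.
    """
    v = np.asarray(values, dtype=float)
    cap = np.percentile(v, upper_pct)
    v = np.minimum(v, cap)
    lo, hi = v.min(), v.max()
    if hi == lo:
        return np.full_like(v, max_area)
    return min_area + (v - lo) / (hi - lo) * (max_area - min_area)

vals = np.r_[np.arange(1, 100), 1e6]          # 99 ordinary values plus one extreme
areas = clipped_areas(vals, upper_pct=95)     # value 50 gets a mid-sized bubble
```

Without clipping (`upper_pct=100`), the value 50 would land barely above the 15-point floor.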

3. Human Perception Errors
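A frequent perception error comes from encoding the value as a radius instead of an area: because area grows with the square of the radius, differences look exaggerated. The arithmetic is easy to check; the values and the 10 points² scale below are arbitrary.

```python
import numpy as np

values = np.array([1.0, 2.0, 4.0])   # hypothetical size-variable values

# Pitfall: value encoded as radius. Drawn area = pi * r^2 grows with value^2.
area_if_radius = np.pi * values ** 2
# Value encoded as area: drawn area is directly proportional to the value.
area_if_area = values * 10.0          # 10 points^2 per unit, an arbitrary scale

radius_ratio = area_if_radius / area_if_radius[0]   # 1, 4, 16: 4x value looks 16x bigger
area_ratio = area_if_area / area_if_area[0]         # 1, 2, 4: matches the data

# If a plotting API expects a diameter or radius rather than an area,
# pass sqrt(value) so the drawn area stays proportional to the value.
diameters = np.sqrt(values)
```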

4. Cognitive Overload (Too Many Variables)
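When x, y, size, and hue all compete for attention, one option is to move the categorical variable out of the color channel and into separate panels. A sketch using seaborn's `relplot` with `col=`; the DataFrame, column names, and group labels are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
n = 120
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "y": rng.normal(size=n),
    "size_var": rng.uniform(1, 10, size=n),
    "group": rng.choice(["A", "B", "C"], size=n),
})

# Instead of stacking a fourth encoding (hue) on x, y and size,
# split the categorical variable into panels that share axes and size scale.
g = sns.relplot(
    data=df, x="x", y="y",
    size="size_var", sizes=(20, 300),
    col="group", col_order=["A", "B", "C"],
    kind="scatter", alpha=0.6, height=3.5,
)
```

Each panel then carries only three encodings, which is usually easier to read than one crowded plot.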

XI. Documentation & External References

Official Documentation: