Geometric Distance Metrics

1. Euclidean Distance (L2 Norm)

The Euclidean distance is the most widely used distance metric in machine learning, especially in K-means clustering. It calculates the straight-line distance between two data points in vector space.

πŸ‘‰ Euclidean Distance is like like measuring with a ruler

Formula for 'n' dimensions

For two points P=(x1,x2,...,xn) and Q=(y1,y2,...,yn), the Euclidean distance d(P,Q) is defined as:

d(P,Q)=βˆ‘i=1n(xiβˆ’yi)2

Component Breakdown:

  1. xi and yi: Coordinates of the two points P and Q in n-dimensional space
  2. βˆ‘i=1n: Summation over all n dimensions (features)
  3. (xiβˆ’yi)2: Squared difference between corresponding coordinates (ensures positive values)
  4. β‹…: Square root gives actual distance (otherwise it's squared Euclidean distance)

Visual Representation in 2D:
ML_AI/images/euclidean-1.png400

Example in 2D:

πŸ‘‰ In K-Means: Point Q represents the cluster centroid, and we minimize the sum of squared Euclidean distances

When to Use:

Advantages:

Disadvantages:

Practical Tips:

❗ What’s wrong with using Euclidean Distance for Multivariate data?

Euclidean distance does not account for correlations among features
1. Euclidean Distance Ignores Correlation:

Euclidean distance only measures the straight-line distance between two points, without considering relationships between dimensions (features). However, in real-world datasets, features are often correlated or dependent on one another (e.g., height and weight, or income and spending habits).

2. Misleading Interpretation in Correlated Data

When features are positively or negatively correlated, data points form an elliptical distribution rather than a spherical one. In such cases, Euclidean distance assumes the points are distributed equally in all directions, leading to misleading distance measurements.

ML_AI/images/euclidean-2.png800

Illustration of Correlated Data
  1. Uncorrelated Data (Left Plot)
    When Point 1 and Point 2 are uncorrelated (points are distributed uniformly in all directions): The Euclidean distance is effective because it directly measures the proximity of a point to the cluster's centroid.

  2. Correlated Data (Right Plot)
    When Point 1 and Point 2 are correlated (e.g., points tend to form an elliptic cluster):

    • Both Point 1 (inside the cluster) and Point 2 (outlier) can have identical Euclidean distances to the centroid.
      • Point 1 (purple) aligns with the direction of the cluster (along the ellipse's major axis).
      • Point 2 (pink) deviates significantly from the cluster because it goes against the natural direction of the data variance.

    Despite its deviation, Euclidean distance cannot capture this, making Point 2 look just as close to the cluster as Point 1.

Why Does This Happen?

Euclidean distance ignores the distribution of other points in the dataset. It only considers the distance between two individual points and doesn't account for how the rest of the points vary. Essentially:

The Solution: Use Mahalanobis Distance

The Mahalanobis Distance is a more robust alternative for measuring the distance of a point from a cluster when dealing with correlated data. It considers the distribution of the entire dataset.

2. Manhattan Distance (L1 Norm)

The Manhattan Distance (also called Taxicab Distance, City Block Distance, or L1 norm) measures the distance between two points by summing the absolute differences of their coordinates. It mimics navigating a grid-like city street layout where you can only move horizontally or vertically.

Formula for 'n' dimension

For two points P=(x1,x2,…,xn) and Q=(y1,y2,…,yn) in an n-dimensional space:

d(P,Q)=βˆ‘i=1n|xiβˆ’yi|

Component Breakdown:

  1. |xiβˆ’yi|: Absolute difference (no squaring, unlike Euclidean)
  2. βˆ‘i=1n: Sum over all dimensions
  3. No square root needed (simpler computation)

Example in 2D:

Visual Comparison (2D example):

When to Use:

Advantages:

Disadvantages:

Applications:

  1. Machine Learning: KNN, K-Medians clustering, LASSO regression (L1 regularization)
  2. Robotics/Pathfinding: A* algorithm on grid maps
  3. Image Processing: Comparing pixel intensities, image histograms
  4. Recommendation Systems: Similarity with categorical or ordinal features

3. Chebyshev Distance

Chebyshev distance (or L∞ metric) is a distance measure defined as the maximum absolute difference between corresponding coordinates of two vectors.
It assumes the moment can occur in any direction including diagnols.

This distance is especially useful in grid-based systems like chessboards or pathfinding in games where diagonal movement is allowed.

Formula

For points P=(x1​,x2​,...,xn​) and Q=(y1,y2,...,yn)

DChebyshev(P,Q)=βˆ‘i=1nmax(|xiβˆ’yi|)

Component Breakdown:

  1. |xiβˆ’yi|: Absolute difference (no squaring, unlike Euclidean)
  2. βˆ‘i=1n: Sum over all dimensions
  3. No square root needed (simpler computation)
  4. Maximum of all Absolute difference

Visual Example in 2D
ML_AI/images/chebyshev-1.png400
Example in 2D
Consider 2 points: X=(0,0), Y=(βˆ’2,βˆ’3)

dChebyshev(X,Y)=max(|0βˆ’(βˆ’2)|,|0βˆ’(βˆ’3)|)=max(2,3)=3

The plot shows a square centered at (0,0) representing all points exactly 3 units away by Chebyshev distance.

Advantages:

Disadvantages:

Applications:

  1. Game AI & Robotics:
    • Chess king movement, pathfinding with diagonal moves
    • Grid-based robot navigation where diagonal movement has equal cost
  2. Logistics & Operations:
    • Warehouse optimization (slowest worker determines completion time)
    • Scheduling problems (bottleneck identification)
  3. Image Processing:
    • Pixel neighborhood analysis (8-connectivity)
    • Maximum color channel difference detection
  4. Quality Control:
    • Manufacturing tolerance checks (worst-case dimension)
    • Identifying the most deviant measurement

4. Minkowski Distance (Generalized Lp Norm)

Minkowski distance is a generalized metric used to calculate the distance between two points inΒ 
n-dimensional space, unifying measures like Euclidean and Manhattan distances. It is defined by a parameterΒ pΒ (order), where higherΒ p values change how coordinate differences are weighted. It is widely applied in machine learning for clustering (k-means) and classification (
K-NN)

Formula

For two points P=(x1,x2,…,xn) and Q=(y1,y2,…,yn):

d(P,Q)=(βˆ‘i=1n|xiβˆ’yi|p)1/p

Special Cases:

Example (P=(3,4), Q=(0,0)):

When to Use:

Visualization:

ML_AI/images/minkowski-1.png


➒ Euclidean vs Manhattan vs Chebyshev

1. Side-by-Side View
ML_AI/images/chebyshev-2.png700

2. Overlay View: Assuming (0,0) as a one of the point
ML_AI/images/chebyshev-3.png600

Comparison (for P=(3,4), Q=(1,1)):