Angular/Similarity-Based Distances

1. Cosine Similarity

Cosine similarity is a similarity measure (not a distance metric) that computes the cosine of the angle between two vectors in n-dimensional space. Unlike distance metrics, it depends on direction rather than magnitude, which makes it well suited to high-dimensional, sparse data.

It is widely used in artificial intelligence and natural language processing, for example to compare documents or to match search queries by intent.

👉 Key Insight: Two vectors pointing in the same direction have high similarity, regardless of their length.
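The insight above can be checked with a small sketch. The helper `cosine_similarity` and the sample vectors are illustrative assumptions, not part of any particular library. The helper expects non-zero vectors of equal length.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

v = [1.0, 2.0, 3.0]
scaled = [10.0 * x for x in v]   # same direction, ten times longer
opposite = [-x for x in v]       # same length, reversed direction

same_direction = cosine_similarity(v, scaled)
reversed_direction = cosine_similarity(v, opposite)
print(same_direction, reversed_direction)
```

Scaling a vector leaves the similarity at (approximately) 1, while reversing it gives (approximately) -1. Length never enters the result.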

Formula

Given two vectors $A = (a_1, a_2, \dots, a_n)$ and $B = (b_1, b_2, \dots, b_n)$:

$$\text{Cosine Similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

where

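  1. Dot Product ($A \cdot B$): Sum of element-wise products
     $$A \cdot B = \sum_{i=1}^{n} a_i b_i$$
  2. Magnitude ($\|A\|$ and $\|B\|$): Length (Euclidean norm) of each vector
     $$\|A\| = \sqrt{\sum_{i=1}^{n} a_i^2}, \quad \|B\| = \sqrt{\sum_{i=1}^{n} b_i^2}$$
  3. Cosine of the Angle ($\cos(\theta)$): Normalized dot product
     $$\cos(\theta) = \frac{\text{Dot Product}}{\text{Product of Magnitudes}}$$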
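The three components above map directly onto code. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and raising an error for zero vectors is an assumption (the cosine is undefined there, and other conventions exist).

```python
import math
from typing import Sequence

def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def magnitude(a: Sequence[float]) -> float:
    """Euclidean norm (length) of a vector."""
    return math.sqrt(sum(x * x for x in a))

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the two magnitudes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    norm_product = magnitude(a) * magnitude(b)
    if norm_product == 0:
        # Assumption: treat a zero vector as an error, since the angle is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / norm_product
```

For example, `cosine_similarity([1, 0], [0, 1])` returns `0.0` because the two vectors are perpendicular.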

Value Range and Interpretation:

Example in 2D Space:
Consider two 2D vectors: $A = (3, 4)$, $B = (4, 6)$

  1. Calculate dot product:
    $A \cdot B = (3 \times 4) + (4 \times 6) = 12 + 24 = 36$

  2. Compute magnitudes:
    $\|A\| = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = 5$
    $\|B\| = \sqrt{4^2 + 6^2} = \sqrt{16 + 36} = \sqrt{52} \approx 7.211$

  3. Calculate cosine similarity:
    $\cos(\theta) = \frac{36}{5 \times 7.211} = \frac{36}{36.055} \approx 0.9985$

The cosine similarity is approximately 0.9985, indicating the vectors are highly similar in direction (angle ≈ 3.18°).
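The arithmetic in this example can be reproduced with a short script. This is only a sketch; the variable names are illustrative, and the angle is recovered with `math.acos`.

```python
import math

A = (3, 4)
B = (4, 6)

dot = sum(a * b for a, b in zip(A, B))       # 3*4 + 4*6
norm_a = math.sqrt(sum(a * a for a in A))    # sqrt(9 + 16)
norm_b = math.sqrt(sum(b * b for b in B))    # sqrt(16 + 36)
similarity = dot / (norm_a * norm_b)
angle_deg = math.degrees(math.acos(similarity))

print(f"dot = {dot}, |A| = {norm_a:.3f}, |B| = {norm_b:.3f}")
print(f"cos(theta) = {similarity:.4f}, angle = {angle_deg:.2f} degrees")
```

Running it gives a dot product of 36, magnitudes of 5 and about 7.211, and a similarity of about 0.9985.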

Cosine Distance:
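To turn cosine similarity into a dissimilarity score:

$$\text{Cosine Distance} = 1 - \text{Cosine Similarity}$$

The result ranges from 0 (same direction) to 2 (opposite directions). It is not a true distance metric, because it does not always satisfy the triangle inequality.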
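A minimal sketch of the conversion follows. The function name `cosine_distance` and the three 2D vectors are illustrative assumptions. The example also checks the triangle inequality, which 1 minus cosine similarity does not always satisfy, so the quantity is better read as a dissimilarity score than as a metric in the strict sense.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

x, y, z = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

direct = cosine_distance(x, z)                          # x and z are perpendicular
detour = cosine_distance(x, y) + cosine_distance(y, z)  # going through y, 45 degrees from each
violates_triangle_inequality = direct > detour

print(direct, detour, violates_triangle_inequality)
```

Here the direct distance from `x` to `z` is larger than the sum of the two legs through `y`, which a true metric would not allow.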

When to Use:

Advantages:

Disadvantages:

Applications:

  1. NLP & Text Mining: Document similarity, plagiarism detection, information retrieval
  2. Recommendation Systems: User-item similarity (collaborative filtering)
  3. Image Recognition: Comparing feature vectors
  4. Bioinformatics: Gene expression analysis
  5. Anomaly Detection: Identifying outliers in high-dimensional spaces
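As an illustration of the first application, the sketch below ranks two documents against a query using raw word counts. The tokenizer (lowercase and split on whitespace), the sample texts, and the choice to return 0.0 for an empty document are all simplifying assumptions. Real systems typically use weighting schemes such as TF-IDF or learned embeddings.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count word occurrences using a naive whitespace tokenizer."""
    return Counter(text.lower().split())

def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two sparse term-count vectors."""
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Assumption: an empty document matches nothing.
    return dot / (norm_a * norm_b)

query = bag_of_words("cheap flights to paris")
doc_a = bag_of_words("cheap flights and hotels in paris")
doc_b = bag_of_words("how to bake sourdough bread")

score_a = cosine_similarity(query, doc_a)
score_b = cosine_similarity(query, doc_b)
print(f"doc_a: {score_a:.3f}, doc_b: {score_b:.3f}")
```

Only terms present in both documents contribute to the dot product, which is why the measure works well on sparse vectors. The travel document scores higher than the baking one.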