Angular/Similarity-Based Distances

1. Cosine Similarity

Cosine similarity is a similarity measure (not a distance metric) that calculates the cosine of the angle between two vectors in $n$-dimensional space. Unlike distance metrics, it depends on direction rather than magnitude, which makes it well suited to high-dimensional and sparse data.

It is widely used in artificial intelligence and natural language processing, for example to compare documents with each other or to match a search query against candidate documents.

👉 Key Insight: Two vectors pointing in the same direction have high similarity, regardless of their length.

Formula

Given two vectors $A = (a_{1}, a_{2}, \dots, a_{n})$ and $B = (b_{1}, b_{2}, \dots, b_{n})$:

$$\text{Cosine Similarity} = \cos (θ) = \frac{A \cdot B}{∥ A ∥ \, ∥ B ∥}$$

where

Dot Product ($A \cdot B$): Sum of element-wise products

$$A \cdot B = \sum_{i = 1}^{n} a_{i} b_{i}$$

Magnitude ($∥ A ∥$ and $∥ B ∥$): Length (Euclidean norm) of each vector

$$∥ A ∥ = \sqrt{\sum_{i = 1}^{n} a_{i}^{2}}, \quad ∥ B ∥ = \sqrt{\sum_{i = 1}^{n} b_{i}^{2}}$$

Cosine of the Angle ($\cos (θ)$): Normalized dot product

$$\cos (θ) = \frac{\text{Dot Product}}{\text{Product of Magnitudes}}$$
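The three pieces above (dot product, magnitudes, normalized dot product) map directly onto a few lines of code. The following is a minimal sketch using only the Python standard library; the function name `cosine_similarity` and the choice to raise `ValueError` on bad input are illustrative assumptions, not part of the definition.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))      # A . B
    norm_a = math.sqrt(sum(x * x for x in a))   # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))   # ||B||
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat the undefined case as an error rather than returning 0.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

For large arrays a vectorized library would normally be preferred; the loop-based version is kept here only because it mirrors the formula term by term.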

Value Range and Interpretation:

+1: Vectors point in exactly the same direction (maximum similarity, 0° angle)
0: Vectors are orthogonal/perpendicular (no similarity, 90° angle)
-1: Vectors point in opposite directions (maximum dissimilarity, 180° angle)
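The three landmark values can be checked numerically. This is a small sketch with hand-picked 2D vectors; the helper name `cos_sim` is hypothetical, and it relies on `math.hypot` accepting more than two arguments, which requires Python 3.8 or later.

```python
import math

def cos_sim(a, b):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

same = cos_sim([2, 1], [4, 2])         # same direction, different length
orthogonal = cos_sim([1, 0], [0, 3])   # perpendicular
opposite = cos_sim([2, 1], [-2, -1])   # opposite direction

# Floating-point rounding can leave results a hair away from the exact
# values, so compare with a tolerance instead of ==.
print(math.isclose(same, 1.0), orthogonal == 0.0, math.isclose(opposite, -1.0))
```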

Example in 2D Space:
Consider two 2D vectors: $A = (3, 4), B = (4, 6)$

Calculate dot product:
$A \cdot B = (3 \cdot 4) + (4 \cdot 6) = 12 + 24 = 36$
Compute magnitudes:
$∥ A ∥ = \sqrt{3^{2} + 4^{2}} = \sqrt{9 + 16} = 5$
$∥ B ∥ = \sqrt{4^{2} + 6^{2}} = \sqrt{16 + 36} = \sqrt{52} \approx 7.211$
Calculate cosine similarity:
$\cos (θ) = \frac{36}{5 \times 7.211} = \frac{36}{36.055} \approx 0.9985$

The cosine similarity is approximately 0.9985, indicating the vectors are highly similar in direction (angle $θ = \arccos(0.9985) \approx 3.18°$).
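The same arithmetic can be reproduced step by step in Python. The sketch below follows the worked example for $A = (3, 4)$ and $B = (4, 6)$ and also recovers the angle with `math.acos`; the variable names are chosen only for illustration.

```python
import math

A = (3, 4)
B = (4, 6)

dot = sum(a * b for a, b in zip(A, B))          # 3*4 + 4*6 = 36
norm_a = math.sqrt(sum(a * a for a in A))       # sqrt(25) = 5.0
norm_b = math.sqrt(sum(b * b for b in B))       # sqrt(52), about 7.211
cos_theta = dot / (norm_a * norm_b)
theta_deg = math.degrees(math.acos(cos_theta))  # angle between A and B in degrees

print(round(cos_theta, 4), round(theta_deg, 2))  # prints: 0.9985 3.18
```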

Cosine Distance:
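To turn cosine similarity into a dissimilarity score (commonly called cosine distance, although it is not a true metric):

$$\text{Cosine Distance} = 1 - \text{Cosine Similarity}$$

Range: $[0, 2]$ in general; $[0, 1]$ when all vector components are non-negative (for example, term counts or TF-IDF weights), because the similarity itself then lies in $[0, 1]$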
Lower values = more similar
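As a quick illustration of the range, the sketch below computes the distance for vectors that point the same way, at right angles, and in opposite directions. The function name `cosine_distance` is an assumption made for this example, and zero vectors are not handled here.

```python
import math

def cosine_distance(a, b):
    # 1 minus the cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

d_same = cosine_distance([1, 2], [2, 4])    # close to 0: same direction
d_orth = cosine_distance([1, 0], [0, 1])    # 1: orthogonal
d_opp = cosine_distance([1, 2], [-1, -2])   # close to 2: opposite direction
```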

When to Use:

Text data and NLP (document similarity, word embeddings)
High-dimensional sparse data
When magnitude doesn't matter (only direction/orientation)
Recommendation systems (user-item preferences)

Advantages:

✅ Scale invariant: Not affected by vector magnitude
✅ Efficient for sparse data: Only dimensions that are non-zero in both vectors contribute to the dot product
✅ Works well in high dimensions: Common in NLP (TF-IDF, word2vec)
✅ Bounded range: [-1, 1] makes it easy to interpret
✅ Robust to document length: A long document and a short one with the same term proportions score as highly similar
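The sparsity and document-length points can be seen together in a small sketch. It assumes documents are represented as raw term counts in a `collections.Counter` (a simple bag-of-words stand-in, not TF-IDF), and the helper name `sparse_cosine` and the sample sentences are made up for illustration.

```python
import math
from collections import Counter

def sparse_cosine(u, v):
    """Cosine similarity for sparse vectors stored as {term: value} mappings."""
    # Iterate over the smaller mapping; only shared keys add to the dot product.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(val * v[key] for key, val in u.items() if key in v)
    norm_u = math.sqrt(sum(val * val for val in u.values()))
    norm_v = math.sqrt(sum(val * val for val in v.values()))
    return dot / (norm_u * norm_v)

short_doc = Counter("the cat sat on the mat".split())
long_doc = Counter(("the cat sat on the mat " * 10).split())  # same text, 10 times longer
other_doc = Counter("stock prices fell sharply today".split())

sim_length = sparse_cosine(short_doc, long_doc)    # close to 1: length does not matter
sim_disjoint = sparse_cosine(short_doc, other_doc)  # 0.0: no shared terms
```

Because absent terms are simply missing from the mapping, the cost of the dot product scales with the number of non-zero entries rather than with the vocabulary size.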

Disadvantages:

❌ Ignores magnitude: [1, 2] and [10, 20] have similarity = 1
❌ Not a true metric: Cosine distance (1 - similarity) does not satisfy the triangle inequality
❌ Undefined for zero vectors: A zero-length vector makes the denominator zero, so such inputs need special handling
❌ Less intuitive than Euclidean for geometric problems
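Two of these drawbacks are easy to demonstrate. The sketch below uses three hand-picked 2D vectors to show one violation of the triangle inequality by cosine distance, and shows one possible way to handle a zero vector. Raising `ValueError` is an assumption of this example; other code may return a sentinel value instead.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: refuse zero vectors instead of dividing by zero.
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

A, B, C = (1, 0), (1, 1), (0, 1)
direct = cosine_distance(A, C)                         # 1.0 (A and C are orthogonal)
via_b = cosine_distance(A, B) + cosine_distance(B, C)  # about 0.586
print(direct > via_b)  # True: going "through" B is shorter than going directly

try:
    cosine_distance((0, 0), (1, 2))
except ValueError as err:
    print(err)
```

A true metric would require `direct <= via_b`, so this single counterexample is enough to show the property fails.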

Applications:

NLP & Text Mining: Document similarity, plagiarism detection, information retrieval
Recommendation Systems: User-item similarity (collaborative filtering)
Image Recognition: Comparing feature vectors
Bioinformatics: Gene expression analysis
Anomaly Detection: Identifying outliers in high-dimensional spaces