Distance metrics

Introduction

A distance metric measures the distance between two records.
The choice of metric influences the shape of the clusters [@Clustering Distance Measures].
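To make the point about cluster shape concrete, here is a minimal sketch in which the same query record has a different nearest neighbour depending on the metric. The points and the helper names (`manhattan`, `nearest`) are made up for illustration and are not taken from the cited source; `math.dist` is the standard-library Euclidean distance (Python 3.8+).

```python
import math

def manhattan(p, q):
    # Sum of the absolute differences along each dimension.
    return sum(abs(a - b) for a, b in zip(p, q))

def nearest(query, candidates, metric):
    # Return the name of the candidate with the smallest distance to `query`.
    return min(candidates, key=lambda name: metric(query, candidates[name]))

# Hypothetical records chosen so that the two metrics disagree.
query = (0.0, 0.0)
candidates = {"a": (3.0, 3.0), "b": (0.0, 5.0)}

nearest_euclidean = nearest(query, candidates, math.dist)
nearest_manhattan = nearest(query, candidates, manhattan)

print(nearest_euclidean)  # a
print(nearest_manhattan)  # b
```

Record `a` lies on a diagonal, which the Manhattan distance penalises, so a clustering algorithm would group the query with `a` under one metric and with `b` under the other.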

Metrics

1. Euclidean
Suited for: A commonly used default distance metric that performs well in general.
Notes: Also used in K-means clustering. Also called the L2 distance.

2. Manhattan
Definition: Calculated by summing the absolute values of the differences along each dimension. Example: on a map, the Euclidean distance is the straight-line route between two points, while the Manhattan distance is the length of a route that moves first along one axis and then along the other, as a car does when driving along city blocks.
Suited for: The L1 distance is often good for sparse features or sparse noise, i.e. when many of the features are zero, as in text mining using occurrences of rare words.
Notes: Similar to Euclidean distance. Also called the L1 distance.

3. Cosine
Suited for: A good choice when there are many variables and you suspect that some of them may not be significant. Cosine distance reduces noise by taking the shape of the variables, more than their values, into account.
Notes: Invariant to global scalings of the signal [@2.3.6.4 Varying the metric]. It tends to associate observations whose largest and smallest variables are the same, regardless of their actual values.

4. Hamming

5. Pearson correlation distance

6. Spearman correlation distance
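The first three metrics listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names and sample vectors are hypothetical, and cosine distance is taken here as one minus the cosine similarity, which is a common convention but not one the notes spell out. The last lines check the scaling invariance mentioned for the cosine metric.

```python
import math

def euclidean(x, y):
    # L2 distance: square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # L1 distance: sum of the absolute differences along each dimension.
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_distance(x, y):
    # Assumed convention: 1 - cos(angle between x and y).
    # Both vectors must be non-zero, otherwise this divides by zero.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

# Hypothetical sample records.
x = [1.0, 2.0, 3.0]
y = [2.0, 0.0, 4.0]
scaled_y = [10.0 * v for v in y]  # same direction as y, ten times larger

print(euclidean(x, y))
print(manhattan(x, y))  # 4.0
print(cosine_distance(x, y))

# Rescaling y leaves the cosine distance (numerically) unchanged,
# while the Euclidean and Manhattan distances grow.
print(math.isclose(cosine_distance(x, y), cosine_distance(x, scaled_y)))
print(euclidean(x, scaled_y) > euclidean(x, y))
print(manhattan(x, scaled_y) > manhattan(x, y))
```

In practice a library routine would normally be preferred over hand-written loops; the explicit versions are shown only to make the definitions visible.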