CLUSTERING SCHEME EVALUATION

Clustering scheme evaluation is needed to assess the quality of a clustering scheme; it is also used to compare different clustering schemes.

MEASUREMENT CRITERIA

COHESION (SSE)

The sum of the squared distances between the elements of each cluster and the cluster's prototype (its geometric center). The prototype can be a centroid, or a medoid in contexts where the mean is not defined.
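A minimal sketch of SSE in plain Python. It assumes points are tuples of coordinates, Euclidean distance, and clusters given as a parallel list of labels. The helper names (`sq_dist`, `centroid`, `medoid`, `sse`) are illustrative. The medoid is taken here as the cluster member with the smallest total distance to the other members.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def medoid(points):
    """Cluster member with the smallest total distance to the other members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def sse(points, labels, prototype=centroid):
    """Cohesion: sum of squared distances of each point to its cluster prototype."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        proto = prototype(members)
        total += sum(sq_dist(p, proto) for p in members)
    return total

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
print(sse(points, labels))          # 1.0
print(sse(points, labels, medoid))  # 2.0
```

With the medoid as prototype the SSE is higher in this example. The medoid must be an actual point of the cluster, so it cannot sit exactly in the middle of the two points.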

SEPARATION (SSB)

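Separation measures how distinct the clusters are from one another, through the proximity between prototypes (for two clusters, the distance between their centroids). SSB is the sum, over all clusters, of the cluster size times the squared distance between the cluster centroid and the global center of the dataset.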
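A sketch of SSB under the same assumptions: points are tuples, distance is Euclidean, and prototypes are centroids. The function names are illustrative and not taken from any library.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def ssb(points, labels):
    """Separation: sum over clusters of |C_i| * squared distance(c_i, global center)."""
    global_center = centroid(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return sum(len(m) * sq_dist(centroid(m), global_center) for m in clusters.values())

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(ssb(points, [0, 0, 1, 1]))  # 100.0
```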

TOTAL SUM OF SQUARES (TSS)

The sum of the squared distances between all the points and the global center of the dataset.

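TSS = SSE + SSB, when distances are squared Euclidean distances and the prototypes are centroids. TSS is a global property of the dataset and does not depend on the clustering. As a result, a decrease in SSE (better cohesion) always comes with an equal increase in SSB (better separation).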
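The identity can be checked numerically. This sketch uses illustrative names and computes TSS, SSE and SSB for two different labelings of the same four points. TSS stays the same, and only the split between cohesion and separation changes.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def tss_sse_ssb(points, labels):
    """Return (TSS, SSE, SSB) using centroids and squared Euclidean distance."""
    center = centroid(points)  # global center of the dataset
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    tss = sum(sq_dist(p, center) for p in points)
    sse = ssb = 0.0
    for members in clusters.values():
        c = centroid(members)
        sse += sum(sq_dist(p, c) for p in members)  # cohesion
        ssb += len(members) * sq_dist(c, center)    # separation
    return tss, sse, ssb

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
for labels in ([0, 0, 1, 1], [0, 1, 0, 1]):
    tss, sse, ssb = tss_sse_ssb(points, labels)
    print(labels, tss, sse, ssb, math.isclose(tss, sse + ssb))
# [0, 0, 1, 1] 101.0 1.0 100.0 True
# [0, 1, 0, 1] 101.0 100.0 1.0 True
```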

SILHOUETTE

The silhouette measures how well a point fits its own cluster compared with the other clusters, so it accounts for both cohesion and separation. For a point i, the silhouette value is

$$ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} $$

where a(i) is the average distance between i and all the other points in the same cluster, and b(i) is the smallest, over all the other clusters, of the average distance between i and the points of that cluster. The value ranges from -1 to 1. A value lower than 0 means that a(i) > b(i): on average, the point is closer to the points of another cluster than to the points of its own cluster.

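The global silhouette score is the average of the silhouette values of all the points. It also ranges from -1 to 1, and higher values indicate a better clustering.

The silhouette is expensive to compute: every point needs its distance to every other point, so the cost grows quadratically with the number of points.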
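A direct implementation of the silhouette, as a sketch assuming Euclidean distance and at least two clusters. Points in singleton clusters get a value of 0. This is a common convention rather than part of the formula. The function names are illustrative. The full distance matrix built at the start is the part that makes the silhouette expensive.

```python
import math

def silhouette_values(points, labels):
    """Silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every point."""
    if len(set(labels)) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    n = len(points)
    # Full pairwise distance matrix: O(n^2) distances, the costly part.
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    values = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            values.append(0.0)  # singleton cluster: s(i) set to 0 by convention
            continue
        a = sum(own) / len(own)  # a(i): mean distance to the own cluster
        b = min(  # b(i): smallest mean distance to another cluster
            sum(dist[i][j] for j in range(n) if labels[j] == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values

def silhouette_score(points, labels):
    """Global score: average silhouette over all points."""
    values = silhouette_values(points, labels)
    return sum(values) / len(values)

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(silhouette_score(points, [0, 0, 1, 1]))      # about 0.90
print(silhouette_values(points, [0, 1, 0, 1])[0])  # -0.4
```

In the second labeling, point 0 is grouped with the point at 10 while the other cluster is much closer on average, so its silhouette is negative.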

SEARCHING FOR THE BEST VALUE OF K (THE ELBOW METHOD)

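SSE and the silhouette can be used to find the best value of the parameter K, the number of clusters. SSE decreases as K grows, so the point to look for is the elbow of the SSE-versus-K curve, where adding more clusters stops reducing SSE significantly. For the silhouette, the best values of K are the maximums of the silhouette-versus-K curve.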
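A sketch of the elbow method on a toy dataset with three well-separated groups of points. The helper `kmeans_sse` is a bare-bones Lloyd's k-means. It uses a deterministic farthest-first initialisation, chosen only so the output is reproducible and not as a recommendation. The helper `elbow` picks the K with the largest second difference of the SSE curve, which is just one simple way to locate the bend. All names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans_sse(points, k, iterations=100):
    """Bare-bones Lloyd's k-means; returns the SSE of the final clustering."""
    # Farthest-first initialisation: start from the first point, then repeatedly
    # add the point farthest from its nearest already-chosen center.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [centroid(m) if m else centers[j] for j, m in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(sq_dist(p, centers[j]) for j, m in enumerate(clusters) for p in m)

def elbow(sse_by_k):
    """K with the largest second difference of the SSE curve (needs consecutive K)."""
    inner = sorted(sse_by_k)[1:-1]
    return max(inner, key=lambda k: sse_by_k[k - 1] - 2 * sse_by_k[k] + sse_by_k[k + 1])

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (0, 10), (0, 11), (1, 10), (1, 11)]
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 6)}
print({k: round(v, 2) for k, v in sse_by_k.items()})
print(elbow(sse_by_k))  # 3
```

The same loop could record the silhouette for each K instead, keeping the K with the highest value.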

COMPARING CLUSTERING SCHEMES

The approach is similar to the one used to test classifiers. There is a known partition of a dataset similar to the data to be clustered, called the gold standard, which provides a reference labeling.

The gold standard acts as a test set for clustering: a clustering scheme can be compared to it to gain information about the quality of the clustering.
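These notes do not fix a specific comparison measure. As one possible choice, the sketch below uses the Rand index: the fraction of pairs of points on which the clustering and the gold standard agree, meaning both put the two points in the same group or both put them in different groups. Unlike classification accuracy, it ignores the actual label names, which a clustering algorithm assigns arbitrarily. The function name is illustrative.

```python
from itertools import combinations

def rand_index(gold, predicted):
    """Fraction of point pairs on which the two labelings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(gold)), 2))
    agree = sum(
        (gold[i] == gold[j]) == (predicted[i] == predicted[j]) for i, j in pairs
    )
    return agree / len(pairs)

gold = ["a", "a", "b", "b"]
print(rand_index(gold, [1, 1, 0, 0]))  # 1.0: same partition, different label names
print(rand_index(gold, [0, 1, 0, 1]))  # about 0.33
```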
