CLUSTERING SCHEME EVALUATION
Evaluating a clustering scheme is how we judge its quality; it is also used to compare different clustering schemes.
MEASUREMENT CRITERIA
COHESION (SSE)
The sum of the squared distances (proximities) between the elements of each cluster and that cluster's geometric center (prototype). The prototype can be a centroid, or a medoid in contexts where the mean is not defined.
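Written out as a formula, under the assumption that proximity is the squared Euclidean distance (the symbols $K$, $C_i$ and $c_i$ are introduced here for illustration: the number of clusters, the $i$-th cluster and its prototype):

```latex
\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - c_i \rVert^2
```

A lower SSE means the points sit closer to their prototypes, i.e. the clusters are more cohesive.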
SEPARATION (SSB)
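How far apart the clusters are, measured by the proximity between their prototypes. As a sum of squares (SSB), it is the sum, weighted by cluster size, of the squared distances between each cluster's prototype and the global center.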
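One common way to turn separation into a single number, assumed here because it is the form that makes SSB add up with SSE to the total sum of squares, weights each prototype's squared distance from the global center by the cluster size (the symbols are illustrative: $K$ clusters, $\lvert C_i \rvert$ points in cluster $i$, $c_i$ its centroid, $c$ the global center):

```latex
\mathrm{SSB} = \sum_{i=1}^{K} \lvert C_i \rvert \, \lVert c_i - c \rVert^2
```

A higher SSB means the prototypes lie farther from the global center, and therefore from each other.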
TOTAL SUM OF SQUARES (TSS)
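The sum of the squared distances between the points and the global center.
TSS is the sum of SSE and SSB (TSS = SSE + SSB). It is a global property of the dataset: it does not depend on the clustering, so any scheme that lowers SSE raises SSB by the same amount.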
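The identity can be checked numerically. The following is a minimal sketch in pure Python, assuming Euclidean distance and centroids (means) as prototypes; with medoids the identity does not hold in general. The toy points, the cluster assignment and the helper names `mean` and `sq_dist` are made up for illustration.

```python
# Sketch: check TSS = SSE + SSB on a toy 2-D dataset.
# Assumption: prototypes are centroids and proximity is squared Euclidean distance.

def mean(points):
    """Component-wise mean (centroid) of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Hypothetical clustering of seven points into two clusters.
clusters = [
    [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0), (9.0, 9.0)],
]

all_points = [p for cluster in clusters for p in cluster]
global_center = mean(all_points)
centroids = [mean(cluster) for cluster in clusters]

# Cohesion: squared distances of the points from their own centroid.
sse = sum(sq_dist(p, c) for cluster, c in zip(clusters, centroids) for p in cluster)
# Separation: size-weighted squared distances of the centroids from the global center.
ssb = sum(len(cluster) * sq_dist(c, global_center)
          for cluster, c in zip(clusters, centroids))
# Total: squared distances of the points from the global center.
tss = sum(sq_dist(p, global_center) for p in all_points)

print(f"SSE={sse:.4f}  SSB={ssb:.4f}  TSS={tss:.4f}")
```

Reassigning points between the two clusters changes SSE and SSB but leaves TSS unchanged, which is what makes it a property of the dataset rather than of the clustering.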
SILHOUETTE
A measure of how much a point contributes to the separation between clusters and to cohesion. The silhouette value $s_i$ of a point $i$ is given by the following formula:
$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$
where $a_i$ is the average distance between the point and all the other points in the same cluster, and $b_i$ is the distance between the point and the points of the other clusters (the smallest average distance to the points of any other cluster). A value lower than 0 means that objects in other clusters dominate: they are closer to the point than the points of its own cluster.
The global silhouette score is the average of the scores of all the individual points.
Silhouette is expensive to compute by nature: it needs the distances between all pairs of points, so the cost grows quadratically with the number of points.
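A minimal pure-Python sketch of the computation, assuming Euclidean distance and the usual definition: for each point, a is the mean distance to the other points of its own cluster, b is the smallest mean distance to the points of any other cluster, and the score is (b - a) / max(a, b). The function name `silhouette_scores`, the convention of scoring singleton clusters as 0, and the toy data are assumptions made for illustration.

```python
import math

def silhouette_scores(points, labels):
    """Return the silhouette value of every point, in input order."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster label.
        dists = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(other, []).append(math.dist(p, q))
        own = dists.pop(label, [])
        if not own or not dists:
            # Singleton cluster, or only one cluster overall: score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                            # cohesion term
        b = min(sum(d) / len(d) for d in dists.values())   # separation term
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Toy data: two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1])   # deliberately poor labeling

print("good labeling, global score:", sum(good) / len(good))
print("bad labeling, global score:", sum(bad) / len(bad))
```

The nested loop over all pairs of points is where the quadratic cost comes from.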
SEARCHING FOR THE BEST VALUE (the elbow method)
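Silhouette and SSE can be used to find the best value of a parameter, typically the number of clusters k. SSE generally keeps decreasing as k grows, so instead of its absolute minimum one looks for the elbow of the SSE curve, the point after which adding clusters yields only a marginal reduction. For silhouette, one looks for the maximums of the curve of silhouette against k.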
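As a rough sketch of how the two curves might be read programmatically: the elbow is located here with the largest second difference of the SSE curve, which is only one simple heuristic among several and is an assumption of this example, as are the function name `elbow_index` and all the numbers, which are made up.

```python
def elbow_index(sse_values):
    """Index of the sharpest bend in an SSE curve (largest second difference).

    Assumes the values are listed for consecutive, equally spaced parameter values.
    """
    if len(sse_values) < 3:
        raise ValueError("need at least three SSE values to locate a bend")
    best_i, best_bend = 1, float("-inf")
    for i in range(1, len(sse_values) - 1):
        # Discrete second derivative: large when the curve flattens right after i.
        bend = sse_values[i - 1] - 2 * sse_values[i] + sse_values[i + 1]
        if bend > best_bend:
            best_i, best_bend = i, bend
    return best_i

# Made-up SSE values for k = 1..6.
ks = [1, 2, 3, 4, 5, 6]
sse_per_k = [1000.0, 520.0, 130.0, 110.0, 95.0, 85.0]
k_elbow = ks[elbow_index(sse_per_k)]

# Made-up global silhouette scores for k = 2..6 (silhouette needs at least 2 clusters).
silhouette_per_k = {2: 0.55, 3: 0.71, 4: 0.60, 5: 0.52, 6: 0.47}
k_silhouette = max(silhouette_per_k, key=silhouette_per_k.get)

print("elbow of SSE at k =", k_elbow)
print("maximum silhouette at k =", k_silhouette)
```

In practice the curves are usually also inspected visually, since a heuristic like this can be misled by noisy or nearly straight curves.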
COMPARING CLUSTERING SCHEMES
The concept is similar to the ones used to test classification: there is a known partition, called the gold standard, of a dataset similar to the data to be clustered, together with a labeling scheme.
The gold standard acts as a test set for clustering: we can compare a clustering scheme against it and gain information about the quality of the clustering.
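The notes do not name a specific comparison measure. As one possible illustration, the sketch below uses the Rand index, which counts the fraction of point pairs on which the clustering and the gold standard agree (both put the pair in the same group, or both put it in different groups). The function name `rand_index` and the labels are made up for the example.

```python
from itertools import combinations

def rand_index(gold, predicted):
    """Fraction of point pairs on which two labelings agree (1.0 = identical partitions)."""
    if len(gold) != len(predicted):
        raise ValueError("both labelings must cover the same points")
    if len(gold) < 2:
        raise ValueError("need at least two points to form a pair")
    agree = 0
    total = 0
    for i, j in combinations(range(len(gold)), 2):
        same_in_gold = gold[i] == gold[j]
        same_in_pred = predicted[i] == predicted[j]
        # A pair counts as agreement when both labelings treat it the same way.
        if same_in_gold == same_in_pred:
            agree += 1
        total += 1
    return agree / total

# Hypothetical gold standard and clustering output for six points.
gold = ["a", "a", "a", "b", "b", "b"]
predicted = [1, 1, 0, 0, 0, 0]

print("Rand index:", rand_index(gold, predicted))
```

Only the grouping matters, not the label names, so a clustering that reproduces the gold standard with different cluster identifiers still scores 1.0.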