SIMILARITY AND DISSIMILARITY
SIMILARITY
numerical measure of how alike two data objects are; higher when the objects are more alike (often falls in the range [0, 1])
DISSIMILARITY
numerical measure of how different two data objects are; lower when the objects are more alike (the minimum value is often 0, the upper bound varies)
ATTRIBUTE TYPE | DISSIMILARITY | SIMILARITY |
---|---|---|
NOMINAL | $d = 0$ if $x = y$, $d = 1$ if $x \neq y$ | $s = 1$ if $x = y$, $s = 0$ if $x \neq y$ |
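ORDINAL | $d = \lvert x - y \rvert / (n - 1)$, with the values mapped to integers $0$ to $n - 1$ | $s = 1 - d$ |
INTERVAL | $d = \lvert x - y \rvert$ | $s = -d$, $s = 1 / (1 + d)$, or $s = e^{-d}$ |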
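
The per-attribute measures in the table can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes the usual textbook definitions (0/1 mismatch for nominal values, normalised rank gap for ordinal values, absolute difference for interval values), and all function names are hypothetical.

```python
def nominal_dissimilarity(x, y):
    # Assumed definition: 0 when the two values match, 1 otherwise.
    return 0 if x == y else 1


def ordinal_dissimilarity(x, y, ordered_levels):
    # Assumed definition: map each level to its rank 0..n-1,
    # then divide the rank gap by n-1 so the result lies in [0, 1].
    if len(ordered_levels) < 2:
        raise ValueError("need at least two ordered levels")
    rank = {level: i for i, level in enumerate(ordered_levels)}
    return abs(rank[x] - rank[y]) / (len(ordered_levels) - 1)


def interval_dissimilarity(x, y):
    # Assumed definition: absolute difference of the two values.
    return abs(x - y)


def similarity_from_dissimilarity(d):
    # One common transformation (an assumption, others exist):
    # maps d in [0, inf) to s in (0, 1].
    return 1 / (1 + d)
```

For example, with the hypothetical levels `["poor", "fair", "good"]`, the dissimilarity between `"poor"` and `"fair"` is 0.5 and between `"poor"` and `"good"` is 1.0.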
PROPERTIES OF SIMILARITY
SIMILARITY BETWEEN VECTORS
SIMPLE MATCHING COEFFICIENT
the ratio between the number of matching attribute values (both 1-1 and 0-0 matches) and the total number of attributes
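A minimal sketch of the simple matching coefficient, assuming the two objects are given as equal-length sequences of binary attribute values. The function name is hypothetical.

```python
def simple_matching_coefficient(x, y):
    # Both objects must describe the same, non-empty set of attributes.
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    # A match is any position where the two values agree (1-1 or 0-0).
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)
```

For x = (1,0,0,0,0,0,0,0,0,0) and y = (0,0,0,0,0,0,1,0,0,1), seven of the ten positions agree, so the coefficient is 0.7.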
JACCARD COEFFICIENT
the ratio between the number of 1-1 matches and the number of attributes that are not 0 in both objects (0-0 matches are ignored)
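A minimal sketch of the Jaccard coefficient, assuming binary attributes encoded as 0 and 1, where only 1-1 matches count and positions that are 0 in both objects are ignored. The function name is hypothetical, and returning 0.0 when both vectors are all zeros is a convention chosen here, not something the notes specify.

```python
def jaccard_coefficient(x, y):
    if len(x) != len(y):
        raise ValueError("vectors must be of equal length")
    # Positions where both objects have the attribute (1-1 matches).
    both_one = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    # Positions where at least one object has the attribute (not 0-0).
    not_both_zero = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    if not_both_zero == 0:
        # Assumed convention: two all-zero vectors get similarity 0.0.
        return 0.0
    return both_one / not_both_zero
```

For x = (1,0,0,0,0,0,0,0,0,0) and y = (0,0,0,0,0,0,1,0,0,1) there is no 1-1 match, so the coefficient is 0 even though most positions agree on 0. This is why Jaccard suits sparse, asymmetric binary data.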
COSINE COEFFICIENT
the cosine of the angle between the two vectors, i.e. their dot product divided by the product of their lengths
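A minimal sketch of the cosine coefficient, assuming numeric vectors of equal length and computing the dot product divided by the product of the Euclidean norms. The function name is hypothetical, and raising an error for a zero vector is a choice made here because the angle is undefined in that case.

```python
import math


def cosine_similarity(x, y):
    if len(x) != len(y):
        raise ValueError("vectors must be of equal length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # The angle to a zero vector is undefined.
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

For x = (3,2,0,5,0,0,0,2,0,0) and y = (1,0,0,0,0,0,0,1,0,2), the dot product is 5 and the norms are sqrt(42) and sqrt(6), which gives a cosine of about 0.315.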
EXTENDED JACCARD COEFFICIENT (TANIMOTO)
the Jaccard coefficient generalised to continuous or count attributes; it reduces to the Jaccard coefficient when the attributes are binary
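A minimal sketch of the extended Jaccard (Tanimoto) coefficient, assuming the common definition: the dot product divided by the sum of the squared norms minus the dot product. The function name is hypothetical, and raising an error when both vectors are zero is a choice made here.

```python
def tanimoto_coefficient(x, y):
    if len(x) != len(y):
        raise ValueError("vectors must be of equal length")
    dot = sum(a * b for a, b in zip(x, y))
    sq_norm_x = sum(a * a for a in x)
    sq_norm_y = sum(b * b for b in y)
    denominator = sq_norm_x + sq_norm_y - dot
    if denominator == 0:
        # Only happens when both vectors are all zeros.
        raise ValueError("coefficient is undefined for two zero vectors")
    return dot / denominator
```

On binary vectors the dot product counts the 1-1 matches and the denominator counts the positions that are not 0-0, so the result equals the Jaccard coefficient: (1,1,0,0) and (1,0,1,0) give 1/3 under both definitions.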