DIMENSIONALITY REDUCTION

when dimensionality is very high occupation of the space become sparse and discrimination based on distance becomes ineffective

the scope of this processing is to:

  • avoid this problem called the curse of dimensionality
  • reduce noise on data
  • reduce time and memory complexity of the mining algorithms
  • improve visualization

PCA

procedure to extract important variables from the dataset and reduce dimensionality. The new dataset will contains the attributes that capture most of the data variation

the PCA algorithm uses singular value decomposition to find the attributes that capture most of the data variation and reduce the number of dimensions in the dataset to make it more suitable for ML processes

PREVIOUS NEXT