LABEL ENCODING

Label encoding converts features that represent categories into numerical values.

There are two types of encoders: those applied to the predictors and those applied to the target.

ENCODERS APPLIED TO PREDICTORS

  • ONE HOT ENCODER

    It transforms a categorical feature into a set of columns that become part of the predictor matrix X:

    • it finds all the categories within the feature;
    • it replaces the analyzed feature with as many columns as there are categories;
    • all values in the new columns are 0 or 1. For every sample (row of X), a 1 is placed only in the column that corresponds to the sample's category.

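    Here is a usage example:

    from sklearn.preprocessing import OneHotEncoder
    # X is assumed to be a 2D array-like of categorical features
    ohe = OneHotEncoder() # create the encoder
    ohe.fit(X) # learn the categories from the data
    ohe.categories_ # show the categories found
    ohe.transform(X) # apply the transformation (returns a sparse matrix by default)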
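
To make the 0/1 columns concrete, here is a minimal sketch on toy data. The colour values and the variable names (`encoded`, `dense`, `categories`) are assumptions made for illustration only. The encoder returns a sparse matrix by default, so the sketch converts it to a dense array for inspection.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical feature, four samples.
# The encoder expects 2D input, so every sample is its own row.
X = [["red"], ["green"], ["blue"], ["green"]]

ohe = OneHotEncoder()
encoded = ohe.fit_transform(X)  # fit and transform in one step (sparse matrix)
dense = encoded.toarray()       # dense NumPy array, easier to read

# Categories are stored in sorted order: blue, green, red.
categories = ohe.categories_[0].tolist()
print(categories)  # ['blue', 'green', 'red']

# One column per category; each row has a single 1 in its category's column.
print(dense)
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```

The columns follow the order of `ohe.categories_`, so the first column here stands for "blue", not for the first value seen in the data.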
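
  • ORDINAL ENCODER

    • it finds all the categories available in each feature;
    • it assigns an incremental integer value (0, 1, 2, ...) to every category, by default following the sorted order of the categories;
    • each feature is returned as a single column holding the corresponding incremental values.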

    Here is a usage example:

    from sklearn.preprocessing import OrdinalEncoder
    # X is assumed to be a 2D array-like of categorical features
    oe = OrdinalEncoder() # create the encoder
    oe.fit(X) # learn the categories from the data
    oe.categories_ # show the categories found
    oe.transform(X) # apply the transformation
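
The incremental values follow the sorted order of the categories unless an explicit order is given. The sketch below compares the two cases on toy data. The size labels and variable names are assumptions made for illustration; `categories` is the constructor parameter that takes one list of categories per feature.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: one feature with a natural order (small < medium < large).
X = [["small"], ["large"], ["medium"]]

# Default behaviour: categories are sorted alphabetically,
# so large=0, medium=1, small=2.
default_codes = OrdinalEncoder().fit_transform(X)
print(default_codes.tolist())  # [[2.0], [0.0], [1.0]]

# Explicit order: pass one list of categories per feature,
# so small=0, medium=1, large=2.
ordered = OrdinalEncoder(categories=[["small", "medium", "large"]])
ordered_codes = ordered.fit_transform(X)
print(ordered_codes.tolist())  # [[0.0], [2.0], [1.0]]
```

Passing the order explicitly is worth considering whenever the numeric codes are meant to reflect a real ranking.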

ENCODERS APPLIED TO TARGET

The best known is LabelEncoder. It is similar to OrdinalEncoder, but it is applied to the target y (a 1D array of labels) instead of the predictors.

Here is a usage example:

from sklearn.preprocessing import LabelEncoder
# y is assumed to be a 1D array-like of target labels
le = LabelEncoder() # create the encoder
le.fit(y) # learn the classes from the data
le.classes_ # show the classes found
le.transform(y) # apply the transformation
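
As a small illustration on toy data, the sketch below encodes a target and then maps the integer codes back to the original labels with `inverse_transform`. The animal labels and variable names are assumptions made for illustration only.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy target: a 1D list of class labels.
y = ["cat", "dog", "cat", "bird"]

le = LabelEncoder()
codes = le.fit_transform(y)  # fit and transform in one step

# Classes are stored in sorted order: bird=0, cat=1, dog=2.
print(le.classes_.tolist())  # ['bird', 'cat', 'dog']
print(codes.tolist())        # [1, 2, 1, 0]

# Map integer codes back to the original labels,
# for example to read a model's predictions.
labels = le.inverse_transform(codes)
print(labels.tolist())       # ['cat', 'dog', 'cat', 'bird']
```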