Take the columns that contain Yes and No values;
Preprocess the values with LabelEncoder;
Drop the target variable from the training dataset;
Compute the Pearson correlation of each remaining variable with the target variable;
Set the range of highly correlated values;
if (correlated variable has high correlation?) {
if (correlated variable is in range?)
Keep the variable in the best subset of input features;
}
while (!correlated variable is in range?)
while (!correlated variable has high correlation?)
X = highly correlated variables;
Call scaler.transform() on X;
Set X_train, X_test, y_train, y_test;
Define Random Forest classifier to be used by Boruta;
Call fit() to find all relevant features;
Review feature names, ranks, and decisions;
Use the subset of features to fit the Random Forest model on the training data;
Call feature selector.transform() to make sure the same features are selected from the test data;
Call fit() to find internal_cv_score with class weight and a threshold of 0.79;
Define the supervised learning model according to internal_cv_score;
Call fit() to find all relevant features with class weight and a threshold of 0.79;
Use the subset of features to fit the model on the training data;
Call feature selector.transform() to make sure the same features are selected from the test data;
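
The first part of the procedure (encoding the Yes/No columns, dropping the target, and keeping the variables whose Pearson correlation with the target falls in the chosen range) can be sketched in plain Python. This is only an illustrative, dependency-free sketch: the toy data, the column names, the helper names (`encode_yes_no`, `pearson`, `select_correlated`), the use of the absolute correlation, and the range 0.5 to 1.0 are all assumptions, not values taken from the steps above. In practice LabelEncoder and a library correlation routine would replace the hand-written helpers.

```python
import math


def encode_yes_no(values):
    """Map labels to integers in sorted order, which is how LabelEncoder
    orders its classes (so "No" -> 0 and "Yes" -> 1)."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; treat it as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_correlated(columns, target, low, high):
    """Keep the columns whose absolute correlation with the target
    lies inside [low, high] (assumed interpretation of "in range")."""
    selected = []
    for name, values in columns.items():
        r = abs(pearson(values, target))
        if low <= r <= high:
            selected.append(name)
    return selected


# Hypothetical toy data: two Yes/No features and a Yes/No target.
raw = {
    "smoker": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "member": ["Yes", "Yes", "No", "No", "Yes", "No"],
    "target": ["Yes", "No", "Yes", "No", "Yes", "Yes"],
}

encoded = {name: encode_yes_no(vals) for name, vals in raw.items()}
y = encoded.pop("target")  # drop the target variable from the inputs

# Assumed range for "highly correlated"; tune it for the real dataset.
best_subset = select_correlated(encoded, y, low=0.5, high=1.0)
print(best_subset)
```

The selected column names would then define X before scaling, splitting, and the Boruta step.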