This blog post covers binary classification on a heart disease dataset.
After preprocessing the data, we will build multiple models with different estimators and different hyperparameters to find the best-performing model.
As an example dataset, we'll import heart-disease.csv. This file contains anonymised patient medical records and whether or not each patient has heart disease.
Here, each row is a different patient and all columns except target are different patient characteristics. target indicates whether the patient has heart disease (target = 1) or not (target = 0).
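As a rough sketch of this layout, the snippet below builds a tiny made-up DataFrame standing in for heart-disease.csv and separates the features from the target column. The feature column names (age, chol, thalach) and every value are illustrative assumptions, not real patient records. With the real file you would load the data with pd.read_csv("heart-disease.csv") instead.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for heart-disease.csv; column names and values are
# illustrative assumptions only.
heart_disease = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 44, 52, 60],
    "chol": [233, 250, 204, 236, 354, 263, 199, 258],
    "thalach": [150, 187, 172, 178, 163, 173, 162, 141],
    "target": [1, 1, 1, 1, 0, 0, 0, 0],
})

# Features: every column except target. Labels: the target column.
X = heart_disease.drop("target", axis=1)
y = heart_disease["target"]

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)
```

Keeping a held-out test set means the model is later judged on patients it has never seen during training.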
The model object is often referred to as model, clf (short for classifier), or estimator (as in the Scikit-Learn documentation).
Hyperparameters are like knobs on an oven you can tune to cook your favourite dish.
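To make the oven analogy a little more concrete, here is a minimal sketch that turns one knob, n_estimators, on a RandomForestClassifier and records the resulting test accuracy. The choice of estimator, the values tried, and the synthetic data from make_classification (used in place of the heart disease features) are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the real features.
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Instantiate the estimator with default hyperparameters and inspect them.
clf = RandomForestClassifier(random_state=42)
default_params = clf.get_params()
print("default n_estimators:", default_params["n_estimators"])

# Turn one knob (n_estimators) and record test accuracy for each setting.
results = {}
for n in [10, 50, 100]:
    clf = RandomForestClassifier(n_estimators=n, random_state=42)
    clf.fit(X_train, y_train)
    results[n] = clf.score(X_test, y_test)

for n, acc in results.items():
    print(f"n_estimators={n}: accuracy={acc:.2f}")
```

Comparing settings on the same split keeps the comparison fair, since only the hyperparameter changes between runs.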
Now that we've made some predictions, we can start to use some more Scikit-Learn methods to figure out how good our model is.
Each model or estimator has a built-in score method. It makes predictions on the data you pass in and compares them with the true labels, which tells you how well the model learned the patterns between the features and labels. For a classifier, the default score is the mean accuracy.
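A small sketch of what score does for a classifier: it should match accuracy_score computed on the model's own predictions. The estimator (RandomForestClassifier) and the synthetic data are assumptions used only to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the heart disease features and labels.
X, y = make_classification(n_samples=200, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)

# score() on a classifier returns the mean accuracy on the given data.
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)

# The same number, computed by hand from the predictions.
manual_acc = accuracy_score(y_test, clf.predict(X_test))

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

The score on the training data is usually higher than on the test data, so the test score is the one to report.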
First, we create an evaluation function to output all the needed metrics.
Now we make predictions using the test data to see how the model performs.
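One possible shape for such a function is sketched below. The name evaluate_preds and the choice of metrics (accuracy, precision, recall and F1) are assumptions for illustration; the notebook's own function may report a different set. The labels at the bottom are hand-made purely to exercise the function.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)


def evaluate_preds(y_true, y_preds):
    """Print and return common metrics for binary labels (0 or 1)."""
    metrics = {
        "accuracy": accuracy_score(y_true, y_preds),
        "precision": precision_score(y_true, y_preds),
        "recall": recall_score(y_true, y_preds),
        "f1": f1_score(y_true, y_preds),
    }
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
    return metrics


# Hand-made labels, not model output: 3 true positives, 3 true negatives,
# 1 false positive and 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_preds = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluate_preds(y_true, y_preds)
```

Returning the metrics as a dictionary makes it easy to collect results from several models and compare them side by side.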
The Jupyter Notebook can be found on GitHub.