A Step-by-Step Guide to Building a KNN Classifier, One of the Most Popular Machine Learning Algorithms
Updated: May 14, 2020
Have you ever wondered how Spotify prepares your Daily Mix playlists based on your listening patterns?
Have you ever wondered how Netflix recommends movies to you?
The answer to both of these questions lies in machine learning. Thanks to machine learning, we no longer have to program computers in painstaking detail to perform a task, as we once did. Instead, machines can learn from data and, in effect, program themselves, providing the best possible course of action for the task at hand.
Spotify's computers look at which songs you liked and figure out which other songs you might like in the future. Netflix looks at which movies you watched in the past and predicts which movies you might watch next.
The ability of machines to identify patterns in huge volumes of data that humans overlook, or cannot find quickly, is remarkable, and it makes machine learning one of the most exciting technologies of the moment.
In this article, we will walk step by step through building a K-Nearest Neighbors (KNN) classifier, one of the simplest, most versatile, and most commonly applied machine learning algorithms.
In KNN, K is the number of nearest neighbors to consider, and it is the core deciding factor. Suppose F1 is a point whose label needs to be predicted. With K=1, the classifier finds the single training point closest to F1 and assigns that point's label to F1; with a larger K, the labels of the K closest points are combined by majority vote.
K Nearest Neighbor (KNN) Classifier
In this dataset, you have three features, Type of Alcohol (toa), Days of week (dow), and Timings of day (tod), and one label or desired output, Drinking Partner (dp).
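The dataset itself isn't shown here, so the following is a small hypothetical stand-in consistent with the three features and the label (all values, including the partner names, are illustrative assumptions):

```python
# Hypothetical toy dataset (illustrative values only).
# Three feature columns: type of alcohol (toa), day of week (dow),
# time of day (tod), and one label column: drinking partner (dp).
toa = ['Vodka', 'Vodka', 'Beer', 'Wine', 'Beer',
       'Wine', 'Vodka', 'Beer', 'Wine', 'Vodka']
dow = ['Thursday', 'Sunday', 'Thursday', 'Friday', 'Saturday',
       'Monday', 'Friday', 'Sunday', 'Saturday', 'Monday']
tod = ['night', 'night', 'night', 'afternoon', 'morning',
       'night', 'morning', 'afternoon', 'night', 'afternoon']
dp  = ['Elwira', 'Elwira', 'Tom', 'Anna', 'Tom',
       'Anna', 'Elwira', 'Tom', 'Anna', 'Elwira']
```

Each index across the four lists is one observation: for example, row 0 says that on a Thursday night with vodka, the best partner was Elwira.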
Encoding data columns
Many machine learning algorithms require numerical input data, so you need to represent categorical columns as numerical columns.
In order to encode this data, you could map each value to a number. e.g. Vodka:1, Beer:0, and Wine:2
This process is known as label encoding, and sklearn conveniently does it for you with `LabelEncoder`.
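A minimal sketch of what that looks like (the column values are hypothetical, for illustration):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical alcohol-type column for illustration.
toa = ['Vodka', 'Beer', 'Wine', 'Vodka', 'Beer']

le = LabelEncoder()
toa_encoded = le.fit_transform(toa)

# LabelEncoder assigns codes in sorted (alphabetical) order,
# so here Beer -> 0, Vodka -> 1, Wine -> 2.
print(list(toa_encoded))  # [1, 0, 2, 1, 0]
```

Each categorical column (dow, tod, and dp if needed) gets its own encoding pass; note that the alphabetical ordering here happens to match the mapping mentioned above (Vodka:1, Beer:0, Wine:2).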
Here, you will combine the multiple columns or features into a single set of data using the `zip` function.
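For example, assuming the three columns have already been label-encoded, `zip` pairs them up row by row (the encoded values below are illustrative):

```python
# Hypothetical already-encoded feature columns (values are illustrative).
toa_encoded = [1, 4, 0]   # type of alcohol -- wait, see comment below
toa_encoded = [1, 0, 2]   # type of alcohol
dow_encoded = [4, 2, 0]   # day of week
tod_encoded = [2, 1, 0]   # time of day

# zip combines the per-column values into one (toa, dow, tod) tuple per row,
# which is the shape KNeighborsClassifier expects for its samples.
features = list(zip(toa_encoded, dow_encoded, tod_encoded))
print(features)  # [(1, 4, 2), (0, 2, 1), (2, 0, 0)]
```

Each tuple is one training sample, ready to be passed to the classifier alongside the label column.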
Let's build the KNN classifier model.
First, import the KNeighborsClassifier module and create a KNN classifier object by passing the number of neighbors as an argument to KNeighborsClassifier().
Then, fit the model on the training set using fit() and perform prediction on the test set using predict().
Given the input ([1, 4, 2]), i.e. vodka=1, Thursday=4, night=2, the KNN classifier predicts Elwira as the best drinking partner.
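The whole pipeline can be sketched end to end. The dataset below is a hypothetical stand-in (partner names and category values are assumptions, and the toy set is too small for a meaningful train/test split, so the model is fit on all of it):

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy dataset (illustrative values only).
toa = ['Vodka', 'Vodka', 'Beer', 'Wine', 'Beer',
       'Wine', 'Vodka', 'Beer', 'Wine', 'Vodka']
dow = ['Thursday', 'Sunday', 'Thursday', 'Friday', 'Saturday',
       'Monday', 'Friday', 'Sunday', 'Saturday', 'Monday']
tod = ['night', 'night', 'night', 'afternoon', 'morning',
       'night', 'morning', 'afternoon', 'night', 'afternoon']
dp  = ['Elwira', 'Elwira', 'Tom', 'Anna', 'Tom',
       'Anna', 'Elwira', 'Tom', 'Anna', 'Elwira']

# Label-encode each categorical feature column
# (each fit_transform call refits the encoder on that column).
le = LabelEncoder()
features = list(zip(le.fit_transform(toa),
                    le.fit_transform(dow),
                    le.fit_transform(tod)))

# Build and fit the classifier with K = 3 neighbors.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(features, dp)

# Predict for vodka=1, Thursday=4, night=2.
prediction = model.predict([[1, 4, 2]])
print(prediction[0])  # Elwira
```

With K=3, the three nearest neighbors of (1, 4, 2) in this toy data vote Elwira, Elwira, Tom, so the majority label, Elwira, is returned.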
In a nutshell, machine learning algorithms work by estimating a target function (f) that best maps input variables (X), in our case Type of Alcohol (toa), Days of week (dow), and Timings of day (tod), to an output variable (Y), in our model Drinking Partner (dp).
Y = f(X)
A well-generalized model makes accurate predictions (Y) for new examples of input variables (X). However, we don't know what the function (f) looks like or what form it takes. If we did, we could use that function (f) directly and would not need to learn it from data using machine learning algorithms.