Coding kNN From Scratch: Neighbors and Predictions
You’ll use these values to train your kNN model. You also have a
new_data_point, which contains physical measurements for a brand-new abalone, and you’re trying to predict the number of rings this abalone has.
You’d like to know which of those distances is the smallest, but beyond that, you want to know which of the neighbors are closest to your new data point. So instead of just taking
distances and sorting this array, you’re actually going to use
.argsort(). And let’s take a look at what this does.
01:30 So instead of actually getting a sorted array out of this, we’re going to get the appropriate indices that would sort this array. That means that the indices of the closest neighbors will show up first, and the furthest away will show up last.
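To make the difference between sorting and arg-sorting concrete, here is a minimal sketch. The distances are made-up values for illustration, not the abalone data from the lesson.

```python
import numpy as np

# Toy distances from a new point to five training points (made-up values).
distances = np.array([0.9, 0.1, 0.5, 0.05, 0.7])

# np.sort returns the sorted values themselves...
sorted_distances = np.sort(distances)

# ...while .argsort() returns the positions that would sort the array.
sorted_ids = distances.argsort()

print(sorted_ids)  # [3 1 2 4 0]

# Indexing with those positions reproduces the sorted array.
print(np.array_equal(distances[sorted_ids], sorted_distances))  # True
```

The first entry, 3, is the position of the smallest distance, so it identifies the closest neighbor.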
So create a new variable called
nearest_neighbor_ids and set that equal to
distances.argsort(). But instead of returning the entire array of indices, you just want to pick off the first
k, or, in this case, three values.
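As a rough sketch of this step, you could slice the result of .argsort() to keep only the first k indices. The small X array and the use of np.linalg.norm for Euclidean distance are assumptions made for illustration; they stand in for the abalone features and for however distances was computed earlier.

```python
import numpy as np

# Hypothetical stand-in for the training features: 6 samples, 2 features.
X = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
    [0.2, 0.1],
    [5.0, 5.0],
    [0.1, 0.3],
    [3.0, 4.0],
])
new_data_point = np.array([0.1, 0.1])

# Euclidean distance from the new point to every row of X (assumed metric).
distances = np.linalg.norm(X - new_data_point, axis=1)

# Keep only the indices of the k closest rows.
k = 3
nearest_neighbor_ids = distances.argsort()[:k]
print(nearest_neighbor_ids)  # [2 0 4]
```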
And to prove to you that these actually are the three closest neighbors, let’s look at the physical measurements for neighbor number
4045. These values are very similar to the physical measurements of the new data point, and the distance between these two points is very small.
You now know the ID numbers of the three closest neighbors to the new data point. How does this help you make a prediction? Well, remember that for regression problems, you want to average the neighbors’ targets together in order to make a prediction. Remember that the neighbors’ targets are found in the y array.
So if you go look at the value for neighbor number
4045, you would see that this abalone has
9 rings. So create a new variable called
nearest_neighbor_rings, and you can gather these ring values from the
y array by passing in nearest_neighbor_ids as the index.
nearest_neighbor_rings has three different values for rings.
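Here is a small sketch of gathering targets by index. It relies on NumPy's integer-array indexing; the ring counts and neighbor IDs below are hypothetical values, not the lesson's data.

```python
import numpy as np

# Hypothetical targets (ring counts) for six training samples.
y = np.array([9, 15, 10, 7, 11, 20])

# Hypothetical IDs of the three closest neighbors.
nearest_neighbor_ids = np.array([2, 0, 4])

# Indexing with an integer array picks out y[2], y[0], and y[4], in that order.
nearest_neighbor_rings = y[nearest_neighbor_ids]
print(nearest_neighbor_rings)  # [10  9 11]
```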
These are the three ring values for those three nearest neighbors. Because this is a regression problem, the only thing we need to do to make a prediction is to take those rings and average them together, which we can do by taking their mean.
The prediction for our new data point is
10.0 rings. So to make this prediction, you had the new data point’s physical measurements, you found the three closest neighbors to that data point, then you averaged together the closest neighbors’ ring values.
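The three steps in that recap can be sketched as one function. The name knn_regress, the toy arrays, and the Euclidean distance are assumptions for illustration only; this is not the lesson's exact code or data.

```python
import numpy as np


def knn_regress(X, y, new_data_point, k=3):
    """Average the targets of the k rows of X closest to new_data_point."""
    # Step 1: distance from the new point to every training row (assumed Euclidean).
    distances = np.linalg.norm(X - new_data_point, axis=1)
    # Step 2: indices of the k closest rows.
    nearest_neighbor_ids = distances.argsort()[:k]
    # Step 3: average the neighbors' targets.
    return y[nearest_neighbor_ids].mean()


# Hypothetical training data: 6 samples with 2 features each, plus targets.
X = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
    [0.2, 0.1],
    [5.0, 5.0],
    [0.1, 0.3],
    [3.0, 4.0],
])
y = np.array([9, 15, 10, 7, 11, 20])

prediction = knn_regress(X, y, np.array([0.1, 0.1]), k=3)
print(prediction)  # 10.0
```

With these toy values, the three closest rows have targets 10, 9, and 11, so the prediction is their mean.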
This lesson featured a regression problem, but if you have a classification problem, you can just use a majority vote based on the neighbors’ targets. So say that you have your nearest neighbors, and you know what their class labels are. You would probably have those in a NumPy array.
You can use these now to make a prediction for your new data point. There are a couple of different ways that you can do this. Let’s go ahead and import something called Counter from the collections module.
You can go ahead and save this as
class_count. There are several different cool things that you can do with counters, but taking a look at the docstring, you can see that one of the first things that comes up is called .most_common().
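A majority vote with a counter might look like the sketch below. The class labels are hypothetical, and the array name nearest_neighbor_classes is an assumption, since the labels used in the lesson are not shown here.

```python
from collections import Counter

import numpy as np

# Hypothetical class labels of the three nearest neighbors.
nearest_neighbor_classes = np.array(["A", "B", "B"])

# Count how often each label appears (converted to plain Python strings first).
class_count = Counter(nearest_neighbor_classes.tolist())

# most_common(1) returns a list holding the single most frequent (label, count) pair.
predicted_class, votes = class_count.most_common(1)[0]
print(predicted_class, votes)  # B 2
```

One thing to keep in mind: when two labels are tied, .most_common() returns them in the order they were first encountered, so you may want an odd k or an explicit tie-breaking rule.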
05:51 While coding models from scratch can be great for educational purposes, it isn’t usually the most optimized way of doing machine learning. So in the next lessons, you’ll create a kNN model with Python’s scikit-learn library, which will help you perform all sorts of data science tasks.