Using k-Nearest Neighbors (kNN) in Python (Summary)
Now that you know all about the kNN algorithm, you’re ready to start building performant predictive models in Python. These sorts of predictive models can save you lots of time, whether you’re working with data about sea snails or something else.
In this video course, you learned how to:
- Understand the mathematical foundations behind the kNN algorithm
- Code the kNN algorithm from scratch in NumPy
- Use the scikit-learn implementation to fit a kNN with a minimal amount of code
To continue your machine learning journey, check out the Machine Learning Learning Path, and feel free to leave a comment to share any questions or remarks that you may have.
Further Investigation:
- K-Means Clustering in Python: A Practical Guide
- Using pandas and Python to Explore Your Dataset
- Splitting Datasets With scikit-learn and train_test_split()
- Setting Up Python for Machine Learning on Windows
- Starting With Linear Regression in Python
- Logistic Regression in Python
- NumPy, SciPy, and pandas: Correlation With Python
- Python AI: How to Build a Neural Network & Make Predictions
- PyTorch vs TensorFlow for Your Python Deep Learning Project
Or maybe you’d like to learn more about abalones!
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
00:00 Let’s wrap up by summarizing what you’ve learned in this kNN course. First, you got familiar with some of the properties of kNN. k-nearest neighbors is a supervised machine learning algorithm. It’s nonlinear, so it can pick up on complex patterns. kNN is also nonparametric.
00:20 It doesn’t assume any mathematical function between the input features and the target output, so there are no parameters for it to learn. You used kNN for a regression problem in this course, but remember that it can also be used for classification problems to make categorical predictions.
00:40 You also learned the main steps of the k-nearest neighbors algorithm. Given a trained kNN model, which is really just a collection of memorized training data points, a prediction can be made for a new data point by finding that point’s nearest neighbors and predicting based on those neighbors’ targets.
00:58 For regression problems, you just take the average of the neighbors’ targets, and for classification, you use majority vote.
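For instance, once you have the targets of the k nearest neighbors in hand, the prediction step itself is tiny. Here's a minimal sketch with made-up neighbor targets (the values below are hypothetical, not from the course data):

```python
import numpy as np
from collections import Counter

# Hypothetical targets of the k = 3 nearest neighbors
regression_targets = np.array([9, 11, 10])          # e.g. ring counts
classification_targets = ["adult", "young", "adult"]

# Regression: average of the neighbors' targets
regression_prediction = regression_targets.mean()    # 10.0

# Classification: majority vote among the neighbors' labels
classification_prediction = Counter(classification_targets).most_common(1)[0][0]  # "adult"
```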
01:08 And you were able to code up kNN in Python in two ways. First, from scratch, where after a bit of data manipulation, you calculated the distances between the features of a new data point and every other observation.
01:22 Then you found the ID numbers of the point's nearest neighbors with .argsort(), and finally, you made a prediction by collecting and averaging the neighbors' target values.
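Condensed into one function, that from-scratch approach looks roughly like the sketch below. It assumes X_train, y_train, and new_point are NumPy arrays you've already prepared; the variable names are illustrative, not the exact ones from the course:

```python
import numpy as np

def knn_predict(X_train, y_train, new_point, k=3):
    # Euclidean distance from the new point to every training observation
    distances = np.linalg.norm(X_train - new_point, axis=1)
    # ID numbers of the k nearest neighbors, found with .argsort()
    nearest_neighbor_ids = distances.argsort()[:k]
    # Regression prediction: average of the neighbors' target values
    return y_train[nearest_neighbor_ids].mean()
```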
01:36 Secondly, you used scikit-learn to build a kNN model. You split your data into training and test sets using train_test_split(). Next, you instantiated a k-nearest neighbors model for your regression problem and then fit that model to your training data. After that, you made predictions for your test set using the .predict() method.
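In code, that scikit-learn workflow looks roughly like this sketch, assuming X holds your features and y your targets (the test_size, random_state, and n_neighbors values are illustrative):

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=12345
)

# Instantiate a kNN regressor and fit it to the training data
knn_model = KNeighborsRegressor(n_neighbors=3)
knn_model.fit(X_train, y_train)

# Predict on the held-out test set
test_predictions = knn_model.predict(X_test)
```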
01:58 Now that you’ve learned all about kNN, what would you say are its biggest benefits? Well, it’s certainly an intuitive algorithm. You can easily implement it yourself from scratch, and you’ll be able to comfortably explain its main steps.
02:12 kNN is a highly flexible model that can pick up on complicated patterns due to its nonlinear and nonparametric nature. If you think about how the algorithm works, kNN can actually adapt its predictions as new training observations are collected.
02:28 This could be quite useful if you’re working on a problem that has a continuous flow of new data. And finally, kNN essentially has no training time. Many machine learning algorithms need to perform numerous calculations before they’re considered trained, but kNN only needs to store the training dataset, which is virtually instantaneous.
02:51 But on the downside, this virtue can also be considered a drawback. k-nearest neighbors is a so-called lazy learner. That means it does nearly all its work only when a prediction needs to be made.
03:04 Its training time is fast, but its prediction time can be quite slow. In fact, the prediction time scales linearly with the size of its training set. That can be a big drawback.
03:15 Machine learning models typically improve as their training data increases, but with kNN, bigger training sets mean slower predictions. k-nearest neighbors also requires a lot of memory since it needs to keep the full training set to make a prediction.
03:32 This is another big drawback if you want to run your model on a small device, like a phone, say.
03:41 Finally, the physical measurements for the abalone problem in this course were all roughly on the same scale, but if you have some very small inputs and others on the scale of millions, your features will need to be scaled for kNN to judge the distances between points fairly. Otherwise, the large-scale inputs will dominate the distance calculations.
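One common way to handle that, though it wasn't part of the course code, is to put a scaler in front of the kNN model, for example with scikit-learn's StandardScaler in a pipeline:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor

# Standardize each feature before distances are computed
scaled_knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=3))
scaled_knn.fit(X_train, y_train)
scaled_predictions = scaled_knn.predict(X_test)
```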
04:03 If you want to learn more about other algorithms and data science techniques, check out the Real Python Machine Learning With Python learning path. It includes plenty of great tips for getting started, including a course on splitting datasets with scikit-learn and train_test_split(), as well as one on starting with linear regression in Python.
04:26 This concludes Using k-Nearest Neighbors in Python. Thanks for joining me, and enjoy building your own kNN models.
Jerry C on May 24, 2023
This model might help me in my audio classification project. But I wonder if a range of audio frequencies (consider 200 Hz to 8 kHz) would greatly slow processing. Or could the total audio spectrum be condensed by this model. Think of phase shift and time of arrival and distance as parameters. Any comments?