Distinguishing Features of kNN

Kimberly Fessel

Using k-Nearest Neighbors (kNN) in Python Kimberly Fessel 05:50

Transcript
Discussion

00:00 In this lesson, you’ll learn about the distinguishing features of the k-nearest neighbors algorithm, or kNN.

00:09 kNN is a supervised learning algorithm. In general, supervised learning algorithms learn from training data in order to make predictions. You’ll start with a model and feed it data points that will have been labeled with the kinds of target values you’re interested in predicting.

00:25 The model will learn from these inputted observations, and this process is called training or fitting the model. Once that’s finished, you’ll have a trained model ready to go.

00:36 Trained models can then go on to make predictions for new data points. Later, you may have some new data that do not have target labels. You can pass these values into your trained supervised learning model, and the output will be predicted targets. This phase is called predicting.

00:54 kNN works just like this. You’ll pass some training data with known target values into it, and then you’ll be able to use that trained model to make predictions for new data observations that perhaps do not have target labels.

01:11 There are two types of supervised learning problems, classification and regression, and kNN can be used for either type. In classification, you’re trying to build a model that makes categorical predictions, such as whether an observation is a square, circle, or triangle.

01:29 Or, for a real-world example, you might train a classification model to predict whether or not a customer will click on your ad. Regression problems, on the other hand, make numerical predictions, or continuous values from a number line.

01:44 Now your question might be, how much money will a customer spend next month? Since the predictions for this question are dollar amounts rather than discrete categories, this is a regression problem.

01:56 Machine learning models can be built to answer classification or regression questions, and luckily, kNN can be used to predict either.

02:08 Machine learning algorithms can be linear or nonlinear. Linear models make predictions that follow lines or hyperplanes, whereas nonlinear models do not. For example, say you have classification data like bees.

02:23 Height and width are your input variables. You’re trying to predict if the target is a triangle, cross, or star. If you build a linear model to make this prediction, you could end up with a line like this one, which divides the input space into triangles versus non-triangles.

02:41 Your model predicts every point below the line to be a triangle and every point above the line to be a non-triangle.

02:49 If you train a nonlinear model, however, you’ll likely end up with a more complicated classification boundaries. For example, you might create this decision tree model. It says if a point’s height is low, this point is predicted to be a triangle.

03:05 If the height is high, and the width is low, it’s a cross. Otherwise, the point is a star.

03:15 kNN is a nonlinear algorithm. For classification problems, that means it can make more complicated decision boundaries that don’t necessarily follow lines or hyperplanes. For regression problems, its predictions don’t need to follow linear or even monotonic relationships with respect to its inputs.

03:38 And finally, some machine learning algorithms work by learning model parameters. For example, linear regression models are built by finding the best parameters to summarize the training data that they’re exposed to.

03:52 A simple linear regression model like y equals ax plus b has two model parameters, a and b, that are learned during the model fitting phase.

04:03 These parameters are then reused when it comes time to make predictions from the trained model.

04:12 kNN, on the other hand, is a nonparametric algorithm. That means it does not have model parameters that need to be fitted during training. Predictions are made directly from the training data itself.

04:24 This nonparametric nature makes kNN a highly flexible algorithm since it doesn’t attempt to force any model form onto the data. However, to be able to make predictions, a trained kNN model must retain the entire training set.

04:40 That can be highly memory intensive for problems with a lot of training data, and it can also mean that kNN takes a long time to make predictions.

04:52 In summary, kNN is a machine learning algorithm. It is considered a supervised algorithm because it learns from data with labeled target values. kNN can be used for either classification or regression problems.

05:06 It’s a nonlinear algorithm, and it’s also nonparametric. That makes kNN highly flexible, and as we’ll see in upcoming videos, its logic is really intuitive, and it’s easy to explain.

05:20 But on the downside, kNN tends to be memory intensive and inefficient for large amounts of data since it needs to keep the whole training set in order to make predictions.

05:34 In the rest of this course, you’ll learn how the kNN algorithm works and how to use it to make predictions—namely, predicting the age of sea snails like this little guy.

05:43 He doesn’t look a day over ten to me, but you’ll learn all about his dataset coming up next.

Become a Member to join the conversation.