K-Nearest Neighbors (KNN) is an instance-based algorithm. You won't build an ML model with trainable variables here; instead, you compare the point you want to classify against the already known data and predict from its neighbors.
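To make "compare with the known data" concrete, here is a minimal sketch of KNN classification in NumPy (the function name and toy data are illustrative, not from any library):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Distance from the query point to every stored training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Labels of the k nearest stored points
    nearest_labels = y_train[np.argsort(dists)[:k]]
    # Majority vote among those k neighbors
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: two points of class "a" near (1, 1), two of class "b" near (5, 5)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(["a", "a", "b", "b"])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # → a
```

Note that there is no training step at all: the "model" is simply the stored data plus a distance computation at prediction time.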
The value of K decides how well your model fits the data. If K is very low, say 1, the predicted class is simply that of the single closest data point, so even two query points lying very close together can receive different class labels. This means your model is overfitted, i.e. it has very high variance.
If K is very high, say 10, the predicted class is the majority among the 10 closest points. So even if 3-4 points of one class sit very close to the query point while the other 6-7 points belong to a different class and lie farther away, the query point will still be assigned the farther, majority class. This means your model is underfitted, i.e. it has high bias.
An appropriate value therefore has to be chosen for K. This rarely happens in one go and usually requires some testing.
The value K here is a 'hyperparameter': it cannot be learned by the algorithm itself and has to be provided by the developer.
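One common way to do that testing is cross-validation over a range of K values. A sketch using scikit-learn (assuming it is installed; the dataset and K range are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try K = 1..15 and score each with 5-fold cross-validation
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": range(1, 16)},
    cv=5,
)
search.fit(X, y)

print("best K:", search.best_params_["n_neighbors"])
```

The K with the best cross-validated accuracy balances the overfitting of small K against the underfitting of large K.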
