
Talk:K-nearest neighbors algorithm

This article is within the scope of WikiProject Robotics, a collaborative effort to improve the coverage of Robotics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-class on Wikipedia's content assessment scale.
This article has been rated as Mid-importance on the project's importance scale.

Comment

I will write a more complete article for this when the exam season has passed.


Are there not any weighted kNN algorithms, that will take (to some degree) the actual distances into account? If the 5 closest neighbors are 0.1, 4000, 10000, 20000 and 55570 units away, we might say the very close first match has more importance than the others.

  • For an example of a weighted kNN algorithm, see F. Nigsch et al., Journal of Chemical Information and Modeling, 46, 2412-2422 (2006), DOI: 10.1021/ci060149f
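As a rough numerical sketch of such weighting (the class labels here are made up for illustration, and inverse-distance weights are only one common choice), the very close first neighbour ends up dominating the vote:

    # Inverse-distance-weighted kNN vote over the five example distances above.
    from collections import defaultdict

    distances = [0.1, 4000.0, 10000.0, 20000.0, 55570.0]
    labels    = ["A", "B", "B", "B", "B"]        # hypothetical labels for the 5 neighbours

    votes = defaultdict(float)
    for d, label in zip(distances, labels):
        votes[label] += 1.0 / d                  # closer neighbours get larger weights

    print(max(votes, key=votes.get))             # prints "A": weight 10 vs. about 0.0004 for "B"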

So what is the complexity of kNN? I'm guessing O(n^2) but I'm not 100 % sure...

A kd-tree can be built in O(n log(n)) time, and queried in O(log(n)) time. A Ball-tree provides better guarantees and can be even faster than a kd-tree for finding neighbors. It's only O(n^2) if you're lazy and you code it up by comparing distances between every pair of points. Also, yes weighted k-NN is very common. If you weight by the inverse of the distance, it is called linear weighting because it is equivalent to using linear interpolation. This weighting is especially good for regression problems. For classification, it is more common to weight by the inverse of the square of the distance because it's faster to compute and tends to classify slightly better. This article also requires some significant discussion of feature-scaling, which can make k-NN significantly more accurate and more robust to irrelevant attributes. I like to use a simple hill-climber with cross-validation on the training set to adjust feature scaling factors. It improves accuracy by a significant amount. --128.187.80.2 (talk) 22:26, 9 December 2008 (UTC)[reply]
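A minimal sketch of the tree-based lookup described above, assuming SciPy and scikit-learn are available (the random data is purely illustrative):

    # Querying the k nearest neighbours with a kd-tree and a ball tree.
    # Building either tree takes O(n log n); a single low-dimensional query is roughly O(log n).
    import numpy as np
    from scipy.spatial import cKDTree
    from sklearn.neighbors import BallTree

    rng = np.random.default_rng(0)
    X = rng.random((1000, 3))                    # 1000 training points in 3 dimensions
    q = rng.random(3)                            # a single query point

    kd_dist, kd_idx = cKDTree(X).query(q, k=5)                  # kd-tree lookup
    bt_dist, bt_idx = BallTree(X).query(q.reshape(1, -1), k=5)  # ball-tree lookup

    weights = 1.0 / (kd_dist ** 2 + 1e-12)       # inverse-square weights, as suggested above for classification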
For a single classification result, it's O(n) (a brute-force sketch follows below):
  • Step 1: calculate the distances: O(n)
  • Step 2: find the k smallest: also O(n)--see selection algorithm
If there are multiple classification results, perhaps this can be improved, but then we have to introduce another variable: m, say, for the number of classification results (I call them "test points")... Peteymills (talk) 15:17, 15 June 2010 (UTC)[reply]
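A brute-force sketch of the single-query case just described (NumPy's argpartition plays the role of the selection algorithm; the data layout is an assumption for illustration):

    # Single-query kNN classification in O(n):
    # Step 1: compute all n distances; Step 2: select the k smallest (average O(n)); then majority vote.
    import numpy as np

    def knn_classify(X, y, query, k=5):
        dists = np.linalg.norm(X - query, axis=1)    # Step 1: O(n) distances
        nearest = np.argpartition(dists, k)[:k]      # Step 2: O(n) selection of the k smallest
        labels, counts = np.unique(y[nearest], return_counts=True)
        return labels[np.argmax(counts)]             # majority vote among the k neighbours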

Replying to the comments above:

I think the complexity of kNN is O(n^2), but that is only a guess. The complexity of NNS (nearest neighbor search) on a kd-tree is O(n log n) for finding one item.

I'm studying kNN across multiple nodes, so I made a variant of kNN that I call Hk-NN. It reduces processing time effectively. I would like to share it and discuss it in order to find a better Hk-NN method. I don't know the cost of kNN, and everyone who claims to know the cost has only been guessing.

If you know kNN's processing cost with accuracy, please send me a message by e-mail.

My e-mail: hanjjon@gmail.com

Maybe this draft version of an information retrieval book will help you. Chapter 14 discusses kNN. http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html Hope this helps. --GrandiJoos 07:59, 1 October 2007 (UTC)[reply]

Kriging?

How did any of this Kriging stuff get in here (history)? —Preceding unsigned comment added by 66.92.232.105 (talk) 23:15, 16 March 2008 (UTC)[reply]

I think the Kriging stuff was given too much prominence. Is there a consensus on its relevance and whether it would be better further down the article?

Jbom1 (talk) 16:45, 17 March 2008 (UTC)[reply]

Listen - Kriging has nothing to do with K-nearest neighbor, and shouldn't be in this article. KNN is a classifier - given a set of labelled multidimensional points and a user choice of K, it assigns a new point the majority label among its K nearest neighbors. Kriging is a way to take a set of multidimensional points and then, for a new point with one dimension unknown, interpolate to generate that unknown. Simply put, KNN does not rely on Kriging, and no relationship is apparent from the paragraph on Kriging in this article. —Preceding unsigned comment added by 158.130.14.30 (talk) 16:58, 17 March 2008 (UTC)[reply]

Split this topic in two?

I agree with the previous comment. KNN is a statistical clustering algorithm which uses density linkage. Maybe the two subjects should be treated separately? 196.36.217.137 (talk) 17:23, 3 December 2008 (UTC)[reply]

This article is all about instance learning, which is perhaps the most common reason to find the k-nearest-neighbors, but there are many other uses as well. For example, the k-nearest-neighbors are often used to establish neighborhood metrics for manifold learning. I propose that we make another article called "knn_instance_learner" and move all the stuff about regression and classification into it. This article should be about algorithms for finding the k-nearest neighbors (like kd-trees, Ball-trees, etc.), no matter what the reason is for doing so. --128.187.80.2 (talk) 22:31, 9 December 2008 (UTC)[reply]
No, keep it here IMO. The other article you have in mind is Nearest neighbor search --mcld (talk) 20:42, 20 October 2009 (UTC)[reply]
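For what it's worth, a minimal sketch of the neighbourhood-graph use mentioned above (the kind of k-nearest-neighbour graph that manifold-learning methods start from; SciPy is assumed, and the function name is just for illustration):

    # Build a k-nearest-neighbour graph: per-point neighbour indices and distances.
    import numpy as np
    from scipy.spatial import cKDTree

    def knn_graph(X, k=10):
        tree = cKDTree(X)
        dists, idx = tree.query(X, k=k + 1)     # k+1 because each point's nearest neighbour is itself
        return dists[:, 1:], idx[:, 1:]         # drop the self-match in the first column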

"Furthest" Neighbors?

I removed the reference to "furthest" neighbours from the overview. kNN regression is done on the basis of a (possibly weighted) average of the properties of nearest neighbours. See the Nigsch et al. article cited earlier in the discussion.
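A minimal sketch of kNN regression as a weighted average of neighbour properties, along the lines described above (inverse-distance weighting is one common choice, not necessarily the one used by Nigsch et al.):

    # kNN regression: inverse-distance-weighted average of the k nearest neighbours' values.
    import numpy as np

    def knn_regress(X, y, query, k=5):
        dists = np.linalg.norm(X - query, axis=1)
        nearest = np.argpartition(dists, k)[:k]
        w = 1.0 / (dists[nearest] + 1e-12)       # closer neighbours count more
        return float(np.dot(w, y[nearest]) / w.sum())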