Why need to alternate dimension in kd-tree construction

383 Views Asked by At

I have a question regarding the way to partition the spaces in kd-tree algorithm.

Assuming I have points in the plane, with (x,y) coordinate. Assuming we're not in a particular situation when points are in the same line. I was thinking why we need to alternate the splitting coordinate, at one level, use x axis, the following level, use y axis. What matters if we use only x direction to split spaces, we always have a binary tree, and search algorithm always take log(n) in average (assuming we have relatively well balanced tree).

What give me more when I split space by alternating splitting directions? I wonder if it's related to some general probabilistic properties in multi-dimension?

1

There are 1 best solutions below

0
On

I think the problem comes when you start searching the tree, for example with a window query (rectangular query).

Lets assume a rectangular dataset with evenly distributed points between -1,000 and 1,000 in every direction. If you sort by x, then a query that should return all point with (-900 < x < 900) && (1 < y < 10) may have to scan almost the whole tree. The log(n) search would only work if your query only limits x and not y, i.e. (1<x<10) && (-inf<y<+inf).