Fast Approximation of Nearest Neighbors

Spatial Association Search

Suppose you have a large number entities spread out over a large area and you need to know the nearest other entity or entities. Suppose you also have fairly tight time constraints and you don't mind some imperfection. I present an algorithm to quickly and simultaneously approximate the nearest neighbors for a large number of objects.

Each of entities has M pointers to other entities. These pointers all start by referencing the entity itself. As often as desirable, we will iterate over all entities, exchanging one of these pointers with a pointer from another entity that we have a pointer to. We swap only when the total distance between both entities and those entities being considered for a swap would be reduced by the swap. We never swap a pointer to an entity to that same entity, and should an entity have a pointer to itself, it shall be swapped at random with any pointer in any other entity.

For each iteration of this loop, the average distance to discovered neighbors will decrease to some steady state. However, this is not completely desirable: it is possible that this steady state has no path to discover better neighbors, or that loops of pointers have formed. To correct for this, we will swap a small fraction (~0.5%) of pointers at random each iteration.

When an entity needs to find nearby entities, simply iterate over its pointers to other entities and consider those entities as the results. This can be repeated to any desired depth by searching those entities' pointers, but a greater depth returns many more results and may be stuck in a cycle. Entities returned in this search will typically be the nearest, but there is no guarantee.

Removing an element requires recovering all pointers to an entity by iterating through all entities and swapping any pointers to the deleted entity with one of the pointers available from the deleted entity.

Performance

Insertion: O(1)

Search: O(M^k)

Deletion: O(N)

Iteration: O(N)

Where k is the depth (recommended 0-3).

The number of iterations required to reach 50% accuracy (percentage of neighbors that match the true nearest neighbor) grows with the log of the number of entities - about 50 iterations at 20,000 entities. Average distance for approximated neighbors is about 20% further than the true nearest neighbor. Larger numbers of entities slow down and limit the improvement in locating optimal neighbors.

On the machine I tested with, each iteration takes ~0.2µs per entity, Searches range from ~0.02µs (depth 0) to 5µs (depth 3). This makes it practical to do large numbers of approximate searches for a real time application.

Download the example implementation (C++) used in this testing here: SAS Example