1 Introduction

Finding nearest neighbors in k-dimensional space is a task encountered in many data processing problems. In the context of time-series analysis, for example, it occurs whenever one is interested in local properties in a reconstructed phase space. Examples are predictions, noise reduction, or Lyapunov exponent estimates based on local fits to the dynamics, as well as the calculation of dimension estimates. Other applications in physics include simulations of molecular dynamics with finite-range interactions, where a box-oriented approach called the "link-cell algorithm" is used [Fincham & Heyes, 1985, Form et al., 1992]. As long as only small sets (say n < 1000 points) are evaluated, neighbors can be found in a straightforward way by computing the n²/2 distances between all pairs of points. However, numerical simulations and, to an increasing degree, experiments are able to provide much larger amounts of data. As data sets grow, efficient handling becomes more important.

Neighbor searching and related problems of computational geometry have been studied extensively in computing science, with a rich literature covering both theoretical and practical issues. General references include [Sedgewick, 1990, Preparata & Shamos, 1985, Gonnet & Baeza-Yates, 1991, Mehlhorn, 1984]. In particular, tree-like data structures are studied in [Omohundro, 1987, Bentley, 1980, Bentley, 1990], and bucket (or box) based methods in [Noga & Allison, 1985, Devroye, 1986, Asano et al., 1985]. Although considerable expertise is required to find and implement an optimal algorithm, we want to demonstrate in this paper that a substantial gain in efficiency can be achieved with relatively little effort. Using any intelligent algorithm can reduce CPU time (or increase the maximal amount of data that can be handled with reasonable resources) by orders of magnitude. Compared with this, the differences among these methods and the gains from refinements of an existing algorithm are rather marginal. Thus we give a simple and general algorithm that is worth the effort even for sets of only moderate size.

The box-assisted algorithm given here was developed heuristically in the context of time series analysis [Grassberger, 1990, Grassberger et al., 1991]. Similar procedures have been proposed for this purpose in [Theiler, 1987, Kostelich & Yorke, 1988], while the k-d tree (Sec. 2) seems to be the most popular approach [Bingham & Kot, 1989, Farmer & Sidorowich, 1988]. In Sec. 3 we describe a very simple version of a box-assisted algorithm for finding all points closer than a given distance ε. In Sec. 4 we describe how it can be improved by using linked lists. To illustrate the usefulness of this data structure, a very fast sorting method is presented. Furthermore, we describe how to modify the basic algorithm in order to find a given number of neighbors rather than a neighborhood of fixed diameter. In Sec. 5 we discuss some examples of the performance of the algorithms described. For comparison we give results obtained using the k-d tree algorithm described in [Bingham & Kot, 1989], which represents a similar level of sophistication. Both the box-assisted and the tree implementations used were chosen for simplicity rather than optimality. The reader who wants to go beyond this will find some suggestions in Sec. 6.
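The contrast between the straightforward all-pairs search and a box-based search can be illustrated with a minimal Python sketch. It is not the algorithm of Sec. 3. The function names `brute_force_pairs` and `box_assisted_pairs` are hypothetical, and three further choices are assumptions made only for illustration: boxes are stored in a hash map (`dict`) rather than a fixed array, box side equals ε, and distances are Euclidean. The brute-force version examines all n(n-1)/2 pairs. The box version exploits the fact that two points closer than ε can differ by less than ε in every coordinate, so they must lie in the same box or in adjacent boxes.

```python
import itertools
import math
import random
from collections import defaultdict


def brute_force_pairs(points, eps):
    """All index pairs (i, j), i < j, with Euclidean distance < eps.
    Examines every one of the n(n-1)/2 pairs: O(n^2) work."""
    pairs = set()
    for i, j in itertools.combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < eps:
            pairs.add((i, j))
    return pairs


def box_assisted_pairs(points, eps):
    """Same result, but points are first sorted into cubic boxes of side eps
    (assumption: boxes kept in a dict keyed by integer box coordinates).
    Only points in the same or in adjacent boxes are compared."""
    boxes = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(c / eps) for c in p)
        boxes[key].append(idx)

    dim = len(points[0]) if points else 0
    # The box itself plus its 3^dim - 1 neighbours.
    offsets = list(itertools.product((-1, 0, 1), repeat=dim))

    pairs = set()
    for key, members in boxes.items():
        for off in offsets:
            neighbour = tuple(k + o for k, o in zip(key, off))
            for i in members:
                for j in boxes.get(neighbour, ()):
                    # i < j ensures each unordered pair is reported once.
                    if i < j and math.dist(points[i], points[j]) < eps:
                        pairs.add((i, j))
    return pairs


if __name__ == "__main__":
    random.seed(1)
    pts = [(random.random(), random.random()) for _ in range(1000)]
    same = brute_force_pairs(pts, 0.02) == box_assisted_pairs(pts, 0.02)
    print("identical results:", same)
```

Both functions return the same set of pairs. The box version compares each point only with points in its own box and the 3^k - 1 adjacent ones, so it pays off when ε is small compared with the extent of the data. The number of adjacent boxes grows quickly with the dimension k, however.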

2 The classical approach: multidimensional trees

How to find...