An approximate nearest neighbours search algorithm based on the Extended General Spacefilling Curves Heuristic
Abstract
This paper proposes an algorithm for approximate nearest neighbour search in vector spaces, based on the Extended General Spacefilling Curves Heuristic (EGSH). Under this general scheme, several mappings are established between a region of a multidimensional real vector space and an interval of the real line, and the problem is then solved in one dimension for each mapping. To this end, the real values that represent the prototypes are stored in several ordered data structures (e.g. B-trees). The nearest neighbours of a test point are then searched efficiently in each structure and added to a set of candidate neighbours. Finally, the distance from each candidate to the test point is measured in the original multidimensional space, and the nearest one(s) are chosen.
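The scheme outlined in the abstract can be illustrated with a minimal Python sketch. It is not the EGSH construction itself: the mappings here are Z-order (Morton) curves with a random per-mapping shift, chosen only because they are simple to write down, and a sorted list searched with `bisect` stands in for the ordered structure (e.g. a B-tree). The names `morton_key`, `build_index`, `approx_nearest`, and the parameters `BITS`, `n_mappings`, and `window` are hypothetical, and points are assumed to lie in the unit hypercube.

```python
import bisect
import math
import random

BITS = 10  # bits per coordinate when quantising (an assumed setting)


def morton_key(point, shift, bits=BITS):
    """Map a point in [0, 1)^d to an integer by interleaving coordinate bits.

    The Z-order curve is a stand-in for the paper's mappings (assumption).
    """
    scale = 1 << bits
    # Shift each coordinate, wrapping around, so each mapping cuts space differently.
    cells = [min(int(((x + s) % 1.0) * scale), scale - 1)
             for x, s in zip(point, shift)]
    key = 0
    for bit in range(bits - 1, -1, -1):  # most significant bit first
        for c in cells:
            key = (key << 1) | ((c >> bit) & 1)
    return key


def build_index(prototypes, n_mappings=4, seed=0):
    """Build one sorted list of (key, prototype index) pairs per mapping."""
    rng = random.Random(seed)
    dim = len(prototypes[0])
    index = []
    for _ in range(n_mappings):
        shift = [rng.random() for _ in range(dim)]
        keyed = sorted((morton_key(p, shift), i) for i, p in enumerate(prototypes))
        index.append((shift, keyed))
    return index


def approx_nearest(prototypes, index, query, window=3):
    """Return the index of an approximate nearest prototype to `query`."""
    candidates = set()
    for shift, keyed in index:
        # One-dimensional search: locate the query key in the sorted list.
        pos = bisect.bisect_left(keyed, (morton_key(query, shift), -1))
        lo, hi = max(0, pos - window), min(len(keyed), pos + window)
        candidates.update(i for _, i in keyed[lo:hi])
    # Final step: measure true distances in the original space.
    return min(candidates, key=lambda i: math.dist(prototypes[i], query))
```

A call such as `approx_nearest(prototypes, build_index(prototypes), query)` returns the index of the chosen prototype. In this sketch, increasing `n_mappings` or `window` enlarges the candidate set, trading search time for a better chance of finding the true nearest neighbour; once `window` reaches the number of prototypes, the result coincides with exhaustive search.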
Keywords
Test Point; Uniform Random Distribution; Exhaustive Search Method; Neighbour Search Algorithm; Temporal Cost