Finding Data Broadness Via Generalized Nearest Neighbors

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3896)

Abstract

A data object is broad if it is one of the k-Nearest Neighbors (k-NN) of many data objects. We introduce a new database primitive called Generalized Nearest Neighbor (GNN) to express data broadness. We also develop three strategies to answer GNN queries efficiently for large datasets of multidimensional objects. The R*-tree-based search algorithm generates candidate pages and ranks them based on their distances. Our first algorithm, Fetch All (FA), fetches as many candidate pages as possible. Our second algorithm, Fetch One (FO), fetches one candidate page at a time. Our third algorithm, Fetch Dynamic (FD), dynamically decides on the number of pages that need to be fetched. We also propose three optimizations, Column Filter, Row Filter and Adaptive Filter, to eliminate pages from each dataset. Column Filter prunes the pages that are guaranteed to be non-broad. Row Filter prunes the pages whose removal does not change the broadness of any data point. Adaptive Filter prunes the search space dynamically along each dimension to eliminate unpromising objects. Our experiments show that FA is the fastest when the buffer size is large and FO is the fastest when the buffer size is small. FD is always either the fastest or very close to the faster of FA and FO. FD is also significantly faster than the existing methods adapted to the GNN problem.
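The notion of broadness defined above can be illustrated with a brute-force count. The following is a minimal sketch, not the FA, FO, or FD algorithms of the paper: it uses no R*-tree and no page scheduling or filtering, and it assumes Euclidean distance. The function name `broadness` and the split into a `data` list and a `queries` list are hypothetical choices made for illustration only.

```python
import heapq
import math
from collections import Counter


def broadness(data, queries, k):
    """For each object in `data`, count how many objects in `queries`
    have it among their k nearest neighbors (brute force, Euclidean).

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Start every data object at zero so non-broad objects are reported too.
    counts = Counter({i: 0 for i in range(len(data))})
    for q in queries:
        # Indices of the k objects in `data` closest to q.
        nearest = heapq.nsmallest(
            k, range(len(data)), key=lambda i: math.dist(q, data[i])
        )
        counts.update(nearest)  # each index in `nearest` gains one vote
    return counts


# Toy example with made-up 2-D points.
data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
queries = [(0.1, 0.0), (0.9, 0.1), (0.4, 0.5), (4.0, 4.0)]
counts = broadness(data, queries, k=1)
```

With k = 1, object 0 is the nearest neighbor of two query points and is therefore the broadest object in this toy input. Passing the same list as both `data` and `queries` makes every object count itself as a neighbor; whether that is desired depends on the application. This sketch computes a distance for every pair of objects, which is the cost that index-based strategies such as those described in the abstract aim to avoid on large datasets.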

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Venkateswaran, J., Kahveci, T., Camoglu, O. (2006). Finding Data Broadness Via Generalized Nearest Neighbors. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_39

  • DOI: https://doi.org/10.1007/11687238_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32960-2

  • Online ISBN: 978-3-540-32961-9

  • eBook Packages: Computer Science, Computer Science (R0)
