Progressive High-Dimensional Similarity Join

  • Wee Hyong Tok
  • Stéphane Bressan
  • Mong-Li Lee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4653)

Abstract

The Rate-Based Progressive Join (RPJ) is a non-blocking relational equijoin algorithm that delivers results progressively. In this paper, we first present a naive extension of RPJ, called neRPJ, to the progressive computation of similarity joins over high-dimensional data. We argue that this naive extension is not suitable. We therefore propose the Result-Rate Progressive Join (RRPJ) for high-dimensional distance similarity joins. Using both synthetic and real-life datasets, we empirically show that RRPJ is effective and efficient, and that it outperforms the naive extension.
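To make the notion of a progressive distance similarity join concrete, the following is a minimal in-memory sketch, not the RRPJ or neRPJ algorithm from the paper. It assumes two interleaved input streams labelled "R" and "S", Euclidean distance, and a hypothetical threshold parameter `eps`; the function name and all identifiers are illustrative. Each arriving point is probed against the points already seen from the other stream, and matching pairs are emitted immediately instead of after both inputs are fully read. The sketch omits the memory-flushing policy, which is where rate-based approaches such as RPJ and RRPJ make their decisions.

```python
import math
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Sequence[float]


def progressive_similarity_join(
    arrivals: Iterable[Tuple[str, Point]], eps: float
) -> Iterator[Tuple[Point, Point]]:
    """Yield (r, s) pairs with Euclidean distance <= eps as tuples arrive.

    `arrivals` is a stream of (source, point) items, where source is "R" or "S".
    Assumption: everything fits in memory, so no flushing policy is modelled.
    """
    seen: Dict[str, List[Point]] = {"R": [], "S": []}
    for source, point in arrivals:
        other = "S" if source == "R" else "R"
        # Probe the new point against all points seen so far on the other side.
        for candidate in seen[other]:
            if math.dist(point, candidate) <= eps:
                # Always report the pair in (R point, S point) order.
                yield (point, candidate) if source == "R" else (candidate, point)
        # Store the new point so later arrivals from the other side can match it.
        seen[source].append(point)


if __name__ == "__main__":
    stream = [
        ("R", (0.0, 0.0, 0.0)),
        ("S", (0.1, 0.0, 0.0)),
        ("S", (5.0, 5.0, 5.0)),
        ("R", (5.0, 5.0, 5.2)),
    ]
    for r, s in progressive_similarity_join(stream, eps=0.5):
        print(r, s)
```

Because the function is a generator, a consumer receives the first matching pair as soon as the second input tuple has arrived, which is the non-blocking behaviour the abstract refers to.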

References

  1. Tao, Y., Yiu, M.L., Papadias, D., Hadjieleftheriou, M., Mamoulis, N.: RPJ: Producing fast join results on streams through rate-based optimization. In: SIGMOD, pp. 371–382 (2005)
  2. Tok, W.H., Bressan, S., Lee, M.-L.: RRPJ: Result-rate based progressive relational join. In: DASFAA, pp. 43–54 (2007)
  3. Tok, W.H., Bressan, S., Lee, M.-L.: Progressive spatial joins. In: SSDBM, pp. 353–358 (2006)
  4. Shim, K., Srikant, R., Agrawal, R.: High-dimensional similarity joins. In: ICDE, pp. 301–311 (1997)
  5. Koudas, N., Sevcik, K.C.: High dimensional similarity joins: Algorithms and performance evaluation. IEEE Transactions on Knowledge and Data Engineering 12(1), 3–18 (2000)
  6. Böhm, C., Braunmüller, B., Breunig, M.M., Kriegel, H.-P.: High performance clustering based on the similarity join. In: CIKM, pp. 298–305 (2000)
  7. Böhm, C., Braunmüller, B., Krebs, F., Kriegel, H.-P.: Epsilon grid order: An algorithm for the similarity join on massive high-dimensional data. In: SIGMOD, pp. 379–388 (2001)
  8. Kalashnikov, D.V., Prabhakar, S.: Fast similarity join for multi-dimensional data. Inf. Syst. 32(1), 160–177 (2007)
  9. Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: SIGMOD, pp. 47–57 (1984)
  10. Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: An index structure for high-dimensional data. In: VLDB, pp. 28–39 (1996)
  11. Koudas, N., Sevcik, K.C.: High dimensional similarity joins: Algorithms and performance evaluation. In: ICDE, pp. 466–475 (1998)
  12. Urhan, T., Franklin, M.J.: XJoin: Getting fast answers from slow and bursty networks. Technical Report CS-TR-3994, University of Maryland (1999)
  13. Dittrich, J.-P., Seeger, B., Taylor, D.S., Widmayer, P.: Progressive merge join: A generic and non-blocking sort-based join algorithm. In: VLDB, pp. 299–310 (2002)
  14. Mokbel, M.F., Lu, M., Aref, W.G.: Hash-merge join: A non-blocking join algorithm for producing fast and early join results. In: ICDE, pp. 251–263 (2004)
  15. Wilschut, A.N., Apers, P.M.G.: Dataflow query execution in a parallel main-memory environment. In: PDIS, pp. 68–77 (1991)
  16. Corel image features dataset (1999), http://kdd.ics.uci.edu/

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Wee Hyong Tok (1)
  • Stéphane Bressan (1)
  • Mong-Li Lee (1)
  1. School of Computing, National University of Singapore
