Probabilistic Similarity Join on Uncertain Data

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3882)

Abstract

An important database primitive for commonly used feature databases is the similarity join. It combines two datasets into one based on a similarity predicate, so that the result contains pairs of objects from the two original sets. In many application areas, e.g. sensor databases, location-based services, or face recognition systems, distances between objects have to be computed from vague and uncertain data. In this paper, we propose to express the similarity between two uncertain objects by probability density functions that assign a probability value to each possible distance value. By integrating these probabilistic distance functions directly into the join algorithms, the full information they provide is exploited. The resulting probabilistic similarity join assigns to each object pair a probability value indicating the likelihood that the pair belongs to the result set. As computing these probability values is very expensive, we introduce an efficient join processing strategy, using the distance-range join as an example. In a detailed experimental evaluation, we demonstrate the benefits of our probabilistic similarity join. The experiments show that we can achieve high-quality join results at rather low computational cost.
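
To make the idea of a probabilistic distance-range join concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each uncertain object is given as a small set of equally weighted sample points (a representation chosen here only for illustration), estimates the probability that two objects lie within distance eps as the fraction of sample pairs that do, and joins the two sets with a naive nested loop. The names pair_probability, probabilistic_range_join, eps, and min_prob are hypothetical; the efficient processing strategy mentioned in the abstract is not reproduced.

```python
import itertools
import math


def pair_probability(samples_a, samples_b, eps):
    """Estimate P(dist(a, b) <= eps) from equally weighted sample points.

    Assumption: each uncertain object is described by a list of sample
    points (tuples of floats) that all carry the same weight.
    """
    hits = sum(
        1
        for p, q in itertools.product(samples_a, samples_b)
        if math.dist(p, q) <= eps
    )
    return hits / (len(samples_a) * len(samples_b))


def probabilistic_range_join(R, S, eps, min_prob=0.0):
    """Naive nested-loop join over two dicts mapping object id -> samples.

    Returns (r_id, s_id, probability) triples whose estimated probability
    exceeds min_prob. This is a brute-force baseline, not an optimized join.
    """
    result = []
    for r_id, r_samples in R.items():
        for s_id, s_samples in S.items():
            prob = pair_probability(r_samples, s_samples, eps)
            if prob > min_prob:
                result.append((r_id, s_id, prob))
    return result


# Toy data: two-dimensional uncertain objects with two samples each.
R = {"r1": [(0.0, 0.0), (1.0, 0.0)]}
S = {
    "s1": [(0.0, 1.0), (3.0, 0.0)],
    "s2": [(10.0, 10.0), (11.0, 10.0)],
}

print(probabilistic_range_join(R, S, eps=1.5))
# prints [('r1', 's1', 0.5)]
```

In this toy input, two of the four sample pairs of r1 and s1 are within distance 1.5, so the pair is reported with probability 0.5, while r1 and s2 have no sample pair within range and are omitted. Raising min_prob would keep only pairs that are likely to belong to the result set.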


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kriegel, HP., Kunath, P., Pfeifle, M., Renz, M. (2006). Probabilistic Similarity Join on Uncertain Data. In: Li Lee, M., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_22

  • DOI: https://doi.org/10.1007/11733836_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33337-1

  • Online ISBN: 978-3-540-33338-8

  • eBook Packages: Computer Science, Computer Science (R0)
