Abstract
In many modern applications, such as biometric identification systems, sensor networks, medical imaging, geology, and multimedia databases, data objects are not described exactly. Recent solutions therefore model data objects by probability density functions (pdfs). Since the pdf describing an uncertain object is often not explicitly known, approximation techniques such as Gaussian mixture models (GMMs) need to be employed. In this paper, we introduce a method for efficiently indexing and querying GMMs that allows fast object retrieval for arbitrarily shaped pdfs. We consider probability ranking queries, which are very important for probabilistic similarity search. Our method stores the components and weighting functions of each GMM in an index structure. During query processing, the mixture models are dynamically reconstructed whenever necessary. In an extensive experimental evaluation, we demonstrate that GMMs yield a compact and descriptive representation of video clips. Additionally, we show that our new query algorithm outperforms competitive approaches when answering the given probabilistic queries on a database of GMMs comprising about 100,000 single Gaussians.
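The idea of storing mixture components separately and reconstructing each object's mixture at query time can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes axis-parallel (diagonal-covariance) Gaussians, an exact query point, and a uniform prior over objects, and it replaces the index structure with a linear scan over a flat list. The names `COMPONENTS`, `gaussian_pdf`, and `rank_objects`, as well as the sample data, are hypothetical.

```python
import math
from collections import defaultdict


def gaussian_pdf(x, mean, std):
    """Density of an axis-parallel (diagonal-covariance) Gaussian at point x."""
    density = 1.0
    for xi, mu, sigma in zip(x, mean, std):
        z = (xi - mu) / sigma
        density *= math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return density


# Hypothetical flat store with one row per Gaussian component:
# (object_id, weight, mean, std). The weights of each object sum to 1.
# A real system would keep these rows in an index instead of a list.
COMPONENTS = [
    ("clip_a", 0.5, (0.0, 0.0), (1.0, 1.0)),
    ("clip_a", 0.5, (4.0, 4.0), (1.0, 1.0)),
    ("clip_b", 1.0, (2.0, 2.0), (0.5, 0.5)),
]


def rank_objects(query, components, k):
    """Rebuild each object's mixture density at `query` and return the top k.

    Each result is (object_id, probability), where the probability is the
    object's density normalized over all objects (uniform prior assumed).
    """
    density = defaultdict(float)
    for obj_id, weight, mean, std in components:
        # Summing the weighted components reconstructs the mixture per object.
        density[obj_id] += weight * gaussian_pdf(query, mean, std)
    total = sum(density.values())
    if total == 0.0:
        return []  # query too far from every component (numerical underflow)
    ranked = sorted(density.items(), key=lambda kv: kv[1], reverse=True)
    return [(obj_id, d / total) for obj_id, d in ranked[:k]]


ranking = rank_objects((2.0, 2.0), COMPONENTS, k=2)
```

Here `clip_b` ranks first because its single, narrow component is centered on the query, whereas both components of `clip_a` lie two standard deviations away in each dimension. An index over the components would aim to avoid touching every row, which the linear scan above does not attempt.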
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Böhm, C., Kunath, P., Pryakhin, A., Schubert, M. (2007). Querying Objects Modeled by Arbitrary Probability Distributions. In: Papadias, D., Zhang, D., Kollios, G. (eds) Advances in Spatial and Temporal Databases. SSTD 2007. Lecture Notes in Computer Science, vol 4605. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73540-3_17
DOI: https://doi.org/10.1007/978-3-540-73540-3_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73539-7
Online ISBN: 978-3-540-73540-3
eBook Packages: Computer Science (R0)