Abstract
Many sources of data streams, e.g., geo-spatial streams derived from GPS tracking systems or sensor streams provided by sensor networks, are inherently uncertain due to imprecise sensing devices, outdated information, and human error. Aggregation queries are an important class of queries for supporting data analysis on such data. This paper introduces a scalable approach for continuous probabilistic SUM query processing in uncertain stream environments. We consider an uncertain stream to be a stream of uncertain values, each given by a probability distribution over the domain of the sensor values. A continuous probabilistic SUM query maintains the probability distribution of the sum of the possible sensor values derived from the streaming environment. Our approach efficiently computes the probabilistic SUM according to possible world semantics, i.e., without any loss of information. Furthermore, we show that the query's answer can be updated efficiently in dynamic environments where attribute values change frequently. Our experimental results show that our approach computes probabilistic SUM queries efficiently, and that processing queries incrementally, rather than recomputing from scratch, further improves the performance of our algorithm significantly.
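To make the idea of a probabilistic SUM concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes that the uncertain values are independent, integer-valued, and given as small discrete distributions, so that the distribution of the sum under possible world semantics is the convolution of the individual distributions. The function names (`convolve`, `deconvolve`, `prob_at_least`) and the sample readings are hypothetical and chosen only for illustration. Exact fractions are used so that removing an outdated value by deconvolution cancels cleanly.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q."""
    out = defaultdict(Fraction)
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] += pa * pb
    return dict(out)


def deconvolve(total, q):
    """Return the distribution `rest` with convolve(rest, q) == total.

    Assumes `total` really was built by convolving with `q`
    and that all probabilities are exact Fractions.
    """
    rest = {}
    remaining = dict(total)
    q_min = min(q)
    while remaining:
        s = min(remaining)
        r = remaining[s] / q[q_min]  # probability mass of rest at s - q_min
        rest[s - q_min] = r
        # Subtract the contribution of this mass from the remaining sum.
        for b, pb in q.items():
            key = s + (b - q_min)
            val = remaining.get(key, Fraction(0)) - r * pb
            if val == 0:
                remaining.pop(key, None)
            else:
                remaining[key] = val
    return rest


def prob_at_least(dist, threshold):
    """P(SUM >= threshold) for a discrete sum distribution."""
    return sum((p for v, p in dist.items() if v >= threshold), Fraction(0))


# Two hypothetical uncertain sensor readings.
x1 = {1: Fraction(1, 2), 2: Fraction(1, 2)}
x2 = {0: Fraction(1, 4), 3: Fraction(3, 4)}

total = convolve(x1, x2)         # sum distribution computed from scratch
p_high = prob_at_least(total, 4)  # Fraction(3, 4)

# Incremental update: sensor 2 reports a new, certain value of 1.
x2_new = {1: Fraction(1)}
updated = convolve(deconvolve(total, x2), x2_new)
```

In this sketch, an update touches only the changed value: the old distribution is divided out of the maintained sum and the new one is convolved in, instead of convolving all current values again. With floating-point probabilities the division step can become numerically unstable, which is one reason this should be read as an illustration of the principle only.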
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hubig, N., Züfle, A., Emrich, T., Renz, M., Nascimento, M.A., Kriegel, HP. (2014). Monitoring Probabilistic Threshold SUM Query Processing in Uncertain Streams. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8421. Springer, Cham. https://doi.org/10.1007/978-3-319-05810-8_28
DOI: https://doi.org/10.1007/978-3-319-05810-8_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05809-2
Online ISBN: 978-3-319-05810-8
eBook Packages: Computer Science (R0)