Reliable Aggregation over Prioritized Data Streams

Works, Karen; Rundensteiner, Elke A.

doi:10.1007/978-3-662-45714-6_1

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8800)

Abstract

Under limited resources, targeted prioritized data stream systems (TP) adjust the processing order of tuples to produce the most significant results first. In TP, an aggregation operator may not receive all tuples within an aggregation group. Typically, the aggregation operator is unaware of how many and which tuples are missing. As a consequence, averages computed over these streams can be skewed, invalid, or, worse yet, totally misleading. Such inaccurate results are unacceptable for many applications. TP-Ag is a novel aggregate operator for TP that produces reliable average calculations for normally distributed data under adverse conditions. It determines at run-time which results to produce and which subgroups in the aggregate population are used to generate each result. A carefully designed application of Cochran’s sample size methodology is used to measure the reliability of results. Each result is annotated with the subgroups that were used in its production. Our experimental findings substantiate that TP-Ag increases the reliability of average calculations compared to state-of-the-art approaches for TP systems (up to 91% more accurate results).
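The abstract does not spell out how Cochran's sample size methodology is applied inside TP-Ag. As a rough illustration only, the sketch below uses the textbook Cochran formula for estimating a mean, n0 = (z·s/d)², with the finite population correction n = n0 / (1 + n0/N), to decide whether the tuples that actually arrived are enough to report an average. The function names, the margin parameter, and the 95% confidence default are assumptions made for this example and are not taken from the chapter.

```python
import math
from statistics import NormalDist, stdev


def cochran_required_n(std_dev, margin, population, confidence=0.95):
    """Textbook Cochran sample size for estimating a mean.

    std_dev    -- estimated standard deviation of the values
    margin     -- acceptable absolute error of the mean (hypothetical parameter)
    population -- number of tuples the full aggregation group should contain
    """
    # Two-sided z-value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Sample size for an infinite population.
    n0 = (z * std_dev / margin) ** 2
    # Finite population correction.
    return math.ceil(n0 / (1.0 + n0 / population))


def is_reliable(received, population, margin, confidence=0.95):
    """Return True if the received values meet the Cochran sample size.

    This is an illustrative check, not the TP-Ag operator itself.
    """
    if len(received) < 2:
        # A standard deviation cannot be estimated from fewer than two values.
        return False
    needed = cochran_required_n(stdev(received), margin, population, confidence)
    return len(received) >= needed
```

For example, with an estimated standard deviation of 10, a margin of 2, and a group of 1000 tuples, `cochran_required_n(10.0, 2.0, 1000)` returns 88, so under these assumptions an average over fewer than 88 received tuples would be flagged as unreliable.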

This work is supported by GAANN and NSF grants: IIS-1018443 & 0917017 & 0414567 & 0551584 (equipment grant).

This work started during Karen’s Ph.D. study at WPI.

Author information

Authors and Affiliations

Westfield State University, Westfield, MA, U.S.A.
Karen Works
Worcester Polytechnic Institute, Worcester, MA, U.S.A.
Elke A. Rundensteiner

Corresponding author

Correspondence to Karen Works.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
Linz, Austria
Roland Wagner

About this chapter

Cite this chapter

Works, K., Rundensteiner, E.A. (2014). Reliable Aggregation over Prioritized Data Streams. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIV. Lecture Notes in Computer Science, vol 8800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45714-6_1

DOI: https://doi.org/10.1007/978-3-662-45714-6_1
Published: 21 November 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45713-9
Online ISBN: 978-3-662-45714-6
eBook Packages: Computer Science, Computer Science (R0)
