Reliable Aggregation over Prioritized Data Streams

Transactions on Large-Scale Data- and Knowledge-Centered Systems XIV

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8800)

Abstract

Under limited resources, targeted prioritized data stream systems (TP) adjust the processing order of tuples to produce the most significant results first. In TP, an aggregation operator may not receive all tuples within an aggregation group, and it typically does not know how many or which tuples are missing. As a consequence, averages computed over these streams can be skewed, invalid, or, worse yet, totally misleading. Such inaccurate results are unacceptable for many applications. TP-Ag is a novel aggregate operator for TP that produces reliable average calculations for normally distributed data under adverse conditions. It determines at run-time which results to produce and which subgroups of the aggregate population are used to generate each result. A carefully designed application of Cochran’s sample size methodology measures the reliability of results, and each result is annotated with the subgroups used in its production. Our experimental findings substantiate that TP-Ag increases the reliability of average calculations compared to state-of-the-art approaches for TP systems (up to 91% more accurate results).
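
For readers unfamiliar with the methodology named above, the following is a minimal sketch of the textbook form of Cochran's sample size formula for estimating a mean, with the finite population correction. It is not TP-Ag's actual procedure, which the abstract does not describe. The function names, the parameters, and the default 95% confidence value z = 1.96 are assumptions chosen for illustration only.

```python
import math


def cochran_sample_size(std_dev, margin_of_error, z=1.96, population_size=None):
    """Minimum sample size for estimating a mean (textbook Cochran formula).

    std_dev:          estimated standard deviation of the values (assumed known)
    margin_of_error:  acceptable absolute error in the estimated mean
    z:                critical value for the confidence level (1.96 ~ 95%)
    population_size:  size of the full group; if given, the finite
                      population correction is applied
    """
    if std_dev < 0 or margin_of_error <= 0:
        raise ValueError("std_dev must be >= 0 and margin_of_error must be > 0")

    # Sample size for an effectively infinite population.
    n0 = (z * std_dev / margin_of_error) ** 2

    # Finite population correction: n = n0 / (1 + n0 / N).
    if population_size is not None:
        n0 = n0 / (1 + n0 / population_size)

    return math.ceil(n0)


def is_reliable(received, std_dev, margin_of_error, population_size, z=1.96):
    """Hypothetical check: did an aggregation group receive enough tuples?"""
    required = cochran_sample_size(std_dev, margin_of_error, z, population_size)
    return received >= required


if __name__ == "__main__":
    # A group of 200 tuples, standard deviation 10, tolerated error of 2.
    print(cochran_sample_size(10, 2))                       # infinite population
    print(cochran_sample_size(10, 2, population_size=200))  # with correction
    print(is_reliable(70, 10, 2, 200))
```

Under these assumptions, an average computed from fewer tuples than the required sample size would be flagged as unreliable rather than reported as if the group were complete.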

This work is supported by GAANN and NSF grants: IIS-1018443 & 0917017 & 0414567 & 0551584 (equipment grant).

This work started during Karen’s Ph.D. study at WPI.

Corresponding author

Correspondence to Karen Works.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Works, K., Rundensteiner, E.A. (2014). Reliable Aggregation over Prioritized Data Streams. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIV. Lecture Notes in Computer Science, vol 8800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45714-6_1

  • DOI: https://doi.org/10.1007/978-3-662-45714-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45713-9

  • Online ISBN: 978-3-662-45714-6

  • eBook Packages: Computer Science, Computer Science (R0)
