Locality Aware Task Scheduling in Parallel Data Stream Processing

  • Conference paper
Intelligent Distributed Computing VIII

Part of the book series: Studies in Computational Intelligence (SCI, volume 570)

Abstract

Parallel data processing and parallel streaming systems have become quite popular. They are employed in various domains such as real-time signal processing, OLAP database systems, and high-performance data extraction. One of the key components of these systems is the task scheduler, which plans and executes tasks spawned by the system on the available CPU cores. Current multiprocessor systems and CPU architectures have become quite complex, which makes task scheduling a challenging problem. In this paper, we propose a novel task scheduling strategy for parallel data stream systems that reflects many technical issues of current hardware. We achieved up to a 3× speedup on a NUMA system and up to a 10% speedup on an older SMP system with respect to the unoptimized version of the scheduler. The basic ideas implemented in our scheduler may be adopted for task schedulers that focus on other priorities or employ different constraints.

Author information

Corresponding author

Correspondence to Zbyněk Falt.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Falt, Z., Kruliš, M., Bednárek, D., Yaghob, J., Zavoral, F. (2015). Locality Aware Task Scheduling in Parallel Data Stream Processing. In: Camacho, D., Braubach, L., Venticinque, S., Badica, C. (eds) Intelligent Distributed Computing VIII. Studies in Computational Intelligence, vol 570. Springer, Cham. https://doi.org/10.1007/978-3-319-10422-5_35

  • DOI: https://doi.org/10.1007/978-3-319-10422-5_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10421-8

  • Online ISBN: 978-3-319-10422-5

  • eBook Packages: Engineering, Engineering (R0)
