Abstract
Parallel data processing and parallel streaming systems have become quite popular. They are employed in various domains such as real-time signal processing, OLAP database systems, and high-performance data extraction. One of the key components of these systems is the task scheduler, which plans and executes tasks spawned by the system on available CPU cores. Contemporary multiprocessor systems and CPU architectures have become quite complex, which makes task scheduling a challenging problem. In this paper, we propose a novel task scheduling strategy for parallel data stream systems that reflects many technical issues of current hardware. We were able to achieve up to a 3× speedup on a NUMA system and up to a 10% speedup on an older SMP system with respect to the unoptimized version of the scheduler. The basic ideas implemented in our scheduler may be adopted for task schedulers that focus on other priorities or employ different constraints.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Falt, Z., Kruliš, M., Bednárek, D., Yaghob, J., Zavoral, F. (2015). Locality Aware Task Scheduling in Parallel Data Stream Processing. In: Camacho, D., Braubach, L., Venticinque, S., Badica, C. (eds) Intelligent Distributed Computing VIII. Studies in Computational Intelligence, vol 570. Springer, Cham. https://doi.org/10.1007/978-3-319-10422-5_35
DOI: https://doi.org/10.1007/978-3-319-10422-5_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10421-8
Online ISBN: 978-3-319-10422-5
eBook Packages: Engineering, Engineering (R0)