Abstract
To store and process data at large scale, distributed databases partition data and process it in parallel on the nodes of a cluster. When a database concurrently executes heterogeneous query workloads, performance prediction is needed. However, running queries in a distributed database incurs significant network overhead, because data must be transmitted between cluster nodes. Hence, in this work, we take network latency into account when predicting concurrent query performance. We propose a linear regression model to estimate the interactions among concurrently executing queries for analytical workloads in a distributed database system. Since network latency and local processing overhead are the two most significant factors in query execution, we analyze query behavior with multivariate regression on both of them at different degrees of concurrency. In addition, we use sampling techniques to obtain varied query mixes as the concurrency level increases. The experiments evaluating our prediction model are conducted on a PostgreSQL database cluster with TPC-H as a representative analytical workload. The experimental results demonstrate that our model predicts query latency with a relative error within 14 % on average.
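As a rough illustration of the multivariate regression idea described in the abstract, the following minimal sketch fits query latency as a linear function of two predictors, network latency and local processing overhead, and reports the mean relative error. It is not the authors' model: the function names, the plain least-squares fit, and the synthetic numbers are all assumptions made for illustration. One such fit could be repeated per concurrency level.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def fit_latency_model(network: Sequence[float], local: Sequence[float],
                      latency: Sequence[float]) -> List[float]:
    """Least-squares fit of latency ~ b0 + b1 * network + b2 * local."""
    rows = [[1.0, n, l] for n, l in zip(network, local)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, latency)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict(beta: Sequence[float], network: float, local: float) -> float:
    return beta[0] + beta[1] * network + beta[2] * local


def mean_relative_error(beta: Sequence[float], network: Sequence[float],
                        local: Sequence[float], latency: Sequence[float]) -> float:
    errors = [abs(predict(beta, n, l) - y) / y
              for n, l, y in zip(network, local, latency)]
    return sum(errors) / len(errors)


# Synthetic observations (hypothetical, not measured): one row per query mix.
network = [1.0, 2.0, 3.0, 4.0, 5.0]    # network latency component
local = [10.0, 8.0, 12.0, 9.0, 11.0]   # local processing component
latency = [2.0 + 3.0 * n + 0.5 * l for n, l in zip(network, local)]

beta = fit_latency_model(network, local, latency)
mre = mean_relative_error(beta, network, local, latency)
```

Because the synthetic latencies are generated from an exactly linear rule, the fit recovers the generating coefficients and the relative error is close to zero. Real measurements would leave a residual error.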
Acknowledgments
This work was supported by the China Ministry of Science and Technology under the State Key Development Program for Basic Research (2012CB821800), the Fund of the National Natural Science Foundation of China (No. 61462012, 61562010, U1531246), the Scientific Research Fund for talents recruiting of Guizhou University (No. 700246003301), the Science and Technology Fund of Guizhou Province (No. J [2013]2099), the High Tech. Project Fund of Guizhou Development and Reform Commission (No. [2013]2069), the Industrial Research Projects of the Science and Technology Plan of Guizhou Province (No. GY[2014]3018), and the Major Applied Basic Research Program of Guizhou Province (No. JZ20142001, No. JZ20142001-05).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, H., Hou, X., Chen, M., Dai, Z., Zhu, M., Huang, M. (2015). Performance Prediction for Concurrent Workloads in Distributed Database Systems. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9531. Springer, Cham. https://doi.org/10.1007/978-3-319-27140-8_43
DOI: https://doi.org/10.1007/978-3-319-27140-8_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27139-2
Online ISBN: 978-3-319-27140-8
eBook Packages: Computer Science, Computer Science (R0)