On the Timed Analysis of Big-Data Applications
Apache Spark is one of the best-known frameworks for executing big-data batch applications over a cluster of (virtual) machines. Defining the cluster (i.e., the number of machines and CPUs) to attain guarantees on the execution times (deadlines) of the application is indeed a trade-off between the cost of the infrastructure and the time needed to execute the application. Sizing the computational resources, in order to prevent cost overruns, can benefit from the use of formal models as a means to capture the execution time of applications.
Our model of Spark applications, based on the CLTLoc logic, is defined by considering the directed acyclic graph around which Spark programs are organized, the number of available CPUs, the number of tasks elaborated by the application, and the average execution times of tasks. If the outcome of the analysis is positive, then the execution is feasible—that is, it can be completed within a given time span. The analysis tool has been implemented on top of the Zot formal verification tool. A preliminary evaluation shows that our model is sufficiently accurate: the formal analysis identifies execution times that are close (the error is less than 10%) to those obtained by actually running the applications.
KeywordsBig-Data Applications Metric temporal logic Formal verification Apache Spark
This work has been partially supported by the DICE project (Horizon 2020 project no. 644869) and by the GAUSS national research project (MIUR, PRIN 2015, Contract 2015KWREMX).
- 3.Baresi, L., Pourhashem Kallehbasti, M.M., Rossi, M.: How bit-vector logic can help improve the verification of LTL specifications over infinite domains. In: Proceedings of 31st Annual ACM Symposium on Applied Computing, pp. 1666–1673 (2016)Google Scholar
- 5.Bersani, M., Erascu, M., Marconi, F., Rossi, M.: DICE verification tool - final version. Technical report, DICE Consortium (2017). www.dice-h2020.eu
- 7.Bradley, S., Henderson, W., Kendall, D.: Using timed automata for response time analysis of distributed real-time systems. In: 24th IFAC/IFIP Workshop on Real-Time Programming, pp. 143–148 (1999)Google Scholar
- 8.Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. In: Proceedings of International World-Wide Web Conference (WWW), pp. 107–117 (1998)Google Scholar
- 9.Brito, A., Ardagna, D., Blanquer, I., Evangelinou, A., Barbierato, E., Gribaudo, M., Almeida, J., Couto, A.P., Braga, T.: D3.4 EUBra-BIGSEA QoS infrastructure services intermediate version. Technical report, 3 July 2017Google Scholar
- 12.Henzinger, T.A.: The theory of hybrid automata. In: Inan, M.K., Kurshan, R.P. (eds.) Verification of Digital and Hybrid Systems. NATO ASI Series (Series F: Computer and Systems Sciences), vol. 170, pp. 265–292. Springer, Berlin (2000). https://doi.org/10.1007/978-3-642-59615-5_13 CrossRefGoogle Scholar
- 13.Krakora, J., Waszniowski, L., Pisa, P., Hanzalek, Z.: Timed automata approach to real time distributed system verification. In: Proceedings of IEEE International Workshop on Factory Communication Systems, pp. 407–410, September 2004Google Scholar
- 14.MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of Berkeley symposium on mathematical statistics and probability, vol. 1, pp. 281–297 (1967)Google Scholar
- 15.Marconi, F., Bersani, M.M., Erascu, M., Rossi, M.: Towards the formal verification of data-intensive applications through metric temporal logic. In: Ogata, K., Lawford, M., Liu, S. (eds.) ICFEM 2016. LNCS, vol. 10009, pp. 193–209. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47846-3_13 CrossRefGoogle Scholar
- 16.Palencia, J.C., Harbour, M.G.: Schedulability analysis for tasks with static and dynamic offsets. In: Proceedings of IEEE Real-Time Systems Symposium, pp. 26–37, December 1998Google Scholar
- 17.Perez, D., Bernardi, S., Merseguer, J.Z., Requeno, J.I., Casale, G., Zhu, L.: DICE simulation tools - final version. Deliverable. http://www.dice-h2020.eu/resources/