On the Timed Analysis of Big-Data Applications

  • Francesco MarconiEmail author
  • Giovanni Quattrocchi
  • Luciano Baresi
  • Marcello M. Bersani
  • Matteo Rossi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10811)


Apache Spark is one of the best-known frameworks for executing big-data batch applications over a cluster of (virtual) machines. Defining the cluster (i.e., the number of machines and CPUs) to attain guarantees on the execution times (deadlines) of the application is indeed a trade-off between the cost of the infrastructure and the time needed to execute the application. Sizing the computational resources, in order to prevent cost overruns, can benefit from the use of formal models as a means to capture the execution time of applications.

Our model of Spark applications, based on the CLTLoc logic, is defined by considering the directed acyclic graph around which Spark programs are organized, the number of available CPUs, the number of tasks elaborated by the application, and the average execution times of tasks. If the outcome of the analysis is positive, then the execution is feasible—that is, it can be completed within a given time span. The analysis tool has been implemented on top of the Zot formal verification tool. A preliminary evaluation shows that our model is sufficiently accurate: the formal analysis identifies execution times that are close (the error is less than 10%) to those obtained by actually running the applications.


Big-Data Applications Metric temporal logic Formal verification Apache Spark 



This work has been partially supported by the DICE project (Horizon 2020 project no. 644869) and by the GAUSS national research project (MIUR, PRIN 2015, Contract 2015KWREMX).


  1. 1.
    Abdeddaim, Y., Asarin, E., Maler, O.: Scheduling with timed automata. Theoret. Comput. Sci. 354(2), 272–300 (2006)MathSciNetCrossRefzbMATHGoogle Scholar
  2. 2.
    Alur, R., Dill, D.L.: A theory of timed automata. Theoret. Comput. Sci. 126(2), 183–235 (1994)MathSciNetCrossRefzbMATHGoogle Scholar
  3. 3.
    Baresi, L., Pourhashem Kallehbasti, M.M., Rossi, M.: How bit-vector logic can help improve the verification of LTL specifications over infinite domains. In: Proceedings of 31st Annual ACM Symposium on Applied Computing, pp. 1666–1673 (2016)Google Scholar
  4. 4.
    Behrmann, G., Larsen, K.G., Rasmussen, J.I.: Optimal scheduling using priced timed automata. SIGMETRICS Perform. Eval. Rev. 32(4), 34–40 (2005)CrossRefGoogle Scholar
  5. 5.
    Bersani, M., Erascu, M., Marconi, F., Rossi, M.: DICE verification tool - final version. Technical report, DICE Consortium (2017).
  6. 6.
    Bersani, M.M., Rossi, M., San Pietro, P.: A tool for deciding the satisfiability of continuous-time metric temporal logic. Acta Informatica 53(2), 171–206 (2016)MathSciNetCrossRefzbMATHGoogle Scholar
  7. 7.
    Bradley, S., Henderson, W., Kendall, D.: Using timed automata for response time analysis of distributed real-time systems. In: 24th IFAC/IFIP Workshop on Real-Time Programming, pp. 143–148 (1999)Google Scholar
  8. 8.
    Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. In: Proceedings of International World-Wide Web Conference (WWW), pp. 107–117 (1998)Google Scholar
  9. 9.
    Brito, A., Ardagna, D., Blanquer, I., Evangelinou, A., Barbierato, E., Gribaudo, M., Almeida, J., Couto, A.P., Braga, T.: D3.4 EUBra-BIGSEA QoS infrastructure services intermediate version. Technical report, 3 July 2017Google Scholar
  10. 10.
    Corbett, J.C.: Timing analysis of ada tasking programs. IEEE Trans. Softw. Eng. 22(7), 461–483 (1996)CrossRefGoogle Scholar
  11. 11.
    Demri, S., D’Souza, D.: An automata-theoretic approach to constraint LTL. Inf. Comput. 205(3), 380–415 (2007)MathSciNetCrossRefzbMATHGoogle Scholar
  12. 12.
    Henzinger, T.A.: The theory of hybrid automata. In: Inan, M.K., Kurshan, R.P. (eds.) Verification of Digital and Hybrid Systems. NATO ASI Series (Series F: Computer and Systems Sciences), vol. 170, pp. 265–292. Springer, Berlin (2000). CrossRefGoogle Scholar
  13. 13.
    Krakora, J., Waszniowski, L., Pisa, P., Hanzalek, Z.: Timed automata approach to real time distributed system verification. In: Proceedings of IEEE International Workshop on Factory Communication Systems, pp. 407–410, September 2004Google Scholar
  14. 14.
    MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of Berkeley symposium on mathematical statistics and probability, vol. 1, pp. 281–297 (1967)Google Scholar
  15. 15.
    Marconi, F., Bersani, M.M., Erascu, M., Rossi, M.: Towards the formal verification of data-intensive applications through metric temporal logic. In: Ogata, K., Lawford, M., Liu, S. (eds.) ICFEM 2016. LNCS, vol. 10009, pp. 193–209. Springer, Cham (2016). CrossRefGoogle Scholar
  16. 16.
    Palencia, J.C., Harbour, M.G.: Schedulability analysis for tasks with static and dynamic offsets. In: Proceedings of IEEE Real-Time Systems Symposium, pp. 26–37, December 1998Google Scholar
  17. 17.
    Perez, D., Bernardi, S., Merseguer, J.Z., Requeno, J.I., Casale, G., Zhu, L.: DICE simulation tools - final version. Deliverable.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. 1.DEIBPolitecnico di MilanoMilanItaly

Personalised recommendations