Skip to main content

On the Timed Analysis of Big-Data Applications

  • Conference paper
  • First Online:
Book cover NASA Formal Methods (NFM 2018)

Abstract

Apache Spark is one of the best-known frameworks for executing big-data batch applications over a cluster of (virtual) machines. Defining the cluster (i.e., the number of machines and CPUs) to attain guarantees on the execution times (deadlines) of the application is indeed a trade-off between the cost of the infrastructure and the time needed to execute the application. Sizing the computational resources, in order to prevent cost overruns, can benefit from the use of formal models as a means to capture the execution time of applications.

Our model of Spark applications, based on the CLTLoc logic, is defined by considering the directed acyclic graph around which Spark programs are organized, the number of available CPUs, the number of tasks elaborated by the application, and the average execution times of tasks. If the outcome of the analysis is positive, then the execution is feasible—that is, it can be completed within a given time span. The analysis tool has been implemented on top of the Zot formal verification tool. A preliminary evaluation shows that our model is sufficiently accurate: the formal analysis identifies execution times that are close (the error is less than 10%) to those obtained by actually running the applications.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    github.com/fm-polimi/zot.

  2. 2.

    github.com/dice-project/DICE-Verification.

  3. 3.

    github.com/franco-maroni/xSpark-bench.

References

  1. Abdeddaim, Y., Asarin, E., Maler, O.: Scheduling with timed automata. Theoret. Comput. Sci. 354(2), 272–300 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  2. Alur, R., Dill, D.L.: A theory of timed automata. Theoret. Comput. Sci. 126(2), 183–235 (1994)

    Article  MathSciNet  MATH  Google Scholar 

  3. Baresi, L., Pourhashem Kallehbasti, M.M., Rossi, M.: How bit-vector logic can help improve the verification of LTL specifications over infinite domains. In: Proceedings of 31st Annual ACM Symposium on Applied Computing, pp. 1666–1673 (2016)

    Google Scholar 

  4. Behrmann, G., Larsen, K.G., Rasmussen, J.I.: Optimal scheduling using priced timed automata. SIGMETRICS Perform. Eval. Rev. 32(4), 34–40 (2005)

    Article  Google Scholar 

  5. Bersani, M., Erascu, M., Marconi, F., Rossi, M.: DICE verification tool - final version. Technical report, DICE Consortium (2017). www.dice-h2020.eu

  6. Bersani, M.M., Rossi, M., San Pietro, P.: A tool for deciding the satisfiability of continuous-time metric temporal logic. Acta Informatica 53(2), 171–206 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  7. Bradley, S., Henderson, W., Kendall, D.: Using timed automata for response time analysis of distributed real-time systems. In: 24th IFAC/IFIP Workshop on Real-Time Programming, pp. 143–148 (1999)

    Google Scholar 

  8. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. In: Proceedings of International World-Wide Web Conference (WWW), pp. 107–117 (1998)

    Google Scholar 

  9. Brito, A., Ardagna, D., Blanquer, I., Evangelinou, A., Barbierato, E., Gribaudo, M., Almeida, J., Couto, A.P., Braga, T.: D3.4 EUBra-BIGSEA QoS infrastructure services intermediate version. Technical report, 3 July 2017

    Google Scholar 

  10. Corbett, J.C.: Timing analysis of ada tasking programs. IEEE Trans. Softw. Eng. 22(7), 461–483 (1996)

    Article  Google Scholar 

  11. Demri, S., D’Souza, D.: An automata-theoretic approach to constraint LTL. Inf. Comput. 205(3), 380–415 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  12. Henzinger, T.A.: The theory of hybrid automata. In: Inan, M.K., Kurshan, R.P. (eds.) Verification of Digital and Hybrid Systems. NATO ASI Series (Series F: Computer and Systems Sciences), vol. 170, pp. 265–292. Springer, Berlin (2000). https://doi.org/10.1007/978-3-642-59615-5_13

    Chapter  Google Scholar 

  13. Krakora, J., Waszniowski, L., Pisa, P., Hanzalek, Z.: Timed automata approach to real time distributed system verification. In: Proceedings of IEEE International Workshop on Factory Communication Systems, pp. 407–410, September 2004

    Google Scholar 

  14. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of Berkeley symposium on mathematical statistics and probability, vol. 1, pp. 281–297 (1967)

    Google Scholar 

  15. Marconi, F., Bersani, M.M., Erascu, M., Rossi, M.: Towards the formal verification of data-intensive applications through metric temporal logic. In: Ogata, K., Lawford, M., Liu, S. (eds.) ICFEM 2016. LNCS, vol. 10009, pp. 193–209. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47846-3_13

    Chapter  Google Scholar 

  16. Palencia, J.C., Harbour, M.G.: Schedulability analysis for tasks with static and dynamic offsets. In: Proceedings of IEEE Real-Time Systems Symposium, pp. 26–37, December 1998

    Google Scholar 

  17. Perez, D., Bernardi, S., Merseguer, J.Z., Requeno, J.I., Casale, G., Zhu, L.: DICE simulation tools - final version. Deliverable. http://www.dice-h2020.eu/resources/

Download references

Acknowledgment

This work has been partially supported by the DICE project (Horizon 2020 project no. 644869) and by the GAUSS national research project (MIUR, PRIN 2015, Contract 2015KWREMX).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Francesco Marconi .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Marconi, F., Quattrocchi, G., Baresi, L., Bersani, M.M., Rossi, M. (2018). On the Timed Analysis of Big-Data Applications. In: Dutle, A., Muñoz, C., Narkawicz, A. (eds) NASA Formal Methods. NFM 2018. Lecture Notes in Computer Science(), vol 10811. Springer, Cham. https://doi.org/10.1007/978-3-319-77935-5_22

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-77935-5_22

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77934-8

  • Online ISBN: 978-3-319-77935-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics