Temporal Random Testing for Spark Streaming

  • Conference paper
Integrated Formal Methods (IFM 2016)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9681)

Abstract

With the rise of Big Data technologies, distributed stream processing systems (SPS) have gained popularity in recent years. Among them, Spark Streaming stands out as a particularly attractive option, with growing adoption in industry. In this work we explore the combination of temporal logic and property-based testing for testing Spark Streaming programs, by adding temporal logic operators to ScalaCheck generators and properties. This allows us to deal with the time component that complicates the testing of Spark Streaming programs and of SPS in general. In particular, we propose a discrete-time linear temporal logic for finite words that allows a timeout to be associated with each temporal operator, in order to increase the expressiveness of generators and properties. Finally, we present our prototype with some examples.
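
The idea of attaching a timeout to each temporal operator and evaluating formulas over finite words can be illustrated with a small sketch. The code below is plain Python, not the paper's prototype and not the ScalaCheck API; the names `Verdict`, `now`, `eventually` and `always` are hypothetical. It assumes a three-valued reading in which a formula is undecided when the finite word ends before the timeout, and it assumes that a timeout of t covers the t positions starting at the current one.

```python
from enum import Enum


class Verdict(Enum):
    TRUE = "true"
    FALSE = "false"
    UNDECIDED = "?"  # assumed third value: the word ended before a decision


def now(pred):
    """Atomic formula: pred holds for the letter (e.g. a batch) at position i."""
    def check(word, i):
        if i >= len(word):
            return Verdict.UNDECIDED  # no letter left to inspect
        return Verdict.TRUE if pred(word[i]) else Verdict.FALSE
    return check


def eventually(phi, timeout):
    """phi must hold at some position in the window i .. i + timeout - 1."""
    def check(word, i):
        results = [phi(word, j) for j in range(i, i + timeout)]
        if Verdict.TRUE in results:
            return Verdict.TRUE
        if all(r is Verdict.FALSE for r in results):
            return Verdict.FALSE
        return Verdict.UNDECIDED
    return check


def always(phi, timeout):
    """phi must hold at every position in the window i .. i + timeout - 1."""
    def check(word, i):
        results = [phi(word, j) for j in range(i, i + timeout)]
        if Verdict.FALSE in results:
            return Verdict.FALSE
        if all(r is Verdict.TRUE for r in results):
            return Verdict.TRUE
        return Verdict.UNDECIDED
    return check


# A finite word of four letters; each letter stands for one batch summary.
word = [0, 3, 7, 2]

seen_large = eventually(now(lambda x: x > 5), timeout=3)
stays_small = always(now(lambda x: x < 5), timeout=3)
too_long = always(now(lambda x: x >= 0), timeout=6)

print(seen_large(word, 0).value)
print(stays_small(word, 0).value)
print(too_long(word, 0).value)
```

Because every operator carries its own timeout, each formula only ever inspects a bounded window of the word, which is what makes it checkable on a finite test case. Operators nest in the obvious way, for instance `always(eventually(now(p), 2), 3)`.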

This research has been partially supported by MINECO Spanish projects StrongSoft (TIN2012-39391-C04-04), CAVI-ART (TIN2013-44742-C4-3-R), and TRACES (TIN2015-67522-C3-3-R), and by the Comunidad de Madrid project N-Greens Software-CM (S2013/ICE-2731).

Notes

  1. See https://spark.apache.org/docs/latest/programming-guide.html for more details.

  2. Here we use the integration of ScalaCheck with the Specs2 [21] testing library.

  3. Due to space limitations, the results for release are available in [19].

  4. Note that the value ? is only reached when the word is consumed and this simplification cannot be applied.

  5. See https://github.com/juanrh/sscheck-examples/blob/master/src/test/scala/es/ucm/fdi/sscheck/spark/demo/twitter/TwitterAmpcampDemo.scala and https://github.com/juanrh/sscheck-examples/wiki/TwitterAmpcampDemo-execution for the execution logs.

References

  1. Akidau, T., Balikov, A., Bekiroğlu, K., Chernyak, S., Haberman, J., Lax, R., McVeety, S., Mills, D., Nordstrom, P., Whittle, S.: MillWheel: fault-tolerant stream processing at internet scale. Proc. VLDB Endowment 6(11), 1033–1044 (2013)

  2. Barringer, H., Havelund, K.: TraceContract: A scala DSL for trace analysis. In: Butler, M., Schulte, W. (eds.) FM 2011. LNCS, vol. 6664, pp. 57–72. Springer, Heidelberg (2011)

  3. Bauer, A., Leucker, M., Schallhart, C.: Monitoring of real-time properties. In: Arun-Kumar, S., Garg, N. (eds.) FSTTCS 2006. LNCS, vol. 4337, pp. 260–272. Springer, Heidelberg (2006)

  4. Bauer, A., Leucker, M., Schallhart, C.: The good, the bad, and the ugly, but how ugly is ugly? In: Sokolsky, O., Taşıran, S. (eds.) RV 2007. LNCS, vol. 4839, pp. 126–138. Springer, Heidelberg (2007)

  5. Beck, K.: Test Driven Development: By Example. Addison-Wesley Professional, Boston (2003)

  6. Blackburn, P., van Benthem, J., Wolter, F. (eds.): Handbook of Modal Logic. Elsevier, Philadelphia (2006)

  7. Claessen, K., Hughes, J.: QuickCheck: A lightweight tool for random testing of Haskell programs. ACM Sigplan Not. 46(4), 53–64 (2011)

  8. Cormode, G., Muthukrishnan, S.: An improved data stream summary: The count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)

  9. D’Angelo, B., Sankaranarayanan, S., Sánchez, C., Robinson, W., Finkbeiner, B., Sipma, H.B., Mehrotra, S., Manna, Z.: LOLA: runtime monitoring of synchronous systems. In: Proceedings of the 12th International Symposium on Temporal Representation and Reasoning, TIME, pp. 166–174. IEEE Computer Society (2005)

  10. Halbwachs, N.: Synchronous programming of reactive systems. Springer International Series in Engineering and Computer Science, vol. 215. Kluwer Academic Publishers, Dordrecht (1992)

  11. Karau, H.: Spark-testing-base (2015). http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/

  12. Kuhn, R., Allen, J.: Reactive Design Patterns. Manning Publications, Greenwich (2014)

  13. Leucker, M., Schallhart, C.: A brief account of runtime verification. J. Logic Algebraic Program. 78(5), 293–303 (2009)

  14. Marz, N., Warren, J.: Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning Publications Co., Stamford (2015)

  15. Morales, G.D.F., Bifet, A.: SAMOA: Scalable advanced massive online analysis. J. Mach. Learn. Res. 16, 149–153 (2015)

  16. Nilsson, R.: ScalaCheck: The Definitive Guide. IT Pro, Artima Incorporated, Upper Saddle River (2014)

  17. Pnueli, A.: Applications of temporal logic to the specification and verification of reactive systems: a survey of current trends. In: de Bakker, J.W., de Roever, W.-P., Rozenberg, G. (eds.) Current Trends in Concurrency. LNCS, vol. 224, pp. 510–584. Springer, Heidelberg (1986)

  18. Raymond, P., Roux, Y., Jahier, E.: Lutin: a language for specifying and executing reactive scenarios. EURASIP J. Emb. Syst. 2008, 1–11 (2008). Article ID: 753821

  19. Riesco, A., Rodríguez-Hortalá, J.: A lightweight tool for random testing of stream processing systems (extended version). Technical Report SIC 02/15, Departamento de Sistemas Informáticos y Computación de la Universidad Complutense de Madrid, September 2015. http://maude.sip.ucm.es/~adrian/pubs.html

  20. Schelter, S., Ewen, S., Tzoumas, K., Markl, V.: All roads lead to Rome: optimistic recovery for distributed iterative data processing. In: Proceedings of the 22nd ACM international conference on Conference on information & knowledge management, pp. 1919–1928. ACM (2013)

  21. Shamshiri, S., Rojas, J.M., Fraser, G., McMinn, P.: Random or genetic algorithm search for object-oriented test suite generation? In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1367–1374. ACM (2015)

  22. Venners, B.: Re: Prop.exists and scalatest matchers (2015). https://groups.google.com/forum/#!msg/scalacheck/Ped7joQLhnY/gNH0SSWkKUgJ

  23. Wolper, P.: Temporal logic can be more expressive. Inf. Control 56(1/2), 72–99 (1983)

  24. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, p. 2. USENIX Assoc (2012)

  25. Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., Stoica, I.: Discretized streams: fault-tolerant streaming computation at scale. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles, pp. 423–438. ACM (2013)

Author information

Corresponding author

Correspondence to Adrián Riesco.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Riesco, A., Rodríguez-Hortalá, J. (2016). Temporal Random Testing for Spark Streaming. In: Ábrahám, E., Huisman, M. (eds) Integrated Formal Methods. IFM 2016. Lecture Notes in Computer Science, vol 9681. Springer, Cham. https://doi.org/10.1007/978-3-319-33693-0_25

  • DOI: https://doi.org/10.1007/978-3-319-33693-0_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-33692-3

  • Online ISBN: 978-3-319-33693-0

  • eBook Packages: Computer Science, Computer Science (R0)
