Automatically learning usage behavior and generating event sequences for black-box testing of reactive systems

  • M. Furkan Kıraç
  • Barış Aktemur
  • Hasan Sözer
  • Ceren Şahin Gebizli


We propose a novel technique based on recurrent artificial neural networks to generate test cases for black-box testing of reactive systems. We combine functional test inputs that are automatically generated from a model with manually applied test cases for robustness testing, and we use this combined set to train a long short-term memory (LSTM) network. As a result, the network learns an implicit representation of the usage behavior that is liable to lead to failures. We then use this network to generate new event sequences as test cases. We applied our approach in an industrial case study for the black-box testing of a digital TV system. LSTM-generated test cases revealed several faults, including critical ones, that were not detected by existing automated or manual testing activities. Our approach is complementary to model-based and exploratory testing, and the combined approach outperforms random testing in terms of both fault coverage and execution time.
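The pipeline described above (pool model-based and manually applied event sequences, learn a next-event model, then sample new sequences) can be illustrated with a minimal sketch. To stay dependency-free, it swaps the LSTM for a plain bigram (first-order Markov) next-event model, so it shows only the train-then-sample workflow and not the authors' network. The event names, the `train` and `generate` helpers, and the boundary markers are all hypothetical.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sequence boundary markers


def train(sequences):
    """Count next-event frequencies (a bigram stand-in for the LSTM)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        path = [START, *seq, END]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng, max_len=20):
    """Sample one event sequence by repeatedly drawing the next event."""
    event, out = START, []
    while len(out) < max_len:
        successors = counts[event]
        event = rng.choices(list(successors),
                            weights=list(successors.values()))[0]
        if event == END:
            break
        out.append(event)
    return out


# Hypothetical remote-control events; not taken from the case study.
model_based = [["POWER", "MENU", "DOWN", "OK", "EXIT"],
               ["POWER", "CH_UP", "CH_UP", "VOL_UP"]]
manual = [["POWER", "MENU", "EXIT", "MENU", "OK", "OK", "EXIT"]]

counts = train(model_based + manual)
rng = random.Random(0)  # fixed seed so runs are repeatable
new_tests = [generate(counts, rng) for _ in range(5)]
```

A bigram model conditions only on the previous event, whereas an LSTM can condition on a much longer history. That longer memory is the property the approach relies on to capture failure-prone usage patterns.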


Test case generation; Black-box testing; Recurrent neural networks; Long short-term memory networks; Learning usage behavior



We would like to thank the software developers, test engineers, and technicians at Vestel Electronics for sharing their resources with us and supporting our case study. We also thank the anonymous reviewers for their comments on this paper.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Ozyegin University, İstanbul, Turkey
  2. Vestel Electronics, Manisa, Turkey
