Efficient Methods for Trace Analysis Parallelization

  • Fabien Reumont-Locke
  • Naser Ezzati-JivanEmail author
  • Michel R. Dagenais


Tracing provides a low-impact, high-resolution way to observe the execution of a system. As the amount of parallelism in traced systems increases, so does the data generated by the trace. Most trace analysis tools work in a single thread, which hinders their performance as the scale of data increases. In this paper, we explore parallelization as an approach to speedup system trace analysis. We propose a solution which uses the inherent aspects of the CTF trace format to create balanced and parallelizable workloads. Our solution takes into account key factors of parallelization, such as good load balancing, low synchronization overhead and an efficient resolution of data dependencies. We also propose an algorithm to detect and resolve data dependencies during trace analysis, with minimal locking and synchronization. Using this approach, we implement three different trace analysis programs: event counting, CPU usage analysis and I/O usage analysis, to assess the scalability in terms of parallel efficiency. The parallel implementations achieve parallel efficiency above 56% with 32 cores, which translates to a speedup of 18 times the serial speed, when running the parallel trace analyses and using trace data stored on consumer-grade solid state storage devices. We also show the scalability and potential of our approach by measuring the effect of future improvements to trace decoding on parallel efficiency.


Tracing Trace analysis Parallel computing 



The financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and Ericsson Software Research is gratefully acknowledged. We would also like to thank Francis Giraldeau for his advice and comments, as well as Geneviève Bastien and Julien Desfossez for their help.


  1. 1.
    Biancheri, C., Ezzati-Jivan, N., Dagenais, M.R.: Multilayer virtualized systems analysis with kernel tracing. In: IEEE 4th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW 2016), Aug 2016, pp. 1–6 (Online).
  2. 2.
    Ezzati-Jivan, N., Dagenais, M.R.: Multi-scale navigation of large trace data: a survey. Concurr. Comput. Pract. Exp. 29(10), e4068 (2017). CrossRefGoogle Scholar
  3. 3.
    Desnoyers, M., Dagenais, M.: The LTTNG tracer: a low impact performance and behavior monitor for gnu/linux. In: Proceedings of the Ottawa Linux Symposium, vol. 2006 (2006)Google Scholar
  4. 4.
    Desnoyers, M., Dagenais, M.R.: Lockless multi-core high-throughput buffering scheme for kernel tracing. ACM SIGOPS Oper. Syst. Rev. 46(3), 65–81 (2012)CrossRefGoogle Scholar
  5. 5.
    Rostedt, S.: Finding origins of latencies using ftrace. Proc, RT Linux WS (2009)Google Scholar
  6. 6.
    Eigler, F.C., Hat, R.: Problem solving with systemtap. In: Proceedings of the Ottawa Linux Symposium. Citeseer, pp. 261–268, (2006)Google Scholar
  7. 7.
    de Melo, A.C.: The new linux’perf’tools. In: Slidesfrom Linux Kongress (2010)Google Scholar
  8. 8.
    Fournier, P.-M., Desnoyers, M., Dagenais, M.R.: Combined tracing of the kernel and applications with LTTNG. In: Proceedings of the 2009 Linux Symposium (2009)Google Scholar
  9. 9.
    Matni, G., Dagenais, M.: Automata-based approach for kernel trace analysis. In: Canadian Conference on Electrical and Computer Engineering: CCECE’09. IEEE 2009, 970–973 (2009)Google Scholar
  10. 10.
    Wininger, F., Ezzati-Jivan, N., Dagenais, M.R.: A declarative framework for stateful analysis of execution traces. Softw. Qual. J. 25(1), 201–229 (2017). CrossRefGoogle Scholar
  11. 11.
    Kouame, K., Ezzati-Jivan, N., Dagenais, M.R.: A flexible data-driven approach for execution trace filtering. IEEE International Congress on Big Data 2015, 698–703 (2015). CrossRefGoogle Scholar
  12. 12.
    Montplaisir, A., Ezzati-Jivan, N., Wininger, F., Dagenais, M.R.: State history tree: an incremental disk-based data structure for very large interval data. In: International Conference on Social Computing, pp. 716–724 (2013).
  13. 13.
    Veeraraghavan, K., Lee, D., Wester, B., Ouyang, J., Chen, P.M., Flinn, J., Narayanasamy, S.: Doubleplay: parallelizing sequential logging and replay. ACM Trans. Comput. Syst. (TOCS) 30(1), 3 (2012)CrossRefGoogle Scholar
  14. 14.
    Nightingale, E. B., Peek, D., Chen, P. M., Flinn, J.: Parallelizing security checks on commodity hardware. In: ACM Sigplan Notices, vol. 43(3), pp. 308–318. ACM (2008)Google Scholar
  15. 15.
    Süßkraut, M., Knauth, T., Weigert, S., Schiffel, U., Meinhold, M., Fetzer, C.: Prospect: a compiler framework for speculative parallelization. In: Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 131–140. ACM (2010)Google Scholar
  16. 16.
    Zilles, C., Sohi, G.: Master, slave speculative parallelization. In: Proceedings of the 35th Annual IEEE/ACM International Symposium on Microarchitecture: (MICRO-35), pp. 85–96. IEEE (2002)Google Scholar
  17. 17.
    Wolf, F., Mohr, B.: Automatic performance analysis of hybrid mpi/openmp applications. J. Syst. Archit. 49(10), 421–439 (2003)CrossRefGoogle Scholar
  18. 18.
    Geimer, M., Wolf, F., Wylie, B.J., Mohr, B.: Scalable parallel trace-based performance analysis. In: Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 303–312. Springer (2006)Google Scholar
  19. 19.
    Geimer, M., Wolf, F., Wylie, B.J.N., Mohr, B.: A scalable tool architecture for diagnosing wait states in massively parallel applications. Parall. Comput. 35(7), 375 – 388 (2009). (Online)
  20. 20.
    Tumeo, A., Villa, O., Chavarria-Miranda, D.G.: Aho-corasick string matching on shared and distributed-memory parallel architectures. IEEE Trans. Parall. Distrib. Syst. 23(3), 436–443 (2012)CrossRefGoogle Scholar
  21. 21.
    Schuff, D.L., Choe, Y.R., Pai, V.S.: Conservative vs. optimistic parallelization of stateful network intrusion detection. In: IEEE International Symposium on Performance Analysis of Systems and software: ISPASS, 2008, 32–43. IEEE (2008)Google Scholar
  22. 22.
    Vasiliadis, G., Polychronakis, M., Ioannidis, S.: Midea: a multi-parallel intrusion detection architecture. In: Proceedings of the 18th ACM Conference on Computer and Communications Security, pp. 297–308. ACM (2011)Google Scholar
  23. 23.
    Ladner, R.E., Fischer, M.J.: Parallel prefix computation. J. ACM (JACM) 27(4), 831–838 (1980)MathSciNetCrossRefzbMATHGoogle Scholar
  24. 24.
    Hillis, W.D., Steele Jr., G.L.: Data parallel algorithms. Commun. ACM 29(12), 1170–1183 (1986)CrossRefGoogle Scholar
  25. 25.
    Mytkowicz, T., Musuvathi, M., Schulte, W.: Data-parallel finite-state machines. In: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 529–542. ACM (2014)Google Scholar
  26. 26.
    Desnoyers, M.: Common trace format (ctf) specification (Online).;a=blob_plain;;hb=master
  27. 27.
    Vergé, A., Ezzati-Jivan, N., Dagenais, M.R.: Hardware-assisted software event tracing. Concurr. Comput. Pract. Exp. 29(10), (2017).
  28. 28.
    Clements, A.T., Kaashoek, M.F., Zeldovich, N.: Scalable address spaces using RCU balanced trees. ACM SIGARCH Comput. Archit. News 40(1), 199–210 (2012)CrossRefGoogle Scholar
  29. 29.
    Reumont-Locke, F.: Méthodes efficaces de parallélisation de l’analyse de traces NOYAU. Masters thesis, École Polytechnique de Montréal (2015)Google Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Fabien Reumont-Locke
    • 1
    • 2
  • Naser Ezzati-Jivan
    • 1
    Email author
  • Michel R. Dagenais
    • 1
  1. 1.Computer and Software Engineering DepartmentEcole Polytechnique de MontrealMontrealCanada
  2. 2.GoogleMountain ViewUSA

Personalised recommendations