We present FAST (Forensic Analysis of Sensitive Traces), a novel, highly efficient architecture for high-performance big data forensics on heterogeneous (CPU- and GPU-based) systems. Our model uses a highly compact storage format for the automaton of the widely known Aho-Corasick algorithm [1], together with a partial pruning mechanism, to keep the memory footprint as low as possible while maximizing throughput. We compare our approach with classic methods used in data forensics and observe significant memory footprint improvements, as well as large throughput improvements across all stages of big data processing.
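For readers unfamiliar with the Aho-Corasick algorithm [1], the following is a minimal textbook sketch of multi-pattern matching over a byte stream. It is not the compact storage format or the pruning mechanism of FAST, neither of which is specified in this abstract. The function names `build_automaton` and `search`, and the list-of-dictionaries layout, are assumptions made purely for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build a plain (uncompressed) Aho-Corasick automaton.

    Illustrative layout only: three parallel lists indexed by state id.
    """
    goto = [{}]   # goto[s] maps a byte value to the next state
    fail = [0]    # fail[s] is the failure link of state s
    out = [[]]    # out[s] lists indices of patterns recognised at state s

    # Phase 1: insert every pattern into the trie.
    for idx, pat in enumerate(patterns):
        s = 0
        for b in pat:
            if b not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][b] = len(goto) - 1
            s = goto[s][b]
        out[s].append(idx)

    # Phase 2: breadth-first pass that computes the failure links.
    # States at depth 1 keep the default failure link to the root.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for b, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(b, 0)
            # Inherit matches that end at the failure target.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def search(data, patterns, automaton):
    """Return (start_offset, pattern_index) for every match in data."""
    goto, fail, out = automaton
    s = 0
    hits = []
    for pos, b in enumerate(data):
        # Follow failure links until a transition on b exists or we hit the root.
        while s and b not in goto[s]:
            s = fail[s]
        s = goto[s].get(b, 0)
        for idx in out[s]:
            hits.append((pos - len(patterns[idx]) + 1, idx))
    return hits


patterns = [b"he", b"she", b"his", b"hers"]
automaton = build_automaton(patterns)
hits = search(b"ushers", patterns, automaton)
```

In this sketch, a single pass over the input `b"ushers"` reports `she`, `he`, and `hers` without rescanning any byte. That single-pass property is what makes the automaton attractive for scanning large volumes of data, and the per-state tables are where a compact storage format would reduce memory use.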


Data forensics, Big data, High performance computing, Efficient storage, Aho-Corasick, GPU processing



This work was partially supported by the VI-SEEM H2020-EINFRA 675121 grant and the InnoHPC Interreg - Danube Transnational Programme grant. The views expressed in this paper do not necessarily reflect those of the corresponding projects' consortium members.


  1. Aho, A., Corasick, M.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)
  2. Malwadkar, A., Patil, S.: Data mining techniques for digital forensic analysis. Int. J. Recent Innov. Trends Comput. Commun. 4(3), 17–22 (2016)
  3. Baggili, I., Breitinger, F.: Data sources for advancing cyber forensics: what the social world has to offer. In: 2015 AAAI Spring Symposium Series. AAAI Publications (2015)
  4. Mercedes, B., Mariela, L.: Solving a big-data problem with GPU: the network traffic analysis. J. Comput. Sci. Technol. 15(1), 30–39 (2015). ISSN 1666–6038
  5. Achile, M., Roger, A.: Obtaining digital evidence from intrusion detection systems. Int. J. Comput. Appl. 95(12), 34–41 (2014). (0975 8887)
  6. Pilli, E., Joshi, R., Niyogi, R.: A framework for network forensic analysis. In: Information and Communication Technologies. ICT: Communications in Computer and Information Science, vol. 101. Springer, Berlin, Heidelberg (2010)
  7. Breeuwsma, M., et al.: Forensic data recovery from flash memory. Small Scale Digit. Device Forensics J. 1(1), 1–17 (2007)
  8. Al-Alawi, A.: Cybercrimes, computer forensics and their impact in business climate: Bahrain status. Res. J. Bus. Manage. 8, 139–156 (2014)
  9.
  10. FileSig Software, SimpleCarver.
  11.
  12. Pontello, M.: TrID - File Identifier.
  13. NVIDIA, NVIDIA CUDA Compute Unified Device Architecture Programming Guide, version 4.1.
  14. Pungila, C., Reja, M., Negru, V.: Efficient parallel automata construction for hybrid resource-impelled data-matching. Future Gener. Comput. Syst. 36, 31–41 (2013). ISSN 0167-739X
  15. Pungila, C., Negru, V.: A highly-efficient memory-compression approach for GPU-accelerated virus signature matching. In: Information Security Conference (ISC) (2012)
  16. Pungila, C., Negru, V.: Real-time polymorphic Aho-Corasick automata for heterogeneous malicious code detection. In: International Joint Conference SOCO 2013-CISIS 2013-ICEUTE 2013. Advances in Intelligent Systems and Computing, Series no. 239, pp. 439–448. Springer (2014)

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Faculty of Mathematics and Informatics, Computer Science Department, West University of Timisoara, Timisoara, Romania
