A Speculative Parallel DFA Membership Test for Multicore, SIMD and Cloud Computing Environments

International Journal of Parallel Programming

Abstract

We present techniques to parallelize membership tests for Deterministic Finite Automata (DFAs). Our method searches arbitrary regular expressions by matching multiple bytes in parallel using speculation. We partition the input string into chunks, match chunks in parallel, and combine the matching results. Our parallel matching algorithm exploits structural DFA properties to minimize the speculative overhead. Unlike previous approaches, our speculation is failure-free, i.e., (1) sequential semantics are maintained, and (2) speed-downs are avoided altogether. On architectures with a SIMD gather-operation for indexed memory loads, our matching operation is fully vectorized. The proposed load-balancing scheme uses an off-line profiling step to determine the matching capacity of each participating processor. Based on matching capacities, DFA matches are load-balanced on inhomogeneous parallel architectures such as cloud computing environments. We evaluated our speculative DFA membership test for a representative set of benchmarks from the Perl-compatible Regular Expression (PCRE) library and the PROSITE protein database. Evaluation was conducted on a 4-CPU (40 cores) shared-memory node of the Intel Academic Program Manycore Testing Lab (Intel MTL), on the Intel AVX2 SDE simulator for 8-way fully vectorized SIMD execution, and on a 20-node (288 cores) cluster on the Amazon EC2 computing cloud. Obtained speedups are on the order of \(\mathcal{O}\left(1+\frac{|P|-1}{|Q|\cdot\gamma}\right)\), where \(|P|\) denotes the number of processors or SIMD units, \(|Q|\) denotes the number of DFA states, and \(0<\gamma\le 1\) represents a statically computed DFA property. For all observed cases, we found that \(0.02<\gamma<0.47\). Actual speedups range from 2.3\(\times\) to 38.8\(\times\) for up to 512 DFA states for PCRE, and between 1.3\(\times\) and 19.9\(\times\) for up to 1,288 DFA states for PROSITE on a 40-core MTL node. Speedups on the EC2 computing cloud range from 5.0\(\times\) to 65.8\(\times\) for PCRE, and from 5.0\(\times\) to 138.5\(\times\) for PROSITE. Speedups of our C-based DFA matcher over the Perl-based ScanProsite scan tool range from 559.3\(\times\) to 15079.7\(\times\) on a 40-core MTL node. We show the scalability of our approach for input sizes of up to 10 GB.
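
The partition, match, and combine steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' C implementation: the names `run_dfa`, `speculate`, and `parallel_member`, the list-of-dicts transition table, and the thread pool are all assumptions made for illustration. The sketch also speculates over every DFA state for each chunk after the first, which is the naive baseline; the paper reduces this set using structural DFA properties, a step that is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dfa(delta, state, chunk):
    """Run the DFA sequentially over `chunk`, starting in `state`."""
    for symbol in chunk:
        state = delta[state][symbol]
    return state


def speculate(delta, chunk):
    """Match `chunk` from every possible start state.

    Returns a list `m` where m[q] is the state reached from start state q.
    Covering all states is what makes the speculation failure-free here:
    whichever state the previous chunk ends in, its result is already known.
    """
    return [run_dfa(delta, q, chunk) for q in range(len(delta))]


def parallel_member(delta, start, accepting, text, num_chunks=4):
    """Hypothetical chunked membership test (illustrative helper, not from the paper)."""
    size = max(1, -(-len(text) // num_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    if not chunks:
        return start in accepting
    with ThreadPoolExecutor() as pool:
        # The start state of the first chunk is known, so no speculation is needed.
        first = pool.submit(run_dfa, delta, start, chunks[0])
        rest = [pool.submit(speculate, delta, c) for c in chunks[1:]]
        # Combine: thread the actual state through the per-chunk mappings.
        state = first.result()
        for future in rest:
            state = future.result()[state]
    return state in accepting


# Example DFA over {a, b}: state 0 means an even number of 'a' seen so far.
EVEN_A = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]

if __name__ == "__main__":
    print(parallel_member(EVEN_A, 0, {0}, "abba" * 5))
```

Because each later chunk is matched once per state, the speculative work grows with the number of DFA states, which is consistent with \(|Q|\) appearing in the denominator of the speedup expression above. Python threads are used only to show the structure; CPU-bound work like this would need processes or native code to run in parallel.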

Acknowledgments

Research partially supported by the National Research Foundation of Korea (NRF) grants funded by the Korean government (MEST) (Grant No. 2010-0005234, 2012R1A1A2044562 and 2012K2A1A9054713), through the Global Ph.D. Fellowship Program 2011 of the NRF (Grant No. 2010-0008582), and by the Intel Academic Program Manycore Testing Lab.

Author information

Corresponding author

Correspondence to Bernd Burgstaller.

About this article

Cite this article

Ko, Y., Jung, M., Han, YS. et al. A Speculative Parallel DFA Membership Test for Multicore, SIMD and Cloud Computing Environments. Int J Parallel Prog 42, 456–489 (2014). https://doi.org/10.1007/s10766-013-0258-5

  • DOI: https://doi.org/10.1007/s10766-013-0258-5
