
A Memory-Efficient Multiple String Matching Algorithm Based on Charset Transformation

  • Conference paper
Applications and Techniques in Information Security (ATIS 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1116)

Abstract

Multiple string matching is a core technology in network intrusion detection systems. Automata-based matching algorithms such as AC and BOM are widely used in practical systems because of their excellent matching performance, but the large memory footprint of the automata prevents them from being applied to large-scale pattern sets. In this paper, we propose CTM, a multiple string matching algorithm based on charset transformation, to reduce the memory usage of the automata. CTM builds on the classical banded-row compression algorithm, refining the compression method to increase the compression rate. It first applies a transformation to the charset of the patterns so that the non-empty elements in the automata are distributed more contiguously, and then compresses the automata with the banded-row method. Experiments on a random ASCII charset show that the proposed algorithm significantly reduces memory usage while maintaining a fast matching speed. In particular, CTM uses about 2.5% of the memory of AC, and compared with the basic banded-row method, CTM increases the compression rate by about 35%.
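The interaction between charset remapping and banded-row storage can be illustrated with a small Python sketch. It is not the CTM algorithm itself: the abstract does not say how the transformation is chosen, so the frequency-based remapping in `frequency_charset_map` is a hypothetical stand-in, and the automaton is reduced to the goto function of a trie. All function names and the sample patterns are assumptions made for illustration.

```python
from collections import Counter

EMPTY = -1  # marker for "no transition stored" in a row


def banded_row(row):
    """Compress one row to (offset, band): keep only the span between
    the first and the last non-empty entry."""
    used = [i for i, t in enumerate(row) if t != EMPTY]
    if not used:
        return (0, [])
    lo, hi = used[0], used[-1]
    return (lo, row[lo:hi + 1])


def lookup(band, code):
    """Return the stored transition for a character code, or EMPTY."""
    lo, cells = band
    if lo <= code < lo + len(cells):
        return cells[code - lo]
    return EMPTY


def build_trie_rows(patterns, encode, sigma=256):
    """Build the goto rows of a trie; `encode` maps a character to its code."""
    rows = [[EMPTY] * sigma]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            code = encode(ch)
            if rows[state][code] == EMPTY:
                rows.append([EMPTY] * sigma)
                rows[state][code] = len(rows) - 1
            state = rows[state][code]
    return rows


def frequency_charset_map(patterns):
    """Hypothetical transformation (an assumption, not taken from the paper):
    assign adjacent small codes to the most frequent pattern characters."""
    counts = Counter(ch for p in patterns for ch in p)
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: i for i, ch in enumerate(ordered)}


def total_cells(rows):
    """Number of cells stored after banded-row compression of every row."""
    return sum(len(banded_row(row)[1]) for row in rows)


patterns = ["he", "she", "his", "hers", "zone", "Arc"]
mapping = frequency_charset_map(patterns)

plain = build_trie_rows(patterns, ord)
mapped = build_trie_rows(patterns, mapping.__getitem__)

cells_plain = total_cells(plain)
cells_mapped = total_cells(mapped)
print(cells_plain, cells_mapped)
```

With raw ASCII codes, the root row has to span every code from `A` to `z` even though only four entries are used; after remapping, the same four entries fall within a much narrower band. A real matcher would also have to translate each input character through the same mapping and treat characters absent from the patterns as having no transition, which this sketch leaves out.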

Acknowledgement

This work is partially supported by the National Key Research and Development Program of China (Grant No. 2017YFC0820700) and the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDC02030000).

Author information

Corresponding author

Correspondence to Yanbing Liu.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lu, Y., Liu, Y., Sun, G., Tan, J. (2019). A Memory-Efficient Multiple String Matching Algorithm Based on Charset Transformation. In: Shankar Sriram, V., Subramaniyaswamy, V., Sasikaladevi, N., Zhang, L., Batten, L., Li, G. (eds) Applications and Techniques in Information Security. ATIS 2019. Communications in Computer and Information Science, vol 1116. Springer, Singapore. https://doi.org/10.1007/978-981-15-0871-4_1

  • DOI: https://doi.org/10.1007/978-981-15-0871-4_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-0870-7

  • Online ISBN: 978-981-15-0871-4

  • eBook Packages: Computer Science, Computer Science (R0)
