Abstract
Multiple string matching is a core technology in network intrusion detection systems. Automata-based matching algorithms such as AC and BOM are widely used in practical systems because of their excellent matching performance, but the large memory footprint of their automata limits their use with large-scale pattern sets. In this paper, we propose CTM, a charset-transformation-based multiple string matching algorithm that reduces the memory usage of the automaton. CTM builds on the classical banded-row compression method and improves its compression rate. It first applies a transformation to the charset of the patterns so that the non-empty elements of the automaton are distributed more contiguously, and then compresses the automaton with the banded-row method. Experiments on a random ASCII charset show that the proposed algorithm significantly reduces memory usage while retaining a fast matching speed. In particular, CTM uses about 2.5% of the memory of AC, and its compression rate is about 35% higher than that of the basic banded-row method.
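The interplay between charset transformation and banded-row compression can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not specify how the transformation is chosen, so the frequency-based heuristic in `build_charset_map`, the toy 8-symbol alphabet, and all function names here are assumptions made purely for illustration. A banded row stores only the cells between the first and last non-empty transition of a state, so moving the used symbols next to each other shortens every band.

```python
from collections import Counter

EMPTY = -1  # marks "no transition" in a dense automaton row


def band(row):
    """Banded-row form of one dense row: (start offset, cells from first to last non-empty)."""
    used = [i for i, v in enumerate(row) if v != EMPTY]
    if not used:
        return 0, []
    return used[0], row[used[0]:used[-1] + 1]


def lookup(banded, symbol):
    """Read one transition from a banded row; EMPTY if the symbol lies outside the band."""
    start, cells = banded
    i = symbol - start
    return cells[i] if 0 <= i < len(cells) else EMPTY


def build_charset_map(rows, alphabet_size):
    """Assumed heuristic (not from the paper): give frequently used symbols adjacent codes."""
    freq = Counter(i for row in rows for i, v in enumerate(row) if v != EMPTY)
    order = sorted(range(alphabet_size), key=lambda s: (-freq[s], s))
    return {old: new for new, old in enumerate(order)}


def remap(row, mapping):
    """Permute the columns of a dense row according to the charset mapping."""
    out = [EMPTY] * len(row)
    for old, v in enumerate(row):
        out[mapping[old]] = v
    return out


def banded_size(rows):
    """Total number of cells stored after banded-row compression."""
    return sum(len(band(r)[1]) for r in rows)


# Toy automaton over 8 symbols; only symbols 1 and 6 ever carry transitions.
rows = [
    [EMPTY, 5, EMPTY, EMPTY, EMPTY, EMPTY, 7, EMPTY],
    [EMPTY, 2, EMPTY, EMPTY, EMPTY, EMPTY, 3, EMPTY],
    [EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, 4, EMPTY],
]
mapping = build_charset_map(rows, 8)
remapped = [remap(r, mapping) for r in rows]

print(banded_size(rows))      # 13 cells without the transformation
print(banded_size(remapped))  # 5 cells after the transformation
```

At match time, each input symbol would have to be translated through the same mapping before the lookup, for example `lookup(band(remapped[0]), mapping[6])`, which costs one extra table access per input character.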
Acknowledgement
This work is partially supported by the National Key Research and Development Program of China (Grant No. 2017YFC0820700) and the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDC02030000).
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lu, Y., Liu, Y., Sun, G., Tan, J. (2019). A Memory-Efficient Multiple String Matching Algorithm Based on Charset Transformation. In: Shankar Sriram, V., Subramaniyaswamy, V., Sasikaladevi, N., Zhang, L., Batten, L., Li, G. (eds) Applications and Techniques in Information Security. ATIS 2019. Communications in Computer and Information Science, vol 1116. Springer, Singapore. https://doi.org/10.1007/978-981-15-0871-4_1
DOI: https://doi.org/10.1007/978-981-15-0871-4_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0870-7
Online ISBN: 978-981-15-0871-4
eBook Packages: Computer Science, Computer Science (R0)