String Matching on the Internet
We consider a variant of the “string searching in database” problem in which the string database arrives on a data stream, and processing the data is at a premium but querying is not a runtime bottleneck. Specifically, the strings to be searched (let’s call them the documents) have to be processed online very efficiently, meaning that the documents have to be added to some string searching data structure one by one, each in time proportional to its length. Of course, we want this data structure to be small, i.e. at most linear in space, and ideally to exhibit a tradeoff between storage/processing cost and accuracy. Given a query string, the data structure must report whether that string is contained in some document (the presence query), and must also be able to return a list of the documents that contain the query (the attribution query). We may require that the query be long enough, and allow that only portions of it match (pattern matching). In practice, it is acceptable for the data structure to return a superset of the answer, as long as no document from the answer is missing and there are only a few false positives; either the false positives can be filtered out (by actual verification if the document texts are available in a repository), or a small number of false positives is acceptable for the application (e.g. network forensics, see below).
Keywords: False Positive Rate, Block Size, Hash Function, Intrusion Detection, Bloom Filter
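To make the presence and attribution queries concrete, here is a minimal Python sketch built on a Bloom filter over fixed-size document blocks. It is an illustration under stated assumptions, not the construction from the paper: the class name `BlockBloomIndex`, its methods, the default parameters, and the choice of SHA-256 with double hashing are all hypothetical. Each document is cut into aligned blocks as it arrives, so insertion takes time proportional to its length. A query of at least `2 * block_size - 1` bytes is tested at every possible alignment, which is why a minimum query length is required. The filter can report false positives but, for documents it has seen, never misses a true match.

```python
import hashlib


class BlockBloomIndex:
    """Bloom filter over aligned document blocks (illustrative sketch only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4, block_size=8):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.block_size = block_size
        self.bits = bytearray((num_bits + 7) // 8)
        self.doc_ids = []  # needed to enumerate candidates for attribution

    def _positions(self, key):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def _add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def _has(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))

    def add_document(self, doc_id, text):
        """Insert one document online; one pass over its aligned blocks."""
        bs = self.block_size
        tag = doc_id.encode() + b"\x00"
        self.doc_ids.append(doc_id)
        for start in range(0, len(text) - bs + 1, bs):
            block = text[start:start + bs]
            self._add(block)        # supports the presence query
            self._add(tag + block)  # supports the attribution query

    def _match(self, query, tag):
        bs = self.block_size
        if len(query) < 2 * bs - 1:
            raise ValueError("query must span at least one aligned block")
        # The query's offset inside a document is unknown, so try every alignment.
        for offset in range(bs):
            starts = range(offset, len(query) - bs + 1, bs)
            if all(self._has(tag + query[s:s + bs]) for s in starts):
                return True
        return False

    def presence(self, query):
        """True if the query may occur in some document (no false negatives)."""
        return self._match(query, b"")

    def attribution(self, query):
        """Superset of the ids of the documents that contain the query."""
        return [d for d in self.doc_ids if self._match(query, d.encode() + b"\x00")]


if __name__ == "__main__":
    index = BlockBloomIndex(block_size=4)
    index.add_document("doc-a", b"the quick brown fox jumps over the lazy dog")
    index.add_document("doc-b", b"pack my box with five dozen liquor jugs")
    print(index.presence(b"brown fox jumps"))
    print(index.attribution(b"five dozen"))
```

In this sketch the attribution query loops over every known document id, which is tolerable only because querying is assumed not to be the bottleneck. A smaller `block_size` permits shorter queries at the cost of more insertions per document, and `num_bits` and `num_hashes` trade storage against the false positive rate.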