
String Matching on the Internet

  • Hervé Brönnimann
  • Nasir Memon
  • Kulesh Shanmugasundaram
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3405)

Abstract

We consider a variant of the “string searching in database” problem where the string database comes on a data stream, and processing the data is at a premium but querying is not a runtime bottleneck. Specifically, the strings to be searched into (let’s call them the documents) have to be processed online very efficiently, meaning the documents have to be added to some string searching data structure one by one in time proportional to their length. Of course, we desire this data structure to be small, i.e. at most linear space, and hopefully exhibit a tradeoff between storage/processing cost and accuracy. Upon some query string, the data structure must return whether that string is contained in a document (the presence query), and must also be able to return a list of the documents which contain the query (the attribution query). We may require that the query be large enough and that only portions of it may match (pattern matching). In practice, it is acceptable that the data structure return a superset of the answer, as long as no document from the answer is missing and there are only few false positives; either the false positives can be filtered (by actual verification if the document texts are available in a repository), or a small number of false positives are acceptable for the application (e.g. network forensics, see below).
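
The query contract described above (one-pass insertion, a minimum query length, and answers that may contain false positives but never miss a true match) can be illustrated with a small sketch. It assumes a plain Bloom filter over fixed-length substrings; this is an illustrative stand-in, not the data structure proposed in the paper. The class name `GramBloomIndex`, its parameters, and the sample documents are all hypothetical.

```python
import hashlib


class GramBloomIndex:
    """Hypothetical sketch: a Bloom filter over fixed-length substrings (grams)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4, gram_len=8):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.gram_len = gram_len          # minimum query length that can be answered
        self.bits = bytearray((num_bits + 7) // 8)
        self.doc_ids = []                 # ids seen so far, scanned by attribution queries

    def _positions(self, key):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def _set(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def _test(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))

    def _grams(self, data):
        w = self.gram_len
        return (data[i:i + w] for i in range(len(data) - w + 1))

    def _check_length(self, query):
        if len(query) < self.gram_len:
            raise ValueError("query must be at least gram_len bytes long")

    def add(self, doc_id, data):
        # One pass over the document: a constant number of hash insertions per position.
        self.doc_ids.append(doc_id)
        tag = str(doc_id).encode() + b"\x00"
        for gram in self._grams(data):
            self._set(gram)               # supports the presence query
            self._set(tag + gram)         # supports the attribution query

    def maybe_present(self, query):
        # True for every real match; may also be True for a non-match (false positive).
        self._check_length(query)
        return all(self._test(gram) for gram in self._grams(query))

    def candidates(self, query):
        # Superset of the documents containing the query; no true match is dropped.
        self._check_length(query)
        found = []
        for doc_id in self.doc_ids:
            tag = str(doc_id).encode() + b"\x00"
            if all(self._test(tag + gram) for gram in self._grams(query)):
                found.append(doc_id)
        return found


index = GramBloomIndex()
index.add("d1", b"GET /index.html HTTP/1.1 host example")
index.add("d2", b"POST /login HTTP/1.1 user=alice")
present = index.maybe_present(b"/index.html")
attributed = index.candidates(b"/index.html")
```

In this sketch, any query that truly occurs in an added document passes both checks, because every gram of the query was inserted when the document was added. The attribution query scans all known document ids, which is slow but consistent with the premise that querying is not the bottleneck. Because only individual grams are recorded, a query whose grams all occur in a document but not contiguously is also reported, so candidates would still need verification against the stored texts where those are available.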

Keywords

False Positive Rate, Block Size, Hash Function, Intrusion Detection, Bloom Filter
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Hervé Brönnimann (1)
  • Nasir Memon (1)
  • Kulesh Shanmugasundaram (1)
  1. Polytechnic University, USA
