
Efficient Algorithms for Similarity Measures over Sequential Data: A Look Beyond Kernels

  • Conference paper
Pattern Recognition (DAGM 2006)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4174)


Abstract

Kernel functions as similarity measures for sequential data have been extensively studied in previous research. This contribution addresses the efficient computation of distance functions and similarity coefficients for sequential data. Two proposed algorithms utilize different data structures for efficient computation and yield a runtime linear in the sequence length. Experiments on network data for intrusion detection suggest the importance of distances and even non-metric similarity measures for sequential data.
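The abstract does not name the two data structures, so the following is only an illustrative sketch, not the authors' algorithms. It assumes sequences are compared through their contiguous n-gram counts held in hash tables, and shows one distance function (Manhattan) and one similarity coefficient (Jaccard). The function names and the default n = 3 are hypothetical. For a fixed n, each function makes a single pass over both sequences, so the expected runtime is linear in the sequence length.

```python
from collections import Counter


def ngram_counts(seq, n):
    """Count every contiguous n-gram of seq in a hash table (assumed embedding)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def manhattan_distance(x, y, n=3):
    """Sum of absolute differences between the n-gram counts of x and y."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    # A Counter returns 0 for missing keys, so n-grams seen in only
    # one sequence contribute their full count.
    return sum(abs(cx[w] - cy[w]) for w in cx.keys() | cy.keys())


def jaccard_coefficient(x, y, n=3):
    """Shared distinct n-grams divided by all distinct n-grams of x and y."""
    sx, sy = set(ngram_counts(x, n)), set(ngram_counts(y, n))
    union = sx | sy
    # Two sequences without any n-grams are treated as identical here.
    return len(sx & sy) / len(union) if union else 1.0


if __name__ == "__main__":
    print(manhattan_distance("abab", "abba", n=2))
    print(jaccard_coefficient("abab", "abba", n=2))
```

Hash tables keep the sketch short. Other structures, such as sorted arrays or tries over the n-grams, would support the same comparisons.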


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rieck, K., Laskov, P., Müller, KR. (2006). Efficient Algorithms for Similarity Measures over Sequential Data: A Look Beyond Kernels. In: Franke, K., Müller, KR., Nickolay, B., Schäfer, R. (eds) Pattern Recognition. DAGM 2006. Lecture Notes in Computer Science, vol 4174. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11861898_38

  • DOI: https://doi.org/10.1007/11861898_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44412-1

  • Online ISBN: 978-3-540-44414-5

  • eBook Packages: Computer Science, Computer Science (R0)
