Efficient Algorithms for Similarity Measures over Sequential Data: A Look Beyond Kernels

Rieck, Konrad; Laskov, Pavel; Müller, Klaus-Robert

doi:10.1007/11861898_38

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4174)

Included in the following conference series:

Joint Pattern Recognition Symposium

Abstract

Kernel functions as similarity measures for sequential data have been extensively studied in previous research. This contribution addresses the efficient computation of distance functions and similarity coefficients for sequential data. Two proposed algorithms utilize different data structures for efficient computation and yield a runtime linear in the sequence length. Experiments on network data for intrusion detection suggest the importance of distances and even non-metric similarity measures for sequential data.
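The abstract does not name the two data structures or the specific measures, so the following is only a minimal sketch of the general idea, not the authors' algorithms. It assumes that sequences are compared through their contiguous n-grams, that a hash map of n-gram counts serves as the data structure, and that the Manhattan distance and the Jaccard coefficient stand in for "distance functions" and "similarity coefficients". The function names and the default `n=3` are hypothetical.

```python
from collections import Counter


def ngram_counts(seq, n):
    """Count every contiguous n-gram of seq in a hash map (assumed data structure)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def manhattan_distance(x, y, n=3):
    """Sum of absolute differences between the n-gram counts of x and y."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    # A Counter returns 0 for missing keys, so n-grams present in only
    # one sequence contribute their full count.
    return sum(abs(cx[w] - cy[w]) for w in cx.keys() | cy.keys())


def jaccard_coefficient(x, y, n=3):
    """Shared distinct n-grams divided by all distinct n-grams of x and y."""
    sx, sy = set(ngram_counts(x, n)), set(ngram_counts(y, n))
    union = sx | sy
    # Two sequences without any n-grams are treated as identical (a convention).
    return len(sx & sy) / len(union) if union else 1.0


if __name__ == "__main__":
    print(manhattan_distance("GET /index.html", "GET /index.php"))
    print(jaccard_coefficient("GET /index.html", "GET /index.php"))
```

For a fixed n, each sequence is scanned once and every hash-map operation takes expected constant time, so the expected running time of this sketch grows in proportion to the combined length of the two sequences.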

Author information

Authors and Affiliations

Fraunhofer FIRST.IDA, Kekuléstraße 7, 12489, Berlin, Germany
Konrad Rieck, Pavel Laskov & Klaus-Robert Müller
University of Potsdam, Am Neuen Palais 10, 14469, Potsdam, Germany
Klaus-Robert Müller

Editor information

Editors and Affiliations

Norwegian Information Security Laboratory, Gjøvik University College, Norway
Katrin Franke
Fraunhofer FIRST (IDA), Berlin, Germany
Klaus-Robert Müller
Department of Security Technology, Fraunhofer Institute for Production Systems and Design Technology (IPK), Pascalstr. 8-9, 10587, Berlin, Germany
Bertram Nickolay
Department of Electronic Imaging Technology, Fraunhofer Institute for Information and Communication Technology, Heinrich Hertz Institute (HHI), Einsteinufer 37, 10587, Berlin, Germany
Ralf Schäfer

About this paper

Cite this paper

Rieck, K., Laskov, P., Müller, KR. (2006). Efficient Algorithms for Similarity Measures over Sequential Data: A Look Beyond Kernels. In: Franke, K., Müller, KR., Nickolay, B., Schäfer, R. (eds) Pattern Recognition. DAGM 2006. Lecture Notes in Computer Science, vol 4174. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11861898_38

DOI: https://doi.org/10.1007/11861898_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44412-1
Online ISBN: 978-3-540-44414-5
eBook Packages: Computer Science, Computer Science (R0)
