Abstract
Kernel functions as similarity measures for sequential data have been extensively studied in previous research. This contribution addresses the efficient computation of distance functions and similarity coefficients for sequential data. Two proposed algorithms utilize different data structures for efficient computation and yield a runtime linear in the sequence length. Experiments on network data for intrusion detection suggest the importance of distances and even non-metric similarity measures for sequential data.
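As a rough illustration of the kind of measure the abstract refers to, the sketch below computes one distance function (Manhattan) and one similarity coefficient (Jaccard) over the contiguous n-grams of two sequences. The abstract does not name the data structures the two algorithms use, so the hash table here (Python's `Counter`) is an assumption, as are the function names and the choice of n-grams as the embedding. For a fixed n, the expected running time is linear in the combined sequence length.

```python
from collections import Counter


def ngram_counts(seq, n):
    """Count every contiguous n-gram of seq in a single pass (hash-table backed)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def manhattan_distance(x, y, n=3):
    """Sum of absolute differences between the n-gram counts of x and y."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    # A Counter returns 0 for a missing key, so n-grams present in only
    # one sequence contribute their full count.
    return sum(abs(cx[w] - cy[w]) for w in cx.keys() | cy.keys())


def jaccard_similarity(x, y, n=3):
    """Shared distinct n-grams divided by all distinct n-grams (a similarity coefficient)."""
    sx, sy = set(ngram_counts(x, n)), set(ngram_counts(y, n))
    union = sx | sy
    # Two sequences with no n-grams at all are treated as identical (assumption).
    return len(sx & sy) / len(union) if union else 1.0


if __name__ == "__main__":
    print(manhattan_distance("abcabc", "abcabd"))
    print(jaccard_similarity("abcabc", "abcabd"))
```

Both functions touch each n-gram a constant number of times, which is where the linear behaviour comes from; swapping the hash table for a sorted array or a tree-based index would change constants and memory use but not the overall idea.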
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Rieck, K., Laskov, P., Müller, KR. (2006). Efficient Algorithms for Similarity Measures over Sequential Data: A Look Beyond Kernels. In: Franke, K., Müller, KR., Nickolay, B., Schäfer, R. (eds) Pattern Recognition. DAGM 2006. Lecture Notes in Computer Science, vol 4174. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11861898_38
DOI: https://doi.org/10.1007/11861898_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44412-1
Online ISBN: 978-3-540-44414-5
eBook Packages: Computer Science, Computer Science (R0)