Abstract
A statistical model is presented as an alternative to negative selection for anomaly detection in discrete data. We extend probabilistic generative models from fixed-length binary strings to variable-length strings over a finite symbol alphabet, using a mixture of multinomial distributions to model the frequencies of adjacent symbol pairs within a sliding window over a string. Robust and localized change analysis of text corpora is considered as an application area.
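The idea in the abstract can be illustrated with a minimal sketch, which is not the paper's implementation. It counts adjacent symbol pairs in sliding windows, fits a mixture of multinomials, and scores new windows by log-likelihood. All function names, the window width and step, the smoothing constant `alpha`, and the use of a plain EM loop for fitting are assumptions made for illustration; the abstract does not specify how the model is estimated.

```python
import math
import random
from itertools import product


def sliding_windows(text, width, step):
    """Split a string into overlapping fixed-width windows."""
    return [text[i:i + width] for i in range(0, len(text) - width + 1, step)]


def bigram_counts(window, index):
    """Count adjacent symbol pairs; pairs outside the alphabet are skipped."""
    counts = [0] * len(index)
    for pair in zip(window, window[1:]):
        if pair in index:
            counts[index[pair]] += 1
    return counts


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def component_scores(counts, weights, log_thetas):
    """Per-component log(weight) + multinomial log-likelihood.
    The multinomial coefficient is omitted (it is the same for every component)."""
    return [
        math.log(w) + sum(c * lt for c, lt in zip(counts, log_theta) if c)
        for w, log_theta in zip(weights, log_thetas)
    ]


def log_likelihood(counts, weights, log_thetas):
    """Mixture log-likelihood of one window's pair counts (up to the coefficient)."""
    return logsumexp(component_scores(counts, weights, log_thetas))


def fit_mixture(data, k, iters=30, alpha=0.1, seed=0):
    """Fit a k-component multinomial mixture with EM (assumed fitting method)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random soft assignment of windows to components as a starting point.
    resp = []
    for _ in data:
        row = [rng.random() + 1e-3 for _ in range(k)]
        resp.append([r / sum(row) for r in row])
    for _ in range(iters):
        # M-step: mixing weights and smoothed pair probabilities per component.
        weights = [(sum(r[j] for r in resp) + 1e-9) / (len(data) + k * 1e-9)
                   for j in range(k)]
        log_thetas = []
        for j in range(k):
            totals = [alpha] * dim  # additive smoothing keeps every probability > 0
            for r, counts in zip(resp, data):
                for i, c in enumerate(counts):
                    totals[i] += r[j] * c
            norm = sum(totals)
            log_thetas.append([math.log(t / norm) for t in totals])
        # E-step: posterior responsibility of each component for each window.
        resp = []
        for counts in data:
            scores = component_scores(counts, weights, log_thetas)
            total = logsumexp(scores)
            resp.append([math.exp(s - total) for s in scores])
    return weights, log_thetas


# Toy "self" string with two regimes over a four-symbol alphabet (made-up data).
alphabet = "abcd"
index = {pair: i for i, pair in enumerate(product(alphabet, repeat=2))}
self_text = "abcd" * 40 + "aabb" * 40
train = [bigram_counts(w, index) for w in sliding_windows(self_text, 12, 4)]
weights, log_thetas = fit_mixture(train, k=2)

normal_score = log_likelihood(bigram_counts("abcdabcdabcd", index), weights, log_thetas)
anomalous_score = log_likelihood(bigram_counts("dbdbdbdbdbdb", index), weights, log_thetas)
print(normal_score > anomalous_score)
```

In this toy setup, a window made only of symbol pairs never seen in the training string should receive a much lower score than a window drawn from the training pattern. Thresholding that score per window is one way to localize changes, though the threshold itself would need to be chosen for the data at hand.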
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pöllä, M. (2009). A Generative Model for Self/Non-self Discrimination in Strings. In: Kolehmainen, M., Toivanen, P., Beliczynski, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2009. Lecture Notes in Computer Science, vol 5495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04921-7_30
DOI: https://doi.org/10.1007/978-3-642-04921-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04920-0
Online ISBN: 978-3-642-04921-7
eBook Packages: Computer Science, Computer Science (R0)