Abstract
We investigate the hypothesis that it is possible to detect from the End User License Agreement (EULA) whether the associated software hosts spyware. We apply 15 learning algorithms to a data set consisting of 100 applications with classified EULAs. The results show that 13 algorithms are significantly more accurate than random guessing. Thus, we conclude that the hypothesis can be accepted. Based on the results, we present a novel tool that can be used to prevent spyware by automatically halting application installers and classifying the EULA, giving users the opportunity to make an informed choice about whether to continue with the installation. We discuss positive and negative aspects of this prevention approach and suggest a method for evaluating candidate algorithms for a future implementation.
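To make the idea of classifying a EULA as text more concrete, the following is a minimal sketch of a bag-of-words multinomial Naive Bayes classifier with add-one smoothing. It is not the authors' implementation: the abstract does not name the 15 algorithms, so Naive Bayes is chosen here purely for illustration. The function names (`tokenize`, `train`, `classify`), the labels `"good"` and `"bad"`, and the four toy license snippets are all hypothetical and are not taken from the 100-application data set.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_docs):
    """Collect per-class document counts, per-class word counts and the vocabulary."""
    doc_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocabulary.add(token)
    return doc_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-posterior score."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        class_total = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (class_total + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for this sketch only.
TOY_EULAS = [
    ("We may collect browsing data and display third party advertisements", "bad"),
    ("The software may install additional components and share usage data with partners", "bad"),
    ("You may install one copy of the software on a single computer", "good"),
    ("This license grants you the right to use the software without warranty", "good"),
]

model = train(TOY_EULAS)
prediction = classify(model, "partners may display advertisements based on collected data")
print(prediction)
```

In a tool of the kind described above, a call such as `classify(model, eula_text)` would be made on the license text extracted from a halted installer, and the predicted label would be shown to the user before the installation continues. With only four training snippets the sketch says nothing about real-world accuracy.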
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lavesson, N., Davidsson, P., Boldt, M., Jacobsson, A. (2008). Spyware Prevention by Classifying End User License Agreements. In: Nguyen, N.T., Katarzyniak, R. (eds) New Challenges in Applied Intelligence Technologies. Studies in Computational Intelligence, vol 134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79355-7_36
DOI: https://doi.org/10.1007/978-3-540-79355-7_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-79354-0
Online ISBN: 978-3-540-79355-7
eBook Packages: Engineering, Engineering (R0)