MADS: Malicious Android Applications Detection through String Analysis

  • Borja Sanz
  • Igor Santos
  • Javier Nieves
  • Carlos Laorden
  • Iñigo Alonso-Gonzalez
  • Pablo G. Bringas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7873)


The use of mobile phones has increased in our lives because they offer nearly the same functionality as a personal computer. The number of applications available for Android-based mobile devices has also grown significantly. Google gives developers the opportunity to upload and sell applications in the Android Market, but malware writers also upload their malicious code there. Against this background, we present Malicious Android applications Detection through String analysis (MADS), a new method that extracts the strings contained in Android applications to build machine-learning classifiers and detect malware.
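
The idea in the abstract (strings extracted from an application feed a machine-learning classifier) can be illustrated with a minimal sketch. The abstract does not state how strings are extracted, how they are weighted, or which classifiers are trained, so everything below is an assumption made for illustration: strings are taken to be runs of at least four printable ASCII bytes, they are weighted with TF-IDF, and a 1-nearest-neighbour rule with cosine similarity stands in for the classifier. All function names and the toy byte blobs are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a "string" is a run of 4+ printable ASCII bytes.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(blob):
    """Return the lowercase printable strings found in a raw byte blob."""
    return [m.decode("ascii").lower() for m in PRINTABLE_RUN.findall(blob)]


def weight(strings, idf):
    """Map each string to (relative frequency) * (inverse document frequency)."""
    if not strings:
        return {}
    counts = Counter(strings)
    return {t: (c / len(strings)) * idf.get(t, 0.0) for t, c in counts.items()}


def fit(docs):
    """Compute smoothed IDF values and one weighted vector per training document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [weight(doc, idf) for doc in docs], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(sample, train_vecs, labels, idf):
    """Label a sample with the label of its most similar training vector."""
    vec = weight(extract_strings(sample), idf)
    best = max(range(len(train_vecs)), key=lambda i: cosine(vec, train_vecs[i]))
    return labels[best]


# Toy training data (invented byte blobs, not real applications).
TRAIN = [
    (b"\x00\x01sendTextMessage\x00premium_sms_number\x02\x03getDeviceId\x00", "malware"),
    (b"\x00onCreate\x00\x07setContentView\x01hello_world_label\x00", "benign"),
]
labels = [label for _, label in TRAIN]
train_vecs, idf = fit([extract_strings(blob) for blob, _ in TRAIN])

if __name__ == "__main__":
    print(classify(b"\x05getDeviceId\x00\x00sendTextMessage\x09", train_vecs, labels, idf))
```

A real pipeline would read the strings from the application package itself and evaluate several classifiers with cross-validation; the two-sample training set here only shows how the pieces connect.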


Keywords: malware, Android, machine learning, security





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Borja Sanz
    • 1
  • Igor Santos
    • 1
  • Javier Nieves
    • 1
  • Carlos Laorden
    • 1
  • Iñigo Alonso-Gonzalez
    • 1
  • Pablo G. Bringas
    • 1
  1. S3 Lab, DeustoTech Computing, University of Deusto, Bilbao, Spain
