Topic Model Based Android Malware Detection

  • Yucai Song
  • Yang Chen
  • Bo Lang
  • Hongyu Liu
  • Shaojie Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11611)


The security risks posed by Android malware are increasing. Machine learning is considered a promising way to improve malware detection performance, and for machine learning based Android malware detection, feature extraction plays a key role. Observing that the source code of an application is comparable to a text document, we propose a new Android malware detection method based on the topic model, an effective technique for text feature extraction. Our method treats the decompiled code of an application as a text document and uses the topic model to mine the latent topics in the code, which reflect the semantic features of the application. The experimental results demonstrate that our approach performs better than state-of-the-art methods. In addition, our method mines features from the application files automatically, without manual design, and therefore overcomes a limitation of existing methods, which rely on experts' prior knowledge.
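The idea of treating decompiled code as a text document and mining its topics can be illustrated with a minimal sketch. The abstract does not say which topic model or tokenizer is used, so everything below is an assumption for illustration: the model is a small LDA fitted by collapsed Gibbs sampling, the helper names `tokenize` and `lda_topic_features` are hypothetical, and the four "decompiled" snippets are toy strings, not real application code. The resulting per-application topic proportions are the kind of feature vector that could then be passed to a classifier.

```python
import random
import re


def tokenize(code):
    """Split a decompiled-code string into lowercase identifier-like tokens."""
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)]


def lda_topic_features(docs, num_topics=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA by collapsed Gibbs sampling (an assumed choice of topic model).

    docs is a list of token lists; the return value is one topic-proportion
    vector per document.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word w assigned to topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][vocab[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    # Resample each token's topic from its conditional distribution.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = vocab[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document; each vector sums to 1.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        features.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return features


# Toy stand-ins for decompiled application code (hypothetical, not real data).
apps = [
    "invoke SmsManager sendTextMessage getDeviceId",
    "invoke TelephonyManager getDeviceId sendTextMessage",
    "invoke Canvas drawBitmap onDraw",
    "invoke Paint drawBitmap onDraw Canvas",
]
features = lda_topic_features([tokenize(a) for a in apps])
for vector in features:
    print([round(p, 3) for p in vector])
```

Each printed row is a low-dimensional representation of one application. In a full pipeline these vectors, rather than hand-designed features such as permission lists, would be the input to a standard classifier.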


Android malware detection, Topic model, Machine learning



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Yucai Song (1)
  • Yang Chen (2)
  • Bo Lang (1), corresponding author
  • Hongyu Liu (1)
  • Shaojie Chen (1)
  1. State Key Lab of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, China
  2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, China
