Entropy-Based Feature Grouping in Machine Learning for Android Malware Classification

  • Conference paper
Information Security Applications (WISA 2020)

Abstract

In this paper, we present a tool that analyzes all APIs used in an APK, as well as all APIs in every version of Android, to address overfitting in machine-learning-based malware classification. The tool is Java-based software of approximately 2,000 lines that performs frequency analysis either over the entire Android API or over a decompiled APK. For the frequency analysis, we split all API signatures into word units and group them according to their entropy, which is calculated from the number of occurrences of each word unit. As a result, the tool reduces 39,031 methods to 4,972 groups, or to 12,123 groups when classes are included; the latter corresponds to a feature reduction rate of approximately 69%. For classification using machine learning, we collected 14,290 APKs from 14 different categories, training on 10,003 of them and testing on the remaining 4,287. On average, we obtained a true positive rate of 98.83% and a false positive rate of 1.16%, with an F-measure of 98.8%.
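The abstract does not spell out the grouping procedure, so the following is only a minimal sketch of the general idea, not the authors' tool (which is written in Java). It assumes that method names are camelCase, that the "entropy" of a word unit can be approximated by its self-information computed from occurrence counts, and that each method is placed in the group of its most frequent word. The function names `split_words`, `word_surprisal`, `group_by_word`, and `reduction_rate`, as well as the sample method names, are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def split_words(name):
    """Split a camelCase API method name into lowercase word units."""
    pattern = r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+"
    return [w.lower() for w in re.findall(pattern, name)]


def word_surprisal(names):
    """Self-information -log2(p) of each word unit, from occurrence counts.

    Assumption: this stands in for the paper's entropy measure, whose
    exact definition is not given in the abstract.
    """
    counts = Counter(w for n in names for w in split_words(n))
    total = sum(counts.values())
    return {w: -math.log2(c / total) for w, c in counts.items()}


def group_by_word(names):
    """Assign each method to the group of its most frequent word unit.

    Assumption: the lowest-surprisal word is used as the group key and
    ties are broken alphabetically. The real tool may choose differently.
    """
    info = word_surprisal(names)
    groups = defaultdict(list)
    for n in names:
        key = min(split_words(n), key=lambda w: (info[w], w))
        groups[key].append(n)
    return dict(groups)


def reduction_rate(n_features, n_groups):
    """Fraction of features removed by grouping."""
    return 1 - n_groups / n_features


if __name__ == "__main__":
    # Hypothetical sample of method names, for illustration only.
    sample = [
        "getDeviceId",
        "getSubscriberId",
        "sendTextMessage",
        "sendDataMessage",
        "openConnection",
    ]
    for key, members in group_by_word(sample).items():
        print(key, members)
    # Figures from the abstract: 39,031 methods reduced to 12,123 groups.
    print(round(reduction_rate(39031, 12123), 3))
```

With the figures quoted in the abstract, `reduction_rate(39031, 12123)` evaluates to roughly 0.689, which matches the reported reduction of approximately 69%.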

This work was supported by two Institute of Information & communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT): No. 2019-0-00477 (Development of android security framework technology using virtualized trusted execution environment) and No. 2020-0-00952 (Development of 5G Edge Security Technology for Ensuring 5G+ Service Stability and Availability).

Author information

Corresponding author

Correspondence to Souhwan Jung.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Shim, H., Jung, S. (2020). Entropy-Based Feature Grouping in Machine Learning for Android Malware Classification. In: You, I. (eds) Information Security Applications. WISA 2020. Lecture Notes in Computer Science, vol 12583. Springer, Cham. https://doi.org/10.1007/978-3-030-65299-9_5

  • DOI: https://doi.org/10.1007/978-3-030-65299-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65298-2

  • Online ISBN: 978-3-030-65299-9

  • eBook Packages: Computer Science, Computer Science (R0)
