Abstract
In this paper, we develop a tool that analyzes all APIs used in an APK and all APIs of every Android version, in order to mitigate overfitting in machine-learning-based malware classification. The tool is Java-based software of approximately 2,000 lines of code that performs frequency analysis either over the entire API or over a decompiled APK. For frequency analysis, we split all API signatures into word units and group them according to their entropy, which is calculated from the number of occurrences of each word unit. As a result, the tool reduces 39,031 methods to 4,972 groups, or to 12,123 groups when classes are included. This corresponds to a feature reduction rate of approximately 69%. For classification using machine learning, we collected 14,290 APKs from 14 different categories, training on 10,003 of them and testing on the remaining 4,287. On average, we obtained a true positive rate of 98.83% and a false positive rate of 1.16%, with an F-measure of 98.8%.
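The word-unit splitting and occurrence counting described above can be illustrated with a minimal Java sketch. It is not the authors' tool: the class name `WordEntropySketch`, the camelCase splitting rule, and the use of per-word self-information (-log2 of relative frequency) as the entropy measure are all assumptions made for illustration, since the abstract does not give the exact formula or the grouping rule. The grouping step itself is therefore left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names and formulas are assumptions, not the paper's implementation.
final class WordEntropySketch {

    // Split a camelCase identifier into lower-case word units,
    // e.g. "getDeviceId" -> [get, device, id]. The splitting rule is an assumption.
    static List<String> splitWords(String methodName) {
        List<String> words = new ArrayList<>();
        String pattern = "(?<=[a-z0-9])(?=[A-Z])"   // lower-case or digit followed by upper-case
                + "|(?<=[A-Z])(?=[A-Z][a-z])"       // end of an acronym, e.g. URL|Connection
                + "|[^A-Za-z0-9]+";                 // any non-alphanumeric separator
        for (String w : methodName.split(pattern)) {
            if (!w.isEmpty()) {
                words.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    // Count how often each word unit occurs across all method names.
    static Map<String, Integer> countWords(List<String> methodNames) {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : methodNames) {
            for (String w : splitWords(name)) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Self-information of a word in bits: -log2(count / total).
    // Assumed here as one possible occurrence-based entropy measure.
    static double selfInformation(Map<String, Integer> counts, String word) {
        long total = 0;
        for (int n : counts.values()) {
            total += n;
        }
        Integer occurrences = counts.get(word);
        if (occurrences == null || total == 0) {
            throw new IllegalArgumentException("unknown word: " + word);
        }
        return -Math.log((double) occurrences / total) / Math.log(2);
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "getDeviceId", "getSubscriberId", "sendTextMessage", "getLine1Number");
        Map<String, Integer> counts = countWords(names);
        // Print each word with its count and self-information, in alphabetical order.
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            System.out.printf("%-12s count=%d bits=%.3f%n",
                    e.getKey(), e.getValue(), selfInformation(counts, e.getKey()));
        }
    }
}
```

Under this reading, frequent words such as `get` carry little information while rare words such as `subscriber` carry more, which is the kind of signal a grouping rule could use to merge API methods into fewer features.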
This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (No. 2019-0-00477, Development of android security framework technology using virtualized trusted execution environment; and No. 2020-0-00952, Development of 5G Edge Security Technology for Ensuring 5G+ Service Stability and Availability).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Shim, H., Jung, S. (2020). Entropy-Based Feature Grouping in Machine Learning for Android Malware Classification. In: You, I. (eds) Information Security Applications. WISA 2020. Lecture Notes in Computer Science, vol 12583. Springer, Cham. https://doi.org/10.1007/978-3-030-65299-9_5
DOI: https://doi.org/10.1007/978-3-030-65299-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-65298-2
Online ISBN: 978-3-030-65299-9
eBook Packages: Computer Science, Computer Science (R0)