Abstract
Android malware has become a major challenge. As a consequence, practitioners and researchers spend significant time analyzing Android applications (APKs). A common procedure (especially for data scientists) is to extract features such as permissions, APIs, or strings, which can then be analyzed. Current state-of-the-art tools have three major issues: (1) no single tool extracts all the significant features used by scientists and practitioners, (2) current tools are not designed to be extensible, and (3) existing parsers can be time-consuming because they are not runtime efficient or scalable. Therefore, this work presents AndroParse, an open-source Android parser written in Golang that currently extracts the four most common features: permissions, APIs, strings, and intents. AndroParse outputs JSON files, as they can easily be used by most major programming languages. Constructing the parser allowed us to create an extensive feature dataset, which can be accessed through our independent REST API. Our dataset currently has 67,703 benign and 46,683 malicious APK samples.
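To illustrate why JSON output is convenient for downstream analysis, the following minimal Go sketch decodes one feature record into a typed struct. The record layout, the field names (`permissions`, `apis`, `strings`, `intents`), and the helper `parseFeatures` are assumptions made for illustration only; they are not taken from AndroParse's actual output schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apkFeatures is a hypothetical record layout. The field names are
// assumptions for illustration, not AndroParse's documented schema.
type apkFeatures struct {
	Permissions []string `json:"permissions"`
	APIs        []string `json:"apis"`
	Strings     []string `json:"strings"`
	Intents     []string `json:"intents"`
}

// parseFeatures decodes one JSON feature record into an apkFeatures value.
func parseFeatures(data []byte) (apkFeatures, error) {
	var f apkFeatures
	err := json.Unmarshal(data, &f)
	return f, err
}

func main() {
	// A made-up sample record covering the four feature types named above.
	sample := []byte(`{
		"permissions": ["android.permission.INTERNET"],
		"apis": [],
		"strings": [],
		"intents": ["android.intent.action.MAIN"]
	}`)

	f, err := parseFeatures(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%d permissions, %d intents\n", len(f.Permissions), len(f.Intents))
}
```

The same record could be read with the standard JSON library of any other language, which is the portability argument the abstract makes.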
Notes
1. https://github.com/rschmicker/AndroParse (last accessed 13-April-2018).
2. https://64.251.61.74/ (last accessed 13-April-2018).
3. A prominent example of how valuable these services are for the community is the UCI Machine Learning Repository [25], which includes a multitude of data and repositories and is frequently referenced in the literature.
4. http://www.malgenomeproject.org (last accessed 13-April-2018).
5. https://wiki.python.org/moin/GlobalInterpreterLock (last accessed 13-April-2018).
6. https://github.com/Masterminds/glide (last accessed 13-April-2018).
7. This portion of code must be executed sequentially because a low-level JVM memory error occurs when multiple threads access the library at once.
8. https://golang.org/pkg/plugin/ (last accessed 13-April-2018).
9. One can use any language as long as the code can be compiled into a shared object file.
10. https://github.com/rschmicker/AndroParse/wiki/Develop-Plugins (last accessed 13-April-2018).
11. https://golang.org/doc/effective_go.html#interfaces (last accessed 13-April-2018).
12. https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html (last accessed 13-April-2018).
13. https://github.com/rschmicker/AndroParse/wiki/Develop-Plugins (last accessed 13-April-2018).
14. https://developer.android.com/reference/android/Manifest.permission.html (last accessed 13-April-2018).
References
apktool (2010). http://ibotpeaches.github.io/Apktool/
Aafer, Y., Du, W., Yin, H.: DroidAPIMiner: mining API-level features for robust malware detection in android. In: Zia, T., Zomaya, A., Varadharajan, V., Mao, M. (eds.) SecureComm 2013. LNICST, vol. 127, pp. 86–103. Springer, Cham (2013). https://doi.org/10.1007/978-3-319-04283-1_6
Anonymous. CAPIL: Component-API linkage for android malware detection (2016, unpublished)
APK-DL. Apk downloader (2016). http://apk-dl.com. Accessed 13 Apr 2018
APKPure. Download APK free online (2016). https://apkpure.com. Accessed 13 Apr 2018
Apvrille, L., Apvrille, A.: Identifying unknown android malware with feature extractions and classification techniques. In: 2015 IEEE Trustcom/BigDataSE/ISPA, vol. 1, pp. 182–189. IEEE (2015)
Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., Rieck, K., CERT Siemens: DREBIN: effective and explainable detection of android malware in your pocket. In: Proceedings of the Annual Symposium on Network and Distributed System Security (NDSS) (2014). https://www.sec.cs.tu-bs.de/~danarp/drebin/. Accessed 13 Apr 2018
Au, K.W.Y., Zhou, Y.F., Huang, Z., Lie, D.: PScout: analyzing the android permission specification. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, pp. 217–228. ACM (2012)
Aung, Z., Zaw, W.: Permission-based android malware detection. Int. J. Sci. Technol. Res. 2(3), 228–234 (2013)
Babu Rajesh, V., Reddy, P., Himanshu, P., Patil, M.U.: Droidswan: detecting malicious android applications based on static feature analysis. Comput. Sci. Inf. Technol., 163 (2015)
Baskaran, B., Ralescu, A.: A study of android malware detection techniques and machine learning. University of Cincinnati (2016)
Bhatia, A.: Android-security-awesome, February 2017. https://github.com/ashishb/android-security-awesome. Accessed 13 Apr 2018
Desnos, A.: Androguard-reverse engineering, malware and goodware analysis of android applications (2013). code.google.com/p/androguard
eLinux. Android AAPT, June 2010. http://www.elinux.org/android_aapt. Accessed 13 Apr 2018
Faruki, P., Bharmal, A., Laxmi, V., Gaur, M.S., Conti, M., Rajarajan, M.: Evaluation of android anti-malware techniques against Dalvik bytecode obfuscation. In: 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications, pp. 414–421. IEEE (2014)
Feizollah, A., Anuar, N.B., Salleh, R., Wahab, A.W.A.: A review on feature selection in mobile malware detection. Digit. Invest. 13, 22–37 (2015)
Fereidooni, H., Moonsamy, V., Conti, M., Batina, L.: Efficient classification of android malware in the wild using robust static features (2016)
Geneiatakis, D., Satta, R., Fovino, I.N., Neisse, R.: On the efficacy of static features to detect malicious applications in android. In: Fischer-Hübner, S., Lambrinoudakis, C., Lopez, J. (eds.) TrustBus 2015. LNCS, vol. 9264, pp. 87–98. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22906-5_7
Holmes, G., Donkin, A., Witten, I.H.: WEKA: a machine learning workbench. In: Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, pp. 357–361. IEEE (1994)
Kaushik, P., Jain, A.: Malware detection techniques in android. Int. J. Comput. Appl. 122(17), 22–26 (2015)
Maggi, F., Valdi, A., Zanero, S.: Andrototal: a flexible, scalable toolbox and service for testing mobile malware detectors. In: Proceedings of the Third ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, pp. 49–54. ACM (2013)
Maiorca, D., Ariu, D., Corona, I., Aresu, M., Giacinto, G.: Stealth attacks: an extended insight into the obfuscation effects on android malware. Comput. Secur. 51, 16–31 (2015)
Malik, S., Khatter, K.: AndroData: a tool for static & dynamic feature extraction of android apps. Int. J. Appl. Eng. Res. 10(94), 98–102 (2015)
Nativ, Y.T., Shalev, S.: Thezoo (2015). http://thezoo.morirt.com. Accessed 13 Apr 2018
Newman, D.J., Hettich, S., Blake, C.L., Merz, C.J.: UCI repository of machine learning databases (1998). http://mlearn.ics.uci.edu/MLRepository.html. Accessed 13 Apr 2018
Parkour, M.: Contagio mobile. Mobile malware mini dump (2013). https://contagiominidump.blogspot.ca/. Accessed 13 Apr 2018
Payload Security. Learn more about the standalone version or purchase a private web service (2016). https://www.hybrid-analysis.com/. Accessed 13 Apr 2018
Pehlivan, U., Baltaci, N., Acartürk, C., Baykal, N.: The analysis of feature selection methods and classification algorithms in permission based android malware detection. In: 2014 IEEE Symposium on Computational Intelligence in Cyber Security (CICS), pp. 1–8. IEEE (2014)
Rami, K., Desai, V.: Performance base static analysis of malware on android (2013)
Sahs, J., Khan, L.: A machine learning approach to android malware detection. In: 2012 European Intelligence and Security Informatics Conference (EISIC), pp. 141–147. IEEE (2012)
Sanz, B., Santos, I., Laorden, C., Ugarte-Pedrero, X., Bringas, P.G., Álvarez, G.: PUMA: permission usage to detect malware in android. In: Herrero, Á., et al. (eds.) International Joint Conference CISIS’12-ICEUTE’12-SOCO’12. AISC, vol. 189, pp. 289–298. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-33018-6_30
Seth, R., Kaushal, R.: Permission based malware analysis & detection in android (2014)
Spreitzenbarth, M., Schreck, T., Echtler, F., Arp, D., Hoffmann, J.: Mobile-sandbox: combining static and dynamic analysis with machine-learning techniques. Int. J. Inf. Secur. 14(2), 141–153 (2015)
SunFeith. php_apk_parser (2013). https://github.com/iwinmin/php_apk_parser. Accessed 13 Apr 2018
Svensson, R.: Das malwerk (2016). http://dasmalwerk.eu. Accessed 13 Apr 2018
Tdoly. tdoly/apk_parse. GitHub (2015). https://github.com/tdoly/apk_parse. Accessed 13 Apr 2018
VirusTotalTeam. Virustotal-free online virus, malware and url scanner (2013). https://www.virustotal.com/. Accessed 13 Apr 2018
Wang, X., Yang, Y., Zeng, Y.: Accurate mobile malware detection and classification in the cloud. SpringerPlus 4(1), 1 (2015)
Wei, F., Li, Y., Roy, S., Ou, X., Zhou, W.: Deep ground truth analysis of current android malware. In: Polychronakis, M., Meier, M. (eds.) DIMVA 2017. LNCS, vol. 10327, pp. 252–276. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60876-1_12
Winsniewski, R.: Android–apktool: a tool for reverse engineering android APK files (2012)
Yerima, S.Y., Sezer, S., Muttik, I.: Android malware detection using parallel machine learning classifiers. In: 2014 Eighth International Conference on Next Generation Mobile Apps, Services and Technologies, pp. 37–42. IEEE (2014)
Zhang, X., Breitinger, F., Baggili, I.: Rapid android parser for investigating dex files (RAPID). Digit. Invest. 17, 28–39 (2016)
Zhou, Y., Jiang, X.: Android malware genome project (2012). http://www.malgenomeproject.org
Zhou, Y., Wang, Z., Zhou, W., Jiang, X.: Hey, you, get off of my market: detecting malicious apps in official and alternative android markets. In: NDSS, vol. 25, pp. 50–52 (2012)
Acknowledgements
We would like to thank the University of New Haven’s Summer Undergraduate Research Fellowship (SURF) program, which supported this research.
A Identifying Relevant Features Used
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Schmicker, R., Breitinger, F., Baggili, I. (2019). AndroParse - An Android Feature Extraction Framework and Dataset. In: Breitinger, F., Baggili, I. (eds) Digital Forensics and Cyber Crime. ICDF2C 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 259. Springer, Cham. https://doi.org/10.1007/978-3-030-05487-8_4
DOI: https://doi.org/10.1007/978-3-030-05487-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05486-1
Online ISBN: 978-3-030-05487-8
eBook Packages: Computer Science, Computer Science (R0)