AndroParse - An Android Feature Extraction Framework and Dataset

  • Conference paper
Digital Forensics and Cyber Crime (ICDF2C 2018)

Abstract

Android malware has become a major challenge. As a consequence, practitioners and researchers spend a significant amount of time analyzing Android applications (APKs). A common procedure, especially for data scientists, is to extract features such as permissions, APIs, or strings, which can then be analyzed. Current state-of-the-art tools have three major issues: (1) no single tool can extract all the significant features used by scientists and practitioners; (2) current tools are not designed to be extensible; and (3) existing parsers can be time-consuming, as they are neither runtime-efficient nor scalable. Therefore, this work presents AndroParse, an open-source Android parser written in Golang that currently extracts the four most common features: permissions, APIs, strings, and intents. AndroParse outputs JSON files, as they can easily be used by most major programming languages. Constructing the parser allowed us to create an extensive feature dataset, which can be accessed through our independent REST API. Our dataset currently contains 67,703 benign and 46,683 malicious APK samples.
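
To illustrate why JSON output is convenient for downstream analysis, the following is a minimal Python sketch of how a data scientist might load one extracted record and turn its permissions into a binary feature vector. The field names (`Permissions`, `Apis`, `Strings`, `Intents`), the sample values, and the helper functions are assumptions made for illustration; they are not taken from AndroParse's documented output schema.

```python
import json

# Hypothetical record: the keys and values below are assumed for illustration
# and may differ from the JSON that AndroParse actually emits.
SAMPLE_RECORD = """
{
  "Permissions": ["android.permission.INTERNET", "android.permission.SEND_SMS"],
  "Apis": ["Landroid/telephony/SmsManager;->sendTextMessage"],
  "Strings": ["http://example.com"],
  "Intents": ["android.intent.action.BOOT_COMPLETED"]
}
"""

# The four feature groups named in the abstract (key spelling is an assumption).
FEATURE_KEYS = ("Permissions", "Apis", "Strings", "Intents")


def load_features(text):
    """Parse one JSON record into a dict of feature group -> sorted unique values."""
    record = json.loads(text)
    # Missing groups default to an empty list so later code can rely on every key.
    return {key: sorted(set(record.get(key, []))) for key in FEATURE_KEYS}


def permission_vector(features, vocabulary):
    """Encode the record's permissions as a 0/1 vector over a fixed vocabulary."""
    present = set(features["Permissions"])
    return [1 if permission in present else 0 for permission in vocabulary]


if __name__ == "__main__":
    features = load_features(SAMPLE_RECORD)
    vocabulary = [
        "android.permission.CAMERA",
        "android.permission.INTERNET",
        "android.permission.SEND_SMS",
    ]
    print(permission_vector(features, vocabulary))
```

A fixed vocabulary keeps vectors from different APKs aligned column by column, which is what most classifiers expect; the same pattern would apply to the other three feature groups.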

Notes

  1. https://github.com/rschmicker/AndroParse (last accessed 13-April-2018).

  2. https://64.251.61.74/ (last accessed 13-April-2018).

  3. A prominent example of how valuable such services are to the community is the UCI Machine Learning Repository [25], which includes a multitude of data and repositories and is frequently referenced in the literature.

  4. http://www.malgenomeproject.org (last accessed 13-April-2018).

  5. https://wiki.python.org/moin/GlobalInterpreterLock (last accessed 13-April-2018).

  6. https://github.com/Masterminds/glide (last accessed 13-April-2018).

  7. This portion of the code must be executed sequentially because a low-level JVM memory error occurs when multiple threads access the library at once.

  8. https://golang.org/pkg/plugin/ (last accessed 13-April-2018).

  9. One can use any language as long as the code can be compiled into a shared object file.

  10. https://github.com/rschmicker/AndroParse/wiki/Develop-Plugins (last accessed 13-April-2018).

  11. https://golang.org/doc/effective_go.html#interfaces (last accessed 13-April-2018).

  12. https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html (last accessed 13-April-2018).

  13. https://github.com/rschmicker/AndroParse/wiki/Develop-Plugins (last accessed 13-April-2018).

  14. https://developer.android.com/reference/android/Manifest.permission.html (last accessed 13-April-2018).

References

  1. apktool (2010). http://ibotpeaches.github.io/Apktool/

  2. Aafer, Y., Du, W., Yin, H.: DroidAPIMiner: mining API-level features for robust malware detection in android. In: Zia, T., Zomaya, A., Varadharajan, V., Mao, M. (eds.) SecureComm 2013. LNICST, vol. 127, pp. 86–103. Springer, Cham (2013). https://doi.org/10.1007/978-3-319-04283-1_6

  3. Anonymous. CAPIL: Component-API linkage for android malware detection (2016, unpublished)

  4. APK-DL. Apk downloader (2016). http://apk-dl.com. Accessed 13 Apr 2018

  5. APKPure. Download APK free online (2016). https://apkpure.com. Accessed 13 Apr 2018

  6. Apvrille, L., Apvrille, A.: Identifying unknown android malware with feature extractions and classification techniques. In: 2015 IEEE Trustcom/BigDataSE/ISPA, vol. 1, pp. 182–189. IEEE (2015)

  7. Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., Rieck, K., CERT Siemens: DREBIN: effective and explainable detection of android malware in your pocket. In: Proceedings of the Annual Symposium on Network and Distributed System Security (NDSS) (2014). https://www.sec.cs.tu-bs.de/~danarp/drebin/. Accessed 13 Apr 2018

  8. Au, K.W.Y., Zhou, Y.F., Huang, Z., Lie, D.: PScout: analyzing the android permission specification. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, pp. 217–228. ACM (2012)

  9. Aung, Z., Zaw, W.: Permission-based android malware detection. Int. J. Sci. Technol. Res. 2(3), 228–234 (2013)

  10. Babu Rajesh, V., Reddy, P., Himanshu, P., Patil, M.U.: Droidswan: detecting malicious android applications based on static feature analysis. Comput. Sci. Inf. Technol., 163 (2015)

  11. Baskaran, B., Ralescu, A.: A study of android malware detection techniques and machine learning. University of Cincinnati (2016)

  12. Bhatia, A.: Android-security-awesome, February 2017. https://github.com/ashishb/android-security-awesome. Accessed 13 Apr 2018

  13. Desnos, A.: Androguard-reverse engineering, malware and goodware analysis of android applications (2013). code.google.com/p/androguard

  14. eLinux. Android AAPT, June 2010. http://www.elinux.org/android_aapt. Accessed 13 Apr 2018

  15. Faruki, P., Bharmal, A., Laxmi, V., Gaur, M.S., Conti, M., Rajarajan, M.: Evaluation of android anti-malware techniques against Dalvik bytecode obfuscation. In: 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications, pp. 414–421. IEEE (2014)

  16. Feizollah, A., Anuar, N.B., Salleh, R., Wahab, A.W.A.: A review on feature selection in mobile malware detection. Digit. Invest. 13, 22–37 (2015)

  17. Fereidooni, H., Moonsamy, V., Conti, M., Batina, L.: Efficient classification of android malware in the wild using robust static features (2016)

  18. Geneiatakis, D., Satta, R., Fovino, I.N., Neisse, R.: On the efficacy of static features to detect malicious applications in android. In: Fischer-Hübner, S., Lambrinoudakis, C., Lopez, J. (eds.) TrustBus 2015. LNCS, vol. 9264, pp. 87–98. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22906-5_7

  19. Holmes, G., Donkin, A., Witten, I.H.: WEKA: a machine learning workbench. In: Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, pp. 357–361. IEEE (1994)

  20. Kaushik, P., Jain, A.: Malware detection techniques in android. Int. J. Comput. Appl. 122(17), 22–26 (2015)

  21. Maggi, F., Valdi, A., Zanero, S.: Andrototal: a flexible, scalable toolbox and service for testing mobile malware detectors. In: Proceedings of the Third ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, pp. 49–54. ACM (2013)

  22. Maiorca, D., Ariu, D., Corona, I., Aresu, M., Giacinto, G.: Stealth attacks: an extended insight into the obfuscation effects on android malware. Comput. Secur. 51, 16–31 (2015)

  23. Malik, S., Khatter, K.: AndroData: a tool for static & dynamic feature extraction of android apps. Int. J. Appl. Eng. Res. 10(94), 98–102 (2015)

  24. Nativ, Y.T., Shalev, S.: Thezoo (2015). http://thezoo.morirt.com. Accessed 13 Apr 2018

  25. Newman, D.J., Hettich, S., Blake, C.L., Merz, C.J.: UCI repository of machine learning databases (1998). http://mlearn.ics.uci.edu/MLRepository.html. Accessed 13 Apr 2018

  26. Parkour, M.: Contagio mobile. Mobile malware mini dump (2013). https://contagiominidump.blogspot.ca/. Accessed 13 Apr 2018

  27. Payload Security. Learn more about the standalone version or purchase a private web service (2016). https://www.hybrid-analysis.com/. Accessed 13 Apr 2018

  28. Pehlivan, U., Baltaci, N., Acartürk, C., Baykal, N.: The analysis of feature selection methods and classification algorithms in permission based android malware detection. In: 2014 IEEE Symposium on Computational Intelligence in Cyber Security (CICS), pp. 1–8. IEEE (2014)

  29. Rami, K., Desai, V.: Performance base static analysis of malware on android (2013)

  30. Sahs, J., Khan, L.: A machine learning approach to android malware detection. In: 2012 European Intelligence and Security Informatics Conference (EISIC), pp. 141–147. IEEE (2012)

  31. Sanz, B., Santos, I., Laorden, C., Ugarte-Pedrero, X., Bringas, P.G., Álvarez, G.: PUMA: permission usage to detect malware in android. In: Herrero, Á., et al. (eds.) International Joint Conference CISIS’12-ICEUTE’ 12-SOCO’ 12. AISC, vol. 189, pp. 289–298. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-33018-6_30

  32. Seth, R., Kaushal, R.: Permission based malware analysis & detection in android (2014)

  33. Spreitzenbarth, M., Schreck, T., Echtler, F., Arp, D., Hoffmann, J.: Mobile-sandbox: combining static and dynamic analysis with machine-learning techniques. Int. J. Inf. Secur. 14(2), 141–153 (2015)

  34. SunFeith. php_apk_parser (2013). https://github.com/iwinmin/php_apk_parser. Accessed 13 Apr 2018

  35. Svensson, R.: Das malwerk (2016). http://dasmalwerk.eu. Accessed 13 Apr 2018

  36. Tdoly. tdoly/apk_parse. GitHub (2015). https://github.com/tdoly/apk_parse. Accessed 13 Apr 2018

  37. VirusTotalTeam. Virustotal-free online virus, malware and url scanner (2013). https://www.virustotal.com/. Accessed 13 Apr 2018

  38. Wang, X., Yang, Y., Zeng, Y.: Accurate mobile malware detection and classification in the cloud. SpringerPlus 4(1), 1 (2015)

  39. Wei, F., Li, Y., Roy, S., Ou, X., Zhou, W.: Deep ground truth analysis of current android malware. In: Polychronakis, M., Meier, M. (eds.) DIMVA 2017. LNCS, vol. 10327, pp. 252–276. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60876-1_12

  40. Winsniewski, R.: Android–apktool: a tool for reverse engineering android APK files (2012)

  41. Yerima, S.Y., Sezer, S., Muttik, I.: Android malware detection using parallel machine learning classifiers. In: 2014 Eighth International Conference on Next Generation Mobile Apps, Services and Technologies, pp. 37–42. IEEE (2014)

  42. Zhang, X., Breitinger, F., Baggili, I.: Rapid android parser for investigating dex files (RAPID). Digit. Invest. 17, 28–39 (2016)

  43. Zhou, Y., Jiang, X.: Android malware genome project (2012). http://www.malgenomeproject.org

  44. Zhou, Y., Wang, Z., Zhou, W., Jiang, X.: Hey, you, get off of my market: detecting malicious apps in official and alternative android markets. In: NDSS, vol. 25, pp. 50–52 (2012)

Acknowledgements

We would like to thank the University of New Haven’s Summer Undergraduate Research Fellowship (SURF) program, which supported this research.

Author information

Corresponding author

Correspondence to Frank Breitinger.

A Identifying Relevant Features Used

Table 7. Overview of articles including their features utilized for our work.

Copyright information

© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Schmicker, R., Breitinger, F., Baggili, I. (2019). AndroParse - An Android Feature Extraction Framework and Dataset. In: Breitinger, F., Baggili, I. (eds) Digital Forensics and Cyber Crime. ICDF2C 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 259. Springer, Cham. https://doi.org/10.1007/978-3-030-05487-8_4

  • DOI: https://doi.org/10.1007/978-3-030-05487-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05486-1

  • Online ISBN: 978-3-030-05487-8

  • eBook Packages: Computer Science (R0)
