Library Function Identification

Binary Code Fingerprinting for Cybersecurity

Abstract

Program binaries typically contain a significant number of library functions taken from standard libraries or free open-source software packages. Automatically identifying such library functions not only enhances the quality and efficiency of threat analysis and reverse engineering tasks, but also improves their accuracy by avoiding false correlations between unrelated code bases. Furthermore, such automation benefits other applications such as clone detection, function fingerprinting, authorship attribution, vulnerability analysis, and malware analysis.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Alrabaee, S. et al. (2020). Library Function Identification. In: Binary Code Fingerprinting for Cybersecurity. Advances in Information Security, vol 78. Springer, Cham. https://doi.org/10.1007/978-3-030-34238-8_4

  • DOI: https://doi.org/10.1007/978-3-030-34238-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34237-1

  • Online ISBN: 978-3-030-34238-8

  • eBook Packages: Computer Science, Computer Science (R0)
