Toward Validation of Textual Information Retrieval Techniques for Software Weaknesses

Ruohonen, Jukka; Leppänen, Ville

doi:10.1007/978-3-319-99133-7_22

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 903))

Included in the following conference series:

International Conference on Database and Expert Systems Applications

682 Accesses
8 Citations
1 Altmetric

Abstract

This paper presents a preliminary validation of common textual information retrieval techniques for mapping unstructured software vulnerability information to distinct software weaknesses. The validation is carried out with a dataset compiled from four software repositories tracked in the Snyk vulnerability database. According to the results, the information retrieval techniques used perform unsatisfactorily compared to regular expression searches. Although the results vary from a repository to another, the preliminary validation presented indicates that explicit referencing of vulnerability and weakness identifiers is preferable for concrete vulnerability tracking. Such referencing allows the use of keyword-based searches, which currently seem to yield more consistent results compared to information retrieval techniques. Further validation work is required for improving the precision of the techniques, however.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Alsaleh, M.N., Al-Shaer, E., Husari, G.: ROI-driven cyber risk mitigation using host compliance and network configuration. J. Netw. Syst. Manag. 25(4), 759–783 (2017)
Article Google Scholar
Bojanova, I., Black, P.E., Yesha, Y., Wu, Y.: The bugs framework (BF): a structured approach to express bugs. In: Proceedings of the IEEE International Conference on Software Quality, Reliability and Security (QRS 2016), Vienna, pp. 175–182. IEEE (2016)
Google Scholar
dos Santos, J.C.A., Favero, E.L.: Practical use of a latent semantic analysis (LSA) model for automatic evaluation of written answers. J. Braz. Comput. Soc. 21(1), 1–21 (2015)
Article Google Scholar
Du, D.: Refining traceability links between vulnerability and software component in a vulnerability knowledge graph. In: Mikkonen, T., Klamma, R., Hernández, J. (eds.) ICWE 2018. LNCS, vol. 10845, pp. 33–49. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91662-0_3
Chapter Google Scholar
Fautsch, C., Savoy, J.: Adapting the TF IDF vector-space model to domain specific information retrieval. In: Proceedings of the 2010 ACM Symposium on Applied Computing (SAC 2010), Sierre, pp. 1708–1712. ACM (2010)
Google Scholar
Franqueira, V.N.L., Tun, T.T., Yu, Y., Wieringa, R., Nuseibeh, B.: Risk and argument: a risk-based argumentation method for practical security. In: Proceedings of the IEEE 19th International Requirements Engineering Conference (RE 2011), Trento, pp. 239–248. IEEE (2011)
Google Scholar
Gamallo, P., Bordag, S.: Is singular value decomposition useful for word similarity extraction? Lang. Resour. Eval. 45(2), 95–119 (2011)
Article Google Scholar
Goseva-Popstojanova, K., Tyo, J.: Experience report: security vulnerability profiles of mission critical software: empirical analysis of security related bug reports. In: Proceedings of the IEEE 28th International Symposium on Software Reliability Engineering (ISSRE 2017), Toulouse, pp. 152–163. IEEE (2017)
Google Scholar
Hale, M.L., Gamble, R.F.: Semantic hierarchies for extracting, modeling, and connecting compliance requirements in information security control standards. Requir. Eng. 1–38 (2018). Published online in December 2017
Google Scholar
Han, Z., Li, X., Liu, H., Xing, Z., Feng, Z.: DeepWeak: reasoning common software weaknesses via knowledge graph embedding. In: Proceedings of the IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER 2018), Campobasso, pp. 456–466. IEEE (2018)
Google Scholar
Hussain, S.F., Suryani, A.: On retrieving intelligently plagiarized documents using semantic similarity. Eng. Appl. Artif. Intell. 45, 246–258 (2015)
Article Google Scholar
Ibrahim, O.A.S., Landa-Silva, D.: Term frequency with average term occurrences for textual information retrieval. Soft. Comput. 20(8), 3045–3061 (2016)
Article Google Scholar
Jimenez, M., Papadakis, M., Traon, Y.L.: An empirical analysis of vulnerabilities in OpenSSL and the Linux Kernel. In: Proceedings of the 23rd Asia-Pacific Software Engineering Conference (APSEC 2016), Hamilton, pp. 105–112. IEEE (2016)
Google Scholar
Jin, R., Chai, J.Y., Si, L.: Learn to weight terms in information retrieval using category information. In: Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), Bonn, pp. 353–360. ACM (2005)
Google Scholar
Kang, J., Park, J.H.: A secure-coding and vulnerability check system based on smart-fuzzing and exploit. Neurocomputing 256, 23–34 (2017)
Article Google Scholar
Martin, R.A., Barnum, S.: Common weaknesses enumeration (CWE) status update. ACM SIGAda Ada Lett. Arch. XXVII(1), 88–91 (2008)
Article Google Scholar
McManus, J.: SEI CERT Oracle Coding Standard for Java, Carnegie Mellon University, Software Engineering Institute (SEI) (2018). https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java. Accessed May 2018
MITRE: Common Weaknesses Enumeration, CWE List Version 3.1, CWE Comprehensive View (2018). http://cwe.mitre.org/data/csv/2000.csv.zip. Accessed April 2018
MITRE: CWE VIEW: Weaknesses Originally Used by NVD from 2008 to 2016 (2018). http://cwe.mitre.org/data/definitions/635.html. Accessed January 2018
Mitropoulos, D., Karakoidas, V., Louridas, P., Gousios, G., Spinellis, D.: The bug catalog of the maven ecosystem. In: Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014), Hyderabad, pp. 372–375. ACM (2014)
Google Scholar
Muñoz, F.R., Villalba, L.J.G.: An algorithm to find relationships between web vulnerabilities. J. Supercomput. 74(3), 1061–1089 (2018)
Article Google Scholar
Murtaza, S., Khreich, W., Hamou-Lhadj, A., Bener, A.B.: Mining trends and patterns of software vulnerabilities. J. Syst. Softw. 117, 218–228 (2016)
Article Google Scholar
NIST: NVD Data Feeds, National Institute of Standards and Technology (NIST) (2018). https://nvd.nist.gov/vuln/data-feeds. Accessed April 2018
The Natural Language Toolkit (NLTK): NLTK 3.2.5 Documentation (2017). http://www.nltk.org. Accessed April 2018
Oyetoyan, T.D., Milosheska, B., Grini, M., Soares Cruzes, D.: Myths and facts about static application security testing tools: an action research at Telenor digital. In: Garbajosa, J., Wang, X., Aguiar, A. (eds.) XP 2018. LNBIP, vol. 314, pp. 86–103. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91602-6_6
Chapter Google Scholar
Paik, J.H.: A novel TF-IDF weighting scheme for effective ranking. In: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2013), Dublin, pp. 343–352. ACM (2013)
Google Scholar
Peclat, R.N., Ramos, G.N.: Semantic analysis for identifying security concerns in software procurement edicts. New Gener. Comput. 36(1), 21–40 (2018)
Article Google Scholar
Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
Article Google Scholar
Raemaekers, S., van Deursen, A., Visser, J.: Semantic versioning and impact of breaking changes in the maven repository. J. Syst. Softw. 129, 140–158 (2017)
Article Google Scholar
Ruohonen, J.: Classifying web exploits with topic modeling. In: Proceedings of the 28th International Workshop on Database and Expert Systems Applications (DEXA 2017), Lyon, pp. 93–97. IEEE (2017)
Google Scholar
Ruohonen, J., Rauti, S., Hyrynsalmi, S., Leppänen, V.: Mining social networks of open source CVE coordination. In: Proceedings of the 27th International Workshop on Software Measurement and 12th International Conference on Software Process and Product Measurement (IWSM Mensura 2017), Gothenburg, pp. 176–188. ACM (2017)
Google Scholar
Snyk Ltd.: Snyk Vulnerability Database (2018). https://github.com/snyk/vulnerabilitydb. Accessed April 2018
Squire, M.: Data sets describing the circle of life in Ruby hosting, 2003–2016. Empir. Softw. Eng. 23(2), 1123–1152 (2018)
Article Google Scholar
Tsipenyuk, K., Chess, B., McGraw, G.: Seven Pernicious Kingdoms: a taxonomy of software security errors. IEEE Secur. Priv. 3(6), 81–84 (2005)
Article Google Scholar
Wen, T., Zhang, Y., Wu, Q., Yang, G.: ASVC: an automatic security vulnerability categorization framework based on novel features of vulnerability data. J. Commun. 10(2), 107–116 (2015)
Article Google Scholar
Wu, Y., Gandhi, R.A., Siy, H.: Using semantic templates to study vulnerabilities recorded in large software repositories. In: Proceedings of the 2010 ICSE Workshop on Software Engineering for Secure Systems (SESS 2010), Cape Town, pp. 22–28. ACM (2010)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Future Technologies, University of Turku, Turku, Finland
Jukka Ruohonen & Ville Leppänen

Authors

Jukka Ruohonen
View author publications
You can also search for this author in PubMed Google Scholar
Ville Leppänen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jukka Ruohonen .

Editor information

Editors and Affiliations

University of Tunis, Tunis, Tunisia
Mourad Elloumi
MiCS, Media Computer Science, University of Passau, Passau, Bayern, Germany
Michael Granitzer
IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
University of Twente, Enschede, Overijssel, The Netherlands
Christin Seifert
Fak. Medien, Bauhaus Universität Weimar, Weimar, Thüringen, Germany
Benno Stein
Inst. für Softwaretechnik, Vienna University of Technology, Vienna, Austria
A Min Tjoa
FAW, Johannes Kepler University of Linz, Linz, Austria
Roland Wagner

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Ruohonen, J., Leppänen, V. (2018). Toward Validation of Textual Information Retrieval Techniques for Software Weaknesses. In: Elloumi, M., et al. Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. Springer, Cham. https://doi.org/10.1007/978-3-319-99133-7_22

Download citation

DOI: https://doi.org/10.1007/978-3-319-99133-7_22
Published: 07 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99132-0
Online ISBN: 978-3-319-99133-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics