Abstract
This paper presents a preliminary validation of common textual information retrieval techniques for mapping unstructured software vulnerability information to distinct software weaknesses. The validation is carried out with a dataset compiled from four software repositories tracked in the Snyk vulnerability database. According to the results, the information retrieval techniques used perform unsatisfactorily compared to regular expression searches. Although the results vary from one repository to another, the preliminary validation indicates that explicit referencing of vulnerability and weakness identifiers is preferable for concrete vulnerability tracking. Such referencing allows the use of keyword-based searches, which currently seem to yield more consistent results than information retrieval techniques. Further validation work is nevertheless required to improve the precision of the techniques.
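The contrast drawn above between keyword-based searches and information retrieval techniques can be illustrated with a minimal sketch. It is not the pipeline evaluated in the paper: the abstract does not name the specific techniques, so TF-IDF with cosine similarity is assumed here as one common example. The weakness descriptions are short paraphrases written for illustration rather than official CWE text, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: paraphrased weakness descriptions for illustration only,
# not the official CWE text and not the dataset used in the paper.
WEAKNESSES = {
    "CWE-79": "improper neutralization of input during web page generation cross-site scripting",
    "CWE-89": "improper neutralization of special elements used in an sql command sql injection",
    "CWE-22": "improper limitation of a pathname to a restricted directory path traversal",
}

# Keyword-based search: match explicit weakness identifiers such as "CWE-79".
CWE_PATTERN = re.compile(r"\bCWE-(\d+)\b", re.IGNORECASE)


def explicit_ids(text):
    """Return explicitly referenced CWE identifiers, in order of first appearance."""
    found = (f"CWE-{number}" for number in CWE_PATTERN.findall(text))
    return list(dict.fromkeys(found))


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(documents):
    """Return a smoothed inverse document frequency function for the corpus."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(tokenize(doc)))

    def idf(term):
        return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0

    return idf


def tfidf_vector(text, idf):
    """Map each term in the text to its term frequency times its IDF weight."""
    counts = Counter(tokenize(text))
    return {term: count * idf(term) for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank_weaknesses(text, weaknesses=WEAKNESSES):
    """Rank weaknesses by TF-IDF cosine similarity to the given text."""
    idf = build_idf(list(weaknesses.values()))
    query = tfidf_vector(text, idf)
    scores = {
        cwe: cosine(query, tfidf_vector(description, idf))
        for cwe, description in weaknesses.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    report = "SQL injection in the login query"
    print(explicit_ids(report))     # explicit identifiers, if any
    print(rank_weaknesses(report))  # similarity-based ranking
```

For a report that carries no identifier, the regular expression returns nothing and only the similarity ranking offers a candidate. For a report that does cite an identifier, the regular expression gives an exact answer regardless of wording, which is the kind of consistency the abstract attributes to explicit referencing.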
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ruohonen, J., Leppänen, V. (2018). Toward Validation of Textual Information Retrieval Techniques for Software Weaknesses. In: Elloumi, M., et al. Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. Springer, Cham. https://doi.org/10.1007/978-3-319-99133-7_22
DOI: https://doi.org/10.1007/978-3-319-99133-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99132-0
Online ISBN: 978-3-319-99133-7
eBook Packages: Computer Science (R0)