Ontological Evolutionary Encoding to Bridge Machine Learning and Conceptual Models: Approach and Industrial Evaluation

  • Ana C. MarcénEmail author
  • Francisca Pérez
  • Carlos Cetina
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10650)


In this work, we propose an evolutionary ontological encoding approach to enable Machine Learning techniques to be used to perform Software Engineering tasks in models. The approach is based on a domain ontology to encode a model and on an Evolutionary Algorithm to optimize the encoding. As a result, the encoded model that is returned by the approach can then be used by Machine Learning techniques to perform Software Engineering tasks such as concept location, traceability link retrieval, reuse, impact analysis, etc. We have evaluated the approach with an industrial case study to recover the traceability link between the requirements and the models through a Machine Learning technique (RankBoost). Our results in terms of recall, precision, and the combination of both (F-measure) show that our approach outperforms the baseline (Latent Semantic Indexing). We also performed a statistical analysis to assess the magnitude of the improvement.


Machine learning Traceability link recovery Evolutionary computation Model driven engineering 



This work has been developed with the financial support of the Spanish Ministry of Economy and Competitiveness under the project TIN2016-80811-P and co-financed with ERDF. We also thank both ITEA3 15010 REVaMP\(^2\) Project and MINECO TIN2015-64397-R VARIAMOS Project.


  1. 1.
    Apache opennlp: Toolkit for the processing of natural language text. Accessed Apr 2017
  2. 2.
    Efficient Java matrix library. Accessed Apr 2017
  3. 3.
    The English (porter2) stemming algorithm. Accessed Apr 2017
  4. 4.
    Antoniol, G., Canfora, G., Casazza, G., De Lucia, A., Merlo, E.: Recovering traceability links between code and documentation. IEEE Trans. Softw. Eng. 28(10), 970–983 (2002)CrossRefGoogle Scholar
  5. 5.
    Arcuri, A., Briand, L.: A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Softw. Test. Verif. Reliab. 24(3), 219–250 (2014)CrossRefGoogle Scholar
  6. 6.
    Arcuri, A., Fraser, G.: Parameter tuning or default values? An empirical investigation in search-based software engineering. Empirical Softw. Eng. 18(3), 594–623 (2013)CrossRefGoogle Scholar
  7. 7.
    B Le, T.D., Lo, D., Le Goues, C., Grunske, L.: A learning-to-rank based fault localization approach using likely invariants. In: Proceedings of the 25th International Symposium on Software Testing and Analysis, pp. 177–188. ACM (2016)Google Scholar
  8. 8.
    Bianchini, M., Maggini, M., Jain, L.C.: Handbook on Neural Information Processing. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-36657-4CrossRefGoogle Scholar
  9. 9.
    Cao, Z., Qin, T., Liu, T.Y., Tsai, M.F., Li, H.: Learning to rank: from pairwise approach to listwise approach. In: Proceedings of the 24th International Conference on Machine Learning, ICML 2007, pp. 129–136. ACM, New York (2007)Google Scholar
  10. 10.
    Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng. 40(1), 16–28 (2014)CrossRefGoogle Scholar
  11. 11.
    Dang, V.: The lemur project - wiki - ranklib (2013). Accessed Apr 2017
  12. 12.
    De Lucia, A., Fasano, F., Oliveto, R., Tortora, G.: Enhancing an artefact management system with traceability recovery features. In: Proceedings of 20th IEEE International Conference on Software Maintenance, pp. 306–315. IEEE (2004)Google Scholar
  13. 13.
    Dyer, D.: The watchmaker framework for evolutionary computation (evolutionary/genetic algorithms for Java). Accessed Apr 2017
  14. 14.
    Eaddy, M., Aho, A., Murphy, G.C.: Identifying, assigning, and quantifying crosscutting concerns. In: Proceedings of the First International Workshop on Assessment of Contemporary Modularization Techniques, p. 2 (2007)Google Scholar
  15. 15.
    Eaddy, M., Aho, A.V., Antoniol, G., Guéhéneuc, Y.G.: Cerberus: tracing requirements to source code using information retrieval, dynamic analysis, and program analysis. In: ICPC 2008 Conference, pp. 53–62. IEEE (2008)Google Scholar
  16. 16.
    Freund, Y., Iyer, R., Schapire, R.E., Singer, Y.: An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res. 4(Nov), 933–969 (2003)MathSciNetzbMATHGoogle Scholar
  17. 17.
    Haiduc, S., Bavota, G., Oliveto, R., De Lucia, A., Marcus, A.: Automatic query performance assessment during the retrieval of software artifacts. In: International Conference on Automated Software Engineering, pp. 90–99. ACM (2012)Google Scholar
  18. 18.
    Hirzel, A.H., Le Lay, G., Helfer, V., Randin, C., Guisan, A.: Evaluating the ability of habitat suitability models to predict species presences. Ecol. Model. 199(2), 142–152 (2006)CrossRefGoogle Scholar
  19. 19.
    Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998). doi: 10.1007/BFb0026683CrossRefGoogle Scholar
  20. 20.
    Kohavi, R., et al.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI, Stanford, CA, vol. 14, pp. 1137–1145 (1995)Google Scholar
  21. 21.
    Marcus, A., Sergeyev, A., Rajlich, V., Maletic, J.: An information retrieval approach to concept location in source code. In: Proceedings of the 11th Working Conference on Reverse Engineering, pp. 214–223, November 2004Google Scholar
  22. 22.
    Marcus, A., Maletic, J.I.: Recovering documentation-to-source-code traceability links using latent semantic indexing. In: Proceedings of 25th International Conference on Software Engineering, pp. 125–135. IEEE (2003)Google Scholar
  23. 23.
    Navot, A., Shpigelman, L., Tishby, N., Vaadia, E.: Nearest neighbor based feature selection for regression and its application to neural activity. Adv. Neural Inf. Process. Syst. 18, 995 (2006)Google Scholar
  24. 24.
    Poshyvanyk, D., Gueheneuc, Y.G., Marcus, A., Antoniol, G., Rajlich, V.: Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. IEEE Trans. Softw. Eng. 33(6), 420–432 (2007)CrossRefGoogle Scholar
  25. 25.
    Sayyad, A.S., Ingram, J., Menzies, T., Ammar, H.: Scalable product line configuration: a straw to break the camel’s back. In: 2013 IEEE/ACM 28th International Conference on Automated Software Engineering (ASE), pp. 465–474, November 2013Google Scholar
  26. 26.
    Shabtai, A., Moskovitch, R., Elovici, Y., Glezer, C.: Detection of malicious code by applying machine learning classifiers on static features: a state-of-the-art survey. Inf. Secur. Tech. Rep. 14(1), 16–29 (2009)CrossRefGoogle Scholar
  27. 27.
    Svendsen, A., Zhang, X., Lind-Tviberg, R., Fleurey, F., Haugen, Ø., Møller-Pedersen, B., Olsen, G.K.: Developing a software product line for train control: a case study of CVL. In: Bosch, J., Lee, J. (eds.) SPLC 2010. LNCS, vol. 6287, pp. 106–120. Springer, Heidelberg (2010). doi: 10.1007/978-3-642-15579-6_8CrossRefGoogle Scholar
  28. 28.
    Vargha, A., Delaney, H.D.: A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J. Educ. Behav. Stat. 25(2), 101–132 (2000)Google Scholar
  29. 29.
    Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A.: Experimentation in Software Engineering. Springer Science & Business Media, Heidelberg (2012). doi: 10.1007/978-3-642-29044-2CrossRefzbMATHGoogle Scholar
  30. 30.
    Wolf, L., Martin, I.: Robust boosting for learning from few examples. In: Computer Vision and Pattern Recognition, vol. 1, pp. 359–364. IEEE (2005)Google Scholar
  31. 31.
    Xuan, J., Monperrus, M.: Learning to combine multiple ranking metrics for fault localization. In: Proceedings of the 30th International Conference on Software Maintenance and Evolution (2014)Google Scholar
  32. 32.
    Xue, B., Zhang, M., Browne, W.N., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20(4), 606–626 (2016)CrossRefGoogle Scholar
  33. 33.
    Ye, X., Bunescu, R., Liu, C.: Learning to rank relevant files for bug reports using domain knowledge. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 689–699. ACM (2014)Google Scholar
  34. 34.
    Ye, X., Bunescu, R., Liu, C.: Mapping bug reports to relevant files: a ranking model, a fine-grained benchmark, and feature evaluation. IEEE Trans. Softw. Eng. 42(4), 379–402 (2016)CrossRefGoogle Scholar
  35. 35.
    Zisman, A., Spanoudakis, G., Pérez-Miñana, E., Krause, P.: Tracing software requirements artifacts. In: Software Engineering Research and Practice, pp. 448–455 (2003)Google Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ana C. Marcén
    • 1
    • 2
    Email author
  • Francisca Pérez
    • 2
  • Carlos Cetina
    • 2
  1. 1.Centro de Investigación en Métodos de Producción de SoftwareUniversitat Politècnica de ValènciaValenciaSpain
  2. 2.SVIT Research GroupUniversidad San JorgeZaragozaSpain

Personalised recommendations