Ontological Evolutionary Encoding to Bridge Machine Learning and Conceptual Models: Approach and Industrial Evaluation

  • Conference paper
Conceptual Modeling (ER 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10650)

Abstract

In this work, we propose an evolutionary ontological encoding approach that enables Machine Learning techniques to perform Software Engineering tasks on models. The approach uses a domain ontology to encode a model and an Evolutionary Algorithm to optimize the encoding. The encoded model that the approach returns can then be used by Machine Learning techniques for Software Engineering tasks such as concept location, traceability link retrieval, reuse, and impact analysis. We evaluated the approach in an industrial case study that recovers traceability links between requirements and models through a Machine Learning technique (RankBoost). Our results in terms of recall, precision, and the combination of both (F-measure) show that our approach outperforms the baseline (Latent Semantic Indexing). We also performed a statistical analysis to assess the magnitude of the improvement.
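The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that the F-measure is the balanced harmonic mean of precision and recall (F1), which the abstract does not state explicitly. The requirement and model-element identifiers below are hypothetical and are not data from the case study.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-measure) for a set of retrieved links.

    Assumption: the F-measure is the balanced harmonic mean (F1).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical ground truth: (requirement, model element) traceability links.
relevant = {("REQ-1", "M1"), ("REQ-1", "M2"), ("REQ-2", "M3"), ("REQ-3", "M4")}
# Hypothetical links proposed by a retrieval technique.
retrieved = {("REQ-1", "M1"), ("REQ-2", "M3"), ("REQ-2", "M5")}

p, r, f = precision_recall_f(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
# prints "precision=0.67 recall=0.50 F=0.57"
```

Here two of the three proposed links are correct (precision 2/3) and two of the four true links are found (recall 1/2), giving F = 4/7.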

Notes

  1. www.caf.net/en.

Acknowledgments

This work has been developed with the financial support of the Spanish Ministry of Economy and Competitiveness under the project TIN2016-80811-P and co-financed with ERDF. We also thank both the ITEA3 15010 REVaMP² Project and the MINECO TIN2015-64397-R VARIAMOS Project.

Author information

Correspondence to Ana C. Marcén.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Marcén, A.C., Pérez, F., Cetina, C. (2017). Ontological Evolutionary Encoding to Bridge Machine Learning and Conceptual Models: Approach and Industrial Evaluation. In: Mayr, H., Guizzardi, G., Ma, H., Pastor, O. (eds) Conceptual Modeling. ER 2017. Lecture Notes in Computer Science, vol 10650. Springer, Cham. https://doi.org/10.1007/978-3-319-69904-2_37

  • DOI: https://doi.org/10.1007/978-3-319-69904-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69903-5

  • Online ISBN: 978-3-319-69904-2

  • eBook Packages: Computer Science, Computer Science (R0)
