Recommending Library Methods: An Evaluation of the Vector Space Model (VSM) and Latent Semantic Indexing (LSI)

  • Frank McCarey
  • Mel Ó Cinnéide
  • Nicholas Kushmerick
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4039)


The development and maintenance of a reuse repository requires significant investment, planning and managerial support. To minimise risk and ensure a healthy return on investment, reusable components should be accessible, reliable and of high quality. In this paper we concentrate on accessibility; we describe a technique that enables a developer to make effective and convenient use of large-scale libraries. Unlike most previous solutions to component retrieval, our tool, RASCAL, is a proactive component recommender.

RASCAL recommends a set of task-relevant reusable components to a developer. Recommendations are produced using Collaborative Filtering (CF). We compare and contrast the effectiveness of CF when using two information retrieval techniques, namely the Vector Space Model (VSM) and Latent Semantic Indexing (LSI). We validate our technique on real-world examples and find that the overall results are encouraging; notably, RASCAL can produce reasonably good recommendations when they are most valuable, i.e. at an early stage in code development.
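The comparison described above can be illustrated with a minimal sketch. The abstract does not give RASCAL's data representation or scoring rule, so everything below is an assumption made for illustration: existing classes play the role of CF "users", library methods play the role of "items", usage is recorded in a binary method-by-class matrix, neighbours are found by cosine similarity, and candidate methods are scored by a similarity-weighted sum over neighbouring classes. All names (`USAGE`, `QUERY`, `recommend`, and so on) and the toy data are hypothetical. The VSM variant compares raw usage vectors; the LSI variant compares them after projection onto the top `k` singular vectors of the usage matrix.

```python
import numpy as np

# Hypothetical toy data: rows are library methods, columns are existing classes.
METHODS = ["open", "read", "close", "connect", "send"]
USAGE = np.array([
    [1, 1, 0, 0],  # open
    [1, 1, 0, 0],  # read
    [1, 1, 0, 1],  # close
    [0, 0, 1, 1],  # connect
    [0, 0, 1, 1],  # send
], dtype=float)
# Class under development (assumed): so far it only calls open().
QUERY = np.array([1, 0, 0, 0, 0], dtype=float)


def cosine_to_columns(matrix, vector):
    """Cosine similarity between `vector` and each column of `matrix`."""
    norms = np.linalg.norm(matrix, axis=0) * np.linalg.norm(vector)
    norms[norms == 0] = 1.0  # zero vectors get similarity 0 rather than NaN
    return (vector @ matrix) / norms


def vsm_similarities(usage, query):
    """VSM: compare the query with each class in the original method space."""
    return cosine_to_columns(usage, query)


def lsi_similarities(usage, query, k):
    """LSI: compare the query with each class in a rank-k latent space."""
    U, s, Vt = np.linalg.svd(usage, full_matrices=False)
    classes_k = s[:k, None] * Vt[:k, :]  # class coordinates, shape (k, n_classes)
    query_k = U[:, :k].T @ query         # query projected onto the same axes
    return cosine_to_columns(classes_k, query_k)


def recommend(usage, query, sims, top_n=2):
    """Score methods by similarity-weighted usage in neighbouring classes."""
    scores = usage @ np.clip(sims, 0.0, None)  # ignore negatively correlated classes
    scores[query > 0] = -np.inf                # never recommend methods already used
    order = np.argsort(-scores, kind="stable")
    return [METHODS[i] for i in order[:top_n] if np.isfinite(scores[i])]


vsm_top = recommend(USAGE, QUERY, vsm_similarities(USAGE, QUERY))
lsi_top = recommend(USAGE, QUERY, lsi_similarities(USAGE, QUERY, k=2))
print("VSM:", vsm_top)
print("LSI:", lsi_top)
```

In this toy setting the query contains a single method call, which mirrors the early-development case the abstract highlights: both variants have very little evidence to work with, and the latent projection lets LSI draw on classes that share no method with the query directly.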


Keywords: Singular Value Decomposition; Recommender System; Collaborative Filter; Vector Space Model; Latent Semantic Indexing
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Frank McCarey (1)
  • Mel Ó Cinnéide (1)
  • Nicholas Kushmerick (1)
  1. School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
