Measures for Benchmarking Semantic Web Service Matchmaking Correctness

  • Ulrich Küster
  • Birgitta König-Ries
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6089)


Abstract

Semantic Web Services (SWS) promise to take service-oriented computing to a new level by allowing time-consuming programming tasks to be semi-automated. At the core of SWS are solutions to the problem of SWS matchmaking, i.e., the problem of filtering and ranking a set of services with respect to a service query. Comparative evaluations of different approaches to this problem form the basis for future progress in this area. Reliable evaluations require informed choices of evaluation measures and parameters. This paper establishes a solid foundation for such choices by providing a systematic discussion of the characteristics and behavior of various retrieval correctness measures, both in theory and through experimentation.


Keywords: Relevant Item, Gain Function, Relevance Judgment, Discount Function, Binary Relevance
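The gain and discount functions named in the keywords are the building blocks of cumulated-gain evaluation measures such as DCG/nDCG (Järvelin and Kekäläinen). As an illustration only (this is not the authors' code, and it assumes the common base-2 logarithmic discount), a minimal sketch of how a graded-relevance ranking can be scored:

```python
import math

def dcg(relevances):
    """Discounted cumulated gain for a ranked list of graded relevance values.

    Gains at ranks 1 and 2 are undiscounted; from rank 3 onward each gain
    is divided by log2(rank), following the Jarvelin/Kekalainen scheme
    with discount base b = 2.
    """
    return sum(rel / math.log2(max(rank, 2))
               for rank, rel in enumerate(relevances, start=1))

def ndcg(ranking, ideal):
    """Normalize DCG by the DCG of the ideal (best possible) ranking."""
    return dcg(ranking) / dcg(ideal)

# Example: a matchmaker's ranking of three services with graded relevance
# judgments (3 = highly relevant, 0 = irrelevant), compared to the ideal order.
system_ranking = [0, 2, 3]
ideal_ranking = [3, 2, 0]
score = ndcg(system_ranking, ideal_ranking)  # a value in [0, 1]
```

A perfect ranking yields an nDCG of 1.0; misplacing highly relevant services toward the bottom lowers the score, which is what makes such measures sensitive to graded (rather than binary) relevance.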



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Ulrich Küster¹
  • Birgitta König-Ries¹
  1. Institute of Computer Science, Friedrich-Schiller-University Jena, Jena, Germany
