Abstract
To validate the credibility of diversity evaluation metrics, a number of methods that "evaluate evaluation metrics," such as Kendall's τ, Discriminative Power, and the Intuitiveness Test, have been adopted in diversified search evaluation studies. These methods are widely used and have yielded much insight into the effectiveness of evaluation metrics. However, they rely on particular assumptions about user behavior or statistics and do not take users' actual search preferences into account. Using multi-grade user preference judgments collected for diversified search result lists displayed in parallel, we take user preferences as the ground truth for investigating the evaluation of diversity metrics. We find that user preferences at the subtopic level yield results similar to those at the topic level, which means that topic-level preferences, which require much less human effort, can be used in future experiments. We further find that most existing evaluation metrics correlate well with user preferences for result lists with large performance differences, regardless of whether the difference is detected by the metric or by the users. Based on these findings, we propose a preference-weighted correlation, the Multi-grade User Preference (MUP) method, to evaluate diversity metrics according to user preferences. The experimental results reveal that MUP evaluates diversity metrics from the perspective of real users, which may differ from that of other methods. In addition, we find that in our diversified search evaluation experiments, the relevance of the search results is more important than their diversity.
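The Kendall's τ approach mentioned above compares the ranking of systems induced by a metric against a reference ranking (here, one derived from user preferences). As a minimal illustrative sketch (the function name and the example scores are invented for illustration, not taken from the paper), τ counts concordant versus discordant pairs:

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score lists.

    tau = (concordant - discordant) / (n * (n - 1) / 2)
    """
    assert len(a) == len(b) and len(a) > 1
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        # Sign of the product tells whether the pair is ordered
        # the same way (concordant) or oppositely (discordant).
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical example: metric scores vs. user-preference grades
# for four systems; both rank the systems identically, so tau = 1.0.
metric_scores = [0.62, 0.48, 0.55, 0.71]
user_grades = [3, 1, 2, 4]
print(kendall_tau(metric_scores, user_grades))  # → 1.0
```

A τ of 1 means the metric orders systems exactly as users do; −1 means the opposite ordering. The MUP method proposed in the paper extends this idea by weighting by multi-grade preference strength.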
References
Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: Proc. of ACM WSDM 2009, pp. 5–14. ACM, Barcelona (2009)
Amigó, E., Gonzalo, J., Verdejo, F.: A general evaluation measure for document organization tasks. In: Proc. of ACM SIGIR 2013, pp. 643–652. ACM, Dublin (2013)
Ashkan, A., Clarke, C.L.A.: On the informativeness of cascade and intent-aware effectiveness measures. In: Proc. of ACM WWW 2011, pp. 407–416. ACM, Hyderabad (2011)
Aslam, J.A., Pavlu, V., Savell, R.: A unified model for metasearch, pooling, and system evaluation. In: Proc. of ACM CIKM 2003, pp. 484–491. ACM, New Orleans (2003)
Buckley, C., Voorhees, E.M.: Retrieval evaluation with incomplete information. In: Proc. of ACM SIGIR 2004, pp. 25–32. ACM, Sheffield (2004)
Chapelle, O., Metlzer, D., Zhang, Y., Grinspan, P.: Expected reciprocal rank for graded relevance. In: Proc. of ACM CIKM 2009, pp. 621–630. ACM, New York (2009)
Clarke, C.L.A., Kolla, M., Cormack, G.V., Vechtomova, O.: Novelty and diversity in information retrieval evaluation. In: Proc. of ACM SIGIR 2008, pp. 659–666. ACM, Singapore (2008)
Clarke, C.L.A., Kolla, M., Vechtomova, O.: An effectiveness measure for ambiguous and underspecified queries. In: Azzopardi, L., Kazai, G., Robertson, S., Rüger, S., Shokouhi, M., Song, D., Yilmaz, E. (eds.) ICTIR 2009. LNCS, vol. 5766, pp. 188–199. Springer, Heidelberg (2009)
Kendall, M.: A new measure of rank correlation. Biometrika 30, 81–89 (1938)
Moffat, A.: Seven numeric properties of effectiveness metrics. In: Banchs, R.E., Silvestri, F., Liu, T.-Y., Zhang, M., Gao, S., Lang, J. (eds.) AIRS 2013. LNCS, vol. 8281, pp. 1–12. Springer, Heidelberg (2013)
Sakai, T.: Evaluating evaluation metrics based on the bootstrap. In: Proc. of ACM SIGIR 2006, pp. 525–532. ACM, Seattle (2006)
Sakai, T.: Evaluation with informational and navigational intents. In: Proc. of ACM WWW 2012, pp. 499–508. ACM, Lyon (2012)
Sakai, T., Dou, Z., Yamamoto, T., Liu, Y., Zhang, M., Song, R.: Overview of the NTCIR-10 INTENT-2 task. In: Proc. of NTCIR-10, Tokyo, Japan (2013)
Sakai, T., Song, R.: Evaluating diversified search results using per-intent graded relevance. In: Proc. of SIGIR 2011, pp. 1043–1052. ACM, Beijing (2011)
Sakai, T., Song, R.: Diversified search evaluation: Lessons from the NTCIR-9 INTENT task. Information Retrieval 16, 504–529 (2013)
Sanderson, M., Paramita, M.L., Clough, P., Kanoulas, E.: Do user preferences and evaluation measures line up? In: Proc. of ACM SIGIR 2010, pp. 555–562. ACM, Geneva (2010)
Smucker, M.D., Clarke, C.L.A.: Time-based calibration of effectiveness measures. In: Proc. of ACM SIGIR 2012, pp. 95–104. ACM, Portland (2012)
Turpin, A., Scholer, F.: User performance versus precision measures for simple search tasks. In: Proc. of SIGIR 2006, pp. 11–18. ACM, Seattle (2006)
Turpin, A.H., Hersh, W.: Why batch and user evaluations do not give the same results. In: Proc. of SIGIR 2001, pp. 225–231. ACM, New Orleans (2001)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, F. et al. (2014). Revisiting the Evaluation of Diversified Search Evaluation Metrics with User Preferences. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_5
DOI: https://doi.org/10.1007/978-3-319-12844-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12843-6
Online ISBN: 978-3-319-12844-3