Towards Risk-Aware Resource Selection

Markov, Ilya; Carman, Mark; Crestani, Fabio

doi:10.1007/978-3-319-12844-3_13


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8870)

Included in the following conference series:

Asia Information Retrieval Symposium


Abstract

When searching multiple sources of information, it is crucial to select only the sources relevant to a given query, thereby filtering out non-relevant content. This task, known as resource selection, is used in many areas of information retrieval, such as federated search, aggregated search, and blog distillation. Resource selection often operates on limited and incomplete data and is therefore associated with a certain risk of selecting non-relevant sources due to the uncertainty in the produced source ranking. Despite the large volume of research on resource selection, the problem of risk within resource selection has rarely been addressed. In this work we propose a resource selection method based on document score distribution models that supports estimating the uncertainty of the produced source scores and leads to a novel risk-aware resource selection technique. We analyze two distributed retrieval scenarios and show that many queries are risk-sensitive and that, as a result, the proposed risk-aware approach provides a basis for significant improvements in resource selection performance.
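The abstract does not give the scoring rule itself. As a rough illustration of the general idea of trading a source's expected score against the uncertainty of that estimate, the Python sketch below ranks sources with a generic mean-variance rule (expected score minus `b` times variance), as used in portfolio-style risk analysis in IR. It is not the method proposed in the paper. The input format (several score estimates per source), the function name `risk_aware_ranking`, and the risk weight `b` are all hypothetical choices made for this example.

```python
from statistics import mean, pvariance


def risk_aware_ranking(source_scores, b=1.0, k=None):
    """Rank sources by a mean-variance (risk-adjusted) score.

    Hypothetical sketch, not the paper's method.

    source_scores: dict mapping a source name to a non-empty list of
        score estimates for that source (assumed input format).
    b: hypothetical risk-aversion weight. b > 0 penalises sources whose
        estimated score is uncertain; b = 0 ranks by expected score alone.
    k: number of sources to select; None returns the full ranking.
    """
    adjusted = {
        name: mean(scores) - b * pvariance(scores)
        for name, scores in source_scores.items()
    }
    # Highest risk-adjusted score first; ties keep the input order.
    ranking = sorted(adjusted, key=adjusted.get, reverse=True)
    return ranking if k is None else ranking[:k]


if __name__ == "__main__":
    # Source "A" is stable; source "B" has a higher mean but high variance.
    scores = {"A": [0.5, 0.5], "B": [0.2, 1.0]}
    print(risk_aware_ranking(scores, b=0.0))  # ['B', 'A']
    print(risk_aware_ranking(scores, b=1.0))  # ['A', 'B']
```

With `b = 0` the higher-mean but unstable source B (mean 0.6, variance 0.16) is ranked first; with `b = 1` its risk-adjusted score drops to 0.44 and the stable source A (0.5) overtakes it, which is the kind of reordering a risk-aware selector is meant to produce.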



Author information

Authors and Affiliations

University of Lugano (USI), Via G. Buffi 13, Lugano, 6900, Switzerland
Ilya Markov & Fabio Crestani
Monash University, Victoria, 3800, Australia
Mark Carman


Editor information

Editors and Affiliations

Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Azizah Jaafar
Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Nazlena Mohamad Ali
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Shahrul Azman Mohd Noah
Insight Centre for Data Analytics, Dublin City University, Glasnevin, Dublin 9, Ireland
Alan F. Smeaton
Information Systems, Queensland University of Technology, 4001, Brisbane, QLD, Australia
Peter Bruza
Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450, Shah Alam, Selangor, Malaysia
Zainab Abu Bakar & Nursuriati Jamil
Cyber Security Center, Universiti Pertahanan Nasional Malaysia, Kem Sungai Besi, 57000, Kuala Lumpur, Malaysia
Tengku Mohd Tengku Sembok


About this paper

Cite this paper

Markov, I., Carman, M., Crestani, F. (2014). Towards Risk-Aware Resource Selection. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_13


DOI: https://doi.org/10.1007/978-3-319-12844-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12843-6
Online ISBN: 978-3-319-12844-3
eBook Packages: Computer Science, Computer Science (R0)
