LDA-Based Resource Selection for Results Diversification in Federated Search

  • Liang Li
  • Zhongmin Zhang
  • Shengli Wu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11242)


Resource selection is an important step in a federated search environment, especially for search result diversification. Most prior work on resource selection in federated search considered only the relevance of a resource to the information need; very few studies considered both the relevance and the diversity of the information a resource contains. In this paper, we propose a method that uses the Latent Dirichlet Allocation (LDA) model to discover the underlying topics in each resource by sampling a number of documents from it. The resulting vector representation of each resource can then be used to calculate the similarity between resources and to assess their diversity. Using a group of diversity-related metrics, we find that the LDA-based resource selection method is more effective than other state-of-the-art methods in the same category.
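
The idea of comparing resources through their topic vectors can be illustrated with a minimal sketch. It assumes each resource already has a topic distribution (for example, averaged from LDA output over its sampled documents) and a relevance score. The greedy MMR-style trade-off, the parameter `lam`, and all names and numbers below are illustrative assumptions, not the selection procedure of the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_resources(topics, relevance, k, lam=0.5):
    """Greedily pick k resources, balancing relevance against redundancy.

    topics:    dict mapping resource name -> topic vector (assumed LDA output)
    relevance: dict mapping resource name -> relevance score for the query
    lam:       hypothetical trade-off weight; 1.0 means relevance only
    """
    selected = []
    candidates = set(topics)
    while candidates and len(selected) < k:
        def score(r):
            # Redundancy: highest similarity to any already-selected resource.
            redundancy = max(
                (cosine(topics[r], topics[s]) for s in selected), default=0.0
            )
            return lam * relevance[r] - (1 - lam) * redundancy

        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical topic vectors and relevance scores for three resources.
topics = {
    "news_a": [0.8, 0.1, 0.1],
    "news_b": [0.7, 0.2, 0.1],
    "sports": [0.1, 0.8, 0.1],
}
relevance = {"news_a": 0.9, "news_b": 0.85, "sports": 0.6}

picked = select_resources(topics, relevance, k=2)
print(picked)
```

In this toy setting, the second pick favours the topically distinct resource over the near-duplicate of the first one, even though the near-duplicate has a higher relevance score. Setting `lam=1.0` reduces the procedure to ranking by relevance alone.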


Resource selection; Latent Dirichlet Allocation model; Results diversification; Federated web search; Information retrieval



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Computer Science, Jiangsu University, Zhenjiang, China