Pseudo Topic Analysis for Boosting Pseudo Relevance Feedback

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11641)

Abstract

Traditional Pseudo Relevance Feedback (PRF) approaches fail to model intricate real-world user activities. They naively assume that the first-pass top-ranked search results, i.e. the pseudo relevant set, contain aspects that are potentially relevant to the user query. The major challenge in PRF therefore lies in how to obtain reliable relevance feedback content that reflects the user's real information need. Two problems should not be ignored: (1) the assumed relevant documents interleave relevant and non-relevant content, which reduces the reliability of the expansion resource and prevents it from concentrating on the truly relevant portion; (2) even if the assumed relevant documents are truly relevant to the user query, they are often semantically redundant, expressing the same content in various forms because of the peculiarities of natural language expression. This redundancy further aggravates the ‘query drift’ problem. To alleviate these problems, we propose a novel PRF approach that diversifies the feedback source, whose main aim is to gather, from the pseudo relevant set, relevant information that is both semantically focused and diverse. The key idea behind our PRF approach is to construct an abstract pseudo content, obtained from topical network modeling over the set of top-ranked documents, to represent the feedback documents, so as to cover as many diverse aspects of the feedback set as possible at a small semantic granularity. Experimental results on real datasets indicate that the proposed strategies show great promise for finding more reliable feedback sources, helping to achieve query and search result diversity without giving up precision.
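
For readers unfamiliar with the baseline that the abstract criticizes, the following is a minimal sketch of classic PRF query expansion. It is not the method proposed in this paper. It assumes whitespace tokenization and raw term frequency as the selection score; the function name `expand_query` and the parameters `k` and `n_terms` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, k=3, n_terms=2):
    """Classic PRF sketch: treat the top-k first-pass results as relevant
    and append their most frequent terms that are not already in the query."""
    pseudo_relevant = ranked_docs[:k]  # the "pseudo relevant set"
    counts = Counter()
    for doc in pseudo_relevant:
        # Assumption: naive lowercase + whitespace tokenization.
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    # Sort by descending frequency, then alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    expansion = [term for term, _ in ranked[:n_terms]]
    return list(query_terms) + expansion


# Toy first-pass ranking (invented data, for illustration only).
docs = [
    "jaguar car speed",
    "jaguar car engine",
    "jaguar cat habitat",
]
print(expand_query(["jaguar"], docs, k=3, n_terms=2))
# -> ['jaguar', 'car', 'cat']
```

In this toy ranking, the third document is about a different sense of the query term, yet it still contributes "cat" to the expanded query. This is the kind of mixing of relevant and non-relevant feedback content, and the resulting query drift, that the abstract describes.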

Notes

  1. http://code.google.com/p/word2vec.

  2. http://research.nii.ac.jp/ntcir/index-en.html.

  3. http://mlr.cs.umass.edu/ml/machine-learning-databases/ohsumed/.

  4. http://www.lemurproject.org/.

Acknowledgements

This research is jointly supported by the National Natural Science Foundation of China (Grant Nos. 61866029, 61763034), the Natural Science Foundation of Inner Mongolia Autonomous Region (Grant No. 2018MS06025), and the Program of Higher-Level Talents of Inner Mongolia University (Grant No. 21500-5175128).

Author information

Corresponding author

Correspondence to Rong Yan.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yan, R., Gao, G. (2019). Pseudo Topic Analysis for Boosting Pseudo Relevance Feedback. In: Shao, J., Yiu, M., Toyoda, M., Zhang, D., Wang, W., Cui, B. (eds) Web and Big Data. APWeb-WAIM 2019. Lecture Notes in Computer Science, vol 11641. Springer, Cham. https://doi.org/10.1007/978-3-030-26072-9_26

  • DOI: https://doi.org/10.1007/978-3-030-26072-9_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26071-2

  • Online ISBN: 978-3-030-26072-9

  • eBook Packages: Computer Science, Computer Science (R0)
