An implicit aspect modelling framework for diversity focused query expansion

Abstract

Diversified Query Expansion aims to present the user with a diverse list of query expansions so as to better communicate their intent to the retrieval system. Current diversified expansion techniques either use external knowledge sources to explicitly model the various aspects underlying the user query and their relationships, or model query aspects implicitly. However, the implicit techniques assume query aspects to be independent of each other. We propose a unified framework that produces diversified query expansions in a completely implicit manner while also considering the relationships between query aspects. In particular, the framework identifies query aspects and their relationships by making use of the semantic properties of context phrases that occur within the top-ranked documents retrieved for the user query, and maps them onto a Mutating Markov Chain model to generate a diverse ordering of query aspects. We test our framework against a set of ambiguous and faceted queries used in the NTCIR-12 IMine-2 Task, and through an extensive empirical analysis we show that our framework consistently outperforms existing implicit diversified query expansion algorithms. The utility of our algorithm is most evident in a second set of experiments, where we generate diversified query expansions for a retrieval engine indexing documents from specific scientific domains. Even in such a niche scenario, our algorithm consistently provides robust results and performs better than other implicit approaches.
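The abstract describes turning a graph of related query aspects into a diverse ordering with a Markov chain. This preview does not give the update rule of the authors' Mutating Markov Chain, so the sketch below swaps in a related, published technique: the GRASSHOPPER absorbing random walk of Zhu et al. (2007). The most central aspect is picked first by stationary probability; after each pick, the chosen states are made absorbing (so the chain changes after every selection) and the next aspect is the one with the most expected visits before absorption. The aspect labels, the similarity values (stand-ins for, e.g., cosine similarities between context-phrase embeddings), and the teleport weight `lam` are all hypothetical.

```python
"""Hypothetical sketch: diverse ordering of query aspects with a GRASSHOPPER-style
absorbing random walk (Zhu et al., 2007). This is NOT the paper's Mutating Markov Chain."""

# Hypothetical aspects of the query "jaguar" and made-up pairwise similarities
# (stand-ins for, e.g., cosine similarity between context-phrase embeddings).
LABELS = ["jaguar car", "jaguar xf price", "jaguar dealer", "jaguar animal", "jaguar habitat"]
SIM = [
    [0.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 0.0, 0.7, 0.1, 0.1],
    [0.8, 0.7, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 0.0],
]


def transition_matrix(sim, lam=0.85):
    """Row-normalise similarities (no self-loops) and mix in a uniform teleport."""
    n = len(sim)
    P = []
    for i in range(n):
        out = [0.0 if j == i else sim[i][j] for j in range(n)]
        total = sum(out)
        walk = [x / total for x in out] if total > 0 else [1.0 / n] * n
        P.append([lam * w + (1 - lam) / n for w in walk])
    return P


def stationary(P, iters=1000):
    """Stationary distribution of the (ergodic) chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def expected_visits(P, absorbed, max_iters=10000, tol=1e-12):
    """Expected visits to each non-absorbed state before absorption, starting
    uniformly at random among non-absorbed states: iterates v = u + Q^T v."""
    free = [i for i in range(len(P)) if i not in absorbed]
    u = {i: 1.0 / len(free) for i in free}
    v = dict(u)
    for _ in range(max_iters):
        nxt = {j: u[j] + sum(v[i] * P[i][j] for i in free) for j in free}
        converged = max(abs(nxt[j] - v[j]) for j in free) < tol
        v = nxt
        if converged:
            break
    return v


def grasshopper_order(sim, lam=0.85):
    """Rank aspects: first by stationary probability, then repeatedly make the
    chosen aspects absorbing and pick the state with the most expected visits."""
    P = transition_matrix(sim, lam)
    pi = stationary(P)
    order = [max(range(len(P)), key=lambda i: pi[i])]
    while len(order) < len(P):
        v = expected_visits(P, set(order))
        order.append(max(v, key=v.get))
    return order


if __name__ == "__main__":
    print([LABELS[i] for i in grasshopper_order(SIM)])
```

In this toy graph, walks that start near the already-selected "car" aspect are absorbed quickly, so the loosely connected "animal" cluster accumulates more expected visits and is ranked second, which is the diversity effect the ordering is meant to produce.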

Figs. 1–5 (images not reproduced)

Notes

  1. Since the nodes in the aspect graph represent query aspects, the states of the Markov Chain represent them too. Hence, the terms node and state are used interchangeably to refer to query aspects.

  2. https://github.com/attardi/wikiextractor

  3. Even though query disambiguation pages are curated, they may not be up to date. This measure is taken to address such issues.

  4. https://link.springer.com/

Author information

Corresponding author

Correspondence to Vidhya Balasubramanian.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dev, R.E., Balasubramanian, V. An implicit aspect modelling framework for diversity focused query expansion. J Intell Inf Syst 55, 207–231 (2020). https://doi.org/10.1007/s10844-019-00581-w

Keywords

  • Query expansion
  • Diversification
  • Diversified query expansion
  • Implicit diversification