Improve Web Search Diversification with Intent Subtopic Mining

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2013)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 400)

Abstract

A number of search user behavior studies show that queries with unclear intents are commonly submitted to search engines. Result diversification is usually adopted to deal with such queries: the search engine trades off some relevance for diversity to improve the user experience. In this work, we aim to improve the performance of search result diversification by generating a list of intent subtopics through the fusion of multiple resources. Our approach rests on the idea that collecting a broad set of intent subtopics requires extracting them from an equally broad range of resources. The resources adopted include external resources (Wikipedia, Google Keywords Generator, Google Insights, and search engine query suggestion and completion), anchor texts, page snippets, and more. We selected resources to cover both the information seeker aspect (what a user is searching for) and the information provider aspect (the websites). We also propose an efficient Bayesian optimization approach to maximize resource selection performance, and a new technique for clustering subtopics based on the snippets of top results and the Jaccard similarity coefficient. Experiments on the TREC 2012 Web Track and the NTCIR-10 Intent task show that our framework can greatly improve diversity while maintaining good precision. The system developed with the proposed techniques also achieved the best English subtopic mining performance in the NTCIR-10 Intent task.
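To make the clustering idea in the abstract concrete, the following is a minimal sketch of grouping subtopics by the Jaccard similarity of the words in their top result snippets. The abstract does not describe the actual procedure, so everything here is an assumption made for illustration: the function names, the regular-expression tokenizer, the greedy single-pass grouping, the 0.3 threshold, and the example data.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in a snippet (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity coefficient |A & B| / |A | B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_subtopics(snippets_by_subtopic, threshold=0.3):
    """Greedy single-pass clustering; an illustrative choice, not the paper's method.

    Each subtopic is represented by the union of tokens from its snippets and
    joins the first cluster whose representative is similar enough; otherwise
    it starts a new cluster. The threshold value is hypothetical.
    """
    clusters = []  # list of (representative token set, [subtopic names])
    for subtopic, snippets in snippets_by_subtopic.items():
        bag = set().union(*(tokens(s) for s in snippets))
        for representative, members in clusters:
            if jaccard(representative, bag) >= threshold:
                members.append(subtopic)
                break
        else:
            clusters.append((bag, [subtopic]))
    return [members for _, members in clusters]


# Made-up snippets for an ambiguous query ("jaguar"), for illustration only.
example = {
    "jaguar car price": ["jaguar car dealer price list", "new jaguar car price"],
    "jaguar car dealer": ["jaguar car dealer near you", "jaguar dealer price"],
    "jaguar animal habitat": ["the jaguar is a big cat", "jaguar habitat rainforest"],
}

for group in cluster_subtopics(example):
    print(group)
```

In this toy input the two car-related subtopics share most of their snippet vocabulary and fall into one cluster, while the animal subtopic forms its own. A real system would likely need stop-word removal and a tuned threshold.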

This work was supported by Natural Science Foundation (60903107, 61073071) and National High Technology Research and Development (863) Program (2011AA01A205) of China.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Damien, A., Zhang, M., Liu, Y., Ma, S. (2013). Improve Web Search Diversification with Intent Subtopic Mining. In: Zhou, G., Li, J., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2013. Communications in Computer and Information Science, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41644-6_30

  • DOI: https://doi.org/10.1007/978-3-642-41644-6_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41643-9

  • Online ISBN: 978-3-642-41644-6

  • eBook Packages: Computer Science, Computer Science (R0)
