Information Retrieval Journal, Volume 19, Issue 6, pp 573–593

Enhancing web search with queries of equivalent intents

  • Ruihua Song
  • Dingquan Wang
  • Jian-Yun Nie
  • Ji-Rong Wen
  • Yong Yu

Abstract

Users often issue many different queries to look for the same target, owing to the intrinsic ambiguity and flexibility of natural language. Some previous work clusters queries based on co-clicks; however, the intents of queries within one cluster are often only roughly related rather than equivalent. It is therefore desirable to automatically mine queries with equivalent intents from large-scale search logs. In this paper, we take into account the similarities between query strings. Two issues are associated with such similarities: it is too costly to compare every pair of queries in large-scale search logs, and two queries with similar formulations, such as “SVN” (Apache Subversion) and “SVM” (support vector machine), do not necessarily have similar intents. To address these issues, we propose applying query string similarities on top of the co-click based clustering results. Our method improves on the co-click based clustering method (lifting precision from 0.37 to 0.62) and outperforms a commercial search engine’s query alteration (lifting the \(F_1\) measure from 0.42 to 0.56). As an application, we consider web document retrieval. We aggregate the click-throughs of similar queries with a query’s own click-throughs and evaluate the result on a large-scale dataset. Experimental results indicate that our proposed method significantly outperforms the baseline of using only a query’s own click-throughs on all metrics.
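The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which string similarity measure, clustering procedure, or threshold the paper uses, so the normalized edit distance, the "share at least one clicked URL" grouping, the 0.6 threshold, and all function names and sample data below are assumptions chosen only for illustration. The sketch shows why restricting string comparisons to co-click clusters both reduces the number of pairs compared and keeps look-alike queries such as "svn" and "svm" apart.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Assumed measure: 1 minus edit distance normalized by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def co_click_clusters(clicks):
    """Group queries that share a clicked URL, transitively (union-find)."""
    parent = {}

    def find(q):
        parent.setdefault(q, q)
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    by_url = defaultdict(list)
    for query, url in clicks:
        find(query)  # register the query even if it ends up alone
        by_url[url].append(query)
    for queries in by_url.values():
        for other in queries[1:]:
            parent[find(other)] = find(queries[0])

    clusters = defaultdict(set)
    for q in parent:
        clusters[find(q)].add(q)
    return list(clusters.values())


def equivalent_pairs(clicks, threshold=0.6):
    """Compare strings only inside each co-click cluster (assumed threshold)."""
    pairs = set()
    for cluster in co_click_clusters(clicks):
        for a, b in combinations(sorted(cluster), 2):
            if string_similarity(a, b) >= threshold:
                pairs.add((a, b))
    return pairs


# Hypothetical (query, clicked URL) log entries, invented for this example.
clicks = [
    ("facebook", "facebook.com"),
    ("facebok", "facebook.com"),
    ("fb login", "facebook.com"),
    ("svn", "subversion.apache.org"),
    ("svm", "en.wikipedia.org/wiki/Support_vector_machine"),
]

print(equivalent_pairs(clicks))  # {('facebok', 'facebook')}
```

In this toy log, "svn" and "svm" have a string similarity of about 0.67, which would pass the assumed threshold, but they never share a click and so are never compared. Conversely, "fb login" is co-clicked with "facebook" but is filtered out because its string is too different.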

Keywords

  • Mining similar queries
  • Query intent
  • Web search

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Microsoft Research Asia, Beijing, China
  2. Johns Hopkins University, Baltimore, USA
  3. University of Montreal, Montreal, Canada
  4. Renmin University, Beijing, China
  5. Shanghai Jiao Tong University, Shanghai, China
