QUBiC: An adaptive approach to query-based recommendation

Journal of Intelligent Information Systems

Abstract

Search engine users often have difficulty phrasing a precise query that leads to satisfactory search results. Query recommendation is considered an effective aid for enhancing keyword-based queries in search engines and Web search software. In this paper, we present a Query-URL Bipartite based query reCommendation approach, called QUBiC. It utilizes the connectivity of a query-URL bipartite graph to recommend related queries and can significantly improve the accuracy and effectiveness of personalized query recommendation systems compared with the conventional pairwise similarity based approach. The main contribution of the QUBiC approach is its three-phase framework for personalized query recommendation. The first phase is the preparation of queries and their search results returned by a search engine, which generates a historical query-URL bipartite collection. The second phase is the discovery of similar queries by extracting a query affinity graph from the bipartite graph, instead of operating on the original bipartite graph directly with a biclique-based approach or graph clustering. The query affinity graph has only queries as its vertices, and its edges are weighted according to a similarity (dissimilarity) measure based on query-URL vectors. The third phase is the ranking of similar queries. We devise a novel ranking mechanism that orders the related queries by the merging distances of a hierarchical agglomerative clustering (HAC). By utilizing the query affinity graph and the HAC-based ranking, we are able to capture the propagation of similarity from query to query by inducing an implicit topical relatedness between queries.
Furthermore, the flexibility of the HAC strategy makes it possible for users to participate interactively in the query recommendation process. It helps to bridge the gap between the determinacy of actual similarity values and the indeterminacy of users’ information needs: the lists of related queries can change from user to user and from query to query, so related queries are recommended adaptively and on demand. Our experimental evaluation shows that the QUBiC approach is highly efficient and more effective than conventional query recommendation systems, with a maximum improvement of about 13.3% in precision.
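
The three phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy query-URL data, the binary URL vectors, the cosine dissimilarity, the average-linkage rule, and the function names `cosine_dissimilarity` and `rank_related` are all assumptions made for illustration, since the abstract does not specify these details. Related queries are ranked by the merge distance at which they first join the cluster containing the target query.

```python
from itertools import combinations
from math import sqrt


def cosine_dissimilarity(urls_a, urls_b):
    """1 - cosine similarity of two binary query-URL vectors, given as URL sets."""
    if not urls_a or not urls_b:
        return 1.0
    return 1.0 - len(urls_a & urls_b) / sqrt(len(urls_a) * len(urls_b))


def rank_related(query_urls, target):
    """Rank the other queries by the HAC merge distance at which they join `target`."""
    # Phase 2 (assumed form): query affinity graph as pairwise dissimilarities.
    dist = {
        frozenset(pair): cosine_dissimilarity(query_urls[pair[0]], query_urls[pair[1]])
        for pair in combinations(query_urls, 2)
    }

    def linkage(c1, c2):
        # Average linkage is an assumption; other linkage rules could be swapped in.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Phase 3 (assumed form): agglomerative clustering, recording merge distances.
    clusters = [frozenset([q]) for q in query_urls]
    joined_at = {}
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        d = linkage(c1, c2)
        if target in c1 or target in c2:
            newcomers = c2 if target in c1 else c1
            for q in newcomers:
                joined_at[q] = d
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(joined_at.items(), key=lambda item: item[1])


# Phase 1 (toy stand-in): each query mapped to the URLs returned for it.
query_urls = {
    "jaguar car": {"u1", "u2", "u3"},
    "jaguar price": {"u2", "u3", "u4"},
    "jaguar dealer": {"u4", "u5"},
    "jaguar animal": {"u6", "u7"},
}

ranking = rank_related(query_urls, "jaguar car")
for query, distance in ranking:
    print(f"{distance:.3f}  {query}")
```

In this toy data, "jaguar car" and "jaguar dealer" share no URL, yet "jaguar dealer" is ranked ahead of "jaguar animal" because it is linked through "jaguar price". This is one way to read the propagation of similarity from query to query. Cutting the ranked list at a user-chosen distance threshold would give a list whose length varies by user and query, in the spirit of the adaptive recommendation described above.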

Notes

  1. http://en.wikipedia.org/wiki/Tf*idf

  2. http://www.acm.org/sigs/sigkdd/kddcup/index.php

  3. http://trec.nist.gov/data/terabyte06.html

  4. http://code.google.com/apis/soapsearch/

  5. Thirty search queries * 2 recommended queries = 60 evaluated queries

  6. http://www.r-project.org/

Author information

Corresponding author

Correspondence to Lin Li.

Additional information

The work was done when Lin Li was a Ph.D. student at Kitsuregawa Lab., University of Tokyo, Japan.

This research was undertaken as part of Project 61003130, funded by the National Natural Science Foundation of China, and Project 2011CDB254, funded by the Ministry of Education of China.

About this article

Cite this article

Li, L., Zhong, L., Yang, Z. et al. QUBiC: An adaptive approach to query-based recommendation. J Intell Inf Syst 40, 555–587 (2013). https://doi.org/10.1007/s10844-013-0237-8

  • DOI: https://doi.org/10.1007/s10844-013-0237-8
