Abstract
Search engine users often struggle to phrase the precise query that would lead to satisfactory search results. Query recommendation is considered an effective aid for enhancing keyword-based queries in search engines and Web search software. In this paper, we present a Query-URL Bipartite based query reCommendation approach, called QUBiC. It utilizes the connectivity of a query-URL bipartite graph to recommend related queries and can significantly improve the accuracy and effectiveness of personalized query recommendation systems compared with the conventional pairwise similarity based approach. The main contribution of the QUBiC approach is its three-phase framework for personalized query recommendations. The first phase is the preparation of queries and their search results returned by a search engine, which generates a historical query-URL bipartite collection. The second phase is the discovery of similar queries by extracting a query affinity graph from the bipartite graph, instead of operating on the original bipartite graph directly using a biclique-based approach or graph clustering. The query affinity graph has only queries as its vertices, and its edges are weighted according to a query-URL vector based similarity (dissimilarity) measure. The third phase is the ranking of similar queries. We devise a novel ranking mechanism for ordering the related queries based on the merging distances of a hierarchical agglomerative clustering (HAC). By utilizing the query affinity graph and the HAC-based ranking, we are able to capture the propagation of similarity from query to query by inducing an implicit topical relatedness between queries.
Furthermore, the flexibility of the HAC strategy makes it possible for users to participate interactively in the query recommendation process. It helps to bridge the gap between the determinacy of actual similarity values and the indeterminacy of users’ information needs: the lists of related queries can change from user to user and from query to query, so related queries are recommended adaptively and on demand. Our experimental evaluation shows that the QUBiC approach is highly efficient and more effective than conventional query recommendation systems, improving precision by up to about 13.3%.
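The three phases can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or the linkage criterion, so cosine distance over binary URL sets and average linkage are assumptions made here, and the function names, the `max_distance` cutoff, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(urls_a, urls_b):
    """1 - cosine similarity of two binary URL vectors, given as sets (assumed measure)."""
    if not urls_a or not urls_b:
        return 1.0
    return 1.0 - len(urls_a & urls_b) / sqrt(len(urls_a) * len(urls_b))


def affinity_graph(query_urls):
    """Phase 2: a graph over queries only; each edge carries a dissimilarity weight."""
    return {
        frozenset((a, b)): cosine_distance(query_urls[a], query_urls[b])
        for a, b in combinations(query_urls, 2)
    }


def rank_related(query_urls, target, max_distance=1.0):
    """Phase 3: order queries by the HAC merge distance at which they join the target."""
    dist = affinity_graph(query_urls)
    clusters = [frozenset([q]) for q in query_urls]
    ranked = []

    def linkage(c1, c2):
        # Average linkage (an assumption): mean pairwise distance between clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        d = linkage(c1, c2)
        if d > max_distance:
            break  # hypothetical user-adjustable cutoff
        if target in c1 or target in c2:
            newcomers = c2 if target in c1 else c1
            ranked.extend((q, d) for q in sorted(newcomers))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return ranked


# Phase 1 (toy stand-in): queries mapped to the URLs a search engine returned.
query_urls = {
    "jaguar car": {"u1", "u2", "u3"},
    "jaguar price": {"u2", "u3"},
    "used jaguar": {"u3", "u4", "u5"},
    "jaguar animal": {"u6", "u7"},
}

recommendations = rank_related(query_urls, "jaguar car", max_distance=0.8)
```

In this sketch, lowering `max_distance` shortens the recommendation list and raising it lengthens the list, which is one simple way to picture the adaptive, user-driven behaviour described above.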
Notes
30 search queries × 2 recommended queries = 60 evaluated queries.
Additional information
The work was done when Lin Li was a Ph.D. student at Kitsuregawa Lab., University of Tokyo, Japan.
This research was undertaken as part of Project 61003130, funded by the National Natural Science Foundation of China, and Project 2011CDB254, funded by the Ministry of Education of China.
Cite this article
Li, L., Zhong, L., Yang, Z. et al. QUBiC: An adaptive approach to query-based recommendation. J Intell Inf Syst 40, 555–587 (2013). https://doi.org/10.1007/s10844-013-0237-8
DOI: https://doi.org/10.1007/s10844-013-0237-8