Learning a Semantic Space of Web Search via Session Data

Bing, Lidong; Niu, Zheng-Yu; Lam, Wai; Wang, Haifeng

doi:10.1007/978-3-319-48051-0_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9994)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

In Web search, a user first comes up with an information need and issues an initial query. The user then clicks some of the retrieved URLs and, if not satisfied, issues further queries. We argue that Web search is governed by a hidden semantic space, in which each involved element, such as a query or a URL, has a projection, i.e., a vector. Each of the above actions in the search procedure, i.e., issuing queries or clicking URLs, is the result of interactions among these elements in the space. In this paper, we aim to uncover this semantic space of Web search, which uniformly captures the hidden semantics of search queries, URLs and other elements. We propose the session2vec and session2vec+ models to learn vectors in this space from search session data, where a search session is regarded as an instantiation of an information need and records the interactions between queries and URLs. The vectors are learnt on a large query log from a search engine, and their efficacy is examined in several tasks.
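The central idea of the abstract is that queries, URLs and other elements all live in one vector space, so that any two of them can be compared directly. A minimal sketch of what that implies for storage and comparison, assuming a single table keyed by (element type, identifier) and cosine similarity as the comparison measure; the class `SharedSpace`, its method names and the example identifiers are hypothetical, and the vectors are random initialisations rather than learnt ones. This is not the authors' implementation.

```python
import math
import random


class SharedSpace:
    """Hypothetical store in which every element type shares one vector space."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.dim = dim
        self._rng = random.Random(seed)
        self._vectors: dict[tuple[str, str], list[float]] = {}

    def vector(self, kind: str, key: str) -> list[float]:
        """Return the vector of an element, e.g. ("query", "cheap flights") or
        ("url", "www.example.com"), creating a random one on first access."""
        element = (kind, key)
        if element not in self._vectors:
            # Small uniform initialisation in [-0.5/dim, 0.5/dim), word2vec-style;
            # training (not shown) would later move these points.
            self._vectors[element] = [
                (self._rng.random() - 0.5) / self.dim for _ in range(self.dim)
            ]
        return self._vectors[element]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; comparing a query with a URL is meaningful only
    because both vectors live in the same space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Usage: a query and a URL looked up in the same table.
space = SharedSpace(dim=8)
q = space.vector("query", "cheap flights")
u = space.vector("url", "www.example.com/flights")
print(cosine(q, u))  # arbitrary until the vectors are trained
```

Keying the table by element type as well as identifier keeps a query string and an identical URL string distinct while still placing both in the same space.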

This work is substantially supported by a grant from the Research Grant Council of the Hong Kong Special Administrative Region, China (Project Code: CUHK413510).

Notes

1.
Existing methods for vector representation learning [2, 15, 16, 20] cannot be readily applied here, for two reasons: (1) our training data is a set of sessions, each represented as a graph, whereas the training data of existing methods is a set of word sequences; (2) our learning procedure incorporates a vector that captures the user's information need. Moreover, we aim to learn a space that uniformly embeds elements of different types, such as queries and URLs.
2.
The number of contextual elements varies, so we calculate the average of contextual vectors.
3.
One may notice that both \(P(s_i;\theta )\) and \(P'(s_i;\theta )\) are defined as the probability of \(s_i\), yet they may be unequal. As Eqs. 2, 4, and 5 show, the probability of a session is calculated from the element vectors and the parameter vectors associated with the Huffman tree. Different types of input vectors, term-based or element-based, can therefore produce different values. We do not impose the constraint \(P(s_i;\theta ) = P'(s_i;\theta )\), since it would make the model less flexible in learning vectors for different elements. Instead, the session vector \(\mathbf {v}(s_i)\), acting as an intermediary, softly aligns the dimensions of the element vectors and the term vectors.
4.
https://code.google.com/p/word2vec/.
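Note 1 states that each session is represented as a graph, and Note 2 states that contextual vectors are averaged because the number of contextual elements varies. The sketch below illustrates both ideas under stated assumptions: the session format (an ordered list of (query, clicked URLs) steps), the choice of reformulation and click edges, treating an element's graph neighbours as its context, and all function names are hypothetical. It is not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): a session as a graph of
# query and URL nodes, and the averaged context vector of one element.

def session_graph(session):
    """session: ordered list of (query, clicked_urls) steps (assumed format).
    Returns an undirected adjacency map over ("query", q) / ("url", u) nodes."""
    graph = {}

    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    prev = None
    for query, clicks in session:
        q = ("query", query)
        graph.setdefault(q, set())   # keep queries that received no clicks
        if prev is not None and prev != q:
            link(prev, q)            # assumed reformulation edge
        for url in clicks:
            link(q, ("url", url))    # assumed click edge
        prev = q
    return graph


def context_average(element, graph, vectors):
    """Average the vectors of the element's neighbours, so that a variable
    number of contextual elements yields one fixed-size input vector."""
    context = sorted(graph[element])  # sorted for a deterministic summation order
    if not context:
        raise ValueError(f"{element} has no contextual elements")
    dim = len(vectors[context[0]])
    return [sum(vectors[c][i] for c in context) / len(context) for i in range(dim)]


# Usage: a three-step session with a reformulation chain and two clicks.
session = [
    ("jaguar", []),
    ("jaguar car", ["jaguar.com"]),
    ("jaguar price", ["jaguar.com", "cars.example.org"]),
]
graph = session_graph(session)
vectors = {("query", "jaguar car"): [1.0, 0.0], ("query", "jaguar price"): [0.0, 1.0]}
print(context_average(("url", "jaguar.com"), graph, vectors))
```

Averaging, rather than concatenating, is what keeps the input size independent of how many neighbours an element happens to have.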

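Note 3 refers to parameter vectors associated with a Huffman tree, which matches the hierarchical softmax used by word2vec (Note 4). Because Eqs. 2, 4, and 5 are not reproduced on this page, the following is only a sketch of word2vec-style hierarchical softmax, assuming that the probability of an element given an input vector h is a product of logistic terms along its root-to-leaf Huffman path. The function names, the element frequencies and the parameter values are hypothetical. It also illustrates the point of Note 3: the same tree parameters scored against a term-based and an element-based input generally yield different probabilities.

```python
import heapq
import itertools
import math


def huffman_paths(freqs):
    """Build a Huffman tree over element frequencies (elements assumed to be
    strings, at least one of them) and return, for every element,
    (ids of inner nodes on the root-to-leaf path, binary code)."""
    tie = itertools.count()  # tie-breaker so the heap never compares elements
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    children = {}  # ("inner", id) -> (left, right)
    n_inner = 0
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        node = ("inner", n_inner)
        n_inner += 1
        children[node] = (left, right)
        heapq.heappush(heap, (f1 + f2, next(tie), node))

    paths = {}

    def walk(node, inner_ids, bits):
        if node in children:
            left, right = children[node]
            walk(left, inner_ids + [node[1]], bits + [0])
            walk(right, inner_ids + [node[1]], bits + [1])
        else:
            paths[node] = (inner_ids, bits)

    walk(heap[0][2], [], [])
    return paths


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hs_probability(h, element, paths, theta):
    """P(element | h) = product over the path of sigmoid(+/- theta_j . h),
    using + for code bit 0 and - for bit 1; theta[j] belongs to inner node j."""
    inner_ids, bits = paths[element]
    p = 1.0
    for j, bit in zip(inner_ids, bits):
        score = sum(a * b for a, b in zip(theta[j], h))
        p *= sigmoid(score if bit == 0 else -score)
    return p


# Usage: four elements need three inner nodes, hence three parameter vectors.
freqs = {"jaguar": 5, "jaguar.com": 2, "jaguar car": 1, "jaguar price": 1}
paths = huffman_paths(freqs)
theta = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]
h_element = [1.0, 2.0]  # hypothetical element-based input vector
h_term = [0.5, -1.0]    # hypothetical term-based input vector
print(hs_probability(h_element, "jaguar.com", paths, theta),
      hs_probability(h_term, "jaguar.com", paths, theta))
```

Since sigmoid(x) + sigmoid(-x) = 1 at every inner node, the probabilities of all leaves sum to one for any input vector, and frequent elements get short paths, so fewer dot products are needed per training step.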
References

[1] Baroni, M., Dinu, G., Kruszewski, G.: Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In: ACL, pp. 238–247 (2014)
[2] Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
[3] Bing, L., Lam, W., Wong, T.L., Jameel, S.: Web query reformulation via joint modeling of latent topic dependency and term context. ACM Trans. Inf. Syst. 33(2), 1–38 (2015)
[4] Craswell, N., Szummer, M.: Random walks on the click graph. In: SIGIR, pp. 239–246 (2007)
[5] Fang, H., Tao, T., Zhai, C.: A formal study of information retrieval heuristics. In: SIGIR, pp. 49–56 (2004)
[6] Gao, J., He, X., Nie, J.Y.: Clickthrough-based translation models for web search: from word models to phrase models. In: CIKM (2010)
[7] Gao, J., Toutanova, K., Yih, W.T.: Clickthrough-based latent semantic models for web search. In: SIGIR, pp. 675–684 (2011)
[8] Grbovic, M., Djuric, N., Radosavljevic, V., Silvestri, F., Bhamidipati, N.: Context- and content-aware embeddings for query rewriting in sponsored search. In: SIGIR (2015)
[9] Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 42(1–2), 177–196 (2001)
[10] Huang, P.S., He, X., Gao, J., Deng, L., Acero, A., Heck, L.: Learning deep structured semantic models for web search using clickthrough data. In: CIKM, pp. 2333–2338 (2013)
[11] Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4), 422–446 (2002)
[12] Jones, R., Klinkner, K.L.: Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs. In: CIKM, pp. 699–708 (2008)
[13] Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196 (2014)
[14] Lee, S., Hu, Y.: Joint embedding of query and ad by leveraging implicit feedback. In: EMNLP, pp. 482–491 (2015)
[15] Mikolov, T.: Statistical language models based on neural networks. Ph.D. thesis (2012)
[16] Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
[17] Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)
[18] Ren, X., Wang, Y., Yu, X., Yan, J., Chen, Z., Han, J.: Heterogeneous graph-based intent learning with queries, web pages and Wikipedia concepts. In: WSDM, pp. 23–32 (2014)
[19] Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at TREC-3. In: TREC, pp. 109–126 (1994)
[20] Schwenk, H.: Continuous space language models. Comput. Speech Lang. 21(3), 492–518 (2007)
[21] Shen, Y., He, X., Gao, J., Deng, L., Mesnil, G.: A latent semantic model with convolutional-pooling structure for information retrieval. In: CIKM, pp. 101–110 (2014)
[22] Singhal, A., Buckley, C., Mitra, M.: Pivoted document length normalization. In: SIGIR, pp. 21–29 (1996)
[23] Socher, R., Bengio, Y., Manning, C.D.: Deep learning for NLP (without magic). In: Tutorial Abstracts of ACL 2012, p. 5 (2012)
[24] Wu, W., Li, H., Xu, J.: Learning query and document similarities from click-through bipartite graph with metadata. In: WSDM, pp. 687–696 (2013)
[25] Yih, W.T., Toutanova, K., Platt, J.C., Meek, C.: Learning discriminative projections for text similarity measures. In: CoNLL, pp. 247–256 (2011)

Author information

Authors and Affiliations

Tencent Inc., Shenzhen, China
Lidong Bing
Baidu Inc., Beijing, China
Zheng-Yu Niu & Haifeng Wang
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
Wai Lam

Corresponding author

Correspondence to Lidong Bing.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Shaoping Ma
Renmin University of China, Beijing, China
Ji-Rong Wen
Tsinghua University, Beijing, China
Yiqun Liu
Renmin University of China, Beijing, China
Zhicheng Dou
Tsinghua University, Beijing, China
Min Zhang
Yahoo Labs, Sunnyvale, California, USA
Yi Chang
Renmin University of China, Beijing, China
Xin Zhao

About this paper

Cite this paper

Bing, L., Niu, ZY., Lam, W., Wang, H. (2016). Learning a Semantic Space of Web Search via Session Data. In: Ma, S., et al. Information Retrieval Technology. AIRS 2016. Lecture Notes in Computer Science, vol 9994. Springer, Cham. https://doi.org/10.1007/978-3-319-48051-0_7

DOI: https://doi.org/10.1007/978-3-319-48051-0_7
Published: 15 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48050-3
Online ISBN: 978-3-319-48051-0
eBook Packages: Computer Science, Computer Science (R0)