Learning a Semantic Space of Web Search via Session Data

  • Conference paper
Information Retrieval Technology (AIRS 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9994)

Abstract

In Web search, a user first comes up with an information need and issues an initial query. The user then clicks some of the retrieved URLs and, if not satisfied, issues further queries. We posit that Web search is governed by a hidden semantic space, and that each element involved, such as a query or a URL, has a projection, i.e., a vector, in this space. Each of the above actions in the search procedure, i.e., issuing a query or clicking a URL, results from an interaction of those elements in the space. In this paper, we aim to uncover such a semantic space of Web search, one that uniformly captures the hidden semantics of search queries, URLs, and other elements. We propose the session2vec and session2vec+ models to learn vectors in this space from search session data, where a search session is regarded as an instantiation of an information need and retains the interaction information of queries and URLs. Vectors are learned on a large query log from a search engine, and the efficacy of the learned vectors is examined on several tasks.

This work is substantially supported by a grant from the Research Grant Council of the Hong Kong Special Administrative Region, China (Project Code: CUHK413510).

Notes

  1. Existing methods for vector representation learning [2, 15, 16, 20] cannot be readily applied here for two reasons: (1) our training data is a set of sessions, each represented as a graph, whereas the training data of existing methods is a set of word sequences; (2) a vector capturing the user's information need is incorporated into our learning procedure. Moreover, we intend to learn a space that uniformly embeds elements of different types, such as queries and URLs.

  2. The number of contextual elements varies, so we use the average of the contextual vectors.

  3. One may notice that both \(P(s_i;\theta )\) and \(P'(s_i;\theta )\) are defined as the probability of \(s_i\), yet they may be unequal. As Eqs. 2, 4, and 5 show, the probability of a session is calculated from element vectors and the parameter vectors associated with the Huffman tree. Therefore, different types of input vectors, term-based or element-based, may yield different values. We do not enforce \(P(s_i;\theta ) = P'(s_i;\theta )\), since such a constraint would make the model less flexible in learning vectors for different elements. On the other hand, the session vector \(\mathbf {v}(s_i)\), as an intermediary, softly aligns the dimensions of element vectors and term vectors.

  4. https://code.google.com/p/word2vec/
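
Notes 2 and 3 mention two mechanics that a small example may make concrete: averaging a variable number of contextual vectors, and computing a probability from an input vector together with parameter vectors attached to a Huffman tree. The sketch below is only an illustration under stated assumptions. The paper's Eqs. 2, 4, and 5 are not reproduced on this page, so the code follows the generic hierarchical softmax formulation used by word2vec rather than the authors' exact model. All names (`average_vectors`, `path_probability`, the toy tree with leaves A, B, C) and all numeric values are hypothetical.

```python
import math


def average_vectors(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one contextual vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def path_probability(input_vec, path):
    """Probability of reaching one leaf of a binary (Huffman-style) tree.

    `path` is a list of (node_parameter_vector, branch) pairs from the root
    to the leaf, with branch 0 meaning "go left" and 1 meaning "go right".
    Assumption: each inner node decides left with sigmoid(input . params),
    as in standard hierarchical softmax.
    """
    prob = 1.0
    for node_vec, branch in path:
        score = sum(a * b for a, b in zip(input_vec, node_vec))
        p_left = sigmoid(score)
        prob *= p_left if branch == 0 else 1.0 - p_left
    return prob


# Three contextual vectors of a hypothetical target element; the count may vary.
context = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
h = average_vectors(context)

# Toy tree: the root splits into leaf A (left) and an inner node (right),
# which in turn splits into leaves B (left) and C (right).
root = [0.5, -0.25]
inner = [-1.0, 0.75]
paths = {
    "A": [(root, 0)],
    "B": [(root, 1), (inner, 0)],
    "C": [(root, 1), (inner, 1)],
}
probs = {leaf: path_probability(h, p) for leaf, p in paths.items()}
```

Because every inner node splits its probability mass between its two children, the leaf probabilities sum to one for any input vector. Feeding a different input vector through the same tree parameters generally changes each leaf's probability, which is in line with the remark in Note 3 that term-based and element-based inputs need not produce equal values.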

References

  1. Baroni, M., Dinu, G., Kruszewski, G.: Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In: ACL, pp. 238–247 (2014)

  2. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)

  3. Bing, L., Lam, W., Wong, T.L., Jameel, S.: Web query reformulation via joint modeling of latent topic dependency and term context. ACM Trans. Inf. Syst. 33(2), 1–38 (2015)

  4. Craswell, N., Szummer, M.: Random walks on the click graph. In: SIGIR, pp. 239–246 (2007)

  5. Fang, H., Tao, T., Zhai, C.: A formal study of information retrieval heuristics. In: SIGIR, pp. 49–56 (2004)

  6. Gao, J., He, X., Nie, J.Y.: Clickthrough-based translation models for web search: from word models to phrase models. In: CIKM (2010)

  7. Gao, J., Toutanova, K., Yih, W.T.: Clickthrough-based latent semantic models for web search. In: SIGIR, pp. 675–684 (2011)

  8. Grbovic, M., Djuric, N., Radosavljevic, V., Silvestri, F., Bhamidipati, N.: Context- and content-aware embeddings for query rewriting in sponsored search. In: SIGIR (2015)

  9. Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 42(1–2), 177–196 (2001)

  10. Huang, P.S., He, X., Gao, J., Deng, L., Acero, A., Heck, L.: Learning deep structured semantic models for web search using clickthrough data. In: CIKM, pp. 2333–2338 (2013)

  11. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4), 422–446 (2002)

  12. Jones, R., Klinkner, K.L.: Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs. In: CIKM, pp. 699–708 (2008)

  13. Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196 (2014)

  14. Lee, S., Hu, Y.: Joint embedding of query and ad by leveraging implicit feedback. In: EMNLP, pp. 482–491 (2015)

  15. Mikolov, T.: Statistical language models based on neural networks. Ph.D. thesis (2012)

  16. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)

  17. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)

  18. Ren, X., Wang, Y., Yu, X., Yan, J., Chen, Z., Han, J.: Heterogeneous graph-based intent learning with queries, web pages and Wikipedia concepts. In: WSDM, pp. 23–32 (2014)

  19. Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at TREC-3. In: TREC, pp. 109–126 (1994)

  20. Schwenk, H.: Continuous space language models. Comput. Speech Lang. 21(3), 492–518 (2007)

  21. Shen, Y., He, X., Gao, J., Deng, L., Mesnil, G.: A latent semantic model with convolutional-pooling structure for information retrieval. In: CIKM, pp. 101–110 (2014)

  22. Singhal, A., Buckley, C., Mitra, M.: Pivoted document length normalization. In: SIGIR, pp. 21–29 (1996)

  23. Socher, R., Bengio, Y., Manning, C.D.: Deep learning for NLP (without magic). In: Tutorial Abstracts of ACL 2012, p. 5 (2012)

  24. Wu, W., Li, H., Xu, J.: Learning query and document similarities from click-through bipartite graph with metadata. In: WSDM, pp. 687–696 (2013)

  25. Yih, W.T., Toutanova, K., Platt, J.C., Meek, C.: Learning discriminative projections for text similarity measures. In: CoNLL, pp. 247–256 (2011)

Author information

Corresponding author

Correspondence to Lidong Bing.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Bing, L., Niu, ZY., Lam, W., Wang, H. (2016). Learning a Semantic Space of Web Search via Session Data. In: Ma, S., et al. Information Retrieval Technology. AIRS 2016. Lecture Notes in Computer Science, vol 9994. Springer, Cham. https://doi.org/10.1007/978-3-319-48051-0_7

  • DOI: https://doi.org/10.1007/978-3-319-48051-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48050-3

  • Online ISBN: 978-3-319-48051-0

  • eBook Packages: Computer Science, Computer Science (R0)
