Query Intent Detection Based on Clustering of Phrase Embedding

  • Conference paper
  • In: Social Media Processing (SMP 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 669)

Abstract

Understanding ambiguous or multi-faceted search queries is essential for information retrieval. Identifying the major aspects or senses of a query can be viewed as query intent detection, where the intents are represented as clusters of intent candidates. The central challenge is therefore how to generate intent candidates and group them semantically. This paper explores the effectiveness of lexical statistics and embedding methods for this task. First, a novel term expansion algorithm is designed to enumerate all possible intent candidates. Next, an efficient query intent generation model is proposed, which learns latent representations of the intent candidates via embedding-based methods. The vectorized intent candidates are then clustered, and the resulting clusters are taken as the query intents. Experimental results on the NTCIR-12 IMine-2 corpus show that the query intent generation model based on phrase embedding significantly outperforms state-of-the-art clustering algorithms in query intent detection.
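
The embed-then-cluster step described in the abstract can be illustrated with a small sketch. This is not the authors' model: the abstract does not specify how phrase embeddings are learned or which clustering algorithm is used. The sketch assumes that a phrase vector is the average of its word vectors and that clustering is a plain k-means with cosine similarity. The toy vectors in `WORD_VECS` and the names `embed_phrase` and `cluster_phrases` are hypothetical and exist only for illustration.

```python
import math

# Hypothetical 2-D word vectors; real embeddings would be learned from a corpus.
WORD_VECS = {
    "jaguar": (0.5, 0.5),
    "car": (1.0, 0.0),
    "price": (0.9, 0.1),
    "dealer": (0.8, 0.0),
    "animal": (0.0, 1.0),
    "habitat": (0.1, 0.9),
    "diet": (0.0, 0.8),
}


def embed_phrase(phrase):
    """Assumption: a phrase vector is the mean of its known word vectors."""
    vecs = [WORD_VECS[w] for w in phrase.split() if w in WORD_VECS]
    if not vecs:
        raise ValueError(f"no known words in phrase: {phrase!r}")
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_phrases(phrases, k, iterations=10):
    """Group intent candidates into k clusters; returns one label per phrase."""
    vectors = [embed_phrase(p) for p in phrases]
    # Deterministic farthest-point initialisation: start from the first vector,
    # then repeatedly add the vector least similar to the chosen centroids.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its most similar centroid.
        labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


candidates = [
    "jaguar car price",
    "jaguar car dealer",
    "jaguar animal habitat",
    "jaguar animal diet",
]
labels = cluster_phrases(candidates, k=2)
for phrase, label in zip(candidates, labels):
    print(label, phrase)
```

In this toy setting the two vehicle-related candidates fall into one cluster and the two animal-related candidates into the other, so each cluster corresponds to one sense of the ambiguous query "jaguar".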

Notes

  1. The overview of the NTCIR-12 IMine-2 task is available at http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/pdf/ntcir/OVERVIEW/01-NTCIR12-OV-IMINE-YamamotoT.pdf

  2. http://www.sogou.com/labs/dl/q.html

  3. https://github.com/keaigongzhugu/labeled-dataset

Author information

Corresponding author

Correspondence to Jiahui Gu.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gu, J., Feng, C., Gao, X., Wang, Y., Huang, H. (2016). Query Intent Detection Based on Clustering of Phrase Embedding. In: Li, Y., Xiang, G., Lin, H., Wang, M. (eds) Social Media Processing. SMP 2016. Communications in Computer and Information Science, vol 669. Springer, Singapore. https://doi.org/10.1007/978-981-10-2993-6_9

  • DOI: https://doi.org/10.1007/978-981-10-2993-6_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2992-9

  • Online ISBN: 978-981-10-2993-6

  • eBook Packages: Computer Science, Computer Science (R0)