Enhanced Query Classification with Millions of Fine-Grained Topics

Ye, Qi; Wang, Feng; Li, Bo; Liu, Zhimin

doi:10.1007/978-3-319-39958-4_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9659)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Query classification is a crucial task for understanding user search intents. Although this problem has been well studied over the past decades, it remains a major challenge in real-world applications because queries are sparse, noisy and ambiguous. In this paper, we present another important issue, which we call “the pomegranate phenomenon”: the gap between a small, manually manageable taxonomy and the massive number of coherent topics within each category. Furthermore, fine-grained topics in one category of the taxonomy may be textually more relevant to topics in other categories. This phenomenon hurts the performance of most traditional classification methods. To overcome this problem, we present a practical approach to enhance the performance of traditional query classifiers. First, we detect millions of fine-grained query topics, each representing a distinct query intent, from two years of click logs and assign them category labels. Second, for a given query, we retrieve the K most relevant topics, select a label by majority voting, and use this label to improve the results of classical query classification methods. Empirical evaluation confirms that our topic-based classification algorithms can significantly enhance the performance of traditional classifiers in real-world query classification tasks.
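
The second step of the abstract (retrieve the K most relevant topics, vote, then adjust a base classifier's output) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the bag-of-words cosine similarity, the `min_share` confidence threshold, the rule that the vote overrides the base label, and the toy `TOPICS` list. The paper's actual relevance measure and combination rule are not described in the abstract.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real queries need more care)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two token-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_topics(query, topics, k):
    """Return up to k (score, label) pairs for the topics most similar to the query.

    Topics with zero similarity are dropped so they cannot vote.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), label) for text, label in topics]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


def vote_label(neighbors):
    """Majority vote over neighbor labels; returns (label, vote_share)."""
    if not neighbors:
        return None, 0.0
    counts = Counter(label for _, label in neighbors)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(neighbors)


def enhanced_classify(query, topics, base_label, k=3, min_share=0.6):
    """Hypothetical combination rule: trust the topic vote only when it is strong,
    otherwise keep the label produced by the traditional classifier."""
    label, share = vote_label(top_k_topics(query, topics, k))
    if label is not None and share >= min_share:
        return label
    return base_label


# Toy labeled topics standing in for the mined fine-grained topics.
TOPICS = [
    ("apple iphone price", "electronics"),
    ("apple iphone case", "electronics"),
    ("apple pie recipe", "food"),
    ("apple juice calories", "food"),
    ("iphone screen repair", "electronics"),
]

if __name__ == "__main__":
    # The base classifier is simulated by a fixed, possibly wrong, label.
    print(enhanced_classify("iphone price", TOPICS, base_label="food"))
    print(enhanced_classify("weather tomorrow", TOPICS, base_label="news"))
```

In this sketch a query with no matching topic falls back to the base label, which mirrors the abstract's framing of the topic vote as an enhancement to, rather than a replacement for, a traditional classifier.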

Acknowledgments

The authors would like to thank all the members of the ADRS (ADvertisement Research for Sponsored search) group at Sogou Inc., especially Ruining Wang, for their help with parts of the data processing and experiments.

Author information

Authors and Affiliations

Sogou Inc., Beijing, China
Qi Ye, Feng Wang, Bo Li & Zhimin Liu

Corresponding author

Correspondence to Qi Ye.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Bin Cui
The George Washington University, Washington, D.C., USA
Nan Zhang
Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
Jianliang Xu
University of Texas Rio Grande Valley, Edinburg, Texas, USA
Xiang Lian
Jiangxi University of Finance and Economics, Nanchang, Jiangxi, China
Dexi Liu

About this paper

Cite this paper

Ye, Q., Wang, F., Li, B., Liu, Z. (2016). Enhanced Query Classification with Millions of Fine-Grained Topics. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9659. Springer, Cham. https://doi.org/10.1007/978-3-319-39958-4_10

DOI: https://doi.org/10.1007/978-3-319-39958-4_10
Published: 02 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39957-7
Online ISBN: 978-3-319-39958-4
eBook Packages: Computer Science, Computer Science (R0)
