Enhanced Query Classification with Millions of Fine-Grained Topics

  • Conference paper
  • First Online:
Web-Age Information Management (WAIM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9659)

Included in the following conference series:

Abstract

Query classification is a crucial task for understanding user search intents. Although this problem has been well studied over the past decades, it remains a major challenge in real-world applications because queries are sparse, noisy and ambiguous. In this paper, we present another important issue, which we call “the pomegranate phenomenon”. The term refers to the gap between a small, manually manageable taxonomy and the massive number of coherent topics within each of its categories. Furthermore, fine-grained topics in one category of the taxonomy may be textually more similar to topics in other categories than to those in their own category. This phenomenon hurts the performance of most traditional classification methods. To overcome this problem, we present a practical approach to enhance the performance of traditional query classifiers. First, we detect millions of fine-grained query topics, each representing a distinct query intent, from two years of click logs, and assign category labels to them. Second, for a given query, we find the K most relevant topics, select a label by majority voting among them, and then use this label to improve the results of classical query classification methods. Empirical evaluation confirms that our topic-based classification algorithms can significantly enhance the performance of traditional classifiers in real-world query classification tasks.
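The abstract does not specify the similarity measure, the value of K, or exactly how the voted label is combined with a classical classifier. The following Python sketch illustrates the second step under stated assumptions only: topics are taken as already mined and labeled (the click-log mining step is not shown), queries and topics are compared as bag-of-words vectors using cosine similarity, and the voted label overrides the baseline classifier's prediction only when at least a fraction `min_share` of the K neighbours agree. The function names, the toy topics, the tokenization and the override rule are all hypothetical and are not the authors' method.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

# A topic is (term counts, category label). In the paper, topics are mined
# from click logs; here they are hand-written toy examples (hypothetical).
Topic = Tuple[Counter, str]


def bag_of_words(text: str) -> Counter:
    """Lower-case, whitespace-split term counts (a simplifying assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_topic_vote(query: str, topics: List[Topic], k: int = 3) -> Tuple[Optional[str], float]:
    """Majority vote over the K topics most similar to the query.

    Returns (label, vote_share), or (None, 0.0) if no topic shares a term
    with the query. Similarity ties keep the original topic order (sorted()
    is stable), and vote ties go to the label encountered first, i.e. the
    higher-ranked neighbour (documented behaviour of Counter.most_common).
    """
    q = bag_of_words(query)
    scored = [(cosine(q, terms), label) for terms, label in topics]
    scored = [pair for pair in scored if pair[0] > 0.0]
    if not scored:
        return None, 0.0
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    return label, count / len(neighbours)


def enhanced_classify(query: str, baseline_label: str, topics: List[Topic],
                      k: int = 3, min_share: float = 0.6) -> str:
    """Use the topic vote only when it is confident enough.

    This override rule and the min_share threshold are hypothetical.
    """
    label, share = knn_topic_vote(query, topics, k)
    if label is not None and share >= min_share:
        return label
    return baseline_label


# Toy topics: several share the word "pomegranate" but belong to different
# categories, echoing the cross-category textual overlap in the abstract.
EXAMPLE_TOPICS: List[Topic] = [
    (bag_of_words("pomegranate juice benefits"), "Health"),
    (bag_of_words("pomegranate seeds nutrition"), "Health"),
    (bag_of_words("pomegranate tree planting"), "Gardening"),
    (bag_of_words("apple iphone price"), "Electronics"),
]

if __name__ == "__main__":
    # A baseline classifier (not shown) is assumed to have predicted "Food".
    print(enhanced_classify("pomegranate juice nutrition", "Food", EXAMPLE_TOPICS))
    # prints: Health
```

Here two of the three nearest topics are labeled "Health", so the vote share is 2/3 and the baseline label is replaced. Raising `min_share` makes the topic vote override the baseline less often; the paper's actual combination strategy may differ.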

Acknowledgments

The authors would like to thank all the members of the ADRS (ADvertisement Research for Sponsored search) group at Sogou Inc., especially Ruining Wang, for help with parts of the data processing and experiments.

Author information

Corresponding author

Correspondence to Qi Ye.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ye, Q., Wang, F., Li, B., Liu, Z. (2016). Enhanced Query Classification with Millions of Fine-Grained Topics. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9659. Springer, Cham. https://doi.org/10.1007/978-3-319-39958-4_10

  • DOI: https://doi.org/10.1007/978-3-319-39958-4_10

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39957-7

  • Online ISBN: 978-3-319-39958-4

  • eBook Packages: Computer Science, Computer Science (R0)
