A Set-Based Training Query Classification Approach for Twitter Search

  • Conference paper
Web-Age Information Management (WAIM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9658)

Abstract

Learning to rank is a popular technique for building a ranking model for Twitter search from a rich list of features. As most learning to rank algorithms are supervised, their effectiveness is heavily affected by the quality of the labeled training data. Selecting high-quality training queries is therefore an important means of improving the effectiveness of the ranking model for Twitter search. The existing approach to this problem learns a query quality classifier that estimates training query quality on a per-query basis but ignores the dependence between queries. This paper proposes a set-based training query classification approach that estimates a training query’s quality by taking into consideration its usefulness in combination with other training queries. Evaluation on the standard TREC Microblog track test collection shows that the proposed approach yields effective retrieval performance.
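
The abstract does not say how the set-based estimate is computed, so the following is only a toy sketch of the contrast it draws, not the authors' method. It assumes a hypothetical `evaluate` function standing in for "train a ranker on these queries and measure its effectiveness", and invented toy data in `COVERAGE`. Per-query quality scores each query alone; the set-based score here is a simple leave-one-out marginal gain, chosen purely for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical toy data: the ranking "patterns" each training query teaches.
COVERAGE: Dict[str, FrozenSet[str]] = {
    "q1": frozenset({"a", "b"}),
    "q2": frozenset({"a", "b"}),  # redundant with q1
    "q3": frozenset({"c"}),       # small, but nothing else covers it
}
ALL_PATTERNS: FrozenSet[str] = frozenset().union(*COVERAGE.values())


def evaluate(train_queries: Iterable[str]) -> float:
    """Assumed stand-in for training a ranker and measuring its effectiveness.

    Returns the fraction of patterns covered by the given training queries.
    """
    covered = set()
    for query in train_queries:
        covered |= COVERAGE[query]
    return len(covered) / len(ALL_PATTERNS)


def per_query_quality(query: str) -> float:
    """Score a query in isolation, ignoring all other training queries."""
    return evaluate([query])


def set_based_quality(query: str, pool: List[str]) -> float:
    """Score a query by what it adds on top of the rest of the pool."""
    others = [q for q in pool if q != query]
    return evaluate(others + [query]) - evaluate(others)


pool = list(COVERAGE)
for q in pool:
    print(q, round(per_query_quality(q), 3), round(set_based_quality(q, pool), 3))
```

In this toy setting q1 and q2 look best when judged alone, yet each adds nothing once the other is present, while q3 scores lowest alone but is the only query whose removal hurts the set. This is the kind of dependence between queries that a per-query estimate cannot see.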

Notes

  1. https://github.com/lintool/twitter-tools/wiki/TREC-2014-Track-Guidelines

  2. https://github.com/lintool/twitter-tools/wiki/TREC-2013-Track-Guidelines

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (61472391) and the Beijing Natural Science Foundation (4142050).

Author information

Corresponding authors

Correspondence to Ben He or Jungang Xu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ma, Q., He, B., Xu, J., Wang, B. (2016). A Set-Based Training Query Classification Approach for Twitter Search. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9658. Springer, Cham. https://doi.org/10.1007/978-3-319-39937-9_38

  • DOI: https://doi.org/10.1007/978-3-319-39937-9_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39936-2

  • Online ISBN: 978-3-319-39937-9

  • eBook Packages: Computer Science, Computer Science (R0)
