Learning to Rank Microblog Posts for Real-Time Ad-Hoc Search

Li, Jing; Wei, Zhongyu; Wei, Hao; Zhao, Kangfei; Chen, Junwen; Wong, Kam-Fai

doi:10.1007/978-3-319-25207-0_40


Conference paper
First Online: 20 October 2015


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9362)

Abstract

Microblogging websites have emerged as a center of information production and diffusion, where people can obtain useful information from other users’ microblog posts. In the era of Big Data, however, we are overwhelmed by the sheer volume of microblog posts. Making good use of this informative data requires an effective search tool specialized for microblog posts. Microblog search is non-trivial for two reasons: 1) microblog posts are noisy and time-sensitive, which renders general information retrieval (IR) models ineffective; 2) conventional IR models are not designed to take microblog-specific features into account. In this paper, we propose to use learning-to-rank models for microblog search. We combine content-based, microblog-specific and temporal features in learning-to-rank models, which we find model microblog posts effectively. To study the performance of learning-to-rank models, we evaluate them on the tweet data sets provided by the TREC 2011 and TREC 2012 Microblog tracks and compare them against three state-of-the-art IR baselines: the vector space model, the language model, and the BM25 model. Extensive experiments demonstrate the effectiveness of learning-to-rank models and the usefulness of integrating microblog-specific and temporal information for the microblog search task.
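The abstract describes combining content-based, microblog-specific and temporal features in a learning-to-rank model. The following is a minimal sketch of that idea, not the authors' implementation, and it rests on several assumptions that do not come from the paper. BM25 (one of the listed baselines) serves as the only content feature. URL presence, hashtag count and an `RT ` prefix are hypothetical microblog-specific features. An exponential recency decay with a hypothetical 24-hour scale is the temporal feature. A pairwise perceptron stands in for the learner, since the abstract does not name the ranking algorithm. Posts published after the query time are filtered out to mimic the real-time setting of the TREC Microblog track. All names (`Tweet`, `features`, `train_pairwise`, `rank`) are illustrative.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tweet:
    """Hypothetical post record: raw text plus a Unix timestamp in seconds."""
    text: str
    timestamp: float


def tokenize(text):
    # Lowercase word tokens; '#' and '@' act as separators, so "#python" -> "python".
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of `query` against each document in `docs`."""
    if not docs:
        return []
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for q in tokenize(query):
            if q not in tf:
                continue
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            norm = k1 * (1 - b + b * len(t) / avgdl)
            s += idf * tf[q] * (k1 + 1) / (tf[q] + norm)
        scores.append(s)
    return scores


def features(query, tweets, query_time):
    """One feature vector per tweet: content, microblog-specific, temporal."""
    content = bm25_scores(query, [tw.text for tw in tweets])
    rows = []
    for tw, c in zip(tweets, content):
        age_hours = (query_time - tw.timestamp) / 3600.0
        rows.append([
            c,                                          # content: BM25 score
            1.0 if "http" in tw.text else 0.0,          # microblog: has a URL
            float(tw.text.count("#")),                  # microblog: hashtag count
            1.0 if tw.text.startswith("RT ") else 0.0,  # microblog: retweet marker
            math.exp(-age_hours / 24.0),                # temporal: recency decay
        ])
    return rows


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(X, y, epochs=50, lr=0.1):
    """Pairwise perceptron: update w whenever a more relevant post
    does not score strictly higher than a less relevant one."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for i in range(len(X)):
            for j in range(len(X)):
                if y[i] > y[j]:
                    diff = [a - b for a, b in zip(X[i], X[j])]
                    if dot(w, diff) <= 0:
                        w = [wk + lr * d for wk, d in zip(w, diff)]
    return w


def rank(query, tweets, query_time, w):
    """Rank only posts published at or before the query time."""
    eligible = [tw for tw in tweets if tw.timestamp <= query_time]
    X = features(query, eligible, query_time)
    order = sorted(range(len(eligible)), key=lambda k: dot(w, X[k]), reverse=True)
    return [eligible[k] for k in order]


# Toy usage with made-up posts and labels (1 = relevant, 0 = not relevant).
query_time = 1_000_000.0
a = Tweet("Python 3.12 release notes http://t.co/x #python", query_time - 3600)
b = Tweet("lunch was great today", query_time - 1800)
future = Tweet("python release just out", query_time + 60)

w = train_pairwise(features("python release", [a, b], query_time), [1, 0])
for tw in rank("python release", [a, b, future], query_time, w):
    print(tw.text)
```

Swapping the perceptron for a stronger learner would change only `train_pairwise`; the feature extraction and the real-time filter would stay the same.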

This work is partially supported by the General Research Fund of Hong Kong (417112), RGC Direct Grant (417613), and Huawei Noah’s Ark Lab, Hong Kong. We would like to thank Junjie Hu, Prof. Michael R. Lyu and the anonymous reviewers for their useful comments. This work was done when Zhongyu Wei and Junwen Chen were at The Chinese University of Hong Kong.

Author information

Authors and Affiliations

The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Jing Li, Hao Wei, Kangfei Zhao & Kam-Fai Wong
MoE Key Laboratory of High Confidence Software Technologies, Beijing, China
Jing Li & Kam-Fai Wong
The University of Texas at Dallas, Richardson, TX, USA
Zhongyu Wei
Tencent, Nanshan District, Shenzhen, China
Junwen Chen

Corresponding author

Correspondence to Jing Li.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Juanzi Li
Rensselaer Polytechnic Institute, Troy, NY, USA
Heng Ji
Peking University, Beijing, China
Dongyan Zhao & Yansong Feng

About this paper

Cite this paper

Li, J., Wei, Z., Wei, H., Zhao, K., Chen, J., Wong, KF. (2015). Learning to Rank Microblog Posts for Real-Time Ad-Hoc Search. In: Li, J., Ji, H., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2015. Lecture Notes in Computer Science, vol 9362. Springer, Cham. https://doi.org/10.1007/978-3-319-25207-0_40

DOI: https://doi.org/10.1007/978-3-319-25207-0_40
Published: 20 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25206-3
Online ISBN: 978-3-319-25207-0
eBook Packages: Computer Science, Computer Science (R0)
