Abstract
With the development of Internet technology and the widespread use of mobile devices, microblogging systems such as Twitter and Sina Weibo in China have become the most important platforms for people to retrieve information and communicate with each other. Real-time search has become a major challenge for microblogging systems because of the large volume of data and users. Existing approaches build a single index over all microblogs, which increases the cost of index updates and queries. As a result, search results cannot satisfy users' requirements for timeliness and high quality. In this paper, we propose a new real-time distributed index based on topic (RDIBT), which builds a separate index for each topic. These topical indices are distributed across many sites, which improves query concurrency. Extensive experiments on a real dataset demonstrate the effectiveness and efficiency of RDIBT.
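The idea of one index per topic, placed on different sites, can be illustrated with a minimal sketch. This is not the RDIBT implementation: the abstract does not say how topics are assigned or how indices are placed, so the sketch assumes that each microblog arrives with a topic label from some external step and that a topic is mapped to a site by a simple hash. The class name `TopicalIndexCluster` and all of its methods are hypothetical.

```python
from collections import defaultdict
import zlib


class TopicalIndexCluster:
    """Toy per-topic inverted indices spread over a fixed number of sites.

    Assumptions (not from the paper): topic labels are supplied by the
    caller, and topics are placed on sites by hashing the topic name.
    """

    def __init__(self, num_sites):
        self.num_sites = num_sites
        # sites[site_id][topic][term] -> set of microblog ids
        self.sites = [defaultdict(lambda: defaultdict(set))
                      for _ in range(num_sites)]
        self.timestamps = {}  # microblog id -> posting time

    def site_of(self, topic):
        # Deterministic hash, so every caller routes a topic to the same site.
        return zlib.crc32(topic.encode("utf-8")) % self.num_sites

    def add(self, doc_id, topic, text, timestamp):
        # Only the index of this one topic is updated, not a global index.
        index = self.sites[self.site_of(topic)][topic]
        for term in set(text.lower().split()):
            index[term].add(doc_id)
        self.timestamps[doc_id] = timestamp

    def search(self, topic, query, k=10):
        # A query touches a single site and a single topical index.
        site = self.sites[self.site_of(topic)]
        if topic not in site:
            return []
        index = site[topic]
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(index.get(t, set()) for t in terms))
        # Newest microblogs first, as a stand-in for real-time ranking.
        return sorted(hits, key=lambda d: self.timestamps[d], reverse=True)[:k]


cluster = TopicalIndexCluster(num_sites=3)
cluster.add(1, "sports", "Final match tonight", timestamp=100)
cluster.add(2, "sports", "match delayed by rain", timestamp=200)
cluster.add(3, "tech", "new phone match pricing", timestamp=300)
print(cluster.search("sports", "match"))  # [2, 1]
```

Because an update or a query involves only one topical index, work on different topics can proceed on different sites at the same time, which is the concurrency benefit the abstract describes.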
Acknowledgment
Thanks to the anonymous reviewers for their insightful comments.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chen, Z., Wang, L., Yang, S. (2018). A Real-Time Distributed Index Based on Topic for Microblogging System. In: Huet, B., Nie, L., Hong, R. (eds) Internet Multimedia Computing and Service. ICIMCS 2017. Communications in Computer and Information Science, vol 819. Springer, Singapore. https://doi.org/10.1007/978-981-10-8530-7_22
DOI: https://doi.org/10.1007/978-981-10-8530-7_22
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8529-1
Online ISBN: 978-981-10-8530-7
eBook Packages: Computer Science, Computer Science (R0)