Community Trolling: An Active Learning Approach for Topic Based Community Detection in Big Data

  • Preeti Gupta
  • Rajni Jindal
  • Arun Sharma
Abstract

Community detection plays an important role in the creation and transfer of information. Active learning has recently been employed to improve the performance of community detection techniques, since it provides a semi-automatic way to sample data selectively. Based on this, a community trolling approach for topic-based community detection in big data is proposed. Community trolling uses active learning to selectively sample the data relevant to the current context from polluted big data. The refined data is then used to study the community and its sub-communities. As a precursor to community detection, community trolling reduces a huge, unreliable dataset to a reliable one and results in better prediction of community elements such as important topics and important entities. Finally, the effectiveness of the approach was evaluated by implementing it on a real-world Tumblr dataset. The results illustrate that community trolling provides a richer dataset, resulting in more appropriate communities.
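
The abstract does not specify how the selective sampling is carried out. Purely as an illustration of the general idea, the following is a minimal sketch of pool-based active learning with uncertainty sampling, assuming a binary "relevant to the topic / not relevant" label, a small seed set of labeled posts, and a human annotator modeled as an `oracle` function. The classifier (`TinyNaiveBayes`), the function `select_relevant`, and all parameter names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer would do)."""
    return text.lower().split()


class TinyNaiveBayes:
    """Two-class multinomial naive Bayes with Laplace smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_relevant(self, text):
        """Estimated probability that `text` belongs to class 1 (relevant)."""
        log_odds = math.log((self.docs[1] + 1) / (self.docs[0] + 1))
        v = len(self.vocab) + 1
        n1 = sum(self.counts[1].values())
        n0 = sum(self.counts[0].values())
        for tok in tokenize(text):
            p1 = (self.counts[1][tok] + 1) / (n1 + v)
            p0 = (self.counts[0][tok] + 1) / (n0 + v)
            log_odds += math.log(p1 / p0)
        return 1.0 / (1.0 + math.exp(-log_odds))


def select_relevant(pool, seed, oracle, budget):
    """Hypothetical helper: keep the posts in `pool` judged relevant.

    pool   -- list of post texts (the noisy dataset)
    seed   -- dict mapping a few post texts to labels (1 relevant, 0 not)
    oracle -- function text -> label, standing in for a human annotator
    budget -- maximum number of oracle queries
    """
    labeled = dict(seed)
    unlabeled = [t for t in pool if t not in labeled]
    model = TinyNaiveBayes().fit(list(labeled), list(labeled.values()))
    for _ in range(min(budget, len(unlabeled))):
        # Uncertainty sampling: query the post whose score is closest to 0.5.
        query = min(unlabeled, key=lambda t: abs(model.prob_relevant(t) - 0.5))
        labeled[query] = oracle(query)
        unlabeled.remove(query)
        model.fit(list(labeled), list(labeled.values()))
    # Keep posts labeled relevant, plus unlabeled posts the model scores as relevant.
    kept = [t for t, y in labeled.items() if y == 1]
    kept += [t for t in unlabeled if model.prob_relevant(t) >= 0.5]
    return kept
```

In this sketch the annotator is asked only about the posts the current model is least sure of, so the labeling effort is spent where it changes the model most. The reduced set returned by `select_relevant` would then be handed to whatever community detection step follows.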

Keywords

Active learning, Unlabeled big data, Community trolling, Community detection

Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  1. Department of IT, Indira Gandhi Delhi Technical University for Women (IGDTUW), Delhi, India
  2. Department of CSE, Delhi Technological University (DTU), Delhi, India
