Visual topic models for healthcare data clustering

  • K. Rajendra Prasad
  • Moulana Mohammed
  • R. M. Noorullah
Special Issue


Social media is a rich source of health-related topics and can point toward solutions in healthcare. Topic models, which originated in Natural Language Processing, are receiving much attention in healthcare because their output is interpretable and supports decision making; this motivated us to develop visual topic models. Topic models extract health topics by analyzing discriminative and coherent latent features of tweet documents in healthcare applications. Determining the number of topics is an important issue: users sometimes set an incorrect number of topics in traditional topic models, which leads to poor results in health data clustering. In such cases, proper visualizations are essential for identifying cluster trends. To aid the visualization of topic clouds and health tendencies in a document collection, we present hybrid topic modeling techniques that integrate traditional topic models with visualization procedures. We believe the proposed visual topic models, viz. Visual Non-Negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), Visual intJNon-negative Matrix Factorization (VintJNMF), and Visual Probabilistic Latent Semantic Indexing (VPLSI), are promising methods for extracting the tendency of health topics from various sources in healthcare data clustering. Standard benchmark social health datasets are used in an experimental study to demonstrate the efficiency of the proposed models in terms of clustering accuracy (CA), Normalized Mutual Information (NMI), precision (P), recall (R), F-score (F), and computational complexity. Under the cosine-based metric, the VNMF visual model shows an improvement of 32.4% in the display of visual clusters and of 35–40% in the performance measures, compared with the other visual methods across different numbers of health topics.
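The idea of pairing a topic model with a visualization step can be illustrated with a small sketch. The abstract does not spell out the visualization procedure, so the code below assumes a VAT-style (visual assessment of cluster tendency) reordering of a cosine dissimilarity matrix. The `doc_topic` weights are hypothetical stand-ins for the document-topic output of a model such as NMF or LDA, and all function names are illustrative rather than taken from the paper.

```python
import math

def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def dissimilarity_matrix(vectors):
    """Pairwise cosine dissimilarities with an exact zero diagonal."""
    n = len(vectors)
    return [[0.0 if i == j else cosine_dissimilarity(vectors[i], vectors[j])
             for j in range(n)] for i in range(n)]

def vat_order(D):
    """VAT-style ordering: start at an endpoint of the largest
    dissimilarity, then repeatedly append the unvisited item that is
    closest to any already visited item (a Prim-like traversal)."""
    n = len(D)
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        nxt = min(sorted(remaining),
                  key=lambda j: min(D[i][j] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order

def reorder(D, order):
    """Permute rows and columns of D by the given ordering."""
    return [[D[i][j] for j in order] for i in order]

# Hypothetical document-topic weights (rows: documents, columns: topics).
doc_topic = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.8],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
    [0.2, 0.8, 0.0],
]

D = dissimilarity_matrix(doc_topic)
order = vat_order(D)
R = reorder(D, order)
print(order)
```

Rendering `R` as a grayscale image would show documents that share a dominant topic as dark square blocks along the diagonal, and the number of blocks gives a visual hint about the number of topics. In this toy input, documents with the same dominant topic end up next to each other in `order`.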


Keywords: Visual topic model; Social data; Visual clustering; Cosine based metric; Health tendency



This work is supported by the Science & Engineering Research Board (SERB), Department of Science and Technology, Government of India, under the Research Grant of DST Project Number ECR/2016/001556.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Institute of Aeronautical Engineering, Hyderabad, India
  2. Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, India
