Visual topic models for healthcare data clustering

A Correction to this article is available

Abstract

Social media is a rich source of health-related topics that can inform healthcare solutions. Topic models, which originated in natural language processing, are receiving growing attention in healthcare because of their interpretability and their usefulness in decision making; this motivated us to develop visual topic models. In healthcare applications, topic models are used to extract health topics by analyzing discriminative and coherent latent features of tweet documents. Choosing the number of topics is an important issue: when users supply an incorrect number of topics to a traditional topic model, health data clustering results are poor. In such cases, proper visualizations are essential for identifying cluster trends. To aid the visualization of topic clouds and health tendencies in a document collection, we present hybrid topic modeling techniques that integrate traditional topic models with visualization procedures. We believe the proposed visual topic models, namely Visual Non-Negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), Visual intJNon-negative Matrix Factorization (VintJNMF), and Visual Probabilistic Latent Semantic Indexing (VPLSI), are promising methods for extracting the tendency of health topics from various sources in healthcare data clustering. An experimental study on standard benchmark social health datasets demonstrates the efficiency of the proposed models in terms of clustering accuracy (CA), normalized mutual information (NMI), precision (P), recall (R), F-score (F), and computational complexity. Compared with the other visual methods across different numbers of health topics, the VNMF model improves by 32.4% under the cosine-based metric in the display of visual clusters and by 35–40% in the performance measures.
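
The abstract describes combining a traditional topic model with a visualization procedure under a cosine-based metric, but the algorithms themselves are not reproduced on this page. The following is therefore only a minimal sketch of one plausible reading of the VNMF idea, not the authors' method: it factorizes a toy document-term matrix with standard multiplicative-update NMF, computes cosine dissimilarities between the document-topic weights, and reorders that matrix with a VAT-style (visual assessment of cluster tendency) nearest-neighbour ordering so that clusters show up as dark diagonal blocks. The function names, the toy matrix, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate non-negative V (docs x terms) as W @ H using
    multiplicative updates for the squared-error objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps   # document-topic weights
    H = rng.random((k, m)) + eps   # topic-term weights
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def cosine_dissimilarity(X, eps=1e-12):
    """Pairwise 1 - cos(x_i, x_j) between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.maximum(norms, eps)
    D = 1.0 - U @ U.T
    np.fill_diagonal(D, 0.0)
    return np.clip(D, 0.0, 1.0)

def vat_order(D):
    """VAT-style ordering: start from an endpoint of the largest
    dissimilarity, then repeatedly append the unvisited item that is
    closest to any already-visited item."""
    n = D.shape[0]
    start, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(start)]
    remaining = [j for j in range(n) if j != start]
    while remaining:
        sub = D[np.ix_(order, remaining)]
        nxt = remaining[int(np.argmin(sub.min(axis=0)))]
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical toy corpus: even-numbered documents use terms 0-2,
# odd-numbered documents use terms 3-5 (two interleaved "topics").
V = np.array([
    [3, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 3],
    [1, 2, 3, 0, 0, 0],
    [0, 0, 0, 3, 1, 2],
], dtype=float)

W, H = nmf(V, k=2)
D = cosine_dissimilarity(W)
order = vat_order(D)
D_star = D[np.ix_(order, order)]   # reordered matrix, ready to render as an image
```

Rendering `D_star` as a grayscale image (for instance with a plotting library's image display) would be the visual step: documents that share a topic end up adjacent, so the number of dark blocks along the diagonal hints at a suitable number of topics. Cosine dissimilarity is used here because the abstract names a cosine-based metric; other dissimilarities could be substituted in the same pipeline.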

Change history

  • 22 November 2019

    The algorithms in Section 5 were missing from the article as published online. The text of Section 5 is now given below.

Acknowledgements

This work is supported by the Science & Engineering Research Board (SERB), Department of Science and Technology, Government of India for the Research Grant of DST Project Number ECR/2016/001556.

Author information

Corresponding author

Correspondence to K. Rajendra Prasad.

About this article

Cite this article

Rajendra Prasad, K., Mohammed, M. & Noorullah, R.M. Visual topic models for healthcare data clustering. Evol. Intel. (2019). https://doi.org/10.1007/s12065-019-00300-y

Keywords

  • Visual topic model
  • Social data
  • Visual clustering
  • Cosine based metric
  • Health tendency