Abstract
Social media is a rich source of health-related topics that can inform solutions in healthcare. Topic models, which originated in Natural Language Processing, are receiving much attention in healthcare because of their interpretability and their usefulness in decision making, and this motivated us to develop visual topic models. Topic models extract health topics by analyzing discriminative and coherent latent features of tweet documents in healthcare applications. Choosing the number of topics is an important issue: when users supply an incorrect number of topics to a traditional topic model, health data clustering results are poor. In such cases, proper visualizations are essential for extracting information that identifies cluster trends. To aid the visualization of topic clouds and health tendencies in a document collection, we present hybrid topic modeling techniques that integrate traditional topic models with visualization procedures. We believe the proposed visual topic models, namely Visual Non-Negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), Visual intJNon-negative Matrix Factorization (VintJNMF), and Visual Probabilistic Latent Semantic Indexing (VPLSI), are promising methods for extracting the tendency of health topics from various sources in healthcare data clustering. Standard benchmark social health datasets are used in an experimental study to demonstrate the efficiency of the proposed models in terms of clustering accuracy (CA), Normalized Mutual Information (NMI), precision (P), recall (R), F-Score (F), and computational complexity. The VNMF model improves the display of visual clusters by 32.4% under the cosine-based metric and improves the performance measures by 35–40% compared with the other visual methods across different numbers of health topics.
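The abstract does not spell out the visualization procedure, so the following is only a minimal sketch under stated assumptions: it supposes that a topic model has already produced document-topic weights (the `doc_topic` values are hypothetical, not data from the paper), builds a cosine-based dissimilarity matrix from them, and reorders that matrix with a VAT-style (visual assessment of cluster tendency) nearest-neighbor ordering so that similar documents end up adjacent. Rendering the reordered matrix as a grayscale image would then show clusters as dark blocks along the diagonal. All function names are illustrative.

```python
import math


def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: an all-zero vector is maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def dissimilarity_matrix(vectors):
    """Build the full pairwise cosine dissimilarity matrix."""
    n = len(vectors)
    return [
        [0.0 if i == j else cosine_dissimilarity(vectors[i], vectors[j])
         for j in range(n)]
        for i in range(n)
    ]


def vat_order(dist):
    """VAT-style ordering: start at one end of the most dissimilar pair,
    then repeatedly append the unvisited item closest to any visited item."""
    n = len(dist)
    start = max(range(n), key=lambda i: max(dist[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        nxt = min(remaining, key=lambda j: min(dist[i][j] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(dist, order):
    """Permute rows and columns of the matrix by the given ordering."""
    return [[dist[i][j] for j in order] for i in order]


# Hypothetical document-topic weights (rows: documents, columns: topics),
# e.g. the kind of output an NMF or LDA model might produce.
doc_topic = [
    [0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9],
    [0.8, 0.2, 0.0],
    [0.1, 0.0, 0.9],
    [0.1, 0.9, 0.1],
    [0.0, 0.8, 0.2],
]

dist = dissimilarity_matrix(doc_topic)
order = vat_order(dist)
reordered = reorder(dist, order)

for row in reordered:
    print(" ".join(f"{value:.2f}" for value in row))
```

In the printed matrix, small values clustered in square blocks along the diagonal suggest how many groups the documents form, which is the kind of visual cue that can guide the choice of the number of topics.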
Change history
22 November 2019
The algorithms in section 5 were missing from the article as published online. The text of section 5 is now given below.
Acknowledgements
This work is supported by the Science & Engineering Research Board (SERB), Department of Science and Technology, Government of India for the Research Grant of DST Project Number ECR/2016/001556.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Rajendra Prasad, K., Mohammed, M. & Noorullah, R.M. Visual topic models for healthcare data clustering. Evol. Intel. (2019). https://doi.org/10.1007/s12065-019-00300-y
Keywords
- Visual topic model
- Social data
- Visual clustering
- Cosine based metric
- Health tendency