Abstract
Social media is a rich source of health-related topics that can inform solutions in healthcare. Topic models, which originated in Natural Language Processing, are receiving much attention in healthcare because of their interpretability and their usefulness in decision making, which motivated us to develop visual topic models. Topic models extract health topics by analyzing discriminative and coherent latent features of tweet documents in healthcare applications. Determining the number of topics is an important issue: users sometimes specify an incorrect number of topics in traditional topic models, which leads to poor results in health data clustering. In such cases, proper visualizations are essential for extracting information that identifies cluster trends. To aid the visualization of topic clouds and health tendencies in a document collection, we present hybrid topic modeling techniques that integrate traditional topic models with visualization procedures. We believe the proposed visual topic models, namely Visual Non-Negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), Visual intJNon-negative Matrix Factorization (VintJNMF), and Visual Probabilistic Latent Semantic Indexing (VPLSI), are promising methods for extracting the tendency of health topics from various sources in healthcare data clustering. An experimental study on standard benchmark social health datasets demonstrates the efficiency of the proposed models in terms of clustering accuracy (CA), Normalized Mutual Information (NMI), precision (P), recall (R), F-Score (F), and computational complexity. Compared with the other visual methods across different numbers of health topics, the VNMF visual model shows an improvement of 32.4% under the cosine-based metric in the display of visual clusters and an improvement of 35–40% in the performance measures.
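The idea of pairing a topic model with a visualization step can be illustrated with a small sketch. It is not the authors' implementation: it assumes plain NMF with multiplicative updates, a cosine dissimilarity between the document-topic rows, and a VAT-style (visual assessment of cluster tendency) reordering, which the abstract does not spell out. The toy document-term matrix and the function names `nmf`, `cosine_dissimilarity`, and `vat_order` are hypothetical.

```python
import numpy as np

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative docs x terms matrix V into W (docs x k) and
    H (k x terms) using multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def cosine_dissimilarity(X, eps=1e-12):
    """Pairwise (1 - cosine similarity) between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, eps)
    D = np.clip(1.0 - Xn @ Xn.T, 0.0, 2.0)
    np.fill_diagonal(D, 0.0)
    return D

def vat_order(D):
    """VAT-style ordering: start at one end of the largest dissimilarity,
    then repeatedly append the unvisited item closest to the visited set."""
    n = D.shape[0]
    start, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(start)]
    remaining = set(range(n)) - {int(start)}
    while remaining:
        rem = sorted(remaining)
        dist_to_visited = D[np.ix_(order, rem)].min(axis=0)
        nxt = rem[int(np.argmin(dist_to_visited))]
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Toy document-term counts (assumed data): even rows use terms 0-1,
# odd rows use terms 2-3, so two topics are interleaved.
V = np.array([
    [3, 2, 0, 0],
    [0, 0, 2, 3],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [4, 1, 0, 0],
    [0, 0, 1, 4],
], dtype=float)

W, H = nmf(V, k=2)              # document-topic and topic-term factors
D = cosine_dissimilarity(W)     # dissimilarity in topic space
order = vat_order(D)
D_vat = D[np.ix_(order, order)] # reordered matrix; plot as an image to inspect
print("VAT order:", order)
```

Rendering `D_vat` as a grayscale image would show dark square blocks along the diagonal when documents group into topics, so the number of blocks gives a visual hint about the number of topics before it is fixed in the model.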
Change history
22 November 2019
A Correction to this paper has been published: https://doi.org/10.1007/s12065-019-00323-5
Acknowledgements
This work is supported by the Science & Engineering Research Board (SERB), Department of Science and Technology, Government of India, under Research Grant DST Project Number ECR/2016/001556.
About this article
Cite this article
Rajendra Prasad, K., Mohammed, M. & Noorullah, R.M. Visual topic models for healthcare data clustering. Evol. Intel. 14, 545–562 (2021). https://doi.org/10.1007/s12065-019-00300-y
DOI: https://doi.org/10.1007/s12065-019-00300-y