Correction to: Visual topic models for healthcare data clustering

The Original Article is available

Correction to: Evolutionary Intelligence https://doi.org/10.1007/s12065-019-00300-y

The algorithms in Section 5 were missing from the article as published online. The text of Section 5 is given below.

Proposed visual topic models

Visual topic models help in the assessment of tweet documents by finding distinct, conceptually hidden topics, which improves the quality of clustering results. PLSI, NMF, LDA, and intJNMF are widely used topic models; however, they cannot detect the number of topics discussed by social users, i.e., the user must choose this number arbitrarily before topic clustering is performed. More accurate clustering results are expected when the number of topics is known in advance. The topic models are therefore extended with a visual step that allows the number of topics to be assessed visually and yields more accurate clustering results. The proposed models are Visual PLSI (VPLSI), Visual NMF (VNMF), Visual LDA (VLDA), and Visual intJNMF (VintJNMF).

The proposed visual topic models, namely VPLSI, VNMF, VLDA, and VintJNMF, learn the topics of tweet documents by deriving low-rank matrices. The low-rank matrices are the topic-document matrix ‘V’ (in VNMF, VLDA, and VPLSI) and the tweet-topic matrix ‘W’ (in VintJNMF). These models generate less sparse, low-rank matrices that capture the relationship between topics and tweet documents rather than the relationship between terms and tweet documents. The proposed VintJNMF exploits relationships based on the Twitter interactions mention, reply, and retweet. The other models, i.e., VNMF, VLDA, and VPLSI, exploit relationships based on the mention and reply interactions and ignore unwanted repeated retweet interactions; thus, these models produce a less sparse tweet-topic document matrix than VintJNMF.

In VPLSI, the term-tweet document matrix is taken as input, in which a term is denoted as a word (w) and a document as (d). The initial probability P(d) and the conditional probability P(w|d) are computed to obtain the joint probability P(d, w). Three probabilities, P(w|z), P(z), and P(d|z), where z denotes a latent topic, are re-estimated using the EM algorithm to obtain more accurate topic distributions over the tweet documents. The topic-document probabilities are stored in matrix V. Both the dissimilarity matrix and the reordered dissimilarity matrix are computed using the Euclidean or cosine metric so that the number of distinct topics can be assessed visually; each topic cluster appears as a square-shaped dark block that carries crisp partition information. Crisp partitions are used to determine the predicted cluster labels of tweet documents in the complete clustering results. The proposed VPLSI is shown in Algorithm 1.

[Figure a: Algorithm 1 (VPLSI)]
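The EM re-estimation of P(z), P(d|z), and P(w|z) described for VPLSI can be sketched as below. This is a minimal illustration of standard PLSI, not a reproduction of Algorithm 1: the function name `plsi`, the documents-by-terms orientation of `counts`, the random initialisation, and the choice to return P(d|z) as the topic-document matrix V are all assumptions made for the example. It uses only the Python standard library so that it runs without extra packages.

```python
import random


def plsi(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSI by EM on a documents-by-terms count matrix (list of rows).

    Returns (p_z, p_d_z, p_w_z); p_d_z[z][d] is treated here as the
    topic-document matrix V (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])

    def normalise(row):
        total = sum(row) or 1.0  # guard against an all-zero row
        return [x / total for x in row]

    # Strictly positive random starting values (hypothetical initialisation).
    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [normalise([rng.random() + 0.1 for _ in range(n_docs)])
             for _ in range(n_topics)]
    p_w_z = [normalise([rng.random() + 0.1 for _ in range(n_terms)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z = [0.0] * n_topics
        new_d_z = [[0.0] * n_docs for _ in range(n_topics)]
        new_w_z = [[0.0] * n_terms for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_terms):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(z | d, w) for this (document, term) pair.
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                         for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    weight = counts[d][w] * joint[z] / total
                    # Accumulate expected counts for the M-step.
                    new_z[z] += weight
                    new_d_z[z][d] += weight
                    new_w_z[z][w] += weight
        # M-step: renormalise the expected counts into probabilities.
        p_z = normalise(new_z)
        p_d_z = [normalise(row) for row in new_d_z]
        p_w_z = [normalise(row) for row in new_w_z]
    return p_z, p_d_z, p_w_z
```

Calling `plsi(counts, n_topics=2)` on a small count matrix returns the three distributions; the columns of the second result can then serve as document vectors for the dissimilarity step.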

VPLSI finds the topic-document matrix from the term-document matrix and other parameters estimated with the EM algorithm. It does not take term-correlation features into account while estimating topic parameters. Thus, another visual topic model, VNMF, is proposed; it finds the topic-document matrix using the term-correlation matrix ‘S’, the derived conceptual or hidden topics, and the term-topic matrix ‘U’. The steps of VNMF are shown in Algorithm 2.

[Figure b: Algorithm 2 (VNMF)]
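To make the factorization into a term-topic matrix U and a topic-document matrix V concrete, the sketch below uses the standard Lee-Seung multiplicative updates for plain NMF, X ≈ UV. It is an illustration only: it does not include the term-correlation matrix ‘S’ used by VNMF, and the names `nmf`, `matmul`, `transpose`, and `frobenius_error`, the random initialisation, and the small constant `eps` are assumptions of this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(x, n_topics, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative terms-by-documents matrix x as u @ v."""
    rng = random.Random(seed)
    n_terms, n_docs = len(x), len(x[0])
    u = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_terms)]
    v = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # V <- V * (U^T X) / (U^T U V), element-wise.
        ut = transpose(u)
        num, den = matmul(ut, x), matmul(matmul(ut, u), v)
        v = [[v[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_docs)]
             for i in range(n_topics)]
        # U <- U * (X V^T) / (U V V^T), element-wise.
        vt = transpose(v)
        num, den = matmul(x, vt), matmul(u, matmul(v, vt))
        u = [[u[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_topics)]
             for i in range(n_terms)]
    return u, v


def frobenius_error(x, u, v):
    """Frobenius norm of the residual x - u @ v."""
    approx = matmul(u, v)
    return sum((a - b) ** 2
               for row_x, row_a in zip(x, approx)
               for a, b in zip(row_x, row_a)) ** 0.5
```

The multiplicative form keeps every entry of U and V non-negative without an explicit projection step, which is why it is a common first choice for small examples.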

The term-document matrix X holds the primary information about the relationship between tweet documents and terms. The term-correlation matrix is derived from X to find the optimal term-topic matrix ‘U’ by iterating to convergence in step 2, and the reduced topic-document matrix ‘V’ is found in step 3 of VNMF. Each document is represented as a document vector. Distances between document vectors are computed using either the Euclidean or the cosine metric and express the dissimilarities or similarities between documents. These dissimilarity values are stored in the dissimilarity matrix (DM), and the values are reordered using [20] into another matrix, the reordered dissimilarity matrix (RDM). The image of the RDM shows square-shaped dark blocks along the diagonal, and each block denotes a separate cluster. The number of conceptual or hidden topics is assessed by counting these dark blocks. Crisp partitions are derived for these topics from the RDM image, and the predicted cluster labels of the documents are derived from them to determine the complete clustering results.
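The DM and RDM computation can be sketched as follows. The text cites the reordering method only as [20]; this example assumes a VAT-style (Visual Assessment of Tendency) ordering, in which items are appended one at a time by their minimum dissimilarity to the items already ordered. The function names and the tie-breaking by index are choices made for this illustration.

```python
import math


def cosine_dissimilarity(a, b):
    """1 - cosine similarity of two document vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def dissimilarity_matrix(vectors, metric=cosine_dissimilarity):
    """Pairwise dissimilarity matrix (DM) with a zero diagonal."""
    n = len(vectors)
    return [[0.0 if i == j else metric(vectors[i], vectors[j])
             for j in range(n)] for i in range(n)]


def vat_order(dm):
    """Assumed VAT-style ordering of the items of a dissimilarity matrix."""
    n = len(dm)
    # Start from one endpoint of the largest dissimilarity.
    start, _ = max(((i, j) for i in range(n) for j in range(n)),
                   key=lambda pair: dm[pair[0]][pair[1]])
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        # Append the unvisited item closest to any already ordered item.
        nxt = min(sorted(remaining),
                  key=lambda j: min(dm[k][j] for k in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(dm, order):
    """Reordered dissimilarity matrix (RDM) for a given item ordering."""
    return [[dm[i][j] for j in order] for i in order]
```

Rendering the RDM as a grayscale image, with small values dark, would then show the diagonal blocks described above; the plotting step is left out to keep the sketch dependency-free.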

LDA uses Dirichlet coefficients to derive the topic models for document clustering. It uses the EM procedure [37] to update the probabilities of documents with respect to topics and stores these values in V for the subsequent computation of the dissimilarity features of tweet documents. The topic-document matrix ‘V’ is derived from the Dirichlet coefficients for the topics of tweet documents using the EM algorithm in Step 3 of Algorithm 3.

[Figure c: Algorithm 3 (VLDA)]

The RDM image shows the topic clusters as square-shaped dark blocks. Crisp partitions are derived from the dark blocks to obtain the topic labels of tweet documents in VLDA clustering.
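One way to turn the dark blocks into crisp labels is sketched below. The text does not spell out the partitioning rule, so this example assumes a simple one: given the ordered RDM and the number of blocks counted in the image, a new block starts at the positions whose distance to all earlier items in the ordering is largest. The function name `crisp_partition` and its arguments are hypothetical.

```python
def crisp_partition(rdm, order, n_clusters):
    """Assign a cluster label to each original item from an ordered RDM.

    rdm        -- reordered dissimilarity matrix (list of rows)
    order      -- order[pos] is the original index of the item at position pos
    n_clusters -- number of dark blocks counted in the RDM image
    """
    n = len(rdm)
    # Link cost of each position: distance to the closest earlier item.
    links = [min(rdm[s][r] for s in range(r)) for r in range(1, n)]
    # The positions with the largest link costs open new blocks (assumed rule).
    by_cost = sorted(range(1, n), key=lambda r: links[r - 1], reverse=True)
    cuts = set(by_cost[:n_clusters - 1])
    labels = [0] * n
    current = 0
    for pos in range(n):
        if pos in cuts:
            current += 1
        labels[order[pos]] = current
    return labels
```

The returned list is indexed by the original document index, so it can be compared directly with ground-truth labels when evaluating the clustering.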

The topic models VNMF and VLDA consider the Twitter interactions ‘mention’ and ‘reply’ in deriving the topic-document matrix, whereas VintJNMF considers three interactions, namely ‘mention’, ‘reply’, and ‘retweet’. VintJNMF strongly associates the content of a tweet with many relationships; however, a retweet regenerates similar tweet content to express the same relationship. The steps of VintJNMF are shown in Algorithm 4.

[Figure d: Algorithm 4 (VintJNMF)]

VintJNMF follows the NMF [38] approach to obtain the topic-document matrix W from the tweet-relationship matrix A. The matrix A is the sum of three values: relationship interactions among people, actions based on tweets and retweets, and similarity between tweets. The matrix of conceptual hidden topics and documents is derived by applying NMF to the tweet-relationship matrix A. NMF is then applied again to the term-document matrix V and the previously iterated W to find the final updated matrix W. Steps 4, 5, and 6 of the visual approach are applied for the visual assessment of valid clusters, and the optimal crisp partitions describe the complete clustering results.
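The construction of the tweet-relationship matrix A as a sum of three components can be sketched as below. The text does not give weights or normalisation for the three terms, so this example assumes an unweighted element-wise sum of equally sized, non-negative tweet-by-tweet matrices; the function and argument names are hypothetical.

```python
def relationship_matrix(people, actions, similarity):
    """Element-wise sum of three non-negative tweet-by-tweet matrices.

    people     -- relationship interactions among the tweets' authors
    actions    -- tweet and retweet actions
    similarity -- content similarity between tweets
    """
    parts = (people, actions, similarity)
    n = len(people)
    for part in parts:
        # All three inputs must be n-by-n and non-negative for NMF to apply.
        if len(part) != n or any(len(row) != n for row in part):
            raise ValueError("all inputs must be square and equally sized")
        if any(value < 0 for row in part for value in row):
            raise ValueError("all entries must be non-negative")
    return [[people[i][j] + actions[i][j] + similarity[i][j]
             for j in range(n)] for i in range(n)]
```

The resulting matrix stays non-negative, which is the precondition for factorizing it with NMF in the next step.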

Author information

Corresponding author

Correspondence to K. Rajendra Prasad.

Cite this article

Rajendra Prasad, K., Mohammed, M. & Noorullah, R.M. Correction to: Visual topic models for healthcare data clustering. Evol. Intel. (2019). https://doi.org/10.1007/s12065-019-00323-5
