Advances in Data Analysis and Classification, Volume 12, Issue 3, pp 537–558

Cluster-based sparse topical coding for topic mining and document clustering

  • Parvin Ahmadi
  • Iman Gholampour
  • Mahmoud Tabandeh
Regular Article

Abstract

In this paper, we introduce Cluster-based Sparse Topical Coding, a document clustering method based on Sparse Topical Coding. Topic modeling can improve text document clustering by representing documents as bag-of-words vectors and projecting them into a topic space. The latent semantic descriptions derived by the topic model can then be used as features in a clustering process. In our proposed method, document clustering and topic modeling are integrated in a unified framework to maximize performance. This framework includes Sparse Topical Coding, which is responsible for topic mining, and K-means, which discovers the latent clusters in the document collection. Experimental results on widely used datasets show that our proposed method significantly outperforms both traditional clustering methods and other topic-model-based clustering methods. Our method improves clustering accuracy by 4% to 39% and normalized mutual information by 2% to more than 44%.
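
The two evaluation measures named above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names are hypothetical, clustering accuracy is taken under the best one-to-one mapping of cluster ids to class ids (found here by brute force, which is only practical for a handful of clusters), and normalized mutual information is normalized by the geometric mean of the two entropies, which is one common convention and an assumption here since the abstract does not state which normalization is used.

```python
from collections import Counter
from itertools import permutations
from math import log


def clustering_accuracy(true_labels, cluster_labels):
    """Fraction of documents labeled correctly under the best one-to-one
    mapping from cluster ids to class ids (brute force, small k only)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    pair_counts = Counter(zip(cluster_labels, true_labels))
    best = 0
    # Try every injective assignment of clusters to classes.
    for assigned in permutations(classes, len(clusters)):
        hits = sum(pair_counts[(c, t)] for c, t in zip(clusters, assigned))
        best = max(best, hits)
    return best / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(true_labels, cluster_labels):
    """Mutual information divided by sqrt(H(true) * H(cluster)).
    The choice of normalization is an assumption of this sketch."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    n_true = Counter(true_labels)
    n_clus = Counter(cluster_labels)
    mi = sum(
        (c / n) * log(c * n / (n_true[t] * n_clus[k]))
        for (t, k), c in joint.items()
    )
    h_true, h_clus = entropy(true_labels), entropy(cluster_labels)
    if h_true == 0.0 or h_clus == 0.0:
        # Degenerate case: at least one labeling has a single group.
        return 1.0 if h_true == h_clus else 0.0
    return mi / ((h_true * h_clus) ** 0.5)
```

Both functions take two equal-length label sequences, for example the ground-truth categories and the cluster ids assigned by K-means. Both measures are invariant to how the cluster ids are numbered, which is why the accuracy computation searches over label mappings rather than comparing ids directly.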

Keywords

Document clustering · Topic model · Sparse topical coding · K-means

Mathematics Subject Classification

68T50 


Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  • Parvin Ahmadi (1)
  • Iman Gholampour (2)
  • Mahmoud Tabandeh (1)
  1. Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran
  2. Electronics Research Institute, Sharif University of Technology, Tehran, Iran
