Abstract
Clustering, or cluster analysis, is an important and common task in data mining and analysis, with applications in many fields. However, most existing clustering methods are sensitive when only limited amounts of data are available per cluster, as is common in real-world applications. Here we propose a new method, called denoising cluster analysis, to improve clustering accuracy. We first construct base clusterings from artificially corrupted data samples and then learn their ensemble based on mutual information. We develop multiplicative updates for learning the aggregated cluster-assignment probabilities. Experiments on real-world data sets show that our method consistently improves cluster purity over several other clustering approaches.
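As a rough, hypothetical illustration of the pipeline the abstract outlines, the Python sketch below builds base clusterings on artificially corrupted copies of the data and aggregates them into soft assignment probabilities. The Gaussian noise model, the choice of k-means as the base clusterer, and the Hungarian-matching average are illustrative assumptions only; the paper's actual ensemble step learns the aggregated probabilities by optimizing a mutual-information objective with multiplicative updates.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def base_clusterings(X, k=3, n_base=10, noise_std=0.1, seed=0):
    """Cluster several artificially corrupted copies of X (the 'base
    clusterings' step sketched in the abstract)."""
    rng = np.random.default_rng(seed)
    labelings = []
    for _ in range(n_base):
        X_noisy = X + rng.normal(scale=noise_std, size=X.shape)  # corrupt samples
        labelings.append(KMeans(n_clusters=k, n_init=10).fit_predict(X_noisy))
    return labelings

def aggregate(labelings, k):
    """Average the base clusterings into soft assignment probabilities,
    aligning label permutations to the first labeling via the Hungarian
    algorithm. (The paper instead learns this ensemble by maximizing a
    mutual-information objective with multiplicative updates.)"""
    n, ref = len(labelings[0]), labelings[0]
    P = np.zeros((n, k))
    for labels in labelings:
        C = np.zeros((k, k))
        np.add.at(C, (labels, ref), 1)       # label co-occurrence counts
        _, perm = linear_sum_assignment(-C)  # best label matching
        P[np.arange(n), perm[labels]] += 1.0
    return P / P.sum(axis=1, keepdims=True)

# Toy usage: three well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(loc=c, size=(30, 2)) for c in (-4.0, 0.0, 4.0)])
P = aggregate(base_clusterings(X), k=3)
print(P[:3])  # aggregated cluster-assignment probabilities
```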
References
Arora, R., Gupta, M., Kapila, A., Fazel, M.: Clustering by left-stochastic matrix factorization. In: ICML (2011)
Bishop, C.: Training with noise is equivalent to Tikhonov regularization. Neural Comput. 7(1), 108–116 (1995)
Dikmen, O., Yang, Z., Oja, E.: Learning the information divergence. IEEE Trans. Pattern Anal. Mach. Intell. 37(7), 1442–1454 (2015)
Herbrich, R., Graepel, T.: Invariant pattern recognition by semidefinite programming machines. In: NIPS (2004)
Hofmann, T.: Probabilistic latent semantic indexing. In: SIGIR, pp. 50–57 (1999)
Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)
Romano, S., Bailey, J., Nguyen, V., Verspoor, K.: Standardized mutual information for clustering comparisons: one step further in adjustment for chance. In: ICML (2014)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Strehl, A., Ghosh, J.: Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.: Extracting and composing robust features with denoising autoencoders. In: ICML (2008)
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11, 2837–2854 (2010)
Yang, Z., Hao, T., Dikmen, O., Chen, X., Oja, E.: Clustering by nonnegative matrix factorization using graph random walk. In: NIPS (2012)
Yang, Z., Laaksonen, J.: Multiplicative updates for non-negative projections. Neurocomputing 71(1–3), 363–373 (2007)
Yang, Z., Oja, E.: Linear and nonlinear projective nonnegative matrix factorization. IEEE Trans. Neural Netw. 21(5), 734–749 (2010)
Yang, Z., Oja, E.: Unified development of multiplicative algorithms for linear and quadratic nonnegative matrix factorization. IEEE Trans. Neural Netw. 22(12), 1878–1891 (2011)
Yang, Z., Oja, E.: Clustering by low-rank doubly stochastic matrix decomposition. In: ICML (2012)
Yang, Z., Oja, E.: Quadratic nonnegative matrix factorization. Pattern Recogn. 45(4), 1500–1510 (2012)
Yang, Z., Peltonen, J., Kaski, S.: Optimization equivalence of divergences improves neighbor embedding. In: ICML (2014)
Yang, Z., Zhang, H., Yuan, Z., Oja, E.: Kullback-Leibler divergence for nonnegative matrix factorization. In: Honkela, T. (ed.) ICANN 2011, Part I. LNCS, vol. 6791, pp. 250–257. Springer, Heidelberg (2011)
Zhu, Z., Yang, Z., Oja, E.: Multiplicative updates for learning with stochastic matrices. In: Kämäräinen, J.-K., Koskela, M. (eds.) SCIA 2013. LNCS, vol. 7944, pp. 143–152. Springer, Heidelberg (2013)
Appendix: Proof of Theorem 1
Proof
We write \(W\) for the current estimate, \(\widetilde{W}\) for the free variable, and \(W^\text {new}\) for the new estimate. The objective function \(\widetilde{\mathcal {J}}\) fulfills the conditions of the theorem in [16]; we can therefore construct a majorization function \(G(\widetilde{W},W)\) such that \(G(\widetilde{W},W)\ge \widetilde{\mathcal {J}}(\widetilde{W},\lambda )\) and \(G(W,W)=\widetilde{\mathcal {J}}(W,\lambda )\). Let \(W^\text {new}\) be the minimizer of \(G(\widetilde{W},W)\), obtained by zeroing \(\partial G/\partial \widetilde{W}\), which yields Eq. 12. Therefore \(\widetilde{\mathcal {J}}(W^\text {new},\lambda ) \le G(W^\text {new},W) \le G(W,W) =\widetilde{\mathcal {J}}(W,\lambda )\).
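The chain of inequalities above is the standard majorization-minimization guarantee: minimizing a majorizer that touches the objective at the current iterate can never increase the objective. The Python sketch below shows the same mechanism numerically on a generic multiplicative update for a toy nonnegative quadratic program; the objective and update are illustrative stand-ins, not the paper's \(\widetilde{\mathcal {J}}\) or Eq. 12.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy objective J(w) = 0.5 * w^T A w - b^T w over w >= 0, with A positive
# semidefinite and elementwise nonnegative, and b nonnegative.
M = rng.uniform(size=(5, 5))
A = M @ M.T                  # nonnegative entries, PSD
b = rng.uniform(size=5)

def J(w):
    return 0.5 * w @ A @ w - b @ w

# Majorizer touching J at the current iterate w:
#   G(wt, w) = 0.5 * sum_i (A w)_i * wt_i**2 / w_i - b @ wt >= J(wt),
# with equality at wt = w. Zeroing dG/dwt gives the multiplicative update.
w = np.ones(5)
for _ in range(50):
    w_new = w * b / (A @ w)           # minimizer of the majorizer
    assert J(w_new) <= J(w) + 1e-12   # monotone decrease, as in the proof
    w = w_new
print("J(w) after MM iterations:", J(w))
```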
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, R., Yang, Z., Corander, J. (2015). Denoising Cluster Analysis. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds.) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol. 9491. Springer, Cham. https://doi.org/10.1007/978-3-319-26555-1_49
DOI: https://doi.org/10.1007/978-3-319-26555-1_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26554-4
Online ISBN: 978-3-319-26555-1
eBook Packages: Computer Science, Computer Science (R0)