A Clustering Framework Based on Adaptive Space Mapping and Rescaling

Zeng, Yiling; Xu, Hongbo; Guo, Jiafeng; Wang, Yu; Bai, Shuo

doi:10.1007/978-3-642-04769-5_32

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5839)

Abstract

Traditional clustering algorithms often suffer from the model misfit problem when the distribution of real data does not fit the model assumptions. To address this problem, we propose a novel clustering framework based on adaptive space mapping and rescaling, referred to as the M-R framework. The basic idea of our approach is to adjust the data representation so that the data distribution better fits the model assumptions. Specifically, documents are first mapped into a low-dimensional space with respect to the cluster centers so that the distribution statistics of each cluster can be analyzed on the corresponding dimension. With these statistics in hand, a rescaling operation is then applied to regularize the data distribution based on the model assumptions. These two steps are conducted iteratively along with the clustering algorithm to continually improve clustering performance. In our work, we apply the M-R framework to the most widely used clustering algorithm, k-means, as an example. Experiments on well-known datasets show that our M-R framework achieves performance comparable to state-of-the-art methods.
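The abstract describes the loop only at a high level, so the following is a minimal sketch of one way the map, rescale, and re-cluster cycle could look around k-means. It is not the authors' implementation. The specific choices here are assumptions: cosine similarity to each cluster center as the mapping, division by the per-cluster standard deviation as the rescaling, and all function names (`map_to_center_space`, `rescale`, `mr_kmeans`) are hypothetical.

```python
import numpy as np


def normalize_rows(X):
    """Scale each row to unit L2 norm (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def map_to_center_space(X, centers):
    """Mapping step (assumed form): cosine similarity of each document to
    each cluster center. Returns an (n_docs, k) matrix whose column j is
    the dimension associated with cluster j."""
    return normalize_rows(X) @ normalize_rows(centers).T


def rescale(Z, labels, eps=1e-8):
    """Rescaling step (assumed form): divide dimension j by the standard
    deviation of cluster j's own members along that dimension."""
    Z = np.array(Z, dtype=float)  # work on a float copy
    for j in range(Z.shape[1]):
        members = Z[labels == j, j]
        if members.size > 1:
            spread = members.std()
            if spread > eps:  # skip degenerate clusters with no spread
                Z[:, j] /= spread
    return Z


def centroids(X, labels, k, fallback):
    """Mean of each cluster's members; empty clusters keep the fallback row."""
    out = fallback.copy()
    for j in range(k):
        if np.any(labels == j):
            out[j] = X[labels == j].mean(axis=0)
    return out


def assign(X, centers):
    """Standard k-means assignment: nearest center by squared Euclidean distance."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def mr_kmeans(X, k, n_iter=10, seed=0):
    """Alternate mapping, rescaling, and k-means reassignment."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = assign(X, centers)
    for _ in range(n_iter):
        centers = centroids(X, labels, k, centers)
        Z = rescale(map_to_center_space(X, centers), labels)
        # An infinite fallback means an empty cluster attracts no points.
        z_centers = centroids(Z, labels, k, np.full((k, k), np.inf))
        new_labels = assign(Z, z_centers)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

Calling `mr_kmeans(X, k=2)` on a small document-term matrix `X` returns one cluster label per row. Whether this particular mapping and rescaling reproduces the behavior reported in the paper is not something the abstract allows one to check.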

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, China
Yiling Zeng, Hongbo Xu, Jiafeng Guo, Yu Wang & Shuo Bai
Shanghai Stock Exchange, Shanghai, 200120, China
Shuo Bai

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, 790-784, Pohang, Korea
Gary Geunbae Lee
School of Computing, The Robert Gordon University, St Andrew Street, AB25 1HG, Aberdeen, UK
Dawei Song
Microsoft Research Asia, 5F Beijing Sigma Center, 49 Zhichun Road, Haidian District, 100190, Beijing, P.R. China
Chin-Yew Lin
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, 101-8430, Tokyo, Japan
Akiko Aizawa
School of Literature, Shirayuri College, 1-25 Midorigaoka, Chofu-shi, 182-8525, Tokyo, Japan
Kazuko Kuriyama
Graduate School of Information Science and Technology, Hokkaido University, North 14 West 9, Kita-ku, Sapporo-shi, 060-0814, Hokkaido, Japan
Masaharu Yoshioka
Microsoft Research Asia, 5F Beijing Sigma Center, 49 Zhichun Road, Haidian District, 100190, Beijing, P.R. China
Tetsuya Sakai

About this paper

Cite this paper

Zeng, Y., Xu, H., Guo, J., Wang, Y., Bai, S. (2009). A Clustering Framework Based on Adaptive Space Mapping and Rescaling. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_32

DOI: https://doi.org/10.1007/978-3-642-04769-5_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04768-8
Online ISBN: 978-3-642-04769-5
eBook Packages: Computer Science, Computer Science (R0)