Abstract
Unsupervised classification identifies similar entities in a dataset and is used extensively in application domains such as spam filtering [5], medical diagnosis [15], and demographic research [13]. Unsupervised classification with K-Means generally clusters data based on (1) distance-based attributes of the dataset [4, 16, 17, 23] or (2) combinatorial properties of a weighted-graph representation of the dataset [8].
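As background for the distance-based variant (1), the classical Lloyd iteration [23] alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points. The sketch below is a minimal illustration of that baseline, not the graph-embedding method this chapter proposes; the function name and toy data are our own.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd-style K-Means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        # (squared Euclidean distance, the usual distance-based attribute).
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated 1-D groups: the centroids recover the group means.
pts = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
centroids, clusters = kmeans(pts, 2)
```

The chapter's contribution replaces the raw distance computation above with attributes derived from a graph embedding of the dataset; the iteration structure stays the same.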
References
Arthur D, Vassilvitskii S (2006) How slow is the k-means method? In: SCG ’06: Proceedings of the twenty-second annual symposium on Computational geometry. ACM, New York, pp 144–153, http://doi.acm.org/10.1145/1137856.1137880
Asuncion A, Newman D (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html
Baker LD, McCallum AK (1998) Distributional clustering of words for text classification. In: SIGIR ’98: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, pp 96–103, http://doi.acm.org/10.1145/290941.290970
Ball G, Hall D (1965) Isodata, a novel method of data analysis and pattern classification. Tech report NTIS AD 699616, Stanford Research Institute, Stanford, CA
Bíró I, Szabó J, Benczúr A (2008) Latent Dirichlet allocation in web spam filtering. In: AIRWeb ’08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web. ACM, New York, pp 29–32, http://doi.acm.org/10.1145/1451983.1451991
Chatterjee A, Bhowmick S, Raghavan P (2010) Feature subspace transformations for enhancing k-means clustering. In: Proceedings of the 19th ACM international conference on Information and knowledge management. ACM, New York
Chatterjee A, Raghavan P, Bhowmick S (2012) Similarity graph neighborhoods for enhanced supervised classification. In: Procedia computer science, Elsevier, pp 577–586, 10.1016/j.procs.2012.04.062
Dhillon I, Guan Y, Kulis B (2007) Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Trans Pattern Anal Mach Intell 29(11):1944–1957, 10.1109/TPAMI.2007.1115
Ding C, He X (2004) K-means clustering via principal component analysis. In: Proceedings of the twenty-first international conference on Machine learning. ACM, New York, pp 225–232
Fruchterman TMJ, Reingold EM (1991) Graph drawing by force-directed placement. Software Pract Exp 21(11):1129–1164
Hartigan JA, Wong MA (1979) A k-means clustering algorithm. Appl Stat 28(1):100–108
The MathWorks Inc (2007) MATLAB and Simulink for technical computing. http://www.mathworks.com
Kim K, Ahn H (2008) A recommender system using GA k-means clustering in an online shopping market. Expert Syst Appl 34(2):1200–1209, 10.1016/j.eswa.2006.12.025
Lang K (1995) Newsweeder: Learning to filter netnews. In: Proceedings of the twelfth international conference on machine learning, pp 331–339
Li X (2008) A volume segmentation algorithm for medical image based on k-means clustering. In: IIH-MSP ’08: Proceedings of the 2008 international conference on intelligent information hiding and multimedia signal processing. IEEE Computer Society, Washington, pp 881–884, http://dx.doi.org/10.1109/IIH-MSP.2008.161
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inform Theory 28:129–137
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. University of California Press, California
Michie D, Spiegelhalter DJ, Taylor C (1994) Machine learning, neural and statistical classification, 1st edn. Prentice Hall
Neal R (1998) Assessing relevance determination methods using delve. In: Neural networks and machine learning. Springer, Berlin, pp 97–129, http://www.cs.toronto.edu/~delve/
Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Dietterich T, Becker S, Ghahramani Z (eds) Advances in neural information processing systems, pp 849–856
Salton G (1971) Smart data set. ftp://ftp.cs.cornell.edu/pub/smart
Savaresi SM, Boley DL, Bittanti S, Gazzaniga G (2002) Cluster selection in divisive clustering algorithms. In: SIAM international conference on data mining, Arlington, VA
Steinhaus H (1956) Sur la division des corps matériels en parties. Bull Acad Polon Sci IV (C1. III):801–804
Acknowledgments
This research was supported in part by the National Science Foundation through grants CNS 0720749, OCI 0821527, and CCF 0830679. Additionally, Dr. Sanjukta Bhowmick would like to acknowledge the support of the College of Information Science and Technology at the University of Nebraska at Omaha.
Copyright information
© 2013 Springer Science+Business Media New York
Cite this chapter
Chatterjee, A., Bhowmick, S., Raghavan, P. (2013). Improving Classifications Through Graph Embeddings. In: Fu, Y., Ma, Y. (eds) Graph Embedding for Pattern Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4457-2_5
Print ISBN: 978-1-4614-4456-5
Online ISBN: 978-1-4614-4457-2