Abstract
Median clustering extends popular neural data analysis methods, such as the self-organizing map and neural gas, to general data structures that are described only by a dissimilarity matrix. The result is a set of flexible and robust methods for global data inspection that are particularly well suited to the variety of data types that occur in biomedical domains. In this chapter, we give an overview of median clustering, its properties, and its extensions, with a particular focus on efficient implementations adapted to large-scale data analysis.
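To make the idea of clustering from a dissimilarity matrix alone more concrete, the following is a minimal sketch of a batch median neural gas update, in which every prototype is restricted to be one of the data points. It is an illustration under stated assumptions, not the chapter's implementation: the function name `median_neural_gas`, its parameters, the exponential annealing schedule, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np

def median_neural_gas(D, n_prototypes, n_epochs=20, seed=0):
    """Illustrative batch median neural gas on a dissimilarity matrix.

    D            : (n, n) symmetric matrix of pairwise dissimilarities.
    n_prototypes : number of prototypes; each one is an index into the data.
    Returns an array with the data indices chosen as prototypes.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    # Assumption: prototypes start at distinct, randomly chosen data points.
    protos = rng.choice(n, size=n_prototypes, replace=False)
    lam0 = n_prototypes / 2.0  # assumed initial neighbourhood range
    for epoch in range(n_epochs):
        # Assumed schedule: anneal the neighbourhood range from lam0 to 0.01.
        lam = lam0 * (0.01 / lam0) ** (epoch / max(n_epochs - 1, 1))
        # dist[i, j] = dissimilarity between prototype i and data point j.
        dist = D[protos, :]
        # ranks[i, j] = rank of prototype i for data point j (0 = closest).
        ranks = np.argsort(np.argsort(dist, axis=0), axis=0)
        h = np.exp(-ranks / lam)
        # cost[i, l] = sum_j h[i, j] * D[j, l]: cost of moving prototype i
        # to data point l; the new prototype is the minimising data point.
        cost = h @ D
        protos = np.argmin(cost, axis=1)
    return protos

# Toy data: two well-separated groups on a line, seen only through D.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D = np.abs(x[:, None] - x[None, :])
protos = median_neural_gas(D, n_prototypes=2)
print(sorted(protos.tolist()))
```

The sketch never touches the coordinates `x` inside the update; only `D` is used, which is what allows the same loop to run on edit distances or alignment scores. Several prototypes can end up on the same data point in this simple form, and the full `h @ D` product is quadratic in the number of points, so it is only meant for small examples.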
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hammer, B., Hasenfuss, A., Rossi, F. (2009). Median Topographic Maps for Biomedical Data Sets. In: Biehl, M., Hammer, B., Verleysen, M., Villmann, T. (eds) Similarity-Based Clustering. Lecture Notes in Computer Science, vol 5400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01805-3_6
DOI: https://doi.org/10.1007/978-3-642-01805-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01804-6
Online ISBN: 978-3-642-01805-3
eBook Packages: Computer Science, Computer Science (R0)