
1 Introduction

Unsupervised data analysis using clustering algorithms is a useful tool. The aim of clustering analysis is to discover the hidden structure of a dataset according to a similarity criterion, assigning the data points to a number of distinct clusters such that points in the same cluster are similar to each other while points from different clusters are dissimilar [1]. Clustering has been applied in a variety of scientific fields such as web search, social network analysis, image retrieval, medical imaging, gene expression analysis, recommendation systems and market analysis.

Kernel clustering methods can handle data sets that are not linearly separable in the input space [2] and therefore usually perform better than Euclidean distance based clustering algorithms [3]. Owing to its simplicity and efficiency, kernel k-means has become an active research topic. A kernel function maps the input data into a high-dimensional feature space, in which clusters that are not linearly separable in the input space become separable. However, a single kernel is sometimes insufficient to represent the data. Recently, multiple kernel clustering has therefore gained increasing attention in machine learning. Huang et al. propose a multiple kernel fuzzy c-means [4]; by incorporating multiple kernels and automatically adjusting the kernel weights, it becomes robust to ineffective kernels and irrelevant features. Zhou et al. use the maximum entropy method to regularize the kernel weights and identify the important kernels [5]. Gao applies multiple kernel fuzzy c-means to optimize clustering and presents a composite kernel built from a combination of Gaussian kernel functions assigned different weights [6]. Lu et al. apply a multiple kernel k-means clustering algorithm to SAR image change detection [7]; they fuse various features through a weighted summation kernel by automatically and optimally computing the kernel weights, which leads to a considerable computational burden. Zhang et al. propose a locally multiple kernel clustering which assigns to each cluster a weight vector for feature selection and combines it with a Gaussian kernel to form a unique kernel for the corresponding cluster [8]; they search for the scale parameter of the Gaussian kernel by running their clustering algorithm repeatedly over a range of values and selecting the best one. Tzortzis et al. overcome the kernel selection problem of maximum margin clustering by employing multiple kernel learning to jointly learn the kernel and a partitioning of the instances [9]. Yu et al. propose an optimized kernel k-means clustering which optimizes the cluster membership and kernel coefficients based on the same Rayleigh quotient objective [10]. Lu et al. improve a kernel evaluation measure based on centered kernel alignment, but their algorithm must be given the initial kernel fusion coefficients [11]. Although the above methods extend different clustering algorithms, they all employ the alternating optimization technique to solve their extended problems: cluster labels and kernel combination coefficients are optimized alternately until convergence.

Our algorithm is proposed from the perspective of the similarity measure: a local scale parameter is calculated for each data point, which reflects the local distribution of the dataset. In addition, another parameter, named the density factor, is introduced into the Gaussian kernel function; it describes the global structure of the data set and keeps kernel k-means from falling into a local optimum. Based on this improved similarity measure, our algorithm has several advantages. First, as a kernel method, it is well suited to datasets with multiple scales. Second, it fuses the local and global structures of datasets automatically and optimally. Furthermore, our algorithm does not need a large number of iterations to compute kernel weights until convergence.

The remainder of this paper is organized as follows: in Sect. 2 we introduce the related work. In Sect. 3 we give a detailed description of our algorithm. Section 4 presents the experimental results and evaluation of our algorithm. Finally, we conclude the paper in Sect. 5.

2 Related Work

2.1 Kernel K-Means

Girolami first proposed the kernel k-means clustering method. It maps the data points from the input space to a higher dimensional feature space through a nonlinear transformation \( \phi ( \cdot ) \) and then minimizes the clustering error in that feature space [12].

Let \( {\text{D}} = \{ x_{1} ,x_{2} , \ldots ,x_{n} \} \) be the data set of size n, k be the number of clusters required. The final partition of the entire data set is \( \Pi _{D} = \{ C_{1} ,C_{2} , \ldots ,C_{k} \} \). The objective function is to minimize the criterion function:

$$ {\text{J}} = \sum\nolimits_{j = 1}^{k} {\sum\nolimits_{{x_{i} \in C_{j} }} {\parallel \phi \left( {x_{i} } \right) - m_{j} \parallel^{2} } } $$
(1)

where \( m_{j} \) is the mean of cluster \( C_{j} \), that is,

$$ m_{j} = \sum\nolimits_{{x_{i} \in C_{j} }} {\frac{{\phi \left( {x_{i} } \right)}}{{|C_{j} |}}} $$
(2)

in the induced space.

$$ \begin{array}{*{20}c} {\parallel \phi \left( {x_{i} } \right) - m_{j} \parallel^{2} = \parallel \phi \left( {x_{i} } \right) - \sum\nolimits_{{x_{l} \in C_{j} }} {\frac{{\phi \left( {x_{l} } \right)}}{{|C_{j} |}}} \parallel^{2} } \\ { = \phi \left( {x_{i} } \right) \cdot \phi \left( {x_{i} } \right) - \frac{2}{{|C_{j} |}}\sum\nolimits_{{x_{l} \in C_{j} }} {\phi \left( {x_{l} } \right) \cdot \phi \left( {x_{i} } \right)} + \frac{1}{{|C_{j} |^{2} }}\sum\nolimits_{{x_{l} \in C_{j} }} {\sum\nolimits_{{x_{s} \in C_{j} }} {\phi \left( {x_{l} } \right) \cdot \phi \left( {x_{s} } \right)} } } \\ { = \kappa \left( {x_{i} ,x_{i} } \right) - \frac{2}{{|C_{j} |}}\sum\nolimits_{{x_{l} \in C_{j} }} {\kappa \left( {x_{i} ,x_{l} } \right)} + \frac{1}{{|C_{j} |^{2} }}\sum\nolimits_{{x_{l} \in C_{j} }} {\sum\nolimits_{{x_{s} \in C_{j} }} {\kappa \left( {x_{l} ,x_{s} } \right)} } } \\ \end{array} $$
(3)

Further, \( \parallel \phi \left( {x_{i} } \right) - m_{j} \parallel^{2} \) can be calculated without knowing the transformation \( \phi ( \cdot ) \) explicitly, as shown in formula (3).

Thus, only inner products are used in the computation of the Euclidean distance between a point and a centroid. These inner products can be stored in a kernel matrix \( \kappa \), where \( \kappa_{ij} = \phi \left( {x_{i} } \right) \cdot \phi \left( {x_{j} } \right) \); a kernel function is commonly used to map pairs of original points directly to such inner products. Given a data set, kernel k-means clustering proceeds through the steps sketched below.
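The original algorithm box is not reproduced here; as a rough illustration only, the following minimal Python sketch (hypothetical function name, assuming a precomputed kernel matrix `K`) implements the standard kernel k-means iteration using the distance expansion of formula (3).

```python
import numpy as np

def kernel_kmeans(K, k, max_iter=100, seed=0):
    """Standard kernel k-means on a precomputed n x n kernel matrix K."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)               # random initial assignment
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)
        for j in range(k):
            idx = np.where(labels == j)[0]
            if idx.size == 0:                         # skip empty clusters
                continue
            # ||phi(x_i) - m_j||^2 = K_ii - 2/|C_j| sum_l K_il + 1/|C_j|^2 sum_{l,s} K_ls
            dist[:, j] = (np.diag(K)
                          - 2.0 * K[:, idx].sum(axis=1) / idx.size
                          + K[np.ix_(idx, idx)].sum() / idx.size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):        # stop when assignments no longer change
            break
        labels = new_labels
    return labels
```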

2.2 Multiple Kernel K-Means

A weighted summation kernel is a common tool in multiple kernel learning. Huang et al. incorporate a weighted summation kernel into kernel k-means, which results in the multiple kernel k-means (MKKM) algorithm [4]. MKKM is solved by iteratively updating the kernel weights. Its objective is to minimize

$$ J_{M} = \sum\nolimits_{j = 1}^{k} {\sum\nolimits_{{x_{i} \in C_{j} }} {\sum\nolimits_{m = 1}^{M} {w_{m}^{2} \parallel \phi_{m} \left( {x_{i} } \right) - m_{j} \parallel^{2} } } } $$
(4)
$$ w_{m} = \frac{{1/\beta_{m} }}{{1/\beta_{1} + 1/\beta_{2} + \cdots + 1/\beta_{M} }},\quad \beta_{m} = \sum\nolimits_{j = 1}^{k} {\sum\nolimits_{{x_{i} \in C_{j} }} {\parallel \phi_{m} \left( {x_{i} } \right) - m_{j} \parallel^{2} } } $$

where \( \{ \phi_{m} \}_{m = 1}^{M} \) are the mapping functions corresponding to the multiple kernel functions and \( w_{m} \,(m = 1,2, \ldots ,M) \) are the kernel weights.
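For concreteness, a minimal sketch of this closed-form weight update (hypothetical helper name; the per-kernel distortions \( \beta_{m} \) are assumed to be given) is shown below; each weight is proportional to the inverse of its kernel's distortion and the weights sum to one.

```python
import numpy as np

def mkkm_weights(betas):
    """Closed-form MKKM kernel weights w_m from per-kernel distortions beta_m."""
    inv = 1.0 / np.asarray(betas, dtype=float)   # w_m is proportional to 1/beta_m
    return inv / inv.sum()                       # normalize so the weights sum to one

# e.g. three base kernels with distortions 2.0, 4.0 and 8.0
print(mkkm_weights([2.0, 4.0, 8.0]))             # -> [0.571..., 0.285..., 0.142...]
```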

3 Locally Multiple Kernel K-Means

3.1 Similarity Measure

Selecting a suitable similarity measure is crucial in cluster analysis, as it serves as the basis for the partition [13]. To handle datasets with multiple scales, we calculate a local scaling parameter \( \sigma_{i} \) for each data point \( s_{i} \). The local scale \( \sigma_{i} \) is selected by studying the local statistics of the neighborhood of point \( s_{i} \): with \( s_{K} \) denoting the K'th nearest neighbor of point \( s_{i} \),

$$ \sigma_{i} = d(s_{i} ,s_{K} ) $$
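A minimal sketch of this local scale computation (hypothetical helper name, assuming \( d \) is the Euclidean distance and using brute-force pairwise distances) is given below.

```python
import numpy as np

def local_scales(X, K=7):
    """sigma_i = d(s_i, s_K), the distance from point i to its K-th nearest neighbour."""
    # brute-force pairwise Euclidean distances
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # after sorting each row, column 0 is the point itself (distance 0),
    # so column K holds the distance to the K-th nearest neighbour
    return np.sort(D, axis=1)[:, K]
```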

According to the clustering hypothesis, points within a cluster should lie in a high-density region, while points from different clusters should be separated by regions of low density [14]. In order to better describe the global structure of the data set and keep kernel k-means from falling into a local optimum, a density factor ρ is introduced to discover clusters of arbitrary shape. Combining ρ with the local scale parameter \( \sigma_{i} \), we propose a new similarity measure as follows:

$$ S_{ij} = { \exp }(\frac{{ - d^{2} (s_{i} ,s_{j} )}}{{\sigma_{i} \sigma_{j} \rho_{ij} }}) $$
(5)

The density factor is obtained in a simple and effective way. First, the k nearest neighbors of each point are found, and the k-nearest-neighbor graph is used to depict the local neighborhood relations between data points. The neighborhood of a point p is denoted by \( N(p) \). For a sample point q, if \( q \in N(p) \), we say that q is directly density-reachable from p. Given a sample set \( {\text{D}} = \{ p_{1} ,p_{2} , \ldots ,p_{n} \} \), if \( p_{i} \) is directly density-reachable from \( p_{i + 1} \) for every i, then \( p_{1} \) is density-reachable from \( p_{n} \). If there is a point o such that both p and q are density-reachable from o, we say that p is density-connected to q. Finally, from all directly density-reachable pairs, the density-connected sets are found with the union-find method, a simple and practical data structure mainly used for merging disjoint sets [15]. Let \( \rho_{ij} \) denote the density factor between points \( s_{i} \) and \( s_{j} \), as follows:

$$ \rho_{ij} = \left\{ {\begin{array}{*{20}l} {1,\;{\text{if}}\;s_{i}\;{\text{and}}\;s_{j}\;{\text{are in the same density-connected set}}} \hfill \\ {0,\;{\text{otherwise}}} \hfill \\ \end{array} } \right. $$
(6)
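As one possible realisation of this construction (hypothetical helper name; the density-connected sets are taken as the connected components of the k-nearest-neighbor graph, merged with a simple union-find structure), a sketch follows.

```python
import numpy as np

def density_factor(X, k=7):
    """rho_ij = 1 if points i and j fall in the same density-connected set, else 0."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    parent = np.arange(n)

    def find(i):                                  # union-find root with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for p in range(n):
        for q in np.argsort(D[p])[1:k + 1]:       # k nearest neighbours of p (skip p itself)
            parent[find(p)] = find(q)             # merge the two density-connected sets

    roots = np.array([find(i) for i in range(n)])
    return (roots[:, None] == roots[None, :]).astype(float)
```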

3.2 Algorithm

From the perspective of the similarity measure, we propose a novel locally multiple kernel k-means algorithm (LMKKM). Its basic idea is as follows: first, calculate the local scale parameter σ and the density factor ρ; then, construct the kernel matrix based on the proposed similarity measure; finally, cluster the dataset with kernel k-means using this kernel matrix. The detailed steps of our algorithm are sketched below.
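Re-using the hypothetical helpers sketched earlier (`local_scales`, `density_factor`, `kernel_kmeans`), a rough end-to-end sketch of this procedure could look as follows; where \( \rho_{ij} = 0 \) the similarity is taken to be 0, in line with the limit of formula (5).

```python
import numpy as np

def lmkkm(X, n_clusters, K=7):
    """Sketch of LMKKM: build the similarity matrix of formula (5), then cluster."""
    sigma = local_scales(X, K)                    # step 1: local scale per point
    rho = density_factor(X, K)                    # step 1: density factor per pair
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        S = np.exp(-D ** 2 / (np.outer(sigma, sigma) * rho))
    S[rho == 0] = 0.0                             # step 2: kernel matrix from formula (5)
    return kernel_kmeans(S, n_clusters)           # step 3: kernel k-means on S
```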

Suppose n is the total number of points in the data set. Our locally multiple kernel k-means contains three main components: calculating the parameters, constructing the similarity matrix, and clustering. In the parameter calculation phase, the complexities of the k-nearest-neighbor step and the union-find method are both \( O(n) \). The complexity of calculating the similarity matrix is \( O(n^{2} ) \). In the final clustering phase, the complexity of k-means is \( O(n) \). Thus our algorithm does not increase the complexity of kernel k-means, while it improves the clustering performance.

4 Experiments

4.1 Artificial Data Clustering

In order to verify the effectiveness of the improved algorithm, we choose three artificial data sets, "smile face", "four lines" and "blobs and circle", and compare against the kernel k-means (KKM) algorithm.

Figure 1 shows the KKM algorithm's clustering results on the artificial data sets. The scale parameter of the Gaussian kernel function is set to 1 empirically. KKM measures the similarity between points based on the Euclidean distance, which cannot reflect the intrinsic structure of the dataset. Thus, KKM can only gather similar points in a local region into a cluster; it neither satisfies the global coherence hypothesis of clustering nor recognizes the complex manifold structure of the dataset.

Fig. 1. KKM algorithm's clustering results on artificial data sets

Figure 2 shows the LMKKM algorithm's clustering results on the artificial datasets. It calculates the kernel matrix by formula (5). Once the similarity measure incorporates the density factor, it meets both the local and the global coherence hypotheses of clustering. In this way, the intra-class data points become more compact and the inter-class data points more separated.

Fig. 2. LMKKM algorithm's clustering results on artificial datasets

4.2 Clustering Results

In this subsection, our method is compared with three baseline methods: kernel k-means (KKM), self-tuning spectral clustering (SSC) [16], and locally adaptive multiple kernel clustering (LAMKC) [8]. SSC is a locally adaptive spectral clustering algorithm, and LAMKC is a recently proposed multiple kernel clustering algorithm extending from kernel k-means. We carry out experiments on seven UCI datasets, which are frequently used to test the performance of machine learning algorithms. The characteristics of these data sets are shown in Table 1.

Table 1. Data characteristics of real data sets

We use accuracy (ACC) to evaluate the clustering performance. Because the random initialization of the cluster centers of kernel k-means causes the results to fluctuate, each clustering experiment is repeated 20 times. The results in the first row are the means of the 20 trials, and the results in the second row (in parentheses) are the corresponding standard deviations. The neighborhood size K is set to 7. For all experiments, the number of clusters is set to the true cluster number of each dataset. For LAMKC, the stopping threshold of the gradient descent method is set to 0.0001 [8]. To compare the results fairly, we do not use the Kaufman approach to select a set of initial centroids for LAMKC. All experiments are conducted on an Intel Pentium G2030 CPU with a 3.00 GHz processor and 4 GB RAM running 64-bit Windows 7. The clustering results are shown in Table 2.

Table 2. Clustering results on real-world data sets

The results on the seven UCI datasets are shown in Table 2, with the best result for each dataset marked in boldface. Measured by ACC, the experimental results are encouraging: our algorithm obtains the best result on five of the seven datasets. Compared with kernel k-means and self-tuning spectral clustering, the performance of our algorithm is significantly better on all seven datasets. For the WDBC and Dermatology datasets, our algorithm is roughly comparable to LAMKC. In most cases, our algorithm captures the structure of the dataset and calculates appropriate parameters adaptively, whereas LAMKC has to search for the parameter of the Gaussian kernel over a range of values. This indicates that the improved similarity measure can capture the local and global structures of complex datasets, so that our algorithm can complete the clustering tasks effectively.

5 Conclusions

Conventional multiple kernel clustering algorithms aim to construct a global combination of multiple kernels in the input space and have to compute the kernel combination coefficients iteratively. In this paper, we proposed a locally multiple kernel clustering method based on an improved similarity measure. Our method is dedicated to datasets with varying local distributions. Instead of using a uniform combination of multiple kernels over the whole input space, our method associates a localized kernel with each data point and combines it with the density factor simultaneously. Taking both local and global structures into consideration, the similarity measure can depict the distribution of the dataset. The results of clustering experiments on artificial datasets and UCI datasets demonstrate that our locally multiple kernel clustering method can deal with datasets with multiple scales and does not fall into local optima.

There are three points remaining for further research. First, the time complexity of our algorithm is the same as that of kernel k-means, so it will still take considerable time on large data sets; further study is needed on how to reduce the time complexity and improve the efficiency of clustering. Second, kernel k-means is sensitive to the initial cluster centers, and we can improve kernel k-means from this perspective. Third, following the idea of this paper, better multiple kernel k-means methods can be constructed based on other kernel evaluation measures.