1 Introduction

Spectral clustering treats the clustering problem as a graph partitioning problem: the graph cut objective is optimized using the eigenvectors of the graph Laplacian matrix [1]. Compared with conventional clustering algorithms, spectral clustering can recognize more complex data structures and is especially suitable for non-convex data sets. Recently, an improved version of the normalized cut, named the Cheeger cut, has attracted much attention [2]. Research shows that the Cheeger cut produces more balanced clusters through the graph p-Laplacian matrix [3], a nonlinear generalization of the graph Laplacian.

p-Spectral clustering groups data points based on the Cheeger cut. Because it has a solid theoretical foundation and yields good clustering results, research in this area is currently very active. Dhanjal et al. present an incremental spectral clustering method that updates the eigenvectors of the Laplacian in a computationally efficient way [4]. Gao et al. construct a sparse affinity graph on a small representative data set and use local interpolation to improve the extension of the clustering results [5]. Semertzidis et al. inject pairwise constraints into a small affinity sub-matrix and use the sparse coding strategy of landmark spectral clustering to keep the complexity low [6].

Nowadays, science and technology are growing by leaps and bounds, and massive data volumes result in a “data explosion”. These data are often high-dimensional. When dealing with high-dimensional data, clustering algorithms that perform well in low-dimensional spaces often fail to produce good results, or even fail entirely [7]. Attribute reduction is an effective way to decrease the size of the data and is often used as a preprocessing step for data mining. The essence of attribute reduction is to remove irrelevant or unnecessary attributes while maintaining the classification ability of the knowledge base. Efficient attribute reduction not only improves the clarity of knowledge in intelligent information systems, but also reduces the cost of such systems to some extent. In order to handle high-dimensional data effectively, we design a novel attribute reduction method based on neighborhood granulation and combine it with p-spectral clustering. The proposed algorithm inherits the advantages of neighborhood rough sets and the graph p-Laplacian. Its effectiveness is demonstrated by comprehensive experiments on benchmark data sets.

This paper is organized as follows: Sect. 2 introduces p-spectral clustering; Sect. 3 uses information entropy to improve attribute reduction based on neighborhood rough sets; Sect. 4 improves p-spectral clustering with neighborhood attribute granulation; Sect. 5 verifies the effectiveness of the proposed algorithm on benchmark data sets; finally, we summarize the main contributions of this paper.

2 p-Spectral Clustering

The idea of spectral clustering comes from spectral graph partition theory. Given a data set, we can construct an undirected weighted graph G = (V,E), where V is the set of vertices represented by the data points and E is the set of edges weighted by the similarity between their two endpoints. Suppose A is a subset of V; the complement of A is written as \( \bar{A} = V\backslash A \). The cut of A and \( \bar{A} \) is defined as:

$$ cut(A,\bar{A}) = \sum\limits_{{i \in A,\,j \in \bar{A}}} {w_{ij} } $$
(1)

where \( w_{ij} \) is the similarity between vertex i and vertex j.

In order to obtain more balanced clusters, Cheeger et al. propose the Cheeger cut criterion, denoted Ccut [8]:

$$ Ccut(A,\bar{A}) = \frac{{cut(A,\bar{A})}}{{\hbox{min} \{ \left| A \right|,\left| {\bar{A}} \right|\} }} $$
(2)

where \( \left| A \right| \) is the number of data points in set A. The Cheeger cut seeks the partition that minimizes formula (2). An optimal partition makes the similarities within a cluster as large as possible while keeping the similarities between clusters as small as possible. However, computing the optimal Cheeger cut exactly is an NP-hard problem. Next we derive an approximate solution of the Cheeger cut by introducing the p-Laplacian into spectral clustering.
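To make the objective concrete, the following Python sketch (illustrative names, assuming a dense similarity matrix W as a NumPy array and a boolean membership vector for A; not code from the original paper) evaluates the cut in (1) and the Cheeger cut in (2):

```python
import numpy as np

def cut_value(W, in_A):
    """Sum of edge weights crossing the partition (A, V\\A), Eq. (1)."""
    in_A = np.asarray(in_A, dtype=bool)
    return W[np.ix_(in_A, ~in_A)].sum()

def cheeger_cut(W, in_A):
    """Cheeger cut of the bipartition (A, V\\A), Eq. (2)."""
    in_A = np.asarray(in_A, dtype=bool)
    size_A, size_Ac = in_A.sum(), (~in_A).sum()
    if size_A == 0 or size_Ac == 0:          # degenerate partition
        return np.inf
    return cut_value(W, in_A) / min(size_A, size_Ac)

# Toy example: two dense blocks weakly connected to each other.
W = np.array([[0.0, 1.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 1.0, 0.0, 0.1],
              [1.0, 1.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 0.0, 1.0],
              [0.0, 0.1, 0.0, 1.0, 0.0]])
print(cheeger_cut(W, [True, True, True, False, False]))  # small, balanced cut
```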

Hein et al. define the inner-product form of the graph p-Laplacian \( \Delta_{p} \) as follows [9]:

$$ \left\langle {\text{f} ,\Delta_{p} \text{f} } \right\rangle = \frac{1}{2}\sum\limits_{i,\,j = 1}^{n} {w_{ij} \left| {f_{i} - f_{j} } \right|^{p} } $$
(3)

where \( p \in (1,2] \) and f is an eigenvector of the p-Laplacian matrix.
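As a quick illustration, the following Python snippet (illustrative names, not from the original paper) evaluates the form in (3) for a given weight matrix W and vector f; for p = 2 it reduces to the usual graph Laplacian quadratic form \( f^{T}Lf \), which the sanity check below verifies.

```python
import numpy as np

def p_laplacian_form(W, f, p=1.5):
    """Evaluate <f, Delta_p f> = 0.5 * sum_ij w_ij |f_i - f_j|^p, Eq. (3)."""
    diffs = np.abs(f[:, None] - f[None, :])   # |f_i - f_j| for all pairs
    return 0.5 * np.sum(W * diffs ** p)

# Sanity check against the p = 2 case: <f, Delta_2 f> = f^T (D - W) f.
W = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 0.5],
              [0.2, 0.5, 0.0]])
f = np.array([1.0, -0.3, 0.7])
L = np.diag(W.sum(axis=1)) - W
assert np.isclose(p_laplacian_form(W, f, p=2), f @ L @ f)
```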

Theorem 1.

For p > 1 and every partition of V into A and \( \bar{A} \), there exists a function f such that the functional \( F_{p} \) associated with the p-Laplacian satisfies

$$ F_{p} (f,A) = \frac{{\left\langle {\text{f} ,\Delta_{p} \text{f} } \right\rangle }}{{\left\| \text{f} \right\|^{p} }} = cut(A,\bar{A})\left( {\frac{1}{{\left| A \right|^{{\tfrac{1}{p - 1}}} }} + \frac{1}{{\left| {\bar{A}} \right|^{{\tfrac{1}{p - 1}}} }}} \right)^{p - 1} $$
(4)

where \( \left\| \text{f} \right\|^{p} = \sum\limits_{i = 1}^{n} {\left| {f_{i} } \right|^{p} } \). Expression (4) can be interpreted as a balanced graph cut criterion, and we have the special case

$$ \mathop {\lim }\limits_{p \to 1} F_{p} (f,A) = Ccut(A,\bar{A}) $$
(5)

Theorem 1 shows that the Cheeger cut can be approximated in polynomial time using the p-Laplacian operator: the minimizer of \( F_{p}(f) \) is a relaxed approximate solution of the Cheeger cut, and it can be obtained from the eigen-decomposition of the p-Laplacian:

$$ \lambda_{p} = \mathop {\hbox{min} }\limits_{f} F_{p} (f) $$
(6)

where \( \lambda_{p} \) is the eigenvalue corresponding to the eigenvector f.

Specifically, the second eigenvector \( v_{p}^{(2)} \) of the p-Laplacian matrix leads to a bipartition of the graph once an appropriate threshold is set [3]. The optimal threshold is determined by minimizing the corresponding Cheeger cut: for the second eigenvector \( v_{p}^{(2)} \) of the graph p-Laplacian \( \Delta_{p} \), the threshold t should satisfy:

$$ \mathop {\arg \hbox{min} }\limits_{{A_{t} = \{ i \in V|v_{p}^{(2)} (i) > t\} }} Ccut(A_{t} ,\bar{A}_{t} ) $$
(7)
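A minimal sketch of this thresholding step is given below (Python, illustrative names; it assumes the second eigenvector v2 of the p-Laplacian has already been computed, e.g. as in [3]): every entry of v2 is tried as a threshold, and the one yielding the smallest Cheeger cut is kept.

```python
import numpy as np

def cheeger_cut(W, in_A):
    """Cheeger cut of the bipartition induced by the boolean mask in_A, Eq. (2)."""
    in_A = np.asarray(in_A, dtype=bool)
    if in_A.all() or (~in_A).all():
        return np.inf                          # degenerate partition
    cut = W[np.ix_(in_A, ~in_A)].sum()
    return cut / min(in_A.sum(), (~in_A).sum())

def best_threshold(W, v2):
    """Search the entries of the second eigenvector for the threshold minimizing Ccut, Eq. (7)."""
    best_t, best_val = None, np.inf
    for t in np.unique(v2):
        mask = v2 > t                          # A_t = {i in V : v2(i) > t}
        val = cheeger_cut(W, mask)
        if val < best_val:
            best_t, best_val = t, val
    return best_t, best_val
```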

3 Neighborhood Attribute Granulation

Rough set theory was proposed by Pawlak in 1982 [10]. Attribute reduction is one of the core topics of knowledge discovery with rough sets. However, the Pawlak rough set model is only suitable for discrete data. To address this limitation, Hu et al. propose the neighborhood rough set model [11], which can directly analyze attributes with continuous values and therefore has great advantages in feature selection and classification accuracy.

Definition 1.

The domain \( U = \{ x_{1} ,x_{2} , \cdots ,x_{n} \} \) is a non-empty finite set in real space. For \( x_{i} \in U \), the δ-neighborhood of \( x_{i} \) is defined as:

$$ \delta (x_{i} ) = \{ x|x \in U,\Delta (x,x_{i} ) \le \delta \} $$
(8)

where \( \delta \ge 0 \); \( \delta (x_{i} ) \) is called the neighborhood granule of \( x_{i} \), and Δ is a distance function (e.g. the Euclidean distance).
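For concreteness, a small Python sketch (illustrative, assuming Euclidean distance on a NumPy data matrix X) that collects the δ-neighborhood of every point as in (8):

```python
import numpy as np

def delta_neighborhoods(X, delta, cols=None):
    """Return, for each sample, the indices of its delta-neighborhood, Eq. (8).

    X     : (n, m) data matrix
    delta : neighborhood radius
    cols  : optional attribute subset B used for the distance
    """
    Xb = X if cols is None else X[:, cols]
    # Pairwise Euclidean distances Delta(x_i, x_j)
    dists = np.linalg.norm(Xb[:, None, :] - Xb[None, :, :], axis=2)
    return [np.flatnonzero(dists[i] <= delta) for i in range(len(X))]
```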

Definition 2.

Let \( U = \{ x_{1} ,x_{2} , \cdots ,x_{n} \} \) be a domain located in real space, let A be the condition attribute set of U, and let D be the decision attribute. If A generates a family of neighborhood relations on the domain U, then \( NDT = \left\langle {U,A,D} \right\rangle \) is called a neighborhood decision system.

For a neighborhood decision system \( NDT = \left\langle {U,A,D} \right\rangle \), the domain U is divided by the decision attribute D into N equivalence classes \( X_{1} ,X_{2} , \cdots ,X_{N} \). For any \( B \subseteq A \), the lower approximation of D with respect to B is \( \underline{{N_{B} }} D = \bigcup\limits_{i = 1}^{N} {\underline{{N_{B} }} X_{i} } \), where \( \underline{{N_{B} }} X_{i} = \{ x_{j} |\delta_{B} (x_{j} ) \subseteq X_{i} ,x_{j} \in U\} \).

According to the properties of the lower approximation, we can define the dependence of the decision attribute D on the condition attribute set B:

$$ \gamma_{B} (D) = \frac{{Card(\underline{{N_{B} }} D)}}{Card(U)} $$
(9)

where \( 0 \le \gamma_{B} (D) \le 1 \). Obviously, the greater the positive region \( \underline{{N_{B} }} D \), the stronger the dependence of decision D on condition B.
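The sketch below (Python, illustrative names, reusing the δ-neighborhood idea above and assuming the decision labels are stored in y) computes the lower approximation implicitly and returns the dependency degree of (9):

```python
import numpy as np

def dependency(X, y, cols, delta):
    """Dependency gamma_B(D) of the decision on the attribute subset B = cols, Eq. (9)."""
    Xb = X[:, cols]
    dists = np.linalg.norm(Xb[:, None, :] - Xb[None, :, :], axis=2)
    positive = 0
    for i in range(len(X)):
        neigh = np.flatnonzero(dists[i] <= delta)    # delta_B(x_i)
        # x_i belongs to the lower approximation if its whole neighborhood
        # shares one decision class (necessarily the class of x_i).
        if np.all(y[neigh] == y[i]):
            positive += 1
    return positive / len(X)
```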

Definition 3.

Given a neighborhood decision system \( NDT = \left\langle {U,A,D} \right\rangle \) and \( B \subseteq A \), for any \( a \in A - B \) the significance of attribute a relative to B is defined as:

$$ SIG(a,B,D) = \gamma_{B \cup a} (D) - \gamma_{B} (D) $$
(10)

However, sometimes several attributes share the same, greatest significance. Traditional reduction algorithms then pick one of them at random, which is obviously arbitrary: it ignores the impact of other factors on attribute selection and may lead to poor reduction results. Analyzing attribute reduction from the viewpoint of information theory can improve the reduction accuracy [12]. Here, we use information entropy as an additional criterion to evaluate attributes. The definition of entropy is given below.

Definition 4.

Given knowledge P and the partition \( U/P = \{ X_{1} ,X_{2} , \cdots ,X_{n} \} \) it induces on the domain U, the information entropy of knowledge P is defined as:

$$ H(P) = - \sum\limits_{i = 1}^{n} {p(X_{i} )\log p(X_{i} )} $$
(11)

where \( p(X_{i} ) = \left| {X_{i} } \right|/\left| U \right| \) represents the probability of the equivalence class \( X_{i} \) on the domain U.

If multiple attributes have the same greatest significance, we compare their information entropy and select the attribute with the minimum entropy, because it carries the least uncertainty. The selected attribute is added to the reduction set, and this process is repeated until the reduction set no longer changes. This improved attribute reduction algorithm is shown as Algorithm 1.
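Since Algorithm 1 itself is not reproduced here, the following Python sketch outlines one plausible reading of the greedy reduction with entropy tie-breaking under the assumptions of the previous sections (numeric data matrix X, decision labels y, neighborhood radius delta); the equal-width discretization used for the entropy of a continuous attribute is our own illustrative choice, not the authors' exact procedure.

```python
import numpy as np

def dependency(X, y, cols, delta):
    """Dependency gamma_B(D) of the decision on attribute subset B = cols, Eq. (9)."""
    Xb = X[:, cols]
    dists = np.linalg.norm(Xb[:, None, :] - Xb[None, :, :], axis=2)
    in_pos = [np.all(y[dists[i] <= delta] == y[i]) for i in range(len(X))]
    return float(np.mean(in_pos))

def entropy_of_attribute(x, bins=10):
    """Information entropy H(P) of the partition induced by one attribute, Eq. (11).
    Continuous values are discretized into equal-width bins for illustration."""
    _, counts = np.unique(np.digitize(x, np.histogram_bin_edges(x, bins)), return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log(probs))

def neighborhood_reduction(X, y, delta, bins=10):
    """Greedy reduction: add the most significant attribute (Eq. (10)),
    breaking ties by minimum information entropy, until nothing improves."""
    reduct, gamma_red = [], 0.0
    while True:
        candidates = [a for a in range(X.shape[1]) if a not in reduct]
        if not candidates:
            break
        # Significance SIG(a, B, D) = gamma_{B u {a}}(D) - gamma_B(D)
        sig = {a: dependency(X, y, reduct + [a], delta) - gamma_red for a in candidates}
        best = max(sig.values())
        if best <= 0:                       # no attribute improves the dependency
            break
        tied = [a for a in candidates if np.isclose(sig[a], best)]
        chosen = min(tied, key=lambda a: entropy_of_attribute(X[:, a], bins))
        reduct.append(chosen)
        gamma_red += sig[chosen]            # dependency of the enlarged reduct
    return reduct
```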

4 p-Spectral Clustering Based on Neighborhood Attribute Granulation

Processing massive high-dimensional data has long been a challenging problem in data mining. High-dimensional data is often accompanied by the “curse of dimensionality”, so traditional p-spectral clustering algorithms cannot play to their strengths. Moreover, real data sets often contain noise and irrelevant features and are prone to a “dimension trap”, which interferes with the clustering process and affects the accuracy of the results [13]. To solve this problem, we propose a novel p-spectral clustering algorithm based on neighborhood attribute granulation (NAG-pSC). The detailed steps of the NAG-pSC algorithm are given in Algorithm 2.
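Algorithm 2 is likewise not reproduced here. The sketch below shows one plausible reading of the overall NAG-pSC pipeline for a single bipartition, composing the helper functions from the earlier sketches (`neighborhood_reduction` from the Algorithm 1 sketch and `best_threshold` from the Eq. (7) sketch). For runnability, the second eigenvector is taken from the standard (p = 2) graph Laplacian as a stand-in; the actual algorithm uses the p-Laplacian minimization of [3]. The decision labels y are assumed available for the granulation step, as the neighborhood decision system of Sect. 3 requires.

```python
import numpy as np

def rbf_similarity(X, sigma=1.0):
    """Gaussian similarity graph on the (attribute-reduced) data."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def second_eigenvector(W):
    """Second eigenvector of the unnormalized graph Laplacian
    (p = 2 stand-in for the p-Laplacian solver of [3])."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1]

def nag_psc_bipartition(X, y, delta=0.3, sigma=1.0):
    """Sketch of the NAG-pSC pipeline: attribute reduction -> similarity graph
    -> spectral bipartition thresholded by the Cheeger cut."""
    reduct = neighborhood_reduction(X, y, delta)   # Algorithm 1 sketch (Sect. 3)
    W = rbf_similarity(X[:, reduct], sigma)
    v2 = second_eigenvector(W)
    t, _ = best_threshold(W, v2)                   # threshold search sketch, Eq. (7)
    return v2 > t                                  # boolean cluster assignment
```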

5 Experimental Analysis

To test the effectiveness of the proposed NAG-pSC algorithm, we conduct experiments on six benchmark data sets. The characteristics of these data sets are shown in Table 1.

Table 1. Data sets used in the experiments

In this paper, we use F-measure to evaluate the merits of clustering results [14]. The F-score of each class i and the total F index of the clustering results are defined as:

$$ F(i) = \frac{2 \times P(i) \times R(i)}{P(i) + R(i)} $$
(12)
$$ F = \frac{1}{n}\sum\limits_{i = 1}^{k} {[N_{i} \times F(i)]} $$
(13)

where \( P(i) = N_{ii*} /N_{i*} \) is the precision and \( R(i) = N_{ii*} /N_{i} \) is the recall; \( N_{ii*} \) is the size of the intersection of class i and cluster i*; \( N_{i} \) is the size of class i; \( N_{i*} \) is the size of cluster i*; n is the number of data points; k is the number of classes. \( F \in [0,1] \); the greater the F index, the closer the clustering result is to the true class structure of the data.
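A small Python sketch of this evaluation is given below (illustrative; labels_true holds the classes, labels_pred the clusters). The original text does not spell out how cluster i* is matched to class i; here we use the common convention of taking, for each class, the cluster that maximizes its F-score.

```python
import numpy as np

def f_measure(labels_true, labels_pred):
    """Overall F index, Eqs. (12)-(13): each class i is matched to the cluster i*
    maximizing its F-score, and the per-class scores are weighted by class size."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    n = len(labels_true)
    total = 0.0
    for c in np.unique(labels_true):
        in_class = labels_true == c
        best_f = 0.0
        for k in np.unique(labels_pred):
            in_cluster = labels_pred == k
            n_ii = np.sum(in_class & in_cluster)
            if n_ii == 0:
                continue
            precision = n_ii / in_cluster.sum()   # P(i) = N_ii* / N_i*
            recall = n_ii / in_class.sum()        # R(i) = N_ii* / N_i
            best_f = max(best_f, 2 * precision * recall / (precision + recall))
        total += in_class.sum() * best_f          # N_i * F(i)
    return total / n
```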

In the experiments, the NAG-pSC algorithm is compared with traditional spectral clustering (SC), density-sensitive spectral clustering (D-SC) [1], and p-spectral clustering (pSC) [3]. The threshold δ is important in neighborhood rough sets. Hu et al. recommend the range [0.2, 0.4] for δ based on experimental analysis [11], so we set the neighborhood size δ via a cross-validated search over [0.2, 0.4] (with step size 0.05) for each data set. The clustering results of the four algorithms are shown in Fig. 1; the horizontal axis is the cluster label, and the vertical axis is the F-score of each cluster.

Fig. 1. Clustering results on different data sets

From Fig. 1 we can see that the performance of the SC algorithm is close to that of D-SC. This is mainly because both are based on graph theory and turn the clustering problem into a graph partitioning problem. Using the p-Laplacian transform, pSC may find a globally better partition. SC works well on the Sonar data set, D-SC deals well with the Colon Cancer data set, and pSC generates balanced clusters on the WDBC data set. But for high-dimensional clustering problems, their F-scores are lower than those of the proposed NAG-pSC algorithm. The information carried by each attribute differs, and so does its contribution to the clustering; improper feature selection therefore has a great impact on the clustering results. Traditional spectral clustering does not take this into account and is susceptible to interference from noise and irrelevant attributes. For further comparison, Table 2 lists the overall F index of each algorithm and the number of condition attributes of the different data sets.

Table 2. Total F index of different algorithms

Table 2 shows that the NAG-pSC algorithm deals well with high-dimensional data. NAG-pSC uses neighborhood rough sets to preprocess the data: the neighborhood attribute reduction based on information entropy diminishes the negative impact of noisy data and redundant attributes on the clustering. So in most cases NAG-pSC achieves higher clustering accuracy. By combining the advantages of p-spectral clustering and neighborhood attribute granulation, it exhibits good robustness and strong generalization ability.

6 Conclusions

To improve the performance of p-spectral clustering on high-dimensional data, we modify the attribute reduction method based on neighborhood rough sets: attribute significance is combined with information entropy to select the appropriate attributes. We then propose the NAG-pSC algorithm, which runs p-spectral clustering on the reduced attribute set. Experiments show that NAG-pSC is superior to traditional spectral clustering, density-sensitive spectral clustering, and p-spectral clustering. In the future, we will study how to apply NAG-pSC to web data mining, image retrieval, and other realistic scenarios.