1 Introduction

In machine learning applications, particularly image processing, computer vision and bioinformatics, data are often represented in matrix form (second order tensor space). For example, a grayscale image is an order-2 tensor and a video is an order-3 tensor. Here, one of the critical tasks is to identify the hidden patterns in training data [1]. Most commonly, such data are converted to vector form to facilitate the use of vector-based clustering or classification models. This arrangement, however, suffers from under-representation and high dimensionality (and sometimes over-fitting), leading to high training time complexity [2, 3].

Clustering is a powerful technique that aims to group similar elements in the same cluster while maximizing the separation between dissimilar elements. Recently, in view of the limitations of point-based clustering methods in handling data that are not distributed around several cluster points, plane-based clustering methods such as Maximum Margin Clustering (MMC) and Twin Support Vector Clustering (TWSVC) [4] have attracted considerable research interest. Taking motivation from TWSVC, we propose a tree-based framework for clustering second order tensor data. The main contributions of the paper are the following. First, we propose a modified version of the tensor-based LS-TWSTM [5], named Structural LS-TWSTM (S-LSTWSTM), a classifier that reduces its convex optimization problems to systems of linear equations and takes care of the structural risk associated with the data. Second, S-LSTWSTM is extended to a binary decision structure based clustering framework, termed Tree-SLSTWSTC, which leads to fast and efficient cluster assignment in the tensor framework. Finally, to make Tree-SLSTWSTC more robust and stable, an initialization technique based on Tensor k-means is proposed.

Experiments carried out on popular image datasets show that the proposed algorithm significantly outperforms other vector-based and tensor-based clustering techniques.

The rest of the paper is organized as follows. Section 2 gives the background for our proposed approach. Section 3 discusses our proposed work. Experimental results are presented in Sect. 4. Finally, Sect. 5 concludes our work and states possible directions for future work.

2 Related Work

Let \( X=\{ X_1, X_2,\ldots , X_m\}\) be a training set of m data samples in second order tensor space, i.e. \(X_i \in \mathbb {R}^{n_1} \times \mathbb {R}^{n_2}\), with labels \(y_i \in \{+1,-1\}\). Let \(I_1\) denote the set of indices with label \(y_i=1\), and \(I_2\) the set of indices with label \(y_i=-1\).

2.1 Least Squares Twin Support Tensor Machine

Working on the tensor generalization of the Twin Support Vector Machine [7], Zhao et al. [5] proposed the Least Squares Twin Support Tensor Machine (LS-TWSTM), which aims to find a pair of non-parallel hyperplanes given by \(f_1(X)=u_1^TXv_1+b_1\) and \(f_2(X)=u_2^TXv_2+b_2\), where \(u_1, u_2 \in \mathbb {R}^{n_1}\), \(v_1, v_2 \in \mathbb {R}^{n_2}\) and \(b_1, b_2 \in \mathbb {R}\). The following two quadratic programming problems (QPPs) are solved to find the corresponding non-parallel hyperplanes:

$$\begin{aligned} \text{(LS-TWSTM 1) }&\underset{u_1, v_1, b_1, \xi _2}{\min } \quad \frac{1}{2} \sum _{i \in I_1}(u_1^TX_iv_1+b_1)^2+ c_1\sum _{j\in I_2}\xi _{2j}^2 \\&\text{subject to} \,\,\, -(u_1^TX_jv_1+b_1)+\xi _{2j}=1, \quad j \in I_2. \end{aligned}$$
$$\begin{aligned} \text{(LS-TWSTM 2) }&\underset{u_2, v_2, b_2, \xi _1}{\min } \quad \frac{1}{2} \sum _{j \in I_2}(u_2^TX_jv_2+b_2)^2+ c_2\sum _{i\in I_1}\xi _{1i}^2 \\&\text{subject to} \,\,\, (u_2^TX_iv_2+b_2)+\xi _{1i}= 1, \quad i \in I_1. \end{aligned}$$

Since the hyperplane parameters are interdependent, the problems are solved using the alternating projection method [6]. A test point is assigned a label depending upon its proximity to the two hyperplanes. Please refer to [5] for details.
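
To make the proximity rule concrete, the following is a minimal sketch of how a matrix sample could be scored against two tensor hyperplanes. The function names and the normalization of the distance by \(\Vert u\Vert \, \Vert v\Vert\) (the Frobenius norm of \(uv^T\)) are assumptions made for illustration; the exact decision rule is the one given in [5].

```python
import numpy as np

def tensor_plane_value(X, u, v, b):
    """Evaluate f(X) = u^T X v + b for one matrix sample X of shape (n1, n2)."""
    return float(u @ X @ v + b)

def proximity_label(X, plane1, plane2):
    """Return +1 if X is closer to plane 1, otherwise -1.

    Assumption: closeness is |f(X)| / (||u|| * ||v||), where
    ||u|| * ||v|| equals the Frobenius norm of the rank-one matrix u v^T.
    """
    dists = []
    for u, v, b in (plane1, plane2):
        scale = np.linalg.norm(u) * np.linalg.norm(v)
        dists.append(abs(tensor_plane_value(X, u, v, b)) / scale)
    return 1 if dists[0] <= dists[1] else -1

# Toy planes that only look at the top-left entry of X.
u = np.array([1.0, 0.0])
v = np.array([1.0, 0.0])
plane1 = (u, v, 0.0)    # f1(X) = X[0, 0]
plane2 = (u, v, -1.0)   # f2(X) = X[0, 0] - 1
X_a = np.array([[0.1, 5.0], [5.0, 5.0]])
X_b = np.array([[0.8, 5.0], [5.0, 5.0]])
print(proximity_label(X_a, plane1, plane2), proximity_label(X_b, plane1, plane2))
```

In this toy setting the first sample lies nearer to the first plane and the second sample nearer to the second plane.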

3 Proposed Work

In this work, we first propose a novel tensor classifier termed Structural Least Squares Twin Support Tensor Machine (S-LSTWSTM), which we then use in an unsupervised framework to obtain a binary tree-based clustering approach termed Tree-SLSTWSTC.

3.1 Structural Least Squares Twin Support Tensor Machine

In the spirit of Least Squares Twin Support Tensor Machine (LS-TWSTM) [5], the proposed S-LSTWSTM seeks two non-parallel hyperplanes by considering the following optimization problems:

$$\begin{aligned} \text{(S-LSTWSTM 1) }&\underset{u_1, v_1,b_1,\xi _2}{\min } \quad \frac{1}{2}\sum _{i \in I_1} (u_1^TX_iv_1 +e_1b_1)^2 +c_1 \sum _{j \in I_2}\xi _{2j}^2 +c_2 (u_1^Tu_1+v_1^Tv_1+b_1^2) \nonumber \\&\text{subject to} \,\,\, (u_1^T{X_j}v_1 +b_1e_2)= e_2-\xi _{2j}, \quad j \in I_2, \end{aligned}$$
(1)
$$\begin{aligned} \text{(S-LSTWSTM 2) }&\underset{u_2, v_2,b_2,\xi _1}{\min } \quad \frac{1}{2} \sum _{j \in I_2} (u_2^T{X_j}v_2 +e_2b_2)^2 +c_1 \sum _{i \in I_1} \xi _{1i}^2 +c_2 (u_2^Tu_2+v_2^Tv_2+b_2^2) \nonumber \\&\text{subject to} \,\,\, (u_2^T{X_i}v_2 +b_2e_1)= e_1-\xi _{1i}, \quad i \in I_1, \end{aligned}$$
(2)

where \(\xi _1\) and \(\xi _2\) are error variables, and \(e_1\) and \(e_2\) are vectors of ones of appropriate dimensions. The first term of Eqs. (1) and (2) measures the empirical risk of the data. Minimizing this term tends to keep the hyperplane close to the data matrices of its own class, while the constraints require the hyperplane to be at unit distance from the data of the other class. Further, S-LSTWSTM implements structural risk minimization (SRM) by introducing the regularization term (\(u_i^Tu_i+v_i^Tv_i+b_i^2\), \(i=1,2\)) in the objective function, which improves the generalization ability. This term also guards against the possible ill-conditioning that might arise during matrix inversion.

Following the approach of LS-TWSTM [5] for Eq. (1), setting the gradient of the objective function with respect to (\(u_1\), \(v_1\), \(b_1\)) to zero shows that \(u_1\), \(v_1\) and \(b_1\) are interdependent and hence cannot be solved for independently. Therefore, we use the alternating projection method [6].

For any given non-zero vector \(u_k \in \mathbb {R}^{n_1}\), let \(x_i={u_k}^TX_i\) and \({x}_j={u_k}^T{X}_j\) be row vectors in \(\mathbb {R}^{1 \times n_2}\). We then solve the following modified optimization problem (obtained after substituting the value of \(\xi _{2j}\) into the objective function):

$$\begin{aligned} \underset{v_k,b_k}{min}&\frac{1}{2} \sum _{i \in I_1}^{} (x_iv_k +b_k)^2 +c_1 \sum _{j \in I_2}^{} ||e_2-({x_j}v_k +b_ke_2)||^2 +c_2 (v_k^Tv_k+b_k^2). \end{aligned}$$
(3)

Setting the gradient of the objective in (3) with respect to \(v_k\) and \(b_k\) to zero leads to the following system of linear equations:

$$\begin{aligned} \left[ {\begin{array}{c} v_k \\ b_k \end{array}}\right] =\left[ \frac{1}{2c_1}H_1^TH_1+G_1^TG_1+\frac{c_2}{c_1}I\right] ^{-1}G_1^Te_2, \end{aligned}$$
(4)

where \(H_1\) and \(G_1\) are matrices of points \(x_i\) and \({x_j}\) augmented with a column of ones; and I is an identity matrix of appropriate dimensions.

Once a non-zero vector \(v_k \in \mathbb {R}^{n_2}\) is obtained, let \(\hat{x}_i=(X_i{v_k})^T\) and \(\hat{x}_j=({X}_j{v_k})^T\) be row vectors in \(\mathbb {R}^{1 \times n_1}\). We then solve the following modified optimization problem:

$$\begin{aligned} \underset{u_k,b_k}{\min }&\frac{1}{2} \sum _{i \in I_1} (\hat{x}_iu_k +b_k)^2 +c_1 \sum _{j \in I_2} ||e_2-(\hat{x}_ju_k +b_ke_2)||^2 +c_2 (u_k^Tu_k+b_k^2). \end{aligned}$$
(5)

Proceeding as above, setting the gradient of the objective in (5) with respect to \(u_k\) and \(b_k\) to zero gives

$$\begin{aligned} \left[ {\begin{array}{c} u_k \\ b_k \end{array}}\right] =\left[ \frac{1}{2c_1}H_2^TH_2+G_2^TG_2+\frac{c_2}{c_1}I\right] ^{-1}G_2^Te_2, \end{aligned}$$
(6)

where \(H_2\) and \(G_2\) are matrices of points \(\hat{x}_i\) and \(\hat{x}_j\) augmented with a column of ones. Equations (4) and (6) are solved alternately until \(u_k\), \(v_k\) and \(b_k\) converge.
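
The alternating procedure can be illustrated with a short NumPy sketch for one hyperplane. It is not the authors' implementation: the function names `ridge_step` and `fit_plane`, the all-ones starting vector, the stopping tolerance and the default values of \(c_1\) and \(c_2\) are assumptions. Each step minimizes the quadratic objective of the form (3) or (5) exactly by setting its gradient to zero, so the constants in the linear system follow those objectives as written.

```python
import numpy as np

def ridge_step(P, N, c1, c2):
    """Minimize 0.5*||H z||^2 + c1*||e - G z||^2 + c2*||z||^2 over z = [w; b].

    P holds the projected samples (one row each) of the class the plane should
    stay close to, N those of the other class. A zero gradient gives
    (H^T H + 2*c1*G^T G + 2*c2*I) z = 2*c1*G^T e.
    """
    H = np.hstack([P, np.ones((P.shape[0], 1))])
    G = np.hstack([N, np.ones((N.shape[0], 1))])
    A = H.T @ H + 2.0 * c1 * (G.T @ G) + 2.0 * c2 * np.eye(H.shape[1])
    rhs = 2.0 * c1 * G.sum(axis=0)  # equals 2*c1*G^T e for a vector of ones e
    z = np.linalg.solve(A, rhs)
    return z[:-1], z[-1]

def fit_plane(XA, XB, c1=1.0, c2=0.1, max_iter=50, tol=1e-6):
    """Alternate between the v-step and the u-step for one hyperplane.

    XA: array (mA, n1, n2), samples the plane should stay close to.
    XB: array (mB, n1, n2), samples of the other class.
    """
    u = np.ones(XA.shape[1])              # assumed non-zero starting vector
    v, b = np.ones(XA.shape[2]), 0.0
    for _ in range(max_iter):
        u_old = u.copy()
        # v-step: rows are u^T X_i, shape (m, n2).
        PA = np.einsum("p,ipq->iq", u, XA)
        PB = np.einsum("p,ipq->iq", u, XB)
        v, b = ridge_step(PA, PB, c1, c2)
        # u-step: rows are (X_i v)^T, shape (m, n1).
        u, b = ridge_step(XA @ v, XB @ v, c1, c2)
        if np.linalg.norm(u - u_old) <= tol * max(1.0, np.linalg.norm(u_old)):
            break
    return u, v, b
```

With \(c_2 > 0\) the system matrix is positive definite, so `np.linalg.solve` is always applicable; this is the practical effect of the regularization term on conditioning. The second hyperplane would be obtained by calling `fit_plane` with the roles of the two classes exchanged.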

The solution of (2) is obtained along similar lines. A new test point is assigned a class label based on the proximity criterion, as in LS-TWSTM [5].

3.2 Tree-based Structural Least Squares Twin Support Tensor Clustering

The Tree-SLSTWSTC algorithm creates a binary tree of clusters, partitioning the data at successive levels of the tree until the desired number of clusters is obtained. Unlike TWSVC [4], Tree-SLSTWSTC uses a symmetric squared loss function at each internal node, which handles the issue of premature convergence of the clustering framework. The proposed Tree-SLSTWSTC algorithm starts with initial labels (\(+1\), \(-1\)). Using the initial labels, the data X with m data matrices are divided into two clusters, A and B, of size \((n_1 \times n_2 \times m_1)\) and \((n_1 \times n_2 \times m_2)\) respectively (where \(m = m_1 + m_2\)). Each group is then partitioned further, individually, by considering the inter-cluster relationship; this generates more stable results in less time. Tree-SLSTWSTC is summarized in Algorithm 1.

[Algorithm 1: Tree-SLSTWSTC]
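
The top-down structure described above can be sketched as follows. This is only an outline under stated assumptions: `split_fn` is a hypothetical placeholder for the per-node step (initial labels followed by S-LSTWSTM refinement), `split_by_mean` is a toy stand-in used purely to make the example runnable, and the choice of always splitting the largest remaining cluster is an assumption rather than a rule taken from Algorithm 1.

```python
import numpy as np

def tree_cluster(X, n_clusters, split_fn):
    """Split clusters top-down until n_clusters groups exist.

    X: array (m, n1, n2). split_fn(X_subset) must return labels in {+1, -1}.
    Assumption: the largest remaining cluster is split next.
    """
    clusters = [np.arange(X.shape[0])]
    while len(clusters) < n_clusters:
        clusters.sort(key=len)
        idx = clusters.pop()                 # largest cluster
        labels = np.asarray(split_fn(X[idx]))
        left, right = idx[labels == 1], idx[labels == -1]
        if len(left) == 0 or len(right) == 0:
            clusters.append(idx)             # node cannot be split further
            break
        clusters.extend([left, right])
    out = np.empty(X.shape[0], dtype=int)
    for c, idx in enumerate(clusters):
        out[idx] = c
    return out

def split_by_mean(Xs):
    """Toy stand-in for a node classifier: threshold the mean intensity."""
    m = Xs.mean(axis=(1, 2))
    return np.where(m > np.median(m), 1, -1)

# Three groups of constant 2x2 matrices with values 0, 5 and 10.
X_demo = np.concatenate([np.full((2, 2, 2), val) for val in (0.0, 5.0, 10.0)])
print(tree_cluster(X_demo, 3, split_by_mean))
```

Because every split only touches the samples of one node, the cost of each level shrinks as the tree grows, which is consistent with the reduced learning time claimed for the tree structure.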

3.3 Initialization

In conventional plane-based clustering, the initial cluster labels are obtained by randomization, which is a highly unstable and inefficient technique. Here, we propose a novel tensor-based initialization algorithm which uses the Frobenius norm to measure the distance between two order-2 tensors (matrices). For example, the distance between two data points \(x^\alpha =x(n_1,n_2,1)\) and \(x^\beta =x(n_1,n_2,2)\) is calculated as

$$\begin{aligned} d(x^\alpha ,x^\beta )=\sqrt{\sum _{i=1}^{n_1}\sum _{j=1}^{n_2}(x^\alpha _{ij}-x^\beta _{ij})^2}. \end{aligned}$$
(7)
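
As a small worked example of Eq. (7), the distance between two \(2 \times 2\) matrices can be computed with NumPy. The function name `frobenius_distance` and the sample matrices are chosen for illustration only.

```python
import numpy as np

def frobenius_distance(A, B):
    """Distance of Eq. (7): square root of the summed squared entry differences."""
    return float(np.linalg.norm(A - B, ord="fro"))

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.0, 0.0], [0.0, 4.0]])
# The entries differ by 2 and 3 in two positions, so the distance is sqrt(4 + 9).
print(frobenius_distance(A, B))  # sqrt(13), about 3.606
```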

We have implemented Tensor k-means (Tk-means), which takes tensor data as input and returns the corresponding cluster labels, in the spirit of the vector-based k-means algorithm. As in traditional k-means, an iterative relocation algorithm is followed, which locally minimizes the mean squared error: each sample is assigned to its nearest centroid, the centroid of each cluster is updated, and the process is repeated until the labels converge, i.e. no further change in labels is detected.
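
A minimal sketch of such a Tensor k-means loop is given below. It follows the description above but is not the authors' code: the function name `tensor_kmeans`, the random choice of initial centroids among the samples, the iteration cap and the handling of empty clusters are assumptions.

```python
import numpy as np

def tensor_kmeans(X, k, max_iter=100, seed=0):
    """k-means on matrix samples X of shape (m, n1, n2) with the Frobenius distance."""
    rng = np.random.default_rng(seed)
    # Assumption: centroids start as k distinct samples drawn at random.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    labels = np.full(X.shape[0], -1)
    for _ in range(max_iter):
        # (m, k) matrix of Frobenius distances from each sample to each centroid.
        d = np.linalg.norm(X[:, None] - centroids[None], axis=(2, 3))
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # labels stopped changing
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:           # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels, centroids
```

For the binary splits used at each tree node, the call would use `k=2`, and the two resulting groups can be mapped to the labels \(+1\) and \(-1\).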

4 Experimental Results

To evaluate the performance of the proposed method, experiments were carried out on image datasets from face recognition and optical digit recognition systems. To assess the proposed work, we used metric accuracy [4] and learning time as the performance criteria.
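
The definition of metric accuracy is given in [4] and is not restated here. As an illustration only, the sketch below assumes a pairwise-agreement form of the measure (equivalent to the Rand index), in which two labelings are compared through their co-membership matrices so that the score does not depend on how clusters are numbered. The function name `pairwise_agreement` is hypothetical.

```python
import numpy as np

def pairwise_agreement(true_labels, pred_labels):
    """Fraction of ordered sample pairs (i != j) on which two labelings agree.

    Assumption: agreement means both labelings put the pair in the same
    cluster, or both put it in different clusters (Rand index).
    """
    t = np.asarray(true_labels)
    p = np.asarray(pred_labels)
    m = t.shape[0]
    same_true = t[:, None] == t[None, :]
    same_pred = p[:, None] == p[None, :]
    agree = (same_true == same_pred).sum() - m   # drop the m diagonal entries
    return agree / (m * m - m)

print(pairwise_agreement([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0, renumbering is ignored
```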

For comparison of our proposed approach against other algorithms, we implemented the conventional k-means and \(k\)-nearest neighbour graph (NNG) algorithms in the tensor framework. Further, to minimize the effect of randomization (in k-means) and of the value of k (in NNG), the experiments were performed multiple times, and the best results are reported.

Table 1. Clustering results on face recognition and optical digit recognition application

Table 1 summarizes the results of the experiments on the above-mentioned datasets. Tree-SLSTWSTC with k-means based initialization outperforms the other methods in terms of clustering performance as well as learning time. Table 1 also lists the clustering results obtained with other approaches. The accuracy of Tree-SLSTWSTC is significantly better than that of these methods, which, it should be noted, use a vector-based representation for clustering.

5 Conclusions

Based on the recently proposed LS-TWSTM, in this paper we have proposed a novel tree-based tensor clustering algorithm, namely Tree-based Structural Least Squares Twin Support Tensor Clustering (Tree-SLSTWSTC), which can deal directly with real-world matrix data (second order tensor space), resulting in improved generalization and reduced computational complexity. Moreover, it handles the premature convergence problem as it considers the structural risk associated with the data. For initializing cluster labels, we have proposed the Tensor k-means algorithm, which helps to overcome the instability incurred by random initialization. Experimental comparisons of the proposed approach against other related approaches on face recognition and handwritten image datasets establish the suitability of the proposed algorithms for dealing with tensor-based data directly (as direct image input).

In future, the application of the proposed approach to more challenging real-world problems with higher order tensor spaces, such as image segmentation and computer vision, can be explored.