
Multi-view local linear KNN classification: theoretical and experimental studies on image classification

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

When handling special multi-view scenarios in which the data from every view share the same set of features, two serious challenges may arise: (1) samples from different views of the same class can be less similar than samples from the same view but different classes, a situation that sometimes occurs locally in the training and/or testing phases; (2) training an explicit prediction model directly on such multi-view data becomes unreliable and may even be infeasible. In this study, we adopt the philosophy of the k-nearest-neighbor method (KNN) to circumvent the second challenge. Without training an explicit prediction model on the multi-view data, a new multi-view local linear k-nearest-neighbor method (MV-LLKNN) is developed to address both challenges and predict the label of each test sample. MV-LLKNN rests on two assumptions. The first, which is justified both theoretically and experimentally, is that any test sample can be well approximated by a linear combination of its neighbors in the multi-view training dataset. The second is that these neighbors exhibit a clustering property under a commonality-based similarity measure between the multi-view test sample and its multi-view neighbors, which avoids the first challenge. MV-LLKNN makes its prediction for a multi-view test sample cheaply by combining the off-the-shelf fast iterative shrinkage-thresholding algorithm (FISTA) with KNN. Our theoretical analysis and experimental results on real multi-view face datasets demonstrate the effectiveness of MV-LLKNN.
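Procedurally, the method described above reduces to two stages: per view, recover a weight vector over the multi-view training neighbors with a sparse, similarity-regularized solver (FISTA), and then fuse the per-class weight mass across views in KNN fashion. The following minimal Python sketch shows this flow; the names (predict_mv_llknn, solve_view, etc.) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def predict_mv_llknn(views, neighbor_views, labels, classes, solve_view):
    """High-level sketch of the two-stage prediction described above.

    views:          list of K test vectors x^k, one per view
    neighbor_views: list of K matrices A^k (columns = multi-view neighbors)
    labels:         array with the class label of each neighbor column
    solve_view:     placeholder for a per-view solver (e.g. FISTA) that
                    returns the weight vector w^k; see the appendix sketch
    """
    labels = np.asarray(labels)
    # Stage 1: per-view local linear coding of the test sample
    weights = [solve_view(A_k, x_k) for x_k, A_k in zip(views, neighbor_views)]
    # Stage 2: sum the weight mass of each class over all views, pick the max
    scores = {c: sum(float(w[labels == c].sum()) for w in weights) for c in classes}
    return max(scores, key=scores.get)
```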


References

  1. Cleuziou G, Exbrayat M, Martin L, Sublemontier J-H (2009) CoFKM: a centralized method for multiple-view clustering. In Proceedings 9th IEEE ICDM. pp. 752–757

  2. Huang X, Lei Z, Fan M, Wang X, Li SZ (2013) Regularized discriminative spectral regression method for heterogeneous face matching. IEEE Trans Image Process 22(1):353–362

  3. Kan M, Shan S, Zhang H et al (2016) Multi-view discriminant analysis. IEEE Trans Pattern Anal Mach Intell 38(1):188–194

  4. Ding Z, Fu Y (2014) Low-rank common subspace for multi-view learning. In: Proceedings IEEE ICDM. pp. 110–119

  5. Yu J, Rui Y, Tang YY, Tao D (2014) High-order distance-based multiview stochastic learning in image classification. IEEE Trans Cybern 44:2431–2442

  6. Jiang Y, Chung F-L, Wang S, Deng Z, Wang J, Qian P (2015) Collaborative fuzzy clustering from multiple weighted views. IEEE Trans Cybern 45(4):688–701

  7. Farquhar J, Hardoon D, Meng H, Shawe-Taylor J, Szedmak S (2006) Two view learning: SVM-2K, theory and practice. Adv Neural Inf Process Syst 18:355–362

  8. Sun S (2013) Multi-view Laplacian support vector machines. Lect Notes Artif Intell 41(4):209–222

  9. Zhu F, Shao L, Lin M (2013) Multi-view action recognition using local similarity random forests and sensor fusion. Pattern Recognit Lett 34(1):20–24

  10. Xu Z, Sun S (2010) An algorithm on multi-view AdaBoost. In: Proceedings of 17th International conference on neural information processing, pp. 355–362

  11. Peng J, Luo P, Guan Z, Fan J (2017) Graph-regularized multi-view semantic subspace learning. Int J Mach Learn Cybern 3(4):1–17

  12. Xia T, Tao D, Mei T, Zhang Y (2010) Multiview spectral embedding. IEEE Trans Syst Man Cybern B Cybern 40(6):1438–1446

  13. Tzortzis GF, Likas AC (2012) Kernel-based weighted multi-view clustering. In: Proceedings of the 2012 IEEE 12th international conference on data mining, pp. 675–684

  14. Tzortzis G, Likas A (2009) Convex mixture models for multi-view clustering. In: Proceedings of 19th international conference artificial neural networks, pp 205–214

  15. Zong L, Zhang X, Zhao L, Yu H, Zhao Q (2017) Multi-view clustering via multi-manifold regularized non-negative matrix factorization. Neural Netw 88:74–89

  16. Kakade SM, Foster DP (2007) Multi-view regression via canonical correlation analysis. In Proceedings of 20th annual conference on learning theory 2007, pp. 82–96

  17. Merugu S, Rosset S, Perlich C (2006) A new multi-view regression approach with an application to customer wallet estimation. In: Proceedings 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 656–661

  18. Kusakunniran W, Wu Q, Zhang J, Li H (2010) Support vector regression for multi-view gait recognition based on local motion feature selection. In: Proceedings IEEE conference CVPR, pp. 974–981

  19. Zhao J, Xie X, Xu X, Sun S (2017) Multi-view learning overview: recent progress and new challenges. Inform Fusion 38:43–54

  20. Zhang Z, Xu Y, Yang J, Li X, Zhang D (2015) A survey of sparse representation: algorithms and applications. IEEE Access 3:490–530

  21. Wright J, Yang AY, Ganesh A, Sastry S, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227

  22. Liu Q, Liu C (2017) A novel locally linear KNN method with applications to visual recognition. IEEE Trans Neural Netw Learn Syst 28(9):2010–2021

  23. Zheng H, Zhu J, Yang Z, Jin Z (2017) Effective micro-expression recognition using relaxed K-SVD algorithm. Int J Mach Learn Cybern 8(6):2043–2049

  24. Candès E, Romberg J (2007) Sparsity and incoherence in compressive sampling. Inverse Prob 23(3):969

  25. Lu X, Wu H, Yuan Y, Yan P, Li X (2013) Manifold regularized sparse NMF for hyperspectral unmixing. IEEE Trans Geosci Remote Sens 51(5):2815–2826

  26. Mao W, Wang J, Xue Z (2017) An ELM-based model with sparse-weighting strategy for sequential data imbalance problem. Int J Mach Learn Cybern 8(4):1333–1345

  27. Yang J, Yu K, Gong Y, Huang T (2009) Linear spatial pyramid matching using sparse coding for image classification. In: Proceedings IEEE conference CVPR, pp. 1794–1801

  28. Wang J, Yang J, Yu K, et al (2010) Locality-constrained linear coding for image classification. In: Proceedings of IEEE conference on CVPR, pp. 3360–3367

  29. Gao S, Tsang IW-H, Chia L-T (2013) Laplacian sparse coding, hypergraph Laplacian sparse coding, and applications. IEEE Trans Pattern Anal Mach Intell 35(1):92–104

  30. Deng W, Hu J, Guo J (2012) Extended SRC: undersampled face recognition via intraclass variant dictionary. IEEE Trans Pattern Anal Mach Intell 34(9):1864–1870

  31. Deng W, Hu J, Guo J (2013) In defense of sparsity based face recognition. In: Proceedings of IEEE conference CVPR, pp 399–406

  32. Zhang Q, Li B (2010) Discriminative K-SVD for dictionary learning in face recognition. In: Proceedings of IEEE conference on CVPR, pp. 2691–2698

  33. Nigam K, Ghani R (2000) Analyzing the effectiveness and applicability of co-training. In: Proceedings of 9th ACM conference CIKM, pp. 86–93

  34. Muslea I, Minton S, Knoblock C (2006) Active learning with multiple views. J Artif Intell Res 27:203–233

  35. Sun S, Jin F (2011) Robust co-training. Int J Pattern Recognit Artif Intell 25(07):1113–1126

  36. Huang C, Chung F-L, Wang S (2016) Multi-view L2-SVM and its multi-view core vector machine. Neural Netw 75:110–125

  37. Sun S, Chao G (2013) Multi-view maximum entropy discrimination. In: Proceedings of 23rd IJCAI, pp. 1706–1712

  38. Chao G, Sun S (2016) Alternative multi-view maximum entropy discrimination. IEEE Trans Neural Netw Learn Syst 27(07):1445–1456

  39. Chao G, Sun S (2016) Consensus and complementarity based maximum entropy discrimination for multi-view classification. Inf Sci 367:296–310

  40. Xu C, Tao D, Xu C (2013) A survey on multi-view learning. arXiv preprint arXiv:1304.5634

  41. Xia T, Tao D, Mei T, Zhang Y (2010) Multiview spectral embedding. IEEE Trans Syst Man Cybern Part B 40(6):1438–1446

  42. Xie B, Mu Y, Tao D, Huang K (2011) m-SNE: multiview stochastic neighbor embedding. IEEE Trans Syst Man Cybern Part B 41(4):1088–1096

  43. Han Y, Wu F, Tao D et al (2012) Sparse unsupervised dimensionality reduction for multiple view data. IEEE Trans Circuits Syst Video Technol 22(10):1485

  44. Jiang Z, Lin Z, Davis LS (2013) Label consistent K-SVD: learning a discriminative dictionary for recognition. IEEE Trans Pattern Anal Mach Intell 35(11):2651–2664

  45. Yang M, Zhang L, Feng X, Zhang D (2014) Sparse representation based Fisher discrimination dictionary learning for image classification. Int J Comput Vis 109(3):209–232

  46. Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley, New York

  47. Nesterov Y (2004) Introductory lectures on convex optimization: a basic course. Springer, New York

  48. Halldorsson GH, Benediktsson JA, Sveinsson JR (2003) Support vector machines in multisource classification. In: Proceedings IGARSS, Toulouse, France, Jul. 2003, pp. 2054–2056

  49. Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2(1):183–202

  50. Sim T, Baker S, Bsat M (2002) The CMU pose, illumination, and expression (PIE) database. In: Proceedings of the fifth IEEE international conference on automatic face and gesture recognition. IEEE, pp 46–51

  51. Cai D, He X, Han J (2007) Spectral regression for efficient regularized subspace learning. In: Proceedings of the 11th international conference on Computer Vision. IEEE, pp 1–8

  52. https://www.cl.cam.ac.uk/Research/DTG/attarchive/facedatabase.html

  53. Samaria FS, Harter AC (1994) Parameterisation of a stochastic model for human face identification. In: Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, pp. 138–142

  54. Sharmanska V, Quadrianto N, Lampert CH (2013) Learning to rank using privileged information. In: Proceedings of 14th IEEE ICCV, pp 825–832

  55. Motiian S, Piccirilli M, Adjeroh DA, Doretto G (2016) Information bottleneck learning using privileged information for visual recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, June 2016, pp 1496–1505

  56. Parambath SP, Usunier N, Grandvalet Y (2014) Optimizing F-measures by cost-sensitive classification. In: Proceedings NIPS, pp 2123–2131

  57. Jiang Y, Deng Z, Chung F-L, Wang S (2017) Realizing two-view TSK fuzzy classification system by using collaborative learning. IEEE Trans Syst Man Cybern 47(1):145–160

  58. Jiang Y, Deng Z, Chung F-L, Wang G, Qian P, Choi K-S, Wang S (2017) Recognition of epileptic EEG signals using a novel multiview TSK fuzzy system. IEEE Trans Fuzzy Syst 25(1):3–20

  59. Wang X, Lu S, Zhai J (2008) Fast fuzzy multicategory SVM based on support vector domain description. Int J Pattern Recognit 22(1):109–120

  60. Turk M, Pentland A (1991) Eigenfaces for recognition. J Cognit Neurosci 3(1):71–86

  61. Comaniciu D, Meer P (1999) Mean shift analysis and applications. In: Proceedings of 7th IEEE ICCV, pp 1197–1203

Author information

Corresponding author

Correspondence to Shitong Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Proof of Theorem 1.

Proof:

If \({\mathbf{w}}^{k}\) is the representation obtained by MV-LLKNN on the multi-view data, then \(J\left( {{\mathbf{w}}^{k} } \right) \le J\left( {\mathbf{0}} \right)\), i.e.,

$$\left\| {{\mathbf{x}}^{k} - {\mathbf{A}}^{k} {\mathbf{w}}^{k} } \right\|_{2}^{2} + \alpha \left\| {{\mathbf{w}}^{k} } \right\|_{1} + \beta \left\| {{\mathbf{w}}^{k} - \eta {\mathbf{s}}} \right\|_{2}^{2} \le \left\| {{\mathbf{x}}^{k} } \right\|^{2} + \beta \left\| {\eta {\mathbf{s}}} \right\|^{2}$$
(39)

From Eq. (39), using the normalization \(\left\| {{\mathbf{x}}^{k} } \right\|^{2} = 1\), we obtain

$$\left\| {{\mathbf{w}}^{k} - \eta {\mathbf{s}}} \right\|_{2} \le \left( {\frac{1}{\beta } + \left\| {\eta {\mathbf{s}}} \right\|^{2} } \right)^{{\frac{1}{2}}}$$
(40)

so \(\left\| {{\mathbf{w}}^{k} - \eta {\mathbf{s}}} \right\|_{2}\) is bounded by a small positive constant. That is to say, \({\mathbf{w}}^{k} \approx \eta {\mathbf{s}} + {\mathbf{const}}\).

The transformations in Eqs. (12) and (13) guarantee that each entry of \({\mathbf{w}}^{k}\) satisfies \(0 \le w_{i}^{k} \le 1\) and \(\sum\nolimits_{i = 1}^{m} {w_{i}^{k} } = 1\). It is worth noting that these transformations do not affect the classification results. Based on the similarity measure between the test sample and the training samples, MV-LLKNN+ and MV-LLKNN* are designed.
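For illustration only, a plausible realization of such a transformation is the following shift-and-rescale; the exact operations are those of Eqs. (12) and (13) in the main text and may differ in detail.

```python
import numpy as np

def normalize_weights(w):
    """Map a weight vector onto the simplex so that 0 <= w_i <= 1 and
    sum_i w_i = 1, as required above (assumed form, not necessarily
    identical to Eqs. (12)-(13))."""
    w = np.asarray(w, dtype=float)
    w_shifted = w - w.min()              # make every entry non-negative
    total = w_shifted.sum()
    if total == 0.0:                     # degenerate case: fall back to uniform weights
        return np.full_like(w, 1.0 / w.size)
    return w_shifted / total             # entries lie in [0, 1] and sum to 1
```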

For MV-LLKNN+, the decision rule can be approximated as follows:

$$\begin{aligned} c^{ * } & = \mathop {\arg \hbox{max} }\limits_{c} \sum\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {w_{i}^{k} } } \\ & \approx \mathop {\arg \hbox{max} }\limits_{c} \sum\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {\eta s_{i} + \text{const}} } \\ & \propto \mathop {\arg \hbox{max} }\limits_{c} \sum\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {\left( {1 - \frac{{\sum\limits_{l = 1}^{K} {\gamma_{l} \left\| {{\mathbf{x}}^{l} - {\mathbf{a}}_{i}^{l} } \right\|^{2} } }}{{2\sigma^{2} }}} \right)} } \\ \end{aligned}$$
(41)

In this study, let us consider the Epanechnikov kernel [61]: \(h\left( u \right) = \frac{3}{4}\left( {1 - u^{2} } \right)\). Then

$$\begin{aligned} c^{ * } = \mathop {\arg \hbox{max} }\limits_{c} \sum\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {w_{i}^{k} } } \hfill \\ \, \propto \mathop {\arg \hbox{max} }\limits_{c} \sum\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {\sum\limits_{l = 1}^{K} {h\left( {\frac{{{\mathbf{x}}^{l} - {\mathbf{a}}_{i}^{l} }}{\sigma }} \right)} } } \hfill \\ \end{aligned}$$
(42)

where \(\sum\nolimits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {\sum\nolimits_{l = 1}^{K} {h\left( {\frac{{{\mathbf{x}}^{l} - {\mathbf{a}}_{i}^{l} }}{\sigma }} \right)} }\) becomes the kernel density estimation of the conditional probability \(p\left( {{\mathbf{x}}^{k} \left| c \right.} \right)\left( {k = 1,2, \ldots ,K} \right)\).

Each view is assumed to be classified separately and independently. Therefore, if the prior probability \(p\left( c \right)\) is the same for all the classes, then

$$\begin{aligned} c^{ * } & = \mathop {\arg \hbox{max} }\limits_{c} \sum\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {w_{i}^{k} } } \\ & \approx \mathop {\arg \hbox{max} }\limits_{c} \sum\limits_{k = 1}^{K} {p\left( {{\mathbf{x}}^{k} \left| c \right.} \right)} \\ & \propto \mathop {\arg \hbox{max} }\limits_{c} \sum\limits_{k = 1}^{K} {p\left( {c\left| {{\mathbf{x}}^{k} } \right.} \right)} \left( {{\text{i}} . {\text{e}} . , {\text{ Bayes classifier}}} \right) \\ \end{aligned}$$
(43)
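As a concrete illustration of the sum-fusion rule of Eqs. (41)-(43), the sketch below accumulates the recovered weight mass per class over all views and returns the arg max. The data layout (a list of per-view weight vectors plus a shared neighbor-label array) is an assumption made for illustration, not the authors' code.

```python
import numpy as np

def predict_sum_fusion(weights_per_view, labels, classes):
    """MV-LLKNN+ style decision (Eq. (41)): sum, over all K views, the weight
    mass that falls in each class, then choose the class with the largest score."""
    labels = np.asarray(labels)
    scores = {c: 0.0 for c in classes}
    for w_k in weights_per_view:          # one weight vector per view
        for c in classes:
            scores[c] += float(np.sum(w_k[labels == c]))
    return max(scores, key=scores.get)
```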

Similarly, we have the following derivations for MV-LLKNN*:

$$\begin{aligned} c^{ * } & = \mathop {\arg \hbox{max} }\limits_{c} \prod\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {w_{i}^{k} } } \\ & \approx \mathop {\arg \hbox{max} }\limits_{c} \prod\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {\eta s_{i} + \text{const}} } \\ & \propto \mathop {\arg \hbox{max} }\limits_{c} \prod\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {\prod\limits_{l = 1}^{K} {\left( {1 - \frac{{\left\| {{\mathbf{x}}^{l} - {\mathbf{a}}_{i}^{l} } \right\|^{2} }}{{\sigma^{2} }}} \right)^{{\gamma_{l} }} } } } \\ \end{aligned}$$
(44)

Applying the Epanechnikov kernel again, we obtain

$$\begin{aligned} c^{ * } = \mathop {\arg \hbox{max} }\limits_{c} \prod\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {w_{i}^{k} } } \hfill \\ \, \propto \mathop {\arg \hbox{max} }\limits_{c} \prod\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {\prod\limits_{l = 1}^{K} {h\left( {\frac{{{\mathbf{x}}^{l} - {\mathbf{a}}_{i}^{l} }}{\sigma }} \right)} } } \hfill \\ \end{aligned}$$
(45)

Therefore, if the prior probability \(p\left( c \right)\) is the same for all the classes, then

$$\begin{aligned} c^{ * } & = \mathop {\arg \hbox{max} }\limits_{c} \prod\limits_{k = 1}^{K} {\sum\limits_{{{\mathbf{a}}_{i}^{k} \in {\mathbf{A}}_{c}^{k} }} {w_{i}^{k} } } \\ & \approx \mathop {\arg \hbox{max} }\limits_{c} \prod\limits_{k = 1}^{K} {p\left( {{\mathbf{x}}^{k} \left| c \right.} \right)} \\ & \propto \mathop {\arg \hbox{max} }\limits_{c} \prod\limits_{k = 1}^{K} {p\left( {c\left| {{\mathbf{x}}^{k} } \right.} \right)} \left( {{\text{i}} . {\text{e}} . , {\text{ Bayes classifier}}} \right) \\ \end{aligned}$$
(46)
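The product-fusion rule of Eqs. (44)-(46) differs only in how the per-view class scores are combined; under the same assumed layout as in the previous sketch:

```python
import numpy as np

def predict_product_fusion(weights_per_view, labels, classes):
    """MV-LLKNN* style decision (Eq. (44)): multiply the per-view class scores
    instead of summing them, then choose the class with the largest product."""
    labels = np.asarray(labels)
    scores = {c: 1.0 for c in classes}
    for w_k in weights_per_view:
        for c in classes:
            scores[c] *= float(np.sum(w_k[labels == c]))
    return max(scores, key=scores.get)
```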

In summary, from the perspective of density estimation, MV-LLKNN+ and MV-LLKNN* approximate the Bayes decision rule for minimum error; the approximation error mainly comes from the approximation \({\mathbf{w}}^{k} \approx \eta {\mathbf{s}} + {\mathbf{const}}\) and from the kernel density estimation error.

Appendix 2

Proof of Theorem 2.

Proof

Observe that Eq. (3) is equivalent to

$$\begin{aligned} \mathop {arg\hbox{min} }\limits_{{\left[ {w_{1}^{k} ,w_{2}^{k} , \ldots ,w_{m}^{k} } \right]^{T} }} J\left( {\left[ {w_{1}^{k} ,w_{2}^{k} , \ldots ,w_{m}^{k} } \right]^{T} } \right) & = \sum\limits_{k = 1}^{K} {\left\| {{\mathbf{x}}^{k} - \sum\limits_{i = 1}^{m} {w_{i}^{k} {\mathbf{a}}_{i}^{k} } } \right\|_{2}^{2} } \\ & \quad + \alpha \sum\limits_{k = 1}^{K} {\sum\limits_{i = 1}^{m} {\left| {w_{i}^{k} } \right|} } + \beta \sum\limits_{k = 1}^{K} {\sum\limits_{i = 1}^{m} {\left( {w_{i}^{k} - \eta s_{i} } \right)^{2} } } \\ \end{aligned}$$
(47)

Let \({\tilde{\mathbf{w}}}^{k} = \left[ {\tilde{w}_{1}^{k} ,\tilde{w}_{2}^{k} , \ldots ,\tilde{w}_{m}^{k} } \right]^{T}\) be the representation obtained by MV-LLKNN on the multi-view data. Taking the derivatives of \(J\) with respect to \(\tilde{w}_{i}^{k}\) and \(\tilde{w}_{j}^{k}\) gives:

$$\begin{aligned} \frac{\partial J}{{\partial \tilde{w}_{i}^{k} }} & = - 2\left( {{\mathbf{a}}_{i}^{k} } \right)^{T} \left( {{\mathbf{x}}^{k} - {\mathbf{A}}^{k} {\tilde{\mathbf{w}}}^{k} } \right) + \alpha sign\left( {\tilde{w}_{i}^{k} } \right) \\ & \quad + 2\beta \left( {\tilde{w}_{i}^{k} - \eta s_{i} } \right) \\ \end{aligned}$$
(48)
$$\begin{aligned} \frac{\partial J}{{\partial \tilde{w}_{j}^{k} }} & = - 2\left( {{\mathbf{a}}_{j}^{k} } \right)^{T} \left( {{\mathbf{x}}^{k} - {\mathbf{A}}^{k} {\tilde{\mathbf{w}}}^{k} } \right) + \alpha sign\left( {\tilde{w}_{j}^{k} } \right) \\ & \quad + 2\beta \left( {\tilde{w}_{j}^{k} - \eta s_{j} } \right) \\ \end{aligned}$$
(49)

Setting the above two derivatives to zero and noting that \({\text{sign}}(\tilde{w}_{i}^{k} ) = {\text{sign}}(\tilde{w}_{j}^{k} )\), the difference \(\frac{\partial J}{{\partial \tilde{w}_{i}^{k} }} - \frac{\partial J}{{\partial \tilde{w}_{j}^{k} }} = 0\) yields:

$$\begin{aligned} \beta \left( {\tilde{w}_{i}^{k} - \tilde{w}_{j}^{k} } \right) & = \left( {\left( {{\mathbf{a}}_{i}^{k} } \right)^{T} - \left( {{\mathbf{a}}_{j}^{k} } \right)^{T} } \right)\left( {{\mathbf{x}}^{k} - {\mathbf{A}}^{k} {\tilde{\mathbf{w}}}^{k} } \right) \\ & \quad + \beta \eta \left( {s_{i} - s_{j} } \right) \\ \end{aligned}$$
(50)

Since \(J\left( {{\tilde{\mathbf{w}}}^{k} } \right) \le J\left( {\mathbf{0}} \right)\) and \(\left\| {{\mathbf{x}}^{k} } \right\|^{2} = 1\), applying the triangle and Cauchy–Schwarz inequalities to Eq. (50) gives:

$$\begin{aligned} \left| {\tilde{w}_{i}^{k} - \tilde{w}_{j}^{k} } \right| & = \frac{1}{\beta }\left| {\left( {\left( {{\mathbf{a}}_{i}^{k} } \right)^{T} - \left( {{\mathbf{a}}_{j}^{k} } \right)^{T} } \right)\left( {{\mathbf{x}}^{k} - {\mathbf{A}}^{k} {\tilde{\mathbf{w}}}^{k} } \right) + \beta \eta \left( {s_{i} - s_{j} } \right)} \right| \\ & \le \frac{1}{\beta }\left| {\left( {\left( {{\mathbf{a}}_{i}^{k} } \right)^{T} - \left( {{\mathbf{a}}_{j}^{k} } \right)^{T} } \right)\left( {{\mathbf{x}}^{k} - {\mathbf{A}}^{k} {\tilde{\mathbf{w}}}^{k} } \right)} \right| + \eta \left| {s_{i} - s_{j} } \right| \\ & \le \frac{1}{\beta }\left\| {{\mathbf{a}}_{i}^{k} - {\mathbf{a}}_{j}^{k} } \right\|_{2} \left\| {{\mathbf{x}}^{k} - {\mathbf{A}}^{k} {\tilde{\mathbf{w}}}^{k} } \right\|_{2} + \eta \left| {s_{i} - s_{j} } \right| \\ & = \frac{1}{\beta }\sqrt {2\left( {1 - \delta_{k} } \right)} \left\| {{\mathbf{x}}^{k} - {\mathbf{A}}^{k} {\tilde{\mathbf{w}}}^{k} } \right\|_{2} + \eta \left| {s_{i} - s_{j} } \right| \\ \end{aligned}$$
(51)

Then, since

$$\left\| {{\mathbf{x}}^{k} - {\mathbf{A}}^{k} {\tilde{\mathbf{w}}}^{k} } \right\|_{2}^{2} \le \left\| {{\mathbf{x}}^{k} } \right\|^{2} + \beta \left\| {\eta {\mathbf{s}}} \right\|^{2}$$
(52)

we have

$$\begin{aligned} \left| {\tilde{w}_{i}^{k} - \tilde{w}_{j}^{k} } \right| & \le \frac{1}{\beta }\sqrt {2\left( {1 - \delta_{k} } \right)\left( {\left\| {{\mathbf{x}}^{k} } \right\|^{2} + \beta \eta^{2} \left\| {\mathbf{s}} \right\|^{2} } \right)} + \eta \left| {s_{i} - s_{j} } \right| \\ & = \frac{G}{\beta }\sqrt {2\left( {1 - \delta_{k} } \right)} + \eta \left| {s_{i} - s_{j} } \right| \\ \end{aligned}$$
(53)

where \(G = \sqrt {1 + \beta \eta^{2} \left\| {\mathbf{s}} \right\|^{2} }\).
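As a practical complement to the proofs above, each per-view subproblem in Eq. (47) is an \(\ell_1\)-plus-quadratic problem that FISTA [49] handles directly, which is how MV-LLKNN obtains \({\mathbf{w}}^{k}\) in practice. The sketch below is a minimal, assumed implementation (names such as solve_view_fista, A, x, s, alpha, beta, eta are illustrative), not the authors' released code.

```python
import numpy as np

def solve_view_fista(A, x, s, alpha, beta, eta, n_iter=200):
    """Minimal FISTA sketch for one view of Eq. (47):
        min_w ||x - A w||_2^2 + alpha * ||w||_1 + beta * ||w - eta * s||_2^2
    A: (d, m) matrix of neighbor columns, x: (d,) test view, s: (m,) similarity vector.
    """
    m = A.shape[1]
    # Lipschitz constant of the gradient of the smooth part
    L = 2.0 * np.linalg.norm(A, 2) ** 2 + 2.0 * beta
    w = np.zeros(m)
    z = w.copy()                                  # extrapolation point
    t = 1.0
    for _ in range(n_iter):
        # gradient of the smooth part at z
        grad = -2.0 * A.T @ (x - A @ z) + 2.0 * beta * (z - eta * s)
        v = z - grad / L
        # proximal step: soft-thresholding for the l1 term
        w_next = np.sign(v) * np.maximum(np.abs(v) - alpha / L, 0.0)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_next + ((t - 1.0) / t_next) * (w_next - w)
        w, t = w_next, t_next
    return w
```

In practice the recovered weights would then be normalized as in Eqs. (12)-(13) and fed to the fusion rules sketched in Appendix 1.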


Cite this article

Jiang, Z., Bian, Z. & Wang, S. Multi-view local linear KNN classification: theoretical and experimental studies on image classification. Int. J. Mach. Learn. & Cyber. 11, 525–543 (2020). https://doi.org/10.1007/s13042-019-00992-9
