Structure Fusion and Propagation for Zero-Shot Learning

Lin, Guangfeng; Chen, Yajun; Zhao, Fan

doi:10.1007/978-3-030-03338-5_39

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11258)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

The key to zero-shot learning (ZSL) is finding an information transfer model that bridges the gap between images and semantic information (texts or attributes). Existing ZSL methods usually construct the compatibility function between images and class labels by considering the relevance among semantic classes (the manifold structure of semantic classes). However, the relationship among image classes (the manifold structure of image classes) is also very important for constructing the compatibility model. Because unseen classes make this relationship difficult to capture, the manifold structure of image classes is often ignored in ZSL. To let the manifold structure of image classes and that of semantic classes complement each other, we propose structure fusion and propagation (SFP) to improve ZSL classification performance. SFP jointly considers the manifold structure of image classes and that of semantic classes in order to approximate the intrinsic structure of object classes. Moreover, SFP describes the constraint between the compatibility function and these manifold structures, which balances the influence of structure fusion and propagation across iterations. The SFP solution provides not only unseen class labels but also the relationship between the two manifold structures, which encodes the positive transfer in structure fusion and propagation. Experiments demonstrate that SFP attains promising results on the AwA, CUB, Dogs and SUN datasets.

Supported by NSFC (Program No. 61771386, Program No. 61671376 and Program No. 61671374), Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2016JM6045, Program No. 2017JZ020).

1 Introduction

Although deep learning [32], which depends on training with large-scale labeled data, is widely used for visual recognition [31], recognizing visual objects “in the wild” remains a daunting challenge. In specific applications it is impossible to collect data for every class when training a deep model, so the training classes (seen classes) and the testing classes (unseen classes) are often disjoint. The main idea of ZSL is to handle this problem by exploiting a transfer model built on the redundant relevance of the semantic description. To recognize unseen classes from seen classes, ZSL needs to face two challenges [3]. One is how to use semantic information to construct the relationship between unseen classes and seen classes; the other is how to find the compatibility among all kinds of information so as to obtain the optimal discriminative characteristics for unseen classes.

ZSL can bridge the gap between different domains and recognize unseen class objects through semantic embeddings of class labels. These semantic embeddings can come from vision (attributes [11]) or language information (text [25]) and can be obtained by manual annotation, machine learning [29] or data mining [5]. In terms of the transformation relationship between different embeddings, recent ZSL methods mainly fall into linear embedding, nonlinear embedding and similarity embedding. Linear embedding [1, 2, 7, 13, 24] applies a linear transformation between different embedding spaces to learn the relevance between unseen class objects and class labels. Nonlinear embedding [23, 25, 28] realizes a nonlinear mapping of the embedding space for building the compatibility function or classifier, which can be learned by deep networks [14, 30]. Similarity embedding [3, 9, 15, 19, 33] builds the classifier from similarity metrics, which mostly involve structure learning or class-wise similarities. In our approach, the similarity metric is extended from semantic space to image space. We attempt to find the relationship between the similarities (the manifold structures in the different spaces) in order to constrain the compatibility function, and we further capture the positive structure propagation to significantly improve unseen object classification.

In this paper, our motivation is inspired by structure fusion [16,17,18] for jointly dealing with the two challenges. The intrinsic manifold structure is crucial for object classification. In practice, however, we can only obtain observation data, which represent different aspects of the intrinsic manifold structure. To recover or approximate the intrinsic structure, we can fuse the various manifold structures derived from observation data. Based on this idea, we try to capture different manifold structures in image space and semantic space to improve the recognition performance on unseen classes in ZSL. We therefore construct the compatibility function for predicting labels of unseen classes by building the manifold structure of image classes. In addition, we attempt to find the relevance between the manifold structure of semantic classes and that of image classes in model space, which encodes the influence of negative and positive transfer and yields a better compatibility function for classifying unseen class objects. Model space, which corresponds to visual appearances, is the joint projection space of semantic space and image space and can preserve their respective manifold structures. Figure 1 illustrates the idea of the proposed method conceptually. SFP considers not only the semantic and image structures but also the positive structure propagation to improve unseen object classification, while SynC [3] focuses only on the manifold structure in semantic space for combining the base classifiers in ZSL.

Our main contribution is a novel idea with two aspects for recovering or approximating the intrinsic manifold structure from seen classes to unseen classes by fusing the manifold structures of different spaces, in order to handle the challenging recognition of unseen classes. Specifically, one aspect constructs the projected manifold structure for real and phantom classes in model space; the other constrains the compatibility function and the relationship between the manifold structures for positive structure propagation.

2 Structure Fusion and Propagation

In ZSL, we have a training data set $\mathscr {D}=\{(x_{n}\in R^{D},y_{n})\}_{n=1}^{N}$, in which $x_{n}$ is the image representation (it can be extracted by a deep model; the details are described in Table 1) and $y_{n}(n=1,...,N)$ is the class label in the seen class set $\mathscr {S}=\{1,2,...,S\}$. We denote the unseen class set as $\mathscr {U}=\{u|u=S+1,...,S+U\}$. $a_{c}\in R^{D}$ is the linear transformation vector of the class $c$.

2.1 Classification Model and Manifold Structure

We construct a pair-wise linear classifier [3] in the visual image feature space and determine an estimated label $\hat{y}$ for a feature $x$ by the following formula.

$$\begin{aligned} \hat{y}=\arg \max _{c}a_{c}^{T}x, \end{aligned} \tag{1}$$

Here, $a_{c}\in R^{D}$ is not only the transformation vector of the feature $x$ but also the representation of the class $c$ in the model. In other words, the above formula describes the pair-wise linear relation between the feature space and the class label space, which characterizes the class representation in the model.
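The decision rule in formula (1) can be illustrated with a minimal NumPy sketch. The function name `predict_labels`, the array layout (one class vector $a_{c}$ per row of `A`) and the toy numbers are assumptions made for illustration; they are not taken from the authors' code.

```python
import numpy as np

def predict_labels(A, X):
    """Return argmax_c a_c^T x for every row x of X.

    A: (C, D) array whose row c is the class vector a_c (assumed layout).
    X: (N, D) array of image features.
    """
    scores = X @ A.T              # scores[n, c] = a_c^T x_n
    return np.argmax(scores, axis=1)

# Toy data, invented for illustration: 3 classes, 2-dimensional features.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
X = np.array([[2.0, 0.5], [0.1, 3.0], [-1.0, -2.0]])
print(predict_labels(A, X))       # prints [0 1 2]
```

Each feature is assigned to the class whose vector gives the largest inner product, so the same vectors $a_{c}$ act both as classifier weights and as class representations.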

To measure the manifold structure, we compute the similarity of the related representations in a homogeneous space, which has the same scale and metric. To this end, we build one bipartite graph between unseen classes and seen classes in semantic space and another in image space (this space includes all image representations). In these bipartite graphs, nodes correspond to unseen classes or seen classes, and weighted edges connect unseen classes with seen classes. Because we focus on the transfer relation between unseen classes and seen classes, no connection exists within the unseen classes or within the seen classes. Let $G_{b}{<}V_{b},E_{b}{>}$ denote the manifold structure of semantic classes. Here, $V_{b}=V_{bs}\bigcup V_{bu}$ and $\emptyset =V_{bs}\bigcap V_{bu}$. $E_{b}$ includes the connections between $V_{bs}$ (the seen class set in semantic space) and $V_{bu}$ (the unseen class set in semantic space). Similarly, let $G_{x}{<}V_{x},E_{x}{>}$ denote the manifold structure of image classes. Here, $V_{x}=V_{xs}\bigcup V_{xu}$ and $\emptyset =V_{xs}\bigcap V_{xu}$. $E_{x}$ includes the connections between $V_{xs}$ (the seen class set in image space) and $V_{xu}$ (the unseen class set in image space). Therefore, the similarities in semantic space and image space are regarded as the weights between nodes, which are defined as follows.

$$\begin{aligned} w_{su}^{(b)}=\frac{\exp (-d(b_{s},b_{u}))}{\sum _{u=1}^{U}\exp (-d(b_{s},b_{u}))},w_{su}^{(x)}=\frac{\exp (-d(x_{s},x_{u}))}{\sum _{u=1}^{U}\exp (-d(x_{s},x_{u}))}, \end{aligned} \tag{2}$$

Here, $b_{s}$ and $x_{s}$ are respectively the semantic and image representations (the details are described in Table 1) of the seen class $s$, while $b_{u}$ and $x_{u}$ are respectively the semantic and image representations of the unseen class $u$. $w_{su}^{(b)}$ and $w_{su}^{(x)}$ are respectively the weights (the similarities) between the seen class $s$ and the unseen class $u$ in the semantic and image representation spaces. $d(b_{s},b_{u})$ and $d(x_{s},x_{u})$ are respectively the distance metrics [3] of the two spaces, defined as follows.

$$\begin{aligned} d(b_{s},b_{u})=(b_{s}-b_{u})^{T}\varSigma _{b}^{-1}(b_{s}-b_{u}),d(x_{s},x_{u})=(x_{s}-x_{u})^{T}\varSigma _{x}^{-1}(x_{s}-x_{u}), \end{aligned} \tag{3}$$

Here, $\varSigma _{b}=\sigma _{b}I$ can be learned from the semantic representation by cross-validation. (We alternately divide the training class set into two parts according to the proportion between the training class set and the test class set. One part is used to learn the model, and the other to validate it. The range of $\sigma _{b}$ is from $2^{-5}$ to $2^{5}$, and we select the value corresponding to the best result.) $\varSigma _{x}=\sigma _{x}I$ can be learned from the image representation by cross-validation with the same procedure as for $\sigma _{b}$. Unlike in semantic space, $x_{u}$ in image space is not determined because the classes are unseen, while $x_{s}$ can be obtained from training data by computing the mean of the seen class. Using the class center as the representation is simple and convenient to compute, and it reasonably preserves the basic characteristics of the image representation in accordance with the distribution of the class. $x_{u}$ can be obtained by pre-classification of the unseen classes (detailed in the next section).
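Formulas (2) and (3), together with the class-mean representation $x_{s}$, can be sketched in NumPy as follows. The helper names (`class_means`, `scaled_sq_dist`, `bipartite_weights`) and the toy data are hypothetical, and the unseen-class centers are simply given as an array here as a stand-in for the pre-classification step, which this sketch does not implement.

```python
import numpy as np

def class_means(X, y, classes):
    """Mean feature vector of each class in `classes` (rows follow that order)."""
    return np.stack([X[y == c].mean(axis=0) for c in classes])

def scaled_sq_dist(P, Q, sigma):
    """d(p, q) = (p - q)^T (sigma I)^{-1} (p - q) for all row pairs, as in formula (3)."""
    diff = P[:, None, :] - Q[None, :, :]
    return (diff ** 2).sum(axis=2) / sigma

def bipartite_weights(seen, unseen, sigma):
    """w[s, u] = exp(-d(s, u)) / sum_u' exp(-d(s, u')), as in formula (2)."""
    d = scaled_sq_dist(seen, unseen, sigma)
    d = d - d.min(axis=1, keepdims=True)   # per-row shift; leaves the ratios unchanged
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

# Toy image features for two seen classes (values invented for illustration).
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [0.0, 6.0]])
y = np.array([0, 0, 1, 1])
x_seen = class_means(X, y, classes=[0, 1])
# Stand-in for unseen-class centers obtained by pre-classification (assumed given).
x_unseen = np.array([[1.0, 1.0], [0.0, 4.0]])
W_x = bipartite_weights(x_seen, x_unseen, sigma=1.0)
print(W_x)
```

Each row of the weight matrix sums to one over the unseen classes, so a seen class distributes its similarity mass across the unseen classes it is connected to. The same function would apply to semantic representations $b_{s}$, $b_{u}$ with $\sigma _{b}$.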

In (1), $a_{c}$ is the transformation vector and also the class representation in model space. In (2), $b_{s}$ and $b_{u}$ are the class representations in semantic space, while $x_{s}$ and $x_{u}$ are the class representations in image space. We expect to link these spaces through $v_{s}$ and $v_{u}$, which are respectively the phantom classes of the seen and unseen classes in the model. To preserve the manifold structure of the two bipartite graphs and to align the image, semantic and model spaces, we build the following optimization formula, which minimizes the distortion error.

$$\begin{aligned} (a_{c},v_{u},\varvec{\beta })=\arg \min _{a_{c},v_{u},\varvec{\beta }}&\Vert a_{c}-\sum _{u=1}^{U}\varvec{\beta }^{T}\left[ \begin{matrix} w_{su}^{(x)}&{}w_{su}^{(b)}\\ \end{matrix} \right] ^{T}v_{u}-\sum _{s=1}^{S}\varvec{\gamma }^{T}\left[ \begin{matrix} w_{ss}^{(x)}&{}w_{ss}^{(b)}\\ \end{matrix} \right] ^{T}v_{s}\Vert _{2}^{2},\\ s.t. \quad&\varvec{\beta }^{T}\varvec{\mathbf {1}}=1,\varvec{\gamma }^{T}\varvec{\mathbf {1}}=1,0\le \beta _{i}\le 1,0\le \gamma _{i}\le 1 \quad (i=1,2) \end{aligned} \tag{4}$$

Here, $\varvec{\beta }=\left[ \begin{matrix} \beta _{1} &{}\beta _{2}\\ \end{matrix} \right] ^{T}$, $\varvec{\gamma }=\left[ \begin{matrix} \gamma _{1} &{}\gamma _{2}\\ \end{matrix} \right] ^{T}$, and $\varvec{\mathbf {1}}=\left[ \begin{matrix} 1 &{}1\\ \end{matrix} \right] ^{T}$. Because no connection exists within the unseen classes or within the seen classes in the two bipartite graphs, $w_{ss}^{(b)}=0$ and $w_{ss}^{(x)}=0$. The analytical solution of (4) gives the relation between $a_{c}$ and $v_{u}$.

$$\begin{aligned} a_{c}=&\sum _{u=1}^{U}\varvec{\beta }^{T}\left[ \begin{matrix} w_{su}^{(x)}&{}w_{su}^{(b)}\\ \end{matrix} \right] ^{T}v_{u},\\ s.t. \quad&\varvec{\beta }^{T}\varvec{\mathbf {1}}=1,0\le \beta _{i}\le 1 \quad (i=1,2) \end{aligned} \tag{5}$$

here, $\forall c\in \{1,2,...,S+U\}$.
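Formula (5) expresses each class vector as a combination of phantom classes whose coefficients are a convex mix of the image-space and semantic-space weights. A minimal matrix-form sketch follows; the function name `fuse_classifiers`, the stacking of the weights into matrices with one row per class, and the toy values are assumptions for illustration.

```python
import numpy as np

def fuse_classifiers(W_x, W_b, V, beta):
    """Row c of the result is sum_u (beta_1 * w_cu^(x) + beta_2 * w_cu^(b)) * v_u.

    W_x, W_b: (C, U) weight matrices from image and semantic space.
    V:        (U, D) array whose row u is the phantom class v_u.
    beta:     two non-negative fusion weights that sum to 1.
    """
    beta = np.asarray(beta, dtype=float)
    if beta.shape != (2,) or np.any(beta < 0) or not np.isclose(beta.sum(), 1.0):
        raise ValueError("beta must hold two non-negative weights that sum to 1")
    fused = beta[0] * W_x + beta[1] * W_b   # fused bipartite weights
    return fused @ V

# Toy values, invented for illustration.
W_x = np.array([[0.8, 0.2], [0.3, 0.7]])
W_b = np.array([[0.6, 0.4], [0.1, 0.9]])
V = np.eye(2)
A = fuse_classifiers(W_x, W_b, V, beta=[0.5, 0.5])
print(A)
```

Setting `beta=[0.0, 1.0]` keeps only the semantic-space weights, while intermediate values blend the two manifold structures.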

2.2 Phantom Classes and Structure Relation Learning

To obtain the phantom classes $v_{u}(u=1,...,U)$ and the weight coefficient vector $\varvec{\beta }$ of the manifold structures, we further reformulate the optimization formula for the one-versus-other classifier [3].

$$\begin{aligned} (v_{1},...,v_{U},\varvec{\beta })=&\arg \min _{v_{1},...,v_{U},\varvec{\beta }}\sum _{c=1}^{S}\sum _{n=1}^{N}\ell (x_{n},\mathbb {I}_{y_{n},c},a_{c})\\&+\frac{\lambda }{2}\sum _{c=1}^{S}\Vert a_{c}\Vert _{2}^{2}+\frac{\gamma }{2}\Vert \beta _{1}W^{x}-\beta _{2}W^{b}\Vert _{2}^{2},\\&s.t. \quad a_{c}=\sum _{u=1}^{U}\varvec{\beta }^{T}\left[ \begin{matrix} w_{su}^{(x)}&{}w_{su}^{(b)}\\ \end{matrix} \right] ^{T}v_{u},\\&\varvec{\beta }^{T}\varvec{\mathbf {1}}=1,0\le \beta _{i}\le 1 \quad (i=1,2) \end{aligned} \tag{6}$$

Here, $w_{su}^{(x)}$ is an element of the matrix $W^{x}$, and $w_{su}^{(b)}$ is an element of the matrix $W^{b}$. The first term of formula (6) is the squared hinge loss, which is defined as $\ell (x_{n},\mathbb {I}_{y_{n},c}, a_{c})=\max (0,1-\mathbb {I}_{y_{n},c}a_{c}^{T}x_{n})^{2}$. $\mathbb {I}_{y_{n},c}\in \{-1,1\}$ indicates whether or not $y_{n}=c$. The second term of formula (6) is a regularization term on $a_{c}$, which avoids over-fitting of the pair-wise linear classifier that models the relationship between the class label and the image representation. The third term of formula (6) is the constraint on the manifold structure similarity, which prevents negative structure propagation in image space. Formula (6) can be minimized by alternating optimization with respect to $\{v_{u}\}_{u=1}^{U} $ and $\varvec{\beta }$, solving a quadratic programming problem at each step.
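To make the three terms of formula (6) concrete, the sketch below evaluates the objective for given phantom classes and fusion weights, with the equality constraint on $a_{c}$ substituted in. It assumes zero-based seen-class labels, a squared hinge loss, and the Frobenius norm for the matrix term; the function name `sfp_objective` and the toy values are hypothetical.

```python
import numpy as np

def sfp_objective(V, beta, W_x, W_b, X, y, lam, gamma):
    """Value of formula (6) for phantom classes V and fusion weights beta.

    V: (U, D); beta: length 2; W_x, W_b: (S, U); X: (N, D);
    y: (N,) integer labels in 0..S-1 (zero-based labels are an assumption).
    """
    A = (beta[0] * W_x + beta[1] * W_b) @ V        # a_c from the constraint, (S, D)
    S = A.shape[0]
    signs = np.where(y[:, None] == np.arange(S)[None, :], 1.0, -1.0)  # I_{y_n, c}
    hinge = np.maximum(0.0, 1.0 - signs * (X @ A.T))
    loss = np.sum(hinge ** 2)                      # squared hinge loss over n and c
    reg = 0.5 * lam * np.sum(A ** 2)               # (lambda / 2) * sum_c ||a_c||^2
    structure = 0.5 * gamma * np.sum((beta[0] * W_x - beta[1] * W_b) ** 2)
    return loss + reg + structure

# Toy values chosen so the result can be checked by hand.
I2 = np.eye(2)
value = sfp_objective(V=I2, beta=[0.5, 0.5], W_x=I2, W_b=I2,
                      X=I2, y=np.array([0, 1]), lam=0.1, gamma=0.3)
print(value)
```

In this toy case the fused weights equal the identity, so $a_{c}=v_{c}$; the two off-class pairs each contribute a loss of 1, the regularizer contributes 0.1, and the structure term vanishes because $\beta _{1}W^{x}=\beta _{2}W^{b}$.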

To depict the whole process of the structure fusion and propagation mechanism, we show the pseudo code of the proposed SFP algorithm in Algorithm 1.
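The alternating scheme can be sketched as a short NumPy loop. This is not the authors' Algorithm 1: for brevity it replaces the quadratic-programming solvers with backtracking gradient descent for the phantom classes and a grid scan over $\beta _{1}$ (with $\beta _{2}=1-\beta _{1}$), and it assumes zero-based labels and a squared hinge loss. All function names and toy values are hypothetical.

```python
import numpy as np

def objective(V, b1, W_x, W_b, X, signs, lam, gamma):
    """Formula (6) with beta = (b1, 1 - b1) and a_c substituted from the constraint."""
    A = (b1 * W_x + (1.0 - b1) * W_b) @ V
    hinge = np.maximum(0.0, 1.0 - signs * (X @ A.T))
    return (np.sum(hinge ** 2) + 0.5 * lam * np.sum(A ** 2)
            + 0.5 * gamma * np.sum((b1 * W_x - (1.0 - b1) * W_b) ** 2))

def grad_V(V, b1, W_x, W_b, X, signs, lam):
    """Gradient of the objective with respect to the phantom classes V."""
    W = b1 * W_x + (1.0 - b1) * W_b
    A = W @ V
    hinge = np.maximum(0.0, 1.0 - signs * (X @ A.T))
    grad_A = -2.0 * (signs * hinge).T @ X + lam * A
    return W.T @ grad_A

def alternate(W_x, W_b, X, y, lam, gamma, n_outer=10, n_inner=50):
    S, U = W_x.shape
    signs = np.where(y[:, None] == np.arange(S)[None, :], 1.0, -1.0)
    args = (W_x, W_b, X, signs, lam, gamma)
    V = np.zeros((U, X.shape[1]))
    b1, grid = 0.5, np.linspace(0.0, 1.0, 21)
    history = [objective(V, b1, *args)]
    for _ in range(n_outer):
        # V-step: beta fixed, backtracking gradient descent (stand-in for a QP solver).
        step = 1.0
        for _ in range(n_inner):
            f0 = objective(V, b1, *args)
            g = grad_V(V, b1, W_x, W_b, X, signs, lam)
            trial = V - step * g
            while step > 1e-12 and objective(trial, b1, *args) > f0:
                step *= 0.5
                trial = V - step * g
            if step <= 1e-12:
                break                  # no decreasing step found
            V = trial
            step *= 2.0                # let the step grow again after a success
        # beta-step: V fixed, scan beta_1 on a grid (stand-in for a QP solver).
        values = [objective(V, b, *args) for b in grid]
        b1 = float(grid[int(np.argmin(values))])
        history.append(min(values))
    return V, np.array([b1, 1.0 - b1]), history

# Toy problem, invented for illustration: 2 seen classes, 2 phantom classes.
X = np.array([[2.0, 0.0], [1.5, 0.5], [0.0, 2.0], [0.5, 1.5]])
y = np.array([0, 0, 1, 1])
W_x = np.array([[0.8, 0.2], [0.3, 0.7]])
W_b = np.array([[0.6, 0.4], [0.1, 0.9]])
V, beta, history = alternate(W_x, W_b, X, y, lam=0.01, gamma=0.1)
print(beta, history[0], history[-1])
```

Because every accepted step in either sub-problem cannot increase the objective, the recorded `history` is non-increasing, which mirrors the fusion-then-propagation alternation described in the text.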

2.3 Complexity Analysis

Formula (6) can be solved by alternating quadratic programming, whose complexity consists of two parts. In the first part, when $\varvec{\beta }$ is fixed, formula (6) is a quadratic programming problem in $\{v_{u}\}_{u=1}^{U} $, whose worst-case complexity is $O(U^{3})$. In the second part, when $\{v_{u}\}_{u=1}^{U} $ is fixed, formula (6) is a quadratic programming problem in $\varvec{\beta }$, whose worst-case complexity is $O(k^{3})$ ($k$ is the dimension of $\varvec{\beta }$). Given that the proposed SFP algorithm needs $P$ iterations, its complexity is $O(PU^{3}+Pk^{3})$.
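As a rough worked instance of this bound, assume hypothetical values of $U=10$ unseen classes and $P=20$ iterations (illustrative numbers, not taken from the experiments), with $k=2$ because $\varvec{\beta }$ has two components:

$$\begin{aligned} PU^{3}+Pk^{3}=20\cdot 10^{3}+20\cdot 2^{3}=20000+160=20160. \end{aligned}$$

Under these assumptions the step that solves for $\{v_{u}\}_{u=1}^{U}$ dominates the cost by a factor of $U^{3}/k^{3}=125$.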

3 Experiment

3.1 Datasets

To evaluate the proposed SFP algorithm (source code: see Note 1), we carry out experiments on four challenging datasets: Animals with Attributes (AwA) [12], CUB-200-2011 Birds (CUB) [27], Stanford Dogs (Dogs) [4], and SUN Attribute (SUN) [21]. These datasets can be used for fine-grained recognition (CUB and Dogs) or non-fine-grained recognition (AwA and SUN) in ZSL. In semantic space, AwA and CUB are each described by att [6], w2v [20], glo [22] and hie [1], while Dogs is represented by w2v [20], glo [22] and hie [1]. SUN is described only by att [6]. Table 1 provides the statistics and the extracted features for these datasets. In addition, for convenient comparison with state-of-the-art methods, we adopt the image features provided by [1].

Table 1. Dataset statistics and extracted features used in the experiments.

3.2 Comparison with the Baseline Methods

In this paper, three methods that also mine the semantic structure serve as baselines for comparison with the proposed SFP method. The first is structured joint embedding (SJE) [1], which builds a bilinear compatibility function that takes the structured output space into account to predict the label of the unseen class. The second is the latent embedding model (LatEm) [28], which constructs a pair-wise bilinear (nonlinear) compatibility function with model number selection for recognizing unseen classes. The third is synthesized classifiers (SynC) [3], which builds a nonlinear compatibility function with the manifold structure in semantic space for combining the base classifiers in ZSL. Table 2 shows that the proposed SFP method greatly outperforms the other three methods.

3.3 Classification and Validation Protocols

Classification accuracy is the average of the per-class accuracies over all test classes in each dataset. The learned model involves four parameters: $\lambda $ and $\gamma $ in formula (6), and $\sigma _{b}$ and $\sigma _{x}$ in formula (3). We alternately divide the training class set into two parts according to the proportion between the training class set and the test class set. One part is used to learn the model, and the other to validate it. First, we set $\sigma _{b}$ and $\sigma _{x}$ to 1 and select by cross-validation the $\gamma $ and $\lambda $ that give the best result, searching both $\gamma $ and $\lambda $ from $2^{-24}$ to $2^{-9}$. Second, we select by cross-validation the $\sigma _{b}$ and $\sigma _{x}$ that give the best result, searching both from $2^{-5}$ to $2^{5}$.
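The evaluation metric and the two-stage parameter search can be sketched in plain Python. The grid spacing (integer exponents), the joint search over each parameter pair, and the callback `validate`, which stands for training on one part of the seen classes and scoring on the other, are assumptions; the toy callback below only makes the example runnable.

```python
import itertools
import math

def per_class_accuracy(y_true, y_pred):
    """Mean of the per-class accuracies over the classes present in y_true."""
    accuracies = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        accuracies.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(accuracies) / len(accuracies)

def powers_of_two(lo, hi):
    """[2**lo, ..., 2**hi] with integer exponents (the step of 1 is an assumption)."""
    return [2.0 ** e for e in range(lo, hi + 1)]

def two_stage_search(validate):
    """validate(lam, gamma, sigma_b, sigma_x) -> validation accuracy (hypothetical)."""
    reg_grid = powers_of_two(-24, -9)
    sigma_grid = powers_of_two(-5, 5)
    # Stage 1: fix sigma_b = sigma_x = 1 and search lambda and gamma.
    lam, gamma = max(itertools.product(reg_grid, reg_grid),
                     key=lambda p: validate(p[0], p[1], 1.0, 1.0))
    # Stage 2: keep lambda and gamma and search sigma_b and sigma_x.
    sigma_b, sigma_x = max(itertools.product(sigma_grid, sigma_grid),
                           key=lambda p: validate(lam, gamma, p[0], p[1]))
    return lam, gamma, sigma_b, sigma_x

def toy_validate(lam, gamma, sigma_b, sigma_x):
    """Made-up score that peaks at lam=2^-12, gamma=2^-20, sigma_b=4, sigma_x=1/8."""
    return -(abs(math.log2(lam) + 12) + abs(math.log2(gamma) + 20)
             + abs(math.log2(sigma_b) - 2) + abs(math.log2(sigma_x) + 3))

best = two_stage_search(toy_validate)
print(best)
print(per_class_accuracy([0, 0, 0, 1], [0, 0, 1, 1]))
```

Averaging per class rather than per sample keeps classes with few test images from being outweighed by larger ones, which is why the second example yields the mean of 2/3 and 1 rather than 3/4.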

Table 2. Comparison of the SFP method with SJE [1], LatEm [28] and SynC [3] in each semantic space. Average per-class Top-1 accuracy (%) on unseen classes is reported under the same data configurations, with the same images and semantic features in AwA. w: the fusion includes att, w2v, glo and hie; w/o: the fusion includes w2v, glo and hie.

3.4 Structure Fusion and Propagation with the Iteration

The main idea of the proposed SFP method has three parts. First, the manifold structure of images is considered when constructing the compatibility function between the class label and the visual feature. Second, the relationship between the multiple manifold structures is found in order to boost the influence of the positive structure. Third, and most importantly, the positive structure is propagated and the multiple manifold structures are fused through iterative computation. Therefore, we carry out an experiment on AwA to evaluate the effect of the iteration on the structure evolution. The recognition accuracy indicates how well the class manifold structure is approximated. In other words, a higher recognition accuracy corresponds to a closer match between the reconstructed manifold structure and the intrinsic manifold structure of the classes. Figure 2 shows how the recognition accuracy changes with the iteration. At the beginning, the recognition accuracy increases rapidly with the iteration and then reaches a stable state. This means that structure fusion and propagation with iteration improves the recognition accuracy and finally reaches the best state.

3.5 Comparison with State-of-the-Arts

In terms of how the image data of unseen classes are used in testing, ZSL methods can be divided into two categories: inductive ZSL and transductive ZSL. Inductive ZSL methods process unseen samples serially without considering the underlying manifold structure of the unseen samples [1, 3, 28, 33], while transductive ZSL methods usually use the manifold structure of unseen samples to improve ZSL performance [8, 10, 15]. SFP finds the structure of unseen classes in image feature space to enhance the transfer model between seen and unseen classes, so SFP is a transductive ZSL method. For a fair comparison, we use deep image features based on GoogleNet [26] for all compared methods, which include our method, one transductive ZSL method (DMaP [15]), and three inductive ZSL methods (SJE [1], LatEm [28] and SynC [3]). To the best of our knowledge, these are state-of-the-art methods for ZSL. Table 3 shows their results for ZSL on three benchmark datasets. SFP outperforms the state-of-the-art methods in most cases; the exception is DMaP on CUB. DMaP focuses on the consistency of the manifold structure between the semantic representation and the image feature and can better distinguish fine-grained classes. SFP lets the manifold structures of the semantic representation and the image feature complement each other and better recognizes coarse-grained classes. Therefore, integrating the two ideas is expected to further improve ZSL performance in future work.

Table 3. Comparison of the SFP method with state-of-the-art methods for ZSL. Average per-class Top-1 accuracy (%) on unseen classes is reported under the same data configurations. ‘+’ indicates the fusion operation.

3.6 Experimental Result Analysis

From the above experiments, we can attain the following observations.

The semantic descriptions contribute differently to classifying unseen classes. The supervised attributes tend to obtain better recognition performance than the unsupervised semantic representations (w2v, glo and hie) on AwA and CUB. Among the unsupervised semantic representations, the recognition accuracy of w2v or glo is better than that of hie on AwA and CUB, but hie is superior to w2v or glo on Dogs. This is mainly due to the flexibility and uncertainty of semantic representations learned in an unsupervised way.
The performance of SFP is better than that of the other three methods (SJE, LatEm and SynC). However, the improvement differs across datasets. An obvious improvement is found on AwA, Dogs and SUN, while only a slight improvement is shown on CUB. The main reason is that, depending on the data, the positive structure is propagated more or less effectively in the optimization.
SFP emphasizes the complementarity of different manifold structures, while DMaP focuses on the consistency of the various manifold structures. Therefore, SFP is superior to DMaP on AwA and Dogs, where structure complementarity plays an important role in learning the transfer model, and DMaP is better than SFP on CUB, where structure consistency is key for classifying unseen classes.
SFP performs better with positive structure fusion and propagation. SFP has demonstrated great promise in the above experiments owing to its consideration of multiple manifold structures and its alternating optimization between weight computation and manifold structure estimation for ZSL.
The proposed fusion method attains better performance than the non-fusion method because the fused representations appropriately complement each other. The fused configurations (w or w/o) always perform better on AwA, CUB and Dogs.

4 Conclusion

We have proposed a new ZSL method called structure fusion and propagation (SFP). This method not only directly models the relevance among the manifold structures in semantic and image space but also dynamically propagates the positive structure through alternating iterations. Specifically, the proposed SFP method includes four parts. First, a nonlinear model constructs the mapping relationship between the class label and the visual image representation. Second, a graph describes the relevance between seen classes and unseen classes in semantic or image space. Third, a loss function expresses the constraint relationship of the multiple manifold structures to balance the structure dependence. Last, structure fusion and propagation is implemented by alternating iteratively between solving for the phantom classes and solving for the weights. To evaluate the proposed SFP, we carry out experiments on AwA, CUB, Dogs and SUN. Experimental results show that SFP obtains promising results for ZSL.

Notes

1. Source code: https://github.com/lgf78103/Structure-propagation-for-zero-shot-learning.

References

1. Akata, Z., Reed, S., Walter, D., Lee, H., Schiele, B.: Evaluation of output embeddings for fine-grained image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2927–2936 (2015)
2. Akata, Z., Perronnin, F., Harchaoui, Z., Schmid, C.: Label-embedding for image classification. IEEE Trans. Pattern Anal. Mach. Intell. 38(7), 1425–1438 (2016)
3. Changpinyo, S., Chao, W.L., Gong, B., Sha, F.: Synthesized classifiers for zero-shot learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5327–5336 (2016)
4. Deng, J., Krause, J., Fei-Fei, L.: Fine-grained crowdsourcing for fine-grained recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587 (2013)
5. Elhoseiny, M., Saleh, B., Elgammal, A.: Write a classifier: zero-shot learning using purely textual descriptions. In: IEEE International Conference on Computer Vision (ICCV), pp. 2584–2591 (2013)
6. Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1778–1785 (2009)
7. Frome, A., et al.: DeViSE: a deep visual-semantic embedding model. In: Advances in Neural Information Processing Systems (NIPS), pp. 2121–2129 (2013)
8. Fu, Y., Hospedales, T.M., Xiang, T., Gong, S.: Transductive multi-view zero-shot learning. IEEE Trans. Pattern Anal. Mach. Intell. 37(11), 2332–2345 (2015)
9. Fu, Z., Xiang, T.A., Kodirov, E., Gong, S.: Zero-shot object recognition by semantic manifold distance. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2635–2644 (2015)
10. Kodirov, E., Xiang, T., Fu, Z., Gong, S.: Unsupervised domain adaptation for zero-shot learning. In: IEEE International Conference on Computer Vision (ICCV), pp. 2452–2460 (2015)
11. Lampert, C.H., Nickisch, H., Harmeling, S.: Learning to detect unseen object classes by between-class attribute transfer. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 951–958 (2009)
12. Lampert, C.H., Nickisch, H., Harmeling, S.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Pattern Anal. Mach. Intell. 36(3), 453–465 (2014)
13. Li, X., Guo, Y., Schuurmans, D.: Semi-supervised zero-shot classification with label representation learning. In: IEEE International Conference on Computer Vision (ICCV), pp. 4211–4219 (2016)
14. Li, Y., Zhang, J., Zhang, J., Huang, K.: Discriminative learning of latent features for zero-shot recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7463–7471 (2018)
15. Li, Y., Wang, D., Hu, H., Lin, Y., Zhuang, Y.: Zero-shot recognition using dual visual-semantic mapping paths. arXiv preprint arXiv:1703.05002 (2017)
16. Lin, G., Fan, C., Zhu, H., Miu, Y., Kang, X.: Visual feature coding based on heterogeneous structure fusion for image classification. Inf. Fusion 36, 275–283 (2017)
17. Lin, G., Fan, G., Kang, X., Zhang, E., Yu, L.: Heterogeneous feature structure fusion for classification. Pattern Recognit. 53, 1–11 (2016)
18. Lin, G., Liao, K., Sun, B., Chen, Y., Zhao, F.: Dynamic graph fusion label propagation for semi-supervised multi-modality classification. Pattern Recognit. 68, 14–23 (2017)
19. Mensink, T., Gavves, E., Snoek, C.G.M.: Costa: co-occurrence statistics for zero-shot classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2441–2448 (2014)
20. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems (NIPS), pp. 3111–3119 (2013)
21. Patterson, G., Xu, C., Su, H., Hays, J.: The sun attribute database: beyond categories for deeper scene understanding. Int. J. Comput. Vis. 108(1), 59–81 (2014)
22. Pennington, J., Socher, R., Manning, C.: Glove: global vectors for word representation. In: Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
23. Qi, G.J., Liu, W., Aggarwal, C., Huang, T.S.: Joint intermodal and intramodal label transfers for extremely rare or unseen classes. IEEE Trans. Pattern Anal. Mach. Intell. PP(99), 1 (2016). https://doi.org/10.1109/TPAMI.2016.2587643
24. Romera-Paredes, B., Torr, P.H.: An embarrassingly simple approach to zero-shot learning. In: International Conference on Machine Learning (ICML), pp. 2152–2161 (2015)
25. Socher, R., Ganjoo, M., Sridhar, H., Bastani, O., Manning, C.D., Ng, A.Y.: Zero-shot learning through cross-modal transfer. In: Advances in Neural Information Processing Systems (NIPS), pp. 935–943 (2013)
26. Szegedy, C., et al.: Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015)
27. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-ucsd birds200-2011 dataset. California Institute of Technology (2011)
28. Xian, Y., Akata, Z., Sharma, G., Nguyen, Q., Hein, M., Schiele, B.: Latent embeddings for zero-shot classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 69–77 (2016)
29. Yu, F.X., Cao, L., Feris, R.S., Smith, J.R., Chang, S.F.: Designing category-level attributes for discriminative visual recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 771–778 (2013)
30. Zhang, C., Peng, Y.: Visual data synthesis via GAN for zero-shot video classification. arXiv preprint arXiv:1804.10073 (2018)
31. Zhang, E., Chen, W., Zhang, Z., Zhang, Y.: Local surface geometric feature for 3D human action recognition. Neurocomputing 208, 281–289 (2016)
32. Zhang, Y., Zhang, E., Chen, W.: Deep neural network for halftone image classification based on sparse auto-encoder. Eng. Appl. Artif. Intell. 50, 245–255 (2016)
33. Zhang, Z., Saligrama, V.: Zero-shot learning via joint latent similarity embedding. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6034–6042 (2016)

Author information

Authors and Affiliations

Xi’an University of Technology, Xi’an, 710048, Shaanxi Province, People’s Republic of China
Guangfeng Lin, Yajun Chen & Fan Zhao

Corresponding author

Correspondence to Guangfeng Lin.

Editor information

Editors and Affiliations

Sun Yat-sen University, Guangzhou, China
Jian-Huang Lai
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xilin Chen
Tsinghua University, Beijing, China
Jie Zhou
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Xi'an Jiaotong University, Xi'an, China
Nanning Zheng
Peking University, Beijing, China
Hongbin Zha

About this paper

Cite this paper

Lin, G., Chen, Y., Zhao, F. (2018). Structure Fusion and Propagation for Zero-Shot Learning. In: Lai, JH., et al. Pattern Recognition and Computer Vision. PRCV 2018. Lecture Notes in Computer Science, vol 11258. Springer, Cham. https://doi.org/10.1007/978-3-030-03338-5_39

DOI: https://doi.org/10.1007/978-3-030-03338-5_39
Published: 03 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03337-8
Online ISBN: 978-3-030-03338-5
eBook Packages: Computer Science, Computer Science (R0)
