
1 Introduction and Related Work

It is not possible to judge the personality of a person from a mere glimpse of the face, but people nevertheless attribute apparent personality traits to a newly encountered face in a stereotypical way, and with remarkable consistency [1]. In this work, we tackle the problem of predicting apparent personality using the data and protocol of the ChaLearn Looking at People 2016 First Impression Challenge [2].

It is not surprising that emotional expressions influence the attribution of personality traits. A smiling person, for example, is more likely to be perceived as trustworthy and friendly. Todorov et al. convincingly argued that rapid, unreflective trait inferences from faces can influence consequential decisions [3]; this is why people do not typically use frowning or angry pictures in their resumés. The context of the image can also affect the perception of the face. In our proposed approach, we estimate emotional facial expressions, as well as cues from the context of the face, to predict first impressions.

Before describing our approach, we provide a brief literature review on automatic personality trait recognition. In the past, various approaches have been used for recognizing apparent personality traits from different modalities such as audio [4, 5], text [6–8] and visual information [9, 10]. As in other recognition problems, multimodal systems have also been investigated to improve the robustness of prediction [11–14]. These works aim to estimate personality traits from a given input. In psychology, personality is often assessed with a “Big Five” questionnaire that measures Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (OCEAN) [15]. Apparent personality is also frequently assessed along these five dimensions.

In their work, Borkenau and Liebler used Brunswik’s lens model and categorized the particular cues that may communicate a certain personality [16]. They included a large number of indicators, such as overall impression variables (e.g. estimated age, masculinity, attractiveness), acoustic variables (e.g. softness of voice, pleasantness, clarity), static visual variables (e.g. appearance, make-up, garments, thin lips, hair style, facial expression), and dynamic visual variables (e.g. movement speed, hand movements, walking style). To assess personality trait attributions, they measured “validity,” which indicates the correlation between self-ratings of personality and ratings by strangers or acquaintances. Brunswik’s lens model looks at cues used for perceived traits, and links some of these cues to actual traits by assessing their ecological validity [17]. It is a useful conceptualization, also used in approaches to personality computing [18].

According to the literature, faces are a rich source of cues for apparent personality attribution, related to stereotype judgments. For an automatic analysis system, the first steps of a visual face analysis pipeline are face detection [19, 20] and facial landmark localization [21–23]. Face alignment (or registration) is an important step, as all further processing depends on its accuracy. Recent deep neural network approaches are known to be more resistant to registration errors.

Face alignment is followed by visual feature extraction, which can include image-level appearance descriptors such as Local Binary Patterns (LBP) [24], Histogram of Oriented Gradients (HOG) [25], Scale-invariant Feature Transform (SIFT) [26], video-level descriptors such as Local Gabor Binary Patterns from Three Orthogonal Planes (LGBP-TOP) [27] and Local Phase Quantization (LPQ)-TOP [28], or geometric information [9, 10].

Deep learning based approaches have achieved state-of-the-art results in human behavior analysis. These approaches, when trained with large datasets, can provide representations that are very robust to variations exhibited in the data. Deep learning has been successfully applied to many tasks related to computer vision such as object recognition [29, 30], face recognition [31], emotion recognition [32] and age estimation [33–37]. Moreover, deep representations of images are often usable for multiple tasks, enabling transfer learning from pre-trained models. The disadvantages are the relatively high computational requirements for training such systems, the large amount of training data required, and the relative difficulty of extending them temporally to video processing.

In recent approaches to personality impressions classification, Support Vector Machines (SVM) [38] have been widely used [5, 8, 12, 14]. More recently, a learning approach called Extreme Learning Machines (ELM), which is similar to SVMs but provides faster learning schemes, has become popular [39]. The name ELM is debated in the literature because of the method’s strong resemblance to earlier approaches; we use it in this work for convenience. The approach has been shown to provide good performance in a number of applications, including face recognition [40, 41], emotion recognition [42, 43], and smile detection [44].

Given the success of deep learning approaches and the speed of ELM, we propose to use a fusion of deep face and scene features, followed by regularized regression with a kernel ELM classifier. The main contribution of this work is the effective combination of emotion related and ambient features that are efficiently extracted from pre-trained/fine-tuned Deep Convolutional Neural Network (DCNN) models. Our method is illustrated in a simplified flowchart in Fig. 1.

Fig. 1. Flowchart of the proposed method.

The remainder of this paper is organized as follows. In the next section we provide background and details on the methodology. Then in Sect. 3, we present the experimental results, followed by implementation details in Sect. 4. Finally, Sect. 5 concludes with future directions.

2 Methodology

Our proposed approach evaluates a short video clip that contains a single person, and outputs an estimate of apparent personality traits in the five dimensions mentioned earlier. In this section, we describe the three main steps of our pipeline, namely, face alignment, feature extraction, and modeling.

2.1 Face Alignment

For detecting and aligning faces in the videos, we use Xiong and de la Torre’s Supervised Descent Method (SDM), also known as IntraFace [21]. This approach locates 49 landmarks on the face. After the landmarks are located, we estimate the roll angle of the face from the eye corner locations and rotate the image to rectify the face. We then add a margin of 20% of the interocular distance around the outer landmarks to compute a loose bounding box, from which we crop the facial image. The cropped face is resized to \(64\times 64\) pixels and registered as a new frame. Frames from a sample input video and the corresponding aligned face images are shown in Fig. 2.
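
A minimal MATLAB sketch of this rectification and cropping step is given below; it is not the exact implementation. The eye-corner landmark indices, variable names, and rotation sign conventions are illustrative assumptions that depend on the landmark ordering and coordinate system of the detector.

```matlab
% Hedged sketch of roll rectification and loose cropping.
% 'frame' is an RGB video frame, 'pts' a 49x2 [x y] landmark matrix from
% SDM/IntraFace; the eye-corner indices below are assumptions.
leftEye  = pts(20, :);                       % outer left eye corner (assumed index)
rightEye = pts(29, :);                       % outer right eye corner (assumed index)

% Roll angle from the eye corners; the sign may need flipping depending on
% whether y grows downwards in the chosen coordinate system.
roll      = atan2d(rightEye(2) - leftEye(2), rightEye(1) - leftEye(1));
rectified = imrotate(frame, roll, 'bilinear', 'crop');   % rotate about image centre

% Rotate the landmarks by the same angle about the image centre
% (the point-rotation direction must match imrotate's convention).
c = [size(frame, 2), size(frame, 1)] / 2;
R = [cosd(roll) -sind(roll); sind(roll) cosd(roll)];
ptsRot = bsxfun(@plus, bsxfun(@minus, pts, c) * R', c);

% Loose bounding box: margin of 20% of the interocular distance.
margin = 0.2 * norm(rightEye - leftEye);
tl = min(ptsRot, [], 1) - margin;            % top-left corner [x y]
br = max(ptsRot, [], 1) + margin;            % bottom-right corner [x y]
face = imcrop(rectified, [tl, br - tl]);     % [xmin ymin width height]
face = imresize(face, [64 64]);
```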

Fig. 2. Face alignment example.

2.2 Feature Extraction

We extract facial features that are summarized over the entire video segment, and scene features from the first image of each video. The assumption is that videos do not span multiple shots.

Face Features: After aligning the faces, we extract image-level deep features from a network that is trained for facial emotion recognition. The training of this network is explained in more detail in Sect. 2.3. For comparison, we also extract features from the original VGG-Face network that was trained for face recognition [31]. For both networks, we use the response of the 33\(^{rd}\) layer of the 37-layer architecture, which is the lowest-level 4096-dimensional descriptor.
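
As an illustration, extracting such a descriptor with MatConvNet's simplenn wrapper can be sketched as follows. The model file name, the preprocessing field names, and the layer-index bookkeeping are assumptions and may differ across MatConvNet and model releases.

```matlab
% Hedged sketch of extracting the layer-33 response with MatConvNet
% (run vl_setupnn beforehand; 'vgg-face.mat' is an assumed model file name).
net = load('vgg-face.mat');
if isfield(net, 'net'), net = net.net; end    % some releases wrap the model

im = single(face);                            % aligned face image
sz = net.meta.normalization.imageSize(1:2);   % field names vary by release
im = imresize(im, sz);
avg = net.meta.normalization.averageImage;
if numel(avg) == 3, avg = reshape(avg, 1, 1, 3); end
im = bsxfun(@minus, im, avg);                 % mean subtraction

res  = vl_simplenn(net, im);                  % forward pass
feat = squeeze(res(34).x);                    % res(k+1).x is the output of layer k
feat = feat(:)';                              % 4096-dimensional row vector
```

The scene features described below follow the same pattern with the VGG-19 model, reading the response of its 39th layer (i.e. `res(40).x` under this indexing convention) for the full frame.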

We compare the deep features with traditional appearance descriptors and with geometric information, which has been shown to be effective in emotion recognition [45]. We report the cross-validation accuracy of each approach in Sect. 3.2.

Video Features: After extracting frame-level features from each registered face, we summarize the videos by computing functional statistics of each dimension over time. The functionals include mean, standard deviation, offset, slope, and curvature. Offset and slope are calculated from the first order polynomial fit to each feature contour, while curvature is the leading coefficient of the second order polynomial. An empirical comparison of the individual functionals is given in Sect. 3.2.
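
A minimal sketch of these functionals is shown below; the variable names are illustrative, and `F` is assumed to hold the frame-level features of one video.

```matlab
% F is a T x d matrix of frame-level features for one video (T frames).
% Functionals are computed per dimension; 't' is a normalized time axis.
[T, d] = size(F);
t = linspace(0, 1, T)';

stats = zeros(5, d);                 % [mean; std; offset; slope; curvature]
stats(1, :) = mean(F, 1);
stats(2, :) = std(F, 0, 1);
for j = 1:d
    p1 = polyfit(t, F(:, j), 1);     % first-order fit: [slope, offset]
    p2 = polyfit(t, F(:, j), 2);     % second-order fit: leading coef. = curvature
    stats(3, j) = p1(2);             % offset (intercept)
    stats(4, j) = p1(1);             % slope
    stats(5, j) = p2(1);             % curvature
end
videoFeat = stats(:)';               % 5*d-dimensional video descriptor
```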

Scene Features: In order to use ambient information in the images to our advantage, we extract features using the VGG-19 network [30], which is trained for an object recognition task on the ILSVRC 2012 dataset. Similar to the face features, we use the 4096-dimensional response of the 39\(^{th}\) layer of the 43-layer architecture. This yields a description of the overall image, containing both the face and the scene, which we combine with the face features using feature-level fusion.

2.3 CNN Fine Tuning

We start with the VGG-Face network [31], changing the final layer (originally a 2622-dimensional recognition layer) to a 7-dimensional emotion recognition layer whose weights are initialized randomly. We fine-tune this network with the softmax loss function using around 30,000 training images from the FER-2013 dataset [46]. We use an initial learning rate of 0.0001, a momentum of 0.9, and a batch size of 64. We train the model for 5 epochs and show the validation set performance for each epoch in Fig. 3.
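
The layer replacement can be sketched in MatConvNet's simplenn format as follows; the initialization scale and the exact layer bookkeeping are assumptions, not the paper's exact code.

```matlab
% Hedged sketch: repurposing the final VGG-Face prediction layer for the
% seven FER-2013 emotion classes (simplenn format; details are assumptions).
net = load('vgg-face.mat');
if isfield(net, 'net'), net = net.net; end

net.layers(end) = [];                           % drop the original softmax layer
f = 0.01;                                       % random-init scale (assumption)
net.layers{end} = struct( ...                   % replace the 2622-way layer
    'type',    'conv', ...
    'weights', {{f * randn(1, 1, 4096, 7, 'single'), zeros(1, 7, 'single')}}, ...
    'stride',  1, 'pad', 0);
net.layers{end+1} = struct('type', 'softmaxloss');

% Training (not shown) would then run SGD with learning rate 1e-4,
% momentum 0.9, batch size 64, for 5 epochs, e.g. via MatConvNet's
% example cnn_train routine.
```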

Fig. 3. Fine-tuning the VGG-Face network on the FER-2013 public test set. The plot on the left shows the softmax loss, whereas the plot on the right shows the top-1 and top-2 classification errors.

2.4 Regression with Kernel ELM

In order to model personality traits from visual features, we used kernel ELM, due to the learning speed and accuracy of the algorithm. In the following paragraphs, we briefly explain the learning strategy of ELM.

ELM proposes a simple and robust learning algorithm for single-hidden-layer feedforward networks. The input-to-hidden weights and the hidden layer biases are initialized randomly to obtain the output of the hidden layer. The output weights are then calculated by a simple generalized inverse operation on the hidden layer output matrix.

ELM tries to find the mapping between the hidden node output matrix \(\mathbf {H} \in \mathbb {R}^{N \times h}\) and the label vector \(\mathbf {T} \in \mathbb {R}^{N \times 1}\) where N and h denote the number of samples and the hidden neurons, respectively. The set of output weights \(\mathbf {\beta } \in \mathbb {R}^{h \times 1}\) is calculated by the least squares solution of the set of linear equations \(\mathbf {H} \mathbf {\beta }=\mathbf {T}\), as:

$$\begin{aligned} \mathbf {\beta } = \mathbf {H}^{\dagger }\mathbf {T}, \end{aligned}$$
(1)

where \(\mathbf {H}^{\dagger }\) denotes the Moore-Penrose generalized inverse [47] that minimizes the \(L_2\) norms of \(||\mathbf {H}\mathbf {\beta }-\mathbf {T}||\) and \(||\mathbf {\beta }||\) simultaneously.
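
For concreteness, the basic ELM solution of Eq. (1) can be sketched as follows; the sigmoid activation and the hidden layer size are illustrative choices, not values from the paper.

```matlab
% X is an N x d feature matrix, T an N x 1 target vector.
[N, d] = size(X);                                % N samples, d-dimensional features
h = 100;                                         % hidden layer size (assumption)
W = randn(d, h);  b = randn(1, h);               % random input weights and biases
H = 1 ./ (1 + exp(-(X * W + repmat(b, N, 1))));  % hidden layer output (sigmoid)
beta = pinv(H) * T;                              % Eq. (1): Moore-Penrose solution
```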

To increase the robustness and the generalization capability of ELM, a regularization coefficient C is included in the optimization procedure. Therefore, given a kernel \(\mathbf {K}\), the set of weights is learned as follows:

$$\begin{aligned} \mathbf {\beta } = (\frac{\mathbf {I}}{C}+\mathbf {K})^{-1} \mathbf {T}. \end{aligned}$$
(2)

In order to prevent parameter overfitting, we use the linear kernel \(\mathbf {K}(x,y) = x^Ty\), where x and y are the original feature vectors after min-max normalization of each dimension over the training samples. With this approach, the only parameter of our model is the regularization coefficient C, which we optimize with 5-fold subject-independent cross-validation on the training set. In Sect. 3.2, we report the average score over the folds with the selected parameter.
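
A minimal sketch of this training and prediction procedure is given below; the variable names are illustrative, and C would be chosen by the cross-validation described above.

```matlab
% Xtr (N x d) and Xte (M x d) are training/test features; Ttr is N x 1.
% Min-max normalization per dimension, using training-set statistics only.
mn = min(Xtr, [], 1);
rg = max(Xtr, [], 1) - mn;  rg(rg == 0) = 1;   % guard against constant dimensions
XtrN = bsxfun(@rdivide, bsxfun(@minus, Xtr, mn), rg);
XteN = bsxfun(@rdivide, bsxfun(@minus, Xte, mn), rg);

C = 1;                                         % regularization coefficient (tuned by CV)
K = XtrN * XtrN';                              % linear kernel Gram matrix (N x N)
beta  = (eye(size(K, 1)) / C + K) \ Ttr;       % Eq. (2)
Tpred = (XteN * XtrN') * beta;                 % predictions for the test samples
```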

3 Experiments

3.1 Challenge and Corpus

The “ChaLearn LAP Apparent Personality Analysis: First Impressions” challenge consists of 10,000 clips collected from 5,563 YouTube videos, where the poses are more or less frontal, but the resolution, lighting and background conditions are not controlled, hence providing a dataset with in-the-wild conditions. Each clip in the training set is labeled for the Big Five personality traits. Basic statistics of the dataset partitions are provided in Table 1. The detailed information on the challenge and corpus can be found in [2].

Table 1. Dataset summary

Performance Evaluation: The performance score in this challenge is the Mean Absolute Error subtracted from 1, which is formulated as follows:

$$\begin{aligned} 1-\frac{1}{N}\sum _{i=1}^{N} |\hat{y}_i-y_i|, \end{aligned}$$
(3)

where N is the number of samples, \(\hat{y}\) is the predicted label and y is the true label (\(0\le y \le 1\)). This score is then averaged over the five traits, so the final score varies between 0 (worst case) and 1 (best case).
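
Computed per trait and then averaged, the score of Eq. (3) amounts to the following (assuming N x 5 matrices of ground-truth and predicted labels; the variable names are illustrative).

```matlab
% yTrue and yPred are N x 5 matrices with values in [0, 1], one column per trait.
perTrait   = 1 - mean(abs(yPred - yTrue), 1);  % 1 - MAE for each of the five traits
finalScore = mean(perTrait);                   % average over the five traits
```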

3.2 Experimental Results

In this section, we report the regression performance of various visual descriptors. Tables 2 and 3 summarize the performances of the different systems with 5-fold subject-independent cross-validation on the training set.

We first look at the performance of individual functionals, which are described in Sect. 2.2. As can be seen in Table 2, the combination of mean, standard deviation, and offset features works well, and the mean by itself is the most informative functional.

Table 2. Functional statistics with deep face features.

We then evaluate individual feature sets of different dimensionalities: geometric features (GEO), LPQ-TOP, LBP-TOP, and different deep neural network features. Table 3 summarizes the results and gives the dimensionality of each feature set. We observe that features from the deep face model fine-tuned on the FER emotion corpus provide higher performance than both the original deep features and the hand-crafted visual features. Combining these features with ambient (scene) information further improves the prediction performance.

Table 3. Regression performance with various visual descriptors

The best fusion system (ID 9 in Table 3) gives a test set mean accuracy of 0.9094, which ranks fifth in the official competition. Comparing this test set performance with the other competitors’ accuracies (see Table 4), we observe that performances are generally around 0.90–0.91. The top accuracy is 0.9130, and the top six teams all score above 0.9.

Table 4. Final ranking on the test set

We show the estimations of our system during cross validation in Figs. 4 and 5. The results in Fig. 4 show how precisely our system can estimate the personality traits under various imaging conditions. Figure 5 shows that examples with labels very close to 0 or 1 tend to have higher error, which might be due to the approximately normal distribution of training labels with mean values around 0.5.

Fig. 4. Six examples from the training set where our approach produced good estimations of the traits. For each example, the first column shows the ground truth (True) and the second column shows the estimation of the model (Pred.)

Fig. 5. Examples from the training set where our approach produced poor estimations of the traits. For each example, the first column shows the ground truth (True) and the second column shows the estimation of the model (Pred.)

4 Implementation Details

The whole system is implemented in MATLAB R2015b on a 64-bit Windows 10 PC with 32 GB RAM and an Intel i7-6700 CPU. For fine-tuning and feature extraction with CNNs, the MatConvNet library [48] has been used with GPU parallelization, using an NVidia GeForce GTX 970 GPU. Time spent on important parts of the pipeline is summarized in Table 5.

Table 5. Time requirement for each step of the pipeline

5 Conclusions

In this paper, we proposed to use transfer learning to estimate apparent personality traits from first impressions. We use deep convolutional neural networks (DCNNs) that were originally trained for other tasks, such as face, object, and emotion recognition, and employ their features directly, thereby showing the feasibility of deep transfer learning for this task.

By combining two sets of DCNN features that carry facial expression and ambient information, we achieve better results than each of these feature sets alone, as well as than other hand-crafted visual features. In this work, we did not make use of the audio modality, which was shown to be beneficial in earlier works; audio-based and multimodal analyses constitute our future work. Video modeling is carried out using simple statistical functionals, an approach that is fast and has been shown to be accurate. In future work, a wider set of functionals will be investigated.