1 Introduction

Sentiment analysis is the process of extracting the feelings and emotions of users [1]. Liu [2] defines it as the field of study that analyzes people's opinions, feelings, evaluations, appraisals, attitudes and emotions towards entities such as products, services, organizations, individuals, themes and events, and their attributes. Chen [3] discusses the different computational approaches with which sentiment analysis can be performed: text-based, voice-based, visual and multimodal. One of the techniques used in this discipline is machine learning, which Mitchell [4] defines as a sub-area of computer science that studies methods to construct predictive computational models from observational data. Sentiment analysis can be applied to a myriad of disciplines and areas: economics, medicine, psychology, state security and politics; in this work, it is applied in psychology, more precisely in job interviews within organizations. The job interview is the most important process in recruitment and is used for various purposes: measurement of cognitive qualities, personality, and motor and physical skills [5].

Job interviews are a popular selection technique from many points of view. In organizations around the world, they remain one of the most widely used methods to evaluate candidates for employment. Among organizational decision makers, interviews have been found to be the assessment method most preferred by supervisors and human resources professionals. In addition, applicants perceive interviews as fair compared to other selection procedures and expect them as part of a selection process. In fact, from the applicant's perspective, obtaining a job interview is essential for success in the job search [6].

For sentiment analysis, there are instruments and techniques that require a specialist to interpret the results, and the costs of some of these devices are relatively high [7,8,9]. In pre-employment interviews, for example, it is a person who analyzes the behavior, gestures and certain key patterns of the interviewee, such as the gaze, the tone of voice and other expressions. One of the most widely used devices is the polygraph, which measures physiological alterations in people [10]; furthermore, Chica [11] mentions several disadvantages of this device, stating that there are also several “tricks” that can alter the test. Another device is the magnetic resonance scanner, which uses a technology considered among the best for lie detection; however, it focuses only on this, is very expensive and requires a rigorous process [12].

Because of these shortcomings and deficiencies in the existing techniques and devices, among other reasons, a low-cost model is proposed that can accurately interpret people's feelings with eye-tracking techniques and support the decision making of the personnel in charge of conducting job interviews in organizations, drawing on information systems, mathematical theories and psychology.

This paper is organized as follows: Sect. 2 presents the related works; Sect. 3, the sentiment analysis model using machine learning; Sect. 4, the experiments and results. The conclusion is presented last.

2 Related Works

There are some related works on sentiment analysis using machine learning:

Borth [13] proposed an approach based on understanding the visual concepts that are strongly related to sentiments, presenting a method built upon psychological theories and web mining to automatically construct a large-scale Visual Sentiment Ontology, with a detector library for visual sentiment analysis. Wang [14] proposed a visual sentiment analysis approach with coupled deep adjective and noun neural networks, which treats visual sentiment analysis as a binary prediction problem: classifying an image as positive or negative from its visual content. Baecchi [15] carried out a study that uses a multimodal feature learning approach, based on neural network models, to address sentiment analysis of micro-blogging content such as Twitter short messages, which are composed of a short text and, possibly, an image.

Zadeh [16] presented a neural-network-based model termed “tensor fusion network” for sentiment analysis, highlighting the growth of research in this area across multiple modalities and the use of machine learning. Similarly, Chinsatit [17] used a neural-network-based pupil center detection method for wearable gaze estimation, mentioning the applications this can have in several disciplines, including psychology. For their part, Poria [18] proposed a multimodal affective data analysis framework that extracts user opinion and emotions from video content by combining text, audio and video; the paper also presents an extensive study on decision-level fusion.

On the other hand, Chen [3] uses a convolutional neural network for the prediction of sentiments through the joint learning of textual and visual sentiments from training examples. George [19] also uses a convolutional neural network, proposing a real-time framework for the classification of eye gaze direction and the estimation of eye accessing cues.

3 Sentiment Analysis Model Using Machine Learning

In this research, a characterization of the approaches to sentiment analysis was first carried out, among which are: textual, voice, visual and multimodal. After this characterization, and given that the final validation was designed to be applied in job interviews, this work focuses on the visual approach. To do so, we register gaze positions through the coordinates of the center of the pupil, following a technique called eye tracking [20], which is the process of measuring the movement of an eye in relation to the head or to the point where the gaze is fixed (Fig. 1).

Fig. 1.
figure 1

Eye Accessing cues [21]

The model shown in Fig. 1 establishes that specific characteristics of the thinking mechanism relate to a non-visual orientation of the gaze [21]; a person's mental activity is related to the way he or she moves the eyes [22]. “According to a popular proverb, the eyes are the window of the soul. And, in fact, people have wondered for a long time if there is something in our eyes indicative of character” [23].

For this study, in addition, an exploration of the different machine learning algorithms used for sentiment analysis was carried out (Fig. 2).

Fig. 2.
figure 2

Approaches and algorithms for sentiment analysis [24].

After comparing the algorithms, the supervised machine learning approach with artificial neural network techniques was selected. The criterion was that people exhibit different eye behaviors when asked about something: the dwell time of the gaze on certain coordinates is not the same for everyone, nor are the coordinates themselves, so the relationship between the variables can be considered to follow a non-linear trend.

3.1 Architecture of the Artificial Neural Network

The prototype used in the sentiment analysis model proposed in this article enables six of the seven patterns (visual defocused, visual created, visual remembered, auditory created, auditory remembered, kinesthetic sensations, internal dialogue), excluding the visual defocused pattern.

Next, the architecture of the neural network proposed to validate the coordinates of the model is described. The input variables are the coordinates on the Cartesian plane (x, y) and the dwell time of the gaze. When a question is asked during the job interview at a certain time t, the person fixes his or her gaze towards certain places; once captured, the number of fixations on each of the coordinates is counted:

  • Number of looks towards visual defocused (#VD)

  • Number of looks towards visual remembered (#VR)

  • Number of looks towards visual created (#VC)

  • Number of looks towards auditory remembered (#AR)

  • Number of looks towards auditory created (#AC)

  • Number of looks towards internal dialogue (#DI)

  • Number of looks towards kinesthetic sensations (#KI)

The input data are normalized by Eq. (1):

$$ x^{\prime} = d_{1} + \frac{{\left( {x - x_{min} } \right)\left( {d_{2} - d_{1} } \right)}}{{x_{max} - x_{min} }} $$
(1)
where:

  • \( x \): value to normalize

  • \( \left[ {x_{min} ,\,x_{max} } \right] \): range of values of \( x \)

  • \( \left[ {d_{1} ,\,d_{2} } \right] \): range to which the value of \( x \) will be rescaled
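Equation (1) is standard min-max rescaling. A minimal sketch in Python (the default target range [0, 1] here is an illustrative choice, not one stated in the paper):

```python
def normalize(x, x_min, x_max, d1=0.0, d2=1.0):
    """Rescale x from [x_min, x_max] into [d1, d2], as in Eq. (1)."""
    return d1 + (x - x_min) * (d2 - d1) / (x_max - x_min)

# Examples: midpoint of [0, 10] maps to the midpoint of the target range.
normalize(5, 0, 10)            # -> 0.5
normalize(10, 0, 10, -1.0, 1.0)  # -> 1.0
```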

The outputs are multiple and correspond to the class of the coordinate, since the output is a multiclass classification layer (0, 1, 2, 3, 4, 5, 6 correspond to VD, VR, VC, AR, AC, DI, KI, respectively).

After the characterization of the machine learning algorithms, several neural network architectures were designed, combining different numbers of hidden layers, different numbers of nodes in those layers, different activation functions and different learning algorithms. A multilayer perceptron was selected with 2 inputs (plus bias), two hidden “sentiment processing” layers of 7 neurons each, and 7 outputs, since, as mentioned, the classification is multiple. The neurons are activated with the sigmoid activation function; as the error function, categorical cross-entropy with the Adam optimizer was chosen (Fig. 3).

Fig. 3.
figure 3

Architecture of the multilayer perceptron neural network used to validate the patterns [Own development].
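The 2-7-7-7 topology described above can be sketched as a forward pass in NumPy. The weights here are random placeholders, not the trained parameters, and a softmax output is shown to yield a proper class distribution (the paper pairs sigmoid activations with categorical cross-entropy); this is an illustrative sketch, not the authors' TensorFlow.js implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative (untrained) weights for the 2-7-7-7 multilayer perceptron.
W1, b1 = rng.normal(size=(7, 2)), np.zeros(7)   # hidden layer 1
W2, b2 = rng.normal(size=(7, 7)), np.zeros(7)   # hidden layer 2
W3, b3 = rng.normal(size=(7, 7)), np.zeros(7)   # output layer, 7 classes

def forward(x):
    """Forward pass; x holds the two normalized inputs (coordinate, dwell time)."""
    h1 = sigmoid(W1 @ x + b1)
    h2 = sigmoid(W2 @ h1 + b2)
    return softmax(W3 @ h2 + b3)   # probabilities over VD, VR, VC, AR, AC, DI, KI

probs = forward(np.array([0.4, 0.7]))
```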

The model proposed in this paper performs visual sentiment analysis of people with machine learning, interpreting the eye accessing cues with the help of the eye-tracking technique. It is applied here to job interviews, in which an interviewee is asked questions of a personal nature and fixes his or her gaze towards certain coordinates that have a meaning according to the areas of study of psychology. With the results obtained from the gaze-fixation variables, the personnel in charge (usually from human resources) analyze those results and make their own decision about the candidate for the job.

3.2 Interview

The interview is based on the work of Costa [25] on the five (5) personality dimensions (Big Five): extraversion, kindness, responsibility, emotional stability and openness to experience. In Rauthmann [23], an eye-tracking study demonstrates with linear mixed models that personality predicts the number of fixations, the fixation duration and the dwell time on two different abstract animations. Hooft [26] investigates whether eye-tracking technology can improve the understanding of people's response processes when answering. For his part, Broz [27] collects gaze data from human conversational pairs in order to understand which characteristics of the interlocutors influence this behavior. These three (3) works made use of the Costa test [25].

Open questions based on the Costa test [25] are asked in order to evaluate each of the above dimensions, so that the person responding involuntarily projects their gaze towards certain coordinates according to their feelings. A human being is capable of manifesting three (3) feelings at the same time with his or her eye cues.

4 Experiments and Results

Next, the experiments and results for the sentiment analysis model are presented. For the experiment, the person should be asked not to wear glasses and not to turn the face to the sides, so that the webcam can constantly monitor the movement of the eyes.

4.1 Prototype

The prototype consists of a client–server system with a web graphical user interface (GUI) client, which captures the coordinates of the gaze and the time spent on those coordinates using a conventional web camera. Eye detection is performed through the webcam with a JavaScript library called WebGazer [28], which uses machine learning internally. The library is first calibrated; the more it is calibrated, the greater the precision of the detection of the gaze coordinates.

In Fig. 4, the user sits in front of a computer that has a conventional web camera capturing the movement of the eyes; this is reflected in the web graphical user interface on the screen of that computer, which captures the needed information. The collection of the variables is done in real time through the eye-tracking prototype and stored in a database for further processing.

Fig. 4.
figure 4

GUI - prototype for eye tracking [Own development].

Fig. 5.
figure 5

Coordinates of a matrix of the eye-chimera dataset, person looking at the center [Own development].

For each coordinate where the interviewee fixes the gaze, the absolute frequency (number of fixations on that coordinate) is calculated; then, the point (x, y) of the data set closest to the interpolation point IP, \( \left( {{ \hbox{min} }\left( {\left\| {X - IP} \right\|} \right)} \right) \), is selected, provided it belongs to the first three (3) largest absolute frequencies. The captured point (x, y) is scaled from the width and height (in pixels) of the browser window viewport to the size of the rectangular image of the eye area. A translation of Cartesian coordinates is then carried out, followed by normalization of the data with respect to the pixel size of the eye-region rectangle [29].
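The two operations above, scaling a viewport gaze point to the eye-region rectangle and selecting the point closest to an interpolation point, can be sketched as follows. The 35 × 18 px default comes from the eye-region size reported later in the paper; the function names are illustrative:

```python
import math

def scale_to_eye_region(x, y, viewport_w, viewport_h, eye_w=35, eye_h=18):
    """Rescale a gaze point from browser-viewport pixels to the eye-region rectangle."""
    return x * eye_w / viewport_w, y * eye_h / viewport_h

def nearest_point(points, ip):
    """Point of the data set closest to the interpolation point IP (min ||X - IP||)."""
    return min(points, key=lambda p: math.dist(p, ip))

# A point at the bottom-right viewport corner maps to the eye-rectangle corner.
scale_to_eye_region(640, 480, 640, 480)          # -> (35.0, 18.0)
nearest_point([(0, 0), (3, 4), (10, 10)], (2, 3))  # -> (3, 4)
```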

4.2 Evaluation

To train the neural network, we take the Eye-Chimera data set used in [21], transformed into coordinate matrices of the images. Each gaze (described in each image) is represented by a (14 × 2) matrix in which the rows are the coordinate points and the columns are the x and y axes. The first five (5) rows and the last two (2) are coordinates of the left eye; rows 6 to 10 are coordinates of the right eye. The pupil coordinates are rows 5 and 10; row 1 is the left end of the left eye, row 6 the left end of the right eye, row 11 the upper end of the left eye, and row 13 the upper end of the right eye (these are the variables of interest). These values are taken and normalized to a rectangle located in the eye region. The images of the data set have a size of 640 px wide by 480 px high, and the eye region is reduced to 35 × 18 px.
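The layout just described can be sketched with a (14 × 2) landmark matrix. The coordinate values below are invented for illustration (the real ones come from the Eye-Chimera images), and the paper's 1-based row numbers are converted to 0-based indices:

```python
import numpy as np

# Hypothetical 14x2 landmark matrix (rows = points, columns = x, y) for one
# 640x480 px Eye-Chimera image; values are illustrative only.
landmarks = np.zeros((14, 2))
landmarks[0] = (160, 240)   # row 1 (1-based): left end of the left eye
landmarks[4] = (200, 240)   # row 5: left pupil
landmarks[5] = (380, 240)   # row 6: left end of the right eye
landmarks[9] = (420, 240)   # row 10: right pupil

def normalize_to_eye_rect(landmarks, img_w=640, img_h=480, eye_w=35, eye_h=18):
    """Rescale landmark coordinates from image size to the 35x18 px eye rectangle."""
    scale = np.array([eye_w / img_w, eye_h / img_h])
    return landmarks * scale

pupils = normalize_to_eye_rect(landmarks)[[4, 9]]  # both pupil rows, rescaled
```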

In Fig. 5, two eyes are shown taking values from a matrix of dotted positions with the pupil in the center.

Fig. 6.
figure 6

End user interview interface. This is a question related to extroversion [Own development].

The eye coordinates were captured from five (5) volunteers, asking one personality question for each dimension of the Big Five (5 questions). The volunteers were only told to answer the questions that would appear on the computer screen: the gaze-capture interface was hidden from them, only the questions to be answered were displayed, and each volunteer clicked after answering verbally, without being told which personality dimension was being measured. At the end of the test, the resulting counts, as shown in Fig. 5, are presented for each question answered.

For each question, the absolute frequencies of fixation on the established coordinates are obtained; from these, the six (6) relative frequencies in each coordinate are calculated and the three (3) largest are taken (relative frequencies greater than 10%). The average duration of a fixation ranges between 200 and 350 ms [30].
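The selection of the three largest relative frequencies above the 10% threshold can be sketched as follows; the fixation counts in the example are invented for illustration:

```python
def top_fixation_patterns(abs_freq, threshold=0.10, k=3):
    """abs_freq: dict mapping pattern -> number of fixations for one question.
    Returns the k patterns with the largest relative frequencies above threshold,
    plus the full relative-frequency table."""
    total = sum(abs_freq.values())
    rel = {p: n / total for p, n in abs_freq.items()}
    major = sorted((p for p in rel if rel[p] > threshold),
                   key=lambda p: rel[p], reverse=True)
    return major[:k], rel

# Hypothetical counts for the six enabled coordinates on one question.
counts = {"VR": 12, "VC": 3, "AR": 9, "AC": 1, "DI": 6, "KI": 2}
top3, rel = top_fixation_patterns(counts)  # -> ["VR", "AR", "DI"]
```

VC falls just below the 10% threshold here (3/33 ≈ 9.1%), so only three patterns survive.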

Table 1 shows the results of the people evaluated and the corresponding sentiments obtained with the eye tracking:

Table 1. Results with absolute frequencies in the coordinates in the determined response time for the five (5) questions.

Each question aims to identify one dimension of the candidate for the job: question 1, extraversion; question 2, kindness; question 3, responsibility; question 4, emotional stability; question 5, openness to experience. The response time the person takes is not relevant to the verification results, although the absolute frequencies depend on it.

After training the model with the neural network shown in Fig. 3 on the Eye-Chimera data set (885 samples in total), the unlabeled results are processed; for this, the “TensorFlow.js” tool is used.

For each of the first three (3) largest ratios in the previous table, the point closest to the described curve is taken from the set of points in each of those three (3) coordinates.

Table 2 shows the validation matrix of the results, with the percentage of success or certainty for each coordinate. For each pattern to be checked, the neural network outputs a result whose corresponding value is highest with respect to the others.

Table 2. Validation of the results in the corresponding coordinates.

5 Conclusion

In this paper, we have presented a sentiment analysis model built with machine learning techniques, an emergent application area in many fields; here, it is applied to decision making in the job interview process to assess personality. There are several approaches to analyzing feelings, but in this case neural network algorithms were chosen because of the advantages analyzed with respect to the problem at hand.

With the results of the personality test for each question, the person in charge of deciding the future of the candidates for a job position (normally a psychologist or human resources professional) relies on the model, correlating the results of the variables, and then gives his or her final judgment in this regard.